Testing of Young Children Challenged

Statement of the Problem

The practice of administering standardized tests to young children has increased dramatically in recent years. Many school systems now routinely administer some form of standardized developmental screening or readiness test for admittance to kindergarten or standardized achievement test for promotion to first grade. As a result, more and more 5- and 6-year-olds are denied admission to school or are assigned to some form of extra-year tracking such as “developmental kindergarten,” retention in kindergarten, or “transitional” first grade (Meisels, 1987; Shepard & Smith, in press). Such practices (often based on inappropriate uses of readiness or screening tests) disregard the potential, documented long-term negative effects of retention on children’s self-esteem and the fact that such practices disproportionately affect low-income and minority children; further, these practices have been implemented in the absence of research documenting that they positively affect children’s later academic achievement (Gredler, 1984; Shepard & Smith, 1986, 1987; Smith & Shepard, 1987).

A simultaneous trend that has influenced and been influenced by the use of standardized testing is the increasingly academic emphasis of the curriculum imposed on kindergartners. Many kindergartens are now highly structured, “watered-down” first grades, emphasizing workbooks and other paper-and-pencil activities that are developmentally inappropriate for 5-year-olds (Bredekamp, 1987; Durkin, 1987; Katz, Raths, & Torres, undated). The trend further trickles down to preschool and child care programs that feel their mission is to get children “ready” for kindergarten. Too many school systems, expecting children to conform to an inappropriate curriculum and finding large numbers of “unready” children, react to the problem by raising the entrance age for kindergarten and/or labeling the children failures (Shepard & Smith, 1986, in press).

The negative influence of standardized testing on the curriculum is not limited to kindergarten. Throughout the primary grades, schools assess achievement using tests that frequently do not reflect current theory and research about how children learn. For example, current research on reading instruction stresses a whole language/literacy approach that integrates oral language, writing, reading, and spelling in meaningful context, emphasizing comprehension. However, standardized tests of reading achievement still define reading exclusively as phonics and word recognition and measure isolated skill acquisition (Farr & Carey, 1986; Teale, Hiebert, & Chittenden, 1987; Valencia & Pearson, 1987). Similarly, current theory of mathematics instruction stresses the child’s construction of number concepts through firsthand experiences, while achievement tests continue to define mathematics as knowledge of numerals (Kamii, 1985a, 1985b). As a result, too many school systems teach to the test or continue to use outdated instructional methods so that children will perform adequately on standardized tests.

The widespread use of standardized tests also drains resources of time and funds without clear demonstration that the investment is beneficial for children. Days may be devoted to testing (or preparing for it) that could be better spent in valuable instructional time (National Center for Fair and Open Testing, 1987).

Ironically, the calls for excellence in education that have produced widespread reliance on standardized testing may have had the opposite effect: mediocrity. Children are being taught to provide the one “right” answer on the answer sheet, but are not being challenged to think. Rather than producing excellence, the overuse (and misuse) of standardized testing has led to the adoption of inappropriate teaching practices as well as admission and retention policies that are not in the best interests of individual children or the nation as a whole.

Statement of the Position

NAEYC believes that the most important consideration in evaluating and using standardized tests is the utility criterion: The purpose of testing must be to improve services for children and ensure that children benefit from their educational experiences. Decisions about testing and assessment instruments must be based on the usefulness of the assessment procedure for improving services to children and improving outcomes for children. The ritual use even of “good tests” (those that are judged to be valid and reliable measures) is to be discouraged in the absence of documented research showing that children benefit from their use.

Determining the utility of a given testing program is not easy. It requires thorough study of the potential effects, both positive and negative. For example, using a readiness or developmental test to admit children to kindergarten or first grade is often defended by teachers and administrators who point to the fact that the children who have been kept back perform better the next year. Such intuitive reports overlook the fact that no comparative information is available about how the individual child would have fared had he or she been permitted to proceed with schooling. In addition, such pronouncements rarely address the possible effects of failure on the admission test on the child’s self-esteem, the parent’s perceptions, or the educational impact of labeling or mislabeling the child as being behind the peer group (Gredler, 1978; Shepard & Smith, 1986, in press; Smith & Shepard, 1987).

The following guidelines are intended to enhance the utility of standardized tests and guide early childhood professionals in making decisions about the appropriate use of testing.

1. All standardized tests used in early childhood programs must be reliable and valid according to the technical standards of test development (AERA, APA, & NCME, 1985).

Administrators making decisions about standardized testing must recognize that the younger the child, the more difficult it is to obtain reliable and valid results from standardized tests. For example, no available school readiness test (as contrasted to a developmental screening test) is accurate enough to screen children for placement into special programs without a 50% error rate (Shepard & Smith, 1986). Development in young children occurs rapidly; early childhood educators recognize the existence of general stages and sequence of development but also recognize that enormous individual variation occurs in patterns and timing of growth and development that is quite normal and not indicative of pathology. Therefore, the results obtained on a single administration of a test must be confirmed through periodic screening and assessment and corroborated by other sources of information to be considered reliable (Meisels, 1984).

2. Decisions that have a major impact on children such as enrollment, retention, or assignment to remedial or special classes should be based on multiple sources of information and should never be based on a single test score.

Appropriate sources of information may include combinations of the following:

  • systematic observations, by teachers and other professionals, that are objective, carefully recorded, reliable (produce similar results over time and among different observers), and valid (produce accurate measures of carefully defined, mutually exclusive categories of observable behavior);
  • samples of children’s work such as drawings, paintings, dictated stories, writing samples, projects, and other activities (not limited to worksheets);
  • observations and anecdotes related by parents and other family members; and
  • test scores, if and only if appropriate, reliable, and valid tests have been used.

In practice, multiple measures are sometimes used in an attempt to find some supporting evidence for a decision that teachers or administrators are predisposed to make regarding a child’s placement. Such practice is an inappropriate application of this guideline. To meet this guideline, the collected set of evidence obtained through multiple sources of information should meet validity standards.

3. It is the professional responsibility of administrators and teachers to critically evaluate, carefully select, and use standardized tests only for the purpose for which they are intended and for which data exists demonstrating the test’s validity (the degree to which the test accurately measures what it purports to measure). 

Unfortunately, readiness tests (based on age-related normative data) that are designed to measure the skills children have acquired compared to other children in their age range are sometimes used inappropriately. The intended purpose of such instruments is typically to provide teachers with information that will help them improve instruction, by informing them of what children already know and the skills they have acquired. In practice, however, teachers have been found to systematically administer such tests and then proceed to teach all children the same content using the same methods; for example, testing all kindergartners and then instructing the whole group using phonics workbooks (Durkin, 1987). The practice of making placement decisions about children on the basis of the results of readiness tests is becoming more common despite the absence of data that such tests are valid predictors of later achievement (Meisels, 1985, 1987).

4. It is the professional responsibility of administrators and teachers to be knowledgeable about testing and to interpret test results accurately and cautiously to parents, school personnel, and the media.

Accurate interpretation of test results is essential. It is the professional obligation of administrators and teachers to become informed about measurement issues, to use tests responsibly, to exert leadership within early childhood programs and school systems regarding the use of testing, to influence test developers to produce adequate tests and to substantiate claims made in support of tests, and to accurately report and interpret test results without making undue claims about their meaning or implications.

5. Selection of standardized tests to assess achievement and/or evaluate how well a program is meeting its goals should be based on how well a given test matches the locally determined theory, philosophy, and objectives of the specific program.

Standardized tests used in early childhood programs must have content validity; that is, they must accurately measure the content of the curriculum presented to children. If no existing test matches the curriculum, it is better either to forgo standardized testing or to develop an instrument that measures the program’s objectives than to change an appropriate program to fit a pre-existing test. Too often the content of a standardized test unduly influences the content of the curriculum. If a test is used, the curriculum should determine its selection; the test should not dictate the content of the curriculum.

Another difficulty related to content validity in measures for young children is that many critically important content areas in early childhood programs such as developing self-esteem, social competence, creativity, or dispositions toward learning (Katz, 1985) are considered “unmeasurable” and are therefore omitted from tests. As a result, tests for young children often address the more easily measured, but no more important, aspects of development and learning.

6. Testing of young children must be conducted by individuals who are knowledgeable about and sensitive to the developmental needs of young children and who are qualified to administer tests.

Young children are not good test takers. The younger the child, the more inappropriate paper-and-pencil, large group test administrations become. Standards for the administration of tests require that reasonable comfort be provided to the test taker (AERA, APA, & NCME, 1985). Such a standard must be broadly interpreted when applied to young children. Too often, standardized tests are administered to children in large groups, in unfamiliar environments, by strange people, perhaps during the first few days at a new school or under other stressful conditions. During such test administrations, children are asked to perform unfamiliar tasks, for no reason that they can understand. For test results to be valid, tests are best administered to children individually in familiar, comfortable circumstances by adults whom the child has come to know and trust and who are also qualified to administer the tests.

7. Testing of young children must recognize and be sensitive to individual diversity.

Test developers frequently ignore two important sources of variety in human experiences: cultural variations and variations in the quality of educational experiences provided for different children. It is easier to mass-produce tests if one assumes that cultural differences are minimal or meaningless or if one assumes that test subjects are exposed to personal and educational opportunities of equally high quality. These assumptions permit attributing all variances or differences in test scores to differences in individual children’s capacities. However, these assumptions are false.

Early childhood educators recognize that children’s skills, abilities, and aptitudes are most apparent when they can be demonstrated in familiar cultural contexts. Because standardized tests must use particular cultural material, they may be inappropriate for assessing the skills, abilities, or aptitudes of children whose primary cultures differ from the mainstream. Language is the special feature of culture that creates the greatest problem for test developers. There are many language varieties in the United States, some of which are not apparent to the casual observer or test developer. Although having a common language is definitely desirable, useful, and a major goal of education, testing must be based on reality. For non-native English speakers or speakers of some dialects of English, any test administered in English is primarily a language or literacy test (AERA, APA, & NCME, 1985). Standardized tests should not be used in multicultural/multilingual communities if they are not sensitive to the effects of cultural diversity or bilingualism (Meisels, 1985). If testing is to be done, children should be tested in their native language.

Conclusion

NAEYC’s position on standardized testing in early childhood programs restricts the use of tests to situations in which testing provides information that will clearly contribute to improved outcomes for children. Standardized tests have an important role to play in ensuring that children’s achievement or special needs are objectively and accurately assessed and that appropriate instructional services are planned and implemented for individual children. However, standardized tests are only one of multiple sources of assessment information that should be used when decisions are made about what is best for young children. Tests may become a burden on the educational system, requiring considerable effort and expense to administer and yielding meager benefits. Given the scarcity of resources, the intrusiveness of testing, and the real potential for measurement error and/or bias, tests should be used only when it is clear that their use represents a meaningful contribution to the improvement of instruction for children and only as one of many sources of information. Rather than use tests of doubtful validity, it is better not to test, because false labels that come from tests may cause educators or parents to alter inappropriately their treatment of children. The potential for misdiagnosing or mislabeling is particularly great with young children, where there is wide variation in what may be considered normal behavior.

The burden of proof for the validity and reliability of tests is on the test developers and the advocates for their use. The burden of proof for the utility of tests is on administrators or teachers of early childhood programs who make decisions about the use of tests in individual classrooms. Similarly, the burden of responsibility for choosing, administering, scoring, and interpreting a score from a standardized test rests with the early childhood professional and thus demands that professionals be both skilled and responsible. Ensuring that tests meet scientific standards, reflect the most current scientific knowledge, and are used appropriately requires constant vigilance on the part of educators.

Reprinted with permission of the National Association for the Education of Young Children (NAEYC), 1834 Connecticut Avenue, N.W., Washington D.C. 20009, from Young Children, March, 1988.