Promises and Pitfalls: Implications of No Child Left Behind for Defining, Assessing, and Serving English Language Learners
Michael Kieffer, Nonie Lesaux, and Catherine Snow
Harvard Graduate School of Education
The fundamental
underlying principles of the No Child Left Behind (NCLB) Act of 2001
focus on holding all learners to high standards of learning and
instruction, and in turn, increasing academic achievement of all identified
subgroups in the K-12 population. One of these subgroups is the growing population
of language minority (LM) learners—those students for whom English is not the
primary language of the home (see below for technical definitions). In
contrast to the historic lack of emphasis on tracking the achievement patterns
of all LM learners and ensuring their academic growth, one of the significant
benefits of NCLB has been an increase in awareness of the academic needs and
achievement of this population; schools are now accountable for teaching
English and content knowledge to these learners. There is little disagreement
about the spirit of the law as it relates to LM learners, that is, to ensure
that states and districts meet these students' academic needs. However, as with
any law or initiative, finding an approach to implementation that ensures that
intended benefits are achieved can be difficult. In the specific case of NCLB,
with its presumption that test-based accountability (across subgroups of
learners who differ in important ways) is the motor for educational change, the
issues of valid and equitable implementation are challenging.
The policies
imposed by NCLB have indeed raised awareness of the needs of LM learners and
the challenges of teaching English and content knowledge to them. However, the
policies currently in place fall short in ensuring that all LM students
benefit, and risk disadvantaging LM students and misleading schools and
districts about their accomplishments and needs. There are at least two
specific ways in which NCLB has a very significant impact on the education of
LM learners. The first is through the procedures for categorizing LM students
for purposes of disaggregation and achievement monitoring, and for identifying
the proportion who will receive specialized support for language development.
These procedures warrant considerable attention and refinement if they are to
serve LM learners optimally. The second relates not to the population of LM
learners itself but to the instruments that are used to assess and monitor both
their language development and their academic progress in content areas such as
mathematics, science and English language arts. Underlying these two issues is
the basic problem of how to define the population.
Issue #1. Defining the LM Learner Population and Monitoring Its Academic Progress
We define LM
students as those who come from a home where a language other than English is
the primary language spoken. Many of these students are fully bilingual in the
home language and English; some are more proficient in English than in the home
language; and some speak no English at all upon school entry (August &
Hakuta, 1997; August & Shanahan, 2006). In most educational policy
contexts it is only that subset of LM learners who are categorized as English
Language Learners (ELLs; previously called Limited English Proficient or LEP)
who are attended to, because of requirements under civil rights provisions that
they be provided access to meaningful learning experiences (e.g., Development
Associates, 2003). Evidence suggests, though, that many LM students who do not
qualify as LEP may have urgent educational needs (e.g., de Jong, 2004; Gandara,
Rumberger, Maxwell-Jolly, & Callahan, 200).
Although the exact
terminology may vary, there are typically three different designations used to
classify language minority learners at various stages in their schooling. An
initially fluent English proficient (I-FEP) student is one who enrolls in
school with English proficiency considered to be sufficient for meaningful
participation in mainstream classrooms without any language learning support.
A student considered to have an English proficiency level that compromises
meaningful participation in mainstream classrooms is classified as ELL and
receives language learning support, whereas an ELL student redesignated as
fluent English proficient (R-FEP) no longer receives language support because
he or she is assumed to have attained proficiency in English. A student is
never designated R-FEP upon initial assessment; this designation is only
assigned to a student who has qualified for reclassification from a specific
ELL program to a mainstream classroom. Originally, under NCLB, no R-FEP
students were included in the ELL subgroup for accountability purposes.
However, in February of 2004, the U.S. Department of Education established a
policy by which states had the option of including R-FEP students for up to two
years after redesignation within the "ELL" subgroup (U.S. Department
of Education, 2004). All other R-FEP students as well as I-FEP students are represented
in the disaggregation system only to the extent that they also qualify under
racial/ethnic or socio-economic disaggregation guidelines.
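The membership rules just described can be made concrete with a small sketch. The Python below is a hypothetical illustration, not any state's or the U.S. Department of Education's actual data system: the names LMStudent, in_ell_subgroup, and subgroup_sizes, the encoding of designations as strings, and the convention of counting years since redesignation from 0 are all assumptions. The r_fep_window parameter stands in for the policy choice: 0 for the original NCLB rule, 2 for the 2004 option, and 3 for the additional year in Senator Crapo's proposal discussed later in this paper.

```python
"""Hypothetical sketch of ELL accountability-subgroup membership.

Assumptions (illustrative only, not an actual state data system):
- every record is a language minority (LM) student;
- designation is one of "I-FEP", "ELL", "R-FEP";
- years_since_redesignation counts from 0 in the year of redesignation.
"""

from dataclasses import dataclass
from typing import Dict, List, Optional

DESIGNATIONS = {"I-FEP", "ELL", "R-FEP"}


@dataclass
class LMStudent:
    designation: str
    years_since_redesignation: Optional[int] = None  # used only for R-FEP

    def __post_init__(self) -> None:
        if self.designation not in DESIGNATIONS:
            raise ValueError(f"unknown designation: {self.designation}")
        # R-FEP students need a redesignation year; others must not have one.
        is_rfep = self.designation == "R-FEP"
        if is_rfep != (self.years_since_redesignation is not None):
            raise ValueError("years_since_redesignation applies to R-FEP only")


def in_ell_subgroup(student: LMStudent, r_fep_window: int) -> bool:
    """Return True if the student counts in the "ELL" accountability subgroup.

    r_fep_window = 0: no R-FEP students (original NCLB rule).
    r_fep_window = 2: R-FEP students for two years after redesignation.
    r_fep_window = 3: one additional year, as in the Crapo proposal.
    """
    if student.designation == "ELL":
        return True
    if student.designation == "R-FEP":
        return student.years_since_redesignation < r_fep_window
    return False  # I-FEP students never enter the ELL subgroup


def subgroup_sizes(students: List[LMStudent], r_fep_window: int) -> Dict[str, int]:
    """Compare the policy-dependent ELL subgroup with a fixed LM category."""
    ell = sum(in_ell_subgroup(s, r_fep_window) for s in students)
    return {"ELL": ell, "LM": len(students)}


if __name__ == "__main__":
    # A hypothetical roster of eight LM students.
    roster = [
        LMStudent("ELL"), LMStudent("ELL"),
        LMStudent("R-FEP", 0), LMStudent("R-FEP", 1),
        LMStudent("R-FEP", 2), LMStudent("R-FEP", 5),
        LMStudent("I-FEP"), LMStudent("I-FEP"),
    ]
    for window in (0, 2, 3):
        print(window, subgroup_sizes(roster, window))
```

For this roster the ELL subgroup contains 2, 4, or 5 students depending on the window, while the LM count stays at 8 under every policy, which is the instability a fixed LM category is meant to remove.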
Identifying
individual LM students as ELL, I-FEP, or R-FEP is an art practiced quite
differently in different states and districts (see Ragan & Lesaux, 2006).
Although states differ in their patterns of in-migration and in immigrant
residential patterns, differences in classification criteria no doubt account
for much of the variability across states in the proportion of LM students classified as ELL (14% in New Jersey, 65% in New Mexico). If we are to protect the equal rights of all students to
appropriate services and monitoring, LM students identified for ELL services in
New Jersey should also be so identified in New Mexico, and vice versa. At
present, this is demonstrably not the case.
We argue in this
paper that the entire range of LM students deserves the opportunity for special
educational attention and that, as a population, their academic achievement
should be monitored over the long term. Consistent with the logic behind
disaggregated reporting for racial groups and economically disadvantaged
students, a permanent LM learner sub-group would recognize that all students in
this group are at elevated educational risk on the basis of entering school
with a primary language other than English. Attending to the entire population
of LM students would have at least three advantages over the current practice
of targeting only students classified as ELL. First, it would allow for a
rational approach to monitoring progress over time by identifying a stable
group of students, rather than a group with constantly shifting membership.
Second, it would provide a uniform standard across states and districts for
defining the population to be served. Third, it would promote the idea that
all students at risk for school failure on the basis of their LM status must be
served, rather than only those who fall below an arbitrary (and often poorly
measured) standard for English proficiency.
Disaggregation Categories
We argue that
identifying the whole population of LM students as a disaggregation category
when calculating adequate yearly progress (AYP) would contribute to improved achievement for LM students
in general, as well as for the ELL tail of the LM population. Such a
rethinking would restore interpretability to results about progress toward
proficiency for this subgroup, and would increase instructional attention to LM
students who are not considered ELL but still lack skills needed for academic
success.
As noted above,
NCLB implements a minimally nuanced system of categorizing language minority
learners, distinguishing only those limited in English proficiency, those fully
proficient, and a third, intermediate category of students formerly limited but now
redesignated. The ELL classification is designed to be temporary, unlike all
the other disaggregation categories (gender, race, limited income) for which
results are reported. Furthermore, exit from the ELL category is premised on
performance on tests that are part of or very like the accountability
assessments themselves. This creates a problematic situation when trying to
estimate the size of this population, monitor its academic achievement, and
determine the factors that influence its progress, and in particular, factors
that are most related to academic success. The ELL population brings into
particularly sharp focus the problem of subgroup performance being interpreted
using cross-sectional analyses, rather than longitudinal analyses that
represent growth in academic achievement of fixed subgroups of learners.
A specific irony
inheres in the use of mainstream accountability assessments with ELLs. The
expectation is that the ELL subgroup within schools will achieve the level of
performance that defines AYP, just like the other subgroups. But as soon as an
individual ELL student begins to score well on the
assessment, s/he is likely to be reclassified as R-FEP, which of course reduces
the likelihood that the ELL subgroup will make progress. So, on the one hand,
there is pressure on schools to keep students classified as ELL so as to
improve that subgroup's performance to meet the goals of NCLB's Title I. On the
other hand, there is an incentive to reclassify as soon as possible: NCLB's Title
III, the less-discussed cousin of Title I that provides much of the funding
for programs to serve ELLs, requires that states establish and enforce goals for
ELLs' progress in English, including increasing the percentage who reach
proficiency.
The basic flaw in logic
is treating, for purposes of monitoring academic progress, a temporary category
of ELLs just like a fixed category such as African-American. Membership in
racially and ethnically defined subgroups is not temporary; thus, it is
entirely possible for the subgroup in a school or district to show AYP, and the
performance of the subgroup is informative to policy makers and practitioners
whether it shows progress or not. But because ELLs get reclassified on the
basis of assessments like those used for mainstream accountability purposes,
the schools that are most successful at moving ELLs quickly out of special
programs are punished the most severely by not being able to represent the
scores of the most successful learners in that subgroup. Conversely, schools
that are successful at improving the academic skills of their ELLs but do not
move them out of the ELL subgroup may show AYP but face consequences under
their state's implementation of Title III. In addition to the inherent
tensions and dilemmas that districts face in serving this population of
learners and retaining appropriate funds to do so, a practical consequence is
that the statistics on the academic achievement of LM learners are based only
on those students with a formal designation (LEP or ELL). They do not include
those who have gained the proficiency in English needed to participate
in grade-level classes without supports. For example, districts with
good preschool and kindergarten programs that produce students redesignated as
FEP early in their school careers are in effect punished by losing those high
achievers from their ELL category. Using ELL rather than LM as the
designator therefore almost certainly underestimates the achievement outcomes of the
overall population of LM learners, contributes to the public perception that
immigrant groups are not learning English, and distorts the information
available to districts, states, and the federal government.
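The composition effect described in the last two paragraphs can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, a cohort of ten LM students with evenly spaced initial scores, a uniform gain of 10 points per year for every student, and a single hypothetical cut score of 70 for redesignation; none of these numbers come from real data, and the names CUTOFF, ANNUAL_GAIN, and simulate are invented for the example. It tracks the mean score of the fixed LM cohort alongside the mean of whoever is still designated ELL.

```python
"""Toy illustration with hypothetical numbers: reclassifying high scorers
out of the ELL subgroup depresses that subgroup's reported mean, compared
with a fixed LM cohort followed over the same years."""

from statistics import mean

CUTOFF = 70       # hypothetical redesignation cut score
ANNUAL_GAIN = 10  # hypothetical uniform yearly growth for every student


def simulate(initial_scores, years):
    """Return (year, LM cohort mean, ELL subgroup mean or None) for each year."""
    scores = list(initial_scores)
    # At entry, students below the cut score are designated ELL.
    is_ell = [s < CUTOFF for s in scores]
    rows = []
    for year in range(years + 1):
        if year > 0:
            scores = [s + ANNUAL_GAIN for s in scores]
            # Anyone who reaches the cut score is redesignated and leaves
            # the ELL subgroup for good; nobody re-enters it.
            is_ell = [e and s < CUTOFF for e, s in zip(is_ell, scores)]
        ell_scores = [s for s, e in zip(scores, is_ell) if e]
        ell_mean = mean(ell_scores) if ell_scores else None
        rows.append((year, mean(scores), ell_mean))
    return rows


if __name__ == "__main__":
    cohort = range(10, 101, 10)  # ten hypothetical LM students
    for year, lm_mean, ell_mean in simulate(cohort, years=6):
        print(year, lm_mean, ell_mean)
```

Under these assumptions the LM cohort mean rises by the full 10 points each year (55, 65, 75, ...), while the ELL subgroup mean rises by only 5 points a year (35, 40, 45, ...) and, by construction, can never reach the cut score, because every student who reaches it leaves the subgroup. By the sixth year the ELL subgroup is empty and reports nothing at all, even though every student improved every year.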
Policy-makers
have—to some extent—acknowledged this irony, as indicated by the 2004 rule to
allow the inclusion of R-FEP students for up to two years after redesignation.
Although such a stop-gap measure may allow some schools to avoid a failing
label (if they happen to have a large population of these recently redesignated
students performing well), it does little to address the fundamental problem of
treating a temporary category as if it were fixed. Similarly, Senator Mike
Crapo's recent proposal to improve NCLB would make the 2004 provision permanent
and add an additional year in which R-FEP students can be counted. Although
the Crapo proposal would increase flexibility for schools, the new cut-off
point is no less arbitrary than previous ones. The reality of LM learners is
that they approach the proficiency of English-only learners gradually, that
they constitute a subgroup of importance, and that their academic development should
be supported and monitored over the full span of time from initial exposure to
English until full proficiency is achieved.
A fixed LM learner
category would not only address this irony but would also be coherent with the
underlying logic of disaggregated reporting—that schools must meet the needs of
students at elevated risk for educational failure on the basis of their
demographic characteristics. Just as students of color and students coming
from economically disadvantaged homes are at elevated risk, the entire
population of LM learners is at elevated educational risk, particularly for
reading difficulties, but also for school failure more generally (August &
Hakuta, 1997; August & Shanahan, 2006). Some might argue that a fixed LM
learner category is inappropriate because educational risk is not equal for all
LM learners; it is certainly greater for students with less proficiency in
English. But unequal risk within a subgroup is equally characteristic of
African-American students or of students receiving free/reduced-price lunch. Because we
define those subgroups as populations at elevated
risk on average, we do not limit them to the students most
at risk (e.g., by excluding from the economically disadvantaged category
students whose parents are pursuing graduate degrees or by limiting the
African-American category to students with a history of low achievement).
Thus, it is a rational extension of this logic to include all LM learners,
including those at very high risk due to their very limited conversational
English skills, those at medium risk due to limited academic English, and those
at only slight risk due to previous success in acquiring basic and academic
English skills.
Another potential
disadvantage of identifying a fixed LM learner category is that it risks
promoting a deficit model of bilingualism, in which the potential benefits
of learning two languages are ignored. Some might worry that the permanent LM
label would promote the tracking of students beyond the time when they need
support for language learning. However, a fixed LM learner category would be
no more pejorative than a permanent African-American category, and it would no more
promote segregated classrooms than including racial subgroups in NCLB promotes racial segregation.
Educators recognize that growing up as an African-American in the United States not only limits access to certain resources (namely economic and social
capital of particular kinds) but also provides access to resources (especially
cultural and social capital of other kinds). As we raise awareness of the
needs of LM learners, we should help educators recognize that LM learners enter
school not only having had limited access to certain linguistic resources
(especially exposure to academic English) but also having had special access to
linguistic resources (related to learning two languages). We argue that
including the entire population of LM learners would in fact not promote a
deficit model of bilingualism, but rather could dispel such a conception insofar
as it would allow us to track the success of all these learners, not just
document the struggles of a sub-group defined by their low test scores.
Moreover, a fixed
LM learner category would allow for greater attention to those LM learners who
may not have held the ELL designation for many years but continue to struggle
with the academic demands of school. There is little, if any, evidence to
suggest that redesignation as fully English proficient is a reliable and valid
predictor of ability to succeed in mainstream classrooms without any language
support (e.g., Linquanti, 2001). There is considerable pressure in many states
to reclassify students as soon as possible, and in California, Arizona, and Massachusetts, special educational settings for ELL students are available for
only one year. If redesignated students' scores are reported for only two
additional years, the entire span of time during which support is available and
academic achievement is monitored may be as little as three years. For many
children this is too short. Furthermore, the language tasks on which
redesignation decisions are based may not adequately represent the actual
demands of the curriculum, in particular the demands for understanding and
producing academic language. Of special note is that for a large proportion of
the LM population the three years of language support are kindergarten, first,
and second grades, when exposure to academic language, in the classroom and in
print, is severely limited.
Academic Language for Academic Success
Many skills are
wrapped up in the notion of academic language. Vocabulary knowledge (including
the multiple meanings of many English words), the ability to handle increasing
word complexity and length, understanding of complex sentence structures, and
facility with extended discourse structures are all aspects of academic language (e.g.,
Scarcella, 2003). For example, among second graders being read a storybook,
several LM students missed the meaning of a paragraph because of the
sentence “The mother made him get out and he ran off.” In this
case, “made” did not carry its most frequent meaning, “to create or
build,” but rather the less common causative meaning “to force.” Other aspects
of academic language relate to the language of text, including the organization
of paragraphs, the function of transitions such as “therefore” and “in
contrast,” and a wide range of vocabulary that appears far more often in
text than in oral conversation. Consider this sentence: “John was very hungry
despite having just eaten a large plate of beans and rice.” The term
“despite” is key to the meaning of the sentence yet is not a term typically
used in everyday conversation with school-aged children.
Academic
vocabulary plays an especially prominent role in the upper elementary, middle,
and high school years as students read to learn about concepts, ideas, and
facts in math, science, and social studies classes. In these classrooms, LM
learners encounter many words that are not part of everyday classroom
conversation (e.g., “analyze,” “sustain”) yet are key to
comprehension and the acquisition of knowledge.
Thus, LM learners
with the ELL designation may be reclassified in first grade and may indeed be
“proficient” enough for the language demands of the primary grade classrooms
and texts. But as those learners move through school, they may be fully
dependent on classroom experiences to develop sufficient
sophistication with academic language to function in the upper elementary,
middle, and high school years, whereas English-only classmates have resources
outside school to help them develop academic language. As a result, the same learners
classified as fully proficient in the primary grades may later lack the level
of proficiency needed to participate meaningfully in mainstream classrooms
without specialized language support.
Indeed, many LM
learners who face academic challenges in the years beyond the primary grades
have been enrolled in US schools since kindergarten and do not have a formal
designation licensing support services for language development. These
learners typically have good conversational English skills, but may lack much
of the academic language that is central to text and school success. Several
studies of elementary and middle grade language minority learners, whether formally
designated ELL or not, found that their vocabulary levels were between the 20th
and 30th percentiles (Carlo et al., 2004; Francis et al.,
2006; Kieffer & Lesaux, in press; Proctor, Carlo, August, & Snow, 2005;
Tabors, Páez, & Lopez, 2003). The role that language plays in
determining students’ success with academic content cannot be overstated. Proficient
use of, and control over, academic language is the key to reading comprehension
and content area learning. Lack of proficiency in academic language affects LM
learners’ ability to comprehend and analyze texts in the years beyond the
primary grades, limits their ability to write and express themselves
effectively, and can hinder their acquisition of academic content in all
academic areas, including mathematics.
For example, in
the domains of math and science the sentences below may appear on worksheets or
in problem sets. For each sentence, students need a specific understanding of
at least two words that are considered academic in nature and are likely to be
encountered only in print.
1) Directions: Make and record three observations.
2) Calculate the sum of the first eight terms of the sequence.
Thus, many LM
students—whether designated ELL or not—are likely to struggle with academic
language, especially in the years beyond the primary grades. However, at this
time there is no mechanism in place to systematically monitor the progress of
the population as a whole. Monitoring of the ELL subset typically lasts only a few
years. If a longitudinal approach were adopted for monitoring the progress of
all students who were ever designated as ELL, then the contradictory
nature of the current system would be lessened. But the problems of
state-to-state variation in designation criteria and the increased academic
risk of many LM students never designated as ELL would only be addressed by
defining the disaggregation category as LM. We argue that this would be the
optimal approach to ensuring both attention to and equity for students who, as
a group, show heightened risk of academic failure.
We do not argue
that the importance of English proficiency for LM learners should be
downplayed, nor do we argue that the temporary ELL designation should be
abolished. The careful and valid identification of those students most in need
of additional language support for accessing the curriculum at any given grade
level is essential for providing equal opportunities to learn for these
students. In fact, we assert that a uniform, psychometrically valid national
standard for classifying students as ELL would have enormous
benefits for ensuring that learners are provided with more equitable support
services across states and districts. However, for the reasons described above,
such a standard should not define the category by which a school's
success in promoting the learning of the LM population is judged.
Creating a fixed
category for LM status is, as previously noted, only one of two issues that
need to be addressed in order to increase the positive impact of NCLB on LM
learners. The second (related) issue, discussed in the section that follows,
focuses on the actual assessments used to evaluate and monitor LM learners’
language development and academic progress (in science and mathematics as well
as English language arts). Standardized and standards-based measures play a central
role in defining this population and monitoring its academic achievement, as
well as in making decisions about the placement of individual LM learners
(e.g., reclassification). Several questions about the content and predictive
validity of these assessments must be addressed in order to establish that the
system is indeed serving these learners effectively.
Issue #2. Assessments of Language Development and Academic Progress for LM Learners
Language Proficiency Assessments
NCLB had the
beneficial effect of initiating the development of language proficiency
assessments for use in the required monitoring of ELL students nation-wide.
Although many states had already been assessing some of their LM learners’
proficiency and academic progress, at various levels and for varying purposes,
NCLB has ensured that these data are systematically collected and reported,
using valid instruments. The potential benefits here include increased breadth
and improved psychometric properties of the tests being used to evaluate and
monitor the development of LM students’ English proficiency, whether in
speaking, listening, writing, or reading. They also include the potential to
shift instruction for ELLs to reflect standards for English language
development. The dangers, however, relate to a persistent lack of emphasis
within the language proficiency assessments on the complex academic language
needed for success in content area classes, and thus the risk that this key
domain will be neglected in instruction for ELL students.
Though U.S. schools have been classifying incoming students as LEP since the early 1970s, the
criteria for those decisions and the tools available to make them were
generally unsatisfactory. Indeed, there was considerable ambiguity about the
basis for providing services to LM students, i.e., whether simply a low level
of English knowledge or discrepancy between knowledge of English and knowledge
of another language should trigger access. More serious, though, was the
absence of well-designed and rigorous tests that reflected the skills that
would be a sensible basis for sending students to mainstream classrooms.
Probably the most widely used assessment between 1980 and 2000 was the Language
Assessment Scales (LAS; de Avila & Duncan, 2005), a test that, although
appropriate for assessing basic proficiencies, is not linked to any particular
academic outcomes, and has insufficient alternate forms or psychometric
sensitivity to be used to monitor progress (e.g., Pray, 2005; Del Vecchio &
Guerrero, 1995). The test measures speaking, listening, reading and writing
skills in either English or Spanish for K-12 students; most often it is
administered in English in order to determine proficiency level for placement
in, and exit from, programs for ELLs. Given its emphasis on reading and
writing as well as oral proficiency, the LAS itself represented a vast
improvement on the Bilingual Syntax Measure (Burt, Dulay, & Hernandez
Chavez, 1976), the instrument most widely used previously, which focused
exclusively on oral proficiency, and primarily in conversational contexts. In
spite of this, in the context of NCLB requirements, the LAS is not optimal as a
tool for classification, reclassification, or progress monitoring.
Thus, in the wake
of NCLB's requirement that all ELLs be tested annually on English speaking,
listening, reading, and writing skills, several proficiency tests have been
developed (some states developed their own tests, while other states formed
themselves into consortia to reduce development costs). Presumably these new
tests will be used as part of the system for classifying students as ELL or
formerly ELL as well as to meet the accountability requirements of NCLB's Title
III; since their psychometric properties are almost certainly better than those
of tests used previously, they will probably represent an improvement.
Notably, Title III emphasizes the importance of monitoring ELLs’
growth in English proficiency over the range from beginner to intermediate, not
just redesignation, as a purpose for assessment. Thus, to meet Title III
expectations, English Language Development tests have to be sensitive to growth
across a wide range of proficiency levels.
Furthermore,
whereas the previously used tests focused primarily on social language as opposed
to academic content, the state-developed tests are typically aligned with state
standards for second language development. In addition, the NCLB guidelines
for these proficiency assessments specify reading and writing as well as
listening and speaking, which may help educators to conceptualize language
proficiency as more than basic oral conversational skills. In theory, this
feature should promote greater instructional emphasis on academic language and
content. However, the guidelines do not specify anything in particular about
academic language or predicting performance in content area classrooms; after
about third grade, it is performance in content areas, not just proficiency in
English, that constitutes academic success. Thus, the likelihood of being
able to do the work in math, science, social studies, and literature study
ought to be the criterion against which LM learners’ progress is monitored. To
the extent that 'academic performance' is included in current tests, it is
typically operationalized as alignment with English Language Arts standards,
rather than as the full range of skills needed for achievement across the content
areas. Some districts and states include grades or performance on achievement
tests as additional criteria for exiting ELL programs (Ragan & Lesaux,
2006), but again ELA performance is the most likely to be the focus.
Figure 1 provides two examples drawn
from the Grades 6–8 version of Texas’s Reading Proficiency Test in English (TRPT), one of the
older state tests of English proficiency. Although it is a reading test with
some content-based items, the TRPT shows a strong emphasis on basic vocabulary
and everyday uses of language, such as telling time or reading a calendar, even
in this version of the test for middle school students. These items might be
very valuable in measuring students’ status and growth in basic English
proficiency; English proficiency tests should indeed be sensitive to
differences in students’ proficiency at the low end of the range, particularly for the
purposes of informing instruction and monitoring the growth of students who are
very recent immigrants. However, tests used to make decisions about transitioning
students to mainstream instruction or used to determine whether schools are
serving the needs of ELLs at all levels of proficiency must measure
challenging academic language skills.
Evidence about the
limitations of the English Language Proficiency (ELP) assessments as indicators
of academic language comes from an analysis of cross-sectional data from an
entire state, comparing ELLs’ performance on Title I and Title III assessments
(Francis, 2006). Francis’s goal was to see how ELP scores relate to content
area scores, as background to thinking about an index usable for
reclassification decisions that might incorporate both. The descriptive data
he presents make clear that time in the U.S. relates much more strongly to
increases in scores on the ELP assessment than to scores on the math or English
language arts assessments, suggesting that academic language skills that
depend on more than just exposure to English are key to performing well on the
latter but not the former. While ELP scores did predict performance on both
English language arts and math accountability assessments, the reading,
writing, and listening subtests were much more powerful predictors than the
speaking subtest, another indicator that definitions of proficiency useful for
predicting academic outcomes should not focus too heavily on conversational
skills.
The absence of
sufficient attention to the full range of meaningful indicators of academic
performance on the proficiency tests may be biasing instruction in the
bilingual and Structured English Immersion (SEI) classrooms that serve ELLs
toward conversational English or reading/writing the simple narrative texts
that predominate in literacy instruction in the primary grades. Of course, the
different language proficiency tests vary enormously in the degree to which
they attend to higher-level academic performance. Even the format of the
listening and speaking components of the assessments varies widely. California and Arizona use individually administered tests with standardized listening and
speaking prompts, whereas Massachusetts and Texas have teachers, trained and
qualified on the measure, administer a tool to assess language proficiency by
observing students performing academic and social tasks in the classroom over
a period of time. The (structured) observations are taken in conjunction with
a rubric that focuses on students’ ability to communicate, and result in a set
of scores that reflect language proficiency. It is an open question which of
these two approaches will have greater benefits for students. Although
there is very little empirical evidence comparing them, we
suspect that the standardized test approach will yield more
reliable and comparable scores, whereas the rubric approach, if well
implemented, may provide a more valid measure of students' performance in class
and has the potential to raise teachers' awareness of the importance of
academic language skills, given the structured observations of language in
which they must engage.
The range of
skills associated with reading and writing proficiency, successful use of
academic language, and adequate oral English is difficult to characterize for
either native speakers or second language learners (e.g., Scarcella, 2003;
Bailey & Butler, 2003). However, it is very clear that, in most states,
the expectations for language minority students are derived from English
Language Arts standards developed for native speakers, and thus may well be
quite irrelevant to success in reading and writing for purposes of learning
math, science, or social studies. As previously noted, it is important to
distinguish academic from conversational language skills, and noteworthy that many
of the LM learners who struggle academically have well-developed conversational
English skills. Thus, there is a need for academic language to be a prominent
feature of language proficiency assessments, especially as it relates to
content area material.
Finally, in addition to
considerations of the content and validity of language proficiency assessments
for LM students, a significant danger with the shift to an assessment-based
system relates to the way in which the assessment results will be used.
Although the proficiency tests have been, and will be, designed for the
purposes of evaluating schools’ success in moving ELLs toward English
proficiency, states and districts undoubtedly use these assessments for other
purposes for which they are not designed. For instance, California districts
routinely use the California English Language Development Test for initial
placement, annual monitoring, and reclassification of students, as well as to
inform decisions about interventions for struggling learners. Given the
psychometric properties of the test and the complexity of the language
proficiency construct, this single measure cannot possibly serve all four of these
purposes well. A test designed strictly to identify whether a learner is
above or below a particular threshold to make a re-designation decision is
likely to be insensitive to fine distinctions such as those between beginning
and early intermediate students. This same test is likely to provide little,
or no, information on which to base interventions for individual children who
are struggling. Such practices are examples of inappropriate and unethical
test use[1];
in these instances the benefits of tracking the academic achievement of the
population are in fact outweighed by the costs of using the tests in
inappropriate ways.
Mainstream Accountability Assessments
Under NCLB
guidelines, students classified as ELLs must be included in the state
accountability assessments after one year in U.S. schools, and must be tested
entirely in English after three years[2].
Here the potential benefits include tracking the academic achievement of these
learners and ensuring that districts, schools, and teachers incorporate this
population in their instructional plans and efforts. In contrast, the very
significant potential danger derives from the fact that any test is to some
degree a language test, and thus that LM students’ scores do not necessarily
reflect their academic ability but may instead reflect their degree of
understanding of the test items and of the academic language needed for content
area success.
It might seem that
ELL students are not particularly disadvantaged in performance on some content
areas of standards-based tests. For example, math is thought of as a rather
language-free zone, especially in the elementary school years. A very common
misconception about mathematics is that it is a “universal language,” one that
is synonymous with numbers and symbols, and a “culture-free” static body of
knowledge. In fact, especially in the elementary grades, the learning of
mathematics is verbally mediated; the association of verbal labels with
mathematical forms and expressions is common (e.g., Lager, 2006). Mathematics
language is often a specialized form of natural, conventional language and
requires a re-interpretation of the way language is used in everyday settings
(e.g., Cuevas, 1984). Much instruction and assessment in the mathematics
curriculum occurs via discourse and text characterized by academic
language (e.g., Cazden, 1986).
A careful look at
the math items on many state tests, in particular the more challenging state
tests, reveals that they make enormous language and reading comprehension
demands on students. Some of these demands derive from unnecessary linguistic
complexity that has nothing to do with the central domain being assessed and thus
calls into question the validity of inferences based on the scores (e.g.,
Abedi, Lord, & Hofstetter, 1998). However, some of these demands are
intricately related to the conceptual understanding and application involved in
the domain. Figure 2 shows two released items from the California Standards
Test for grade 6. In each, notice that there is unnecessarily rare vocabulary
that may be unknown to some ELLs, such as “orchard,” “harvested,” “acre,”
and “band of a hat,” as well as unnecessarily complex sentence structure,
such as the passive verb phrases “could be solved” and “is shaped.” However,
each item also presents linguistic complexity that is central to the
mathematical concepts being assessed, including the math vocabulary “proportion,”
“cylinder,” “measure,” and “diameter.”
There is
substantial evidence that the size of the math achievement gap between ELLs and
native English speakers differs as a function of the language demands of the
items (e.g., Abedi, Lord, & Hofstetter, 1998; Abedi, 2003). As noted
above, some of these language demands are irrelevant to the measurement of math
achievement, but others derive from the academic language central to understanding and
solving mathematical problems. This is evident in studies showing that
even when the unnecessary linguistic complexity (e.g., complex syntax) of math
test items is removed, ELLs often perform no better than they did on the
original items and continue to perform substantially worse than native English
speakers (e.g., Abedi, Courtney, & Leon, 2003; Abedi, Courtney, Mirocha,
Leon, & Goldberg, 2005; Abedi, Hofstetter, Baker, & Lord, 2001; Abedi,
Lord, & Hofstetter, 1998). This suggests that ELLs in the U.S. are not being taught the academic language involved in
sophisticated mathematical problem solving.
As a way to
address some of the concerns about the language demands of content area
assessments, NCLB requires that states provide testing accommodations for ELLs,
such as English dictionaries, bilingual dictionaries, extra time, and native
language versions of the test. However, states have not received guidance
about which accommodations to provide or how to provide them, which has led
some states to adopt accommodations for ELLs that have no theoretical
justification, such as preferred seating or testing in small groups (Rivera
& Collum, 2006). To date, there is little supporting evidence for the
efficacy of even those accommodations considered sensible and appropriate in
addressing ELLs' language difficulties without changing the construct being measured;
a recent review and meta-analysis of eleven experimental studies comprising 37
samples found the most commonly used accommodations to be largely ineffective
(Francis, Lesaux, Rivera, Kieffer, & Rivera, 2006). Providing bilingual
dictionaries or native language versions of the test, although often touted as
the most fair treatment, does not necessarily yield higher or more accurate
test scores for ELLs; there is some evidence that ELLs perform no differently
when provided bilingual dictionaries and even perform worse when tested in
their native language. Of course, the efficacy of native language tests or
dictionaries depends greatly on whether the students have ever learned the
material being tested in their native language. Francis et al. did find that
providing English dictionaries was somewhat effective in some cases, but had
only a very small, albeit statistically significant, effect on narrowing the substantial gap in
content area performance between ELLs and native English speakers.
Although
test-makers must ensure that they do not introduce unnecessary linguistic
complexity, educators must also realize that better assessments will not
eliminate the real differences in content area achievement between ELLs and
their native English speaking peers. Academic language is indispensable in
presenting higher grade-level material in every content area and in providing
all students the skills they need to function at higher levels in those
subjects. Having tests that reduce the use of complex, academic language is in
fact a disservice to these learners if those tests are also omitting crucial
content needed for academic success. Ultimately, the LM population must receive
high-quality content area instruction, including an emphasis on the language of each
domain, and in doing so be held to the same academic standards
as their English-only peers.
Summary
In contrast to the
historic lack of emphasis on tracking the achievement patterns of all LM
learners across the nation and ensuring their academic growth, one of the
significant benefits of NCLB has been an increase in awareness of the academic
needs and achievement of students from non-English speaking homes; schools are
now accountable for teaching English and content knowledge to these learners.
There is little disagreement about the spirit of the law as it relates to LM
learners, that is, to ensure that states and districts meet their academic
needs. However, a prerequisite for tests administered on a large scale is that
they be valid and capable of ensuring equitable outcomes. This is a particular
challenge for tests administered to LM learners, given the complexity of second
language acquisition, the differences in language proficiency within the
population, and the difficulties in designing tests in which language
proficiency is not one of the primary skills measured.
Although we are
not opposed to the use of a test-based system to hold schools and districts
accountable for serving LM students, to be successful, such a system requires a
rational approach to defining the population and careful attention to the valid
assessment of their skills. This growing population indeed deserves to be part
of the accountability system if in fact, as planned, accountability results in more
systematic and thoughtful delivery of educational services to meet their
needs. This is particularly important for a population that has historically
been underserved in many ways. However, the current design of the system under
NCLB, in particular the procedures for defining the ELL population and devising
assessments for them, fails to serve the purposes of the law in a number of
ways. To lessen the negative and increase the positive impact, and to be
consistent with the spirit of the law, subsequent attention must focus on:
A national definition and operationalization of the
constructs of LM learner and ELL:
1. LM: students from homes where
the primary language of use, as reported by parents, is not English. LM
learners would then constitute a fixed category for the purposes of data
reporting under NCLB. This would eliminate the inauthentic and
inaccurate picture of achievement among LM learners that is created when the
focus is only on a small subgroup of the population, one that happens to represent the tail end of
the distribution of language and literacy achievement. This
would not preclude the need for a distinction between ELL and LM in order to
determine which students receive more intensive support services.
2. ELL: the subset of the LM
population who need intensive language support services in order to participate
meaningfully in mainstream classrooms, based on English language proficiency
measures that reflect academic language skills and provide valid inferences for
students' future success. The use of multiple measures and procedures to
identify this subset of the LM population should be made uniform across states
and districts. An ELL student redesignated as fluent English proficient (R-FEP)
no longer receives language support because he or she has attained proficiency
in English; such students, however, remain in the LM group for the purposes of
accountability and progress monitoring.
The need for academic language to play a well-defined role
in the assessment of language proficiency. Different language proficiency
assessments place very different degrees of emphasis on academic language
as an indicator of progress toward full proficiency. A systematic study
of the progress of ELLs (both progress in getting reclassified, and
concomitant performance on state accountability and NAEP assessments)
could exploit this natural experiment to explore the impact of different
test designs on long-term student performance, and establish the extent to
which reclassification is a valid indicator of ability to thrive in mainstream
content area classrooms. Results of this research would address basic
policy questions: On what grounds should ELLs be reclassified? Do lax or
stringent criteria for reclassification make a difference in long-term
outcomes? How much support do reclassified ELLs need to access classroom
learning?
The need for content area assessments to reflect the
academic language demands of the content and the concepts expressed by
that language. While we are not arguing for unnecessary complexity in the
tests, we do hold that instruction for LM students must reflect high academic standards
and provide the opportunity for them to develop the academic language of
content areas.
The need for tests and measures to be used in an
appropriate and ethical manner. Currently, many tests are being used for
multiple, competing purposes despite designs and psychometric properties
that do not support such use. The implementation of the Reading First
provisions of NCLB included recommendations concerning assessments to be
used, as well as guidelines for preparing teachers to use the information
derived from assessment. A similar set of policies should be introduced
for those teaching the LM population: a set of assessments together with
guidelines for their use, and guidance to teachers about interpreting test
results as a basis for planning instruction. Furthermore, incentives for
the development of multiple measures of language and content for LM
students would help ensure the availability of tests to serve different
functions (e.g., diagnosis, placement, progress monitoring)
appropriately. A test that reliably monitors the achievement of students
at the population level cannot simultaneously provide student-level
information that is useful for purposes of placement and/or to select
interventions (e.g., to identify if students need code-focused,
fluency-focused, or oral language-focused interventions); thus a wide
array of psychometrically sound measures is needed. The development of
multiple measures would also increase opportunities to establish the validity
of the measures for LM learners.
Figure 1. Items from Texas' Reading Proficiency Test in
English for grades 6 - 8.
Figure 2. Items from the California Math Standards Test
for grade 6
References
Abedi, J. (2003). Impact of student language background on content-based performance: Analysis of extant data. Los Angeles, CA: Center for Research on Evaluation, Standards, and Student Testing.
Abedi, J., Courtney, M., & Leon, S. (2003). Effectiveness and validity of accommodations for English language learners in large-scale assessments. Los Angeles, CA: Center for Research on Evaluation, Standards, and Student Testing.
Abedi, J., Courtney, M., Mirocha, J., Leon, S., & Goldberg, J. (2005). Language accommodations for English language learners in large-scale assessments: Bilingual dictionaries and linguistic modification. Los Angeles, CA: Center for Research on Evaluation, Standards, and Student Testing.
Abedi, J., Hofstetter, C., Baker, E., & Lord, C. (2001). NAEP math performance test accommodations: Interaction with student language background. Los Angeles, CA: Center for Research on Evaluation, Standards, and Student Testing.
Abedi, J., Lord, C., & Hofstetter, C. (1998). Impact of selected background variables on students’ NAEP math performance. Los Angeles, CA: Center for Research on Evaluation, Standards, and Student Testing.
August, D., & Hakuta, K. E. (1997). Improving schooling for language-minority children: A research agenda. Washington, DC: National Academy Press.
August, D., & Shanahan, T. (2006). Understanding literacy development in a second language: The report of the National Literacy Panel. Mahwah, NJ: Lawrence Erlbaum Associates.
Bailey, A. L., & Butler, F. A. (2003). An evidentiary framework for operationalizing academic language for broad application to K-12 education: A design document. Los Angeles, CA: Center for Research on Evaluation, Standards, and Student Testing.
Burt, M. K., Dulay, H. C., & Hernández-Chávez, E. (1976). Bilingual syntax measure. San Antonio, TX: Harcourt, Brace, Jovanovich, Inc.
Carlo, M. S., August, D., McLaughlin, B., Snow, C. E., Dressler, C., Lippman, D. N., Lively, T. J., & White, C. E. (2004). Closing the gap: Addressing the vocabulary needs of English-language learners in bilingual and mainstream classrooms. Reading Research Quarterly, 39, 188-215.
Cazden, C. (1986). Classroom discourse. In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 432-463). New York: Macmillan.
Cuevas, G. J. (1984). Mathematics learning in English as a second language. Journal for Research in Mathematics Education, 15(2), 134-144.
de Avila, E. A., & Duncan, S. E. (2005). Language assessment scales, English. Monterey, CA: CTB MacMillan McGraw-Hill.
de Jong, E. J. (2004). After exit: Academic achievement patterns of former English language learners. Education Policy Analysis Archives, 12(50), 1-18. Retrieved November 3, 2006 from http://epaa.asu.edu/epaa/v12n50/.
Del Vecchio, A., & Guerrero, M. (1995). Handbook of English language proficiency tests. Albuquerque, NM: Evaluation Assistance Center.
Development Associates. (2003). Descriptive study of services to LEP students and LEP students with disabilities. Volume I: Research report. Report submitted to U.S. Department of Education, OELA. Arlington, VA: Author.
Francis, D. J. (2006). Bridging Title I and Title III assessment and accountability. Unpublished manuscript.
Francis, D. J., Lesaux, N., Rivera, M., Kieffer, M., & Rivera, H. (2006). Research-based recommendations for the use of accommodations in large-scale assessments. Portsmouth, NH: Center on Instruction.
Francis, D. J., Snow, C. E., August, D., Carlson, C. D., Miller, J., & Iglesias, A. (2006). Measures of reading comprehension: A latent variable analysis of the Diagnostic Assessment of Reading Comprehension. Scientific Studies of Reading, 10(3), 301-322.
Gandara, P., Rumberger, R., Maxwell-Jolly, J., & Callahan, R. (2003). English learners in California schools: Unequal resources, unequal outcomes. Education Policy Analysis Archives, 11(36), 1-52. Retrieved November 3, 2006 from http://epaa.asu.edu/epaa/v11n36/.
Kieffer, M. J., & Lesaux, N. K. (in press). Breaking down words to build meaning: Morphology, vocabulary, and reading comprehension in the urban classroom. The Reading Teacher.
Lager, C. A. (2006). Types of mathematics-language reading interactions that unnecessarily hinder algebra learning and assessment. Reading Psychology, 27, 165-204.
Linquanti, R. (2001). The redesignation dilemma: Challenges and choices in fostering meaningful accountability for English learners. Policy report to University of California Linguistic Minority Research Institute.
Pray, L. (2005). How well do commonly used language instruments measure English oral-language proficiency? Bilingual Research Journal, 29(2), 387-409.
Proctor, C. P., Carlo, M., August, D., & Snow, C. E. (2005). Native Spanish-speaking children reading in English: Toward a model of comprehension. Journal of Educational Psychology, 97(2), 246-256.
Ragan, A., & Lesaux, N. (2006). Federal, state, and district level English language learner program entry and exit requirements: Effects on the education of language minority learners. Education Policy Analysis Archives, 14(20). Retrieved November 3, 2006 from http://epaa.asu.edu/epaa/v14n20/.
Rivera, C., & Collum, E. (2006). State assessment policy and practice for English language learners: A national perspective (pp. 1-173). Mahwah, NJ: Lawrence Erlbaum Associates.
Sattler, J. (2001). Assessment of children: Cognitive applications (4th ed.). La Mesa, CA: Sattler Publishing Inc.
Scarcella, R. (2003). Academic English: A conceptual framework. Los Angeles: Linguistic Minority Research Institute.
Tabors, P., Paez, M., & Lopez, L. (2003). Dual language abilities of bilingual four-year-olds: Initial findings from the Early Childhood Study of language and literacy development of Spanish-speaking children. NABE Journal of Research and Practice, 1(1), 70-91.
U.S. Department of Education. (2004). Fact sheet: NCLB provisions ensure flexibility and accountability for limited English proficient students. Retrieved November 3, 2006 from http://ed.gov.
[1]
For a discussion of appropriate and ethical test use, see Sattler (2001).
[2]
Originally, NCLB required assessment of all students, but a 2004 decision
gave states the option of allowing recent immigrants a single year before they
are required to be tested (U.S. Department of Education, 2004). There have
been proposals to make this option permanent as well as to extend this time
period.