March 2005 - Jeff Koon, Saint Paul

Dear People:

This e-mail is a long explication, being sent only after lots of time/work on it, of some of the major things that are wrong with the State's School Report Cards, and how those relate to the NCLB.  Please consider it in making improvements before the Report Cards for next year.  (I strongly recommend that you print this e-mail out for reference.)  If you have colleagues who should have received this e-mail and didn't, please forward it to them.

    Thank you very much.
    Jeff Koon, Ph.D. (more background information on me is at the end here)
 

INTRODUCTION:  AN EXAMPLE SHOWING SOME OF THE PROBLEMS WITH THE STATE REPORT CARD (& NCLB)

To more effectively convey the general points made in this text, I am including as an example the State Report Card data for Phalen Lake Elementary School in St. Paul.  I've given some of the highlights from that school's profile next.  (However, the last time I checked about three months ago, access to the Phalen Lake data had mysteriously disappeared from the State's website.)

Phalen Lake's demographic characteristics include:  88% Free & Reduced Price Lunch, 48% Limited English Proficient, 11% Special Education.  The ethnic breakdowns are:  44% Asian, 21% White, 20% Black, 14% Hispanic, 1% American Indian.  Almost all of the Asians are Hmong.

According to State/NCLB proficiency calculations, Phalen Lake is failing to make "Adequate Yearly Progress" (AYP) in Reading for 5 of the 6 groups for which it is accountable, and in Math for 1 of the 6 groups for which it is accountable.

Phalen Lake is obviously a pitiful school that is failing miserably.  Or is it?

However, IF we look at Phalen Lake's Report Card for 2004 AND assume that the 2004 Phalen Lake 5th graders, when they were in 3rd grade in 2002, scored like the 2004 3rd graders, THEN many of the 5th graders have made substantial gains in the last two years.

I'm using only the numbers for Phalen Lake's 3rd and 5th grades from 2004 here:  What would it say about the quality of education at Phalen Lake-- which is clearly among the most demographically "challenged" schools in the State-- if the 3rd graders had gone from 29% proficient in 2002 to 53% proficient in MCA Reading as 5th graders in 2004?  And if the same students had gone from 43% to 59% proficient in MCA math in those two years, while many fewer scored at the lowest level in math (31% vs. 8%)?

Would those be the results of a failing school, rightly labeled as such by the NCLB and by two stars on the State's Report Card, and rightly deserving of the final NCLB-mandated consequences?  Will there be a substantially increased likelihood that student performance at Phalen Lake will improve as a result of such consequences?  Or might Phalen Lake be a pretty good school, falsely branded as failing by the NCLB and rightly deserving of more than two stars in any fair report card?

I do have to say "might" because I have only projected backwards from the data for 2004 here.  But I'd still have to say "might" even if I had examined the "trend" data in the back-up portions of the State Report Card.  The trend data enable comparisons of data for 3rd graders for 2002 with data for 5th graders in 2004-- but the trend data are NOT reported only for continuously enrolled students at a given school.  So I can't ascertain student learning gains with any surety, especially for a school with fairly high mobility.  (From District sources, the mobility level for Phalen Lake during the previous school year, 2002-03, was 29%; stability was 88%.)  That implies a turnover of roughly 25% of the school's population during the two school years between 3rd and 5th.  PLUS, there will have been changes in student clientele over the two intervening summers (through Sept. 30)-- changes that don't count as mobility and don't disrupt the school year much, but changes that may well bring lower-achieving students into a school serving predominantly low-income neighborhoods.

So perhaps I'm wrong in surmising that Phalen Lake might actually be a pretty good school.  But if I am wrong, what about some of these other St. Paul schools (this time from trend data):  Ames:  Reading 2002 3rd (24% Proficient) vs. 2004 5th (51%)?  Bruce Vento:  Reading 2002 3rd (27% Proficient) vs. 2004 5th (52%)?  Como Park:  Reading 3rd 2002 (26% Proficient) vs. 2004 5th (51%)?  Highwood Hills:  Reading 2002 3rd (36%) vs. 5th 2004 (55%)?
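The gains implied by the figures quoted above can be tallied in a few lines.  What follows is only an illustrative sketch, not an official calculation:  the function and variable names are my own, the Phalen Lake starting figure is the backward projection described earlier (not a measured 2002 result), and the other schools' figures come from trend data, which are not limited to continuously enrolled students.

```python
# Percent proficient in MCA Reading as quoted in the text:
# (grade 3 in 2002, grade 5 in 2004).
# Phalen Lake's grade 3 figure is the 2004 3rd-grade result projected
# backwards; the other schools' figures are from the trend data.
reading = {
    "Phalen Lake (projected)": (29, 53),
    "Ames": (24, 51),
    "Bruce Vento": (27, 52),
    "Como Park": (26, 51),
    "Highwood Hills": (36, 55),
}


def point_gain(before, after):
    """Gain in percentage points (a difference, not a percent change)."""
    return after - before


gains = {school: point_gain(*pair) for school, pair in reading.items()}

for school, gain in gains.items():
    print(f"{school}: {gain:+d} percentage points in Reading")
```

Every school listed shows a gain of 19 or more percentage points in Reading on these figures, subject to the caveats about mobility already noted.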

ONE MORE INTRODUCTORY POINT

Just as is the case in surveys repeated over time-- my field is survey research-- maintaining a degree of continuity from year to year in evaluation/assessment and reporting is a worthy goal.  But when the first iterations of an effort, such as the State's Report Card, are replete with severe weaknesses, it is better to fix them sooner rather than later.  Waiting only postpones major changes, makes continuity worse, and makes more years of portions of the data of little value to the record.

POINT #1.  MINNESOTA NEEDS TO GIVE CREDIT ON STATE REPORT CARDS FOR LARGE GAINS IN PROFICIENCY
Even if the NCLB does not do so (and it should), Minnesota needs to find ways to give credit in its State Report Cards to elementary (and secondary) schools in which continuously enrolled students are showing large learning gains (and/or other measurements of value-added).  The public-- including parents who are considering enrolling in these schools-- needs to know the truth about such schools, including that some are successes.

It is all the more essential to give credit for learning gains because the NCLB will be labeling ("branding"?) as failures (i.e., as not making AYP) more and more elementary schools, including some which produce large gains.  The levels of achievement of one or more subgroups at these schools cannot be ratcheted high enough, because the challenges are simply too great to reach the mandated levels of proficiency in the time allotted (e.g., by 3rd grade).  Major State initiatives for students who need pre-schooling and English-language learning before kindergarten, and more funding for educational programming after school, would help student achievement considerably in such schools.  But if schools with challenges like Phalen Lake are to be judged fairly, learning gains/value-added must be the major factor in those judgments.

POINT #2.  HOW TO REPORT LEARNING GAINS ON STATE REPORT CARDS-- AN ELEMENTARY EXAMPLE
For all elementary schools encompassing grades 3-5, it is a very simple matter to provide some indication on the State's Report Card of the extent of learning gains.

For the NEXT Report Card on schools and districts, the public needs a separate section that shows, for both MCA Reading and MCA Math, the scores for continuously enrolled students who were 3rd graders in Spring 2003 and 5th graders in Spring 2005.  At minimum, the percentage who score as "proficient" should be shown, but much more information is conveyed, including the extent of growth beyond level 3 (minimum proficiency), by the current bar graphs showing all five scores.
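To make the idea concrete, here is a minimal sketch of how percent proficient could be computed for continuously enrolled students only, alongside the "trend" figure based on everyone tested.  The student records are invented purely for illustration, the record layout and names are my assumptions (not the State's data format), and level 3 is treated as minimum proficiency, as described above.

```python
PROFICIENT_LEVEL = 3  # the text treats level 3 as minimum proficiency

# Invented records: (student id, MCA level in grade 3, MCA level in grade 5).
# None means the student was not enrolled at this school for that test.
records = [
    ("s1", 2, 3),
    ("s2", 1, 3),
    ("s3", 3, 4),
    ("s4", 2, 2),
    ("s5", None, 1),  # arrived after the grade 3 test
    ("s6", 2, None),  # left before the grade 5 test
]


def percent_proficient(levels):
    """Percentage of the given levels at or above minimum proficiency."""
    levels = list(levels)
    if not levels:
        raise ValueError("no students in this group")
    return 100.0 * sum(lv >= PROFICIENT_LEVEL for lv in levels) / len(levels)


# Continuously enrolled cohort: tested at this school in both grades.
cohort = [(g3, g5) for _, g3, g5 in records if g3 is not None and g5 is not None]
cohort_g3 = percent_proficient(g3 for g3, _ in cohort)
cohort_g5 = percent_proficient(g5 for _, g5 in cohort)

# "Trend" style figures: everyone tested in each grade, regardless of mobility.
trend_g3 = percent_proficient(g3 for _, g3, _ in records if g3 is not None)
trend_g5 = percent_proficient(g5 for _, _, g5 in records if g5 is not None)

print(f"Continuously enrolled: {cohort_g3:.0f}% -> {cohort_g5:.0f}%")
print(f"All students tested:   {trend_g3:.0f}% -> {trend_g5:.0f}%")
```

In this made-up example the cohort gain is larger than the trend gain because the newly arrived student scores low.  That is only one possible pattern, but it is the one most likely to matter in high-mobility schools.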

With respect to elementary schools and these learning gains, there is no need to wait for a more perfect system to be researched/developed-- refinements made subsequently should result in a part of the report card system very close to what is proposed here.  Moreover, it is unfair to wait, because damage is being done NOW to some school reputations and teacher morale, and parents are being misled as to choice of schools.

(Note:  The approach suggested here is to show the percentage proficient over time for continuously enrolled students.  Differences between new and old tests/standards, and methodological questions involving whether the MCA is designed to measure gains, don't really matter much in our actual context.  The reason this is so is that far higher stakes are already associated with school proficiency percentages, and if you can sanction or condemn a school based on the level of achievement indicated by those numbers, you can certainly examine and report its learning gains based on the same data.  The State is responsible because it has defined "proficiency" at various grade levels, and that is the basis of NCLB judgments.)

(Further note:  If State level data cannot be calculated for continuously enrolled students, that too should be fixed.  The current "trend" data can be used to provide an estimate but, as described above, trend data would likely produce under-estimates of student learning gains at schools where mobility is high, where new very low-income residents enter the neighborhood while economically more successful residents leave it.)

POINT #3.  THE FAILURE TO INCLUDE LEARNING GAINS/VALUE-ADDED IN THE STATE'S REPORT CARD EXACERBATES THE INEQUITIES IN THE NCLB
The NCLB needs to be changed to recognize/reward schools in which continuously enrolled students show strong learning gains, or schools that otherwise demonstrate substantial value-added (e.g., through comparative analysis, strong positive "residual gains" in a regression analysis, etc.).  Once a definition of what constitutes sufficiently strong gain is developed, schools with strong gains should either be seen as making AYP, OR should be granted safe harbor.  In the interim (i.e., before the NCLB is repaired), such a definition should be used to award stars on the State Report Card.  (See also Point #6.)

Excluding proficiency-related learning gains and other forms of value-added data from the State's Report Card indicates, in effect, that the State has little or no problem with this omission in the NCLB.  Indeed, by not including learning gains in its star-awarding system, the State adds to the negative branding of schools that fail to make AYP, even when students at those schools show large gains.  

POINT #4.  "MAKING AYP" AT HIGHER GRADE LEVELS SHOULD TRUMP "NOT MAKING AYP" IN LOWER GRADE LEVELS IN THE SAME SCHOOL (OR DISTRICT)
Provision should somehow be made to enable an elementary school that meets AYP targets at the highest grade level tested to be granted safe harbor for not making AYP at lower grade levels.

For example, if the 5th graders are making AYP at a school, the 3rd and 4th graders are probably en route to do so too, and the school should not be branded as a failure.  At minimum, the school should be granted safe harbor.

State authorities should inquire with federal authorities whether such an adjustment is legally possible now and, if not, should urge that the NCLB be changed in this regard.

(Note:  For this recommended change to work well without concealing school weaknesses by reducing the size of some demographic subgroups below the threshold of 20 students, some variations on existing criteria would need to be adopted.  For example, when conducting a more in-depth examination of schools that make AYP at 5th but not at 3rd, the minimum sizes for subgroups in grades 3 and 5 would need adjustment downward, because the minimum number of students may well not be reached for all subgroups at both grade levels.  But because the data for smaller numbers of students won't be as reliable, perhaps another minimum requirement could be that there be at least a nominal upward trend in each subject tested for all subgroups numbering 10 or more students in both grades 3 and 5.)

POINT #5.  TO SHOW THE VALUE-ADDED BY A SCHOOL, THE STAR-AWARDING CRITERIA FOR STATE REPORT CARDS SHOULD INCORPORATE RESULTS OF COMPARATIVE ANALYSES FOR ALL SCHOOLS, USING DATA FROM SIMILAR SCHOOLS AND POPULATIONS, OR A REGRESSION ANALYSIS
Currently, State Report Cards give credit for comparative performance only when a school is making AYP.  Limiting the potential to earn credit for comparative performance is yet another way the State Report Card is unfair to schools with low achievement but high learning gains.

In this text, comparative performance data means comparisons are made between schools with very similar demographics, or between demographic subgroups within a school that are compared with the same subgroups in more than one other school.  Probably the best approach to such a comparative analysis is a regression-based approach.

POINT #5A.  DIFFERENCES BETWEEN ELEMENTARY & SECONDARY SCHOOLS AFFECT COMPARATIVE ANALYSES; VERY HIGH-ACHIEVING SCHOOLS MAY NEED DIFFERENT TREATMENT 
The LEVEL of student achievement/performance shown by middle school students (and, even more so, by 2-year junior high students) is heavily contingent on the level of achievement of students who come from the "feeder" elementary schools.  The same is true for high schools vis-a-vis middle schools.  Accordingly, whenever there is only one MCA test in a subject area within a school's grade levels, it is important for making fair judgments of the value added by that school to use some statistical control (e.g., in a regression analysis involving all secondaries of a given kind) for the students' average level of achievement on the last MCA test in the same area that was taken PRIOR to their current school.  Because the prior achievement level of a school's students incorporates not only the effects of previous schooling but also many/most of the effects of the relevant demographic factors (which influenced the results of previous schooling), the remaining effects associated with those demographic factors may be small when prior achievement is controlled first.  However, when there is no viable measure of prior achievement, a somewhat rougher estimate of value-added can be obtained by controlling statistically for the relevant demographic factors.  In either case, the estimates of value-added should be used to award stars on the State Report Cards.

(Note:  In a regression analysis, "residual gain" is the extent of achievement at a school that is not explained by prior achievement or by demographic factors-- so the residual gain is the extent of achievement presumably attributable to the school.  Residual gains are a measure of the extent of value-added.  If a school's residual gain is positive, it has, compared to other schools of its kind, added more value to student achievement; conversely, when residual gain is negative, comparatively less value has been added by that school.)
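As a rough sketch of what "residual gain" means in practice, the following fits a straight line predicting each school's current average score from its students' prior average score, then reports how far each school lands above or below that line.  The school data are invented, the names are mine, and a single predictor is a deliberate simplification:  a real analysis would involve all schools of a given kind and, where appropriate, the demographic factors discussed above.

```python
from statistics import mean

# Invented school-level data: (school, mean prior score, mean current score).
schools = [
    ("A", 40.0, 52.0),
    ("B", 50.0, 58.0),
    ("C", 60.0, 70.0),
    ("D", 70.0, 74.0),
    ("E", 80.0, 86.0),
]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


prior = [p for _, p, _ in schools]
current = [c for _, _, c in schools]
slope, intercept = fit_line(prior, current)

# Residual gain: actual achievement minus the achievement predicted
# from prior achievement alone.
residuals = {
    name: actual - (intercept + slope * p) for name, p, actual in schools
}

for name, r in residuals.items():
    label = "more" if r > 0 else "less"
    print(f"School {name}: residual {r:+.1f} ({label} value added than predicted)")
```

Note that residuals from a fit like this always average out to zero across the schools included, which is why they indicate comparative, not absolute, value added.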

ALSO RELEVANT IN ELEMENTARIES:  A value-added analysis (i.e., not just learning gains) is relevant in the elementaries too.  There is no other measure of how well elementaries do in their first three grades (prior to the first MCAs).  A value-added (regression) analysis would look at the elementary schools' value-added (residual gains) after controlling for the relevant demographic factors, such as percentage of students in ELL status, in Special Education, and percentage of students eligible for a free or reduced-price lunch.  Most if not almost all of the effects of mobility can be eliminated by eliminating students from the samples who entered their schools after the beginning of kindergarten (or first grade?).

IMPORTANT:  HOWEVER, different provisions probably need to be made for schools that have FEW educationally "challenging" clienteles, and have VERY HIGH levels of student achievement.  In order not to misjudge/disadvantage these schools, sustaining high levels of initial achievement should provide an alternate way to award the stars on State Report Cards that are assigned to value-added (including learning gains).  For example, schools in the Twin Cities' wealthier suburbs, which have children mainly from higher-income families and which have high achievement levels, should be expected to sustain their relatively high(er) levels of achievement.  Thus a school that has 100% proficiency in MCA Reading at grade 3 and has 97% proficiency two years later at grade 5 should be seen as having fully sustained its level of performance (because of ceiling effects and the limits of reliability).  But a decline from 95% proficiency to 85% proficiency might cost a school a star.  I have used proficiency levels as an example here, but a better measure might be the average scores obtained over time in Math or Reading.

POINT #6.  THE SYSTEM FOR AWARDING STARS IN THE STATE'S REPORT CARD IS UPSIDE DOWN
The State Report Card unduly limits the extent to which the public can ascertain the effectiveness of MN schools by automatically restricting to two stars those schools that do not make AYP (and more and more schools will fall into this group).  Schools that are not making AYP should be able to earn as many as 4 stars, or even 5 stars in exceptional cases, depending on their student learning gains/ value-added/ comparative performance.  Thus the current calculations are "upside down" in that they limit from the bottom (no more than 2 stars if not making AYP) rather than subtracting from the top (one star typically deducted if not making AYP, with a few exceptions).

Although the NCLB mandates public reporting of school performance by the States, the State should, insofar as legally permissible and evaluatively sound/reasonable, uncouple its "star" rating system from NCLB requirements.  Because the NCLB's approach only looks at student achievement levels (disregarding both challenge factors and student learning gains/value-added), it is a methodologically erroneous and unfair way to judge school quality, the more so because it is tied to a system of sanctions.  (Indeed, the recent federal emphasis on experimental research methods as THE way to identify educational "best practices" stands in sharp contrast to the rash presumptions involved in how the NCLB calculates AYP, makes judgments, and applies sanctions.)

AYP calculations are already reported in the State Report Cards, so why should the State reiterate the same erroneous approach, over-and-over again as it has been doing (as we've seen above), in its star-awarding system??

Instead, the differences between the State's carefully reasoned and analyzed approach to judging school quality and the NCLB's simplistic and unfair judgments should help to highlight the weaknesses in the NCLB and increase the pressure to improve the law.  Indeed, perhaps the main reason to retain the stars as part of the State's School Report Cards is to seek to show the truth about school performance.

A DUAL SCHEMA FOR AWARDING STARS ON STATE REPORT CARDS (EXAMPLE)
This schema presents a possible structure, with a dual track, for awarding stars within a fairer State Report Card.  Of course this example needs refinement based on closer examination of the actual data, regression analyses, etc. 

Although this schema will need refinement, it is as close to fair as I could make it, evaluatively speaking (while still leaning a little toward NCLB-related perspectives), so the State's final version should look a lot like this schema.

The dual track illustrated below divides schools into those making AYP and those not making AYP, but not counting Special Education students in either case.  Also, within the track that is making AYP, I have not tried to illustrate criteria or "decision rules" that might pertain to demographic subgroups, but some manner of including that information in the schema should also be devised.

In the explanations below, "value-added" is meant to include both directly measured learning gains (e.g., gains in the percentage of students who are "proficient," or gains in average proficiency scores) AND the results of a regression analysis (or some other form of comparative analysis).

As is the case currently, a separate grade needs to be calculated for math, reading, etc.

TRACK 1:  MAKING AYP (excludes Special Education)

5 Stars:  Very high to stellar level of achievement that is sustained from year to year; or
        High level of achievement and value-added significantly above average; or
        Medium level of achievement and value-added substantially above average.

4 Stars:  High level of achievement that is sustained from year to year (and value-added near
            average); or
        Medium level of achievement and value-added significantly above average; or
        Very high level of achievement but a significant decline from previous year (and value-added
            tending to be below average).

3 Stars:  Medium level of achievement that is sustained from year to year (and value-added near
            average); or
        Very high level of achievement but substantial decline from previous year and value-added
            significantly below average; or
        High level of achievement but significant decline from previous year (and value-added tending
            to be below average).

2 Stars:  Medium level of achievement but significant decline from previous year and value-added
            significantly below average; or
        High level of achievement but substantial decline from previous year and value-added
            significantly to substantially below average.

1 Star:  Medium level of achievement but substantial decline from previous year; or
        Medium level of achievement but value-added substantially below average.

TRACK 2:  NOT MAKING AYP (also excludes Special Education)
(Note:  This track is described jointly in terms of learning gains-- applicable mainly if not entirely in the elementaries-- and the assessment of value-added with the use of a regression analysis.  The learning gains part is described in terms of changes in the proficiency level-- because that is the emphasis in NCLB and because many of the schools not making AYP now are low-achievement schools.  However, average score gains could be used as the primary measure instead.)

5 Stars:  Stellar value-added (e.g., an average gain of 25 or more percentage points in proficiency
        from grades 3 to 5, with all viable subgroups showing at least a nominal gain; a regression
        residual more than 2 standard deviations above average).

4 Stars:  High overall value-added (e.g., 15-24 percentage point gain in proficiency from grades
        3-5, with all viable subgroups showing at least a nominal gain; a regression residual between
        1 and 2 standard deviations above average).

3 Stars:  Positive overall value-added (e.g., 5-14 percentage point gain in proficiency from grades
        3-5, with no viable subgroups having a significant decline; a positive regression residual).

2 Stars:  Value-added near average (e.g., -5 to +4 percentage point change in proficiency from
        grades 3-5; regression residual nominally or significantly below average, to 1 standard
        deviation).

1 Star:  Value-added significantly to substantially below average (e.g., loss of 6 or more
        percentage points in proficiency from grades 3-5; regression residual more than 1 standard
        deviation below average).
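The proficiency-gain half of the Track 2 listing above can be expressed as a simple decision rule.  This is only a sketch under several assumptions of mine:  the function and parameter names are hypothetical, the regression-residual criteria are left out, the 5-star tier is read as 25 or more points, fractional gains are assigned by the lower bound of each band, and a school that misses a subgroup condition drops to the next tier whose conditions it does meet (the listing does not say what should happen in that case).

```python
def track2_stars(gain_points,
                 all_subgroups_gain=True,
                 any_subgroup_significant_decline=False):
    """Stars for a school NOT making AYP, from its grade 3-to-5 change in
    percent proficient (in percentage points).

    Assumption: only the proficiency-gain examples in the schema are used;
    the regression-residual criteria are not modeled here.
    """
    # 4 and 5 stars require every viable subgroup to show at least a nominal gain.
    if gain_points >= 15 and all_subgroups_gain:
        return 5 if gain_points >= 25 else 4
    # 3 stars requires that no viable subgroup has a significant decline.
    if gain_points >= 5 and not any_subgroup_significant_decline:
        return 3
    # Roughly -5 to +4 points: value-added near average.
    if gain_points >= -5:
        return 2
    # A loss of more than 5 points.
    return 1


# Example: the projected Phalen Lake Reading gain of 24 points (29% -> 53%).
print(track2_stars(24))  # 4
```

Under these assumptions, the projected Phalen Lake Reading gain would earn 4 stars instead of the 2 stars the current system allows, provided the subgroup condition is met.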

IMPORTANT NOTES

  1. Not all school measurements will fit into these descriptions.  As mentioned earlier, criteria will have to be adjusted for the actual distributions of results.  For example, if a change in proficiency between +4 and -5 percentage points from 3rd to 5th grade is more than 1 standard deviation below average in the actual data (e.g., in terms of net value added), the two stars awarded in the listing immediately above would be changed to be 1 star, and requirements for stars 3-5 would be raised somewhat.

  2. After the criteria & decision rules have been formulated for these two tracks, the amounts required to be placed in the various categories probably should be held constant for several years in a row in order to have a more criterion-referenced system and to give schools a fixed target.  If new norms are established every year, approximately half of all schools will automatically be below average each and every year.  

POINT #7.  NCLB JUDGMENTS BASED ON ONE SMALL SUBGROUP ARE QUESTIONABLE
Currently, schools are regarded as not making AYP if even one subgroup fails to make AYP, irrespective of how all the other students at the school perform.

While the subgroup-oriented analyses mandated by the NCLB are extremely valuable in terms of forcing schools to pay attention to all of their subgroups, to achievement gaps, etc., perhaps there should be a safe harbor-like status for schools in which only one small subgroup fails to make AYP-- especially the Special Education subgroup (or if an ELL contingent consists of many students relatively new to the country).  However, my statement here assumes that proficiency levels, or learning gains/value-added, are otherwise at least average in comparison to similar schools.  

POINT #8.  NEVER 100% IN ALL OF HISTORY?
Probably never in history has there been a large age group which was 100% proficient at something of substantial complexity and importance in their society-- let alone 3 things (reading, math, science).  Yet 100% proficiency by 2014 is what the NCLB requires, such that, as indicated in last year's report by the Legislative Analyst, every school will ultimately be branded as a failure, whether the school is a good school, an average school, or a poor school.  (Possible exception:  a very few small schools in high-income areas with few subgroups may drift in and out of making/not making 100% proficiency.)

And as we saw above, even some good schools will get branded as failures early on, and it won't be long now before these schools too (just like some schools that aren't so good) get reconstituted as part of NCLB's consequences-- to the detriment of their student learners.  What good does that do?

Although proficiency LEVELS should remain ONE of the foci of a revised NCLB, there are more reasonable and realistic approaches-- given the extremely large differences in student demographics across the State and country, and from urban to suburban to rural areas-- and given the powerful general effects of low-income status, mobility, and having other than English as a home language.  The NCLB needs to be revised to include much more emphasis on value-added/learning gains, mainly for continuously enrolled students.  (And, where achievement is high by 3rd grade, the emphasis should be on maintaining those high levels of performance over succeeding grades.)

And perhaps a more realistic NCLB goal would be 80-90% proficiency by 8th grade, counting only those students who have been continuously enrolled in a district for several years.  Even among students who are continuously enrolled, there will always be some who have major personal, psychological, social, medical or family problems, who lack motivation, who rebel against authority and the regimentation of schooling, who get too involved in their social life, their work life, or with the law, who aren't very intelligent but aren't classified as Special Education, etc.  Thus there will always be some students for whom achievement in one or more required subject areas will not match the State's schedule for attaining proficiency.  This country has long recognized this situation in its design of social/educational institutions adapted for life-long learning.

Jeff Koon, Ph.D.

Background on Jeff Koon:

  • Ph.D. in (Higher) Education.
  • Survey researcher for over 30 years.  Conducted surveys of students and/or parents in five St. Paul schools at all levels over the last 14 years.
  • First author (with Harry Murray) of an article in The Journal of Higher Education regarding the validity of college student evaluations of their teachers.
  • Occasional analysis of data for St. Paul Public Schools (mainly as a community service).
  • District-level committee participation includes:  PER (former State-mandated curriculum committee); Grad Standards; Standardized Test Selection (2); Social Promotion; Professional/Staff Development; Accountability Task Force; Gifted Services Advisory Council.

Jeff can be reached at 651-647-9199.