105 Peavey Rd, Ste 116
Chaska, MN 55318

What You Should Know About the Stanford 9

January 20, 2004

by C.R. Lewis


The following treatise was composed over a two-year period at two Washington, D.C. public charter schools and reflects a four-year exposure, mostly to the mathematics side of the Stanford 9 "achievement" battery. The Stanford 9 (often called the "SAT 9") is usually administered twice yearly across the grade levels, and is probably the closest thing this nation has to a recognized universal standardized measure of school performance through pupil competency, administered on a year-to-year basis.

This paper will discuss a wide range of serious flaws associated with the Stanford 9. The most salient defect has to do with the issue of its worthiness or unworthiness of being the gauge of educational performance that the previous paragraph suggests it is becoming. One would think that any instrument general enough to measure academic success across the continuum of geographic locations and curricular priorities of this country would of necessity focus on basic skills, core knowledge, and fundamental comprehension.

On the contrary, at least with the quantitative side of the SAT 9, nothing could appear further from the truth. Throughout the levels, the battery is all over the map with disconnected, superficial tidbits. To teach math with an eye toward success on the Stanford 9 is, in our view, to guarantee deficiency in fundamental skills, lack of a foundation of knowledge in the discipline, a serious dearth of comprehension, and, through these failings, an extraordinarily low level of retention of the subject matter.

Ironically, the above may help explain the popularity of the Stanford 9, in that in so being it conforms to the currently popular slant in math education in this country. In fact, it seems to offer a convenient device for appearing to show at least marginal advances in achievement even as real math proficiency in the U.S. (as consistently indicated by the test batteries we have developed and implemented) plummets ever further into the abyss of last place among industrialized nations. If this is indeed the case, no phenomenon could be less welcome at this point in our history.


When we opened IDEA Public Charter School in Northeast Washington in fall, 1998, one of the first things we did was devise pre-tests to measure the math acuity of entering students on the pre-algebra, algebra I, geometry, and algebra II levels, tests we supplemented with very comparable post-tests that would document the progress of these same students. These tests, which are available for public perusal, were reasonably challenging gauges of the things that have long been universally accepted as integral to the respective courses.

The results of the pre-tests over our three years of existence were what led us not to accept math credits from other schools, and instead insist that all math credits be contingent on the passing of our tests. Most of our students were veterans of the D.C. Public School system, but we also had many who had previously attended suburban schools, and, because of a relationship with Bolling Air Force Base, quite a few from various other parts of the country as well. The pre-test results of students from other parts were not, on the whole, all that much higher than those of purely local students. The results were as follows:

Incoming students who had already received credit in pre-algebra generally scored in single digits, percentage-wise, on our 125-question arithmetic test, which included sections on fractions, decimals, percents, signed (positive/negative) quantities, powers and roots, and a few miscellaneous topics - the material someone beginning an algebra course should be expected to know. Those who had already received credit in algebra fared only a mite better on the arithmetic test - to the tune of average scores in the teens. Those with credits in geometry and/or algebra II averaged around 30% on this arithmetic exam. A passing score on the exam was 70% or higher.

Those with credits in algebra, geometry, or algebra II before arriving were tested in these subjects. Among those whose highest level credit was in algebra, the highest score ever recorded on our algebra test was 21%. The average score was in low single digits.

As for those whose highest credit was geometry or algebra II, the results were as follows: Their average score on our geometry test (a rudimentary instrument with only two easy proofs counting for 8 points) was, again, in the low single digits. As for the algebra II test, we never had a student score a single point.

Three typical anecdotes amplify this situation. One day a student halfway through a geometry course at a local parochial school attended classes with a friend enrolled at IDEA. To keep him busy we administered our arithmetic, algebra, and geometry tests to him. His scores were 5%, 4%, and 0 respectively.

An IDEA student who had just passed our pre-algebra final with a score of 72% took a day off to visit the DC Public School from which he had transferred to us. While there he took the school's algebra I final and scored 86%.

Another of our students, a behavior problem later expelled, failed our pre-algebra final and went to summer school at a DCPS school. The summer curriculum did not offer pre-algebra or even algebra, so he enrolled in an algebra II course, which he passed. He returned to us that fall and scored 5% on our arithmetic test.

In response to such shortfalls, we developed our own very comprehensive textbooks in basic arithmetic, pre-algebra, and algebra. Among those who have completed the courses based on these texts, the passing rates on our final exams (which were equivalent to the pre-tests) have been 100%. Many of our pupils achieved these results in well under a year. Typical scores on the pre-algebra final were in the 80's and 90's. It can be done.

But herein lies the problem. Since we were a D.C. Public School, our success in teaching math was judged by our students' scores on the Stanford 9 exam. We knew from very early on that this represented a huge dilemma. Our pupils were coming in at an average math computation level of 2nd or 3rd grade and a math comprehension level even lower.

We would develop a text, curriculum, and standards that would prove capable of elevating any pupil willing to put forth an honest effort all the way to the beginning algebra I level in a year. That is an increase of several grade levels. But inspection of the freshman level Stanford 9 math test by which our pupils' progress would be judged showed that the exam included nothing below the algebra level.

In fact, among the various versions we inspected, only about 20% of the material involves algebra I. The other 80% focuses on geometry, a touch of trig, and a large proportion of probability and statistics, a subject that has not traditionally been included at all in the standard high school math sequence.

Compounding all of this, the test is almost 100% word problems. Word problems measure reading comprehension at least as much as they measure math achievement. They are not math, but applied math, which is well and good, except how do you teach someone to apply something of which he or she has no knowledge in the first place? Word problem tests of this sort in any event tend to measure aptitude rather than achievement; our students' achievement, and through it our school's efficacy, was being evaluated via the results of what is principally an aptitude test.

What this has meant is that we raised loads of pupils from the early elementary level to the high school level in one year, and the sum total of their improvement did not register the slightest blip on the test by which our program is judged. What is worse, even their subsequent year of algebra could make only the slightest of impacts.

There is a flip side to the Stanford 9 that is even more vexing. You see, while the questions on the exam begin at the algebra I level and move on to more advanced levels, within these levels the problems call for extremely superficial knowledge. For example, on the most recent version, there was not a single question requiring the slightest familiarity with signed quantities and how they interact.

Math teachers have long found that comprehension of signed quantities is the foremost pillar of success in algebra and beyond, and we went to great lengths to require that our pupils demonstrate mastery of the most complex situations involving them; yet this mastery did them no good on the Stanford 9. And the list of key, sophisticated algebra and geometry topics consistently ignored by the Stanford 9 is far greater than the list of those it addresses.

The test is fatally flawed from a logistical standpoint as well. It is bad enough that the SAT allows calculators. To allow them on a test administered at the grade levels covered by the Stanford 9 is, in our view, inexcusable. (The vast majority of our pupils have entered having great problems performing the basic operations, probably owing to the presence of calculators in their early grade math exposure.)

But even this is not the worst of it. We consider it indefensible that the Stanford 9, a multiple choice test, deducts no points for wrong answers. As a result, Stanford 9 schools repeatedly pound the notion home to their pupils that every bubble must be filled in, whether the questions are read or not.

The results of this can be stunning. My son, who has severe ADHD (there is such a thing - though it is much rarer than authorities suggest), entered the second grade just about totally illiterate. There was no way he could have read any of the questions on the English portion of the fall Stanford 9, yet he dutifully filled in all the bubbles, probably without even opening the book. With no penalty for wrong answers, the percentages are with you if you do this, and, by blind luck, he was able to score in the top third of his class. As a result, he wasted a semester well over his head in the advanced reading group before the school finally relented and put him on his real level.

The worst part of this no-correction-for-guessing dilemma is that it causes us to teach the worst thing we can teach young minds - in a word, hypocrisy. On the one hand, we are telling pupils exactly what the Stanford 9 people tell us to tell pupils - that the test is to show us exactly what they know and what we still need to teach them; the breakdown on the post-test reports we receive supposedly facilitates this, with skill-by-skill itemization for each test taker. On the other hand, we tell them to make certain that they answer all questions, including the ones they do not have time to read.

The latter clearly negates the former. Statistically, it is most likely that the average Stanford 9 test taker is given credit for several correct answers to problems about which he or she actually has no clue. Since the report indicates that he or she has these skills, presumably they are not given further attention.
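The statistics behind this can be sketched quickly. The figures below - a 40-question section with four answer choices per item - are assumptions for illustration, not specifications of the Stanford 9 itself; the point is only that with no penalty for wrong answers, blind bubbling reliably earns credit:

```python
import random

def expected_guess_score(num_questions: int, num_choices: int) -> float:
    """Expected fraction of questions answered correctly by blind guessing."""
    return 1 / num_choices

def simulate_guessing(num_questions: int, num_choices: int,
                      trials: int = 10_000) -> float:
    """Monte Carlo check: average fraction correct over many guessed tests."""
    random.seed(0)  # fixed seed so the sketch is reproducible
    correct = 0
    for _ in range(trials):
        # Each blind guess hits with probability 1/num_choices.
        correct += sum(random.randrange(num_choices) == 0
                       for _ in range(num_questions))
    return correct / (trials * num_questions)

# On a hypothetical 40-question, 4-choice section with no guessing penalty,
# a pupil who reads nothing still averages about 10 "correct" answers.
print(expected_guess_score(40, 4) * 40)
print(simulate_guessing(40, 4))
```

Each of those ten-or-so lucky hits then shows up on the skill-by-skill report as a skill the pupil supposedly possesses.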

[I would like to suggest here that in my humble opinion any school that needs a Stanford 9 to tell it in which skills its individual pupils are strong and in which they are weak is not performing its principal function as a school. I, for instance, can at any given point tell you exactly which mathematics skills each of my pupils has and has not mastered, and I always could. But then again I pre-test thoroughly and, based on the results, teach sequentially.]

But again the worst of this is the message being sent to the children, the doubletalk aspect. In the one instance, we want to know specifically what they know and do not know, so that we may help them learn. Then we trump this by indicating that what really counts is how high a score they can achieve to make the school look good - by whatever means necessary.

In my later role as director of studies (at the World Public Charter School), I insisted that my teachers instruct their pupils to answer only the questions whose answers they felt they knew. The reports we received back would then accurately indicate these pupils' strengths and weaknesses. For various reasons outlined herein, these reports would in fact have been of little use to us in instructional planning; likewise, my school's scores automatically went down several notches (although they must not have been too bad - the system, which closed us that year, hijacked our results and never delivered them). But none of this matters; it is the principle of the thing. And I believe that when you train young minds it is always the principle that matters.

Opportunities for cheating on the Stanford 9 abound. I have proctored several exams and have yet to see a monitor. Even so, I am aware of one case where a monitor actually caught an elementary school teacher changing answers on students' tests; the school's principal somehow talked the monitor out of reporting this gross malfeasance, and the teacher was subsequently given a promotion. There is no telling to what degree this goes on throughout the city and elsewhere. I can imagine schools whose scores are limited only by the academic limitations of the incompetent teachers doing the cheating. I taught in the public schools for many years, and I never once saw any pronounced emphasis placed on honesty until I arrived at IDEA.

Such conjecture aside, we faced a major moral dilemma in our approach toward preparing our kids. Based on the extreme superficiality of the treatment of topics on the exam, I firmly believe that if we were to have spent all of our time teaching to the test - if we were to have taught things totally disjoint from the theoretical and intuitive foundations that our pupils so desperately lacked on arrival - in a year's time we could have brought almost all of our pupils at least up to the "basic" level on the exam.

"Basic," mind you, still means somewhat below grade level, but nonetheless it is a coveted milestone in our schools. I believe we could have achieved it in relatively short order. And it would have been of absolutely no benefit to our students. They would not have had the slightest idea what they were doing. Disconnected, as it would be, from anything they had internalized, from the fundamental building blocks that make it all function, it would also have been ephemeral. It would be there for the test and then, "poof."

Such accomplishment would have been very much in the immediate interests of the school, but completely against those of the students. It never entered our minds to take anything but the high road. That meant three years before any noticeable results could be detectable on the Stanford 9, but by then our students should have become masters of the material on a profound level, capable of scoring in the "proficient" and "advanced" Stanford 9 ranges. That they were already far and away outperforming - on subject exams - pupils (from other schools) supposedly three and four levels more advanced gave us every confidence that this would occur, although a badly flawed measure like the Stanford 9 will never do such accomplishments full justice.

As a matter of personal choice, I refused to provide any Stanford 9 prep to my pre-algebra students, even though they all but begged me to do so. My response was that they would get the necessary preparation once they advanced to the level where they could fully make sense of it, and pre-algebra affords no such opportunities. Even in the algebra course, the only Stanford 9 prep-type materials I provide are those intrinsic to the course, and most of these do not appear until my book's second to last section. My guiding principle is that every decision in terms of course materials must be based on the best means toward the intellectual growth of our students. Presenting things out of sequence - "throwing stuff" at the pupils and asking them to take my word that it is true, rather than making them see that it must be true - cheats them out of an education.

There really should not be any such thing as "Stanford 9 prep" anyway. The test is supposedly designed to measure what students learn in their normal math curriculum, and we felt we had designed a curriculum that is second to none (one I have since, at the World School, supplemented to reach down to the early elementary level). Unfortunately, the test does not measure what it ought to measure.

Yes, I could have forsaken the fundamentals and tried to bolster the school's image attempting to piece together a curriculum just to boost scores on the test. But IDEA and World were charter schools, and pupils do not need charter schools if what they want is to be force fed matter that they do not and cannot understand just so that school test scores can be padded; they can avail themselves of such instruction at any regular public school, where it is not only the norm, it is mandated. These practices, notwithstanding any possible cheating, account for the system's modest gains over the past few years; judging by our pre-tests, there have been no gains whatsoever in real math comprehension, meaning that the published Stanford 9 improvements only mask the deplorable truth.

True, a City Paper article published during my IDEA tenure listed IDEA as one of the two charter high schools with improved Stanford 9 scores in both English and math. I would love to bask in the credit for that improvement, but in light of what I revealed above, that would be patent hypocrisy. Unless our emphasis on thinking things through had highly unanticipated transfer quality, what we have done up to now should not have impacted the scores. I view our improved scores as just another fluke of a grotesquely flukey test.

True, we elevated our pupils several grade levels on the average. True, at the end of a year of arithmetic they were vastly outperforming kids from all over the country who had had three and four courses supposedly above that level. Untrue any of this has any relevance to the Stanford 9. And that is a sad commentary on the Stanford 9.

We at IDEA were thoroughly committed to the concept of criterion based grading. No one passed our math courses without passing their rigorous final exams. Our pupils knew about these high expectations from the start and, in spite of huge initial resistance, they consistently measured up. In my classes, what you got on the final exam was your grade for the course. That may seem extreme to some, but I saw it as a mark of my confidence in my youngsters and the text and techniques with which I empowered them.

But if tests are to be the yardstick, they must be valid tests. And the concept of validity is at once crucial and problematic. I would like a voice in the construction of the next generation of national standardized tests. Here is what I would like them to embody:

They should test the complete array of arithmetic computational and transformational skills, all of the topics covered on IDEA's and World's arithmetic exams; a pupil with no idea how to subtract or divide mixed fractions or simplify complex expressions containing signed quantities should not be able to out-score one who is a master of these skills. The most difficult aspects of each given skill should be tested; spoon-fed problems that do not involve the more tricky nuances do not test true comprehension.

At least until such time as our schools have corrected and remediated the deficiencies in math instruction that have left us last in the industrialized world in the discipline, our tests should be appropriate to the entry level of the individual students; those who cannot perform simple arithmetic in the fall should be tested on arithmetic in the spring, while those ready for more advanced evaluation should be given more advanced tests.

The tests should focus on meat and potatoes topics of the traditional curriculum, and within these topics test for deep comprehension.

The tests should be course-based. Arithmetic, algebra, geometry, and pre-calculus should be dealt with on separate exams. Integrated math as advocated by the powers that be has been a disaster so far; our pupils who arrive having taken it are generally among our biggest underachievers. There is good reason; if you want to design an alphabet soup that leaves students at a complete loss as to connections - either to common sense and intuition or to previously mastered math - you cannot miss with integrated math. Nonetheless, for schools that insist on adhering to it, there should be separate tests for the various levels of integrated math as well.

Emphasis should be on achievement, not aptitude. Surely we want to know who our more naturally gifted pupils are. But pupils' natural gifts have nothing to do with what a school does for them. Schools should be graded on how they develop such skills, and achievement tests are what measure this. Let the SAT and IQ tests measure aptitude.

The tests should involve mathematics more than applied mathematics. The bulk of such tests, at least until our country's schools have risen to the level where the fundamentals themselves are a given, should consist mainly of straightforwardly presented math problems, not paragraph-style word problems. There is no reason that regular math problems cannot be sophisticated, profound, probing, nuanced mettle testers; if we want to know how well our students know math, we need to ask them math questions.

The tests should be administered under conditions of air-tight security.

The tests should take pains to measure topics across the curriculum of whatever given course. Schools should be given lists of the tested topics (which should include examples from the full range of the given course), but no specific prep materials. Ideally, perhaps different school districts should be able to choose the topics to be tested on a given exam, a kind of mix and match option. But the key is that there should be no paradigm that allows for teaching to the test. As much as possible, the tests should measure how well day to day math instruction imparts the desired knowledge, skills, and comprehension - not how well a school prepares its students to take a specific test.

The tests should be of appropriate length to measure subject comprehension in depth. The standard 40 minutes (and corresponding paucity of questions) that the Stanford 9 gives for testing skills across a three- or four-course breadth is ludicrously inadequate.

Tests should be given not in April but at the end of the school year.


My experiences in testing pupils throughout the summer of 2001 and beyond at the World Public Charter School, where I had become Director of Studies, only hardened my resolve against the Stanford 9. One particular instance exemplifies the results:

The mother of a young man who had just passed the fourth grade in a D.C. Public School showed me with pride his astronomical Stanford 9 scores in math. As was customary in the case of pupils at that grade level, I administered to him the final exam for 4th graders I had developed for the school.

On a timed portion of fifty "basic fact" (single digit addition and multiplication and their inverses in subtraction and division) problems, this youngster was able to answer correctly only eighteen. This meant that it took him an average of ten seconds to do each of these problems, whereas we feel they should not require much more than two seconds apiece, if that. [By contrast, a girl of the same age who had recently finished the same grade in China (and who had arrived in this country only the day before taking the test) recorded a score of forty-eight out of fifty on the same timed test.]
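The timing arithmetic above can be checked directly. In the sketch below, only the fifty-problem count and the pupil's eighteen completed answers come from the account; the three-minute window is a hypothetical figure implied by the "ten seconds apiece" calculation, and the two-second benchmark is the standard stated above:

```python
# Assumed parameters; only the problem counts come from the account above.
time_limit_s = 180        # hypothetical 3-minute timed window
answered = 18             # basic-fact problems the pupil finished
target_per_problem_s = 2  # benchmark time for a memorized basic fact
total_problems = 50

avg_seconds = time_limit_s / answered
print(avg_seconds)  # seconds per attempted problem

# At the two-second benchmark, all fifty facts fit well inside the window.
print(total_problems * target_per_problem_s <= time_limit_s)
```

A pupil who has genuinely memorized the basic facts, in other words, should finish the entire section with time to spare rather than completing barely a third of it.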

The young man's performance on the rest of the test illustrated both the effects of his non-memorization of basic facts and his general non-mastery of other mathematical procedures relevant to his grade level. In the case of his "tables" inadequacies, his having to think his way through such simple examples caused him to take about three hours to complete the remaining seventy-five-question portion of the test, even though he left almost half of these questions unanswered.

As to his lack of general knowledge, among other shortfalls:

  • he was able to give only one answer correct among twenty-five problems involving the skill of converting between numbers written in digits and the same numbers written in words,
  • he was unable to perform either multiplication or division where the quantities involved contained more than single digits,
  • he was able to successfully complete only a minority of the test's multi-digit addition and subtraction problems,
  • he exhibited, in attempting to perform such operations as those cited above, extremely deficient number sense, writing answers such as "0002" (zero thousand two) instead of simply "2."

To be sure, the young man was exceedingly bright. His Stanford 9 results (the exam being an aptitude test) demonstrated this clearly. He is highly capable of learning. However, our test showed that he had not been schooled in even the most basic of mathematical concepts proper to his pronounced grade level. Any test on which a pupil (bright though he be) with this young man's gaps in comprehension and competence could score in the highest ranges (as he did) cannot possibly be a valid measure of his school's success in educating him.

And this young man's case was not the most extreme. A young lady who had completed the third grade, where she had attained the highest possible score ("advanced") across the board on the math portion of the Stanford 9, took a version of our third grade end-of-course mathematics exam the following September. Her score was 0 (zero). She could neither multiply nor divide in problems where either or both of the quantities had more than one digit. She could neither multiply by, nor divide into, 0. She could not multiply an integer by 100. Nor could she answer any of a group of extremely rudimentary pre-geometry questions, compute the area of a rectangle, or square a single digit integer. Whatever it is the Stanford 9 tests for does not require knowledge or competency in the most fundamental areas of a given level.

In any event, an exam that encourages, almost obliges, schools to teach topics out of sequence, disconnected, and superficially - with little chance for real comprehension or assimilation and even less chance for retention - has the potential to do great harm to a system that is already in peril.


EdWatch is entirely user-supported. The continuation of our research and distribution work is entirely dependent on individual contributors. If you want to assure that our work continues, Link to -- www.edwatch.org

Please e-mail us to subscribe to this EdWatch e-mail service.

(c) EdWatch - All rights reserved.