1983-2015
tearing the rag off the bush again
The Florida Test

In the fall of that year I'd moved to Minneapolis and attempted to continue my graphics contracting business. Work was slow coming in as I made the rounds of print shops, ad agencies, and presses. By March the temp ads in the newspapers had started to look more appealing. The one that caught my eye offered reasonable pay (for a temp job) and the assignment was limited—I could be finished in time to take up a graphics project that was promised for May. The temp job ad stressed that applicants needed to bring proof of college graduation to the job interview. I'd never been asked to show proof of college graduation at any job interview, and as it turned out, the requirement was no more stringently applied than any other job I'd applied for. The work was grading standardized tests for the state of Florida. The grader, a company in the Midwest, would be hiring several hundred diploma-bearing college graduates to grade the short essay responses of 150,000 or so of Florida’s tenth-graders. I hadn’t thought much about standardized testing since high school. I’d taken the various tests necessary to apply for college, and if I hadn’t taken those tests I would never have qualified for this temp job—so it all works out in the end after all.

From what I’ve learned about college admissions, these tests are mainly an expensive decision-avoidance mechanism for school admissions administrators and are, consequently, big business. Testing is a cornerstone in the government’s response to perceived problems in American education. The students aren’t learning? We’ll fix that: we’ll test them. States across the country are using, or looking into using, standardized testing as a way to save education. Some of these states claim the tests are only diagnostic; others will use the results to apportion tax money and students to schools. Some states will use student test results to decide if the teacher deserves a raise.

I was interviewed at the company’s testing center, the first floor of a local technical college set between a car dealership and a strip mall. We were tested for employability in batches of five and six. I arrived a few minutes early and was told to go ahead and join the group already underway in the seminar room. One of the office administrators gave a talk about the nature of the work we were to do. Her presentation about that work was rather vague, but she spoke very precisely about the length of our lunch breaks. She had us take a multiple-choice test on grammar and spelling and write a short essay. For the essay, we were asked to define the word ‘Team’ and illustrate what makes a good Team.

Apparently the Team ideology that swept through the American workplace in the 90s started out almost as a workplace democracy movement: workers and management coming together in a utopian dream of equality and rational division of labor. In any workplace where I’ve seen the theory applied, however, it’s always worked out the same way: an attempt (cynical or not) to get more work out of fewer workers for less pay. Team-building swept through American workplaces with all the staying power and effectiveness of a bullet point on a middle-manager’s resume. It may be only coincidental that Team-building became so popular with management during the same era when so many team members were getting laid off. Total Quality Management (TQM) became the velvet glove covering the hand delivering the pink slip.

I ended up writing an essay comparing the Team to the human body, pointing out that the body had to work together, but that the functions of different body parts were not always apparent and sometimes only revealed their usefulness with the fullness of time. People who might appear lazy or stupid in their youth, such as Edison, or Einstein, may prove the most useful members of a Team, or society, if given time to fail.

Our essays were graded, and the administrator told me that while my scores were good, they had a surfeit of graders. They would let me know.

Two weeks later they let me know: a number of graders had discovered previous commitments, so I could have the job if I still wanted it. We had two days of training. The company held the training on an old farmstead. There were 300 potential test-scorers herded into a converted barn. The first order of business was to have us all sign a confidentiality statement. We were forbidden to speak to anyone, even each other, about the test outside of the barn. If we were contacted by the media (god forbid) we were to say nothing, but let the corporate reps know about any pesky reporters and the company would handle it.

Testing supervisors hovered around the room. We were told that these supervisors had once been lowly graders like us, but that through their diligence, they had risen to the rank of supervisor. The supervisors handed out blue binders of sample tests. These were both study-guides and reference books for the next three weeks of scoring.

While the supervisors handed out our binders, the vice-president for testing management took the podium. She explained that while we were scoring the tests for the state of Florida, the ‘People from Florida’ would be visiting and there were to be no jokes about that state’s notorious difficulties with counting ballots. A ripple of laughter went through the room, but the vice-president for testing did not laugh.

We graders sat at our rows of folding tables listening to Bobby the trainer—a PhD candidate in philosophy from a Midwestern university—describe the test and give us clues as to how we were to grade. Bobby was a good speaker, patient with questions and he loved to take the philosophical view of answering questions, which is to say every explanation tended to produce more questions.

We were to grade the students on three short answers. They could earn a 0, 1, or 2. A 0 score meant the student had missed the boat completely—an ‘F’ in other words; a 2 meant the student demonstrated superior understanding of the reading (an ‘A’); while a score of 1 covered everything else, from a good, but not great answer to the lucky guess (‘D’ to ‘B+’).

The tests, we were told, were to be graded “holistically.” What this meant in practice was that, although we were told that we were to evaluate the tests “overall” to see if the students gave evidence of reading comprehension, the students had to demonstrate that comprehension by the use of particular phrases. What those phrases might be, however, we could not be told—that would not be holistic. Instead, we looked at the examples in the book and Bobby pulled out key words here and there that would count as signifying the student’s understanding. The graders immediately began building cases where a student might use key words and phrases, yet the word “comprehension” would in no way apply. My fellow graders also formulated hypothetical student answers that might reveal comprehension, but wouldn’t use the phrases Florida thought significant. It did not matter. Since, according to the holistic ideal, there was no one single answer to a particular question—yet the answers had to parrot (or approximate) key phrases—we spent hours going over the minute shadings of words and how they might in one test mean a 2, but in another, a 0. The temperature began to rise in the room as it became clear that several folks in the room appeared to be moonlighting lawyers. With some 300 test-scorers all trying to define what ‘holistic’ meant for them, the morning proceeded step by step: “if the student writes ‘green,’ it’s incorrect; but what if they use ‘teal?’ How about ‘aqua?’” By the second morning the game of catching out the contradictions in the test was proceeding at a rollicking pace. We’d backed our Ph.D. candidate into a couple of tautologies and cut the throat of Florida’s pretence to objectivity.

Around about mid-morning the Vice-President for Testing marched again to the podium and in a voice shaking with anger, said, “we’re not going to argue today over why things are scored the way they are. Make the adjustment according to what Florida wants and be happy with that. This isn’t Burger King - you don’t get it your way. Frankly, the client [Florida] is not interested in your opinion. You are required to conform your understanding to the test and do your job. If you don’t like it, there’s the door.”

As she left the podium, her face set in stone, about a third of my fellow test-scorers broke into applause.

All the students had to do, really, was copy one or two phrases from the readings into the answer box. I didn’t see how this demonstrated comprehension of the material, but, according to Florida, that was my opinion. And we all have to live with what Florida has decided. I considered the proffered door, but I was curious to see how this business would shake out in practice, plus I needed rent money.

Bobby finished the training with a relatively docile group of test-scorers and then went back to his Midwestern university where they’re working out the problems of teal and aqua. We were counted off into ten-person ‘pods’ and moved back to the company headquarters building at the technical college in the strip mall. Knots of students, more or less successful former 10th graders all of them, huddled about the doors in the cold, smoking. It hadn’t snowed in Minnesota all winter and then, in mid-March, the skies opened and it didn’t stop until May.

We worked at computer terminals. There were some 150,000 tests to evaluate. These tests had all been scanned into a computer and cataloged; hundreds of test graders had been interviewed and trained. This was a prodigious achievement, truly awesome for a state with difficulties putting together an effective ballot-counting system. It helps to remember standardized testing is a flexing of political muscle. In the 2000 election people talked about a recount of the state ballot as if it was tantamount to the Apollo moon-shot. My fellow graders and I could have polished it off in less than a week. Hanging chads? No problem, we had Bobby and the Vice-President of Testing Management.

Each grader was assigned a workstation and after logging onto the system (with an assigned password so that one’s grading could be tracked), a small square would appear on one’s screen, more or less filled with a 10th-grader’s writing. The grader would read the student’s answer and grade it as quickly as possible, clicking on the 0, 1, or 2 buttons. The screen would prompt: ‘Are you sure?’ And after that split second re-consideration, the grader would consign the student to his or her future circle in Hell. Once the grader confirmed a decision, it was not possible to go back and change the grade (though I did try on more than one occasion). The student’s scrawling script would disappear and the next scanned image would appear on the screen.

The students were only required to write a sentence or three. I don’t know what time constraints they labored under, whether they had five minutes or twenty to answer. The students had a space of eight lines about three inches in width. Occasionally, some especially verbose kid would use up all eight lines, fill the margins of the box and the answer would trail off outside the scan zone. The handwriting was terrible, of course. Cursive writing is on its way out. The future of human penmanship will be bad approximations of computer printing. I saw Caslon, Tiepolo, and Franklin Gothic. Some students wrote in tiny, tiny script. Their letters matched their test-taking confidence, although the writers whose handwriting ballooned out to fill every millimeter of space were no easier to read. Jackson Pollock is alive and well and living in Florida as a 15-year-old girl.

After the first hour of grading one’s eyes begin to glaze. As with most temp jobs (even those requiring college diplomas) the main obstacle of the day is boredom. The children of Florida could be a little more entertaining. Nevertheless, I believe the members of my workgroup graded conscientiously. No one that I knew of scored all the tests ‘1’ (though that may have worked out statistically). Everyone seemed to make a good faith effort to gauge the students’ understanding and give the kids the benefit of the doubt - even though they had to live in Florida.

I was disappointed in the serious lack of bad behavior in the children. If enough of them had simply put an ‘x’ through their tests it would have skewed the scores and the tests would have been meaningless (though they’re not exactly meaningful when the children play along either). Of course, we adults were given the opportunity to walk out on the exercise as well by the Vice-President of Testing, but not one of us stood up. The heroic child who wrote, “Screw this test. Why should I finish this? Do I get a piece of chocolate?” (besides writing three rather cogent sentences) evinced a clear comprehension of standardized testing. He (or she) did not, however, use one of the key phrases Florida required. And, as every American office worker knows (even the temps), the chocolate is kept up front, near the receptionist’s desk. It’s not very good, and it has nothing to do with performance, but it’s the oil of office survival.

Unfortunately, most kids tried to enter into the twisted spirit of the questions. Even those who were utterly clueless, and had clearly not read the short articles upon which the questions were based, would attempt a guess. There were hopeless phrases, or runs at an idea: ‘the surfer liked to surf because...’ fading off into silence. Where did they go? Would they have done better with more chocolate? Most of the poor scorers seemed to suffer this sort of aphasia. They seemed to want to answer, but were incapable of answering. Is this the same as not understanding?

We had three separate sets of student responses to grade. In the first short-essay question, the students had been asked to read a short article about the rhythms and cycles in the lives of three different people: a surfer, a farmer and a composer. The students then had to name and compare the way the people in the article adapted to their cycles. Notice there are two parts to this question: the student has to identify a cycle that the surfer, farmer or composer used; and then the student has to point out how the person adapted to that cycle. The student had to write, for example, that the surfer anticipated the cycles of ocean tides and seasonal weather in order to take advantage of the best surfing conditions, just as the dairy farmer learned to time the calving of his cows so that he had milk all year round. (Is anticipating a wave cycle an adaptation? Go ask Florida.)

Nearly all of the students were unable to name two cycles, much less compare them or offer an adaptation. After four days of scoring the word came down from the test administrators that the adaptation aspect of the first question was dropped.

A few days after that, the bar was lowered again (apparently the curve on the test was still not right). The requirement to identify two cycles used by two of the people was dropped and the student could score a ‘2’ if they managed to name one cycle for two people mentioned in the article. The fate of those students unlucky enough to have been scored in the first six or seven days of grading seemed harsh, but when this was brought to the administrator’s attention we were told, “that’s none of your business. Your business is to sit on your butts and score tests.” I should note that no graders had been scoring while standing up, nor did we stand to demand explanations.

It wasn’t the students’ inability to meet that ever-lowering bar that intrigued me, however, as much as their odd misreading of simple information. For instance, most of the students had difficulty with the sex of the surfer. The surfer underwent a female to male sex-change in most student responses. I suppose a woman on a surfboard is so unheard of that the 10th graders of Florida may be forgiven for not comprehending this behavior. But if the student writes consistently about the surfer using the wrong pronoun does that signify poor comprehension or is the student instead evincing a clear understanding that this person, described as an ‘environmental engineer living in San Francisco’s Marina district’ was not a real being at all, but a sort of trust-funded ephemera generated by the gaseous fumes of the Marina’s notorious land-fill yuppie-trap? If the students named two cycles of the surfer, with adaptations, but got her sex wrong, did they still receive credit?

Yep. Name, sex, species of the farmer, surfer and composer were irrelevant, so long as the student mentioned a cycle that the article named. I suppose this was holistic.



The second question that the students answered (more or less) dealt with a short article about avian anatomy. The article was several pages long, but the students only had to copy a sentence or two from one short section of the piece. Word for word copying from the article was not plagiarism, we were told, but a correct answer. This section of the test was the most quantitative, and the easiest to score. Did they copy one of the correct sentences? Yes: Score ‘1’. Did they copy both sentences? Yes: Score ‘2’.

Minnesota talk radio was full of programs that month about standardized tests. A referendum was coming up on the issue. Texas and Florida are both big on these tests. In Texas, individual teachers’ pay raises are tied to how well their students perform on their tests. Not surprisingly, teachers in Texas are ‘teaching to the test,’ spending large amounts of classroom time training their students how to copy a sentence out of an article. 



The final question asked the students to summarize a short article from Dian Fossey’s book Gorillas in the Mist. The article described in warm, fuzzy tones how Fossey nursed two baby gorillas back to health after they had been captured and ill-treated by poachers. Fossey “mothers” them, giving the gorillas names: “Coco” and “Pucker Puss.” She describes their journey toward learning to trust her as they regain their health. When they’re strong enough, the poachers return and the two gorillas are crated up and sent off to a zoo in Germany.

I looked up the episode in her book, and in context, Fossey makes clear that her cooperation with the poachers was under duress. She does everything she can to keep the gorillas in Africa. In the excerpt the students read for the test, however, Fossey’s behavior seems more equivocal. The article rushes over the business of shipping the gorillas to Europe and instead plays up how much the baby gorillas taught Fossey. They were instrumental in her field research in gorilla vocalization.

The students were asked to describe how the two baby gorillas changed as a result of their contact with Fossey. The correct answer, according to Florida, is that they “learned to trust humans.” The article, however, clearly suggests that the trust is misplaced, or at least that it wasn’t enough to save the gorillas from their fate. In addition, the text asked 15-year-olds to recognize the valuable research the mother-figure obtained by selling her babies to the poachers.

Cognitive dissonance ensued. In test after test the students would recognize that the gorillas learned to ‘trust’ humans, but the implicit betrayal was misread and somehow the students (bless ‘em) would have Coco and Pucker Puss back in the jungle at the end of the story. “The gorillas learned to trust Dian and he (sic) set them free,” was a typical response. This would get a score of ‘1.’

Those clear-eyed students who wrote things such as, “it didn’t matter how they changed, they still ended up in a cage,” may have comprehended the story better, but they would have earned a score of 0. Most kids set Coco free, however, or they wrote a half-stab at an answer, “Coco and Pucker Puss change because...” and then I imagine their thoughts drifted off to, say, Judas laying a kiss on Christ and their little pencils froze in mid-thought until the buzzer rang.

Hollywood would have had the gorillas escape back to their own families. It’s the Spielberg-Schindler’s List effect: the camera focuses on the half dozen that survive; the thousand that are carted off exit the frame. They are without interest. Maybe those who are absent (off-camera)—those who don’t survive Occam’s Razor of evolution—change us as much as those who are still here.  The best response I read in the three weeks of reading was: “The gorillas learn to feel not fear, but sadness toward some humans.” The child who wrote this would earn a ‘0.’

We finished the 150,000 tests under budget and in record time, so for our reward, the company bought us all pizza on the last day before sending us home early. Released out into the early spring day, I took the bus home. The snow was finally melting from the streets. As the bus wound its way into Minneapolis, something went wrong with the reverse gear. There was some sort of malfunction and the bus’s backing-up beeper began to play and from the frustrated shouting of the bus driver it was not going to stop. The ear-splitting beeping accompanied us as we drove along the burnt-out streets of North Minneapolis—the continuous beep-beep-beep of the bus worked like tiny woodscrews in the brain.

At a neighborhood stop, across from the boarded-up Value Rite and the house with its satellite dish nailed to the doghouse roof, three young black kids get on. Fifteen, sixteen years old? They should be in school. Someone will demand they answer questions about Coco and Pucker Puss and then where will they be? The young men are somber, quiet as they roll down the aisle beep-beep-beep sitting down behind me in the sideways-facing seats at the back of the beep-beep-beep bus. The bus starts up again and I’m wearing my fingers in my ears, though the frequency is so high that stopping one’s ears doesn’t stop the noise. (We’re in the bus. It’s like being in the trumpet.) Then I hear this other noise from behind me. One of the young men has started singing a rhythm. He’s singing the sound of drums. He’s quite good - syncopating, anticipating the beep-beep-beep rhythm of the bus. He weaves an odd music out of that noise. I look out the window, listening.

###
 