I always get conflicted about reading an isolated study. I *know* I’m going to read it poorly. There will be lots of terms I don’t know; I won’t get the context of the results. I’m assured of misreading.

On the other side of the ledger, though, is curiosity, and the fun that comes from trying to puzzle these sorts of things out. (The other carrot is insight. You never know when insight will hit.)

So, when I saw Heidi talk about this piece on Twitter, I thought it would be fun to give it a closer read. It’s mathematically interesting, and much of it is obscure to me. Turns out that the piece is openly available, so you can play along at home. Let’s take a closer look.

**I.**

The stakes of this study are both high and crushingly low. Back in 2014 when this was published, the paper caught some press that picked up on its ‘Math Wars’ angle. For example, you have NPR’s summary of the research:

Math teachers will often try to get creative with their lesson plans if their students are struggling to grasp concepts. But in “Which Instructional Practices Most Help First-Grade Students With and Without Mathematics Difficulties?” the researchers found that plain, old-fashioned practice and drills — directed by the teacher — were far more effective than “creative” methods such as music, math toys and student-directed learning.

Pushes all your teachery buttons, right?

But if the stakes *seem* high, the paper is also easy to disbelieve, if you don’t like the results.

Evidence about teaching comes in a lot of different forms. Sometimes, it comes from an experiment; *y’all (randomly chosen people) try doing this, everyone else do that, and we see what happens*. Other times we skip the ‘random’ part and find reasonable groups to compare (a ‘quasi-experiment’). Still other times we don’t try for statistically valid comparisons between groups, and instead a team of researchers will look very, very closely at teaching in a methodologically rich and cautious way.

And sometimes we take a big pile of data and poke at it with a stick. That’s what the authors of this study set out to do.

I don’t mean to be dismissive of the paper. I’m writing about it because I think it’s worth writing about. But I also know that lots of us in education use research as a bludgeon. This leads to educators reading research with two questions in mind: *(a) Can I bludgeon someone with this research? (b) How can I avoid getting bludgeoned by this research?*

That’s why I’m taking pains to lower the stakes. This paper isn’t a crisis or a boon for anyone. It’s just the story of how a bunch of people analyzed a bunch of interesting data.

Freed of the responsibility of figuring out if this study threatens us or not, let’s muck around and see what we find.

**II.**

The researchers lead off with a nifty bit of statistical work called *factor analysis*. It’s an analytical move that, the more I read about it, I find both supremely cool and metaphysically questionable.

You might have heard of socioeconomic status. Socioeconomic status is supposed to explain a lot about the world we live in. But what *is* socioeconomic status?

You can’t directly measure someone’s socioeconomic status. It’s a *latent variable*, one responsible for a myriad of other observable variables, such as parental income, occupational prestige, the number of books lying around your parents’ house, and so on.

None of these observables, on their own, can explain much of the variance in student academic performance. If your parents have a lot of books at home, that’s just it: your parents have a lot of books. That doesn’t make you a measurably better student.

Here’s the way factor analysis works, in short. You get a long list of responses to a number of questions, or a long list of measurements. I don’t know, maybe there are 100 variables you’re looking at. And you wonder (or program a computer to wonder) whether these can be explained by some smaller set of latent variables. You see if some of your 100 variables tend to vary as a group, e.g. when income goes up by a bit, does educational attainment tend to rise too? You do this for all your variables, and hopefully you’re able to identify just a few latent variables that stand behind your big list. This makes the rest of your analysis a lot easier; much better to compare 3 variables than 100.
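To make the ‘vary as a group’ idea concrete, here’s a minimal sketch on made-up data. To be clear about what it is and isn’t: it is not a full factor analysis and it is not what the authors ran. It only simulates six hypothetical survey items (`x1` through `x6`), where I’ve assumed the first three are driven by one hidden variable and the last three by another, and then prints the correlation matrix. The item names, the sample size, and the noise level are all my own assumptions.

```python
import random

# Hypothetical setup: items x1-x3 are driven by latent variable "A",
# items x4-x6 by latent variable "B". None of this comes from the paper.
ITEMS = {"x1": "A", "x2": "A", "x3": "A", "x4": "B", "x5": "B", "x6": "B"}


def simulate(n_people=2000, seed=0):
    """Return {item name: list of simulated responses}."""
    rng = random.Random(seed)
    columns = {name: [] for name in ITEMS}
    for _ in range(n_people):
        # Each simulated person gets one hidden value per latent variable.
        latent = {"A": rng.gauss(0, 1), "B": rng.gauss(0, 1)}
        for name, source in ITEMS.items():
            # Observed response = hidden value + item-specific noise.
            columns[name].append(latent[source] + rng.gauss(0, 1))
    return columns


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def correlation_matrix(columns):
    """Return {(item_a, item_b): correlation} for every pair of items."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


columns = simulate()
corr = correlation_matrix(columns)
for a in ITEMS:
    # One row of the matrix per item, in the same column order.
    print(a, " ".join(f"{corr[(a, b)]:5.2f}" for b in ITEMS))
```

With equal amounts of signal and noise, items that share a hidden variable should correlate at roughly 0.5 with each other and near 0 with the other group. A real factor analysis works backwards from a matrix like this one: it only sees the observed columns and has to infer how many hidden variables there might be.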

That’s what we do for socioeconomic status. That’s also what the authors of this paper do for the instructional techniques teachers use with First Graders.

I’m new to all this, so please let me know if I’m messing any of this up, but it sure seems to me tough to figure out what exactly these latent variables are. One possibility is that all the little things that vary together — the parental income, the educational attainment, etc. — all contribute to academic outcomes, but just a little bit. Any one of them would be statistically irrelevant, but together, they have *oomph*.

This would be fine, I guess, but then why bother grouping them into some other latent variable? Wouldn’t we be better off saying that a bunch of little things can add up to something significant?

The other possibility is that socioeconomic status is some real, other thing, and all those other measurable variables are just pointing to this big, actual cause of academic success. What this ‘other thing’ actually is, though, remains up in the air.

(In searching for other people who worried about this, I came across a piece from *History and Philosophy of Psychology Bulletin* called ‘Four Queries About Factor Reality.’ Leading line: ‘When I first learned about factor analysis, there were four methodological questions that troubled me. They still do.’)

So, that’s the first piece of statistical wizardry in this paper. Keep reading: there’s more!

**III.**

Back to First Graders. The authors of this paper didn’t collect this data; the Department of Education, through the National Center for Education Statistics, ran the survey.

The NCES study was immense. It’s longitudinal, so we’re following the same group of students over many years. I don’t really know the details, but they’re aiming for a nationally representative sample of participants in the study. We’re talking over ten thousand students; their parents; thousands of teachers; they measured kids’ *height*, for crying out loud. It’s an awe-inspiring dataset, or at least it seems that way to me.

As part of the survey, they ask First Grade teachers to answer questions about their math teaching. First, 19 instructional activities…

…and then, 29 mathematical skills.

Now, we can start seeing the outlines of a research plan. Teachers tell you how they teach; we have info about how well these kids performed in math in Kindergarten and in First Grade; let’s find out how the teaching impacts the learning.

Sounds good, except HOLY COW look at all these variables. 19 instructional techniques and 29 skills. That’s a lot of items.

I think you know what’s coming next…

FACTOR ANALYSIS, BABY!

So we do this factor analysis (beep bop boop boop) and it turns out that, yes, indeed some of the variables vary together, suggesting that there are some latent, unmeasured factors that we can study instead of all 48 of these items.

Some good news: the instructional techniques only got grouped with other instructional techniques, and skills got grouped with skills. (It would be a bit weird if teachers who teach math through music focused more on place value, or something.)

I’m more interested in the instructional factors, so I’ll focus on the way these 19 instructional techniques got analytically grouped:

The factor loadings, as far as I understand, can be interpreted as correlation coefficients, i.e. higher means a tighter fit with the latent variable. (I don’t yet understand Cronbach’s Alpha or what it signifies. For me, that’ll have to wait.)

Some of these loadings seem pretty impressive. If a teacher says they frequently give worksheets, yeah, it sure seems like they also talk about frequently running routine drills. Ditto with ‘movement to learn math’ and ‘music to learn math.’

But here’s something I find interesting about all this. The factor analysis tells you what responses to this survey tended to vary together, and it helps you identify four groups of covarying instructional techniques. But — and this is the part I find so important — the RESEARCHERS DECIDE WHAT TO CALL THEM.

The first group of instructional techniques all focus on practicing solving problems: students practice on worksheets, or from textbooks, or drill, or do math on a chalkboard. The researchers name this latent variable ‘teacher-directed instruction.’

The second group of covarying techniques are: mixed ability group work, work on a problem with several solutions, solving a real life math problem, explaining stuff, and running peer tutoring activities. The researchers name this latent variable ‘student-centered instruction.’

I want to ask the same questions that I asked about socioeconomic status above. What *is* student-centered instruction? Is it just a little bit of group work, a little bit of real life math and peer tutoring, all mushed up and bundled together for convenience’s sake? Or is it some other thing, some *style* of instruction that these measurable variables are pointing us towards?

The researchers take pains to argue that it’s the latter. Student-centered activities, they say, *‘provide students with opportunities to be actively involved in the process of generating mathematical knowledge.’* That’s what they’re identifying with all these measurable things.

I’m unconvinced, though. We’re supposed to believe that these six techniques, though they vary together, are really a coherent style of teaching, in disguise. But there seems to me a gap between the techniques that teachers reported on and the style of teaching they describe as ‘student-centered.’ How do we know that these markers are indicators of that style?

Which leads me to think that they’re just six techniques that teachers often happen to use together. They go together, but I’m not sure the techniques stand for much more than what they are.

Eventually — I promise, we’re getting there — the researchers are going to find that teachers who emphasize the first set of activities help their weakest students more than teachers emphasizing the second set. And, eventually, NPR is going to pick up this study and run with it.

If the researchers decide to call the first group ‘individual math practice’ and the second ‘group work and problem solving’ then the headline news is “WEAKEST STUDENTS BENEFIT FROM INDIVIDUAL PRACTICE.” Instead, the researchers went for ‘teacher-directed’ and ‘student-centered’ and the headlines were “TEACHERS CODDLING CHILDREN; RUINING FUTURE.”

I’m not saying it’s the wrong choice. I’m saying it’s a choice.

**IV.**

Let’s skip to the end. Teacher-directed activities helped the weakest math students (MD = math difficulties) more than student-centered activities.

The researchers note that the effect sizes are small. Actually, they seem a bit embarrassed by this and argue that their results are conservative, and the real gains of teacher-directed instruction might be higher. Whatever. (Freddie deBoer reminds us that effect sizes in education tend to be modest, anyway. We can do less than we think we can.)

Also ineffective for learning to solve math problems: movement and music, calculating the answers instead of figuring them out, and ‘manipulatives.’ (The researchers call all of these ‘student-centered.’)

There’s one bit of cheating in the discussion, I think. The researchers found another interesting thing from the teacher survey data. When a teacher has a lot of students with math difficulty in a class, they are more likely to do activities involving calculators and movement/music than they otherwise might be:

You might recall that these activities aren’t particularly effective math practice, and so they don’t lead to kids getting much better at solving problems.

By the time you get to the discussion of the results, though, here’s what they’re calling this: “the **increasing reliance on non-teacher-directed instruction** by first grade teachers when their classes include higher percentages of students with MD.”

Naming, man.

This got picked up by headlines, but I think the thing to check out is that the ‘student-directed’ category did not correlate with percentage of struggling math students in a class. That doesn’t sound to me like non-teacher-directed techniques get relied on when teachers have more weak math students in their classes.

The headline news for this study was “TEACHERS RELY ON INEFFECTIVE METHODS WHEN THE GOING GETS ROUGH.” But the headline probably should have been “KIDS DON’T LEARN TO ADD FROM USING CALCULATORS OR SINGING.”

**V.**

Otherwise, though, I believe the results of this study pretty unambiguously.

Some people on Twitter worried about using a test with young children, but that doesn’t bother me so much. There are a lot of things that a well-designed test can’t measure that I care about, but it certainly measures some of the things I care about.

Big studies like this are not going to be subtle. You’re not going to get a picture into the most effective classrooms for struggling students. You’re not going to get details about what, precisely, it is that is ineffective about ineffective teaching. We’re not going to get nuance.

Then again, it’s not like education is a particularly nuanced place. There are plenty of people out there who take the stage to provide ridiculously simple slogans, and I think it’s helpful to take the slogans at their word.

Meaning: to the extent that your slogan is ‘fewer worksheets, more group work!’, that slogan is not supported by this evidence. Ditto with ‘less drill, more real life math!’

(I don’t have links to people providing these slogans, but that’s partly because scrolling through conference hashtags gives me indigestion.)

And, look, is it really so shocking that students with math difficulties benefit from classes that include proportionally more individual math practice?

No, or at least based on my experience it shouldn’t be. But the thing that the headlines get wrong is that this sort of teaching is anything simple. It’s hard to find the right sort of practice for students. It’s also hard to find classroom structures that give strong and struggling students valuable practice to work on at the same time. It’s hard to vary practice formats, hard to keep it interesting. Hard to make sure kids are making progress during practice. All of this is craft.

My takeaway from this study is that struggling students need more time to practice their skills. If you had to blindly choose a classroom that emphasized practice or real-life math for such a student, you might want to choose practice.

But I know from classroom teaching that there’s nothing simple about helping kids practice. It takes creativity, listening, and a lot of careful planning. Once we get past some of the idealistic sloganeering, I’m pretty sure most of us know this. So let’s talk about *that*: the ways we help kids practice their skills in ways that keep everybody in the room thinking, engaged, and that don’t make children feel stupid or that math hates them.

But as long as we trash-talk teacher-directed work and practice, I think we’ll need pieces like this as a correction.

Why with all the “precision” of the scores on the mathematical variables do these people reduce the MD scores to a scale of 1 to 5 with fairly arbitrary cutoffs?

Also, factor analysis is a technique for grouping variables, and only a guide to meaning of the factors. The “concept” of “latent variables” is no more than an association, not a cause.


I don’t feel so troubled by the arbitrary cutoffs for MD because their results were fairly consistent across the lower range of math achievement, in all three of the MD groups. If they had been, like, ‘kids need teacher-directed instruction only when they’re sort of MD but not too MD,’ I’d be suspicious. But their results seem pretty steady across the bottom three achievement groups.

This is certainly one possibility but, again, then why bother identifying a latent factor at all? Just say that you have a bunch of covarying factors that each contribute to the effect?

I’ll quote again from that awesome ‘Four Queries’ piece I mentioned in the post:

It seems that a mathematical factor can correspond to a causally efficacious composite whose elements are qualitatively unlike. Is this objectionable for some reason?

Seems to me like a good question. But, Howard, I gather you think it’s settled and simple?


Simple, yes. Settled, not really.

The word “can” in your last but one para is a giveaway.

It’s a long time since I delved into Statistics, but I have still a weakness for multiple regression methods.


There are critical factors within these factors, perhaps the most important being “that there’s nothing simple about helping kids practice.” The other stuff isn’t simple, either.

Then there’s affect.

If a student with difficulties is using all the cognitive juice on emotional survival issues… practice is likely to be much more effective than stuff with social interaction. On the other hand, a teacher could ever so carefully and constructively build an environment where the best things happened with “student led” activities.


Thoughts in reverse order:

1) I really like the sentiment of your second to last paragraph.

2) I’m not sure I unambiguously believe the results of this study. FULL caveat, I’m basing this on the intellectual work you just did. I’m being lazy here and piggy-backing.

3) I will set aside all of the statistical analysis. I just do not know enough to have an opinion.

3.5) I will also set aside things I thought about while reading but that would make for a totally different and more abstract conversation.

4) My mind keeps going back to the 19 instructional activities and the 29 mathematical skills. The instructional activities tell me little about what kind of teaching they represent. The mathematical skills don’t tell me anything about teaching.

If I may borrow your phrasing, “there’s nothing simple about using instructional activities,” which means there is a lot of variability in what they can look like from classroom to classroom.

I’ve read three books that talk about California’s math reform efforts in the 80’s and they’re all coming to mind here. One story from California in Building A Better Teacher exemplifies how misleading trying to extrapolate a pedagogy from an instructional activity can be. I think this link should take you to the relevant pages:

https://books.google.com/books?id=Ua5bAwAAQBAJ&pg=PT69&dq=%22over+the+next+several+years,+the+group+observed+classrooms+across+the+state%22&hl=en&sa=X&ved=0ahUKEwiWwfj6hvfTAhUr9YMKHVDQA1cQ6AEIJzAA#v=onepage&q=%22over%20the%20next%20several%20years%2C%20the%20group%20observed%20classrooms%20across%20the%20state%22&f=false

Since so much of the reasoning and methodology in this study that gets us to the charged headline is downstream of this questionnaire about instructional activities, which I am skeptical of, how can I not be skeptical of any conclusions without drawing on preconceived notions I may already have?


OK, so that book is citing a paper that Charlotte Sharpe (dropping knowledge, below) recommended to me last time I wrote one of these long posts: A Revolution in One Classroom: The Case of Mrs. Oublier

I haven’t read it yet, but it’s looking like I should. Maybe that’ll be the next paper we read together in this space.

Any other recommendations for books or research you think we should blog-review next?


Wouldn’t you know it, the author of Mrs. O also wrote one of the other books I alluded to above, “Learning Policy”, which goes into more detail about the California reform. If you blog-review Mrs. Oublier, I’m all aboard.

http://www.journals.uchicago.edu/doi/abs/10.1086/377676?journalCode=aje

As for other recommendations, where to begin? I guess it’s time for me to dust off the folders of PDFs I have saved that haven’t gotten any attention. Your interest in gaining a broader perspective on a particular topic is spot on so in keeping with that, maybe pick a thread that emerged from this post and dig in?

I also want to thank you for your comments about research and bludgeoning. I think that, perhaps, we educators feel under attack and so the instinct is to grab a weapon and bludgeon or grab a shield and defend. Just reading your paragraph about that allowed me to be aware of it, let down my guard, and read this to learn, instead of to weaponize. I would like to keep choosing “learn” instead of “fight.”


Great post! I enjoyed following your thinking here.

So, some thoughts:

I love factor analysis. LOVE. I love the suspicion and faith-like feelings it evokes in me, and I love the idea that factor analysis represents a people’s desire to know what might be unknowable. It feels and sounds so faith-like.

Also – factor loadings tell you the strength of one item to the GROUP. the alpha score tells you the strength of the group’s relationship itself. So, the manipulatives/calculator group is less cohesive than the others – while each of those is most likely going to vary more with one another than with members of the other groups in predictable ways, there’s also more likely to be error in how well that group ‘hangs together’ than there will be in the “student-directed” group, where not only do most of the items tend to individually follow a predictable response pattern with the group, but the group itself tends to hang together as a whole better. As you can imagine, the fewer items in a group, and the lower the factor loadings of each item, the lower the alpha score. That’s why the calculator group alpha is so low – only 3 items, one of which is not particularly predictable based on the others. (this isn’t technically correct, but you can sort of imagine that means that if the teacher reports higher on counting manipulatives AND geometric manipulatives, you can predict with slightly better than random chance accuracy that they’ll also use calculators. Because it’s less predictable, you can use any grouping of items in that group to accurately predict the others a little less than half of the time. Again, not what those numbers technically mean, but it’s a metaphorically useful interpretation).
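A sketch of the arithmetic behind that intuition, assuming the usual textbook formulas for Cronbach’s alpha: the raw version built from item variances, and the ‘standardized’ version that depends only on the number of items and their average correlation with each other. The numbers below (3 versus 6 items, an average correlation of 0.3) are made up for illustration and are not taken from the paper.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of responses
    (one response per respondent, same respondent order in every item)."""
    k = len(items)
    # Total score per respondent: add that respondent's answers across items.
    totals = [sum(responses) for responses in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def standardized_alpha(k, mean_r):
    """Alpha implied by k items whose average inter-item correlation is mean_r."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)


# Same average correlation, different number of items (made-up numbers).
print(round(standardized_alpha(3, 0.3), 2))  # 0.56
print(round(standardized_alpha(6, 0.3), 2))  # 0.72
```

So even if the items in a small group relate to each other exactly as strongly as the items in a bigger group, the small group ends up with the lower alpha. Weaker relationships among the items push it down further.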

Anyway, I love factor analysis as an idea, as an art, as a statistical process, and as a faith system. I like the “parts of the elephant” metaphor, or perhaps an image of 10,000 compasses (or wind socks) laid out across the state of New York to determine the shape of the magnetic field (or wind patterns) in a way that no single compass (or wind sock) could.

OK, so other things I like to consider…

- the more measurable a teaching practice is through self-reporting, the less we actually stand to learn about the teacher’s intention, goal, or skill. that is, concrete, highly specified, well-understood and widely used practices are way easier to measure on self-report instruments than things like “explain how a math problem is solved.” As you’ve been pointing to recently, explanations are NOT SIMPLE, they are heavily contextualized, dependent on students’ current understanding and the goal (and the size of the gap between them), the teacher’s knowledge/experience with using representations/explanations well….yada yada. So, it’s no wonder that the factor loading for ‘explanations’ (notice, it went more into the student directed!) is pretty poor: this tells ME that self-report data are not that useful for that aspect of teaching practice. Because one person’s explanation is another person’s step-by-step modeling. Or it could look like a number talk. And all of those people might call it the same thing.

More generally, the things we stand to learn about what teaching is ‘effective’ from self-report instruments are often not that interesting. Because either they ARE Really interesting and important aspects of teaching and NOT LIKELY WELL-MEASURED ON SURVEYS, or they’re not that interesting or value-laden (and thus not that helpful for thinking about how to support teachers), like “drill,” “worksheets,” or “using manipulatives” (but not with any particular goals).

One other thing I find interesting about this study (that apparently the authors did not) is the measurement piece. The variable that had the strongest and biggest relationships (size and strength were both larger) is the measurement stuff (e.g., how often the teacher reported allowing students to use tools for measurement, using them accurately, and recognizing fractions):

-the measurement factor loadings and alpha score are stronger than the teacher-directed instructional practices factor loadings. So the items in that group are more predictably related to the group and to each other than some of the others, even though there’s fewer items.

-a teacher’s likelihood to report greater attention to measurement was NOT related to the percent of MDs (which I actually like)

-the relationship between focusing more on measurement and student gains is bigger, and stronger!

-higher frequency of teaching measurement was more related to student-directed instruction and manipulatives than teacher-directed instruction

so, the headline could have read, “measurement tasks and tools, which are more likely to be used in student-directed ways, support students with difficulties in mathematics!”

Of course, it could be that the achievement test items are weighted so that even if there are fewer items for a domain (maybe only 3 items about measurement) they are weighted equally against another domain (against ‘counting’ which maybe had 10 items).

I’m curious about the authors’ choice not to highlight this finding, and I think it goes back to the spin/angle bit you pointed at early in your blog post. Math wars stuff gets press.


Ahhh thank you so much for that whole paragraph. I’m reading it and rereading it. Much that I don’t yet understand.

Yes! This is so interesting to me too.

The picture we get is that teachers don’t spend more time on measurement or whatever with a higher percentage of MD students in their classes. What we DO see is that teachers are spending increasing amounts of time on the ‘basics’ of Ordering and Number/Quantity. These skill groups seem to hang together pretty well with alphas of 0.80 and 0.73, and include things like ordering objects, writing numbers one to ten, adding single-digit numbers.

Now, that’s fine. But check out Table 5! Spending more time on Ordering has a -.05 effect on the performance of two out of the three MD groups! That’s nuts. That’s the exact same effect size as the positive outcomes of teacher-directed activities that the authors make so much about.

So here’s ACTUALLY the most exciting headline, to me, from this paper: we’re spending too much time on Ordering skills, especially with MD students.

As you note, measurement and fractions activities help a ton. Why? We can speculate. When young students are working with measurement and fractions they’re giving their number skills a great workout, but in a different context than their naked-numerical practice.

I also notice that Reading 2-, 3-digit numbers has a nice effect on MD students. So the headline we’re getting towards is to give First Grade MD students varied opportunities to work with numbers at a sophisticated level. Don’t fall into the trap of spending too much of the year working on the ‘basics’ just because they don’t have the basics yet. In First Grade, at least, these other opportunities to work with numbers help.

(It’s unclear to me if this is a generally useful principle. First Graders will find good, helpful number experiences in a lot of places that wouldn’t really give middle school students a workout. Then again, maybe well-chosen measurement activities would help middle school students with fractions in a similar way.)


Hi Michael! This was a super fun read. Thanks for sharing how you approach a research article. The paper sounded familiar to me and, yup, it turns out to be the exact same one I went a few rounds on twitter over when it first came out. At the time, Jessica Lahey posted a tweet about it, which led to a convo, which led to her publishing an Atlantic piece with her husband, Tim. This one: https://www.theatlantic.com/education/archive/2014/07/how-to-read-education-data-without-jumping-to-conclusions/374045/

In that piece, she plucked out a little quote, as journalists have to do. But I thought it might be interesting to read what I said at the time. I dug out my email. It’s long, but you’re into #longreads these days. 🙂 Here’s how I reacted at the time (which I generally stand by, but which is different from what I’d write now):

Hi Jess,

OK, I’ve pulled out the red pen and read the article in more detail. I know you are looking for quotes, so I’m not going to tie my thoughts together into an essay, especially given the hour. But let me know if you need me to structure more of an argument. Happy to clarify. I’m going to break my thoughts down into two groups: thoughts about this particular study, and thoughts about educational research articles and the media. Caveat. I’m not an educational statistician. I was dually endorsed in K-8 education and special education and I’m comfortable with research. I also have great colleagues. So here’s my take:

META JOURNALISM/RESEARCH STUFF:

One pattern I’ve noticed is that, when journalists report on a new research finding in medicine or science, they are careful to consult experts in the field who were not involved in the study, and ask them to weigh in on the findings. Is this an important study? Was it well designed? How should we interpret or generalize these results? If there’s a new neurology paper out of California, journalists consult unaffiliated neurologists in Boston. I don’t see the same pattern in research about education. Instead, journalists usually report the findings from the study, unquestioned, perhaps with additional quotes from the authors of the study. The article you linked earlier is a perfect example of this phenomenon, and it’s a problem for several reasons:

1) It’s not fair to expect that journalists be experts in mathematics education, or science education, or literacy education. They’re not. They should be able to read research, but they should also consult sources with field-specific expertise for deeper understanding of the fields.

2) Education research deserves respect. Just this April, I attended the National Council of Teachers of Mathematics conference in New Orleans, and was able to attend a session in the Research Conference that precedes it. I sat in a room of thoughtful, highly educated, intelligent people who have made it their life’s work to figure out which teaching and learning practices are most effective.

3) Journalists could gain perspective about how the findings should be interpreted by consulting experts. In particular, experts could help journalists understand that each paper is part of a longer conversation between researchers. If a paper has interesting results, other researchers need time to study the results, and then either try to refute, replicate, clarify, or add to them. Over-emphasizing any one part of the conversation is like overhearing one sentence in a passing couple’s argument and thinking we’ve got it figured out.

4) Public education has always been politicized, but we’ve recently jumped the shark. (Just ask teachers in Louisiana who were getting on board with Common Core because their governor wanted them to, until the same governor changed his mind.) Catchy articles about education circulate widely, for understandable reasons, but I wish education reporters would resist the impulse to over-generalize or sensationalize research findings. In this case, we have a study of 1st graders who scored in the bottom 15% on a standardized test. As tempting as it is for people to write, post, or tweet about how we “should teach math” across the board, it’s important to look at what this study is and is not.

THIS STUDY:

Whenever I read any educational research that makes a claim about students’ “achievement,” my first question is HOW DID THE RESEARCHERS DEFINE AND MEASURE ACHIEVEMENT? This is a question I don’t see journalists ask. In this case, the researchers used the ECLS-K standardized assessment, administered in 1998-1999. I have tried to access a copy of this assessment, but it is protected. (Extremely frustrating; I wish they’d at least release some sample items! http://nces.ed.gov/ecls/pdf/Copyrighted_Measures.pdf) Based on what I have been able to gather, it’s a fairly traditional assessment, heavy on procedure.

So, I am not at all surprised that students taught mathematics in a teacher-directed, procedural way would do better on a standardized assessment that measures procedural competency. If the instruction matches the test, the scores will be higher.

The question is, is procedural competency the same thing as mathematics “achievement?” Some would argue yes. Many of us in mathematics education would argue that students need to compute fluently, but there is much more to mathematics than that. Even within the so-called basics, what use is it for a student to compute a worksheet of subtraction problems perfectly if that student doesn’t understand why we subtract, and doesn’t recognize subtraction in context? Meaningful math education involves fluency and understanding.

Perhaps an analogy would help. If students practice reading and writing musical scales, they’ll do well on an assessment of reading and writing musical scales. But what do those students know about music?

Assessing students’ mathematical thinking and proficiency is complex. No standardized test or product—especially one administered to kindergarteners—will capture it.

In this particular study, the researchers are adding to the body of research about an important question: what instructional methods work best for students who are having difficulty in mathematics? These researchers focused specifically on instructional practices in first grade, and some of the findings are interesting. As is typical of peer-reviewed research, the authors of the study are careful to point out the strengths and limitations of their study, present both their positive and negative findings, and use cautious language. It might be useful to compare this language to what circulated on the internet today. For example:

“We found no significant relation between the percentage of MD students in the classroom and the frequency of teacher-directed or student-centered instructional activities. However, we did find that the percentage of students with MD in the class was positively and significantly associated with manipulatives/calculators and movement/music activities, as well as teaching the ordering and number/quantity sets of skills.”

“The p value for this F test was .08, offering modest support to the existence of differential effects across groups.”

“For the frequency with which the eight groups of skills are taught, the associations were much less clear. There was little pattern…”

“Thus, there was little evidence of a relationship between these variables and children’s mathematical achievement.”

“This study is limited by its reliance on first-grade teacher self-reports of the frequency of their instructional practices…”

“We were unable to measure the relative quality with which these practices were implemented.”

“Despite these limitations, the study’s estimates should be considered fairly robust.”

“The magnitude of our study’s reported instructional practices ESs are small, but consistent with magnitude of reading or mathematical instructional ESs reported in other published, high-quality studies.”

Most significant to me:

“The ECLS-K measures of instructional practice may not fully capture what is generally considered to constitute teacher-directed or student-centered instruction. For example, quick pacing and frequent corrective feedback are considered optimal features of teacher-directed instruction (e.g., Stein et al., 2004), whereas problem solving, reasoning, and other cognitive processes are strongly emphasized in student-centered approaches (NCTM, 2000). Yet these aspects either are not measured in the ECLS-K surveys or are only measured using superficial frequency ratings.” (16)

This quote is extremely important. The thrust of this paper is looking at the difference in effectiveness between so-called “teacher-directed” and “student-centered” teaching practices. (I would argue this isn’t the most helpful lens, but that’s another issue.) By the authors’ admission right in the paper, the metric they used doesn’t particularly align with either of these groups of teaching practices. To me, this quote calls into question the usefulness of the study.

In addition, there is a lot in the methodology that gives me pause:

-The data are 15 years old.

-They’re relying on teachers self-reporting.

-They never set foot in any of these classrooms.

-There is no control for quality.

-The categories and grouping of the categories are not without controversy. For example, they call music/movement a “student-centered” strategy. But I have seen teachers use so-called music in math class to teach students mnemonic tricks to memorize algorithms. The researchers call textbooks a “teacher-directed” practice, but some texts are rote and others are conceptual. These researchers were looking at a massive, aggregated data set to try to figure out what works. In doing so, they grouped practices into collections that may or may not actually hold together.

-It takes a careful read to understand how they classified classrooms. Students were not put into either a student-centered or teacher-directed classroom, in some kind of controlled experiment. Rather, students received instruction in a variety of ways, and the researchers looked only at the relative frequency of those techniques. They’re arguing that the classrooms with more time spent on teacher-directed lessons benefited students who struggle, but the variables are not controlled. Those students were also receiving the other types of instruction. The interactions between the practices are not explored.

-The findings are very small. For example, the researchers said they found that teachers with more students with mathematical difficulties were using more music/movement in their classrooms, with no positive effects. When I looked at the data, the range was from 1.4 instances of music in math per 20 school days to 2.0 instances of music in math per 20 school days. This is not a big range. While that difference might have been “statistically significant,” it’s not necessarily clinically important. This is one of the tricky issues with statistics jargon: significant is not the same as important.

-I was struck by some of their arguments. This particular quote jumped out at me:

“Teacher-directed practices should help students increase their procedural fluency in applying explicitly taught and repeatedly practiced sets of procedures to solve mathematics problems, which should result in more effective use of higher order thinking and problem-solving skills (Stein, Silbert, & Carnine, 2004).”

There are two very important “shoulds” in that quote. The first one is a little easier for me to swallow, I suppose. The second one really makes me balk. They are offering zero evidence that procedural fluency from repeated practice sets results in higher order thinking or problem-solving. If it should, as the reference apparently argued in 2004, then where’s the evidence in the intervening 10 years? This current study can make no claims about higher order thinking or problem-solving skills.

-Finally, I was struck by the reference choices. It’s striking that they omitted the longitudinal results from Jo Boaler, for example…. [Michael, I’m omitting this excerpt. Sparing you having to read about Railside and Phoenix Park again.]

To me, that’s a large set of longitudinal research that should have gone in this paper. She has done other work with students who were failing math class when taught traditionally, and then excelled when taught with a problem-solving approach. So, I expected her name in the reference list.

I also expected Behrend’s name. She has written some great papers about students with special needs and effective math instruction. For example, I’m a fan of this one: http://classes.uleth.ca/200903/educ3700n/ED3700_Fall_2009/Welcome_files/Mathematical%20Rules%20and%20Understanding.pdf

And then there’s this book about assessing math proficiency: http://www.amazon.com/dp/0521697662/ref=pe_385040_30332200_TE_item

Now, again, to clarify, I am NOT a researcher and I am NOT a statistician. I am a teacher looking to apply the lessons learned from research into the classroom. To me, the question of how best to teach students who are not succeeding in math is an essential one that remains open. I didn’t think this study made a compelling case that so-called teacher-directed instruction is the way to go.

I’m honestly exhausted after dealing with in-laws all day, and it’s very late. If you need something different, just let me know. I’m not sure how coherent this was. Hope it was helpful. I’m glad you want to take this on.

Tracy

–Michael, more in the next comment…

OK, so here are some things that interest me. We got interested in totally different things! (Not surprising, really, given our history, but still striking!) My eyes kind of glazed at factor analysis when I read it back then. But look how much fun you had with that! I was more focused on measurement, which you weren’t so keen on. And then study design. Study design.

This is why I was glad to see you reading Mrs. Oublier, one of my all-time favorite papers. That paper, for me, wrecked any kind of paper like this study. The whole thing was based on self-reporting. But look at Mrs. O! Mrs. O shows us just what a minefield self-reporting is. She had moved her kids out of rows and into groups, and said her kids were engaging in “cooperative learning.” Cohen said, “No student ever spoke to another about mathematical ideas as a part of the public discourse.” Her use of manipulatives is supposed to be a crucial feature of her reform, but Cohen chronicles how she has all the students moving the beans at her directions and claps, whether they understand or not. She’d report, though, that she has students working in groups and using manipulatives. If she were one of the teachers in this study, she’d sound progressive, reform, whatever you want to call it. But through Cohen’s lens, she’s not.

I’m at a funny place in my career. The longer I’m in education, the more tension I feel about this issue of scale. Scaling things up. Aggregating data. Making big changes. I’m much more convinced by this study of one teacher, Mrs. O., than I am by meta-data analyses because as soon as you scratch the surface in these sweeping “how do kids learn?” studies, you run into huge issues of measurement and quality control and naming (as you pointed out) and differences of interpretation and bad statistics. Ultimately, I care more about Mrs. O, and thinking about Mrs. O and her journey and how to support her better. The cognitive scientists will pshaw on the Mrs. O paper. Call it anecdotal and subjective. And it is. But it’s also more useful to me than sweeping generalizations that lack nuance and deep understanding and context.

This is why I told Dylan Wiliam that he’d hate my book. And this is why I haven’t read all the cognitive load theory stuff. And I’m probably being stupid and should make myself read it so I can at least follow these heated twitter spats. But I also know I’ll be asking the same core questions over and over again:

What do you mean by “student-centered?” What do you mean by “achievement?” What do you mean by “success?” What do you mean by “mathematics?” What do you mean by “effective?”

And then I sound pedantic and wordsmithy, but really, if we can’t agree on what effective is, what the hell are we measuring?

BTW, I just love your writing style here:

“And sometimes we take a big pile of data and poke at it with a stick. That’s what the authors of this study set out to do.

“I don’t mean to be dismissive of the paper. I’m writing about it because I think it’s worth writing about. But I also know that lots of us in education use research as a bludgeon. This leads to educators reading research with two questions in mind: (a) Can I bludgeon someone with this research? (b) How can I avoid getting bludgeoned by this research?

“That’s why I’m taking pains to lower the stakes. This paper isn’t a crisis or a boon for anyone. It’s just the story of how a bunch of people analyzed a bunch of interesting data.

“Freed of the responsibility of figuring out if this study threatens us or not, let’s muck around and see what we find.”

One of the best parts of the Mrs. O article for me — not to preempt what I expect will be an excellent and thought-provoking review — is that David Cohen writes about this teacher with such kindness and understanding. Mrs. O thinks she’s doing everything right, but he seems more inclined to see what she is doing — viewed from a narrow right/wrong perspective — as, well, wrong. And yet, he still urges us to see her the way she sees herself. “Mrs. O described the changes she has made as a revolution,” he says, then adds, “I do not think that she was deluded.” He then takes his analysis in a direction that is so obvious that it seldom gets called what it is: revolutionary. That is, he goes on to put Mrs. O’s beliefs — and the evidence that would seem to contradict them — at the feet of a policy that lacks depth and reasonableness.

“If such learning is difficult for students,” he asks, “should it be any less so for teachers?” Learning a new way of doing math demands a new way of thinking about math, he says. This is more than learning. This is UN-learning, and un-learning involves making mistakes. And mistakes, he observes, “are a particular problem for teachers.” To say it plainly, I don’t know any other well-respected academic or researcher writing today who acknowledges such a plain fact. If more people did, I think we could more easily expect substantive — even revolutionary — change. Revolutions, it turns out, happen not with thunderous proclamations but in quiet moments of self-assuredness and in the public affirmation of obvious truths.

Doesn’t he also sort of mock her pretty often? (Or what else is that recurring ‘infielder’ line doing?) And when describing her teaching, Cohen’s tone is fairly apoplectic, on my read.
