I always get conflicted about reading an isolated study. I *know* I’m going to read it poorly. There will be lots of terms I don’t know; I won’t get the context of the results. I’m sure to misread something.

On the other side of the ledger, though, is curiosity, and the fun that comes from trying to puzzle these sorts of things out. (The other carrot is insight. You never know when insight will hit.)

So, when I saw Heidi talk about this piece on Twitter, I thought it would be fun to give it a closer read. It’s mathematically interesting, and much of it is obscure to me. It turns out that the piece is openly available, so you can play along at home. Let’s take a closer look.

**I.**

The stakes of this study are both high and crushingly low. Back in 2014, when this was published, the paper caught some press that picked up on its ‘Math Wars’ angle. For example, here is NPR’s summary of the research:

Math teachers will often try to get creative with their lesson plans if their students are struggling to grasp concepts. But in “Which Instructional Practices Most Help First-Grade Students With and Without Mathematics Difficulties?” the researchers found that plain, old-fashioned practice and drills — directed by the teacher — were far more effective than “creative” methods such as music, math toys and student-directed learning.

Pushes all your teachery buttons, right?

But if the stakes *seem* high, the paper is also easy to disbelieve, if you don’t like the results.

Evidence about teaching comes in a lot of different forms. Sometimes, it comes from an experiment; *y’all (randomly chosen people) try doing this, everyone else do that, and we see what happens*. Other times we skip the ‘random’ part and find reasonable groups to compare (a ‘quasi-experiment’). Still other times we don’t try for statistically valid comparisons between groups, and instead a team of researchers will look very, very closely at teaching in a methodologically rich and cautious way.

And sometimes we take a big pile of data and poke at it with a stick. That’s what the authors of this study set out to do.

I don’t mean to be dismissive of the paper. I’m writing about it because I think it’s worth writing about. But I also know that lots of us in education use research as a bludgeon. This leads to educators reading research with two questions in mind: *(a) Can I bludgeon someone with this research? (b) How can I avoid getting bludgeoned by this research?*

That’s why I’m taking pains to lower the stakes. This paper isn’t a crisis or a boon for anyone. It’s just the story of how a bunch of people analyzed a bunch of interesting data.

Freed of the responsibility of figuring out if this study threatens us or not, let’s muck around and see what we find.

**II.**

The researchers lead off with a nifty bit of statistical work called *factor analysis*. It’s an analytical move that, as I read more about it, strikes me as both supremely cool and metaphysically questionable.

You might have heard of socioeconomic status. Socioeconomic status is supposed to explain a lot about the world we live in. But what *is* socioeconomic status?

You can’t directly measure someone’s socioeconomic status. It’s a *latent variable*, one responsible for a myriad other observable variables, such as parental income, occupational prestige, the number of books lying around your parents’ house, and so on.

None of these observables, on their own, can explain much of the variance in student academic performance. If your parents have a lot of books at home, that’s just it: your parents have a lot of books. That doesn’t make you a measurably better student.

Here’s the way factor analysis works, in short. You get a long list of responses to a number of questions, or a long list of measurements. I don’t know, maybe there are 100 variables you’re looking at. And you wonder (or program a computer to wonder) whether these can be explained by some smaller set of latent variables. You see if some of your 100 variables tend to vary as a group, e.g. when income goes up by a bit, does educational attainment tend to rise too? You do this for all your variables, and hopefully you’re able to identify just a few latent variables that stand behind your big list. This makes the rest of your analysis a lot easier; much better to compare 3 variables than 100.
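To make the ‘vary as a group’ step concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the two hidden factors, the six items, their loadings, and the helper names (`simulate_items`, `correlation`, `group_covarying_items`) are all made up. It also swaps real factor extraction for a much cruder rule (link any two items whose correlation clears a cutoff), which is just enough to show how a block of covarying items points back to one hidden cause.

```python
import random


def simulate_items(n_people, spec, n_factors, seed=0):
    """Simulate survey items under a made-up factor model.

    `spec` holds one (factor_index, loading) pair per item. Each item is
    loading * factor + noise, with noise scaled so every item has variance 1.
    The hidden factors are independent standard normals.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_people):
        factors = [rng.gauss(0, 1) for _ in range(n_factors)]
        row = []
        for factor_index, loading in spec:
            noise_sd = (1 - loading ** 2) ** 0.5
            row.append(loading * factors[factor_index] + noise_sd * rng.gauss(0, 1))
        data.append(row)
    return data


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def group_covarying_items(data, cutoff=0.3):
    """Crude stand-in for factor extraction: treat two items as linked when
    their correlation exceeds `cutoff`, and return the connected groups."""
    columns = list(zip(*data))
    k = len(columns)
    groups, seen = [], set()
    for start in range(k):
        if start in seen:
            continue
        group, stack = [], [start]
        seen.add(start)
        while stack:
            i = stack.pop()
            group.append(i)
            for j in range(k):
                if j not in seen and correlation(columns[i], columns[j]) > cutoff:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups


# Six hypothetical items: 0-2 driven by one hidden factor, 3-5 by another.
spec = [(0, 0.8), (0, 0.7), (0, 0.7), (1, 0.8), (1, 0.6), (1, 0.7)]
data = simulate_items(5000, spec, n_factors=2)
print(group_covarying_items(data))  # [[0, 1, 2], [3, 4, 5]]
```

Real factor analysis estimates a loading for every item on every factor instead of drawing hard boundaries, but it starts from the same input: the table of item-by-item correlations.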

That’s what we do for socioeconomic status. That’s also what the authors of this paper do for the instructional techniques teachers use with First Graders.

I’m new to all this, so please let me know if I’m messing any of it up, but it sure seems tough to me to figure out what exactly these latent variables are. One possibility is that all the little things that vary together (the parental income, the educational attainment, etc.) each contribute to academic outcomes, but just a little bit. Any one of them would be statistically irrelevant, but together, they have *oomph*.

This would be fine, I guess, but then why bother grouping them into some other latent variable? Wouldn’t we be better off saying that a bunch of little things can add up to something significant?

The other possibility is that socioeconomic status is some real, other thing, and all those other measurable variables are just pointing to this big, actual cause of academic success. What this ‘other thing’ actually is, though, remains up in the air.

(In searching for other people who worried about this, I came across a piece from *History and Philosophy of Psychology Bulletin* called ‘Four Queries About Factor Reality.’ Leading line: ‘When I first learned about factor analysis, there were four methodological questions that troubled me. They still do.’)

So, that’s the first piece of statistical wizardry in this paper. Keep reading: there’s more!

**III.**

Back to First Graders. The authors of this paper didn’t collect this data; the Department of Education, through the National Center for Education Statistics, ran the survey.

The NCES study was immense. It’s longitudinal, so we’re following the same group of students over many years. I don’t really know the details, but they’re aiming for a nationally representative sample of participants in the study. We’re talking over ten thousand students; their parents; thousands of teachers; they measured kids’ *height*, for crying out loud. It’s an awe-inspiring dataset, or at least it seems that way to me.

As part of the survey, they ask First Grade teachers to answer questions about their math teaching. First, 19 instructional activities…

…and then, 29 mathematical skills.

Now, we can start seeing the outlines of a research plan. Teachers tell you how they teach; we have info about how well these kids performed in math in Kindergarten and in First Grade; let’s find out how the teaching impacts the learning.

Sounds good, except HOLY COW look at all these variables. 19 instructional techniques and 29 skills. That’s a lot of items.

I think you know what’s coming next…

FACTOR ANALYSIS, BABY!

So we do this factor analysis (beep bop boop boop) and it turns out that, yes, indeed some of the variables vary together, suggesting that there are some latent, unmeasured factors that we can study instead of all 48 of these items.

Some good news: the instructional techniques only got grouped with other instructional techniques, and skills got grouped with skills. (It would be a bit weird if teachers who teach math through music focused more on place value, or something.)

I’m more interested in the instructional factors, so I’ll focus on the way these 19 instructional techniques got analytically grouped:

The factor loadings, as far as I understand, can be interpreted as correlation coefficients, i.e. higher means a tighter fit with the latent variable. (I don’t yet understand Cronbach’s Alpha or what it signifies. For me, that’ll have to wait.)
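The ‘loadings as correlations’ reading holds most cleanly in the simplest case: one factor, with every item scaled to variance 1. There, the correlation between two items equals the product of their loadings, so three items are enough to back out each loading from the item correlations alone. The sketch below assumes exactly that model with made-up loadings; `simulate_one_factor` and `triad_loadings` are hypothetical helpers, not anything from the paper.

```python
import random


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_one_factor(n_people, loadings, seed=1):
    """Return (factor, items) under a made-up one-factor model where
    item = loading * factor + noise and every item has variance 1."""
    rng = random.Random(seed)
    factor, items = [], []
    for _ in range(n_people):
        f = rng.gauss(0, 1)
        factor.append(f)
        items.append([lam * f + (1 - lam ** 2) ** 0.5 * rng.gauss(0, 1) for lam in loadings])
    return factor, items


def triad_loadings(items):
    """Estimate three items' loadings from item-item correlations only.

    Under the one-factor model r_ab = l_a * l_b, so
    l_a = sqrt(r_ab * r_ac / r_bc), and similarly for b and c.
    """
    a, b, c = zip(*items)
    r_ab, r_ac, r_bc = correlation(a, b), correlation(a, c), correlation(b, c)
    return [
        (r_ab * r_ac / r_bc) ** 0.5,
        (r_ab * r_bc / r_ac) ** 0.5,
        (r_ac * r_bc / r_ab) ** 0.5,
    ]


true_loadings = [0.8, 0.7, 0.5]
factor, items = simulate_one_factor(20000, true_loadings)
# Correlation of each item with the (normally unobservable) factor.
print([round(correlation(col, factor), 2) for col in zip(*items)])
# Loadings recovered from the observed item correlations alone.
print([round(x, 2) for x in triad_loadings(items)])
```

Both lines should land close to 0.8, 0.7, and 0.5. In real data the factor column doesn’t exist, which is the whole point: only the second line is something a researcher could actually compute.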

Some of these loadings seem pretty impressive. If a teacher says they frequently give worksheets, yeah, it sure seems like they also talk about frequently running routine drills. Ditto with ‘movement to learn math’ and ‘music to learn math.’

But here’s something I find interesting about all this. The factor analysis tells you what responses to this survey tended to vary together, and it helps you identify four groups of covarying instructional techniques. But — and this is the part I find so important — the RESEARCHERS DECIDE WHAT TO CALL THEM.

The first group of instructional techniques all focus on practicing solving problems: students practice on worksheets, or from textbooks, or drill, or do math on a chalkboard. The researchers name this latent variable ‘teacher-directed instruction.’

The second group of covarying techniques are: mixed-ability group work, working on a problem with several solutions, solving a real-life math problem, explaining stuff, and running peer tutoring activities. The researchers name this latent variable ‘student-centered instruction.’

I want to ask the same questions that I asked about socioeconomic status above. What *is* student-centered instruction? Is it just a little bit of group work, a little bit of real life math and peer tutoring, all mushed up and bundled together for convenience’s sake? Or is it some other thing, some *style* of instruction that these measurable variables are pointing us towards?

The researchers take pains to argue that it’s the latter. Student-centered activities, they say, *‘provide students with opportunities to be actively involved in the process of generating mathematical knowledge.’* That’s what they’re identifying with all these measurable things.

I’m unconvinced, though. We’re supposed to believe that these six techniques, though they vary together, are really a coherent style of teaching, in disguise. But there seems to me a gap between the techniques that teachers reported on and the style of teaching they describe as ‘student-centered.’ How do we know that these markers are indicators of that style?

Which leads me to think that they’re just six techniques that teachers often happen to use together. They go together, but I’m not sure the techniques stand for much more than what they are.

Eventually — I promise, we’re getting there — the researchers are going to find that teachers who emphasize the first set of activities help their weakest students more than teachers emphasizing the second set. And, eventually, NPR is going to pick up this study and run with it.

If the researchers decide to call the first group ‘individual math practice’ and the second ‘group work and problem solving’ then the headline news is “WEAKEST STUDENTS BENEFIT FROM INDIVIDUAL PRACTICE.” Instead, the researchers went for ‘teacher-directed’ and ‘student-centered’ and the headlines were “TEACHERS CODDLING CHILDREN; RUINING FUTURE.”

I’m not saying it’s the wrong choice. I’m saying it’s a choice.

**IV.**

Let’s skip to the end. Teacher-directed activities helped the weakest math students (MD = math difficulties) more than student-centered activities.

The researchers note that the effect sizes are small. Actually, they seem a bit embarrassed by this and argue that their results are conservative, and the real gains of teacher-directed instruction might be higher. Whatever. (Freddie deBoer reminds us that effect sizes in education tend to be modest, anyway. We can do less than we think we can.)
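For a sense of scale, one common way to translate a small effect is to ask how far it would move a typical student’s percentile. This sketch assumes normally distributed scores and an effect expressed in standard-deviation units; I haven’t checked whether the paper’s coefficients are on that scale, so treat 0.05 below as an illustrative size, and `percentile_after_shift` as a hypothetical helper.

```python
from statistics import NormalDist


def percentile_after_shift(effect_sd, start_percentile=50.0):
    """Percentile (against the original distribution) reached by a student who
    starts at `start_percentile` and improves by `effect_sd` standard deviations.
    Assumes normally distributed scores."""
    z = NormalDist().inv_cdf(start_percentile / 100)
    return 100 * NormalDist().cdf(z + effect_sd)


for effect in (0.05, 0.2, 0.5):
    print(effect, round(percentile_after_shift(effect), 1))
```

Under those assumptions, a 0.05 SD effect moves a median student to roughly the 52nd percentile.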

Also ineffective for learning to solve math problems: movement and music, calculating the answers instead of figuring them out, and ‘manipulatives.’ (The researchers call all of these ‘student-centered.’)

There’s one bit of cheating in the discussion, I think. The researchers found another interesting thing from the teacher survey data. When a teacher has a lot of students with math difficulty in a class, they are more likely to do activities involving calculators and with movement/music than they otherwise might be:

You might recall that these activities aren’t particularly effective math practice, and so they don’t lead to kids getting much better at solving problems.

By the time you get to the discussion of the results, though, here’s what they’re calling this: “the **increasing reliance on non-teacher-directed instruction** by first grade teachers when their classes include higher percentages of students with MD.”

Naming, man.

This got picked up by headlines, but I think the thing to check out is that the ‘student-directed’ category did not correlate with percentage of struggling math students in a class. That doesn’t sound to me like non-teacher-directed techniques get relied on when teachers have more weak math students in their classes.

The headline news for this study was “TEACHERS RELY ON INEFFECTIVE METHODS WHEN THE GOING GETS ROUGH.” But the headline probably should have been “KIDS DON’T LEARN TO ADD FROM USING CALCULATORS OR SINGING.”

**V.**

Otherwise, though, I believe the results of this study pretty unambiguously.

Some people on Twitter worried about using a test with young children, but that doesn’t bother me so much. There are a lot of things I care about that a well-designed test can’t measure, but it certainly measures some of them.

Big studies like this are not going to be subtle. You’re not going to get a picture of the most effective classrooms for struggling students. You’re not going to get details about what, precisely, makes ineffective teaching ineffective. We’re not going to get nuance.

Then again, it’s not like education is a particularly nuanced place. There are plenty of people out there who take the stage to provide ridiculously simple slogans, and I think it’s helpful to take the slogans at their word.

Meaning: to the extent that your slogan is ‘fewer worksheets, more group work!’, that slogan is not supported by this evidence. Ditto with ‘less drill, more real life math!’

(I don’t have links to people providing these slogans, but that’s partly because scrolling through conference hashtags gives me indigestion.)

And, look, is it really so shocking that students with math difficulties benefit from classes that include proportionally more individual math practice?

No, or at least based on my experience it shouldn’t be. But what the headlines get wrong is the idea that this sort of teaching is simple. It’s hard to find the right sort of practice for students. It’s also hard to find classroom structures that give strong and struggling students valuable practice to work on at the same time. It’s hard to vary practice formats, hard to keep it interesting, hard to make sure kids are making progress during practice. All of this is craft.

My takeaway from this study is that struggling students need more time to practice their skills. If you had to blindly choose a classroom that emphasized practice or real-life math for such a student, you might want to choose practice.

But I know from classroom teaching that there’s nothing simple about helping kids practice. It takes creativity, listening, and a lot of careful planning. Once we get past some of the idealistic sloganeering, I’m pretty sure most of us know this. So let’s talk about *that*: the ways we help kids practice their skills in ways that keep everybody in the room thinking, engaged, and that don’t make children feel stupid or feel that math hates them.

But as long as we trash-talk teacher-directed work and practice, I think we’ll need pieces like this as a correction.

Why, with all the “precision” of the scores on the mathematical variables, do these people reduce the MD scores to a scale of 1 to 5 with fairly arbitrary cutoffs?

Also, factor analysis is a technique for grouping variables, and only a guide to meaning of the factors. The “concept” of “latent variables” is no more than an association, not a cause.


I don’t feel so troubled by the arbitrary cutoffs for MD because their results were fairly consistent across the lower range of math achievement, in all three of the MD groups. If they had been, like, ‘kids need teacher-directed instruction only when they’re sort of MD but not too MD,’ I’d be suspicious. But their results seem pretty steady across the bottom three achievement groups.

This is certainly one possibility but, again, then why bother identifying a latent factor at all? Just say that you have a bunch of covarying factors that each contribute to the effect?

I’ll quote again from that awesome ‘Four Queries’ piece I mentioned in the post:

It seems that a mathematical factor can correspond to a causally efficacious composite whose elements are qualitatively unlike. Is this objectionable for some reason?

Seems to me like a good question. But, Howard, I gather you think it’s settled and simple?


Simple, yes. Settled, not really.

The word “can” in your last but one para is a giveaway.

It’s a long time since I delved into Statistics, but I still have a weakness for multiple regression methods.


There are critical factors within these factors, perhaps the most important being “that there’s nothing simple about helping kids practice.” The other stuff isn’t simple, either.

Then there’s affect.

If a student with difficulties is using all the cognitive juice on emotional survival issues… practice is likely to be much more effective than stuff with social interaction. On the other hand, a teacher could ever so carefully and constructively build an environment where the best things happened with “student led” activities.


Thoughts in reverse order:

1) I really like the sentiment of your second to last paragraph.

2) I’m not sure I unambiguously believe the results of this study. FULL caveat, I’m basing this on the intellectual work you just did. I’m being lazy here and piggy-backing.

3) I will set aside all of the statistical analysis. I just do not know enough to have an opinion.

3.5) I will also set aside things I thought about while reading but that would make for a totally different and more abstract conversation.

4) My mind keeps going back to the 19 instructional activities and the 29 mathematical skills. The instructional activities tell me little about what kind of teaching they represent. The mathematical skills don’t tell me anything about teaching.

If I may borrow your phrasing, “there’s nothing simple about using instructional activities,” which means there is a lot of variability in what they can look like from classroom to classroom.

I’ve read three books that talk about California’s math reform efforts in the 1980s, and they’re all coming to mind here. One story from California in *Building a Better Teacher* shows how misleading it can be to extrapolate a pedagogy from an instructional activity. I think this link should take you to the relevant pages:

https://books.google.com/books?id=Ua5bAwAAQBAJ&pg=PT69&dq=%22over+the+next+several+years,+the+group+observed+classrooms+across+the+state%22&hl=en&sa=X&ved=0ahUKEwiWwfj6hvfTAhUr9YMKHVDQA1cQ6AEIJzAA#v=onepage&q=%22over%20the%20next%20several%20years%2C%20the%20group%20observed%20classrooms%20across%20the%20state%22&f=false

Since so much of the reasoning and methodology in this study that gets us to the charged headline is downstream of this questionnaire about instructional activities, which I am skeptical of, how can I not be skeptical of any conclusions without drawing on preconceived notions I may already have?


OK, so that book is citing a paper that Charlotte Sharpe (dropping knowledge, below) recommended to me last time I wrote one of these long posts: A Revolution in One Classroom: The Case of Mrs. Oublier

I haven’t read it yet, but it’s looking like I should. Maybe that’ll be the next paper we read together in this space.

Any other recommendations for books or research you think we should blog-review next?


Wouldn’t you know it, the author of Mrs. O also wrote one of the other books I alluded to above, “Learning Policy”, which goes into more detail about the California reform. If you blog-review Mrs. Oublier, I’m all aboard.

http://www.journals.uchicago.edu/doi/abs/10.1086/377676?journalCode=aje

As for other recommendations, where to begin? I guess it’s time for me to dust off the folders of PDFs I have saved that haven’t gotten any attention. Your interest in gaining a broader perspective on a particular topic is spot on, so, in keeping with that, maybe pick a thread that emerged from this post and dig in?

I also want to thank you for your comments about research and bludgeoning. I think that, perhaps, we educators feel under attack and so the instinct is to grab a weapon and bludgeon or grab a shield and defend. Just reading your paragraph about that allowed me to be aware of it, let down my guard, and read this to learn, instead of to weaponize. I would like to keep choosing “learn” instead of “fight.”


Great post! I enjoyed following your thinking here.

So, some thoughts:

I love factor analysis. LOVE. I love the suspicion and faith-like feelings it evokes in me, and I love the idea that factor analysis represents people’s desire to know what might be unknowable. It feels and sounds so faith-like.

Also, factor loadings tell you the strength of one item’s relationship to the GROUP. The alpha score tells you the strength of the group’s relationship itself. So, the manipulatives/calculator group is less cohesive than the others; while each of those items is most likely going to vary more with one another than with members of the other groups in predictable ways, there’s also more likely to be error in how well that group ‘hangs together’ than there will be in the “student-directed” group, where not only do most of the items tend to individually follow a predictable response pattern with the group, but the group itself tends to hang together better as a whole. As you can imagine, the fewer items in a group, and the lower the factor loadings of each item, the lower the alpha score. That’s why the calculator group alpha is so low: only 3 items, one of which is not particularly predictable based on the others. (This isn’t technically correct, but you can sort of imagine that it means that if the teacher reports higher on counting manipulatives AND geometric manipulatives, you can predict with slightly better than random chance accuracy that they’ll also use calculators. Because it’s less predictable, you can use any grouping of items in that group to accurately predict the others a little less than half of the time. Again, not what those numbers technically mean, but it’s a metaphorically useful interpretation.)
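For reference, the standard formula behind an alpha score is alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score), where k is the number of items. For standardized items it reduces to k * r / (1 + (k - 1) * r), with r the average inter-item correlation, and that version makes the point above visible: fewer items or weaker correlations both drag alpha down. A small Python sketch; the survey responses are made up and the function names are my own.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per respondent):
    k / (k - 1) * (1 - sum of item variances / variance of total scores)."""
    k = len(rows[0])
    item_var = sum(variance(col) for col in zip(*rows))
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var / total_var)


def standardized_alpha(k, mean_r):
    """Alpha for k standardized items with average inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Made-up 1-5 responses from four teachers to three survey items.
responses = [[4, 5, 4], [2, 2, 3], [3, 3, 3], [5, 4, 5]]
print(round(cronbach_alpha(responses), 2))  # 0.92

# Fewer items, or weaker inter-item correlations, pull alpha down.
for k in (3, 6):
    for r in (0.2, 0.4):
        print(k, r, round(standardized_alpha(k, r), 2))
```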

Anyway, I love factor analysis as an idea, as an art, as a statistical process, and as a faith system. I like the “parts of the elephant” metaphor, or perhaps an image of 10,000 compasses (or wind socks) laid out across the state of New York to determine the shape of the magnetic field (or wind patterns) in a way that no one compass (or wind sock) could.

OK, so other things I like to consider…

- The more measurable a teaching practice is through self-reporting, the less we actually stand to learn about the teacher’s intention, goal, or skill. That is, concrete, highly specified, well-understood, and widely used practices are way easier to measure on self-report instruments than things like “explain how a math problem is solved.” As you’ve been pointing to recently, explanations are NOT SIMPLE; they are heavily contextualized, dependent on students’ current understanding and the goal (and the size of the gap between them), the teacher’s knowledge of and experience with using representations and explanations well… yada yada. So, it’s no wonder that the factor loading for ‘explanations’ (notice, it loaded more on the student-directed factor!) is pretty poor: this tells ME that self-report data are not that useful for that aspect of teaching practice. Because one person’s explanation is another person’s step-by-step modeling. Or it could look like a number talk. And all of those people might call it the same thing.

More generally, the things we stand to learn about what teaching is ‘effective’ from self-report instruments are often not that interesting. Because either they ARE really interesting and important aspects of teaching and NOT LIKELY WELL-MEASURED ON SURVEYS, or they’re not that interesting or value-laden (and thus not that helpful for thinking about how to support teachers), like “drill,” “worksheets,” or “using manipulatives” (but not with any particular goals).

One other thing I find interesting about this study (that apparently the authors did not) is the measurement piece. The variable that had the strongest and biggest relationships (size and strength were both larger) is the measurement stuff (e.g., how often the teacher reported allowing students to use tools for measurement, using them accurately, and recognizing fractions):

-the measurement factor loadings and alpha score are stronger than the teacher-directed instructional practices factor loadings. So the items in that group are more predictably related to the group and to each other than some of the others, even though there are fewer items.

-a teacher’s likelihood to report greater attention to measurement was NOT related to the percent of MDs (which I actually like)

-the relationship between focusing more on measurement and student gains is bigger, and stronger!

-higher frequency of teaching measurement was more related to student-directed instruction and manipulatives than teacher-directed instruction

So, the headline could have read, “measurement tasks and tools, which are more likely to be used in student-directed ways, support students with difficulties in mathematics!”

Of course, it could be that the achievement test items are weighted so that even if there are fewer items for a domain (maybe only 3 items about measurement) they are weighted equally against another domain (against ‘counting’ which maybe had 10 items).
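Here is the arithmetic behind that worry, with invented numbers (3 measurement items, 10 counting items, one hypothetical student); nothing here reflects how the NCES assessment was actually scored, and the helper names are made up. If each domain counts equally, each measurement item carries more than three times the weight of a counting item.

```python
def raw_score(correct, items):
    """Fraction of all items answered correctly; every item counts the same."""
    return sum(correct.values()) / sum(items.values())


def domain_weighted_score(correct, items):
    """Average of per-domain fractions; every domain counts the same."""
    shares = [correct[d] / items[d] for d in items]
    return sum(shares) / len(shares)


def item_weight(domain, items):
    """Share of the domain-weighted score carried by one item in `domain`."""
    return 1 / (len(items) * items[domain])


# Hypothetical test: 3 measurement items, 10 counting items.
items = {"measurement": 3, "counting": 10}
correct = {"measurement": 3, "counting": 5}
print(round(raw_score(correct, items), 3))              # 0.615
print(round(domain_weighted_score(correct, items), 3))  # 0.75
print(item_weight("measurement", items), item_weight("counting", items))
```

The same student scores about 62% when every item counts equally and 75% when every domain does, purely because of how the handful of measurement items is weighted.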

I’m curious about the authors’ choice not to highlight this finding, and I think it goes back to the spin/angle bit you pointed at early in your blog post. Math wars stuff gets press.


Ahhh thank you so much for that whole paragraph. I’m reading it and rereading it. Much that I don’t yet understand.

Yes! This is so interesting to me too.

The picture we get is that teachers don’t spend more time on measurement or whatever with a higher percentage of MD students in their classes. What we DO see is that teachers are spending increasing amounts of time on the ‘basics’ of Ordering and Number/Quantity. These skill groups seem to hang together pretty well, with alphas of 0.80 and 0.73, and include things like ordering objects, writing numbers one to ten, and adding single-digit numbers.

Now, that’s fine. But check out Table 5! Spending more time on Ordering has a -.05 effect on the performance of two out of the three MD groups! That’s nuts. That’s the exact same effect size as the positive outcomes of teacher-directed activities that the authors make so much about.

So here’s ACTUALLY the most exciting headline, to me, from this paper: we’re spending too much time on Ordering skills, especially with MD students.

As you note, measurement and fractions activities help a ton. Why? We can speculate. When young students are working with measurement and fractions they’re giving their number skills a great workout, but in a different context than their naked-numerical practice.

I also notice that Reading 2-, 3-digit numbers has a nice effect on MD students. So the headline we’re getting towards is to give First Grade MD students varied opportunities to work with numbers at a sophisticated level. Don’t fall into the trap of spending too much of the year working on the ‘basics’ just because they don’t have the basics yet. In First Grade, at least, these other opportunities to work with numbers help.

(It’s unclear to me if this is a generally useful principle. First Graders will find good, helpful number experiences in a lot of places that wouldn’t really give middle school students a workout. Then again, maybe well-chosen measurement activities would help middle school students with fractions in a similar way.)
