Cognitive Load Theory and Why Students Are Answer-Obsessed

It’s true: math education doesn’t give a ton of attention to Sweller and cognitive load theory. Math education researchers who are aware of Sweller are most familiar with his attack on problem-based, experiential, discovery and constructivist learning (“An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching”). As Raymond mentioned on Twitter, those within math education who are likely to recognize Sweller are equally likely to dismiss him and his work.

Part of this, I think, has to do with focusing on the wrong aspects of Sweller’s work. Ask 100 people what the key idea of Sweller’s work is, and I bet 99 would say: it’s easy to overload the working memory of students. For learning, it’s important not to. So, don’t. An important but limited insight. (We’re trying not to overload anyone!)

The last 1 person out of the 100 is me. As far as math education is concerned, I think the key idea of Sweller’s work is about problem solving, not cognitive load. Here is that key idea: problem solving often forces a person into answer-getting mode, and answer-getting mode is incompatible with learning something new.

(“Answer-getting” mode also has to do with expectations that students have about math class and the sorts of activities they think are valued in mathematics. Sweller shows it has a cognitive element too.)

Sweller’s early work was with number puzzles. Participants in his studies solved the puzzle successfully, but never came to notice a fairly simple pattern which was sort of the “key” to finding any solution. Why? There were two reasons:

  1. When you’re looking for the solution to a problem, your attention is massively restricted to those things that are directly relevant to finding the solution. Lots of important details of the scenario or environment get ignored.
  2. Attention is a zero-sum game. There’s only so much that a person can notice. A person focused on finding the solution is unable to focus on much else.

(For more, read this part of my essay.)

I have found this to be absolutely true and deeply insightful. The first time the idea really hit me was during Christopher Danielson’s talk, titled “What’s the Difference Between Solving A Problem and Learning Mathematics?” There is a difference. Sweller helps us get specific about some of the reasons why.

These limitations of problem solving guide my daily classroom work. My 8th Graders are wrapping up their study of linear functions and moving on to exponential functions. Yesterday, I found myself wanting my students to start thinking about the differences between linear and exponential graphs and patterns. I took this image from David Wees’ project and displayed it on the board:

[Image: graph from David Wees’ project]

In the past, my first instinct would have been to pose the problem as quickly as possible. “What are the coordinates of point B? of point A?” I would then give my students time to think, and I would have expected some learning to have occurred.

Now I know that this could be a particularly bad way to ask my students to begin their work. They probably wouldn’t notice what I want them to notice. Instead, they’d probably go into that answer-getting mode that focuses all their resources in an unproductive way:

[Image: screenshot]

Another key insight of Sweller has to do with how to avoid ensnaring students in this unproductive struggle. One suggestion of Sweller’s is to ask less-specific questions. These nonspecific questions don’t funnel attention in the way specific questions do, and they therefore don’t overload students in quite the same way.

Sweller first described the power of nonspecific questions with regard to angle problems. Rather than asking students to find a particular angle, he asked “Calculate the value of as many variables as you can.”

[Image: angle problem, from Sweller, Mawer & Ward, 1983.]


With my 8th Graders, yesterday I began class with two nonspecific questions. I asked these questions so that they’d notice as much about the diagram as possible and start putting together some of the pieces about exponential relationships.

My first question: “What do you notice?” I waited for lots of hands to go up, and then I quickly called on three students. (I find it’s important to move quickly here — not so interesting to rattle through everyone’s noticings.)

My second question: “Study the diagram and find something to figure out.” I asked students to do this in their heads, alone. Then, “Talk to your partner — come up with at least two different things to figure out, then as many as you can.” (What counts as something “figured out”? We’ve done this routine many times, so my students know from experience.)

Here is an incomplete list of what my students calculated/figured out from the exponential graph:

  1. The y-coordinates are doubling
  2. The y-axis is going up by 4
  3. The slopes are changing between each pair of points
  4. The graph is non-proportional
  5. The next coordinate would be (6, 64)
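
For readers who want to check these observations, here is a minimal Python sketch. It assumes the graph shows y = 2^x at whole-number x-values. That is an inference from the coordinates mentioned in this post (A at (0, 1), B at (2, 4), a next point of (6, 64)), not something read off the original image. The function name `y` and the list names are only illustrative.

```python
# Assumption: the diagram plots y = 2**x at whole-number x-values.
# This is inferred from the coordinates named in the post, not from the image.
def y(x: int) -> int:
    return 2 ** x

points = [(x, y(x)) for x in range(6)]  # (0, 1) through (5, 32)

# Observation 1: each y-coordinate is double the one before it.
ratios = [points[i + 1][1] / points[i][1] for i in range(len(points) - 1)]

# Observation 3: the slope between neighboring points keeps changing.
# The x-step is 1, so each slope is just the difference in y.
slopes = [points[i + 1][1] - points[i][1] for i in range(len(points) - 1)]

# Observation 4: non-proportional, because y / x is not constant
# (and the graph passes through (0, 1), not the origin).
y_over_x = [py / px for px, py in points if px != 0]

# Observation 5: the next point continues the doubling pattern.
next_point = (6, y(6))

print(ratios)      # [2.0, 2.0, 2.0, 2.0, 2.0]
print(slopes)      # [1, 2, 4, 8, 16]
print(next_point)  # (6, 64)
```

The constant ratio next to the changing slopes is the contrast with a linear function, where the slopes would be constant instead.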

If my students had mentioned, at this phase, that the coordinates of B were (2,4) we would have moved on. Since they hadn’t, and since they were saying so many smart things, I decided that this would be a great time to ask a third question:

“What are the coordinates of point B? point A?”

My students were able to answer these specific questions, but that’s hardly the point. Sweller’s research suggests that you can’t use problem-solving success as a gauge of whether kids have learned something or not.

I do think, though, that the reasons my students gave for their correct answers are revealing. Some students, in justifying their answers, mentioned that you could be sure that point A was at (0, 1) because the y-coordinate seems to be 1/4 of the way up to 4. Other students then pointed out that (0, 1) fits the general pattern. What’s interesting is that this first observation (the position of point A up the axis) never came up in the first two questions I asked. That makes sense, because that way of looking at the position of point A has nothing to do with the exponential pattern. In fact, it’s the sort of hyper-focused response that you’d only expect to hear when a very specific goal has been set by the teacher: find the coordinates of point A. Otherwise, that’s not the thing that’s worth noticing here (probably). It misses the forest for the trees in the way people do when they are focused on achieving a narrow goal.

The second response, though, showed that some of my students had started making good connections. They justified the coordinates of points A and B based on the general pattern.

All this suggests to me that while some of my students are ready for working on specific problems, many of them aren’t yet there.

Asking more nonspecific problems isn’t the only recommendation that Sweller makes, of course. He’s better known for recommending the heavy use of worked-out examples and explanations in class. We do those too, though probably not as often as Sweller would like. Still, there’s more to Sweller’s theory than worked examples.

The key idea here is that specific questions cause students to chase specific goals. Chasing a goal isn’t always helpful for learning. On the one hand, I think this makes the case for developing a specific question more slowly, asking students to notice before posing a problem. On the other, this calls for us to be more cautious and deliberate about how we use problems in our teaching, especially in the early stages of teaching a new idea.



21 thoughts on “Cognitive Load Theory and Why Students Are Answer-Obsessed”

    1. Particularly in geometry I’m finding that a useful technique. So much of geometry comes down to noticing new things and relationships — they usually aren’t even so complex, compared to a lot of the stuff we ask kids to swallow in high school algebra.

      I’m not ready to sign up for this as the sole or major use of class time. There’s so much modeling of what geometric discourse should look like that’s needed. There are lots of facts and formulas that I don’t think could be developed this way, for many classes.

      But I definitely like this technique, and I definitely use it often.


      1. Yeah he spent the first couple months on discourse in general before even taking up geometry. So it’s definitely not the only mode. Still it’s more “doing math” than solving someone else’s problem is.


      2. I still want to tread carefully. Whether or not this is more authentically mathematical than solving someone else’s problem I can’t say. (It seems to me that solving someone else’s problem is very mathematical. Poincaré’s conjecture gets proved by someone else.)

        But I also don’t want to assume that what’s most authentically mathematical should decide how we teach. There really are differences between newbies and experts. Problem solving is very mathematical, but it might be an activity for a few days down the road of my 8th Grade class.


  1. This line struck me:

    “When you’re looking for the solution to a problem, your attention is massively restricted to those things that are directly relevant to finding the solution. Lots of important details of the scenario or environment get ignored.”

    And it made me think of what happens when students get taught algorithms to solve problems. I saw it happen with my own children: when my oldest hit fourth grade and was taught the standard algorithm for adding, he stopped making sense of addition problems by decomposing and other interesting methods of mental math.

    Also, math teachers who have relied heavily on algebraic solutions to problems have difficulty noticing or considering other ways of thinking about a problem. Sometimes they miss a representation that is much richer than an algebraic one. They also miss an opportunity to make a connection for students who are not ready to reach for an algebraic solution to a problem.


  2. Hi Michael, two quick questions.

    1. How come you chose in this article (and in the essay of yours to which you link) to avoid the term ‘working memory’ and use ‘attention’?

    2. I tweeted to you and @dylanwiliam to ask how this relates to backwards design. Dylan indicated that it was compatible. I broadly understand Sweller’s ‘goal free effect’, but following on from what Dylan suggested, I’m assuming that the fact that we’re not explicitly asking students to solve a problem (at least at first) doesn’t mean that we (the teacher) don’t have a specific learning goal in mind. In the case of Sweller’s initial experiment, this goal was for students to realise that the problems could be solved by alternating between x 3 and -29. In the above example what was it that you wanted students to be able to do by the end of the lesson that they couldn’t do at the start, and how did you check that they could do this thing? Or have I missed something?




    1. “How come you chose in this article (and in the essay of yours to which you link) to avoid the term ‘working memory’ and use ‘attention’?”

      In the essay I don’t believe I avoided the term working memory. Check out this section (here) where I discuss the relationship between attention and working memory in CLT. Some self-quotes:

      Up until this point, the leading actor in Sweller’s theory was attention. Starting in 1988, attention would abruptly disappear from Sweller’s work. Taking its place was cognitive load, which Sweller increasingly used to explain his experimental results.

      Why did Sweller make the move from attention to cognitive load? It wasn’t because he had to. Sweller mentions no flaw or contradiction with his earlier theoretical explanations. He even points out that, in many ways, selective attention and limited cognitive load are two sides of the same coin: “Rather than using cognitive processing capacity terms, we could just as easily describe these circumstances in attentional terms.”

      I’ll point you there for the fuller discussion. I often find the attentional frame more useful than the cognitive load frame when I’m thinking through teaching problems. When I talk about cognitive load, I often forget that the kids are thinking about stuff. When I talk about attention, it helps me think about what the kids are thinking about, and how what they’re thinking about may or may not be instructionally valuable.

      But I’m not a hardliner about this. It seems to me that the focus on attention resonates with teachers in a way that cognitive load might not, but I’m not deeply committed one way or the other.

      “In the above example what was it that you wanted students to be able to do by the end of the lesson that they couldn’t do at the start, and how did you check that they could do this thing?”

      I wanted kids to be able to use the structure of an exponential relationship to find missing values in an exponential relationship. There was instruction that came after the activity in this post, and that instruction gave me a chance to further develop and assess their thinking.

      The image that I showed my students had a lot of information for kids to study. To understand the big picture of an exponential relationship (what it’s like, how it grows, that it’s non-linear, what happens to each coordinate as the other changes) involves studying some paradigmatic examples of the relationship. (At least, that was my thought.) This activity contributed to that goal (I thought) because it drew my kids’ attention to all these details.

      If the kids had all used the exponential relationship to find the missing values in the relationship, I would have been pretty happy. It was only a mixed success, as far as I could tell — some kids still used the coordinate grid instead of looking at the relationship to find the coordinates. (Of course, that’s totally valid and so this question may not be so informative as an assessment.)

      So, yeah: I see no reason why goal-free problems would be incompatible with backwards planning. Often your goal is for students to learn new mathematical structures and relationships. And often when that’s my goal, I use problems such as these with my kids.

      Does that help? Would love to chat more!


      1. First point.
        Ok, cool, thanks for that. I just went to the essay page and used the find function for ‘working memory’, should have tried ‘cognitive load’ too : ) Thanks for your explanation about your preference, it makes sense.

        My personal preference is to talk about cognitive load or working memory. This is for a few interrelated reasons. The first is that for some teachers the words ‘working memory’ or ‘cognitive load’ may be new, so by using them I create a knowledge gap where the teacher realises that there’s something they don’t know, and they’ll look harder to try to work it out, or maybe ask about it. Secondly, and relatedly, if I say ‘attention’ we can happily go along talking about ‘attention’ together without realising we may be talking about two completely different things. Thirdly, attention makes me think of multi-tasking (and the impossibilities of it). So it starts to get confusing if we talk about the impossibility of multi-tasking at the same time as having 4 (Cowan) or 7 (Miller) slots. How can we have 4-7 attentional slots but we can’t multi-task? Obviously they’re different concepts, but they’re easy to confuse, in my opinion.

        Interesting to have the historical context, that Sweller started with ‘attention’ then moved to CL and WM terminology. Thanks for that. May have been due to some of the above reasons, may have just wanted to sound more fancy ; ) Both work for sure.

        Second point.

        Your communication of the learning intention is helpful. I guess I was looking for where you wanted the class to end up. I could kind of see but I guess what I was wondering (and what I’m still keen to hear about) is how you brought the lesson together in the end and furnished students with an understanding of how to use the structure of an exponential relationship to find missing values in an exponential relationship. Did you just say ‘ok, what I was keen for someone to spot here was…’ or did you highlight the response of a student with ‘I really like what Nazma did here when she …(highlight what you wanted all to do)’, or did you just leave it for the next lesson (esp. given that you highlight that ‘while some of my students are ready for working on specific problems, many of them aren’t yet there.’).

        Perhaps the year level is relevant to how explicit you are in the lesson’s closing, and how much flexibility you have w.r.t. how explicitly you tell them how to solve such a problem following the exploration. I’m teaching Y12, and need to ensure that each lesson students tick off 3 to 5 skills that they’ll need to replicate in the end of year exam. Conceptual understanding is the holy grail, but when I struggle to get that across (despite explorations, activities, visualisations, spaced repetition, interleaving, etc.) they still need to have a process to operationalise by the end of the lesson. Would love your thoughts.

        In the time between your blog post and your response to my question I actually wanted to give this approach a go in my own classroom. Just managed to put it in writing now, here’s my take and what I learnt from it.

        Thanks for your detailed response Michael, am enjoying the dialogue.



      2. Hey, I really liked your post! Thanks for sharing it, Oliver.

        “Perhaps the year level is relevant to how explicit you are in the lesson’s closing, and how much flexibility you have w.r.t how explicitly you tell them how to solve such a problem following the exploration.”

        For what it’s worth, I think that this sort of activity can be a good introduction to an example-problem pair. It could help clarify the problem featured in the example and lighten the load for the example. Obviously whether you need to do this or not depends on the particular situation.


  3. Hi Michael,

    I am one of the 99 out of 100 that you mention. : ) Thanks for sharing this unexpected aspect of Sweller’s work. I generally think of this kind of “noticing” to be more within the preparation for future learning framework (e.g., Bransford, Schwartz), so it’s really interesting to me that Sweller’s earlier work complements it. In the research literature at least, cognitive load is often contrasted against the work on preparation for future learning.

    But I also see the relationship between cognitive load and this kind of noticing: “unproductive struggle” is a perfect way to put it.

    Have you read Heckler’s research on solving physics problems? And Eric Kuo’s recent follow-up? They both explore this phenomenon of asking physics students to solve problems using free-body diagrams. Turns out that asking them to do so makes them less insightful when solving the problem. They get into the “finding the answer” mode and perform all of the standard steps instead of realizing that the problem can be solved much more easily conceptually. Seems quite related to the Sweller work you mention and your classroom approach.

    Ironically, I think this line of research parallels one of the dangers of worked examples: excessive stress placed on procedural steps and insufficient stress placed on noticing patterns.

    Anyhow, I realize this is quite an old post, but have you changed how you employ these “noticing” questions? Any follow-up?




