The Limits and Interpretation Pitfalls of Fitts' Law

I'm getting old, and there are things that I don't notice too well anymore. But I'm pretty sure UI elements have been getting bigger. Everywhere.

That makes a lot of sense on touch devices. Fingers are big and bulky, and the widgets they press need to be big and bulky. But why are UI elements getting bigger everywhere, even on devices that aren't touch-enabled, or for applications that are nowhere near being productively used on touch-enabled devices, like IDEs?

There is a reason that everyone is citing ad nauseam these days: Fitts' law. Its "popular" formulation states that bigger widgets are easier to hit, and the obvious interpretation is that we need to make them bigger. The reason why you can fit about as much code on your 24" Full HD monitor as you could fit on the old Trinitron you had back when you saw The Matrix at the cinema is SCIENCE!

I want to argue that:

  1. This is an incomplete and narrow interpretation of Fitts' law, and
  2. This is an unproductive use of Fitts' law, because 2a) it is routinely applied without any numerical analysis, and 2b) it fails to account for other metrics and, consequently, rarely results in good design trade-offs

This Interpretation of Fitts' Law is Narrow and Incomplete

Fitts' law states (more or less) that the time required to rapidly move to a target area is a function (specifically, a logarithmic function) of the ratio between the distance to the target's center and the width of the target. The width of the target should be understood as the distance that the cursor travels between one margin of the target and the other, not the physical distance between the target's margins. The distinction is, as we'll see immediately, not without importance.

Fitts' model uses an index of difficulty (ID), which roughly characterizes "how much processing" the human motor system would need to do in order to perform a motion, under the assumption that there's a maximum amount of information it can process per unit of time. Given a target of width \(W\), whose center can be reached after the actuator (hand, finger, cursor, lever etc.) moves over a distance \(D\), the difficulty index is given by:

$$ID = \log_2\frac{2D}{W}$$

Furthermore, the average time required to perform that motion grows linearly with the difficulty index (\(MT = a + b \cdot ID\), where \(a\) and \(b\) are empirically determined constants). Hence, the lower the difficulty index, the faster the motion can be performed.
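To make these relations concrete, here is a minimal Python sketch of the index of difficulty and of the usual linear movement-time relation, \(MT = a + b \cdot ID\). The constants \(a\) and \(b\) are normally fitted by regression for a particular device and group of users. The default values below are hypothetical and serve only as an illustration.

```python
import math

def index_of_difficulty(distance: float, width: float) -> float:
    """Fitts' original formulation: ID = log2(2D / W), in bits."""
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    return math.log2(2 * distance / width)

def movement_time(distance: float, width: float,
                  a: float = 0.1, b: float = 0.15) -> float:
    """Predicted mean movement time MT = a + b * ID, in seconds.

    a (seconds) and b (seconds per bit) are hypothetical, device-specific
    constants; real values have to be measured.
    """
    return a + b * index_of_difficulty(distance, width)

# A 24 px target whose center is 192 px away: 2 * 192 / 24 = 16, so ID = 4 bits.
print(index_of_difficulty(192, 24))       # 4.0
print(round(movement_time(192, 24), 3))   # 0.7, under the assumed a and b
```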

This model is beautifully straightforward. As we learned back when we still thought growing up was going to be awesome, minimizing \(ID\) can be done either by making \(D\) as small as possible or by making \(W\) as big as possible.

Making \(W\) as big as possible by making the widgets as large as possible is the most straightforward solution, but it's not the only one. There are at least two other options:

  1. Increasing \(W\) without making the widgets larger. For example, with non-touch pointing devices, this can be done by placing widgets along the top edge of the screen. The cursor cannot travel past the screen's edge, no matter how far the mouse is moved in that direction, so the width that Fitts' model "sees" is effectively infinite.
  2. Decreasing \(D\) by placing widgets close to the cursor -- using context menus, pie menus etc.
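A quick Python sketch with hypothetical numbers shows how differently the two levers behave. The example uses a 24 px target whose center is 600 px from the cursor. With \(ID = \log_2(2D/W)\), doubling \(W\) always removes exactly one bit. Bringing the target ten times closer, as a context menu might, removes \(\log_2 10\), roughly 3.3 bits.

```python
import math

def index_of_difficulty(distance: float, width: float) -> float:
    """Fitts' original formulation: ID = log2(2D / W), in bits."""
    return math.log2(2 * distance / width)

# Hypothetical baseline: a 24 px target, 600 px away from the cursor.
baseline = index_of_difficulty(600, 24)        # log2(50), about 5.64 bits
wider_target = index_of_difficulty(600, 48)    # W doubled: log2(25), about 4.64 bits
closer_target = index_of_difficulty(60, 24)    # D divided by ten: log2(5), about 2.32 bits

print(f"baseline:         {baseline:.2f} bits")
print(f"W doubled:        {wider_target:.2f} bits")
print(f"D divided by ten: {closer_target:.2f} bits")
```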

Increasing the size of a widget can make it easier to hit, but that's not the only way to do it. HCI has a long history of attempting to find optimum solutions to these problems, and I think it's a field that still has a lot to offer.

There are decades of research and industrial work to draw upon -- not to mention a great deal of creativity in the UX industry today. There is a wealth of knowledge from which we can derive a far better approach than "yeah, double all padding values in your CSS file, and if the widgets still look too small, double them again."

Focusing Only on Fitts' Law is Unproductive

But there's something else that's amiss here.

Let's say we've reached the bottom of our bag of UI tricks. We have discovered every possible way of presenting information on screen, and we're not going to find anything new until we build holographic interfaces. Pie menus were tried in the 1990s, and they didn't stick. There's nothing else beyond the horizon; all we can do is re-shuffle the UIs we have today.

Clearly, Fitts' law says make things big. There's a limit to how close you can bring them together, and not everything fits context menus and pie menus and top menu bars and whatnot.

Are things so clear though?

The limits of Fitts' Model

Let me get an obvious thing out of the way first: the predictions of Fitts' model are valid. It's a model that has stood the test of time. It's so old that, when it was devised, Eisenhower was in the White House and Stalin was in the Kremlin. No one has proven it wrong, and it's been extended to cover arbitrary motion in 2D space with a lot of success.

But, like any mathematical model, Fitts' has its limits. It describes a very specific type of motion, namely situations where:

  1. Motion is performed by a subject instructed to move at their maximum rate
  2. Motion is equally difficult in all directions (unlike, for example, the motion of a thumb from the hand that you're also using to hold a phone)
  3. Pointing is equally difficult all over the motion space, including the target area. This, too, doesn't always hold for touch interfaces. Areas that you have to touch with the full width of your thumb, or with the thumb awkwardly bent, are in fact harder to point at precisely than those areas which can be touched with the tip of your thumb.
  4. The subject performs the motion as their only task, not as part of a wider sequence of interactions with a specific objective. In other words, all Fitts' subjects did was move as quickly as possible. Two of his three experiments involved some additional manipulation, but not much of it.
  5. The motion involves limited manipulation -- touching a target area or grabbing objects

None of these things invalidate any of Fitts' conclusions. Even where he didn't explicitly account for them, e.g. in #2 and #3, it's trivial to amend the formula so that it still holds.

But they do hint at a few things that you need to consider when using Fitts' model.

First, the difficulty index refers only to the difficulty of motion, not to the difficulty of the overall task. Reducing the difficulty index of hitting every single widget in an interface by 50% does not necessarily make that interface "easier to use" by 50%. In fact, as I will argue shortly, it doesn't necessarily make it easier to use at all!
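The following back-of-the-envelope Python sketch shows why halving \(ID\) does not halve the time a task takes. It assumes the linear relation \(MT = a + b \cdot ID\) with hypothetical constants. It models a task as one pointing motion plus a fixed amount of time spent on everything else (reading, deciding, typing). The constants and the 2-second figure are made-up assumptions, not measurements.

```python
# Hypothetical Fitts constants: intercept a (s) and slope b (s per bit).
A, B = 0.1, 0.15
# Hypothetical time spent on non-pointing work within the same task (s).
OTHER_WORK = 2.0

def movement_time(index_of_difficulty: float) -> float:
    """Mean movement time under the linear model MT = a + b * ID."""
    return A + B * index_of_difficulty

def task_time(index_of_difficulty: float) -> float:
    """One pointing motion plus a fixed amount of non-pointing work."""
    return OTHER_WORK + movement_time(index_of_difficulty)

before, after = 4.0, 2.0  # ID cut by 50%
print(f"movement time: {movement_time(before):.2f} s -> {movement_time(after):.2f} s")
print(f"task time:     {task_time(before):.2f} s -> {task_time(after):.2f} s")
```

Under these assumptions, cutting \(ID\) in half shortens the motion itself by about 43% (0.70 s to 0.40 s). It shortens the whole task by only about 11% (2.70 s to 2.40 s).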

Second, and especially on devices where motion and pointing difficulty are non-uniform, the \(ID\) associated with a target area of a given size varies with widget placement and motion patterns. An assessment of \(ID\) that doesn't take target placement and motion patterns into consideration should be treated with some reservation. Not because it would be consistently false, but because it may turn out that the \(ID\) is not sufficiently low for the patterns it's subjected to, or that the improvement it brings over previous iterations is negligible when subjected to real-world use.

That's hardly an inherent limit of the model: we already know that \(ID\) varies with distance, so of course it varies depending on where cursor motion starts and where the widget is placed. There are other known mechanisms at work, too. On touch interfaces, a button of a given width will be harder to hit if reaching it requires you to bend your thumb into an awkward position.
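One way to fold motion patterns into the assessment is to weight the \(ID\) of a target by how often each starting position actually occurs. Since mean movement time is linear in \(ID\), this is equivalent to averaging the predicted times. The one-dimensional Python sketch below compares two hypothetical layouts under two hypothetical usage patterns. The first layout is a 24 px button at x = 100. The second is a larger 40 px button pushed further out, to x = 400. In a real assessment, the starting positions and their frequencies might come from interaction logs. The function names are made up for illustration.

```python
import math

def index_of_difficulty(distance: float, width: float) -> float:
    """Fitts' original formulation: ID = log2(2D / W), in bits."""
    if distance <= 0:
        raise ValueError("the cursor must start away from the target's center")
    return math.log2(2 * distance / width)

def expected_id(center: float, width: float, starts: dict[float, float]) -> float:
    """Frequency-weighted mean ID over cursor starting positions (1-D, in px).

    `starts` maps a hypothetical starting position to its relative frequency.
    """
    total = sum(starts.values())
    return sum(freq * index_of_difficulty(abs(center - pos), width)
               for pos, freq in starts.items()) / total

layouts = {"24 px at x=100": (100.0, 24.0), "40 px at x=400": (400.0, 40.0)}
patterns = {
    "mostly from x=150": {150.0: 0.8, 900.0: 0.2},
    "mostly from x=900": {150.0: 0.2, 900.0: 0.8},
}

for pattern_name, starts in patterns.items():
    for layout_name, (center, width) in layouts.items():
        print(f"{pattern_name}, {layout_name}: {expected_id(center, width, starts):.2f} bits")
```

With these made-up numbers, the smaller button wins when most motions start nearby (2.86 vs. 3.84 bits). The larger one wins when most motions start far away (5.26 vs. 4.44 bits). Neither layout is better in the abstract.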

At a higher level of abstraction, it's also important to remember that Fitts' model is behavioural, so any reasoning about underlying causes should be treated with skepticism. As with any behavioural model, we aren't very sure why it's right. We can speculate on the underlying mechanisms, but it's only speculation, and so is any conclusion drawn from it.

Fitts himself was fairly reserved about the underlying mechanisms that shaped his model. His model was developed based largely on intuition and analogy. Fitts' experiments sought as much to explore as they sought to validate his model. 65 years of progress in biology and neuroscience have brought us closer to understanding why Fitts got the numbers that he got, but we're not there yet.

The art of making trade-offs

With this in mind, I think we can tackle the next two points in my argument: that modern UX practice neglects the limits of Fitts' model and, more importantly, that it fails to account for other metrics and, consequently, fails to make informed design trade-offs.

Models and their limits. Before we use a model to make an engineering decision, we need to verify that it holds. This is true of any model, from any branch of engineering.

For example, there is a rule of thumb in electrical engineering which says that the lumped-element circuit model breaks down if a transmission path or a circuit element is longer than one tenth of your signal's wavelength. But that's just a ballpark figure. In practice, what you usually do if you're anywhere near that magic number is you pick a few relevant quantities in your circuit, run numbers with both models (lumped-circuit and transmission lines), and if they differ too much, you go pour yourself a strong cup of coffee 'cause it's gonna be a long evening at the office.

Unfortunately, we have no similar rule of thumb that says where Fitts' model breaks down.

If you think about it, a lot of real-life usage scenarios don't really tick all the boxes of Fitts' experiments. Rapid, unconstrained, largely repetitive motion performed at a subject's highest capacity is a very infrequent occurrence.

Does that make Fitts' model useless? Definitely not. But I think it does make blanket statements of the form "we need to make widgets bigger because Fitts' law says they're easy to hit" largely useless.

First, it makes them useless because, without at least a qualitative assessment of the degree to which Fitts' model applies, it's not even clear if the prediction is useful.

A complex analysis may not be called for every time. But even a basic qualitative analysis is better than nothing. It makes sense, for example, to think that predictions about increasing the size of the "OK" button of an information dialog are going to be more trustworthy than predictions about increasing the size of widgets in a complex structure. Clicking the OK button you're probably seeing for the fifteenth time today is closer to rapid motion performed at the highest possible rate than clicking a date in a calendar widget.

Second, without assessing if (and why) the metrics of Fitts' model are relevant to a given scenario, these statements aren't too informative, either.

Being easy to hit is an important quality for any widget. But, for example, so is being visible. There's also a law which states green, yellow and blue are the most easily perceived colours under normal lighting conditions, but we don't make all widgets green, yellow and blue. There are plenty of widgets for which "visible enough" is sufficient. It makes sense that there would also be plenty of widgets for which "easy enough to hit" is sufficient.

Sure, Fitts' law tells us that a larger widget is easier to hit than a smaller one situated at the same distance. But is that a relevant metric for all widgets? Even the ones that aren't too frequently used? Even for menu items or list items? Even for menu items or list items regardless of the size of the menu or list?

And this brings me to my second point: trade-offs.

Fitts' law should be used to make informed trade-offs, not argue for maximizing a single metric. Here are two things to consider:

a) Increasing the size of widgets does not result in a uniform decrease of the difficulty index
b) Decreasing the difficulty index does not necessarily make all tasks easier to perform

For at least one of these things, you don't have to take my word for it! Let's run some numbers!

a) Increasing the size of widgets does not result in a uniform decrease of the difficulty index

Consider the dialog below, drawn to scale. Buttons have a width \(W\) of 24 pixels (width in terms of motion; in our figure, it's the height). The spacing between buttons is 4 pixels:

A little cramped, isn't it?

The user has just clicked button 2, and wants to click button 3.

If the pointer is right in the middle of button 2, the distance to the center of button 3 is:

$$D = \frac{W}{2} + \frac{W}{2} + \text{spacing} = 24 + 4 = 28\ \text{px}$$

and the difficulty index of hitting button 3 is:

$$ID = \log_2\frac{2 \cdot 28}{24} \approx 1.22$$

Now, in an effort to make the application easier to use, let's say we redraw it like this:

Much better!

The buttons are now 40 pixels high (16 pixels of padding above and below a label that is still 8 pixels high), and the spacing between the buttons has been increased to 8 pixels to provide visual breathing room and visual guidance, and to minimize the chance of hitting the wrong target.

The distance between the centers of two adjacent buttons is now \(20 + 8 + 20 = 48\) pixels, so the difficulty index is:

$$ID = \log_2\frac{2 \cdot 48}{40} \approx 1.26$$

The buttons are now two-thirds taller, but they're just as hard to hit. In fact, they're slightly harder! What gives?

Well, if you look at the numbers, what happened is that increasing padding also increased the distance that the mouse has to travel!
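The arithmetic is easier to see symbolically. Consider two adjacent buttons of (motion) width \(W\) separated by a gap \(s\), with the cursor starting at the center of one of them:

$$D = \frac{W}{2} + s + \frac{W}{2} = W + s \quad\Rightarrow\quad ID = \log_2\frac{2(W + s)}{W} = 1 + \log_2\left(1 + \frac{s}{W}\right)$$

For this motion, \(ID\) depends only on the ratio \(s/W\). That ratio is \(4/24 \approx 0.17\) in the first dialog and \(8/40 = 0.2\) in the second, which is why the larger buttons came out slightly harder to hit. Enlarging the buttons only helps here if the gap grows proportionally less than the buttons do.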

It's certainly possible to change the size of the buttons and spacing between them and get a better \(ID\) figure, but -- depending on the initial position of the cursor and on where the widgets are placed (which influence the distance term) -- not just any increase will do!
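Here is a short Python sketch that reproduces the two values computed above and tries two more hypothetical layouts. Like the example, it assumes the cursor starts at the center of one button and moves to the center of the adjacent one, so \(D = W + s\). The function name is made up for illustration.

```python
import math

def adjacent_button_id(width: float, spacing: float) -> float:
    """ID of moving from the center of one button to the center of the next.

    Assumes buttons stacked along the motion axis, `width` being their extent
    along that axis and `spacing` the gap between them (both in px).
    """
    distance = width / 2 + spacing + width / 2
    return math.log2(2 * distance / width)

layouts = [
    (24, 4),   # the cramped dialog
    (40, 8),   # the roomier redesign
    (40, 4),   # hypothetical: larger buttons, original gap
    (40, 10),  # hypothetical: larger buttons, even more breathing room
]
for width, spacing in layouts:
    print(f"W={width:2} px, s={spacing:2} px -> ID = {adjacent_button_id(width, spacing):.2f}")
```

Under this assumption, only one layout improves on the cramped dialog's 1.22: 40 px buttons with the original 4 px gap, at about 1.14 bits. It is the one that keeps the gap small relative to the buttons.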

This brings me to my earlier point about usage, and specifically about motion patterns. There's a set of conditions under which increasing the width of a widget results in an improvement of the \(ID\). Increasing the widget's size without knowing if these conditions match real-life scenarios is, at best, unproductive.

Any reasoning which involves a mathematical model but no numbers (or at least some analytical estimates!) and no analysis of its initial assumptions should be treated with skepticism. Fitts' law is non-linear. Its extensions to 2D pointing and to trajectory-based tasks (in the form of the steering law) give you models that are even harder to manipulate in strictly intuitive terms. For all but the most trivial interactive structures (a single window with a single button), Fitts' model -- correctly! -- predicts things that are actually a lot more complicated than "larger widgets are easier to hit".

b) Decreasing the difficulty index does not necessarily make tasks easier to perform

Let's think of a book reader application. It's a very simple application with only three widgets: a text view and two buttons, one that moves to the next page and one that moves to the previous page. The buttons are frequently accessed, and the user probably wants to hit them as quickly as possible, so as not to interrupt the flow of their reading.

Minimizing the \(ID\) of hitting the buttons would lead us to an interface that looks like this:

The world's worst book reader app also has the lowest difficulty index.

But, as anyone who's ever had to read more than 40 characters on a 2x20 LCD will tell you, reading a book in that format is a terrible idea. Especially if it has any kind of pictures or tables.

Of course, that's an extreme example. No one in their right mind would make a book reader app that looks like that. I deliberately chose an example that's extremely broken, so that I can make the point I want to make without having to work out any numbers (I'll get to why I'm avoiding numbers in this case in a minute).

This design is so hopelessly bad because the benefit of the low \(ID\) for the widgets is outweighed by the increased difficulty in reading -- which is what a book reader application is supposed to be about in the first place. The discrepancy is obvious in this case: you don't need to run any numbers to tell that this is a terrible design.

But evaluating a design that's less obviously wrong is a very complicated ordeal. Some of the usability impact of doubling a button's width is easy to quantify. But we don't have any similarly useful and straightforward model to help us estimate the usability impact of increased scrolling.

There is some literature which suggests that scrolling has a negative impact on comprehension, such as a paper by Sanchez and Wiley. However, most research has been focused on the reading and comprehension of long, complex pieces of text -- useful if the widget is a text view for a book reading application, but not immediately applicable to things like scrolling through a list view of processes. Furthermore, the available evidence is quite conflicting. Some research suggests that scrolling does not significantly affect comprehension. The only result on which there appears to be a wide consensus is that additional scrolling makes reading slower. In any case, as far as I know, there is no model comparable to Fitts' which allows us to quantify any of these effects. That's why I wanted to avoid running numbers earlier: to my knowledge, there's no model that would allow me to do it.

Increased scrolling is just one of the things we don't have good metrics for -- not only in text views, but also in list and tree views. There is plenty of research which suggests that the difficulty of picking the right item increases with the size of the list, but we don't know how much of it is due to the increased number of choices and how much of it is due to the amount of scrolling required to get to it. Common sense would tell us that most of the effect is due to the number of choices, but is that proportion constant? Is there a threshold where increased scrolling starts to really matter? There's evidence that menu search is both random and systematic, which suggests that how much of a list is shown at a time can affect the amount of time it takes to find an item. Plus, when it comes to scientific matters, common sense has an absolutely disastrous track record.

We're also not very sure how to quantify the impact of increased eye motion, which occurs when displaying very large items on large screens. To make matters worse, what we do have is literature that suggests it's going to be pretty hard to devise a good experiment to measure that, since eye motion also seems to be correlated with mental effort.

We also lack mathematical models that would help us put these things in context. For example, how do the results of Fitts' law apply to motion performed by a subject whose visual focus is on a narrow area, and/or who is expending significant mental effort on a more complex task -- such as an artist using a 3D modelling program? Do distance and width matter in the same proportion, or does the difficulty index skyrocket after a certain distance? Does the average time required to perform a task still depend linearly on the \(ID\), or is there a point after which further decreases in the \(ID\) have a negligible effect, because the user will simply never move too quickly? Or, au contraire, is the \(ID\) even more relevant, because the user is focused on something else, and they find it even more difficult to hit a small target?

All of us have answers for all these questions, based on our own experience, but not mathematical models. As such, our answers are only opinions. There's nothing wrong with having an opinion, but we shouldn't mistake it for an analysis.

All these things make it difficult to make an informed trade-off. When we have a single useful metric, it's tempting to optimize it at the expense of things we're unable to quantify, which therefore quickly disappear from any estimates that involve numbers.

Does that mean we should just shrug and maybe make our buttons just a little smaller -- I mean, larger than the claustrophobic buttons of the Amiga, but smaller than is fashionable today? Shall we just average the padding values of all major platforms and hope we strike an optimum value?

Well, not really. Operating in the absence of models (or in the absence of models that are easy to manipulate) is nothing new in the realm of engineering. Twenty years ago, accurately modelling the interconnects of high-frequency integrated circuits was a very complicated (if not intractable) problem, but we've been building high-frequency integrated circuits for about twice as long as that.

How did people manage? Through a combination of:

  • Tortuous prototyping and measurement cycles, where you'd basically take your best guess then measure to see if you were right
  • Cautiously applying qualitative and quantitative reasoning together
  • Extrapolating from simpler models

One way or another, we manage -- but only as long as we remember that every change we make will be a trade-off across various metrics. A net improvement of a single metric, with no detrimental effect on other metrics, is a very rare occurrence in engineering.