Wednesday, November 17, 2010

Turing completeness is a trap

Or why you can do everything in Excel but it may not be a good idea.

My wife's struggle with a spreadsheet from work yesterday got me thinking about this topic again. It has perplexed me for a long time that people use Excel for so much that other tools are clearly more suited for. There are many rants on the Internet about using Excel for database-like activities, and I will probably write a bit more on that later. However, I think the key phrase in the overuse of any powerful tool is "but I can do that in my tool, too".

What I mean is that one could approach someone who is an expert Excel user and say "I don't like Excel for engineering calculations because it doesn't allow me to track the units of the numbers like for instance Frink or Mathcad does". This guy would say "but you can do that in Excel - just do X or Y". Somewhere they are thinking "I will use another tool when I find something that Excel can't do". However, they are stuck in a Turing Tarpit. The problem is that all Turing complete languages are equivalent in power in this strange and abstract sense that it is possible to do the same calculation in both. This does not say anything about how easy it will be to do that calculation, or how maintainable the code will be - these are requirements that have very little to do with the computational power.

So, you if you are taking the view that you will use Excel as long as it is possible to do the thing you want to do, you literally don't need any other tool. Unfortunately the same can be said about any esoteric programming language like INTERCAL or x86 assembly. It is perfectly valid to say "but I can do that calculation in INTERCAL - I don't need any other languages". In a strict sense this is true, but this really just points out how little Turing completeness actually buys you in terms of useful programming structures.

At this point it seems like a good idea to mention the other thing that keeps people in Excel. I think the marketers of spreadsheets have been using the "no programming required" line for so long that everyone thinks that spreadsheets are different from programming. In fact, they can be understood as functional programming languages with a two-dimensional (or three-dimensional if you take sheets into account) structure. Of course, there's also VBA if you aren't sufficiently resourceful to figure out how to do everything using only the built-in functions and cells, but the spreadsheet itself is also a computational device. So a lot of people think that "using Excel" is different from "programming", when it differs mainly in the environment.

Bottom line: if you are a single-language kind of person (who only uses Excel or only uses C) you need to understand that it is true that you will never find a program that your favourite tool won't be able to solve in a strict sense, but that it is also true that other tools may make it a lot easier.


  1. iWork numbers, with its multiple named tables on the same "sheet" makes it almost useable in database like scenarious....

    I must say I use a great variety of tools, but not so many "languages", other than BASH, PHP and applescript.

    Then again, I don't solve HARD engineering problems, and I do tend to use excel/Numbers as a crude swiss army knife for model building.

    Luckily very little of what I do involves searching complex solution spaces or optimisation or simultaneous linear equations or multiple dynamic equations.... I do more projections, and financial modeling.

    Most SIMPLE statistical type modeling can be done in excel, and I know there are monte-carlo simulators available for excel.

    Also Pivot Tables in excel are pretty useful for DB like / OLAP style multi-dimensional database analyses - although I'm no expert.

  2. The spreadsheet metaphor is good if you have largely unstructured tasks. It helps /you/ to do the analysis, while handling the minor bookkeeping. Spreadsheets are great for that, make no mistake. It is also true that Pivot tables and add-ons like Solver for Excel are very powerful. In fact, I believe Excel is Turing Complete even without VBA. So I don't think spreadsheets are evil or should be avoided.

    That said, /relational/ queries (with joins etc) are brittle to set up in Excel, formulas don't typically adjust well when you add data to the bottom of tables, the dimensionality of stuff is usually fixed - these things are handled much better by full-on relational databases. In addition, the whole structure of spreadsheets make it difficult to re-use solutions in ways that don't involve effectively dupicating the codebase: I cant "inherit" from a standard spreadsheet class and just add my own formulas in a way that will incorporate changes to the parent class when it changes later, I can't build up a nice library of spreadsheet patterns that can be shared with other people and centrally maintained.

    So, yes, they're powerful for quick calculations, and because of their power you'll never "need" another tool. That's the whole problem, as it often means people keep using excel for everything when there are tools that suit their particular problem better.

  3. It's also worth mentioning that excel is technically not touring complete when you consider the sheet size limitations. And even before you get there excel can start getting glitchy. I once attempted to perform statistical analysis on a very large data set using lots of sumifs() functions and other complicated formulas. I started noticing random anomalies in the data, and realized that I really couldn't trust the results. Perhaps I'll try it again in R or MATLAB.

    To further underline this point, here is an article where they built a literal touring machine in excel, and it makes it pretty clear how touring completeness doesn't get you very far if you have finite time (it's also pretty cool):