Spreadsheets in the Academic World
It’s been a pleasant surprise for me to see how much interest the world of spreadsheets arises within the academic community. Since I finished my degree, back in 1993, and I began my professional life, I have worked with Lotus 1-2-3, Quattro Pro, and then Excel, in all its different versions. During all this time my focus in using spreadsheets has always been quite, let’s say practical and particularly in the field of civil engineering, my sole profession and interest for 17 years.
When I jumped off the corporate boat a few years ago, I began to take my interest in Excel towards some more theoretical fields, and mathematical curiosities: probability, cryptography, algorithms; and I also began to disregard the usefulness of my developments in the short term. At the Excel forums in Spanish, which since then have been my main field of front-line spreadsheet related action, almost everything revolves around the problems that Excel users encounter when they try to get through their everyday needs at the office (and in most cases around the painful and inappropriate use of Excel as a small data base management system). But at SEMS I’ve found that spreadsheets are also a matter of concern in the academic world, and that quite a few people are doing their research and PhD around this subject and even proudly referring to it as a subset of the general software engineering area.
Having Time on Your Side
Clearly, for those of us who, by the force of habit, have become used to taking straightforward approaches for solving spreadsheet problems, and to hearing the voice of our boss saying: “It has to be on my desk at noon, no excuses!”, the academic approach might appear a little relaxed and even, as some other SEMS-EUSPRIG attendant has told me in private, lacking practical aim. But this is only a slant viewpoint and I find it OK that universities keep on being a kind of ivory tower where new perspectives are nurtured to tackle old problems and prevent new ones, without the pressures of corporate time, always damned and precarious.
Error Prone Spreadsheets
It’s no matter of wonder that almost in all cases, the main concern of the researcher is the avoidance of errors to which, lamentably, spreadsheets, perhaps because they allow so much freedom to the user, seem to be so prone. Now, I will not go over on my own opinions about why I think that Excel or spreadsheets in general, are so full of errors, but I will try to summarize the contributions of the speakers at SEMS-EUSPRIG.
Ray Panko spoke brilliantly and eloquently about the problems related to human errors, the actual driving force behind spreadsheet errors. Human errors are, on average, the same across fields and, as Ray himself pointed out, there are no silver bullets in the field of software engineering, where between 30 and 50% of budgets are regularly spent in testing. Until the same thing happens in spreadsheets, errors will continue to show up at the present rate. And what rate is that, you may be asking? Well, my friend, you’d better hold steady, because in general, 91% of spreadsheets contain errors. That’s why it’s so worrying that, as Bas Jansen, who is making his PhD at Delft University, said in his talk, important decisions keep on being made at Boards of Directors on the strength of some information driven from untested spreadsheets.
One great source of errors in Excel is the combination of absolute freedom to link cells to each other (including links between different sheets and workbooks) and Excel’s weak capabilities for tracing and debugging dependencies. Sohon Roy, another PhD student at the TU of Delft, spoke about how important the cell dependency problem is (around 20% of all spreadsheet errors are related to cell dependencies), and explained how little improvement has been made in this respect during the previous years, in spite of there being a lot of potential.
Lack of Samples from the Real World
This is another recurrent complaint that I have noticed in the academic world, at least in the sample of the academic world that I have met in Delft (Sorry for the play on words). Companies are, logically, reluctant to share their spreadsheets because they are afraid that some compromising data could be revealed. The spreadsheets built in university departments to supplement this lack only represent artificially created problems and do not always relate to the real world. Real samples from different industries are required to perform extended and statistically significant analysis.
That’s the focus of Anonymous XL, a plug-in developed by Felienne Hermans, a professor at TU Delft and also the person in charge of the excellent organization at SEMS-EUSPRIG 2014, and her team, that erases metadata and scrambles or changes data inside the worksheet, apparently without spoiling the functionality and the significance of the model by means of leaving some “magical” numbers untouched: 0,1,2,6,12 and some other interesting strategies.
Auditing and Controlling Spreadsheets
This is another big concern of the academic community; how to effectively implement quality control in the models that are built with spreadsheets. Birgit Hofer, from the Graz University spoke about this, and introduced a concept new to my ears: mutant spreadsheets. Mutation in this case means making changes to formulas and see how the model works and whether it’s still consistent within certain parameters that the researcher can establish beforehand.
For people like me, outside of the academic world, it can be sometimes tricky to understand what a researcher or scientist understands as a spreadsheet model. I must confess I didn’t understand much of what Richard Paige said about how they are using some very specialized languages, like “epsilon”, wherewith they are being able to query spreadsheets and use them in engineering models with reliability. He said that they have built plugins for Excel and GoogleXL. Jacome Cunha, from Minho University (Braga, Portugal) also spoke about how important it is to build models whose structural growth can be automated in order to avoid manipulation errors. Jacome has built some pretty interesting utilities and add-ons using Java, that can help various tasks in spreadsheets, from pointing out to what Jacome calls bad-smelling cells (by colouring them) to query and build consistent models in Excel, OpenOffice Calc and GoogleXL.
Daniel Kulesz, from Stuttgart University, spoke about another Add-in that he and his team have developed. Now, Daniel has focused mainly in testing models that are already built, establishing some range of expected results for output cells and having the spreadsheet warn the user by colouring the output cell and pointing out to suspicious cells. This is founded in the old idea that I was educated in when I finished my degree in structural calculus, that the designer always has to have command of his field of study to a level that allows them to have a rough idea of the result by doing manual calculations, and the computer will only confirm or deny that fact, but never be used as a blindly trusted source of design data.
Some of the EUSPRIG talks were dedicated to specialized models, two of them in the field of financing construction projects (the people from Invantive and also Haydn Palliser, from Corality). Henk Vlootman spoke mainly about how to take advantage of the correct use of ranges in Excel and there were the talks of the alternative track that I missed, among them Scenario Tools, by Tony De Jonker (my friendly “rival”) and two people from Microsoft spoke in the evening about new features that keep on being added to Excel 2013, with an interval of 15 days, and particularly for the on-line version. The desktop version will keep its three-year cycle, so we should still wait a couple of years for Excel 2016. Let’s hope that some complaints are going to be listened, such as the one somebody made about the POWER utilities and the INQUIRE add-in only being available for the professional versions.
There was a very interesting session called DEMOS, where the audience split into different groups and some people presented their add-ins and utilities, mainly related to auditing Excel files, colouring cells according to their contents and function, mapping structures and linked cells, finding calculation bottlenecks, i.e. FastExcel V3, XLTest and many others.
In sum, SEMS-EUSPRIG 2014 has been a wonderful experience for me and has brought the realization of the potential of spreadsheets to become fully fledged members of the academic curriculum: objects of study worthy of the attention of researchers and subjects of PhD thesis. As for spreadsheets in the real world of industries and corporations, and for Excel in particular, I think Felienne’s remark is true, that they are going to be around for many years, as long as all pieces of specialized software, whose main purpose usually is, curiously enough, to avoid errors by constraining the freedom that Excel allows, keeps being endowed with a “Export to Excel” option. It’s there, in Excel, where the final report that reaches the board of directors is going to be prepared. It’s on the strength of that Excel report that the decision will be made.
Besides I’ve had the opportunity to greet people that I had met at the London 2006 Excel User Conference (Charles Williams and Patrick O’Beirne) and I’ve met a lot of nice and interesting people from many parts of the world. Of course, I’ve also enjoyed the Dutch legendary kindness and politeness, for whose truth, now, I can fully attest, and in the case of Delft I’ve also enjoyed its peaceful and tranquil lifestyle founded in the extensive use of bicycles, a thing so dear to me and so alien to us, here in hilly, rigorous, hectic Madrid. My congratulations and thanks to Felienne and the rest of the organizers.
And no. I haven’t forgotten to speak about the man who made music with spreadsheets, Thomas Levine. Only he was somewhat mysterious and cleverly avoided a well-planned cross examination (the well-known, indispensable question at these meetings: What do you do?) that I deployed at lunch time. He only revealed that he was an American, lived in the New York area (I’m not sure about this) and at some point in time did some work for the World Bank. But however little information about himself he was willing to unveil, he couldn’t hide his excellent and kind nature and, above all, he gave us a different vision on data, a vision that I liked, a new perspective on tables that besides the everyday, usual, ordinary visual interpretation, forecasts the incorporation of our five senses to the interpretation of big sets of data: “musicalization”, “gastronomification”, “sculpturization” and “smellification” of data: a whole world of emotions and sensations!
By the way, I was also a speaker at EUSPRIG. Have you seen the reproduction of my talk? No? Then why not taking a look now! Subtitles are ready both in English and Spanish.
Exploring The Limits of Excel
Un viaje de exploración a las fronteras de la hoja de cálculo más usada: Microsof Excel.
Un análisis de errores potenciales y límites de tipo estructural, operacional y de programación.
Redactado en inglés sencillo y accesible.