Software Development

Sometimes I miss Haskell’s immutability

I have ‘fond’ memories of tracking down a particularly PITA bug in some Python code.  It was down to whether using = makes two references to the same underlying object, or whether it makes a copy of the object in question. And equally fond memories of debugging a ruby function that changed a hash unexpectedly.

This sort of thing:

>>> arr = [1,2,3]
>>> arr2 = arr
>>> arr
[1,2,3]
>>> arr2
[1,2,3]
>>> arr == arr2
True # Values are equivalent, objects are also the same
>>> id(arr)
4469618560 # the id for both arr and arr2 is the same: they are the same object
>>> id(arr2)
4469618560
>>> arr2.append(4)
>>> arr
[1,2,3,4]
>>> arr2
[1,2,3,4]

compared to

>>> arr = [1,2,3]
>>> arr2 = [1,2,3] + [] # arr2 is now a different object
>>> arr
[1,2,3]
>>> arr2
[1,2,3]
>>> arr == arr2
True # Values are the same even though the object is different
>>> id(arr)
4469618560
>>> id(arr2)
4469669184 # value is different
>>> arr2.append(4)
>>> arr
[1,2,3]
>>> arr2
[1,2,3,4]

Or, in a case that you’re more likely to see in the wild

>>> arr = [1,2,3]
>>> d1 = {"foo":arr, "bar": "baz"}
>>> d2 = d1
>>> d3 = dict(d1) # Creates new dict, with its own id
>>> id(d1)
140047012964608
>>> id(d2)
140047012964608
>>> id(d3)
140047012965120
>>> arr.append(4)
>>> d1["bar"] += "x"
>>> d1
{'foo': [1, 2, 3, 4], 'bar': 'bazx'}
>>> d2
{'foo': [1, 2, 3, 4], 'bar': 'bazx'}
>>> d3
{'foo': [1, 2, 3, 4], 'bar': 'baz'}

d1 and d2 are the same dict so when you change the value of bar, both have the same result, but d3 still has the old value. But, even though the dicts are different, the arr in them is the same one. So anything that mutates that list will change it in all the dicts

This sort of behaviour has its logic but it is a logic you have to learn, and a logic you often have to learn the painful way. I particularly ‘enjoy’ cases where you’ve gone to great lengths to make sure the object is a copy and then find that something inside that list gets mutated.

It doesn’t have to be like this.

In what is probably the most beautiful language I’ve ever used (though sadly only in personal projects, not for work), Haskell, everything is immutable.

If you’ve never tried Haskell this might sound incomprehensible, but it’s just a different approach to programming. Two good links about this:

Immutability is Awesome

What does immutable variable in Haskell mean

This language feature eliminates a whole class of bugs related to changing an object unexpectedly. So I’d encourage any software developers to try to get their heads around a language like Haskell. It opened my eyes to a whole different approach to writing code (e.g. focussing on individual functions rather than starting with a db schema and working out from there).

Software Development

Learning Haskell gave me a whole new outlook on programming

A while back I decided to learn Haskell as a counterpoint to Ruby/Java etc that I was more familiar with previously. I am very grateful for the new perspective that learning Haskell has given me. In particular:

  • I always used to start with database tables and a user interface. Haskell forces you to think about the functions and how they work on data structures. Not having to think about data storage is strangely liberating as it is much easier to change your data structures if there is no UI or database to worry about.
  • It might need a lot more thinking to write a piece of code, but by the time you’ve finished it invariably feels very elegant.
  • Writing pure code and then wrapping IO around it later really forces you to write code that is simpler and more testable.
  • I have an appreciation of how intimidating jargon can confuse and put people off, unnecessarily. For example, Haskell purists might want you to grok Category Theory even if it’s not that relevant to writing Haskell apps.

The learning curve has been ridiculous. Have a look at Tymon Tobolski’s comparison of using Ruby vs Haskell in a small Redis application. The Haskell version is terse to the point of being incomprehensible at the outset. As a Haskell learner I couldn’t find something that packed a similar “wow” to “A blog in 15 minutes with Ruby on Rails”

Here’s a small Scotty/Warp Haskell app I put together recently. (Scotty is Haskell’s answer to Sinatra for you Rubyists).

Software Development

I Wrote a Software: My Tech Stack

I’ve been writing recently about a side-project of mine that I’ve been doing to scratch a couple of itches I have: Using Excel as a Collaboration Tool in an Enterprise Setting and Seeing what all the fuss is about Haskell.

I started off, reasonably enough, by trying to write a web app. But getting started with anything in Haskell is hard enough: getting started with a website was a nightmare [1]. So I made a virtue of necessity. I figured if I’m going to make it easier for people to collaborate with Excel then what is the most natural way for people to collaborate with Excel in real life? The answer is simple: by sending email attachments.

So I forgot about any web interface (for now) and just concentrated on working with Excel Attachments on Emails. It meant I could just focus on learning Haskell and also forced me to work in a way that is going to be more natural for anyone who might use the service. And also it turns out that I’m bang on trend with this whole “No UI” movement [2].

In this post I’m going to get into what I’m using on this project and why:

  1. Ubuntu. I run this on the server simply because I’m used to using Ubuntu on the desktop from the past. I’ve run Ubuntu in production for a long time and it’s fine for me. (I’ve also run CentOS and Windows and going back far enough HP-UX and Solaris, but Ubuntu just seems… nice).
  2. Linode. Not all that much thought on this one. I’ve used AWS in the past and find the interface unbearably complex. I’ve used Rackspace Cloud with a lot of success. But I’ve also been curious as to how Linode and Digital Ocean stack up. I asked Twitter and I got an answer from someone in the Linode community. And so here I am.
  3. Postfix/Dovecot. I never imagined running my own mail server. I originally wanted to run all the email through Gmail. But I kept getting blocked for having a script running against the Gmail servers. In the end I figured I may as well run my own. Remains to be seen how well that works out. And (perhaps coincidentally) the best guide I could find to setting up Postfix/Dovecot was from Linode.
  4. MongoDB. I know that Mongo is no longer the cool kid on the block. But given that my use case for data storage is pretty straightforward, but I expect to need to store large Excel files, Mongo seemed neat. I’m using Mongo as a queueing system as much as a persistent data store.
  5. Haskell. For the heavy lifting. Learning it has been a long, hard slog. I must have read Learn You a Haskell 15 times (I even bought a copy). And also Real World Haskell several times. But you only really learn by doing and for me the best doing came from trying to follow Write Yourself a Scheme and then by trying to figure out how the HaExcel guys did it. Hat tips must also go to HaskellLive for getting an environment set up (though this now seems to be superseded by Stephen Diehl’s excellent write-up).
  6. Plain vanilla Ruby. Because with the wealth of gems in the community you can do a lot very easily.

[1] My brother who also works in tech bought me a copy of Building Web Applications with Haskell and Yesod. He thought that Haskell and Yesod were two individuals who were doing the teaching. This is in stark contrast to Ruby which I learnt from Alan Bradburne’s excellent Practical Rails Social Networking Sites.

[2] A cynic may say that mailing lists have been doing this for donkeys years. To which my reply is that the surely this validates the use of email for sending instructions around.

Software Development

I wrote a software

It was born out of a couple of itches I needed to scratch:

  1. I wanted to learn Haskell [1]
  2. I wanted to make collaboration with Excel less painful

Excel is heavily used in organisations. It’s clunky, prone to users making errors and next to impossible to incorporate someone else’s changes. Especially once you start emailing Excel attachments to your co-workers. Yet people can’t give up on Excel. Apparently because it is a convenient way to lay out a tabular structure:

He typed the current date in the top of the spreadsheet, printed a copy, put it in a three-ring binder, and that was pretty much his whole, entire job. It was kind of sad. He took two lunch breaks a day. I would too, if that was my whole job.

Over the next two weeks we visited dozens of Excel customers, and did not see anyone using Excel to actually perform what you would call “calculations.” Almost all of them were using Excel because it was a convenient way to create a table.

Felienne Hermans of Delft University of Technology has some great research on the subject. In A modern day Pompeii: Spreadsheets at Enron she talks about digging around subpoenaed Enron emails to find that:

Over the 15 months that the email set spans, we counted 100 emails per day (!) involving spreadsheets. Some emails occurred double in the set, as both the sender and receiver were in the mailboxes acquired, so it would be more fair to say there were 100 spreadsheet email – interactions a day. But still! Talking about errors in the spreadsheets was also pretty common, 6% of all spreadsheet related emails contained word such as error or fault

We just can’t wean ourselves off Excel, despite all its shortcomings.

People have tried to solve this problem before. One of the key principles of the Enterprise Software industry is that it will replace your crappy, inefficient Excel-based processes with accurate, real-time information. Yet each new enterprise system is sufficiently inflexible in the real world that it spawns a whole set of Excel spreadsheets at either the front-end (forms or tables to collect data to feed into the machine) or at the back-end (download data to Excel to run your pivot tables for analysis).

If replacing Excel with a web-accessible database system isn’t going to do the trick, what if there were an easier way to address some of the issues people have with using Excel?

[1] Consider it “personal development”. Some people go on training courses. Some people try to pick up new programming languages and techniques. Functional programming has been completely eye-opening for me. Also it’s been great to remind myself of what developers have to go through on a daily basis even without pressure being exerted from an uncomprehending management layer.