Adventures in upgrading a Python text summarisation library

A reflection on what it took to upgrade a simple Python lib to support Python 3. The lib in question is PyTeaser and the final result is at PyTeaserPython3.

TL;DR

The moral of the story is:

  1. Don’t try to upgrade something unless you really need to. It rarely goes well. Even a simple library like this one can throw up all kinds of challenges.
  2. Automated tests really are crucial for making work like future upgrades possible. In this case I had some tests that seemed to work but that didn’t give me the full picture.
  3. Ultimately your program is most probably about turning one set of data into another. Make sure your tests cover those use cases. In this case I was lucky: there was a demo script in the project directory that I could use to manually compare results between the Python 2 and Python 3 versions.
  4. Even if all goes perfectly well you can end up with surprising results when the behaviour of an underlying library changes in subtle ways. So it can be worth having tests that check expected behaviour happens even when you are using standard, out of the box features.

Step By Step

Run the tests:

alan@dg04:~/PyTeaserPython3$ python -m tests
Traceback (most recent call last):
 File "/home/alan/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
 "__main__", mod_spec)
 File "/home/alan/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
 exec(code, run_globals)
 File "/home/alan/PyTeaserPython3/tests.py", line 2, in <module>
 from pyteaser import Summarize, SummarizeUrl
 File "/home/alan/PyTeaserPython3/pyteaser.py", line 72
 print 'IOError'
 ^
SyntaxError: Missing parentheses in call to 'print'

That didn’t go too well. In Python 3, print is a function rather than a statement.

There’s a utility called 2to3 that will automatically update the code.

alan@dg04:~/PyTeaserPython3$ 2to3 -wn .
RefactoringTool: Skipping optional fixer: buffer
RefactoringTool: Skipping optional fixer: idioms
RefactoringTool: Skipping optional fixer: set_literal
...
RefactoringTool: ./goose/utils/encoding.py
RefactoringTool: ./goose/videos/extractors.py
RefactoringTool: ./goose/videos/videos.py

alan@dg04:~/PyTeaserPython3$ git diff --stat
 demo.py | 6 +++---
 goose/__init__.py | 2 +-
 goose/article.py | 18 +++++++++---------
 goose/extractors.py | 8 ++++----
 goose/images/extractors.py | 6 +++---
 goose/images/image.py | 4 ++--
 goose/images/utils.py | 6 +++---
 goose/network.py | 16 ++++++++--------
 goose/outputformatters.py | 2 +-
 goose/parsers.py | 6 +++---
 goose/text.py | 4 ++--
 goose/utils/__init__.py | 10 +++++-----
 goose/utils/encoding.py | 28 ++++++++++++++--------------
 pyteaser.py | 16 ++++++++--------
 tests.py | 12 ++++++------
 15 files changed, 72 insertions(+), 72 deletions(-)

It’s obviously done some work – how does this affect the tests?

alan@dg04:~/PyTeaserPython3$ python -m tests
.E
======================================================================
ERROR: testURLs (__main__.TestSummarize)
----------------------------------------------------------------------
Traceback (most recent call last):
 File "/home/alan/PyTeaserPython3/tests.py", line 20, in testURLs
 summaries = SummarizeUrl(url)
 File "/home/alan/PyTeaserPython3/pyteaser.py", line 70, in SummarizeUrl
 article = grab_link(url)
...
 File "/home/alan/PyTeaserPython3/goose/text.py", line 88, in <module>
 class StopWords(object):
 File "/home/alan/PyTeaserPython3/goose/text.py", line 90, in StopWords
 PUNCTUATION = re.compile("[^\\p{Ll}\\p{Lu}\\p{Lt}\\p{Lo}\\p{Nd}\\p{Pc}\\s]")
 File "/home/alan/anaconda3/lib/python3.6/re.py", line 233, in compile
 return _compile(pattern, flags)
 File "/home/alan/anaconda3/lib/python3.6/re.py", line 301, in _compile
 p = sre_compile.compile(pattern, flags)
 ...
 File "/home/alan/anaconda3/lib/python3.6/sre_parse.py", line 526, in _parse
 code1 = _class_escape(source, this)
 File "/home/alan/anaconda3/lib/python3.6/sre_parse.py", line 336, in _class_escape
 raise source.error('bad escape %s' % escape, len(escape))
sre_constants.error: bad escape \p at position 2

----------------------------------------------------------------------
Ran 2 tests in 0.054s

FAILED (errors=1)

Some progress. One of the two tests passed.

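The root of the error in the failing test is this line: PUNCTUATION = re.compile("[^\\p{Ll}\\p{Lu}\\p{Lt}\\p{Lo}\\p{Nd}\\p{Pc}\\s]"). Python’s re module doesn’t support \p{...} Unicode property classes, and since Python 3.6 an unknown escape like \p is an error. It looks like the pattern isn’t used anywhere: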

alan@dg04:~/PyTeaserPython3$ grep -nrI PUNCTUATION
goose/text.py:90: PUNCTUATION = re.compile("[^\\p{Ll}\\p{Lu}\\p{Lt}\\p{Lo}\\p{Nd}\\p{Pc}\\s]")
alan@dg04:~/PyTeaserPython3$
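If the pattern had been needed, it would have had to be replaced rather than commented out. The following is only a sketch of one standard-library approximation, not code from PyTeaser or goose: the helper names compiles and strip_other_characters are hypothetical, and the filter assumes the pattern was meant to remove everything except letters, digits, connector punctuation and whitespace. (The third-party regex package does understand \p{...}, which would be another route.)

```python
import re
import unicodedata

# Unicode categories the original character class kept:
# letters (Ll, Lu, Lt, Lo), decimal digits (Nd), connector punctuation (Pc).
KEPT_CATEGORIES = {"Ll", "Lu", "Lt", "Lo", "Nd", "Pc"}


def compiles(pattern):
    """Return True if the standard-library re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def strip_other_characters(text):
    """Keep whitespace and characters in KEPT_CATEGORIES, drop the rest.

    Hypothetical stand-in for substituting the unsupported pattern with ''.
    """
    return "".join(
        ch for ch in text
        if ch.isspace() or unicodedata.category(ch) in KEPT_CATEGORIES
    )
```

Calling compiles on the original pattern is a quick way to confirm that the failure comes from the pattern itself and not from anything else in goose/text.py.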

Comment out the PUNCTUATION line and try again:

alan@dg04:~/PyTeaserPython3$ python -m tests
.E
======================================================================
ERROR: testURLs (__main__.TestSummarize)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/alan/PyTeaserPython3/tests.py", line 20, in testURLs
summaries = SummarizeUrl(url)
...
File "/home/alan/PyTeaserPython3/goose/text.py", line 91, in StopWords
TRANS_TABLE = string.maketrans('', '')
AttributeError: module 'string' has no attribute 'maketrans'

----------------------------------------------------------------------
Ran 2 tests in 0.061s

FAILED (errors=1)
alan@dg04:~/PyTeaserPython3$

I admit it was a bit optimistic to think that commenting out one line would do the trick. Now the problem arises when TRANS_TABLE is defined, and this is used elsewhere in the code.

alan@dg04:~/PyTeaserPython3$ grep -nrI TRANS_TABLE
goose/text.py:91: TRANS_TABLE = string.maketrans('', '')
goose/text.py:107: return content.translate(self.TRANS_TABLE, string.punctuation).decode('utf-8')
alan@dg04:~/PyTeaserPython3$

Fortunately someone put a useful comment into this method so I can google StackOverflow and find out how to do the same thing in Python3.

def remove_punctuation(self, content):
    # code taken form
    # http://stackoverflow.com/questions/265960/best-way-to-strip-punctuation-from-a-string-in-python
    if isinstance(content, str):
        content = content.encode('utf-8')
    return content.translate(self.TRANS_TABLE, string.punctuation).decode('utf-8')

Sure enough there is an equivalent question and answer on StackOverflow so I can edit the method accordingly:

def remove_punctuation(self, content):
    # code taken from
    # https://stackoverflow.com/questions/34293875/how-to-remove-punctuation-marks-from-a-string-in-python-3-x-using-translate
    translator = str.maketrans('','',string.punctuation)
    return content.translate(translator)
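As a quick sanity check of the Python 3 idiom, here is the same logic as a standalone function (outside the StopWords class, purely for illustration). One caveat worth knowing: string.punctuation only covers ASCII punctuation, so typographic quotes and similar characters pass through untouched.

```python
import string


def remove_punctuation(content):
    """Strip ASCII punctuation from a str using a translation table."""
    # The third argument to str.maketrans lists characters to delete.
    translator = str.maketrans("", "", string.punctuation)
    return content.translate(translator)
```

For example, remove_punctuation("Hello, world!") gives "Hello world".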

And now I can remove the reference to TRANS_TABLE from line 91 and run the tests again.

alan@dg04:~/PyTeaserPython3$ python -m tests
.E
======================================================================
ERROR: testURLs (__main__.TestSummarize)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/alan/PyTeaserPython3/tests.py", line 20, in testURLs
summaries = SummarizeUrl(url)
...
return URLHelper.get_parsing_candidate(crawl_candidate.url)
File "/home/alan/PyTeaserPython3/goose/utils/__init__.py", line 104, in get_parsing_candidate
link_hash = '%s.%s' % (hashlib.md5(final_url).hexdigest(), time.time())
TypeError: Unicode-objects must be encoded before hashing

----------------------------------------------------------------------
Ran 2 tests in 0.273s

FAILED (errors=1)
alan@dg04:~/PyTeaserPython3$

A bit of digging to fix this and the tests now pass.

alan@dg04:~/PyTeaserPython3$ python -m tests
..
----------------------------------------------------------------------
Ran 2 tests in 0.273s

OK
alan@dg04:~/PyTeaserPython3$
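The fix itself isn’t shown above. The usual remedy for this TypeError is to encode the string to bytes before hashing, so the change was probably along these lines. This is a sketch only: the function name link_hash is hypothetical and UTF-8 is an assumed encoding.

```python
import hashlib
import time


def link_hash(final_url):
    """Build the '<md5>.<timestamp>' string from a str URL.

    hashlib works on bytes in Python 3, so the URL is encoded first.
    """
    digest = hashlib.md5(final_url.encode("utf-8")).hexdigest()
    return "%s.%s" % (digest, time.time())
```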

Let’s see how the demo works.

alan@dg04:~/PyTeaserPython3$ python demo.py
None
None
None
alan@dg04:~

Hmm, that doesn’t seem right. Ah, I’m not connected to the internet, duh.

Connect and try again.

alan@dg04:~/PyTeaserPython3$ python demo.py
None
None
None
alan@dg04:~

Compare with the results from the Python 2 version:

[u"Twitter's move is the latest response from U.S. Internet firms following disclosures by former spy agency contractor Edward Snowden about widespread, classified U.S. government surveillance programs.",
u'"Since then, it has become clearer and clearer how important that step was to protecting our users\' privacy."',
u'"A year and a half ago, Twitter was first served completely over HTTPS," the company said in a blog posting.',
...

Something isn’t working. I track it down to this code in the definition of get_html in goose/network.py.

    try:
        result = urllib.request.urlopen(request).read()
    except:
        return None

In Python 3 the encoding of the URL causes a urllib.error.URLError. The bare except swallows it, so None is returned. I fix the encoding, but now a ValueError is thrown and caught in grab_link in pyteaser.py:

    try:
        article = Goose().extract(url=inurl)
        return article
    except ValueError:
        print('Goose failed to extract article from url')
        return None
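Both handlers shown above hide the real cause of the failure, which is why the demo printed None with no explanation. The following is a sketch of a more talkative version of the fetch step, not the change made in PyTeaserPython3: the name fetch_html and the opener parameter are hypothetical, with opener there only so the failure path can be exercised without a network connection.

```python
import logging
import urllib.error
import urllib.request

logger = logging.getLogger(__name__)


def fetch_html(url, opener=urllib.request.urlopen):
    """Return the response body as bytes, or None if the request fails."""
    try:
        with opener(url) as response:
            return response.read()
    except (urllib.error.URLError, ValueError) as exc:
        # Catch only the expected failures and record the reason,
        # instead of a bare except that silently returns None.
        logger.warning("Could not fetch %s: %r", url, exc)
        return None
```

With logging configured, the encoding problem would have shown up in the output on the first run of demo.py.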

A bit more digging: the ValueError is raised because the string ‘10.0’ can’t be converted to an int. I edit the code to use a float instead of an int in this case.
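The conversion problem is easy to reproduce in isolation. In this sketch the helper name parse_number is hypothetical, and where the value comes from inside goose is not shown here.

```python
def parse_number(text):
    """Parse a numeric string such as '10' or '10.0'.

    int('10.0') raises ValueError because int() does not accept a
    decimal point in a string; float() handles both forms.
    """
    return float(text)
```

If an integer is still needed downstream, int(parse_number(text)) truncates the float.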

Seems to be working now.

alan@dg04:~/PyTeaserPython3$ python demo.py
["Twitter's move is the latest response from U.S. Internet firms following "
'disclosures by former spy agency contractor Edward Snowden about widespread, '
'classified U.S. government surveillance programs.',
'"Since then, it has become clearer and clearer how important that step was '
'to protecting our users\' privacy."',

Let’s just double-check the tests.

alan@dg04:~/PyTeaserPython3$ python -m tests
./home/alan/PyTeaserPython3/goose/outputformatters.py:65: DeprecationWarning: The unescape method is deprecated and will be removed in 3.5, use html.unescape() instead.
txt = HTMLParser().unescape(txt)
.
----------------------------------------------------------------------
Ran 2 tests in 3.282s

OK
alan@dg04:~/PyTeaserPython3$

The previous passing tests didn’t give me the full story. Let’s fix the deprecation warning in goose/outputformatters.py and re-run the tests.

alan@dg04:~/PyTeaserPython3$ python -m tests
..
----------------------------------------------------------------------
Ran 2 tests in 3.653s

OK

Better.
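The warning names its own replacement, so the fix is presumably a one-line swap to the html module. A sketch, with unescape_text as a hypothetical wrapper name:

```python
import html


def unescape_text(txt):
    """Convert HTML character references such as &amp; back to text."""
    # html.unescape replaces the deprecated HTMLParser().unescape(txt).
    return html.unescape(txt)
```

This matters beyond silencing the warning: the HTMLParser.unescape method was removed entirely in Python 3.9.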

Finally, I want to double-check the output of demo.py to make sure it matches the Python 2 version. It turns out that the summary for the second URL in the demo differs between Python 2 and Python 3. See the keywords function in pyteaser.py (beginning line 177). The culprit is line 184, where the Counter holds its items in a different order in the two versions: the counts are identical, but words with the same count come out in a different sequence.

In the Python 3 version:

Counter({'montevrain': 6, 'animal': 5, 'tiger': 3, 'town': 3, 'officials': 2, 'dog': 2, 'big': 2, 'cat': 2, 'outside': 2, 'woman': 2, 'supermarket': 2, 'car': 2, 'park': 2, 'search': 2, 'called': 2, 'local': 2, 'schools': 2, 'kept': 2, 'parisien': 2, 'mayors': 2, 'office': 2

In the Python 2 version:

Counter({'montevrain': 6, 'animal': 5, 'tiger': 3, 'town': 3, 'office': 2, 'local': 2, 'mayors': 2, 'woman': 2, 'big': 2, 'schools': 2, 'officials': 2, 'outside': 2, 'supermarket': 2, 'search': 2, 'parisien': 2, 'park': 2, 'car': 2, 'cat': 2, 'called': 2, 'dog': 2, 'kept': 2

So when line 187 picks the 10 most common keywords the Python 2 and Python 3 version end up with a different list. A subtle change in the logic of Counter made quite a difference to the end result of running PyTeaser.
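Ties between equally frequent words are broken by the order of the underlying dict, and that order differs between the two interpreters. One way to make the top-N selection reproducible is to break ties explicitly. This is a sketch, not what PyTeaser does: the function name top_keywords and the alphabetical tie-break are assumptions, and the result would still differ from the historical Python 2 output.

```python
from collections import Counter


def top_keywords(words, n=10):
    """Return the n most common words with a deterministic tie-break.

    Sorting on (-count, word) puts higher counts first and orders
    equal counts alphabetically, independent of dict ordering.
    """
    counts = Counter(words)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]
```

A test that pins the expected keyword list for a fixed article would have caught this difference without a manual comparison of demo.py output.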


A CTO perspective on developers

I wrote a series of posts called “Impress Your CTO”.

The TL;DR running through all of those is:

  1. Learn your tooling. We have all made the error of selecting all rows of a database table and then counting them when we should have just done select count(*) from table. There are plenty of other instances where something can be very simply and robustly done in your datastore, or even at operating system level, rather than at the application level.
  2. Be aware that the requirements you get from users only scratch the surface of what they really want to be able to do. Great developers are those who have a sufficient understanding of the users and so intuitively understand what makes sense or not, and what has been left unsaid by the product owner.
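The first point can be illustrated with a small sketch using Python’s built-in sqlite3 module. The events table and all function names here are made up for the example.

```python
import sqlite3


def make_demo_db(rows=1000):
    """Create an in-memory database with a hypothetical events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO events (name) VALUES (?)",
        [("event %d" % i,) for i in range(rows)],
    )
    return conn


def count_in_application(conn):
    """Anti-pattern: pull every row into memory just to count them."""
    return len(conn.execute("SELECT * FROM events").fetchall())


def count_in_database(conn):
    """Let the datastore do the counting and return a single number."""
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Both functions return the same number, but the second transfers one row instead of the whole table.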

The whole series, for your reading pleasure, is re-capped here:

 

First steps with Ethereum Private Networks and Smart Contracts on Ubuntu 16.04

Ethereum is still in that “move fast and break things” phase. The docs for contracts[1] are very out of date, and even the docs for mining contain some out-of-date content[2].

I wanted a very simple guide to setting up a small private network for testing smart contracts. I couldn’t find a simple one that worked. After much trial and error and digging around on Stack Exchange, below are the steps I eventually settled on to get things working with a minimum of copy/paste. I hope it will prove useful for other noobs out there and that more experienced people might help clear up things that I have misunderstood.

I’ll do 3 things:

  1. Set up my first node and do some mining on it
  2. Add a very simple contract
  3. Add a second node to the network

First make sure you have installed ethereum (geth) and solc, for example with:

sudo apt-get install software-properties-common
sudo add-apt-repository -y ppa:ethereum/ethereum
sudo apt-get update
sudo apt-get install ethereum solc

Set up first node and do some mining on it

Create a genesis file – the below is about as simple as I could find. Save it as genesis.json in your working directory.

{
  "config": {
    "chainId": 1907,
    "homesteadBlock": 0,
    "eip155Block": 0,
    "eip158Block": 0
  },
  "difficulty": "40",
  "gasLimit": "2100000",
  "alloc": {}
}

Make a new data directory for the first node and set up two accounts to be used by that node. (Obviously your addresses will differ from the examples you see below.)

ethuser@host01:~$ mkdir node1
ethuser@host01:~$ geth --datadir node1 account new
WARN [07-19|14:16:22] No etherbase set and no accounts found as default 
Your new account is locked with a password. Please give a password. Do not forget this password.
Passphrase: 
Repeat passphrase: 
Address: {f74afb1facd5eb2dd69feb589213c12be9b38177}
ethuser@host01:~$ geth --datadir node1 account new
Your new account is locked with a password. Please give a password. Do not forget this password.
Passphrase: 
Repeat passphrase: 
Address: {f0a3cf66cc2806a1e9626e11e5324360ee97f968}

Choose a networkid for your private network (it is passed when the node is launched, not at this step) and initialise the first node:

ethuser@host01:~$ geth --datadir node1 init genesis.json
INFO [07-19|14:21:44] Allocated cache and file handles         database=/home/ethuser/node1/geth/chaindata cache=16 handles=16
....
INFO [07-19|14:21:44] Successfully wrote genesis state         database=lightchaindata                          hash=dd3f8d…707d0d

Now launch a geth console:

ethuser@host01:~$ geth --datadir node1 --networkid 98765 console
INFO [07-19|14:22:42] Starting peer-to-peer node               instance=Geth/v1.6.7-stable-ab5646c5/linux-amd64/go1.8.1
...
 datadir: /home/ethuser/node1
 modules: admin:1.0 debug:1.0 eth:1.0 miner:1.0 net:1.0 personal:1.0 rpc:1.0 txpool:1.0 web3:1.0

> 

The first account you created is set as eth.coinbase. This will earn ether through mining. It does not have any ether yet[3], so we need to mine some blocks:

> eth.coinbase
"0xf74afb1facd5eb2dd69feb589213c12be9b38177"
> eth.getBalance(eth.coinbase)
0
> miner.start(1)

The first time you run this it will create the DAG, which takes some time. Once the DAG is completed, leave the miner running for a while until it mines a few blocks. When you are ready, stop it with miner.stop().

.....
INFO [07-19|14:40:03] 🔨 mined potential block                  number=13 hash=188f37…47ef07
INFO [07-19|14:40:03] Commit new mining work                   number=14 txs=0 uncles=0 elapsed=196.079µs
> miner.stop()
> eth.getBalance(eth.coinbase)
65000000000000000000
> eth.getBalance(eth.accounts[0])
65000000000000000000

The first account in the account list is the one that has been earning the ether for its mining, so all we prove above is that eth.coinbase == eth.accounts[0]. Now we’ve got some ether in the first account, let’s send it to the 2nd account we created. The source account has to be unlocked before it can send a transaction.

> eth.getBalance(eth.accounts[1])
0
> personal.unlockAccount(eth.accounts[0])
Unlock account 0xf74afb1facd5eb2dd69feb589213c12be9b38177
Passphrase: 
true
> eth.sendTransaction({from: eth.accounts[0], to: eth.accounts[1], value: web3.toWei(3,"ether")})
INFO [07-19|14:49:12] Submitted transaction                    fullhash=0xa69d3fdf5672d2a33b18af0a16e0b56da3cbff5197898ad8c37ced9d5506d8a8 recipient=0xf0a3cf66cc2806a1e9626e11e5324360ee97f968
"0xa69d3fdf5672d2a33b18af0a16e0b56da3cbff5197898ad8c37ced9d5506d8a8"

For this transaction to register it has to be mined into a block, so let’s mine one more block:

> miner.start(1)
INFO [07-19|14:50:14] Updated mining threads                   threads=1
INFO [07-19|14:50:14] Transaction pool price threshold updated price=18000000000
null
> INFO [07-19|14:50:14] Starting mining operation 
INFO [07-19|14:50:14] Commit new mining work                   number=14 txs=1 uncles=0 elapsed=507.975µs
INFO [07-19|14:51:39] Successfully sealed new block            number=14 hash=f77345…f484c9
INFO [07-19|14:51:39] 🔗 block reached canonical chain          number=9  hash=2e7186…5fbd96
INFO [07-19|14:51:39] 🔨 mined potential block                  number=14 hash=f77345…f484c9

> miner.stop()
true
> eth.getBalance(eth.accounts[1])
3000000000000000000
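The balances in the console are in wei, where 1 ether is 10^18 wei. A short Python sketch of the arithmetic follows; the helper names are mine, not part of geth or web3. The coinbase balance of 65 ether after the log shows block 13 being mined is consistent with a reward of 5 ether per block, though that is inferred from the numbers above, not stated anywhere in the output.

```python
from decimal import Decimal

WEI_PER_ETHER = 10 ** 18


def to_wei(ether):
    """Convert a whole number of ether to wei (integer arithmetic)."""
    return ether * WEI_PER_ETHER


def from_wei(wei):
    """Convert wei to ether as a Decimal, avoiding float rounding."""
    return Decimal(wei) / Decimal(WEI_PER_ETHER)
```

So web3.toWei(3, "ether") corresponds to to_wei(3), which matches the balance of the second account after the transfer.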

One small point: the docs talk about miner.hashrate. This no longer exists; use eth.hashrate if you want to see the mining speed.

Add a very simple contract

The example contract is based on an example in the Solidity docs. There is no straightforward way to compile a contract into geth. Browser-solidity is a good online resource but I want to stick to the local server as much as possible for this posting. Save the following contract into a text file called simple.sol

pragma solidity ^0.4.13;

contract Simple {
  function arithmetics(uint _a, uint _b) returns (uint o_sum, uint o_product) {
    o_sum = _a + _b;
    o_product = _a * _b;
  }

  function multiply(uint _a, uint _b) returns (uint) {
    return _a * _b;
  }
}

And compile it as below:

ethuser@host01:~/contract$ solc -o . --bin --abi simple.sol
ethuser@host01:~/contract$ ls
Simple.abi  Simple.bin  simple.sol

The .abi file holds the contract interface and the .bin file holds the compiled code. There is apparently no neat way to load these files into geth so we will need to edit those files into scripts that can be loaded. Edit the files so they look like the below:

ethuser@host01:~/contract$ cat Simple.abi
var simpleContract = eth.contract([{"constant":false,"inputs":[{"name":"_a","type":"uint256"},{"name":"_b","type":"uint256"}],"name":"multiply","outputs":[{"name":"","type":"uint256"}],"payable":false,"type":"function"},{"constant":false,"inputs":[{"name":"_a","type":"uint256"},{"name":"_b","type":"uint256"}],"name":"arithmetics","outputs":[{"name":"o_sum","type":"uint256"},{"name":"o_product","type":"uint256"}],"payable":false,"type":"function"}])

and

ethuser@host01:~/contract$ cat Simple.bin
personal.unlockAccount(eth.accounts[0])

var simple = simpleContract.new(
{ from: eth.accounts[0],
data: "0x6060604052341561000f57600080fd5b5b6101178061001f6000396000f30060606040526000357c0100000000000000000000000000000000000000000000000000000000900463ffffffff168063165c4a161460475780638c12d8f0146084575b600080fd5b3415605157600080fd5b606e600480803590602001909190803590602001909190505060c8565b6040518082815260200191505060405180910390f35b3415608e57600080fd5b60ab600480803590602001909190803590602001909190505060d6565b604051808381526020018281526020019250505060405180910390f35b600081830290505b92915050565b600080828401915082840290505b92509290505600a165627a7a72305820389009d0e8aec0e9007e8551ca12061194d624aaaf623e9e7e981da7e69b2e090029",
gas: 500000
}
)

Two things in particular to notice:

  1. In the .bin file you need to ensure that the from account is unlocked
  2. The code needs to be enclosed in quotes and begin with 0x

Launch geth as before, load the contract scripts, mine them into a block and then interact with the contract. We won’t be able to do anything useful with the contract until it’s mined, as you can see below.

ethuser@host01:~$ geth --datadir node1 --networkid 98765 console
INFO [07-19|16:33:02] Starting peer-to-peer node instance=Geth/v1.6.7-stable-ab5646c5/linux-amd64/go1.8.1
....
> loadScript("contract/Simple.abi")
true
> loadScript("contract/Simple.bin")
Unlock account 0xf74afb1facd5eb2dd69feb589213c12be9b38177
Passphrase: 
INFO [07-19|16:34:16] Submitted contract creation              fullhash=0x318caec477b1b5af4e36b277fe9a9b054d86744f2ee12e22c12a7d5e16f9a022 contract=0x2994da3a52a6744aafb5be2adb4ab3246a0517b2
true
> simple
{
....
  }],
  address: undefined,
  transactionHash: "0x318caec477b1b5af4e36b277fe9a9b054d86744f2ee12e22c12a7d5e16f9a022"
}
> simple.multiply
undefined
> miner.start(1)
INFO [07-19|16:36:07] Updated mining threads                   threads=1
...
INFO [07-19|16:36:21] 🔨 mined potential block                  number=15 hash=ac3991…83b9ac
...
> miner.stop()
true
> simple.multiply
function()
> simple.multiply.call(5,6)
30
> simple.arithmetics.call(8,9)
[17, 72]

Set up a second node in the network

I’ll run the second node on the same virtual machine to keep things simple. The steps to take are:

  1. Make sure the existing geth node is running
  2. Create an empty data directory for the second node
  3. Add accounts for the second geth node as before
  4. Initialise the second geth node using the same genesis block as before
  5. Launch the second geth node setting bootnodes to point to the existing node

The second geth node will need to run on a non-default port.

Find the enode details from the existing geth node:

> admin.nodeInfo
{  enode: "enode://08993401988acce4cd85ef46a8af10d1cacad39652c98a9df4d5785248d1910e51d7f3d330f0a96053001264700c7e94c4ac39d30ed5a5f79758774208adaa1f@[::]:30303", 
...

We will need to substitute [::] with the IP address of the host, in this case 127.0.0.1
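The substitution is a plain string replacement. A tiny sketch, with enode_for_host as a hypothetical helper name:

```python
def enode_for_host(enode, host="127.0.0.1"):
    """Replace the unspecified address [::] in an enode URL with a real host."""
    return enode.replace("@[::]:", "@%s:" % host)
```

The result is what gets passed to --bootnodes when launching the second node.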

To set up the second node:

ethuser@host01:~$ mkdir node2
ethuser@host01:~$ geth --datadir node2 account new
WARN [07-19|16:55:52] No etherbase set and no accounts found as default 
Your new account is locked with a password. Please give a password. Do not forget this password.
Passphrase: 
Repeat passphrase: 
Address: {00163ea9bd7c371f92ecc3020cfdc69a32f70250}
ethuser@host01:~$ geth --datadir node2 init genesis.json
INFO [07-19|16:56:14] Allocated cache and file handles         database=/home/ethuser/node2/geth/chaindata cache=16 handles=16
...
INFO [07-19|16:56:14] Writing custom genesis block 
INFO [07-19|16:56:14] Successfully wrote genesis state         database=lightchaindata                          hash=dd3f8d…707d0d
ethuser@host01:~$ geth --datadir node2 --networkid 98765 --port 30304 --bootnodes "enode://08993401988acce4cd85ef46a8af10d1cacad39652c98a9df4d5785248d1910e51d7f3d330f0a96053001264700c7e94c4ac39d30ed5a5f79758774208adaa1f@127.0.0.1:30303" console

Wait a little and you will see block synchronisation taking place:

> INFO [07-19|16:59:36] Block synchronisation started 
INFO [07-19|16:59:36] Imported new state entries               count=1 flushed=0 elapsed=118.503µs processed=1 pending=4 retry=0 duplicate=0 unexpected=0
INFO [07-19|16:59:36] Imported new state entries               count=3 flushed=2 elapsed=339.353µs processed=4 pending=3 retry=0 duplicate=0 unexpected=0

To check that things have fully synced, run eth.getBlock('latest') on each of the nodes. If things aren’t looking right then use admin.peers on each node to make sure that each node has peered with the other node.

Now you can run the miner on one node and run transactions on the other node.

Notes

[1]  https://github.com/ethereum/go-ethereum/issues/3793:

Compiling via RPC has been removed in #3740 (see ethereum/EIPs#209 for why). We will bring it back under a different method name if there is sufficient user demand. You’re the second person to complain about it within 2 days, so it looks like there is demand.

[2] The official guide to Contracts is out of date. I spotted some out of date material on Mining and submitted an issue but updating the official docs doesn’t seem much of a priority so I figured I would collect my learnings here for now.
[3] You can pre-allocate ether in the genesis.json if you prefer, but that would mean a little more cut and paste which I am doing my best to minimise here.

 

Edited 26 Aug 2017

No need to specify networkid when initialising the node.

Learning Haskell gave me a whole new outlook on programming

A while back I decided to learn Haskell as a counterpoint to Ruby/Java etc that I was more familiar with previously. I am very grateful for the new perspective that learning Haskell has given me. In particular:

  • I always used to start with database tables and a user interface. Haskell forces you to think about the functions and how they work on data structures. Not having to think about data storage is strangely liberating as it is much easier to change your data structures if there is no UI or database to worry about.
  • It might need a lot more thinking to write a piece of code, but by the time you’ve finished it invariably feels very elegant.
  • Writing pure code and then wrapping IO around it later really forces you to write code that is simpler and more testable.
  • I have an appreciation of how intimidating jargon can confuse and put people off, unnecessarily. For example, Haskell purists might want you to grok Category Theory even if it’s not that relevant to writing Haskell apps.
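The point about pure code wrapped in IO carries over to other languages. Here is a minimal sketch in Python, with made-up function names, of a pure core and a thin IO shell around it:

```python
def summarise(lines):
    """Pure core: no file or console access, so it is trivial to test."""
    non_empty = [line for line in lines if line.strip()]
    return {
        "lines": len(non_empty),
        "words": sum(len(line.split()) for line in non_empty),
    }


def report(path):
    """Thin IO wrapper: read the file, call the pure core, print the result."""
    with open(path, encoding="utf-8") as handle:
        stats = summarise(handle.read().splitlines())
    print("%(lines)d lines, %(words)d words" % stats)
```

All the logic worth testing lives in summarise, which needs no files, mocks or fixtures.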

The learning curve has been ridiculous. Have a look at Tymon Tobolski’s comparison of using Ruby vs Haskell in a small Redis application. The Haskell version is terse to the point of being incomprehensible at the outset. As a Haskell learner I couldn’t find something that packed a similar “wow” to “A blog in 15 minutes with Ruby on Rails”.

Here’s a small Scotty/Warp Haskell app I put together recently. (Scotty is Haskell’s answer to Sinatra for you Rubyists).

Impress your CTO – Track Everything

The day after your new feature goes live, someone will want to know how well it’s working (or not). They don’t just mean “are there any exceptions in the logs”. They mean “how are people using it, if at all?”

Hopefully, you have thought about unspecified requirements and so have already implemented something that they can see. It doesn’t even have to be super heavyweight. I’ve seen good stuff done using something sophisticated like Heap Analytics, and also seen people get actionable insights from something as simple as Google Analytics.

Whatever you choose to do, make sure that there is an easy way for a non-developer to dig around the stats, and ideally download some raw data to play around with offline.

Just be sure to track everything.

 

Trying to make some sense of it all

I’m still trying to get to grips with the course of events in the UK recently. Since the Brexit referendum, of course, but increasingly over the past few days. I’m getting a bit tired of the Facebook echo-chamber and so given that I have a blog I figured I’d take a momentary break from talking about tech to think through these issues. But here’s a tweet to give some Tech Business angle to this post.

OK with that out of the way let’s get on with things.

The Conservatives apparently want to shame British firms into hiring more British workers by making them publish how many non-British workers they employ. And most Brits apparently support that position.

Similarly, they, apparently all of a sudden, want schools to list how many foreign-born students they each have. And there are also the reports about the Foreign Office not wanting to use foreigners in some advisory work to do with leaving the EU.

Meanwhile, the de facto opposition party (UKIP) has experienced thuggish behaviour and somehow this is considered acceptable by the UK electorate. And in a surprising twist, a UKIP representative has spoken out against these Conservative ideas.

I’m a member of the liberal, cosmopolitan elite. In the referendum I supported Remain. My sympathies were to Leave based on the mess that the EU has made of Greece. But the absence of any realistic vision for how Leave could work meant that, for me, the whole thing was flawed. I suspect that most (all?) other voters also have their own reasons that defy stereotyping. I certainly didn’t enjoy the Remain scare-mongering. For what it’s worth I also have a healthy scepticism of experts – they crop up all over the place and I’m not always convinced of their insight. However I do have to admit that some of the worst scare-mongering (I seem to remember £1 being predicted to fall as far as $1.28) has now come true so I do wonder what else will be coming over the horizon.

Anyway, back to the plot. The public was asked a question and in a close result the public answered it. Brexit was about leaving the EU. I hope that however Brexit gets implemented it is not a simple Tyranny of the 51%. But that’s not what worries me most. What worries me most is that, based on the noises that we are hearing, Brexit is increasingly becoming about xenophobia, not about the EU.

Since the referendum result racist incidents have increased. Over the past week we’ve seen these latest nationality-related proposals. Obviously I’ve never met any of these people and don’t know what their motives are, to what degree they believe what they are reported as saying, to what degree they are grandstanding or just doing their job or whatever. But something is very concerning indeed about the content of this debate.

Let’s come back to the Name And Shame Businesses Who Employ Foreigners idea. The Party may now be back-pedalling on it but remember that, firstly, it was seriously considered, and secondly, the reality is that apparently most UK people support it. I do expect it to come back in another guise sooner or later (consider the example about profiling non-Brits in British schools). So let’s imagine that, in order to try to improve standards of living, British companies are somehow encouraged to hire native Brits rather than foreigners. It’s easy to see where this will end: They will hire based on nationality rather than ability; so they will perform less well; standards of living won’t increase as expected; immigrants will get scapegoated even more; and the cycle will repeat itself.

Whether in or out of the EU, this country depends on immigrants and will always depend on immigrants. Immigrants disproportionately contribute to the UK, despite the views evident in the YouGov poll. How to reconcile these contradictory facts is something I would hope our leadership manages to lead on, rather than just scapegoating immigration as they seem to be doing. I do have a general faith in common sense, decency and democracy so I remain optimistic, but it ain’t easy. Maybe I’ll have to stay up to watch the US Presidential Debate tonight to remind myself that it could always be worse.

Update 31st December 2016:

One thing that has been playing on my mind is the clear strength of feeling on the subject. There was a James O’Brien LBC clip doing the rounds a while back in which a Leave voter was finding it hard to articulate why he wanted to Leave (probably something to do with immigration). Getting out-talked by a talk show host isn’t the interesting thing here. The interesting thing here is that towards the end of the call the caller says he is willing to suffer 5 years of financial/economic hardship in order to get to a better future. That level of self-sacrifice is pretty incredible.

That’s not a business, it’s a hustle

“That’s not a business, it’s a hustle.”

I’m paraphrasing Dan Lyons speaking at a recent Chew The Fat event. He was talking about his time at HubSpot and wondering how a loss-making company could grow so fast, list, and make a lot of investors a lot of money. In his view its business model is fundamentally flawed: he describes it as buying dollar bills at face value and selling them at 75 cents each. That sort of model may well get you hyper-growth but it’s a hustle, not a business.

The image of the hustling, zillionaire tech entrepreneur who has never worried about profits has become dangerously deeply ingrained. I recently got talking to a graduate – evidently a bright individual – who ran a business idea past me. It was to do with optimising a retail shopping experience. Quite a neat idea so I asked the obvious question: where would the revenue come from? The consumers or the retailers?

He didn’t have an answer.

He assumed that all he’d have to do is somehow get some VC money, spend it on scaling a platform with loads of users, then sell out and walk off into the distance with $50 million in his pocket. Now that’s a hustle, not a business. There are cases where this has happened. But there are also plenty of cases of people winning the lottery and I suspect that on a risk-weighted basis, winning the lottery is more likely than being a part of the next Whatsapp.

Please, if you want to become an entrepreneur, bear these two things in mind:

  1. You stand a better chance of making a successful business if you have intimate knowledge of the problem you are trying to solve. If you’re targeting students then, fine, as a recent student you might have more than enough knowledge to build a viable business. But if you’re not then you would probably need to work somewhere for a few years to get that deep understanding first (“domain knowledge”, in the jargon).
  2. You will stand a better chance of success if you can create something that makes revenue, profit and cash. All those things are different so make sure you have a clear idea of how they will interplay in your business. In fact, you might even end up with more money in your pocket if you build a niche, profitable, growing business than if you go chasing unicorns.

This doesn’t mean you need a detailed business plan. You just need to be able to articulate those two points in as straightforward a fashion as possible. Here’s a classic DHH talk to hopefully get you thinking along the right lines. Incredibly, it is still as valid today as it was in 2008.