Hacking Data

November 2019

By Jordi Sicart from Smart Capex

 

One-shot programming

 

Programming is about automation and data manipulation. The French word for computer science is ‘Informatique’, a contraction of ‘Information’ and ‘Automatique’. Computer scientists often confuse getting a valid answer with solving the problem for every possible input and case. That is the equivalent of building a whole factory to produce a single motor. But life is short, and writing good software takes time.

Building a generic solution requires more work than just finding an answer to our initial problem. We need to handle edge cases, ensure long-term maintainability and set up a production environment. Writing such code is also harder, because it has to work independently of any particular data. Most of this is wasted effort if you only need to do the actual computation once.

Consider a classic use case: preparing data before a meeting. You have to extract data from a source (such as a database), clean it, find patterns and compute some statistics. If this takes hours and you have to do it again every day, automation is a clear winner, but very often you only need to do it once.

The classic way to handle this kind of task at work is Excel. Don’t get me wrong: Excel is a fine tool, but it sometimes falls short when we need more advanced programming capabilities and expressiveness. It turns out that most web browsers ship with a really good programming environment!
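As a minimal sketch of the throwaway preparation described above, here is what cleaning a few rows and computing an average per group could look like when typed into a browser console. The rows, field names and grouping are hypothetical; nothing in it tries to be generic or reusable.

```javascript
// Hypothetical rows, as if copied from a database query result.
const rows = [
  { region: " North ", revenue: "120.5" },
  { region: "south", revenue: "80" },
  { region: "North", revenue: "" },      // missing value
  { region: "South ", revenue: "100" },
];

// Clean: normalise region names and drop rows without a usable revenue.
const clean = rows
  .map(r => ({ region: r.region.trim().toLowerCase(), revenue: parseFloat(r.revenue) }))
  .filter(r => !Number.isNaN(r.revenue));

// Stats: sum and count per region, then the average.
const totals = {};
for (const { region, revenue } of clean) {
  if (!totals[region]) totals[region] = { sum: 0, count: 0 };
  totals[region].sum += revenue;
  totals[region].count += 1;
}
const averages = Object.fromEntries(
  Object.entries(totals).map(([region, t]) => [region, t.sum / t.count])
);
// averages → { north: 120.5, south: 90 }
```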

 

 

The browser is my favorite IDE

 

You have at your disposal a JavaScript REPL (Read-Eval-Print Loop), the ability to save code snippets, debuggers, and multiple terminals. The average salesperson’s laptop already contains everything you need, even offline.

Let’s illustrate this with a simple exercise: computing the square of every number greater than 5 in a series of numbers.

> var data = [1,2,3,4,5,6,7];

< undefined

> var filteredData = data.filter(num => num > 5);

< undefined

// BTW what is my filteredData?

> filteredData

< (2) [6, 7]

> filteredData.map(num => num * num)

< (2) [36, 49]

This step-by-step programming style lets you manipulate data quickly and check the output after each command to see whether you’re on track. It is much more efficient than first writing a big program that chains all these operations and then debugging it step by step. The nice thing about a browser like Chrome is that you don’t even have to execute a statement to see its result: the console previews the value of the expression as you type.

You can also start with a small dataset and, once your formula is correct, combine all the steps into a simple function to run on bigger datasets.
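For the exercise above, combining the steps could look like the sketch below. The function name `squaresAbove` and its `threshold` parameter are hypothetical additions (the console session hard-codes 5), and the larger dataset is generated only for illustration.

```javascript
// The two console steps, chained into one small function.
function squaresAbove(numbers, threshold) {
  return numbers
    .filter(num => num > threshold) // keep only numbers greater than the threshold
    .map(num => num * num);         // square each remaining number
}

// Check it against the small sample first.
squaresAbove([1, 2, 3, 4, 5, 6, 7], 5); // → [36, 49]

// Then run it on a bigger dataset: here, the integers 1..100000.
const big = Array.from({ length: 100000 }, (_, i) => i + 1);
const result = squaresAbove(big, 5);
console.log(result.length, result[0]); // 99995 36
```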

I often hack things in the console: checking how a library’s API works, testing a small function I just wrote, computing some GeoJSON test data, or parsing an email full of logs. The console handles a few tens of megabytes of data without big performance problems.
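As an example of the log-parsing case, a sketch assuming a hypothetical log format of `<timestamp> <LEVEL> <message>` per line: paste the email body into a template literal, keep only the lines that match, and count entries per level. Real logs will need their own regular expression.

```javascript
// Email body pasted between backticks (hypothetical content and log format).
const emailBody = `
Hi team, see the logs below.
2019-11-04T09:12:01Z INFO  Service started
2019-11-04T09:13:45Z WARN  Slow response from billing API
2019-11-04T09:14:02Z ERROR Timeout while calling billing API
2019-11-04T09:15:30Z ERROR Timeout while calling billing API
Thanks!
`;

// Timestamp, level, then the rest of the line as the message.
const logLine = /^(\S+)\s+(INFO|WARN|ERROR)\s+(.*)$/;

// Split on both Unix and Windows line endings, keep matching lines only.
const entries = emailBody
  .split(/\r?\n/)
  .map(line => line.match(logLine))
  .filter(Boolean)
  .map(([, time, level, message]) => ({ time, level, message }));

// Count entries per level.
const countByLevel = {};
for (const { level } of entries) {
  countByLevel[level] = (countByLevel[level] || 0) + 1;
}
// countByLevel → { INFO: 1, WARN: 1, ERROR: 2 }
```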

If the CLI is not your style, online editors such as CodePen or CodeSandbox can replace it, although the feedback loop is not as good, especially when a long computation reruns every time you refresh the page.

The JavaScript CLI is great, but sometimes it’s not enough. Manipulating GeoJSON without a dedicated tool is hard and error-prone. That’s not a problem, though, because we have the richest package manager ever created at our disposal.

The web as the package manager

 

That’s right, it’s simply the web. Need a date or number formatter, or a JWT parser? It’s available. There are two main ways to access your favorite libraries from the console.

The first way is, again, to use an online JavaScript editor. Most of them let you import popular packages. You can even expose these packages on the window object and continue from the console.
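A minimal sketch of that export step, with a small hypothetical helper standing in for an imported package so the snippet runs anywhere. In a browser, `globalThis` is `window`; since online editors typically run your code inside an iframe, you may also need to select that frame in the DevTools console’s context drop-down.

```javascript
// In the editor you would normally start with an import such as
//   import someLib from "some-lib";   // hypothetical package name
// Here a tiny hypothetical helper object plays that role.
const myLib = {
  // Zero-pads a number to two digits, e.g. 7 → "07".
  pad2: n => String(n).padStart(2, "0"),
};

// Top-level const does not create a window property, so assign explicitly.
globalThis.myLib = myLib;

// Then, from the console:
// > myLib.pad2(7)
// < "07"
```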

The second way is the fun one for me. Many library websites load their own library on the page and expose it as a global variable. Copy your data to the clipboard, open a new tab, browse to the website of Moment (a date formatting library), paste your data into the console and start formatting your dates.
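The paste-and-format part of that workflow can be sketched as follows. To keep the snippet runnable without a library page, it swaps in the browser’s built-in `Intl` formatters for Moment and Numbro; the pasted JSON and its `date` and `amount` fields are hypothetical.

```javascript
// Text as it might be pasted from the clipboard (hypothetical export).
const pasted = '[{"date":"2019-11-05T08:30:00Z","amount":1234.5},{"date":"2019-11-12T14:00:00Z","amount":99}]';
const records = JSON.parse(pasted);

// Built-in formatters used here instead of Moment / Numbro.
// timeZone "UTC" keeps dates independent of the machine's time zone.
const dateFmt = new Intl.DateTimeFormat("en-GB", {
  day: "2-digit", month: "2-digit", year: "numeric", timeZone: "UTC",
});
const numberFmt = new Intl.NumberFormat("en-US", {
  minimumFractionDigits: 2, maximumFractionDigits: 2,
});

const formatted = records.map(r => ({
  date: dateFmt.format(new Date(r.date)), // e.g. "05/11/2019"
  amount: numberFmt.format(r.amount),     // e.g. "1,234.50"
}));

// On the Moment website console, the date line could instead be
//   moment(r.date).format("DD/MM/YYYY")
```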

So what are my favorite tools online? I use Turf.js a lot for GeoJSON manipulation, Lodash for algorithms, and Moment and Numbro.js for parsing and formatting dates and numbers. VocaJS provides some nice features for string manipulation. Sometimes it’s not even about running things in the CLI: there are plenty of formatters and validators (SQL, JSON, etc.) and visualizers such as Webpack Visualizer, GraphQL Voyager or geojson.io that work directly in the browser. Just copy and paste.

Example 1: Building a Voronoi map in 1 minute

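A Voronoi diagram is a classic polygon-building technique: given a set of points, it splits the plane into one cell per point, each cell containing every location that is closer to that point than to any other.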
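To make the definition concrete, here is a brute-force sketch that decides which cell a location falls into by finding its nearest site. It does not build the cell polygons (that is what the Turf.js step in the walkthrough takes care of), and the sites are made-up planar coordinates rather than real longitude/latitude.

```javascript
// Hypothetical sites as [x, y] pairs.
const sites = [[0, 0], [10, 0], [5, 8]];

// Squared Euclidean distance (no Math.sqrt needed when only comparing).
const dist2 = ([ax, ay], [bx, by]) => (ax - bx) ** 2 + (ay - by) ** 2;

// Index of the site closest to p: p lies in that site's Voronoi cell.
// Ties go to the lower index because of the strict "<".
function nearestSite(p) {
  let best = 0;
  for (let i = 1; i < sites.length; i++) {
    if (dist2(p, sites[i]) < dist2(p, sites[best])) best = i;
  }
  return best;
}

nearestSite([1, 1]); // → 0
nearestSite([9, 2]); // → 1
nearestSite([5, 7]); // → 2

// Coarse text rendering of the cells on a 0..10 by 0..8 grid.
for (let y = 8; y >= 0; y -= 2) {
  let row = "";
  for (let x = 0; x <= 10; x++) row += nearestSite([x, y]);
  console.log(row);
}
```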

Writing a small program that generates the diagram, and creating input data that roughly matches your points of interest, can take anywhere from 15 minutes to a day depending on the programmer’s experience. I had actually never done it before. Let’s see how I managed it in less than a minute without writing a program for it.

  1. First, place your points on the map on geojson.io.
  2. Copy-paste the GeoJSON feature collection into a `data` variable in the console of the Turf.js website.
  3. Build the Voronoi diagram and copy the result back to the clipboard.
  4. Go back to geojson.io and paste it to see the result.
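For reference, a sketch of what the `data` variable from step 2 looks like: a GeoJSON FeatureCollection of Point features, built here from hypothetical coordinates, plus a padded bounding box to clip the diagram to. The commented Turf.js call is an assumption about how step 3 could be written, not the author’s exact code; `turf.voronoi` takes a point collection and an optional `bbox`, and `copy()` is a Chrome DevTools console utility.

```javascript
// Hypothetical points of interest as [longitude, latitude] (GeoJSON order).
const coords = [[4.35, 50.85], [4.40, 50.83], [4.37, 50.88], [4.42, 50.87]];

// Wrap them in a GeoJSON FeatureCollection, the shape geojson.io exports.
const data = {
  type: "FeatureCollection",
  features: coords.map(c => ({
    type: "Feature",
    properties: {},
    geometry: { type: "Point", coordinates: c },
  })),
};

// Bounding box [minX, minY, maxX, maxY] around the points, padded so the
// outer cells do not end exactly at the outermost points.
const pad = 0.02;
const xs = coords.map(c => c[0]);
const ys = coords.map(c => c[1]);
const bbox = [
  Math.min(...xs) - pad, Math.min(...ys) - pad,
  Math.max(...xs) + pad, Math.max(...ys) + pad,
];

// In the Turf.js website console (not run here), step 3 could be:
//   const cells = turf.voronoi(data, { bbox });
//   copy(JSON.stringify(cells)); // put the result on the clipboard
```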

Boom, I now have an answer expressed in a complex GeoJSON structure. I can visualize it on a map and I’m ready to add a screen capture to a slideshow for my next meeting.

 

Let’s hack data and everything else

In the end, this is not very different from what people do on the command line with bash or PowerShell, except that the browser is a much better IDE than a terminal.

The fast feedback loop of the CLI and the ability to access almost any tool make it one of my favorite ways to find practical answers every day. It’s amusing to see junior colleagues’ faces when they discover how easily we can demonstrate how a library works, or sort, filter and format data faster than a business consultant using Excel.
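As an illustration of the kind of sorting, filtering and formatting meant here, the sketch below ranks some sales figures in a single chained expression. The data and field names are made up.

```javascript
// Hypothetical rows, e.g. pasted from a spreadsheet export.
const sales = [
  { rep: "Ana", total: 18250.4 },
  { rep: "Bo", total: 4100 },
  { rep: "Chen", total: 27999.99 },
  { rep: "Dee", total: 12000 },
];

// Filter, sort and format in one chain. filter() returns a new array,
// so sort() does not reorder the original `sales`.
const report = sales
  .filter(s => s.total >= 10000)                // keep totals of at least 10,000
  .sort((a, b) => b.total - a.total)            // biggest first
  .map(s => `${s.rep}: ${s.total.toFixed(2)}`); // two decimals for the slide

// report → ["Chen: 27999.99", "Ana: 18250.40", "Dee: 12000.00"]
```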

Now, go out and start hacking!

 

About the author

 

Jordi Sicart

Senior Developer at Riaktr

  
