Amie Williams

Data cleansing isn't as scary as you think: here's what to do...

So your data is in a pickle. Bits and pieces are all over the place; you may even call it a “dreadsheet” rather than a spreadsheet. The great thing about data, though, is that there's always something that can help pull it together. Even the most complicated, sprawling databases with the worst naming conventions can be bashed into something understandable.


There’s always a reason or excuse to avoid the herculean task of tackling the complex, boring beast that is data. (Maybe tidying your desk or sorting your paperclips would deliver more value, you tell yourself…) My advice, after years of knocking the worst data into shape, is simply to start, and to make sure you have a clear picture of what the end result will look like. You’ve probably spent an hour regretting all your life choices at a brutal spin class or watching your football team get beaten, so channel that energy into pulling together the information you need to make your business better.


The worst set of data I ever came across had been saved in so many places, and updated by so many people at different stages, that the latest version had drifted far from the original, like a photocopy of a photocopy of the Magna Carta. I could barely tell what some of the columns were meant to hold, because data had been accidentally pasted into the wrong places, overtyped and deleted.


Once we’d taken a few deep breaths, I reminded the business owner of my “go-to” principles to get back in business:


1) Just start: it’s never as bad as you think!

2) Have a clear idea of what the finished article should look like. This avoids “scope creep”, which magnifies the problem and breeds further procrastination.

3) Create a simple base template with a uniform naming convention for its columns, so that it can capture all the information from the legacy spreadsheets.

4) Identify and categorise the old data against this template and the new naming convention.

5) Input the information into the new file, check for duplicates and resolve any conflicts.
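If your legacy spreadsheets can be exported to CSV, steps 3 to 5 can also be scripted. The sketch below is one possible way to do it in plain Python, not the process used on the project described here. The template columns, the `COLUMN_MAP` of old headers, the sample rows and the choice of `customer_id` as the matching key are all hypothetical stand-ins for your own data. Exact repeats collapse into one record, blanks are filled from other copies, and only genuine disagreements are listed for a human to check.

```python
# Step 3: a base template with one uniform naming convention (hypothetical columns).
TEMPLATE = ["customer_id", "customer_name", "email", "phone"]

# Step 4: map each legacy header (lower-cased) onto a template column (hypothetical).
COLUMN_MAP = {
    "cust id": "customer_id",
    "id": "customer_id",
    "name": "customer_name",
    "client name": "customer_name",
    "e-mail": "email",
    "email address": "email",
    "tel": "phone",
    "phone no.": "phone",
}


def to_template(raw_row):
    """Rename a legacy row's columns to the template's names."""
    row = {col: "" for col in TEMPLATE}
    for old_name, value in raw_row.items():
        new_name = COLUMN_MAP.get(old_name.strip().lower())
        if new_name:  # columns with no mapping are dropped
            row[new_name] = str(value).strip()
    return row


def merge_sheets(sheets, key="customer_id"):
    """Step 5: combine rows, collapse duplicates and list conflicts to check."""
    merged = {}      # one clean record per key value
    conflicts = []   # (key value, column, value kept, other value)
    missing_key = [] # rows that cannot be matched at all
    for sheet in sheets:
        for raw_row in sheet:
            row = to_template(raw_row)
            record_id = row[key]
            if not record_id:
                missing_key.append(raw_row)
                continue
            if record_id not in merged:
                merged[record_id] = row
                continue
            existing = merged[record_id]
            for col in TEMPLATE:
                old, new = existing[col], row[col]
                if not old:
                    existing[col] = new  # fill a gap: not a conflict
                elif new and new != old:
                    conflicts.append((record_id, col, old, new))
    return merged, conflicts, missing_key


# Usage with two small, made-up legacy sheets.
old_sheet = [
    {"Cust ID": "C001", "Name": "Ann Smith", "E-mail": "ann@example.com"},
    {"Cust ID": "C002", "Name": "Bob Jones", "Tel": "01234 567890"},
]
newer_sheet = [
    {"ID": "C001", "Client Name": "Ann Smith", "Phone No.": "07700 900123"},
    {"ID": "C002", "Client Name": "Robert Jones", "Tel": "01234 567890"},
]
clean, to_check, missing = merge_sheets([old_sheet, newer_sheet])
for record_id, column, kept, other in to_check:
    print(record_id, column, kept, "vs", other)
```

Here Ann's phone number is simply filled in from the newer sheet, while "Bob Jones" versus "Robert Jones" is left for someone who knows the business to decide, which keeps the final manual check small.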


The final result was a clean set of data with only a handful of lines that needed checking for conflicts. Because the owner knew his business well enough to tell which version was correct, he resolved them in about ten minutes, and we produced a beautiful set of reports from the data that he still uses today.


While it might sound a bit like a fairy tale, I have yet to meet an ugly duckling of a dataset that I couldn’t work out and transform into a magnificent swan. No magic required, just a bit of knowledge and a cup of tea.
