Nice post; I’d also be interested to read your thoughts on data preprocessing and tidying up data.

One of the issues with R, compared with SPSS or Stata, has been data preprocessing (that is, identifying outliers, handling missing data, imputation, etc.). R was not great in that department, and the analyst was advised to preprocess the data in a spreadsheet first and then import it into R for further work. Besides, while R was great, people pined for a more intuitive way to interact with it, which was not always easy.

That was then. Then Hadley Wickham took that “bull” by the horns, and his dplyr (indeed all of the Hadleyverse, :-)) is an invitation to think in new ways about engaging with R programming and modelling. With dplyr and ggplot, a new world of working with R opened up, and “data cleaning” in particular became easy as.

Anyway, the point I am trying to get at is this. As “data science” matures as a field and comes to dominate software engineering, and indeed much of how we (plebes like us) interact with our computers and machines, the demand for tools that enable us humans to work with messy data will grow.

Those of us who analyse data day to day love to work with tidy data, but it does not always come in that form. Perhaps we can talk about how we “tidy” up data in startups. There is room for growth there.

Associate Professor of Epidemiology and Environmental Health at the University of Canterbury, New Zealand. Also in: https://refind.com/arinbasu
