How to “clean” and analyse data using Exploratory, OpenRefine, RStudio, & Jupyter

This could also be a writing workflow

(!Work in progress, perpetual alpha … use these tips at your own risk)

What about it?

I am going to write a series about my data analysis and writing workflow. These workflows use free and open source tools to clean and analyse data and to write up research. I want these workflows to be reproducible.

Can Medium be a medium for Scholarly Writing?

I will cover these workflows in the following series of posts on data analysis & writing. I would like Medium to be a channel for scholarly writing. At the time of writing this series, Medium allows embedding of spreadsheets in the form of Airtable embeds and shows graphics (including interactive graphics using Plot.ly), but tables otherwise have to be presented as images. You can write references in endnote or footnote style using superscripts, typing the references out by hand. Beyond this, Medium does not mint DOIs, so using it as a portal to keep your scholarly publication or preprint is not possible at the moment. Otherwise, Medium would be an ideal platform for scholarly communication and peer review, as it allows for different levels of commenting and feedback.

Start with OpenRefine

Start by saving the data as a CSV file: get the data in the form of an Excel spreadsheet, open it in your favourite software, and save it as CSV. Then use Exploratory to read the data. Exploratory works well at the column level, where you can either delete unwanted columns or rename them.
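
If you want to keep a scripted record of the same column tidy-up (for example in a Jupyter notebook), here is a minimal sketch using only Python's standard csv module. It is not how Exploratory does it internally. The sample data, the column names, and the helper name tidy_columns are all made up for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for a spreadsheet saved as CSV.
SAMPLE_CSV = """id,name,notes,age
1,Ana,temp,34
2,Ben,,29
"""

def tidy_columns(csv_text, drop=(), rename=None):
    """Return the rows as dicts, with unwanted columns dropped and others renamed."""
    rename = rename or {}
    reader = csv.DictReader(io.StringIO(csv_text))
    tidy = []
    for row in reader:
        tidy.append({
            rename.get(col, col): value   # apply the new name if one was given
            for col, value in row.items()
            if col not in drop            # skip columns marked for deletion
        })
    return tidy

# Drop the (hypothetical) "notes" column and rename "name" to "participant".
rows = tidy_columns(SAMPLE_CSV, drop={"notes"}, rename={"name": "participant"})
for row in rows:
    print(row)
```

To use this on a real file, read its text with open(path, newline="", encoding="utf-8") and pass the contents to tidy_columns.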

Data Preprocessing: OpenRefine

Remove rows and columns

  1. From the “All” column, select “Facet by star”
  2. In the resulting window, you see “true” and a count (“1” if only one row is starred, otherwise n, where n is the number of rows you want removed)
  3. With “true” selected in the facet, go to the “All” column and click “Remove all matching rows”
  4. This will remove all the “starred” rows, and if you click on the “false” link, you will see the remaining rows in the right-hand panel (see Figure 1)
Figure 1. You get this view by clicking on “All”, and asking to show “Facet by star”
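
The logic of the steps above can be sketched in a few lines of Python. This is only an illustration of the idea (flag rows, count them by flag, remove the flagged ones), not OpenRefine's own code. The rows, the set of starred row positions, and the function name remove_starred are assumptions made up for the example.

```python
from collections import Counter

# Hypothetical rows, plus the positions of the rows that were "starred".
rows = [
    {"id": "1", "value": "ok"},
    {"id": "2", "value": "header repeated"},
    {"id": "3", "value": "ok"},
    {"id": "4", "value": "blank"},
]
starred = {1, 3}  # zero-based positions of the rows to remove

def remove_starred(rows, starred):
    """Return (facet_counts, remaining_rows) after removing the starred rows."""
    # The facet: how many rows are starred (True) and how many are not (False).
    facet_counts = Counter(i in starred for i in range(len(rows)))
    # "Remove all matching rows": keep only the rows that are not starred.
    remaining = [row for i, row in enumerate(rows) if i not in starred]
    return facet_counts, remaining

facet_counts, remaining = remove_starred(rows, starred)
print(dict(facet_counts))
print(remaining)
```

The count under True plays the role of the n shown next to “true” in the facet, and the remaining list corresponds to what you see after clicking the other facet value.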

Associate Professor of Epidemiology and Environmental Health at the University of Canterbury, New Zealand. Also in: https://refind.com/arinbasu
