Open Refine

This is a tool for finding and correcting errors in data. All datasets have errors and inconsistencies in them. We need to clean them before using them, otherwise the errors will show up in the charts or maps we make. Open Refine, formerly Google Refine, can help you do this.

Data errors can be caused by different date formats being used for the same day, by typing errors made during data entry, or just by extra spaces where there shouldn't be any. Spreadsheets can have duplicate entries, or entries that should be split into two (or more) entries. These can be hard to find. Sometimes these problems are one-offs, but they can also be systematic errors made across the whole dataset, such as spelling a person’s name or location differently each time.
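To see why extra spaces matter, here is a minimal Python sketch. It is only an illustration and is not how Open Refine works internally; the place names and the `tidy` helper are made up for the example. It counts how many "different" values a column appears to have before and after stray spaces are removed.

```python
from collections import Counter


def tidy(value: str) -> str:
    """Trim both ends and collapse runs of whitespace into single spaces."""
    return " ".join(value.split())


# Hypothetical column of place names, as it might arrive from data entry.
raw = ["Nairobi", "Nairobi ", " Nairobi", "Mombasa", "Mombasa", "Kisumu  City"]

before = Counter(raw)                  # counts each exact spelling separately
after = Counter(tidy(v) for v in raw)  # counts after removing stray spaces

print(len(before), "distinct values before tidying")
print(len(after), "distinct values after tidying")
```

A chart built from the untidied column would show three separate bars for the same city, which is the kind of error described above.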

Finding and correcting these by hand is time-consuming and comes with the risk of making new errors when trying to correct the old ones. Open Refine highlights where errors might be and can help you fix the problems all at once across your whole dataset. This tool can also do many other things to your data, including re-structuring and re-formatting data, and merging your data with other datasets. It can also translate data into other languages, though this is a little more complicated.
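One way a tool can highlight likely errors is by grouping values that become identical once small differences are ignored. The Python sketch below shows a simplified version of that idea, often called fingerprint or key-collision clustering. It is an assumption-laden illustration rather than Open Refine's own code: the `fingerprint` and `cluster` functions and the sample names are invented here, and the real tool may apply further steps that this sketch leaves out.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a value to a key that ignores case, punctuation and word order."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))  # unique words, in alphabetical order
    return " ".join(tokens)


def cluster(values):
    """Group values sharing a fingerprint; keep groups with 2+ spellings."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical name column with inconsistent spellings.
names = ["Smith, John", "John Smith", "john smith ", "Jane Doe", "Doe, Jane.", "Ali Khan"]

clusters = cluster(names)
for group in clusters:
    print(group)
```

Each printed group is a set of spellings that probably refer to the same person. A person still has to decide which spelling is correct before the whole group is changed at once.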

It automates tasks that would take a long time to do manually, such as finding mistakes and data that might be out of place.

How hard it is to figure out what filtering, faceting, clustering and reconciliation actually do to your data!

Steep

No. You'll need to download and install Open Refine on your computer; however, it is used through your Internet browser, just like a website. Open Refine is an application on your computer that opens and runs in your web browser.

This tool isn't for creating data from scratch like a spreadsheet. To get started, download Open Refine by following the instructions here. You will then be asked to upload data you already have, whether as a spreadsheet or another sort of file. The programme will then show your data in your Internet browser. The types of analysis Open Refine can do are accessed through the down-pointing arrows in the first row of your data as displayed by Open Refine. The tool keeps a record of everything you do, so you can “undo” or “redo” changes you make to your data. Open Refine does not change the data in your spreadsheet; it creates a new dataset that has all the changes, which can be exported from Open Refine as a new spreadsheet.

CSV, Google Spreadsheets, JSON, RDF, TSV, XLS, XLSX and XML. 

CSV, HTML, TSV and XLS.

Open Refine works on your computer, not on the Internet. You control how it is used, what data you put in it and who can access it. Think of it as a personal and private web application.

Previously Google; it now belongs to the open source community.

BSD