Posts

Showing posts from October, 2019

Exploring Chapter 2 of Open Refine Pt. 2

Recipe 4
In recipe 4, I learned how to perform text filters, and just like searching any type of document, text filtering is riddled with problems. While you can certainly perform a text filter, it does not account for the many variations of a word that might have been typed in. In my opinion, this feature is handy when dealing with a dataset that you created or a dataset with consistent entries (do those even exist?). This reminded me of the find feature in Excel. It is a quick way to retrieve data, but accurate results are not guaranteed.

Recipe 5
In recipe 5, I learned more about retrieving records from a dataset. The most beneficial part for me was the ability to edit cells to remove uppercase letters. What I enjoyed most about this feature was the ability to edit multiple cells at once. It is a quick way to make changes to a large set of data.

Recipe 6
In recipe 6, I learned how to apply everything that I learned throughout the chapter and make changes to my dataset. While I appreciated lea...

Exploring Chapter 2 of Open Refine Pt. 1

I am quick to admit that uploading the GZ file was something new for me, and I struggled with this for over an hour. After testing nearly every option I saw in Open Refine, I finally found where to import a project that has already been “worked on”. I do not remember seeing any guidance in the chapter on how to work with this data, and I believe clearer instructions here would go a long way. Alas, it is done, so I can begin working on Recipe 1.

Recipe 1
In recipe 1, sorting seemed fairly straightforward and somewhat like the sort feature in Excel. The main difference worth noting, something addressed in Chapter 1, was the ability to easily undo and redo the changes that have been made to the dataset. The convenience of sorting and easily restoring the information to its original state makes the sorting feature rather nice.

Recipe 2
In recipe 2, I learned how to use facets to retrieve records. This can come in handy when I want to weed out the records with obvious errors or even is...

Exploring Open Refine

Chapter 1 Overview in Relation to Excel
When using Open Refine, I noticed many similarities to its Excel counterpart. After working my way through the chapter, I can definitely say that I prefer Open Refine’s editing tools over those available in Excel. The ability to undo or redo from any point in the project is more convenient than in Excel, where you would have to constantly save another version outside of the original dataset so that no permanent damage is done to the original data. To record my thoughts throughout chapter 1, I made notes on each recipe as I worked through them.

Recipe 1
In recipe 1, I had to install the Open Refine application. For the most part, this was pretty straightforward, but I did have to go in and edit my security settings before it would allow me to complete the download. After doing that, I was ready to go.

Recipe 2
In recipe 2, I had to upload a dataset into Open Refine. Once it was uploaded, I was able to play around with the settings to see how i...

Spreadsheets for Data Management

When dealing with data, you must be consistent during the data entry process to ensure your records are crafted with accuracy and clarity. Now, in an ideal world, data would be entered in the same format every single time with the same values, but that is not always the case. During the exercises from Tidy Data for Librarians, I was able to see not only how data can be misconstrued but also how it can be analyzed for errors and edited for clarity. What I found most useful from these exercises was the tip to keep a notes sheet while working so that you can keep track of what you are doing. In the past, I would have a Word document open in hopes of keeping track of my information, only to realize later that I didn’t create a thorough record of my changes. Overall, these were great exercises that enabled me to refresh my Excel skills, and I am now more comfortable setting data validation, retrieving dates, and using color scales to locate errors. For anyone who needs to learn more of...