Hands-On Humanities Data Curation
“Making the most of messy data”

Convenors: Elizabeth Wickes and Katrina Fenlon, School of Information Sciences, The iSchool at Illinois
Hashtags: #dhcuration and #DHOxSS
Computers: please bring your own laptop; tablets are not suitable.

Abstract

Humanists create and use data in a variety of formats. If these data are to retain their value over time, they require data curation: the active and ongoing management of data throughout their lifecycle of interest and usefulness. Curation provides the foundation for a range of related activities, from analyzing and visualizing research data to promoting access and reuse across a broader scholarly community.
This workshop will provide a hands-on introduction to useful tools, methods, and perspectives for managing, organizing, cleaning, and processing data in humanities projects of any size or complexity. Sessions will cover a range of topics, including information organization, data modelling, data quality and cleaning, and workflows. Learners will apply curation techniques and tools to data from a real-world digital humanities project. Time will be allotted for learners to apply what they have learned to data or a project of their choice.
The programme is aimed at humanities researchers from any discipline, background, or current role. The workshop will be led by experts from the iSchool at Illinois with additional speakers from the University of Oxford.

Convenors

Katrina Fenlon is a postdoctoral research associate at the School of Information Sciences at the University of Illinois at Urbana-Champaign, where she recently completed her PhD in Library and Information Science. Her research focuses on the representation and use of digital collections in different contexts, ranging from organizational and descriptive problems in large digital libraries to the use and sustainability of curated cultural collections. She teaches courses in digital humanities and data curation.
Elizabeth Wickes is a Lecturer with the School of Information Sciences at the University of Illinois at Urbana-Champaign. She is on the Executive Council of The Carpentries and is a Python user group organizer. She previously worked as a Data Curation Specialist at the University Library at Illinois, and as the curation manager for Wolfram|Alpha. Her research interests are in programming education, digital humanities technical training, and managing research data.

"Discovered a bunch of tools and resources that will be really useful in my work and research. Left the workshop really excited and willing to know more!"
DHOxSS 2017 participant, Humanities Data: A Hands-On Approach

Monday 2nd July

08:15-09:15

Registration (Sloane Robinson building)
Tea and coffee (ARCO building)

09:30-10:30

Opening Keynote (Sloane Robinson lecture theatre)

10:30-11:00

Refreshment break (ARCO building)

11:00-11:15

Workshop Introductions

We begin with introductions: we want to hear about you, your projects, and your data. (Pip Willcox)

11:15-12:30

Introduction to Humanities Data

In this session, we’ll review some of the unique characteristics and challenges of working with humanities data. We will also introduce the “messy” dataset we’ll be cleaning throughout the week and walk through the workshop agenda. (Katrina Fenlon)

12:30-14:00

Lunch (Dining Hall)

14:00-16:00

Hands-on with Spreadsheets

As our opening hands-on activity, this session will outline strategies for encoding unstructured data into structured spreadsheets. We will also demonstrate best practices for data preservation and quality control. (Katrina Fenlon)
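
As a rough illustration of encoding unstructured notes into a structured spreadsheet, the Python sketch below splits free-text lines into one-value-per-cell CSV columns, keeps the original text for provenance, and sets aside lines it cannot parse for manual review. The sample notes, column names, and parsing pattern are hypothetical, invented for this example; the session itself may work directly in a spreadsheet application.

```python
import csv
import io
import re

# Hypothetical free-text notes, invented for illustration only.
NOTES = [
    "Letter from J. Smith to A. Jones, 1843-05-12, Oxford",
    "Letter from M. Brown to J. Smith, 1844-11-02, London",
    "Undated note, no sender recorded",
]

# One named group per column, so each cell holds exactly one value.
PATTERN = re.compile(
    r"Letter from (?P<sender>.+?) to (?P<recipient>.+?), "
    r"(?P<date>\d{4}-\d{2}-\d{2}), (?P<place>.+)"
)
FIELDS = ["sender", "recipient", "date", "place", "source_text"]


def encode(notes):
    """Split notes into structured rows and lines that need manual review."""
    rows, unparsed = [], []
    for note in notes:
        match = PATTERN.fullmatch(note.strip())
        if match:
            # Keep the original wording next to the parsed fields (provenance).
            rows.append({**match.groupdict(), "source_text": note})
        else:
            unparsed.append(note)
    return rows, unparsed


def to_csv(rows):
    """Serialize the structured rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows, unparsed = encode(NOTES)
print(to_csv(rows))
print("Needs review:", unparsed)
```

Keeping unparsed lines in a separate list, rather than silently dropping them, is one simple form of quality control: nothing leaves the workflow without someone looking at it.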

16:00-16:30

Refreshment break (ARCO building)

16:30-17:20

Information Organization and Data Quality

An overview of the basics of information organization: tables, trees, and triples! This session also introduces the basics of data quality and fitness-for-use. (Katrina Fenlon)
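
To make the three shapes concrete, the sketch below expresses one hypothetical record (a letter, with invented values) as a table row, as a tree of nested elements, and as a set of subject-predicate-object triples. The field names and the nesting are assumptions for illustration, not a data model used in the workshop.

```python
import json

# A hypothetical record, invented for illustration: one letter.
row = {"id": "letter1", "sender": "J. Smith", "recipient": "A. Jones",
       "date": "1843-05-12", "place": "Oxford"}

# Table: a fixed set of columns, one row per record.
columns = list(row)
table_row = [row[c] for c in columns]

# Tree: related fields are nested under parent elements.
tree = {
    "letter": {
        "id": row["id"],
        "correspondents": {"sender": row["sender"], "recipient": row["recipient"]},
        "sent": {"date": row["date"], "place": row["place"]},
    }
}

# Triples: every fact becomes a (subject, predicate, object) statement.
triples = [(row["id"], key, value) for key, value in row.items() if key != "id"]

print(columns)
print(table_row)
print(json.dumps(tree, indent=2))
for triple in triples:
    print(triple)
```

The same facts survive in all three forms; what changes is which relationships are explicit (columns, hierarchy, or named links).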

17:20-17:30

Free writing

Apply the day’s lessons to your own work.


Tuesday 3rd July

09:00-10:30

Minimum Viable Curation: Case Studies

We will explore what real-life curation can look like in a variety of humanities contexts, focusing on the least effort for the biggest impact. We will use case studies from real humanities projects to frame these conversations, while also discussing core data management tools and perspectives. (Elizabeth Wickes and Katrina Fenlon)

10:30-11:00

Refreshment break (ARCO building)

11:00-13:00

Contextual Data Modeling

Building upon concepts from Information Organization, this session approaches data modeling through deeper consideration of context, provenance, and evidence. (Neil Jefferies)

13:00-14:30

Lunch (Dining Hall)

14:30-15:20

Learning strategies for technical skills

As preparation for the upcoming hands-on technical and programming activities, we’ll discuss strategies and sources for learning technical skills, approaches to note-taking, and learning styles for technical content. (Elizabeth Wickes)

15:20-15:30

Free writing

Apply the day’s lessons to your own work.

15:30-16:00

Refreshment break (ARCO building)

16:00-17:00

Lectures (various venues)


Wednesday 4th July

09:00-10:30

Hands-on with OpenRefine

OpenRefine is a “free, open source power tool for working with messy data and improving it.” We’ll demonstrate the tool and prepare you to perform basic normalization tasks in OpenRefine, including exploring a dataset, using faceting and clustering for normalization, and reapplying these techniques to new datasets. (Katrina Fenlon)
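
The idea behind key-collision clustering can be sketched outside OpenRefine. The Python below loosely follows the steps OpenRefine's documentation describes for its "fingerprint" method (trim, lowercase, drop punctuation, fold accented characters to ASCII, sort and de-duplicate tokens) and groups values whose keys collide. It is an approximation written for illustration, not OpenRefine's own code, and the place names are invented.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Approximate a fingerprint key (a sketch, not OpenRefine's implementation)."""
    value = value.strip().lower()
    # Drop punctuation, keeping word characters and whitespace.
    value = re.sub(r"[^\w\s]", "", value)
    # Fold accented characters to plain ASCII where possible (e.g. ü -> u).
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Sort unique tokens so that word order does not matter.
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group values whose keys collide; keep only groups with real variants."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {key: vs for key, vs in groups.items() if len(set(vs)) > 1}


# Hypothetical place-name variants, invented for illustration.
places = ["Zürich", "Zurich", "zurich ", "Oxford, England", "England Oxford", "London"]
for key, variants in cluster(places).items():
    print(key, "<-", variants)
```

As in OpenRefine, a collision is only a suggestion: "Oxford, England" and "England Oxford" share a key, but a person still decides which form to keep.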

10:30-11:00

Refreshment break (ARCO building)

11:00-13:00

Hands-on with SQLite

SQLite is the most widely deployed database engine in the world. It’s lightweight, relatively easy to learn, and can be an important asset in your data curation toolkit. Participants will dive in with a hands-on introduction to database structures and data profiling. (Elizabeth Wickes)
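
A first pass at data profiling can be done with Python's built-in sqlite3 module. The sketch below loads a few hypothetical rows into an in-memory SQLite database and reports, for every column, the number of rows, non-empty values, and distinct values. The table, its columns, and the data are invented for illustration.

```python
import sqlite3

# In-memory database with a hypothetical table, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE letters (id INTEGER PRIMARY KEY, sender TEXT, place TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO letters (sender, place, year) VALUES (?, ?, ?)",
    [("J. Smith", "Oxford", 1843), ("J. Smith", "oxford", 1844),
     ("M. Brown", None, 1844), ("A. Jones", "London", None)],
)


def profile(conn, table):
    """Return row, non-null, and distinct counts for every column of a table."""
    # Names are interpolated directly, so only use this with trusted table names.
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [info[1] for info in conn.execute(f"PRAGMA table_info({table})")]
    results = {}
    for col in columns:
        # COUNT(col) skips NULLs, so comparing it with COUNT(*) reveals gaps.
        total, non_null, distinct = conn.execute(
            f'SELECT COUNT(*), COUNT("{col}"), COUNT(DISTINCT "{col}") FROM {table}'
        ).fetchone()
        results[col] = {"rows": total, "non_null": non_null, "distinct": distinct}
    return results


for col, stats in profile(conn, "letters").items():
    print(col, stats)
```

Reading the counts column by column surfaces typical curation problems: `place` has one missing value, and "Oxford" and "oxford" are counted as distinct values, a candidate for normalization.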

13:00-14:30

Lunch (Dining Hall)

14:30-15:20

Selective preservation

This session will introduce the basic concepts and concerns of preservation in data curation. We will take a fresh look at how our data have moved through workflows during this week’s hands-on activities, and consider what those workflows, and the data models at each of their stages, imply for selective and strategic digital preservation. (Katrina Fenlon)

15:20-15:30

Free writing

Apply the day’s lessons to your own work.

15:30-16:00

Refreshment break (ARCO building)

16:00-17:00

Lectures (various venues)


Thursday 5th July

09:00-09:15

Free writing debrief

Reflect on what you've learned so far. Apply this to your own work: your individual research and your institutional context. Make notes on how you've managed your data previously and what you might do in the future. Does this raise new questions? Are there other things you'd like to explore?

09:15-10:30

To be confirmed

10:30-11:00

Refreshment break (ARCO building)

11:00-13:00

Hands-on Data Exploration with Python

This session assumes no prior programming experience and will include an introduction to using a Python development environment, writing scripts, and approaches to creating text-processing utilities. We will step through the process of exploring unstructured data and transforming it into a dataset for analysis. (Elizabeth Wickes)
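
As a taste of a small text-processing utility, the sketch below tokenizes a short transcription and counts word frequencies, turning running text into (word, count) rows that can be analyzed further. The text and the stopword list are hypothetical, and the tokenizer deliberately handles only unaccented Latin letters.

```python
import re
from collections import Counter

# A hypothetical transcription, invented for illustration.
TEXT = """Dear Sir, I write from Oxford in haste.
The books arrived in Oxford on Tuesday; the maps did not."""

# A small, hypothetical stopword list; real projects tune this carefully.
STOPWORDS = {"the", "in", "on", "i", "did", "not", "from", "dear", "sir"}


def tokenize(text):
    """Lowercase the text and extract runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts(text, stopwords=STOPWORDS):
    """Count tokens, skipping any that appear in the stopword list."""
    return Counter(t for t in tokenize(text) if t not in stopwords)


# Each (word, count) pair is one row of a small dataset, most frequent first.
rows = word_counts(TEXT).most_common()
for word, count in rows[:3]:
    print(word, count)
```

A real project would refine the tokenizer (accented characters, hyphenation, historical spellings) before relying on the counts.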

13:00-14:30

Lunch (Dining Hall)

14:30-15:30

From Project to Preservation: Institutional Repositories

What happens to your data when your project is complete? This session provides an overview of archiving and data management from the perspective of institutional repositories. (David Tomkins)

15:30-16:00

Refreshment break (ARCO building)

16:00-17:00

Lectures (various venues)


Friday 6th July

09:00-10:30

Ethics and Social Implications

Legal, ethical, and policy issues related to data collection, analysis, and sharing. (Katrina Fenlon and Elizabeth Wickes)

10:30-11:00

Refreshment break (ARCO building)

11:00-13:00

Rapid Data Project Prototyping

Participants will have an opportunity to put their skills to work by quickly cleaning the project data used throughout the week for one of a few uses (e.g. making a timeline, a map, a network diagram, a conceptual model…). (Elizabeth Wickes and Katrina Fenlon available for consulting)

13:00-14:00

Lunch (Dining Hall)

14:00-15:00

Closing discussion

After an exercise to develop a personal learning plan, the full group will reconvene to discuss: What can you do to improve your own curatorial practices in the near term and in the long term? What are the key lessons you learned from your week as data curators? (Elizabeth Wickes and Katrina Fenlon)

15:00-16:00

Closing Plenary (O'Reilly lecture theatre)


Tutor biographies

Neil Jefferies is Head of Innovation for Bodleian Digital Library Systems and Services at Oxford, guiding innovative digital projects at the Bodleian covering both traditional library materials and research data in all its forms. He is a scientist by training but has been working with internet technologies for nearly 20 years, mostly commercially – his first website was Snickers/Euro'96! He is Technical Strategist of "Cultures of Knowledge", an international collaborative project launched in 2009 "to reconstruct the correspondence and social networks central to the revolutionary intellectual developments of the early modern period". He was also one of the co-creators of IIIF, and is currently working on V3 of the SWORD protocol for the automated transfer and updating of digital objects between repositories.
David Tomkins is Curator of Digital Research Data at the Bodleian Libraries in Oxford and manages ORA-Data, the University’s institutional repository for research data. He has led a number of high-profile digitization, content creation and crowd-sourcing projects for the Bodleian, including Queen Victoria’s Journals, What’s the Score?, Mapping Crime and Electronic Ephemera, having previously undertaken similar roles at the Victoria & Albert Museum and the Institute of Historical Research. David is co-author of Illustrating Empire: a visual history of British imperialism, and has also written book chapters, articles, and an online course for the Oxford University Department for Continuing Education.
Pip Willcox is the Head of the Centre for Digital Scholarship, Bodleian Libraries, a Senior Researcher at the University of Oxford e-Research Centre, Director of the Digital Humanities at Oxford Summer School, and research member of the common room at Wolfson College, Oxford. With a background in scholarly editing and book history, her current research is in the experimental humanities.