Data Census Blog

Omaha and CSCW Preparation

Last week, members from the OCDF working groups met in Omaha, Nebraska to discuss the purpose, structure and language of the manifest – now renamed the OCDX (Online Communities Data Exchange). Thanks to some very productive, enthusiastic and brilliant folks, over the course of eight hours we were able to:

Pare down the number of fields
Clarify the purpose of each field
Develop data input standards

Developing a coherent and functional manifest will support future efforts to identify workflows and tools that automate metadata creation. Before starting these efforts, it will be important to test the effectiveness of the manifest. Members from the Metadata and Ontologies working group, along with researchers from the Wikimedia Foundation, will be hosting a workshop at CSCW at the end of February. We plan to use the workshop to evaluate the OCDX and ascertain whether it can: (1) help researchers identify possible open online community (OOC) datasets and (2) help them replicate (or at least partially replicate) research findings generated through an OOC dataset.

Stay tuned for more updates!

In a World Where Every Researcher Manages Data Differently…

This month we’d like to discuss collaboration and the contributions working groups have been making to the metadata manifest. Describing data sets consistently, accurately and thoroughly poses numerous challenges. Respecting researchers’ time, avoiding confusion about terms and formats in order to promote interdisciplinary use, and documenting steps taken to protect and promote the rights of research subjects are just a handful of the issues we’ve been discussing. Taking steps to capture this information and translate it into a functional schema has been a major work in progress.

To avoid re-inventing the wheel, we’ve reached out to other repositories and projects that promote the collection, description and reuse of open online community (OOC) data. We have also been participating in OCDF leadership meetings in order to keep in touch with the Privacy & Ethics and Infrastructure working groups.

I’ve Never Met-a-data I Didn’t Like

Revising and refining the data census interface has gone hand in hand with the development of a metadata schema. To map out how the schema will contribute to general OCDF goals, we’ve also developed a metadata workflow that describes the general contributions metadata can make to the OCDF, as well as particular areas where collaboration with other working groups is possible.

Xena vs. the Data Census

In a time of ancient gods, warlords and kings, a data entry form in turmoil cried out for a hero…. She was Xena, a mighty princess forged from the collaboration of many working groups….


Let’s not get ahead of ourselves, though. There might be power, passion and danger here, but some context might make it easier to change the ways we describe OOC datasets:

Over the past few months, AJ and I have been re-designing the data entry form for the OCDF data census. Based on our initial review of the data census, we cleaned up existing records so that they reflect the metadata standards we worked to implement before the OCDF meeting in Copenhagen. After listening to working group needs and OCDF participant interests, we spent the months after the Copenhagen meeting brainstorming how to move forward with the data entry interface. That feedback has made it possible to clarify the goals and functional requirements of the metadata, which has in turn made it possible to re-develop the data entry form. Overall, we hope that the current iteration of the data entry form balances working group needs, course instructor needs and non-OCDF researcher interests in metadata associated with OOC datasets.

While we have developed a new streamlined form, our work is far from done! We are hoping for more feedback before the beginning of the Spring 2016 semester, so please visit the data entry form 2.0 and create a dummy record to review the fields (paying particular attention to any instructions given). Our goal is to have a coherent set of fields for students in data management courses to use while completing assignments ranging from the identification and description of datasets to discussions about dataset replication or data management policies.

If you have comments or suggestions, please contact Kristen or AJ.