Research Framework

Research Framework Purpose

The collaborative research process is complex, iterative, and by extension messy. Many of the complexities of collaborative work are an extension of the variety and volume of data produced as researchers and practitioners with different backgrounds go about their work. Researchers may not share software platforms and the software employed by one collaborator often introduces barriers to others. Output portability can likewise be problematic, especially when in a proprietary format.

If data are to remain accessible to all parties and support analyses and decision-making, a precise and flexible organization is essential. This section presents an open source research and data management framework developed within the Oklahoma EPSCoR grant investigating social vulnerability and hazardous weather. As part of a multi-institutional and multi-disciplinary collaboration, this work seeks to make a broader impact on the professional community by presenting how a variety of open source solutions can be collectively organized to manage the collaborative research process. Software recommendations and implementations are detailed, as are instructional vignettes that guide users interested in developing similar data management frameworks. The procurement, management, and analysis of spatial data related to social vulnerability and severe weather as part of the SoVI research project are used as an illustrative example.

Workshop: Python for Data Manipulation and GeoVisualization

Workshop Purpose

As part of our research process, we built the open source Research Framework when research tasks required more scriptable tools. While building it, we investigated complementary tools to enrich those materials and to keep the research project replicable as input data and context change. Our hope was that this effort would extend the impact of our work beyond the lifecycle of the EPSCoR project. This parallel work led us to adopt an open source research framework and to produce a series of instructional vignettes advising researchers on how to better design systems that work with large geographic datasets.

The materials presented here were designed for a series of workshops offered on the Oklahoma State University campus in November 2018. The purpose of the workshop was to inform and educate the OSU student community, and those in the state working in fields related to the EPSCoR project, about the tools available for scripting data manipulation and mapping tasks. Familiarity with programming concepts was considered helpful, but not required. The focus of the session was to briefly introduce the Python programming language and then demonstrate basic tasks using Python. All lessons were provided in and executed using Jupyter Notebook software. The session included demonstrations of using the Pandas library to script tabular data manipulation tasks like:

  • Calculations based on fields
  • Merging datasets
  • Sorting data
  • Grouping data
  • Pivot Tables
  • Generating Figures
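The tasks listed above can be sketched in a few lines of Pandas. The example below is a minimal illustration and is not taken from the workshop notebooks: the tables, column names, and values are hypothetical placeholders invented for this sketch.

```python
import pandas as pd

# Hypothetical tables; all names and values are illustrative only.
counties = pd.DataFrame({
    "county": ["A", "B", "C", "D"],
    "region": ["Central", "Northeast", "Central", "Northeast"],
    "population": [1000, 5000, 3000, 400],
    "area_sq_mi": [100, 50, 60, 200],
})
events = pd.DataFrame({
    "county": ["A", "B", "B", "C"],
    "hazard": ["tornado", "hail", "tornado", "tornado"],
    "n_events": [3, 5, 2, 4],
})

# Calculation based on fields: population density per county.
counties["density"] = counties["population"] / counties["area_sq_mi"]

# Merging datasets: attach event records to counties (left join keeps county D).
merged = counties.merge(events, on="county", how="left")

# Sorting data: densest counties first.
by_density = counties.sort_values("density", ascending=False)

# Grouping data: total population per region.
region_pop = counties.groupby("region")["population"].sum()

# Pivot table: event counts by county (rows) and hazard type (columns).
pivot = events.pivot_table(
    index="county", columns="hazard", values="n_events",
    aggfunc="sum", fill_value=0,
)
```

If Matplotlib is installed, a figure can be generated from any of these results, for example with `region_pop.plot(kind="bar")`.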

The Geopandas library was used to perform tasks often performed in Geographic Information Systems software like:

  • Spatial queries
  • Creating new features
  • Making buffers
  • Generating maps
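A minimal Geopandas sketch of these GIS-style tasks follows. It is not taken from the workshop notebooks: the station names, coordinates, buffer distance, and choice of coordinate reference system (UTM zone 14N, EPSG:32614) are assumptions made for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical stations in a projected CRS, so distances are in metres.
stations = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"]},
    geometry=[Point(0, 0), Point(3000, 4000), Point(20000, 0)],
    crs="EPSG:32614",
)

# Make buffers / create new features: a 10 km polygon around each station.
buffers = stations.copy()
buffers["geometry"] = stations.geometry.buffer(10_000)
zone_a = buffers.geometry.iloc[0]  # buffer polygon around station A

# Spatial query: which stations fall within station A's buffer?
inside = stations[stations.within(zone_a)]
nearby = inside["name"].tolist()
```

Here station B lies 5 km from station A and is selected, while station C lies 20 km away and is not. If Matplotlib is installed, `buffers.plot()` draws a simple map of the buffer polygons.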

Following each demonstration, there was time to work with these tools under supervision.

Workshop Files

There are two ways to get the workshop files. HTML copies of the “ran” notebook files are provided for reference. There is also a zip archive containing the “clean” IPYNB files for you to run, all required data, and reference images. If you already have Jupyter Notebook installed, the zip is where to start. If you do not, you will not be able to open the first part of the workshop, “How to Install Jupyter Notebook”, as a notebook. Instead, read its HTML copy to get all software installed and then get started on the files in the zip archive.

Contributors

  • Peter Kedron
  • Clay Barrett
  • Kellen Bullock
  • Matthew Burton