Screenshot of the open data catalog search page.

SF Open Data Program


Program goals

The Open Data Program comprises many projects that together make timely data easily available to City departments and the public. This means data is offered:

  1. At no cost
  2. With permissive licensing
  3. In machine-readable formats, including via Application Programming Interfaces (APIs)

In addition, the open data program supports:

  1. Improved knowledge of data assets
  2. Data governance around those assets
  3. Increased use of data in decision-making

My role was to identify and implement continuous improvements to the program in support of DataSF’s mission to empower use of the City’s data.

Approach

Our overriding approach at DataSF is to use lean, continuous improvement cycles to plan, do, check, and act on deliverables.

The open data program's projects fall within three work streams that support the goal of making timely data easily available. These are:

  1. publishing support and automation services,
  2. data coordination, governance and quality management, and
  3. user support and training

Below, I’ll highlight some projects I delivered to support open data.

  1. Develop open data publishing process. To accommodate a federated data environment, we needed a publishing process that DataSF could manage and monitor for ongoing improvement, all within a constrained budget. With the team, I developed an intake process and standardized work for developing data pipelines. We use a Trello board to manage the work, and data about the process is captured automatically and rolled up into a Power BI dashboard for planning and monitoring. We documented the “open data operating manual” in a four-part blog series.
  2. Inventory data citywide. State and local laws require an inventory of City systems and datasets, updated at least annually. I have managed and improved the inventory process each year since the first one in 2015. I work with 50+ data coordinators across departments to collate and update the inventory, which includes developing the templates, processes, and guidance that support the updates. I developed scripts to populate templates with existing inventoried data and then ingest the completed templates back into Airtable. I also developed scripts to sync the inventories of systems and datasets to the open data portal.
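
The template-population step in the inventory project can be illustrated with a minimal sketch. It is not the actual DataSF script: the column names and function names are hypothetical, the templates are rendered as CSV text purely for illustration, and the Airtable ingestion and portal sync are left out because the text does not describe how they work.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; the real inventory schema is not given in the text.
TEMPLATE_COLUMNS = ["department", "system", "dataset", "status"]


def build_department_templates(records):
    """Group inventory records by department and render one CSV template each."""
    by_department = defaultdict(list)
    for record in records:
        by_department[record["department"]].append(record)

    templates = {}
    for department, rows in sorted(by_department.items()):
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=TEMPLATE_COLUMNS, lineterminator="\n"
        )
        writer.writeheader()
        # Sort rows so each coordinator receives a stable, predictable template.
        for row in sorted(rows, key=lambda r: (r.get("system", ""), r.get("dataset", ""))):
            writer.writerow({col: row.get(col, "") for col in TEMPLATE_COLUMNS})
        templates[department] = buffer.getvalue()
    return templates


def read_template(text):
    """Parse a returned template back into records, ready for re-ingestion."""
    return list(csv.DictReader(io.StringIO(text)))
```

Pre-filling each template with last year's records means a coordinator only has to correct and extend the list rather than start from a blank sheet, and `read_template` shows the reverse direction a re-ingestion step would need.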

Outcomes

  1. 150+% increase in available data on the portal over 3 years. Consistent, centralized publishing and data pipeline development helped drive this increase.
  2. 96% of datasets published with APIs. Before the publishing program was established, only about 35% of open datasets had APIs; the remainder were posted as external links to files. This has improved accessibility to and reuse of data.
  3. 85% of datasets updated on time on average. Timely data is important. Datasets that don’t meet their update schedule are generally those maintained manually by departments. Data automated for departments by DataSF meets the update schedule closer to 100% of the time.
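
An on-time metric like the one in the third outcome could be computed along these lines. This is a sketch under stated assumptions, not DataSF's actual calculation: the field names `last_updated` and `interval_days` are hypothetical, and a dataset is assumed to be on time when its most recent update falls within one publishing interval of the check date.

```python
from datetime import date, timedelta


def is_on_time(last_updated, interval_days, as_of):
    """Assumed rule: on time if the last update is within one interval of `as_of`."""
    return as_of - last_updated <= timedelta(days=interval_days)


def on_time_rate(datasets, as_of):
    """Return the percentage of datasets meeting their update schedule."""
    if not datasets:
        return 0.0
    on_time = sum(
        is_on_time(d["last_updated"], d["interval_days"], as_of) for d in datasets
    )
    return 100.0 * on_time / len(datasets)
```

Computing the rate separately for manually maintained and automated datasets would make it possible to check the contrast described above.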

Leader in civic data and its many uses with 10+ years of experience in data management, analysis, visualization, and engineering. I helped build DataSF into a world-recognized program empowering use of San Francisco's data.
Jason Lally on Twitter