Open Knowledge Festival 2014

This blog post is written by Michelle Willmers, Project Manager at OpenUCT Initiative, and is cross-posted from the OpenUCT Initiative blog.

The OpenUCT Initiative has in recent months been fortunate to participate in a number of workshops and events around open access, open science and open data in the African and developing country context.

Group Photo – Open Data in Developing Countries Research Network Workshop, Berlin, July 14th/15th 2014 (flickr)

Most recently, Francois van Schalkwyk and I participated in the IDRC/World Wide Web Foundation's Open Data in Developing Countries (ODDC) network meeting, which took place in Berlin to coincide with the Open Knowledge Festival, a bi-annual global event organized by the Open Knowledge Foundation.

The slogan of this year's OKFest was ‘Open Minds to Open Action’. It was inspiring to be a part of what felt like quite a historic event, and there was a tangible sense of excitement around what might be possible if this extremely talented and passionate community of activists, researchers and open advocates could harness their collective energies to promote openness and improved access to information.

Against this backdrop, network members from the African ODDC projects got together in Berlin to discuss what we could be doing to promote the open agenda and consolidate the open network in Africa – particularly in the emerging areas of open science and open data. It was acknowledged that while there have been a number of funder-driven initiatives, workshops and projects across Africa, it was incumbent upon us to move to action in joining the dots between these initiatives and doing more to consolidate the Africa network.

In short, discussion amongst the African participants surfaced a general sense of frustration around:
(a) Lack of cohesion in African “open” projects and research initiatives — i.e. we can see increasing pockets of sophisticated activity but no real initiative/conversation to consolidate the agenda.
(b) Reliance on funders and partners from the North to stimulate the local conversation.
(c) The need to take the conversation around openness outside of the purely academic context in order to include NGOs/CBOs as well as private/corporate partners.

As a small first step to help address this issue, I volunteered to share a public list of Africa-based academics, university managers, advocates and practitioners that I have interacted with at “Open” events in recent months. The list is available here.

It is a modest start, but we are hoping that African colleagues will add their names to this live database, which will hopefully be of use in surfacing a local network and providing contacts for partners from other continents. Africa is an enormous continent and finding key people in niche areas can be one of the main challenges in penetrating and collaborating within this environment.

In addition to adding names to the directory, we are also encouraging anybody interested in the African conversation to tweet items of local interest using the #OpenAfrica and #scholarAfrica hashtags.

Should you be interested, preliminary insights from the Open Data in Developing Countries initiative can be accessed here.

A Storify from the special ODDC session at OKFest is available here.

(CC-BY-SA)

This blog post is written by Tariq Khokhar, Data Scientist and Open Data Evangelist at The World Bank, and is cross-posted from Open Data. The World Bank Data Blog.

I was lucky to be in Berlin with some colleagues earlier this month for the 2014 Open Knowledge Festival and associated fringe events.

There's really too much to distill into a short post – from Neelie Kroes, the European Commissioner for Digital Agenda, making the case for “Embracing the open opportunity” to Patrick Alley’s breathtaking accounts of how Global Witness uses information to expose crime and corruption in countries around the world.

A few things really stuck with me, though, from the dozens of great sessions throughout the week. Here they are:

1) Open data needs users and long-term commitment from governments.

The “Nos Ecoles, Nos Donnees” Application in Burkina Faso

The Partnership for Open Data hosted a fantastic session highlighting examples of open data in action in low and middle income countries.

Tanzania

Joachim Mangilimai, a School of Data Fellow from Tanzania, showcased a Swahili mobile app he'd developed to support decision making by medical staff. The app was based on guidelines published by The Population Council and built using the Open Data Kit framework. He also highlighted Shule.info, a project by Twaweza that compiles and visualizes government data on school performance that parents can use to stay better informed.

Burkina Faso

Malick Tapsoba, the technical manager of the Open Data Burkina Faso team, highlighted the difficulties they overcame in launching their open data portal in a low-capacity, low-connectivity environment and how the next big challenge was to nurture a community of data users. They'd also built a great school information app called “Our Schools, Our Data” that offers gender-disaggregated data on school performance. They’ve done an impressive job of kick-starting their initiative in a difficult environment.

Mexico and The Philippines

We also heard from Ania Calderón of the Mexican government on their “Data Squads” program, which provides rapid support to different government agencies to publish high quality data to the national open data portal. Finally, Happy Feraran, who created the Bantay corruption reporting platform in the Philippines, emphasised the importance of mobilizing the community.

Lessons learned: There are some great open data initiatives around the world, and two common themes are the need for a strong community of technologically literate data re-users and the sustained effort needed within governments to change how they create, manage and publish data in the long term. Tim Davies has also shared “15 open data insights” from the Open Data Research Network, and you can read takeaways from the event by the ODI’s Liz Carolan here.

2) Spreadsheets are code, and you can unit test data

A Turing Machine implemented in Excel

Jeni Tennison has declared 2014 the year of the CSV, and the fringe event csv,conf was the most informative conference I've been to in a long time. With over 30 speakers on technically specialised topics to do with the creation, management and application of (mostly) tabular data, there was again too much to choose from, but my highlights were “Treating spreadsheets as code” and “Unit testing for tabular data”.

Spreadsheets are code

Felienne Hermans, who heads The Spreadsheet Lab (I'm not kidding) at Delft University, asked that if we remember one thing from her talk, it’s that “spreadsheets are code”. She thinks we should treat them as such and use software engineering approaches like tests, refactoring, and designing for maintainability. She casually demonstrated that Excel is “Turing complete”, and just as powerful as any other programming language, by using it to build a Turing Machine (see picture above), and she highlighted some tools that can help to improve the quality of spreadsheet applications.

The first tool is Bumblebee, which Felienne developed for optimizing spreadsheet formulas. It can do a lot, but think of automatically replacing things like “SUM(F3:F7)/COUNT(F3:F7)” with the simpler “AVERAGE(F3:F7)”, plus other user-defined or automatic transformations. She discussed another tool (which I now forget the name of) that helps with formula testing and, at the end of her talk, mentioned the (commercial) service spreadgit that brings cloud-based git-like revision management to Excel.
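To make the formula rewrite concrete, here is a minimal Python sketch of that one transformation. It is not Bumblebee's implementation or API: the function name `simplify_formula` and the regular-expression approach are assumptions made purely for illustration, and a real tool would parse the formula properly rather than pattern-match on text.

```python
import re

# Hypothetical rule: SUM(<range>)/COUNT(<same range>) -> AVERAGE(<range>).
# The backreference \1 only matches when both ranges are written identically.
_SUM_OVER_COUNT = re.compile(
    r"\bSUM\(\s*([^()]+?)\s*\)\s*/\s*COUNT\(\s*\1\s*\)",
    flags=re.IGNORECASE,
)


def simplify_formula(formula: str) -> str:
    """Rewrite SUM(r)/COUNT(r) as AVERAGE(r); leave everything else untouched."""
    return _SUM_OVER_COUNT.sub(r"AVERAGE(\1)", formula)


print(simplify_formula("=SUM(F3:F7)/COUNT(F3:F7)"))  # prints "=AVERAGE(F3:F7)"
print(simplify_formula("=SUM(F3:F7)/COUNT(G3:G7)"))  # different ranges: unchanged
```

The rule deliberately refuses to fire when the two ranges differ, since the formula is then no longer a plain average.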

She noted that “Like democracy, spreadsheets are the worst, except for all others” and in her “Programming and data science for the 99%” course recognizes that Excel (and open alternatives like LibreOffice and OpenOffice) are going to be the main way most people do data analysis for the foreseeable future, so we should encourage people to adopt some good software engineering habits when coding spreadsheets.

Programmatically testing data

Karthik Ram, the co-founder of the awesome rOpenSci and a scientist at Berkeley, shared some promising work they've been doing on the testdat R package. In short, it will let you programmatically test for and correct errors like outliers, text formatting problems and invalid values in datasets. It’s still in development, but you can get an idea from Karthik’s slides.
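For readers who want a feel for what "unit testing data" can look like, here is a small Python sketch of the same idea. It is not testdat and does not mirror its API (testdat is an R package); the function `find_problems`, the sample data and the outlier threshold are all assumptions made for illustration. It flags stray whitespace, non-numeric values and outliers (using the median absolute deviation) in one column of a CSV.

```python
import csv
import io
import statistics

# Made-up sample data, for illustration only.
SAMPLE = "school,score\nA,52\nB,48\nC, 49\nD,n/a\nE,500\nF,51\nG,50\n"


def find_problems(rows, column, max_dev=3.0):
    """Return (row_number, message) pairs for suspicious values in one column."""
    problems = []
    numeric = []
    for number, row in enumerate(rows, start=1):
        raw = row[column]
        if raw != raw.strip():
            problems.append((number, "stray whitespace"))
        try:
            numeric.append((number, float(raw)))
        except ValueError:
            problems.append((number, "not a number"))
    values = [value for _, value in numeric]
    if len(values) >= 2:
        median = statistics.median(values)
        # Median absolute deviation: a spread measure that outliers barely move.
        mad = statistics.median([abs(value - median) for value in values])
        for number, value in numeric:
            if mad > 0 and abs(value - median) > max_dev * mad:
                problems.append((number, "outlier"))
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for number, message in find_problems(rows, "score"):
    print(f"row {number}: {message}")
```

A check like this can run in the same test suite as the analysis code, so a bad data delivery fails loudly instead of quietly skewing the results.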

The old and new approaches to science by Karthik Ram.

He ended with a useful reminder of the changing norms the “open science” movement is creating – where once the research paper was the principal output of a scientist, it's increasingly accepted that the code, the data and separate elements of the narrative of a scientific study will all be public and available for re-use.

Lessons learned: Spreadsheets are code and we can adopt some software engineering practices to make much better use of them. There are a number of powerful tools and approaches to data handling being pioneered by the scientific community (e.g. Hadley Wickham just announced the tidyr tool for data cleaning in R) and those of us working in other fields can adopt and emulate many of them.

3) The future of civic tech (probably) lies in re-usable software components

I had a chat with the always thoughtful Tom Steinberg of mySociety just before the “Can Open Data Go Wrong?” session, and Tom told me about one way he thinks open data can go right: Poplus.

To use their own words, Poplus is an “open federation of people and organisations from many different countries” with a “joint mission to share knowledge and technology that can help us to help citizens”. The primary resource they've got at the moment is Poplus Components, which you can think of as building blocks for more complex civic applications.

The current components are:

Represent Boundaries – a web API onto geographic areas like electoral districts

SayIt – a service to store and retrieve written transcripts of public statements

MapIt – a service that finds out which administrative area covers a specific point

WriteIt – a service to write and send messages to public persons

PopIt – a tool to keep lists of politicians and associated biographic information

BillIt – a flexible document storage tool

Why re-usable software components and not re-usable apps?

So, for example, could this app built to visualize secondary school performance in the United Kingdom be re-purposed to work in Tanzania or Burkina Faso? Maybe, but probably not. Why? Because the context is different enough that the UK-based app, like many others, just doesn't quite translate to other countries, so it’s easier to build a new app designed for the local context.

This is why Poplus components are great: they abstract out functional elements of civic applications and make them easy to combine into a more complex service. Nerdier readers will remember Robert Glass' “rule of three”, which states it’s three times as difficult to build reusable components as single-use components. I think Poplus understands this, and the components are carefully curated and already being re-used and combined around the world.
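To illustrate the building-block idea, here is a minimal Python sketch of two tiny components being combined. It is not Poplus code and does not use the MapIt or PopIt APIs: the function names, the district shapes and the representative names are all made up. One function answers "which area covers this point?" and a second one layers a list of representatives on top of it.

```python
# Made-up areas: name -> polygon as a list of (x, y) corner points.
AREAS = {
    "District A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "District B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}

# Made-up people, keyed by area name.
REPRESENTATIVES = {
    "District A": ["Representative 1"],
    "District B": ["Representative 2", "Representative 3"],
}


def point_in_polygon(x, y, polygon):
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def area_for_point(x, y, areas=AREAS):
    """Component 1: return the name of the first area covering the point, or None."""
    for name, polygon in areas.items():
        if point_in_polygon(x, y, polygon):
            return name
    return None


def representatives_for_point(x, y):
    """Component 2, built on component 1: who represents this location?"""
    area = area_for_point(x, y)
    return REPRESENTATIVES.get(area, [])


print(area_for_point(3, 1))
print(representatives_for_point(3, 1))
```

Swapping in another country's boundaries and people changes only the data, not the functions, which is the kind of reuse that whole apps rarely manage.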

Lessons learned: Open data fundamentally needs open source software. App reuse often doesn't happen because contexts are too different. Reusable software components can reduce the development overhead for creating locally customized civic software applications and a pool of high quality civic software components is a valuable public good worth contributing to.

Finally, a big thank you to the organizers of OKFest and csv,conf for hosting such great gatherings. Were you at #okfest14 or #csvconf – what did you learn?