ColaLife: an experiment in ‘crowd analysing’ (crowd sourced data analysis)

Retailer data on Google Docs

>> Quick link to data:

We are all now familiar with the terms ‘Crowd sourcing’ and ‘Crowd funding’, but what about ‘Crowd analysing’, or the ‘Crowd sourcing of data analysis’? This is an experiment in Crowd analysing. We are releasing anonymised data from our work to scale up Kit Yamoyo in Zambia to see what others can deduce from analysing it.

Our partner, Keepers Zambia Foundation (KZF), gathers a lot of data every day as part of its work supporting Kit Yamoyo retailers. We use this data to produce the weekly project performance dashboard which is published twice a month here: However, a lot of the data collected has yet to be analysed. Can you help with this analysis?

Over the last 17 months (1-Jan-16 to 31-May-17) we have collected data from more than 11,000 contacts (or attempted contacts) with retailers. In the right hands, this data could go some way to answering the following important questions:

  • Do stock levels of Kit Yamoyo, sales and prices vary with:
    • Distance from Health Centre
    • Type of retail outlet (shop type)
    • Gender of the owner
    • Other variables?
  • Are retailers breaking up Kit Yamoyo and selling the components separately? If they are, is there a tendency for this to happen more frequently in one shop type than others?
  • What are the key determinants of the retail price?
  • What is the gender balance of the retailers visited vs the gender balance of retailers overall?
  • How important is the Shoprite Supermarket network as a wholesale outlet compared to traditional pharmaceutical wholesalers?
  • How dependent are retailers currently on the visiting KZF staff for their supply of Kit Yamoyo stock?
  • Do retailers registered to receive vouchers in Lusaka District differ from others in terms of:
    • the stocks they carry
    • the prices they charge or
    • the sales they achieve?
  • Some retailers have stopped stocking Kit Yamoyo. Why is this?
  • How does the performance of the different KZF fieldworkers vary and what are the key determinants of fieldworker performance?
  • And so on.

We are releasing this data through a Google Workbook on this link: This document is ‘read-only’. However, you can add comments to it and download it. The workbook contains two datasets: a dataset of retailer contacts (or attempted contacts) and a dataset of retailers which includes key characteristics for each retailer. These two datasets can be linked by the case_id field. We have sought to provide an explanation of each field within each dataset in the Google Workbook itself.
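As a starting point, here is a minimal sketch of how the two datasets might be linked once downloaded, using only the Python standard library. It is an illustration rather than a recommended method: apart from case_id, the column names (shop_type, price) and the sample rows are made up, so please check the field explanations in the Workbook for the real names.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up samples standing in for the two downloaded sheets.
# Only case_id is a real field name; the other columns are assumptions.
RETAILERS_CSV = """case_id,shop_type
r1,kiosk
r2,pharmacy
"""

CONTACTS_CSV = """case_id,price
r1,10
r1,12
r2,15
r3,9
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def link_contacts(contacts, retailers):
    """Attach each retailer's characteristics to its contact records."""
    by_id = {r["case_id"]: r for r in retailers}
    linked = []
    for contact in contacts:
        retailer = by_id.get(contact["case_id"], {})
        # Contact fields win if a column name appears in both datasets.
        linked.append({**retailer, **contact})
    return linked


def mean_price_by_shop_type(linked):
    """Average the (hypothetical) price column within each shop type."""
    prices = defaultdict(list)
    for row in linked:
        shop_type = row.get("shop_type") or "unknown"
        prices[shop_type].append(float(row["price"]))
    return {k: sum(v) / len(v) for k, v in prices.items()}


linked = link_contacts(read_rows(CONTACTS_CSV), read_rows(RETAILERS_CSV))
print(mean_price_by_shop_type(linked))
```

Contacts whose case_id has no match in the retailer dataset are kept and grouped under "unknown" rather than dropped, so gaps in the data stay visible.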

The data in these datasets has been cleaned, and any remaining anomalies should not be material. However, if you notice any, please comment on this blog post and they will be investigated. Any changes to the data will be documented within the Google Workbook and as a comment on this blog post.

Please go forth and analyse! And share your findings back with us.

We cannot respond to individual emails relating to this experiment but we will respond to all questions posted as comments on this blog post and to comments on the Google Workbook.





  1. I have used the VLOOKUP function to complete the ‘shop_type’ field for all visit records. I then did a Copy and Paste special > Values to replace the VLOOKUP formulas with their values.

    We know the ‘shop_type’ for all but 51 of the visits carried out (11,408 of 11,459).
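    For anyone working outside a spreadsheet, the same kind of lookup can be sketched in Python. This only illustrates the idea and is not the procedure actually used: the sample rows are invented and the function name fill_shop_type is hypothetical.

```python
def fill_shop_type(visits, retailers):
    """Copy shop_type from the retailer table onto each visit (like VLOOKUP).

    Returns the filled visit records and the number of visits whose
    case_id had no usable shop_type.
    """
    lookup = {r["case_id"]: r.get("shop_type") for r in retailers}
    filled, missing = [], 0
    for visit in visits:
        shop_type = lookup.get(visit["case_id"])
        if not shop_type:
            # Either the retailer is unknown or its shop_type is blank.
            missing += 1
        filled.append({**visit, "shop_type": shop_type})
    return filled, missing


# Invented sample rows; the real sheets have many more columns.
retailers = [
    {"case_id": "r1", "shop_type": "kiosk"},
    {"case_id": "r2", "shop_type": ""},
]
visits = [{"case_id": "r1"}, {"case_id": "r2"}, {"case_id": "r3"}]

filled, missing = fill_shop_type(visits, retailers)
print(f"shop_type known for {len(visits) - missing} of {len(visits)} visits")
```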