restructured for reproducibility

2019-03-19 12:53:12 +01:00 · 2019-03-19 12:53:12 +01:00 · eaa71c9eeb
commit eaa71c9eeb
parent 7da2bfc400
102 changed files with 2251 additions and 1472517 deletions
--- a/Readme.md
+++ b/Readme.md
@@ -1,5 +1,45 @@
 # Stimmen Fryslan

+## Reproducibility results [paper xyz]
+
+These notebooks make the results reproducible. They require access to the stimmen MySQL database, for which access must be requested.
+
+### General statistics
+
+[Statistics for Nanna's email of 2019-02-13](notebooks/Statistics%20for%20Nanna's%20email%20of%202019-02-13.ipynb)
+
+Calculates statistics of the stimmen app usage.
+
+### Regions
+
+[Partition provinces in wijken and gemeentes](notebooks/Segment%20Provinces%20in%20Wijken%20and%20Gemeentes.ipynb)
+
+
+Partitions Fryslan, the Dutch province, at two granularities, as defined by the CBS 'wijken' (neighborhoods) and 'gemeentes' (municipalities) of 2017. These partitions are used in all maps created with the other notebooks.
+
+### Heatmaps
+
+[Frysian pronunciation occurrence](notebooks/Frysian%20pronunciation%20occurrence.ipynb)
+
+
+Creates all heatmaps illustrating the distribution of one pronunciation relative to all other pronunciations of that word.
+
+**Example:**
+
+![example pronunciation occurrence map](images/heatmaps/wijken_zaterdag_snjoun.png)
+
+
+### Distribution maps
+
+Creates maps for both granularities, each illustrating the pronunciation distribution of one word.
+
+[Frysian pronunciation distribution maps](notebooks/Frysian%20pronunciation%20distribution%20maps.ipynb)
+
+**Example:**
+
+![example pronunciation distribution map](images/bar-maps/wijken_zaterdag.png)
+
+
 ## Notebooks

 ### Extract Frysian dialect regions
@@ -75,9 +115,21 @@ This is a simple example for the created gabmap files.
 * [percentages](data/Pronunciation_percentages_example.gabmap.tsv)
 * [pronunciation](data/Pronunciations_example.gabmap.tsv)

-### Bar Maps per word for Pronounciation Occurence in Frysian Municipalities
+### Bar Maps per word for Pronunciation Occurrence in Frysian Municipalities

 For each word, a map illustrates the pronunciation occurrence as measured by the prediction quiz, per Frysian
 municipality.

-[notebook](notebooks/Bar%20Maps%20per%20word%20for%20Pronounciation%20Occurence%20in%20Frysian%20Municipalities.ipynb)
+[notebook](notebooks/Bar%20Maps%20per%20word%20for%20Pronunciation%20Occurrence%20in%20Frysian%20Municipalities.ipynb)
+
+### Heatmap per word for Pronunciation Occurrence in Frysian Municipalities
+
+[notebook](notebooks/Heatmap%20per%20word%20for%20Pronunciation%20Occurrence%20in%20Frysian%20Municipalities.ipynb)
+
+Each map displays the pronunciation occurrence in Frysian municipalities for one word. Each pronunciation is represented by one map layer, and within each municipality the percentages of all pronunciations add up to 100%, up to rounding errors.
+
+### Heatmap per word for Pronunciation Occurrence in Frysian Neighborhoods
+
+Same as for Municipalities, but for Neighborhoods.
+
+[notebook](notebooks/Heatmap%20per%20word%20for%20Pronunciation%20Occurrence%20in%20Frysian%20Neighborhoods.ipynb)
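The heatmap description in Readme.md says that, per municipality, the percentages of all pronunciations of a word add up to 100%. A minimal sketch of that calculation follows, assuming each answer has already been assigned to a region. The function name `pronunciation_percentages` and the sample answers are hypothetical and are not taken from the notebooks.

```python
from collections import Counter, defaultdict


def pronunciation_percentages(answers):
    """Return {region: {pronunciation: percentage}} for a single word.

    `answers` is an iterable of (region, pronunciation) pairs.
    """
    # Count how often each pronunciation was chosen within each region.
    counts = defaultdict(Counter)
    for region, pronunciation in answers:
        counts[region][pronunciation] += 1

    # Normalize the counts per region so they sum to 100.
    percentages = {}
    for region, counter in counts.items():
        total = sum(counter.values())
        percentages[region] = {
            pronunciation: 100.0 * n / total
            for pronunciation, n in counter.items()
        }
    return percentages


# Hypothetical answers for one word in two municipalities.
sample = [
    ("Heerenveen", "fisk"), ("Heerenveen", "fisk"), ("Heerenveen", "fɪs"),
    ("Leeuwarden", "fisk"), ("Leeuwarden", "fɪs"),
    ("Leeuwarden", "fɪs"), ("Leeuwarden", "fɪs"),
]
result = pronunciation_percentages(sample)
```

Each inner dictionary corresponds to one region, and each key within it to one map layer. Any deviation from 100% in a region's sum comes only from floating-point rounding.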
--- a/data/Friesland_gemeentes.geojson
+++ b/data/Friesland_gemeentes.geojson
--- a/data/Friesland_gemeentes.kml
+++ b/data/Friesland_gemeentes.kml
--- a/data/Friesland_wijken.geojson
+++ b/data/Friesland_wijken.geojson
--- a/data/Friesland_wijken.kml
+++ b/data/Friesland_wijken.kml
--- a/data/Groningen_gemeentes.geojson
+++ b/data/Groningen_gemeentes.geojson
--- a/data/Groningen_gemeentes.kml
+++ b/data/Groningen_gemeentes.kml
--- a/data/Groningen_wijken.geojson
+++ b/data/Groningen_wijken.geojson
--- a/data/Groningen_wijken.kml
+++ b/data/Groningen_wijken.kml
--- a/(lichaamsdeel).html
+++ b/(lichaamsdeel).html
--- a/maps/bar-maps/avond.html
+++ b/maps/bar-maps/avond.html
--- a/maps/bar-maps/bij
+++ b/maps/bar-maps/bij
--- a/maps/bar-maps/blad
+++ b/maps/bar-maps/blad
--- a/(lichaamsdeel).html
+++ b/(lichaamsdeel).html
--- a/maps/bar-maps/dag.html
+++ b/maps/bar-maps/dag.html
--- a/maps/bar-maps/deurtje.html
+++ b/maps/bar-maps/deurtje.html
--- a/maps/bar-maps/geel.html
+++ b/maps/bar-maps/geel.html
--- a/maps/bar-maps/gegaan.html
+++ b/maps/bar-maps/gegaan.html
--- a/maps/bar-maps/gezet.html
+++ b/maps/bar-maps/gezet.html
--- a/maps/bar-maps/heel.html
+++ b/maps/bar-maps/heel.html
--- a/maps/bar-maps/index.html
+++ b/maps/bar-maps/index.html
@@ -1,20 +1,22 @@
-<html><head></head><body>	<a href="armen (lichaamsdeel).html">armen (lichaamsdeel)<a><br/>
-	<a href="avond.html">avond<a><br/>
-	<a href="bij (insect).html">bij (insect)<a><br/>
-	<a href="blad (aan een boom).html">blad (aan een boom)<a><br/>
-	<a href="borst (lichaamsdeel).html">borst (lichaamsdeel)<a><br/>
-	<a href="dag.html">dag<a><br/>
-	<a href="deurtje.html">deurtje<a><br/>
-	<a href="geel.html">geel<a><br/>
-	<a href="gegaan.html">gegaan<a><br/>
-	<a href="gezet.html">gezet<a><br/>
-	<a href="heel.html">heel<a><br/>
+<html><head></head><body>	<a href="gemeentes_avond.html">gemeentes avond<a><br/>
 	<a href="index.html">index<a><br/>
-	<a href="kaas.html">kaas<a><br/>
-	<a href="koken.html">koken<a><br/>
-	<a href="oog.html">oog<a><br/>
-	<a href="sprak (toe).html">sprak (toe)<a><br/>
-	<a href="tand.html">tand<a><br/>
-	<a href="trein.html">trein<a><br/>
-	<a href="vis.html">vis<a><br/>
-	<a href="zaterdag.html">zaterdag<a></body></html>
+	<a href="neighborhood_armen (lichaamsdeel).html">neighborhood armen (lichaamsdeel)<a><br/>
+	<a href="neighborhood_avond.html">neighborhood avond<a><br/>
+	<a href="neighborhood_bij (insect).html">neighborhood bij (insect)<a><br/>
+	<a href="neighborhood_blad (aan een boom).html">neighborhood blad (aan een boom)<a><br/>
+	<a href="neighborhood_borst (lichaamsdeel).html">neighborhood borst (lichaamsdeel)<a><br/>
+	<a href="neighborhood_dag.html">neighborhood dag<a><br/>
+	<a href="neighborhood_deurtje.html">neighborhood deurtje<a><br/>
+	<a href="neighborhood_geel.html">neighborhood geel<a><br/>
+	<a href="neighborhood_gegaan.html">neighborhood gegaan<a><br/>
+	<a href="neighborhood_gezet.html">neighborhood gezet<a><br/>
+	<a href="neighborhood_heel.html">neighborhood heel<a><br/>
+	<a href="neighborhood_kaas.html">neighborhood kaas<a><br/>
+	<a href="neighborhood_koken.html">neighborhood koken<a><br/>
+	<a href="neighborhood_oog.html">neighborhood oog<a><br/>
+	<a href="neighborhood_sprak (toe).html">neighborhood sprak (toe)<a><br/>
+	<a href="neighborhood_tand.html">neighborhood tand<a><br/>
+	<a href="neighborhood_trein.html">neighborhood trein<a><br/>
+	<a href="neighborhood_vis.html">neighborhood vis<a><br/>
+	<a href="neighborhood_zaterdag.html">neighborhood zaterdag<a><br/>
+	<a href="wijken_avond.html">wijken avond<a></body></html>
--- a/maps/bar-maps/kaas.html
+++ b/maps/bar-maps/kaas.html
--- a/maps/bar-maps/koken.html
+++ b/maps/bar-maps/koken.html
--- a/maps/bar-maps/oog.html
+++ b/maps/bar-maps/oog.html
--- a/maps/bar-maps/sprak
+++ b/maps/bar-maps/sprak
--- a/maps/bar-maps/tand.html
+++ b/maps/bar-maps/tand.html
--- a/maps/bar-maps/trein.html
+++ b/maps/bar-maps/trein.html
--- a/maps/bar-maps/vis.html
+++ b/maps/bar-maps/vis.html
--- a/maps/bar-maps/zaterdag.html
+++ b/maps/bar-maps/zaterdag.html
--- a/maps/heatmaps-combined/armen
+++ b/maps/heatmaps-combined/armen
--- a/maps/heatmaps-combined/avond.html
+++ b/maps/heatmaps-combined/avond.html
--- a/maps/heatmaps-combined/bij
+++ b/maps/heatmaps-combined/bij
--- a/maps/heatmaps-combined/blad
+++ b/maps/heatmaps-combined/blad
--- a/maps/heatmaps-combined/borst
+++ b/maps/heatmaps-combined/borst
--- a/maps/heatmaps-combined/dag.html
+++ b/maps/heatmaps-combined/dag.html
--- a/maps/heatmaps-combined/deurtje.html
+++ b/maps/heatmaps-combined/deurtje.html
--- a/maps/heatmaps-combined/geel.html
+++ b/maps/heatmaps-combined/geel.html
--- a/maps/heatmaps-combined/gegaan.html
+++ b/maps/heatmaps-combined/gegaan.html
--- a/maps/heatmaps-combined/gezet.html
+++ b/maps/heatmaps-combined/gezet.html
--- a/maps/heatmaps-combined/heel.html
+++ b/maps/heatmaps-combined/heel.html
--- a/maps/heatmaps-combined/index.html
+++ b/maps/heatmaps-combined/index.html
@@ -1,20 +0,0 @@
-<html><head></head><body>	<a href="armen (lichaamsdeel).html">armen (lichaamsdeel)<a><br/>
-	<a href="avond.html">avond<a><br/>
-	<a href="bij (insect).html">bij (insect)<a><br/>
-	<a href="blad (aan een boom).html">blad (aan een boom)<a><br/>
-	<a href="borst (lichaamsdeel).html">borst (lichaamsdeel)<a><br/>
-	<a href="dag.html">dag<a><br/>
-	<a href="deurtje.html">deurtje<a><br/>
-	<a href="geel.html">geel<a><br/>
-	<a href="gegaan.html">gegaan<a><br/>
-	<a href="gezet.html">gezet<a><br/>
-	<a href="heel.html">heel<a><br/>
-	<a href="index.html">index<a><br/>
-	<a href="kaas.html">kaas<a><br/>
-	<a href="koken.html">koken<a><br/>
-	<a href="oog.html">oog<a><br/>
-	<a href="sprak (toe).html">sprak (toe)<a><br/>
-	<a href="tand.html">tand<a><br/>
-	<a href="trein.html">trein<a><br/>
-	<a href="vis.html">vis<a><br/>
-	<a href="zaterdag.html">zaterdag<a></body></html>
--- a/maps/heatmaps-combined/kaas.html
+++ b/maps/heatmaps-combined/kaas.html
--- a/maps/heatmaps-combined/koken.html
+++ b/maps/heatmaps-combined/koken.html
--- a/maps/heatmaps-combined/oog.html
+++ b/maps/heatmaps-combined/oog.html
--- a/maps/heatmaps-combined/sprak
+++ b/maps/heatmaps-combined/sprak
--- a/maps/heatmaps-combined/tand.html
+++ b/maps/heatmaps-combined/tand.html
--- a/maps/heatmaps-combined/trein.html
+++ b/maps/heatmaps-combined/trein.html
--- a/maps/heatmaps-combined/vis.html
+++ b/maps/heatmaps-combined/vis.html
--- a/maps/heatmaps-combined/zaterdag.html
+++ b/maps/heatmaps-combined/zaterdag.html
--- a/maps/heatmaps-wijk/armen
+++ b/maps/heatmaps-wijk/armen
--- a/maps/heatmaps-wijk/avond.html
+++ b/maps/heatmaps-wijk/avond.html
--- a/maps/heatmaps-wijk/bij
+++ b/maps/heatmaps-wijk/bij
--- a/maps/heatmaps-wijk/blad
+++ b/maps/heatmaps-wijk/blad
--- a/maps/heatmaps-wijk/borst
+++ b/maps/heatmaps-wijk/borst
--- a/maps/heatmaps-wijk/dag.html
+++ b/maps/heatmaps-wijk/dag.html
--- a/maps/heatmaps-wijk/deurtje.html
+++ b/maps/heatmaps-wijk/deurtje.html
--- a/maps/heatmaps-wijk/geel.html
+++ b/maps/heatmaps-wijk/geel.html
--- a/maps/heatmaps-wijk/gegaan.html
+++ b/maps/heatmaps-wijk/gegaan.html
--- a/maps/heatmaps-wijk/gezet.html
+++ b/maps/heatmaps-wijk/gezet.html
--- a/maps/heatmaps-wijk/heel.html
+++ b/maps/heatmaps-wijk/heel.html
--- a/maps/heatmaps-wijk/index.html
+++ b/maps/heatmaps-wijk/index.html
@@ -1,20 +0,0 @@
-<html><head></head><body>	<a href="armen (lichaamsdeel).html">armen (lichaamsdeel)<a><br/>
-	<a href="avond.html">avond<a><br/>
-	<a href="bij (insect).html">bij (insect)<a><br/>
-	<a href="blad (aan een boom).html">blad (aan een boom)<a><br/>
-	<a href="borst (lichaamsdeel).html">borst (lichaamsdeel)<a><br/>
-	<a href="dag.html">dag<a><br/>
-	<a href="deurtje.html">deurtje<a><br/>
-	<a href="geel.html">geel<a><br/>
-	<a href="gegaan.html">gegaan<a><br/>
-	<a href="gezet.html">gezet<a><br/>
-	<a href="heel.html">heel<a><br/>
-	<a href="index.html">index<a><br/>
-	<a href="kaas.html">kaas<a><br/>
-	<a href="koken.html">koken<a><br/>
-	<a href="oog.html">oog<a><br/>
-	<a href="sprak (toe).html">sprak (toe)<a><br/>
-	<a href="tand.html">tand<a><br/>
-	<a href="trein.html">trein<a><br/>
-	<a href="vis.html">vis<a><br/>
-	<a href="zaterdag.html">zaterdag<a></body></html>
--- a/maps/heatmaps-wijk/kaas.html
+++ b/maps/heatmaps-wijk/kaas.html
--- a/maps/heatmaps-wijk/koken.html
+++ b/maps/heatmaps-wijk/koken.html
--- a/maps/heatmaps-wijk/oog.html
+++ b/maps/heatmaps-wijk/oog.html
--- a/maps/heatmaps-wijk/sprak
+++ b/maps/heatmaps-wijk/sprak
--- a/maps/heatmaps-wijk/tand.html
+++ b/maps/heatmaps-wijk/tand.html
--- a/maps/heatmaps-wijk/trein.html
+++ b/maps/heatmaps-wijk/trein.html
--- a/maps/heatmaps-wijk/vis.html
+++ b/maps/heatmaps-wijk/vis.html
--- a/maps/heatmaps-wijk/zaterdag.html
+++ b/maps/heatmaps-wijk/zaterdag.html
--- a/(lichaamsdeel).html
+++ b/(lichaamsdeel).html
--- a/maps/heatmaps/avond.html
+++ b/maps/heatmaps/avond.html
--- a/maps/heatmaps/bij
+++ b/maps/heatmaps/bij
--- a/maps/heatmaps/blad
+++ b/maps/heatmaps/blad
--- a/(lichaamsdeel).html
+++ b/(lichaamsdeel).html
--- a/maps/heatmaps/dag.html
+++ b/maps/heatmaps/dag.html
--- a/maps/heatmaps/deurtje.html
+++ b/maps/heatmaps/deurtje.html
--- a/maps/heatmaps/geel.html
+++ b/maps/heatmaps/geel.html
--- a/maps/heatmaps/gegaan.html
+++ b/maps/heatmaps/gegaan.html
--- a/maps/heatmaps/gezet.html
+++ b/maps/heatmaps/gezet.html
--- a/maps/heatmaps/heel.html
+++ b/maps/heatmaps/heel.html
--- a/maps/heatmaps/index.html
+++ b/maps/heatmaps/index.html
@@ -1,20 +0,0 @@
-<html><head></head><body>	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/armen (lichaamsdeel).html">armen (lichaamsdeel)<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/avond.html">avond<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/bij (insect).html">bij (insect)<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/blad (aan een boom).html">blad (aan een boom)<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/borst (lichaamsdeel).html">borst (lichaamsdeel)<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/dag.html">dag<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/deurtje.html">deurtje<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/geel.html">geel<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/gegaan.html">gegaan<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/gezet.html">gezet<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/heel.html">heel<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/index.html">index<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/kaas.html">kaas<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/koken.html">koken<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/oog.html">oog<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/sprak (toe).html">sprak (toe)<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/tand.html">tand<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/trein.html">trein<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/vis.html">vis<a><br/>
-	<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/zaterdag.html">zaterdag<a></body></html>
--- a/maps/heatmaps/kaas.html
+++ b/maps/heatmaps/kaas.html
--- a/maps/heatmaps/koken.html
+++ b/maps/heatmaps/koken.html
--- a/maps/heatmaps/oog.html
+++ b/maps/heatmaps/oog.html
--- a/maps/heatmaps/sprak
+++ b/maps/heatmaps/sprak
--- a/maps/heatmaps/tand.html
+++ b/maps/heatmaps/tand.html
--- a/maps/heatmaps/trein.html
+++ b/maps/heatmaps/trein.html
--- a/maps/heatmaps/vis.html
+++ b/maps/heatmaps/vis.html
--- a/maps/heatmaps/zaterdag.html
+++ b/maps/heatmaps/zaterdag.html
--- a/Municipalities.ipynb
+++ b/Municipalities.ipynb
--- a/notebooks/Dialect
+++ b/notebooks/Dialect
--- a/notebooks/Gabmap
+++ b/notebooks/Gabmap
@@ -1,83 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Gabmap format\n",
-    "\n",
-    "Exploration of the format of the lines in example Gabmap files Martijn had sent."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "with open('../data/martijn_format/Dutch613-coordinates.txt') as f:\n",
-    "    coordinates = list(f)\n",
-    "    \n",
-    "with open('../data/martijn_format/Nederlands-ipa.utxt') as f:\n",
-    "    table = list(f)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "coordinates[0].split('\\t')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "coordinates[1].split('\\t')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "table[0].split('\\t')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "table[1].split('\\t')"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.6.5"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
--- a/notebooks/Gabmap
+++ b/notebooks/Gabmap
@@ -1,458 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Geographical pronunciation tables, simple example\n",
-    "\n",
-    "Simple example to create gabmap files for two words with few pronunciations an two regions."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import sys\n",
-    "sys.path.append('..')\n",
-    "\n",
-    "import pandas\n",
-    "import MySQLdb\n",
-    "import json\n",
-    "import copy\n",
-    "\n",
-    "db = MySQLdb.connect(user='root', passwd='Nmmxhjgt1@', db='stimmen', charset='utf8')\n",
-    "\n",
-    "from shapely.geometry import shape, Point\n",
-    "\n",
-    "from gabmap import create_gabmap_dataframes\n",
-    "\n",
-    "from stimmen.geojson import merge_features"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "with open('../data/Friesland_wijken.geojson') as f:\n",
-    "    regions = json.load(f)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Load and simplify"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Answers to how participants state a word should be pronounced\n",
-    "\n",
-    "answers = pandas.read_sql('''\n",
-    "SELECT prediction_quiz_id, user_lat, user_lng, question_text, answer_text\n",
-    "FROM       core_surveyresult as survey\n",
-    "INNER JOIN core_predictionquizresult as result ON survey.id = result.survey_result_id\n",
-    "INNER JOIN core_predictionquizresultquestionanswer as answer\n",
-    "    ON result.id = answer.prediction_quiz_id\n",
-    "''', db)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "regions_simple = merge_features(copy.deepcopy(regions),\n",
-    "    condition=lambda feature: feature['properties']['GM_NAAM'] == 'Heerenveen',\n",
-    ")\n",
-    "\n",
-    "regions_simple = merge_features(\n",
-    "    regions_simple,\n",
-    "    condition=lambda feature: feature['properties']['GM_NAAM'] == 'Leeuwarden',\n",
-    ")\n",
-    "regions_simple['features'] = regions_simple['features'][-2:]\n",
-    "\n",
-    "regions_simple['features'][0]['properties']['name'] = 'Heerenveen'\n",
-    "regions_simple['features'][1]['properties']['name'] = 'Leeuwarden'"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "answers_simple = answers[\n",
-    "    (answers['question_text'] == '\"blad\" (aan een boom)') |\n",
-    "    (answers['question_text'] == '\"vis\"')\n",
-    "].copy()\n",
-    "\n",
-    "answers_simple['question_text'] = answers_simple['question_text'].map(\n",
-    "    lambda x: x.replace('\"', '').replace('*', ''))\n",
-    "\n",
-    "answers_simple['answer_text'] = answers_simple['answer_text'].map(\n",
-    "    lambda x: x[x.find('('):x.find(')')][1:])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Two words, boom and vis, with each 4 and 2 pronunciations"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>answer_text</th>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>question_text</th>\n",
-       "      <th></th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>blad (aan een boom)</th>\n",
-       "      <td>4</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>vis</th>\n",
-       "      <td>2</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "                     answer_text\n",
-       "question_text                   \n",
-       "blad (aan een boom)            4\n",
-       "vis                            2"
-      ]
-     },
-     "execution_count": 6,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "answers_simple.groupby('question_text').agg({'answer_text': lambda x: len(set(x))})"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "centroids_example, pronunciations_example, counts_example = create_gabmap_dataframes(\n",
-    "    regions_simple, answers_simple,\n",
-    "    latitude_column='user_lat', longitude_column='user_lng',\n",
-    "    word_column='question_text', pronunciation_column='answer_text',\n",
-    "    region_name_property='name'\n",
-    ")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Resulting tables\n",
-    "\n",
-    "Stored as tab separated files for gabmap"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>latitude</th>\n",
-       "      <th>longitude</th>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>#name</th>\n",
-       "      <th></th>\n",
-       "      <th></th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>Heerenveen</th>\n",
-       "      <td>52.996076</td>\n",
-       "      <td>5.977925</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>Leeuwarden</th>\n",
-       "      <td>53.169940</td>\n",
-       "      <td>5.797613</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "             latitude  longitude\n",
-       "#name                           \n",
-       "Heerenveen  52.996076   5.977925\n",
-       "Leeuwarden  53.169940   5.797613"
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "centroids_example"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>blad (aan een boom)</th>\n",
-       "      <th>vis</th>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th></th>\n",
-       "      <th></th>\n",
-       "      <th></th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>Heerenveen</th>\n",
-       "      <td>blet / blɑt / blɔd / blɛ:t</td>\n",
-       "      <td>fisk / fɪs</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>Leeuwarden</th>\n",
-       "      <td>blet / blɑt / blɔd / blɛ:t</td>\n",
-       "      <td>fisk / fɪs</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "                   blad (aan een boom)         vis\n",
-       "                                                  \n",
-       "Heerenveen  blet / blɑt / blɔd / blɛ:t  fisk / fɪs\n",
-       "Leeuwarden  blet / blɑt / blɔd / blɛ:t  fisk / fɪs"
-      ]
-     },
-     "execution_count": 9,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "pronunciations_example"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>blad (aan een boom): blet</th>\n",
-       "      <th>blad (aan een boom): blɑt</th>\n",
-       "      <th>blad (aan een boom): blɔd</th>\n",
-       "      <th>blad (aan een boom): blɛ:t</th>\n",
-       "      <th>vis: fisk</th>\n",
-       "      <th>vis: fɪs</th>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th></th>\n",
-       "      <th></th>\n",
-       "      <th></th>\n",
-       "      <th></th>\n",
-       "      <th></th>\n",
-       "      <th></th>\n",
-       "      <th></th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>Heerenveen</th>\n",
-       "      <td>31.654676</td>\n",
-       "      <td>2.158273</td>\n",
-       "      <td>2.158273</td>\n",
-       "      <td>64.028777</td>\n",
-       "      <td>52.517986</td>\n",
-       "      <td>47.482014</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>Leeuwarden</th>\n",
-       "      <td>7.865169</td>\n",
-       "      <td>7.022472</td>\n",
-       "      <td>8.707865</td>\n",
-       "      <td>76.404494</td>\n",
-       "      <td>75.000000</td>\n",
-       "      <td>25.000000</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "            blad (aan een boom): blet  blad (aan een boom): blɑt  \\\n",
-       "                                                                   \n",
-       "Heerenveen                  31.654676                   2.158273   \n",
-       "Leeuwarden                   7.865169                   7.022472   \n",
-       "\n",
-       "            blad (aan een boom): blɔd  blad (aan een boom): blɛ:t  vis: fisk  \\\n",
-       "                                                                               \n",
-       "Heerenveen                   2.158273                   64.028777  52.517986   \n",
-       "Leeuwarden                   8.707865                   76.404494  75.000000   \n",
-       "\n",
-       "             vis: fɪs  \n",
-       "                       \n",
-       "Heerenveen  47.482014  \n",
-       "Leeuwarden  25.000000  "
-      ]
-     },
-     "execution_count": 10,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "counts_example"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "pronunciations_example.to_csv('../data/Pronunciations_example.gabmap.tsv', sep='\\t')\n",
-    "counts_example.to_csv('../data/Pronunciation_percentages_example.gabmap.tsv', sep='\\t')\n",
-    "centroids_example.to_csv('../data/Centroids_example.gabmap.tsv', sep='\\t', columns=['longitude', 'latitude'])\n",
-    "with open('../data/Gabmap_example.geojson', 'w') as f:\n",
-    "    json.dump(regions_simple, f, indent=1)"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.6.5"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
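The removed notebook cleans the quiz text in two steps: it strips quotes and asterisks from `question_text`, and it keeps only the parenthesized part of `answer_text`. A minimal sketch of those two steps as standalone functions follows. The function names and the sample answer string `'bled (blɛ:t)'` are hypothetical; the exact format of `answer_text` in the database is an assumption. `extract_pronunciation` is a more explicit variant of the notebook's slice `x[x.find('('):x.find(')')][1:]`, and unlike that slice it returns an empty string whenever either parenthesis is missing.

```python
def clean_question(text):
    """Strip the double quotes and asterisks from a quiz question."""
    return text.replace('"', '').replace('*', '')


def extract_pronunciation(text):
    """Return the text between the first '(' and the next ')'.

    Returns an empty string when no such pair of parentheses exists.
    """
    start = text.find('(')
    if start == -1:
        return ''
    end = text.find(')', start + 1)
    if end == -1:
        return ''
    return text[start + 1:end]


# The question string matches the filter used in the notebook;
# the answer string is a hypothetical example.
question = clean_question('"blad" (aan een boom)')
pronunciation = extract_pronunciation('bled (blɛ:t)')
```

Returning an empty string for malformed input makes such rows easy to filter out before the per-region aggregation.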
--- a/notebooks/Gabmap
+++ b/notebooks/Gabmap
@@ -1,157 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Geographical pronunciation tables\n",
-    "\n",
-    "Creates gabmap files with region centroids, percentages and pronunciations for wijken in Friesland."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import sys\n",
-    "sys.path.append('..')\n",
-    "\n",
-    "import pandas\n",
-    "import MySQLdb\n",
-    "import json\n",
-    "import copy\n",
-    "\n",
-    "db = MySQLdb.connect(user='root', passwd='Nmmxhjgt1@', db='stimmen', charset='utf8')\n",
-    "\n",
-    "from shapely.geometry import shape, Point\n",
-    "\n",
-    "from gabmap import create_gabmap_dataframes"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "with open('../data/Friesland_wijken.geojson') as f:\n",
-    "    regions = json.load(f)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Answers to how participants state a word should be pronounced\n",
-    "\n",
-    "answers = pandas.read_sql('''\n",
-    "SELECT prediction_quiz_id, user_lat, user_lng, question_text, answer_text\n",
-    "FROM       core_surveyresult as survey\n",
-    "INNER JOIN core_predictionquizresult as result ON survey.id = result.survey_result_id\n",
-    "INNER JOIN core_predictionquizresultquestionanswer as answer\n",
-    "    ON result.id = answer.prediction_quiz_id\n",
-    "''', db)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "zero_latlng_questions = {\n",
-    "    q\n",
-    "    for q, row in answers.groupby('question_text').agg('std').iterrows()\n",
-    "    if row['user_lat'] == 0 and row['user_lng'] == 0\n",
-    "}\n",
-    "answers_filtered = answers[answers['question_text'].map(lambda x: x not in zero_latlng_questions)].copy()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "array(['gegaan', 'avond', 'heel', 'dag', 'bij (insect)', 'sprak (toe)',\n",
-       "       'oog', 'armen (lichaamsdeel)', 'kaas', 'deurtje', 'koken',\n",
-       "       'borst (lichaamsdeel)', 'vis', 'zaterdag', 'trein', 'geel', 'tand',\n",
-       "       'gezet', 'blad (aan een boom)'], dtype=object)"
-      ]
-     },
-     "execution_count": 10,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "answers_filtered['question_text'].unique()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "answers_filtered['question_text'] = answers_filtered['question_text'].map(\n",
-    "    lambda x: x.replace('\"', '').replace('*', ''))\n",
-    "\n",
-    "answers_filtered['answer_text'] = answers_filtered['answer_text'].map(\n",
-    "    lambda x: x[x.find('('):x.find(')')][1:])"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "centroids, pronunciations, counts = create_gabmap_dataframes(\n",
-    "    regions, answers_filtered,\n",
-    "    latitude_column='user_lat', longitude_column='user_lng',\n",
-    "    word_column='question_text', pronunciation_column='answer_text',\n",
-    "    region_name_property='gemeente_en_wijk_naam'\n",
-    ")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "pronunciations.to_csv('../data/Friesland_wijken_pronunciations.gabmap.tsv', sep='\\t')\n",
-    "counts.to_csv('../data/Friesland_wijken_pronunciation_percentages.gabmap.tsv', sep='\\t')\n",
-    "centroids.to_csv('../data/Friesland_wijken_centroids.gabmap.tsv', sep='\\t', columns=['longitude', 'latitude'])"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.6.5"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
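Review note on the removed Gabmap notebook: the `answer_text` clean-up slices out whatever sits between the first `(` and the first `)`. A minimal sketch of the same idea with explicit handling of missing parentheses follows. The helper name `text_in_parentheses` and the sample strings are hypothetical, and the edge-case behaviour (returning an empty string when a parenthesis is missing) is an assumption, not necessarily what the notebook intended.

```python
def text_in_parentheses(answer_text):
    """Return the text between the first '(' and the next ')'.

    Hypothetical helper: returns '' when either parenthesis is missing.
    """
    start = answer_text.find('(')
    if start == -1:
        return ''
    end = answer_text.find(')', start)
    if end == -1:
        return ''
    return answer_text[start + 1:end]


# Made-up sample values, only to illustrate the slicing.
print(text_in_parentheses('zaterdag (snjoun)'))
print(repr(text_in_parentheses('no parentheses here')))
```

With pandas this could be applied as `answers_filtered['answer_text'].map(text_in_parentheses)`.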
--- a/notebooks/Group
+++ b/notebooks/Group
@ -1,265 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Group recordings in 4 Frysian dialect regions\n",
-    "\n",
-    " * Klaaifrysk\n",
-    " * Waldfrysk\n",
-    " * Sudwesthoeksk\n",
-    " * Noardhoeksk\n",
-    " \n",
-    "First run `Dialect Regions from image.ipynb`.\n",
-    "\n",
-    "![dialect regions](../data/dialects.png)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from math import floor\n",
-    "import json\n",
-    "import pandas\n",
-    "import MySQLdb\n",
-    "from collections import Counter\n",
-    "\n",
-    "from math import sqrt\n",
-    "import numpy as np\n",
-    "from shapely.geometry import shape, Point\n",
-    "from vincenty import vincenty\n",
-    "\n",
-    "from jupyter_progressbar import ProgressBar\n",
-    "\n",
-    "db = MySQLdb.connect(user='root', passwd='Nmmxhjgt1@', db='stimmen', charset='utf8')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Input\n",
-    "\n",
-    "Load the geojson with the dialect region and create shapely shapes."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "with open('../data/fryslan_dialect_regions.geojson', 'r') as f:\n",
-    "    geojson = json.load(f)\n",
-    "\n",
-    "dialect_regions = [region['properties']['dialect'] for region in geojson['features']]"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "shapes = {\n",
-    "    feature['properties']['dialect']: shape(feature['geometry'])\n",
-    "    for feature in geojson['features']\n",
-    "}\n",
-    "\n",
-    "def regions_for(coordinate):\n",
-    "    regions = {\n",
-    "        region_name\n",
-    "        for region_name, shape in shapes.items()\n",
-    "        if shape.contains(Point(*coordinate))\n",
-    "    }\n",
-    "    return regions\n",
-    "\n",
-    "def distance_to_shape(shape, longitude, latitude):\n",
-    "    ext = shape.exterior\n",
-    "    p = ext.interpolate(ext.project(Point(longitude, latitude)))\n",
-    "    return vincenty((latitude, longitude), (p.y, p.x))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Query and process\n",
-    "\n",
-    "Query all picture game and free speech recordings and assign the dialect region."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def dialect_regions_and_distance(data):\n",
-    "    return[\n",
-    "        {\n",
-    "            'dialects': [\n",
-    "                {\n",
-    "                    'dialect': dialect,\n",
-    "                    'boundary_distance': distance_to_shape(shapes[dialect], longitude, latitude),\n",
-    "                }\n",
-    "                for dialect in regions_for((longitude, latitude))\n",
-    "            ],\n",
-    "            'filename': filename,\n",
-    "        }\n",
-    "        for filename, (latitude, longitude) in ProgressBar(\n",
-    "            data[['latitude', 'longitude']].iterrows(),\n",
-    "            size=len(data)\n",
-    "        )\n",
-    "    ]"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "picture_games = pandas.read_sql('''\n",
-    "SELECT language.name as language, item.name as picture,\n",
-    "       survey.user_lat as latitude, survey.user_lng as longitude,\n",
-    "       survey.area_name as area, survey.country_name as country,\n",
-    "       result.recording as filename,\n",
-    "       result.submitted_at as date\n",
-    "FROM       core_surveyresult as survey\n",
-    "INNER JOIN core_picturegameresult as result ON survey.id = result.survey_result_id\n",
-    "INNER JOIN core_language as language ON language.id = result.language_id\n",
-    "INNER JOIN core_picturegameitem as item\n",
-    "    ON result.picture_game_item_id = item.id\n",
-    "''', db)\n",
-    "picture_games.set_index('filename', inplace=True)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "application/vnd.jupyter.widget-view+json": {
-       "model_id": "5825449a737b4fcab38a4f4ac2adfd87",
-       "version_major": 2,
-       "version_minor": 0
-      },
-      "text/plain": [
-       "VBox(children=(HBox(children=(FloatProgress(value=0.0, max=1.0), HTML(value='<b>0</b>s passed', placeholder='0…"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    }
-   ],
-   "source": [
-    "dialect_region_per_picture_game = dialect_regions_and_distance(picture_games)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "df = pandas.DataFrame([\n",
-    "    [r['filename'], r['dialects'][0]['dialect'], r['dialects'][0]['boundary_distance']]\n",
-    "    for r in dialect_region_per_picture_game\n",
-    "    if len(r['dialects']) == 1\n",
-    "], columns = ['filename', 'dialect', 'boundary_distance'])\n",
-    "\n",
-    "df.to_excel('../data/picture_game_recordings_by_dialect.xlsx')\n",
-    "df.to_csv('../data/picture_game_recordings_by_dialect.csv')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "free_speech_games = pandas.read_sql('''\n",
-    "SELECT language.name as language,\n",
-    "       survey.user_lat as latitude, survey.user_lng as longitude,\n",
-    "       survey.area_name as area, survey.country_name as country,\n",
-    "       result.recording as filename,\n",
-    "       result.submitted_at as date\n",
-    "FROM       core_surveyresult as survey\n",
-    "INNER JOIN core_freespeechresult as result ON survey.id = result.survey_result_id\n",
-    "INNER JOIN core_language as language ON language.id = result.language_id\n",
-    "''', db)\n",
-    "free_speech_games.set_index('filename', inplace=True)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "application/vnd.jupyter.widget-view+json": {
-       "model_id": "8afad9f71e544658b554b828932d7769",
-       "version_major": 2,
-       "version_minor": 0
-      },
-      "text/plain": [
-       "VBox(children=(HBox(children=(FloatProgress(value=0.0, max=1.0), HTML(value='<b>0</b>s passed', placeholder='0…"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    }
-   ],
-   "source": [
-    "dialect_region_per_free_speech = dialect_regions_and_distance(free_speech_games)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "df = pandas.DataFrame([\n",
-    "    [r['filename'], r['dialects'][0]['dialect'], r['dialects'][0]['boundary_distance']]\n",
-    "    for r in dialect_region_per_free_speech\n",
-    "    if len(r['dialects']) == 1\n",
-    "], columns = ['filename', 'dialect', 'boundary_distance'])\n",
-    "\n",
-    "df.to_excel('../data/free_speech_recordings_by_dialect.xlsx')\n",
-    "df.to_csv('../data/free_speech_recordings_by_dialect.csv')"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.6.5"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 1
-}
--- a/pronunciation.ipynb
+++ b/pronunciation.ipynb
--- a/notebooks/Segment
+++ b/notebooks/Segment
@ -7,13 +7,11 @@
    "# Segment provinces\n",
    "\n",
    "\n",
-    "Create wijk and gemeente level segmentations for all Dutch provinces and save as geojson and Gabmap KML.\n",
+    "Create wijk and gemeente level segmentations for two Dutch provinces, Groningen and Friesland, and save as geojson and Gabmap KML.\n",
    "\n",
-    "All is based on CBS data.\n",
+    "All is based on [CBS data](https://www.cbs.nl/nl-nl/dossier/nederland-regionaal/geografische%20data/wijk-en-buurtkaart-2017)\n",
    "\n",
-    "For Friesland, several wijken are merged.\n",
-    "\n",
-    "Note: only applied to Groningen and Friesland, because other provinces give gemetry errors."
+    "For Friesland, several wijken are merged, in particular those of the municipalities Ameland, Harlingen, Schiermonnikoog, Terschelling and Vlieland, and those of Leeuwarden with centroid above 53.167. These neighborhoods are small in area and hence we decided to merge, to avoid a "
   ]
  },
  {
@ -29,7 +27,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
@ -53,13 +51,36 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": 4,
   "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Groningen\n",
+      "0\n",
+      "1\n",
+      "2\n",
+      "3\n",
+      "4\n",
+      "5\n",
+      "6\n",
+      "Friesland\n",
+      "0\n",
+      "1\n",
+      "2\n",
+      "3\n",
+      "4\n",
+      "5\n",
+      "6\n"
+     ]
+    }
+   ],
   "source": [
    "for province in ['Groningen', 'Friesland']:\n",
-    "    wijken_geojson = gwb_in_province(province, 'wijk', 2018)\n",
-    "    gemeente_geojson = gwb_in_province(province, 'gem', 2018)\n",
+    "    wijken_geojson = gwb_in_province(province, 'wijk', 2018, polygon_simplification=None)\n",
+    "    gemeente_geojson = gwb_in_province(province, 'gem', 2018, polygon_simplification=None)\n",
    "    \n",
    "    if province == 'Friesland':\n",
    "        for gemeente in {'Ameland', 'Harlingen', 'Schiermonnikoog', 'Terschelling', 'Vlieland'}:\n",
@ -106,6 +127,13 @@
    "    with open('../data/{}_gemeentes.kml'.format(province), 'w') as f:\n",
    "        f.write(as_gabmap_kml(gemeente_geojson, name_property='gemeente_naam'))"
   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
  }
 ],
 "metadata": {
--- a/Segmentations.ipynb
+++ b/Segmentations.ipynb
--- a/stimmen/cbs.py
+++ b/stimmen/cbs.py
@ -24,7 +24,7 @@ def clear_cache_poll():
    global __cache, __last_access
    while True:
        time.sleep(60*60)
-        if (time.time() - __last_access) > 60*60:
+        if __last_access is not None and (time.time() - __last_access) > 60*60:
            __cache = {}


@ -60,9 +60,15 @@ def province_geojson(province, with_water=False):
    return __cache[(province, with_water)]


+def expand_box(x0, y0, x1, y1, f=0.1):
+    return (x0 - (x1 - x0)*f, y0 - (y1 - y0)*f,
+            x1 + (x1 - x0)*f, y1 + (y1 - y0)*f)
+
+
 def gwb_in_province(
        province='Friesland', region_level='wijk', region_year='2018',
-        polygon_simplification=0.001, province_dilation=0.0005
+        polygon_simplification=0.001, province_dilation=0.0005,
+        bounding_box_dilation=0.01
 ):
    assert region_level in {'gem', 'wijk', 'buurt'}, (
        "region_level {} not supported, must be gem, wijk or buurt".format(region_level))
@ -70,7 +76,7 @@ def gwb_in_province(
        "region_year {} not supported, must 2017 or 2018".format(region_year))

    province_with_water = shape(province_geojson(province, with_water=True)['geometry'])
-    province_bounding_box = box(*province_with_water.bounds)
+    province_bounding_box = box(*expand_box(*province_with_water.bounds, f=bounding_box_dilation))

    province_land_only = shape(province_geojson(province, with_water=False)['geometry'])
    province_land_only_dilated = province_land_only.buffer(-province_dilation)
@ -80,7 +86,10 @@ def gwb_in_province(
    shapes = [shape(geojson_['geometry']) for geojson_ in geojson['features']]

    shapes, geojson = map(list, zip(*(
+        (
            (intersection.simplify(tolerance=polygon_simplification), geojson_)
+            if polygon_simplification is not None else (intersection, geojson_)
+        )
        for shape_, geojson_ in zip(shapes, geojson['features'])
        if province_bounding_box.contains(shape_)
        for intersection in [shape_.intersection(province_land_only_dilated)] # alias
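Review note on `expand_box`: the function grows a bounding box by a fraction `f` of its width and height on every side, so regions that touch the province bounds still pass the `contains` test. A small worked example, copying the function from the hunk above and using made-up bounds:

```python
def expand_box(x0, y0, x1, y1, f=0.1):
    # Same arithmetic as in the diff: pad each side by f times the extent.
    return (x0 - (x1 - x0)*f, y0 - (y1 - y0)*f,
            x1 + (x1 - x0)*f, y1 + (y1 - y0)*f)


# Illustrative bounds (not real province coordinates): width 10, height 20.
bounds = (0.0, 0.0, 10.0, 20.0)
expanded = expand_box(*bounds, f=0.5)
print(expanded)  # (-5.0, -10.0, 15.0, 30.0)
```

With the default `bounding_box_dilation=0.01` the box grows by 1% of its extent per side, 2% in total per axis.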
--- a/stimmen/folium.py
+++ b/stimmen/folium.py
@ -1,5 +1,6 @@
 import folium
 from jupyter_progressbar import ProgressBar
+from matplotlib import pyplot
 from pygeoif.geometry import mapping
 from shapely.geometry.geo import shape, box

@ -9,6 +10,12 @@ import numpy as np

 from stimmen.latitude_longitude import reverse_latitude_longitude

+import tempfile
+import time
+from selenium import webdriver
+from .folium_injections import *
+from .folium_colorbar import *
+

 def get_palette(n, no_black=True, no_white=True):
    with open(data_file('data', 'glasbey', '{}_colors.txt'.format(n + no_black + no_white))) as f:
@ -21,7 +28,7 @@ def get_palette(n, no_black=True, no_white=True):


 def colored_name(name, color):
-    return '<span style=\\"color:{}; \\">{}</span>'.format(color, name)
+    return '<span class=\\"with-block\\" style=\\"color:{}; \\"><span class=\\"blackable; \\">{}</span></span>'.format(color, name)


 def region_area_cdf(region_shape, resolution=10000):
@ -78,9 +85,12 @@ def pronunciation_bars(
    regions, dataframe,
    region_name_property, region_name_column,
    group_column='answer_text',
+    count_column=None,
    cutoff_percentage=0.05,
    normalize_area=True,
    progress_bar=False,
+    area_adjust_resolution=10000,
+    simplify_shapes=None,
 ):
    # all values of group_column that appear at least cutoff_percentage in one of the regions
    relevant_groups = {
@ -103,7 +113,7 @@ def pronunciation_bars(
    feature_groups = {
        group_value: folium.FeatureGroup(
            name=colored_name(
-                '{value} ({amount})'.format(value=escape(group_value), amount=amount),
+                '{value} <span class=\\"amount\\">({amount})</span>'.format(value=escape(group_value), amount=amount),
                color
            ),
            overlay=True
@ -114,6 +124,7 @@ def pronunciation_bars(
        if group_value != 'other' else
        n_other
    ]  # alias
+        if amount > 0
    }

    progress_bar = ProgressBar if progress_bar else lambda x: x
@ -123,6 +134,8 @@ def pronunciation_bars(
        region_name = feature['properties'][region_name_property]
        region_rows = dataframe[dataframe[region_name_column] == region_name]
        region_shape = shape(feature['geometry'])
+        if simplify_shapes:
+            region_shape = region_shape.simplify(simplify_shapes)
        _, ymin, _, ymax = region_shape.bounds

        group_values_occurrence = {
@ -136,14 +149,16 @@ def pronunciation_bars(
            key=lambda x: (x[0] == 'other', -x[1])
        ))

-        group_percentages = np.array(group_occurrences) / len(region_rows)
-        group_boundaries = np.cumsum((0,) + group_occurrences) / len(region_rows)
+        group_percentages = np.array(group_occurrences) / max(1, len(region_rows))
+        group_boundaries = np.cumsum((0,) + group_occurrences) / max(1, len(region_rows))
        if normalize_area:
            if '__region_shape_cdf_cache' not in feature['properties']:
-                feature['properties']['__region_shape_cdf_cache'] = region_area_cdf(region_shape).tolist()
+                feature['properties']['__region_shape_cdf_cache'] = region_area_cdf(
+                    region_shape, resolution=area_adjust_resolution).tolist()
            group_boundaries = area_adjust_boundaries(
                region_shape, group_boundaries,
-                region_cdf_cache=feature['properties']['__region_shape_cdf_cache']
+                region_cdf_cache=feature['properties']['__region_shape_cdf_cache'],
+                resolution=area_adjust_resolution
            )
        else:
            group_boundaries = width_adjust_boundaries(region_shape, group_boundaries)
@ -158,7 +173,7 @@ def pronunciation_bars(
                continue

            bar_shape = region_shape.intersection(box(left_boundary, ymin, right_boundary, ymax))
-            if bar_shape.area == 0:
+            if bar_shape.area == 0 or count == 0:
                continue
            polygon = folium.Polygon(
                reverse_latitude_longitude(mapping(bar_shape)['coordinates']),
@ -167,6 +182,213 @@ def pronunciation_bars(
                color=None,
                popup='{} ({}, {: 3d}%)'.format(group_value, count, int(round(100 * percentage)))
            )
+            polygon._bar_shape = bar_shape
            polygon.add_to(feature_groups[group_value])

    return feature_groups
+
+
+def shape_label(region_shape, label, font_size=12):
+    return folium.map.Marker(
+        [region_shape.centroid.y, region_shape.centroid.x],
+        icon=folium.DivIcon(
+            icon_size=(50 / 12 * font_size, 24 / 12 * font_size),
+            icon_anchor=(25 / 12 * font_size, font_size),
+            html=(
+                '<div class="percentage-label" style="font-size: {}pt; '
+                'background-color: rgba(255,255,255,0.8); border-radius: {}px; text-align: center;">'
+                '{}</div>').format(font_size, font_size, label),
+        )
+    )
+
+
+def pronunciation_heatmaps(
+    regions, dataframe,
+    region_name_property, region_name_column,
+    group_column='answer_text',
+    cmap=pyplot.get_cmap('YlOrRd'),
+    label_font_size=12,
+    min_percentage=None, max_percentage=None,
+    show_labels=False
+):
+    def hex_color(percentage):
+        return '#{:02x}{:02x}{:02x}'.format(*(
+            int(255 * c)
+            for c in cmap(percentage)[:3]
+        ))
+
+    group_value_order, group_value_occurrence = zip(*sorted(
+        ((group_value, len(rows)) for group_value, rows in dataframe.groupby(group_column)),
+        key=lambda x: -x[1]
+    ))
+
+    occurrence_in_region = {
+        region_name: len(region_rows)
+        for region_name, region_rows in dataframe.groupby(region_name_column)
+    }
+
+    max_group_value_occurrence_in_region = [
+        max(
+            (region_rows[group_column] == group_value).sum() / occurrence_in_region[region_name]
+            for region_name, region_rows in dataframe.groupby(region_name_column)
+        )
+        for group_value in group_value_order
+        # for _ in [print(group_value)] # hack
+    ]
+
+    feature_groups = [
+        folium.FeatureGroup(
+            name='{} ({})'.format(group_value, occurrence),
+            overlay=False
+        )
+        for group_value, occurrence in zip(group_value_order, group_value_occurrence)
+    ]
+    for group in feature_groups:
+        folium.TileLayer(tiles='stamentoner').add_to(group)
+
+    for feature in regions['features']:
+        region_name = feature['properties'][region_name_property]
+        region_rows = dataframe[dataframe[region_name_column] == region_name]
+        region_shape = shape(feature['geometry'])
+        region_occurrence = occurrence_in_region.get(region_name, 1)
+
+        group_value_occurrence_in_region = [
+            (region_rows[group_column] == group_value).sum()
+            for group_value in group_value_order
+        ]
+
+        for group_value, value_occurrence_in_region, value_occurrence, max_group_value_occurrence, feature_group in zip(
+                group_value_order,
+                group_value_occurrence_in_region,
+                group_value_occurrence,
+                max_group_value_occurrence_in_region,
+                feature_groups
+        ):
+            percentage = value_occurrence_in_region / region_occurrence
+            if max_percentage is not None:
+                max_group_value_occurrence = max_percentage / 100
+            min_value = min_percentage / 100 if min_percentage is not None else 0
+            scale_value = (percentage - min_value) / (max_group_value_occurrence - min_value)
+            polygon = folium.Polygon(
+                reverse_latitude_longitude(feature['geometry']['coordinates']),
+                fill_color=hex_color(scale_value) if value_occurrence_in_region > 0 else '#888',
+                color='#000000',
+                fill_opacity=0.8,
+                popup='{} ({}, {: 3d}%)'.format( # ‰
+                    region_name[:50], value_occurrence_in_region,
+                    int(round(100 * percentage))
+                )
+            )
+            polygon.add_to(feature_group)
+            if show_labels and value_occurrence_in_region > 0:
+                shape_label(
+                    region_shape,
+                    '{:d}%'.format(int(round(100 * percentage))),  # ‰
+                    font_size=label_font_size
+                ).add_to(feature_group)
+
+    return dict(zip(group_value_order, feature_groups))
+
+
+def scatter_pronunciation_map(
+        dataframe,
+        latitude_column, longitude_column,
+        group_column,
+        split_at_groups=6
+):
+    std = (0.0189, 0.0135)
+
+    group_values, group_value_occurrences = zip(*sorted(
+        ((group_value, len(group_rows)) for group_value, group_rows in dataframe.groupby(group_column)),
+        key=lambda x: -x[1]
+    ))
+
+    maps = (
+        [group_values, group_values[:split_at_groups], group_values[split_at_groups:]]
+        if len(group_values) > split_at_groups else [group_values]
+    )
+    result_names = ['all', 'most_occurring', 'least_occurring']
+
+    results = {name: [] for name in result_names}
+
+    for group_subset, map_name in zip(maps, result_names):
+        colors = get_palette(len(group_subset))
+        for group_value, group_color in zip(group_subset, colors):
+            group_rows = dataframe[dataframe[group_column] == group_value]
+
+            group_name = '<span style=\\"color: {}; \\">{}  ({})</span>'.format(
+                group_color, escape(group_value), len(group_rows))
+
+            results[map_name].append(folium.FeatureGroup(name=group_name))
+
+            for point in zip(group_rows[latitude_column], group_rows[longitude_column]):
+                point = tuple(p + s * np.random.randn() for p, s in zip(point, std))
+                folium.Circle(
+                    point,
+                    color=None,
+                    fill_color=group_color,
+                    radius=400 * min(1., 100 / len(group_rows)),
+                    fill_opacity=1
+                ).add_to(results[map_name][-1])
+
+    return results
+
+
+def bar_map_css(legend_fontsize='30pt', attribution_fontsize='14pt'):
+    return FoliumCSS("""
+.leaflet-control-container .leaflet-control-layers-base {{
+    display: none;
+}}
+
+.leaflet-control-container .leaflet-control-layers-separator {{
+    display: none;
+}}
+
+.leaflet-control-container .leaflet-control-layers-overlays {{
+    display: flex
+}}
+
+.leaflet-control-container .leaflet-control-layers-overlays label:not(:last-child) {{
+    margin-right: 15px;
+}}
+
+.leaflet-control-container .leaflet-control-layers-overlays label span.with-block::before {{
+    content: '■ '; color: inherit;
+}}
+
+.leaflet-control-container .leaflet-control-layers-overlays label {{
+    margin-bottom: 0px; font-size: {legend_fontsize};
+}}
+
+.leaflet-control-container .leaflet-control-layers-overlays label input {{
+    display: none;
+}}
+
+.leaflet-control-attribution a {{
+    display: none;
+}}
+
+.leaflet-control-attribution.leaflet-control-attribution.leaflet-control-attribution.leaflet-control-attribution {{
+    background-color: white;
+    font-size: {attribution_fontsize};
+}}
+""".format(legend_fontsize=legend_fontsize, attribution_fontsize=attribution_fontsize))
+
+
+def save_map(m, filename, resolution=(1600, 1400), headless=True):
+    f = tempfile.NamedTemporaryFile(delete=False, suffix='.html')
+    f.close()
+    m.save(f.name)
+
+    options = webdriver.ChromeOptions()
+    options.add_argument('--window-size={1},{0}'.format(*resolution))
+    if headless:
+        options.add_argument('--headless')
+
+    browser = webdriver.Chrome(options=options)
+    browser.get("file://" + f.name)
+    time.sleep(1)
+
+    browser.save_screenshot(filename)
+    browser.quit()
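+    import os  # local import so the cleanup does not depend on the module header
+    os.remove(f.name)  # remove the temporary HTML file created above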
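Review note on `pronunciation_heatmaps`: the fill colour comes from rescaling a regional share into the colormap's [0, 1] range. A minimal sketch of min-max normalization with clamping follows. The helper name `scale_percentage`, the clamping, and the zero-span fallback are assumptions added for illustration and are not part of the commit.

```python
def scale_percentage(percentage, min_value=0.0, max_value=1.0):
    """Map a share in [min_value, max_value] onto [0, 1] (hypothetical helper)."""
    span = max_value - min_value
    if span <= 0:
        # Degenerate range: fall back to the low end of the colormap.
        return 0.0
    scaled = (percentage - min_value) / span
    # Clamp so shares outside the range do not leave the colormap's domain.
    return min(1.0, max(0.0, scaled))


print(scale_percentage(0.25, 0.0, 0.5))  # 0.5
print(scale_percentage(0.9, 0.0, 0.5))   # 1.0
```

The parentheses around `percentage - min_value` matter: without them only `min_value` is divided by the span.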