Restructured for reproducibility

This commit is contained in:
H.T. Kruitbosch 2019-03-19 12:53:12 +01:00
parent 7da2bfc400
commit eaa71c9eeb
102 changed files with 2251 additions and 1472517 deletions

@ -1,5 +1,45 @@
# Stimmen Fryslan
## Reproducibility results [paper xyz]
These notebooks reproduce the results of the paper. They require access to the stimmen MySQL database, and access to this database must be requested.
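A minimal sketch of how a notebook could open the database connection without hard-coding credentials, assuming the `MySQLdb` driver that the notebooks import. The environment variable names (`STIMMEN_DB_USER`, `STIMMEN_DB_PASSWORD`, `STIMMEN_DB_NAME`) and both helper functions are hypothetical and not part of this repository.

```python
import os


def connection_settings(environ=os.environ):
    """Collect MySQL connection keyword arguments from environment variables.

    The variable names are hypothetical; adapt them to your own setup.
    """
    return {
        'user': environ['STIMMEN_DB_USER'],
        'passwd': environ['STIMMEN_DB_PASSWORD'],
        'db': environ.get('STIMMEN_DB_NAME', 'stimmen'),
        'charset': 'utf8',
    }


def connect(environ=os.environ):
    """Open a connection to the stimmen database."""
    # Imported lazily so the settings helper also works without the driver.
    import MySQLdb
    return MySQLdb.connect(**connection_settings(environ))
```

With the variables exported in the shell that starts Jupyter, a notebook would call `db = connect()` and pass `db` to `pandas.read_sql` as before.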
### General statistics
[Statistics for Nanna's email of 2019-02-13](notebooks/Statistics for Nanna's email of 2019-02-13.ipynb)
Calculates statistics of the stimmen app usage.
### Regions
[Partition provinces in wijken and gemeentes](notebooks/Segment Provinces in Wijken and Gemeentes.ipynb)
Partitions Fryslan, the Dutch province, at two granularities: the CBS 'wijken' and 'gemeentes' of 2017. These partitionings are used in all maps created by the other notebooks.
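To illustrate what a partitioning is used for, the sketch below assigns a coordinate to the region that contains it. It is a plain-Python ray-casting stand-in, not the notebook's implementation; the region names and rectangles are made up and are not real CBS boundaries.

```python
def point_in_polygon(longitude, latitude, ring):
    """Ray-casting test: is the point inside the polygon ring?

    `ring` is a list of (longitude, latitude) vertices, as in a GeoJSON
    exterior ring. Holes and multi-polygons are ignored in this sketch.
    """
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Does this edge cross the horizontal line through the point?
        if (y1 > latitude) != (y2 > latitude):
            # Longitude at which the edge crosses that line.
            crossing = x1 + (latitude - y1) * (x2 - x1) / (y2 - y1)
            if longitude < crossing:
                inside = not inside
    return inside


def region_for(longitude, latitude, regions):
    """Return the name of the first region containing the point, or None."""
    for name, ring in regions.items():
        if point_in_polygon(longitude, latitude, ring):
            return name
    return None


# Two made-up rectangular regions, for illustration only.
regions = {
    'west': [(5.0, 52.8), (5.5, 52.8), (5.5, 53.4), (5.0, 53.4)],
    'east': [(5.5, 52.8), (6.0, 52.8), (6.0, 53.4), (5.5, 53.4)],
}
```

For real region geometries with holes or multiple parts, a geometry library such as shapely is the more robust choice.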
### Heatmaps
[Frysian pronunciation occurrence](notebooks/Frysian pronunciation occurrence.ipynb)
Creates all heatmaps illustrating the distribution of one pronunciation relative to all other pronunciations of that word.
**Example:**
![example pronunciation occurrence map](images/heatmaps/wijken_zaterdag_snjoun.png)
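The quantity behind such a heatmap can be sketched as follows: for one word, the share of each pronunciation among all answers given in a region. This is an illustrative sketch with made-up answers; the function name and the tuple layout are assumptions, not the notebook's code.

```python
from collections import Counter, defaultdict


def pronunciation_percentages(answers, word):
    """Per region, the share of each pronunciation among all answers for `word`.

    `answers` is an iterable of (region, word, pronunciation) tuples.
    Returns {region: {pronunciation: percentage}}.
    """
    counts = defaultdict(Counter)
    for region, answered_word, pronunciation in answers:
        if answered_word == word:
            counts[region][pronunciation] += 1
    return {
        region: {
            pronunciation: 100 * n / sum(counter.values())
            for pronunciation, n in counter.items()
        }
        for region, counter in counts.items()
    }


# Made-up answers, for illustration only.
answers = [
    ('Heerenveen', 'vis', 'fisk'),
    ('Heerenveen', 'vis', 'fisk'),
    ('Heerenveen', 'vis', 'fɪs'),
    ('Heerenveen', 'vis', 'fɪs'),
    ('Leeuwarden', 'vis', 'fisk'),
    ('Leeuwarden', 'vis', 'fisk'),
    ('Leeuwarden', 'vis', 'fisk'),
    ('Leeuwarden', 'vis', 'fɪs'),
    ('Leeuwarden', 'zaterdag', 'snjoun'),
]
percentages = pronunciation_percentages(answers, 'vis')
```

Each heatmap then colours the regions by the percentage of a single pronunciation, for example `percentages[region]['fisk']`.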
### Distribution maps
Creates maps for both granularities, each illustrating the pronunciation distribution of one word.
[Frysian pronunciation distribution maps](notebooks/Frysian pronunciation distribution maps.ipynb)
**Example:**
![example pronunciation distribution map](images/bar-maps/wijken_zaterdag.png)
## Notebooks
### Extract Frysian dialect regions
@ -75,9 +115,21 @@ This is a simple example for the created gabmap files.
* [percentages](data/Pronunciation_percentages_example.gabmap.tsv)
* [pronunciation](data/Pronunciations_example.gabmap.tsv)
### Bar Maps per word for Pronounciation Occurence in Frysian Municipalities
### Bar Maps per word for Pronunciation Occurrence in Frysian Municipalities
For each word, a map illustrates the pronunciation occurrence as measured by the prediction quiz, per Frysian
municipality.
[notebook](notebooks/Bar%20Maps%20per%20word%20for%20Pronounciation%20Occurence%20in%20Frysian%20Municipalities.ipynb)
[notebook](notebooks/Bar%20Maps%20per%20word%20for%20Pronunciation%20Occurrence%20in%20Frysian%20Municipalities.ipynb)
### Heatmap per word for Pronunciation Occurrence in Frysian Municipalities
[notebook](notebooks/Heatmap%20per%20word%20for%20Pronunciation%20Occurrence%20in%20Frysian%20Municipalities.ipynb)
Each map displays the pronunciation occurrence in Frysian municipalities for one word. Each pronunciation is represented by one map layer, and within one municipality the percentages across all layers add up to 100%, up to rounding errors.
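A small sketch of why the layers may not sum to exactly 100%: each percentage is rounded separately, so the rounding errors can accumulate. The counts and the one-decimal rounding are assumptions for illustration.

```python
def rounded_percentages(counts, digits=1):
    """Convert raw counts to percentages rounded to `digits` decimals."""
    total = sum(counts.values())
    return {key: round(100 * n / total, digits) for key, n in counts.items()}


# Made-up counts: three pronunciations heard equally often in one municipality.
layers = rounded_percentages({'a': 1, 'b': 1, 'c': 1})
total = sum(layers.values())
```

Here every layer is rounded to 33.3, so `total` ends up slightly below 100. With one-decimal rounding, the deviation is at most 0.05 percentage points per layer.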
### Heatmap per word for Pronunciation Occurrence in Frysian Neighborhoods
Same as for Municipalities, but for Neighborhoods.
[notebook](notebooks/Heatmap%20per%20word%20for%20Pronunciation%20Occurrence%20in%20Frysian%20Neighborhoods.ipynb)

File diff suppressed because one or more lines are too long

@ -1,20 +1,22 @@
<html><head></head><body> <a href="armen (lichaamsdeel).html">armen (lichaamsdeel)<a><br/>
<a href="avond.html">avond<a><br/>
<a href="bij (insect).html">bij (insect)<a><br/>
<a href="blad (aan een boom).html">blad (aan een boom)<a><br/>
<a href="borst (lichaamsdeel).html">borst (lichaamsdeel)<a><br/>
<a href="dag.html">dag<a><br/>
<a href="deurtje.html">deurtje<a><br/>
<a href="geel.html">geel<a><br/>
<a href="gegaan.html">gegaan<a><br/>
<a href="gezet.html">gezet<a><br/>
<a href="heel.html">heel<a><br/>
<html><head></head><body> <a href="gemeentes_avond.html">gemeentes avond<a><br/>
<a href="index.html">index<a><br/>
<a href="kaas.html">kaas<a><br/>
<a href="koken.html">koken<a><br/>
<a href="oog.html">oog<a><br/>
<a href="sprak (toe).html">sprak (toe)<a><br/>
<a href="tand.html">tand<a><br/>
<a href="trein.html">trein<a><br/>
<a href="vis.html">vis<a><br/>
<a href="zaterdag.html">zaterdag<a></body></html>
<a href="neighborhood_armen (lichaamsdeel).html">neighborhood armen (lichaamsdeel)<a><br/>
<a href="neighborhood_avond.html">neighborhood avond<a><br/>
<a href="neighborhood_bij (insect).html">neighborhood bij (insect)<a><br/>
<a href="neighborhood_blad (aan een boom).html">neighborhood blad (aan een boom)<a><br/>
<a href="neighborhood_borst (lichaamsdeel).html">neighborhood borst (lichaamsdeel)<a><br/>
<a href="neighborhood_dag.html">neighborhood dag<a><br/>
<a href="neighborhood_deurtje.html">neighborhood deurtje<a><br/>
<a href="neighborhood_geel.html">neighborhood geel<a><br/>
<a href="neighborhood_gegaan.html">neighborhood gegaan<a><br/>
<a href="neighborhood_gezet.html">neighborhood gezet<a><br/>
<a href="neighborhood_heel.html">neighborhood heel<a><br/>
<a href="neighborhood_kaas.html">neighborhood kaas<a><br/>
<a href="neighborhood_koken.html">neighborhood koken<a><br/>
<a href="neighborhood_oog.html">neighborhood oog<a><br/>
<a href="neighborhood_sprak (toe).html">neighborhood sprak (toe)<a><br/>
<a href="neighborhood_tand.html">neighborhood tand<a><br/>
<a href="neighborhood_trein.html">neighborhood trein<a><br/>
<a href="neighborhood_vis.html">neighborhood vis<a><br/>
<a href="neighborhood_zaterdag.html">neighborhood zaterdag<a><br/>
<a href="wijken_avond.html">wijken avond<a></body></html>

@ -1,20 +0,0 @@
<html><head></head><body> <a href="armen (lichaamsdeel).html">armen (lichaamsdeel)<a><br/>
<a href="avond.html">avond<a><br/>
<a href="bij (insect).html">bij (insect)<a><br/>
<a href="blad (aan een boom).html">blad (aan een boom)<a><br/>
<a href="borst (lichaamsdeel).html">borst (lichaamsdeel)<a><br/>
<a href="dag.html">dag<a><br/>
<a href="deurtje.html">deurtje<a><br/>
<a href="geel.html">geel<a><br/>
<a href="gegaan.html">gegaan<a><br/>
<a href="gezet.html">gezet<a><br/>
<a href="heel.html">heel<a><br/>
<a href="index.html">index<a><br/>
<a href="kaas.html">kaas<a><br/>
<a href="koken.html">koken<a><br/>
<a href="oog.html">oog<a><br/>
<a href="sprak (toe).html">sprak (toe)<a><br/>
<a href="tand.html">tand<a><br/>
<a href="trein.html">trein<a><br/>
<a href="vis.html">vis<a><br/>
<a href="zaterdag.html">zaterdag<a></body></html>

@ -1,20 +0,0 @@
<html><head></head><body> <a href="armen (lichaamsdeel).html">armen (lichaamsdeel)<a><br/>
<a href="avond.html">avond<a><br/>
<a href="bij (insect).html">bij (insect)<a><br/>
<a href="blad (aan een boom).html">blad (aan een boom)<a><br/>
<a href="borst (lichaamsdeel).html">borst (lichaamsdeel)<a><br/>
<a href="dag.html">dag<a><br/>
<a href="deurtje.html">deurtje<a><br/>
<a href="geel.html">geel<a><br/>
<a href="gegaan.html">gegaan<a><br/>
<a href="gezet.html">gezet<a><br/>
<a href="heel.html">heel<a><br/>
<a href="index.html">index<a><br/>
<a href="kaas.html">kaas<a><br/>
<a href="koken.html">koken<a><br/>
<a href="oog.html">oog<a><br/>
<a href="sprak (toe).html">sprak (toe)<a><br/>
<a href="tand.html">tand<a><br/>
<a href="trein.html">trein<a><br/>
<a href="vis.html">vis<a><br/>
<a href="zaterdag.html">zaterdag<a></body></html>

@ -1,20 +0,0 @@
<html><head></head><body> <a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/armen (lichaamsdeel).html">armen (lichaamsdeel)<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/avond.html">avond<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/bij (insect).html">bij (insect)<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/blad (aan een boom).html">blad (aan een boom)<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/borst (lichaamsdeel).html">borst (lichaamsdeel)<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/dag.html">dag<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/deurtje.html">deurtje<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/geel.html">geel<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/gegaan.html">gegaan<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/gezet.html">gezet<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/heel.html">heel<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/index.html">index<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/kaas.html">kaas<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/koken.html">koken<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/oog.html">oog<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/sprak (toe).html">sprak (toe)<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/tand.html">tand<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/trein.html">trein<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/vis.html">vis<a><br/>
<a href="http://herbertkruitbosch.com/pronunciation_maps/heatmaps/zaterdag.html">zaterdag<a></body></html>

@ -1,83 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Gabmap format\n",
"\n",
"Exploration of the format of the lines in example Gabmap files Martijn had sent."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open('../data/martijn_format/Dutch613-coordinates.txt') as f:\n",
" coordinates = list(f)\n",
" \n",
"with open('../data/martijn_format/Nederlands-ipa.utxt') as f:\n",
" table = list(f)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"coordinates[0].split('\\t')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"coordinates[1].split('\\t')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"table[0].split('\\t')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"table[1].split('\\t')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -1,458 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Geographical pronunciation tables, simple example\n",
"\n",
"Simple example to create gabmap files for two words with few pronunciations and two regions."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"sys.path.append('..')\n",
"\n",
"import pandas\n",
"import MySQLdb\n",
"import json\n",
"import copy\n",
"\n",
"db = MySQLdb.connect(user='root', passwd='Nmmxhjgt1@', db='stimmen', charset='utf8')\n",
"\n",
"from shapely.geometry import shape, Point\n",
"\n",
"from gabmap import create_gabmap_dataframes\n",
"\n",
"from stimmen.geojson import merge_features"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"with open('../data/Friesland_wijken.geojson') as f:\n",
" regions = json.load(f)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load and simplify"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Answers to how participants state a word should be pronounced\n",
"\n",
"answers = pandas.read_sql('''\n",
"SELECT prediction_quiz_id, user_lat, user_lng, question_text, answer_text\n",
"FROM core_surveyresult as survey\n",
"INNER JOIN core_predictionquizresult as result ON survey.id = result.survey_result_id\n",
"INNER JOIN core_predictionquizresultquestionanswer as answer\n",
" ON result.id = answer.prediction_quiz_id\n",
"''', db)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"regions_simple = merge_features(copy.deepcopy(regions),\n",
" condition=lambda feature: feature['properties']['GM_NAAM'] == 'Heerenveen',\n",
")\n",
"\n",
"regions_simple = merge_features(\n",
" regions_simple,\n",
" condition=lambda feature: feature['properties']['GM_NAAM'] == 'Leeuwarden',\n",
")\n",
"regions_simple['features'] = regions_simple['features'][-2:]\n",
"\n",
"regions_simple['features'][0]['properties']['name'] = 'Heerenveen'\n",
"regions_simple['features'][1]['properties']['name'] = 'Leeuwarden'"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"answers_simple = answers[\n",
" (answers['question_text'] == '\"blad\" (aan een boom)') |\n",
" (answers['question_text'] == '\"vis\"')\n",
"].copy()\n",
"\n",
"answers_simple['question_text'] = answers_simple['question_text'].map(\n",
" lambda x: x.replace('\"', '').replace('*', ''))\n",
"\n",
"answers_simple['answer_text'] = answers_simple['answer_text'].map(\n",
" lambda x: x[x.find('('):x.find(')')][1:])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two words, blad and vis, with 4 and 2 pronunciations respectively"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>answer_text</th>\n",
" </tr>\n",
" <tr>\n",
" <th>question_text</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>blad (aan een boom)</th>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>vis</th>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" answer_text\n",
"question_text \n",
"blad (aan een boom) 4\n",
"vis 2"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"answers_simple.groupby('question_text').agg({'answer_text': lambda x: len(set(x))})"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"centroids_example, pronunciations_example, counts_example = create_gabmap_dataframes(\n",
" regions_simple, answers_simple,\n",
" latitude_column='user_lat', longitude_column='user_lng',\n",
" word_column='question_text', pronunciation_column='answer_text',\n",
" region_name_property='name'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Resulting tables\n",
"\n",
"Stored as tab separated files for gabmap"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>latitude</th>\n",
" <th>longitude</th>\n",
" </tr>\n",
" <tr>\n",
" <th>#name</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Heerenveen</th>\n",
" <td>52.996076</td>\n",
" <td>5.977925</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Leeuwarden</th>\n",
" <td>53.169940</td>\n",
" <td>5.797613</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" latitude longitude\n",
"#name \n",
"Heerenveen 52.996076 5.977925\n",
"Leeuwarden 53.169940 5.797613"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"centroids_example"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>blad (aan een boom)</th>\n",
" <th>vis</th>\n",
" </tr>\n",
" <tr>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Heerenveen</th>\n",
" <td>blet / blɑt / blɔd / blɛ:t</td>\n",
" <td>fisk / fɪs</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Leeuwarden</th>\n",
" <td>blet / blɑt / blɔd / blɛ:t</td>\n",
" <td>fisk / fɪs</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" blad (aan een boom) vis\n",
" \n",
"Heerenveen blet / blɑt / blɔd / blɛ:t fisk / fɪs\n",
"Leeuwarden blet / blɑt / blɔd / blɛ:t fisk / fɪs"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pronunciations_example"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>blad (aan een boom): blet</th>\n",
" <th>blad (aan een boom): blɑt</th>\n",
" <th>blad (aan een boom): blɔd</th>\n",
" <th>blad (aan een boom): blɛ:t</th>\n",
" <th>vis: fisk</th>\n",
" <th>vis: fɪs</th>\n",
" </tr>\n",
" <tr>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Heerenveen</th>\n",
" <td>31.654676</td>\n",
" <td>2.158273</td>\n",
" <td>2.158273</td>\n",
" <td>64.028777</td>\n",
" <td>52.517986</td>\n",
" <td>47.482014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Leeuwarden</th>\n",
" <td>7.865169</td>\n",
" <td>7.022472</td>\n",
" <td>8.707865</td>\n",
" <td>76.404494</td>\n",
" <td>75.000000</td>\n",
" <td>25.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" blad (aan een boom): blet blad (aan een boom): blɑt \\\n",
" \n",
"Heerenveen 31.654676 2.158273 \n",
"Leeuwarden 7.865169 7.022472 \n",
"\n",
" blad (aan een boom): blɔd blad (aan een boom): blɛ:t vis: fisk \\\n",
" \n",
"Heerenveen 2.158273 64.028777 52.517986 \n",
"Leeuwarden 8.707865 76.404494 75.000000 \n",
"\n",
" vis: fɪs \n",
" \n",
"Heerenveen 47.482014 \n",
"Leeuwarden 25.000000 "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"counts_example"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"pronunciations_example.to_csv('../data/Pronunciations_example.gabmap.tsv', sep='\\t')\n",
"counts_example.to_csv('../data/Pronunciation_percentages_example.gabmap.tsv', sep='\\t')\n",
"centroids_example.to_csv('../data/Centroids_example.gabmap.tsv', sep='\\t', columns=['longitude', 'latitude'])\n",
"with open('../data/Gabmap_example.geojson', 'w') as f:\n",
" json.dump(regions_simple, f, indent=1)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -1,157 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Geographical pronunciation tables\n",
"\n",
"Creates gabmap files with region centroids, percentages and pronunciations for wijken in Friesland."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"sys.path.append('..')\n",
"\n",
"import pandas\n",
"import MySQLdb\n",
"import json\n",
"import copy\n",
"\n",
"db = MySQLdb.connect(user='root', passwd='Nmmxhjgt1@', db='stimmen', charset='utf8')\n",
"\n",
"from shapely.geometry import shape, Point\n",
"\n",
"from gabmap import create_gabmap_dataframes"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"with open('../data/Friesland_wijken.geojson') as f:\n",
" regions = json.load(f)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Answers to how participants state a word should be pronounced\n",
"\n",
"answers = pandas.read_sql('''\n",
"SELECT prediction_quiz_id, user_lat, user_lng, question_text, answer_text\n",
"FROM core_surveyresult as survey\n",
"INNER JOIN core_predictionquizresult as result ON survey.id = result.survey_result_id\n",
"INNER JOIN core_predictionquizresultquestionanswer as answer\n",
" ON result.id = answer.prediction_quiz_id\n",
"''', db)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"zero_latlng_questions = {\n",
" q\n",
" for q, row in answers.groupby('question_text').agg('std').iterrows()\n",
" if row['user_lat'] == 0 and row['user_lng'] == 0\n",
"}\n",
"answers_filtered = answers[answers['question_text'].map(lambda x: x not in zero_latlng_questions)].copy()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['gegaan', 'avond', 'heel', 'dag', 'bij (insect)', 'sprak (toe)',\n",
" 'oog', 'armen (lichaamsdeel)', 'kaas', 'deurtje', 'koken',\n",
" 'borst (lichaamsdeel)', 'vis', 'zaterdag', 'trein', 'geel', 'tand',\n",
" 'gezet', 'blad (aan een boom)'], dtype=object)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"answers_filtered['question_text'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"answers_filtered['question_text'] = answers_filtered['question_text'].map(\n",
" lambda x: x.replace('\"', '').replace('*', ''))\n",
"\n",
"answers_filtered['answer_text'] = answers_filtered['answer_text'].map(\n",
" lambda x: x[x.find('('):x.find(')')][1:])"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"centroids, pronunciations, counts = create_gabmap_dataframes(\n",
" regions, answers_filtered,\n",
" latitude_column='user_lat', longitude_column='user_lng',\n",
" word_column='question_text', pronunciation_column='answer_text',\n",
" region_name_property='gemeente_en_wijk_naam'\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"pronunciations.to_csv('../data/Friesland_wijken_pronunciations.gabmap.tsv', sep='\\t')\n",
"counts.to_csv('../data/Friesland_wijken_pronunciation_percentages.gabmap.tsv', sep='\\t')\n",
"centroids.to_csv('../data/Friesland_wijken_centroids.gabmap.tsv', sep='\\t', columns=['longitude', 'latitude'])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -1,265 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Group recordings in 4 Frysian dialect regions\n",
"\n",
" * Klaaifrysk\n",
" * Waldfrysk\n",
" * Sudwesthoeksk\n",
" * Noardhoeksk\n",
" \n",
"First run `Dialect Regions from image.ipynb`.\n",
"\n",
"![dialect regions](../data/dialects.png)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from math import floor\n",
"import json\n",
"import pandas\n",
"import MySQLdb\n",
"from collections import Counter\n",
"\n",
"from math import sqrt\n",
"import numpy as np\n",
"from shapely.geometry import shape, Point\n",
"from vincenty import vincenty\n",
"\n",
"from jupyter_progressbar import ProgressBar\n",
"\n",
"db = MySQLdb.connect(user='root', passwd='Nmmxhjgt1@', db='stimmen', charset='utf8')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Input\n",
"\n",
"Load the geojson with the dialect region and create shapely shapes."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"with open('../data/fryslan_dialect_regions.geojson', 'r') as f:\n",
" geojson = json.load(f)\n",
"\n",
"dialect_regions = [region['properties']['dialect'] for region in geojson['features']]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"shapes = {\n",
" feature['properties']['dialect']: shape(feature['geometry'])\n",
" for feature in geojson['features']\n",
"}\n",
"\n",
"def regions_for(coordinate):\n",
" regions = {\n",
" region_name\n",
" for region_name, shape in shapes.items()\n",
" if shape.contains(Point(*coordinate))\n",
" }\n",
" return regions\n",
"\n",
"def distance_to_shape(shape, longitude, latitude):\n",
" ext = shape.exterior\n",
" p = ext.interpolate(ext.project(Point(longitude, latitude)))\n",
" return vincenty((latitude, longitude), (p.y, p.x))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Query and process\n",
"\n",
"Query all picture game and free speech recordings and assign the dialect region."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def dialect_regions_and_distance(data):\n",
" return[\n",
" {\n",
" 'dialects': [\n",
" {\n",
" 'dialect': dialect,\n",
" 'boundary_distance': distance_to_shape(shapes[dialect], longitude, latitude),\n",
" }\n",
" for dialect in regions_for((longitude, latitude))\n",
" ],\n",
" 'filename': filename,\n",
" }\n",
" for filename, (latitude, longitude) in ProgressBar(\n",
" data[['latitude', 'longitude']].iterrows(),\n",
" size=len(data)\n",
" )\n",
" ]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"picture_games = pandas.read_sql('''\n",
"SELECT language.name as language, item.name as picture,\n",
" survey.user_lat as latitude, survey.user_lng as longitude,\n",
" survey.area_name as area, survey.country_name as country,\n",
" result.recording as filename,\n",
" result.submitted_at as date\n",
"FROM core_surveyresult as survey\n",
"INNER JOIN core_picturegameresult as result ON survey.id = result.survey_result_id\n",
"INNER JOIN core_language as language ON language.id = result.language_id\n",
"INNER JOIN core_picturegameitem as item\n",
" ON result.picture_game_item_id = item.id\n",
"''', db)\n",
"picture_games.set_index('filename', inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "5825449a737b4fcab38a4f4ac2adfd87",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VBox(children=(HBox(children=(FloatProgress(value=0.0, max=1.0), HTML(value='<b>0</b>s passed', placeholder='0…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"dialect_region_per_picture_game = dialect_regions_and_distance(picture_games)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"df = pandas.DataFrame([\n",
"    [r['filename'], r['dialects'][0]['dialect'], r['dialects'][0]['boundary_distance']]\n",
"    for r in dialect_region_per_picture_game\n",
"    if len(r['dialects']) == 1\n",
"], columns=['filename', 'dialect', 'boundary_distance'])\n",
"\n",
"df.to_excel('../data/picture_game_recordings_by_dialect.xlsx')\n",
"df.to_csv('../data/picture_game_recordings_by_dialect.csv')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"free_speech_games = pandas.read_sql('''\n",
"SELECT language.name as language,\n",
" survey.user_lat as latitude, survey.user_lng as longitude,\n",
" survey.area_name as area, survey.country_name as country,\n",
" result.recording as filename,\n",
" result.submitted_at as date\n",
"FROM core_surveyresult as survey\n",
"INNER JOIN core_freespeechresult as result ON survey.id = result.survey_result_id\n",
"INNER JOIN core_language as language ON language.id = result.language_id\n",
"''', db)\n",
"free_speech_games.set_index('filename', inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "8afad9f71e544658b554b828932d7769",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VBox(children=(HBox(children=(FloatProgress(value=0.0, max=1.0), HTML(value='<b>0</b>s passed', placeholder='0…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"dialect_region_per_free_speech = dialect_regions_and_distance(free_speech_games)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"df = pandas.DataFrame([\n",
"    [r['filename'], r['dialects'][0]['dialect'], r['dialects'][0]['boundary_distance']]\n",
"    for r in dialect_region_per_free_speech\n",
"    if len(r['dialects']) == 1\n",
"], columns=['filename', 'dialect', 'boundary_distance'])\n",
"\n",
"df.to_excel('../data/free_speech_recordings_by_dialect.xlsx')\n",
"df.to_csv('../data/free_speech_recordings_by_dialect.csv')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
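
The notebook above measures how far each recording lies from its dialect-region boundary with `vincenty`, which appears to be geopy's function of that name (an assumption, since the import is not shown here). geopy removed `vincenty` in version 2.0 in favour of `geodesic`, so the cell may not run on a current installation. As a dependency-free illustration of the same idea, the sketch below uses the haversine formula instead. This is a swapped-in, spherical approximation rather than the ellipsoidal method the notebook uses, and the coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; treats the Earth as a sphere


def haversine_km(point_a, point_b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Hypothetical coordinates, roughly Leeuwarden and Groningen, as (latitude, longitude)
leeuwarden = (53.2012, 5.7999)
groningen = (53.2194, 6.5665)
distance = haversine_km(leeuwarden, groningen)
```

Note the argument order: like the notebook's call, the helper takes `(latitude, longitude)`, whereas shapely points are `(x, y)`, that is `(longitude, latitude)`.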

View File

@ -7,13 +7,11 @@
"# Segment provinces\n",
"\n",
"\n",
"Create wijk and gemeente level segmentations for all Dutch provinces and save as geojson and Gabmap KML.\n",
"Create wijk and gemeente level segmentations for two Dutch provinces, Groningen and Friesland, and save as geojson and Gabmap KML.\n",
"\n",
"All is based on CBS data.\n",
"All is based on [CBS data](https://www.cbs.nl/nl-nl/dossier/nederland-regionaal/geografische%20data/wijk-en-buurtkaart-2017)\n",
"\n",
"For Friesland, several wijken are merged.\n",
"\n",
"Note: only applied to Groningen and Friesland, because other provinces give gemetry errors."
"For Friesland, several wijken are merged, in particular those of the municipalities Ameland, Harlingen, Schiermonnikoog, Terschelling and Vlieland, and those of Leeuwarden with centroid above 53.167. These neighborhoods are small in area and hence we decided to merge, to avoid a "
]
},
{
@ -29,7 +27,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
@ -53,13 +51,36 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 4,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Groningen\n",
"0\n",
"1\n",
"2\n",
"3\n",
"4\n",
"5\n",
"6\n",
"Friesland\n",
"0\n",
"1\n",
"2\n",
"3\n",
"4\n",
"5\n",
"6\n"
]
}
],
"source": [
"for province in ['Groningen', 'Friesland']:\n",
" wijken_geojson = gwb_in_province(province, 'wijk', 2018)\n",
" gemeente_geojson = gwb_in_province(province, 'gem', 2018)\n",
" wijken_geojson = gwb_in_province(province, 'wijk', 2018, polygon_simplification=None)\n",
" gemeente_geojson = gwb_in_province(province, 'gem', 2018, polygon_simplification=None)\n",
" \n",
" if province == 'Friesland':\n",
" for gemeente in {'Ameland', 'Harlingen', 'Schiermonnikoog', 'Terschelling', 'Vlieland'}:\n",
@ -106,6 +127,13 @@
" with open('../data/{}_gemeentes.kml'.format(province), 'w') as f:\n",
" f.write(as_gabmap_kml(gemeente_geojson, name_property='gemeente_naam'))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {

File diff suppressed because one or more lines are too long

View File

@ -24,7 +24,7 @@ def clear_cache_poll():
global __cache, __last_access
while True:
time.sleep(60*60)
if (time.time() - __last_access) > 60*60:
if __last_access is not None and (time.time() - __last_access) > 60*60:
__cache = {}
@ -60,9 +60,15 @@ def province_geojson(province, with_water=False):
return __cache[(province, with_water)]
def expand_box(x0, y0, x1, y1, f=0.1):
return (x0 - (x1 - x0)*f, y0 - (y1 - y0)*f,
x1 + (x1 - x0)*f, y1 + (y1 - y0)*f)
def gwb_in_province(
province='Friesland', region_level='wijk', region_year='2018',
polygon_simplification=0.001, province_dilation=0.0005
polygon_simplification=0.001, province_dilation=0.0005,
bounding_box_dilation=0.01
):
assert region_level in {'gem', 'wijk', 'buurt'}, (
"region_level {} not supported, must be gem, wijk or buurt".format(region_level))
@ -70,7 +76,7 @@ def gwb_in_province(
"region_year {} not supported, must be 2017 or 2018".format(region_year))
province_with_water = shape(province_geojson(province, with_water=True)['geometry'])
province_bounding_box = box(*province_with_water.bounds)
province_bounding_box = box(*expand_box(*province_with_water.bounds, f=bounding_box_dilation))
province_land_only = shape(province_geojson(province, with_water=False)['geometry'])
province_land_only_dilated = province_land_only.buffer(-province_dilation)
@ -80,7 +86,10 @@ def gwb_in_province(
shapes = [shape(geojson_['geometry']) for geojson_ in geojson['features']]
shapes, geojson = map(list, zip(*(
(
(intersection.simplify(tolerance=polygon_simplification), geojson_)
if polygon_simplification is not None else (intersection, geojson_)
)
for shape_, geojson_ in zip(shapes, geojson['features'])
if province_bounding_box.contains(shape_)
for intersection in [shape_.intersection(province_land_only_dilated)] # alias
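
The `expand_box` helper introduced in this file grows a bounding box by a fraction `f` of its width and height on every side, presumably so that regions touching the province's edge still pass the `province_bounding_box.contains(shape_)` filter. A small worked example, using hypothetical bounds rather than the real extent of any province:

```python
def expand_box(x0, y0, x1, y1, f=0.1):
    """Grow a (minx, miny, maxx, maxy) box by fraction f of its size on every side."""
    return (x0 - (x1 - x0) * f, y0 - (y1 - y0) * f,
            x1 + (x1 - x0) * f, y1 + (y1 - y0) * f)


# Hypothetical bounds in degrees: 2 wide and 1 high
bounds = (5.0, 52.0, 7.0, 53.0)

# With f=0.01 the box grows by 0.02 horizontally and 0.01 vertically on each side
expanded = expand_box(*bounds, f=0.01)
```

Because the margin is relative to the box size, the same `bounding_box_dilation` value gives a wider absolute margin for a larger province.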

View File

@ -1,5 +1,6 @@
import folium
from jupyter_progressbar import ProgressBar
from matplotlib import pyplot
from pygeoif.geometry import mapping
from shapely.geometry.geo import shape, box
@ -9,6 +10,12 @@ import numpy as np
from stimmen.latitude_longitude import reverse_latitude_longitude
import tempfile
import time
from selenium import webdriver
from .folium_injections import *
from .folium_colorbar import *
def get_palette(n, no_black=True, no_white=True):
with open(data_file('data', 'glasbey', '{}_colors.txt'.format(n + no_black + no_white))) as f:
@ -21,7 +28,7 @@ def get_palette(n, no_black=True, no_white=True):
def colored_name(name, color):
return '<span style=\\"color:{}; \\">{}</span>'.format(color, name)
return '<span class=\\"with-block\\" style=\\"color:{}; \\"><span class=\\"blackable; \\">{}</span></span>'.format(color, name)
def region_area_cdf(region_shape, resolution=10000):
@ -78,9 +85,12 @@ def pronunciation_bars(
regions, dataframe,
region_name_property, region_name_column,
group_column='answer_text',
count_column=None,
cutoff_percentage=0.05,
normalize_area=True,
progress_bar=False,
area_adjust_resolution=10000,
simplify_shapes=None,
):
# all values of group_column that appear at least cutoff_percentage in one of the regions
relevant_groups = {
@ -103,7 +113,7 @@ def pronunciation_bars(
feature_groups = {
group_value: folium.FeatureGroup(
name=colored_name(
'{value} ({amount})'.format(value=escape(group_value), amount=amount),
'{value} <span class=\\"amount\\">({amount})</span>'.format(value=escape(group_value), amount=amount),
color
),
overlay=True
@ -114,6 +124,7 @@ def pronunciation_bars(
if group_value != 'other' else
n_other
] # alias
if amount > 0
}
progress_bar = ProgressBar if progress_bar else lambda x: x
@ -123,6 +134,8 @@ def pronunciation_bars(
region_name = feature['properties'][region_name_property]
region_rows = dataframe[dataframe[region_name_column] == region_name]
region_shape = shape(feature['geometry'])
if simplify_shapes:
region_shape = region_shape.simplify(simplify_shapes)
_, ymin, _, ymax = region_shape.bounds
group_values_occurrence = {
@ -136,14 +149,16 @@ def pronunciation_bars(
key=lambda x: (x[0] == 'other', -x[1])
))
group_percentages = np.array(group_occurrences) / len(region_rows)
group_boundaries = np.cumsum((0,) + group_occurrences) / len(region_rows)
group_percentages = np.array(group_occurrences) / max(1, len(region_rows))
group_boundaries = np.cumsum((0,) + group_occurrences) / max(1, len(region_rows))
if normalize_area:
if '__region_shape_cdf_cache' not in feature['properties']:
feature['properties']['__region_shape_cdf_cache'] = region_area_cdf(region_shape).tolist()
feature['properties']['__region_shape_cdf_cache'] = region_area_cdf(
region_shape, resolution=area_adjust_resolution).tolist()
group_boundaries = area_adjust_boundaries(
region_shape, group_boundaries,
region_cdf_cache=feature['properties']['__region_shape_cdf_cache']
region_cdf_cache=feature['properties']['__region_shape_cdf_cache'],
resolution=area_adjust_resolution
)
else:
group_boundaries = width_adjust_boundaries(region_shape, group_boundaries)
@ -158,7 +173,7 @@ def pronunciation_bars(
continue
bar_shape = region_shape.intersection(box(left_boundary, ymin, right_boundary, ymax))
if bar_shape.area == 0:
if bar_shape.area == 0 or count == 0:
continue
polygon = folium.Polygon(
reverse_latitude_longitude(mapping(bar_shape)['coordinates']),
@ -167,6 +182,213 @@ def pronunciation_bars(
color=None,
popup='{} ({}, {: 3d}%)'.format(group_value, count, int(round(100 * percentage)))
)
polygon._bar_shape = bar_shape
polygon.add_to(feature_groups[group_value])
return feature_groups
def shape_label(region_shape, label, font_size=12):
    return folium.map.Marker(
        [region_shape.centroid.y, region_shape.centroid.x],
        icon=folium.DivIcon(
            icon_size=(50 / 12 * font_size, 24 / 12 * font_size),
            icon_anchor=(25 / 12 * font_size, font_size),
            html=(
                '<div class="percentage-label" style="font-size: {}pt; '
                'background-color: rgba(255,255,255,0.8); border-radius: {}px; text-align: center;">'
                '{}</div>').format(font_size, font_size, label),
        )
    )


def pronunciation_heatmaps(
    regions, dataframe,
    region_name_property, region_name_column,
    group_column='answer_text',
    cmap=pyplot.get_cmap('YlOrRd'),
    label_font_size=12,
    min_percentage=None, max_percentage=None,
    show_labels=False
):
    def hex_color(percentage):
        return '#{:02x}{:02x}{:02x}'.format(*(
            int(255 * c)
            for c in cmap(percentage)[:3]
        ))

    group_value_order, group_value_occurrence = zip(*sorted(
        ((group_value, len(rows)) for group_value, rows in dataframe.groupby(group_column)),
        key=lambda x: -x[1]
    ))

    occurrence_in_region = {
        region_name: len(region_rows)
        for region_name, region_rows in dataframe.groupby(region_name_column)
    }

    max_group_value_occurrence_in_region = [
        max(
            (region_rows[group_column] == group_value).sum() / occurrence_in_region[region_name]
            for region_name, region_rows in dataframe.groupby(region_name_column)
        )
        for group_value in group_value_order
        # for _ in [print(group_value)]  # hack
    ]

    feature_groups = [
        folium.FeatureGroup(
            name='{} ({})'.format(group_value, occurrence),
            overlay=False
        )
        for group_value, occurrence in zip(group_value_order, group_value_occurrence)
    ]
    for group in feature_groups:
        folium.TileLayer(tiles='stamentoner').add_to(group)

    for feature in regions['features']:
        region_name = feature['properties'][region_name_property]
        region_rows = dataframe[dataframe[region_name_column] == region_name]
        region_shape = shape(feature['geometry'])
        region_occurrence = occurrence_in_region.get(region_name, 1)

        group_value_occurrence_in_region = [
            (region_rows[group_column] == group_value).sum()
            for group_value in group_value_order
        ]

        for group_value, value_occurrence_in_region, value_occurrence, max_group_value_occurrence, feature_group in zip(
            group_value_order,
            group_value_occurrence_in_region,
            group_value_occurrence,
            max_group_value_occurrence_in_region,
            feature_groups
        ):
            percentage = value_occurrence_in_region / region_occurrence
            if max_percentage is not None:
                max_group_value_occurrence = max_percentage / 100
            min_value = min_percentage / 100 if min_percentage is not None else 0
            # Min-max scaling: subtract the lower bound before dividing by the range
            scale_value = (percentage - min_value) / (max_group_value_occurrence - min_value)

            polygon = folium.Polygon(
                reverse_latitude_longitude(feature['geometry']['coordinates']),
                fill_color=hex_color(scale_value) if value_occurrence_in_region > 0 else '#888',
                color='#000000',
                fill_opacity=0.8,
                popup='{} ({}, {: 3d}%)'.format(  # ‰
                    region_name[:50], value_occurrence_in_region,
                    int(round(100 * percentage))
                )
            )
            polygon.add_to(feature_group)
            if show_labels and value_occurrence_in_region > 0:
                shape_label(
                    region_shape,
                    '{:d}%'.format(int(round(100 * percentage))),  # ‰
                    font_size=label_font_size
                ).add_to(feature_group)

    return dict(zip(group_value_order, feature_groups))


def scatter_pronunciation_map(
    dataframe,
    latitude_column, longitude_column,
    group_column,
    split_at_groups=6
):
    std = (0.0189, 0.0135)
    group_values, group_value_occurrences = zip(*sorted(
        ((group_value, len(group_rows)) for group_value, group_rows in dataframe.groupby(group_column)),
        key=lambda x: -x[1]
    ))

    maps = (
        [group_values, group_values[:split_at_groups], group_values[split_at_groups:]]
        if len(group_values) > split_at_groups else [group_values]
    )
    result_names = ['all', 'most_occurring', 'least_occurring']
    results = {name: [] for name in result_names}

    for map, map_name in zip(maps, result_names):
        colors = get_palette(len(map))
        for group_value, group_color in zip(map, colors):
            group_rows = dataframe[dataframe[group_column] == group_value]
            group_name = '<span style=\\"color: {}; \\">{} ({})</span>'.format(
                group_color, escape(group_value), len(group_rows))
            results[map_name].append(folium.FeatureGroup(name=group_name))
            for point in zip(group_rows[latitude_column], group_rows[longitude_column]):
                point = tuple(p + s * np.random.randn() for p, s in zip(point, std))
                folium.Circle(
                    point,
                    color=None,
                    fill_color=group_color,
                    radius=400 * min(1., 100 / len(group_rows)),
                    fill_opacity=1
                ).add_to(results[map_name][-1])
    return results


def bar_map_css(legend_fontsize='30pt', attribution_fontsize='14pt'):
    return FoliumCSS("""
        .leaflet-control-container .leaflet-control-layers-base {{
            display: none;
        }}
        .leaflet-control-container .leaflet-control-layers-separator {{
            display: none;
        }}
        .leaflet-control-container .leaflet-control-layers-overlays {{
            display: flex
        }}
        .leaflet-control-container .leaflet-control-layers-overlays label:not(:last-child) {{
            margin-right: 15px;
        }}
        .leaflet-control-container .leaflet-control-layers-overlays label span.with-block::before {{
            content: ''; color: inherit;
        }}
        .leaflet-control-container .leaflet-control-layers-overlays label {{
            margin-bottom: 0px; font-size: {legend_fontsize};
        }}
        .leaflet-control-container .leaflet-control-layers-overlays label input {{
            display: none;
        }}
        .leaflet-control-attribution a {{
            display: none;
        }}
        .leaflet-control-attribution.leaflet-control-attribution.leaflet-control-attribution.leaflet-control-attribution {{
            background-color: white;
            font-size: {attribution_fontsize};
        }}
    """.format(legend_fontsize=legend_fontsize, attribution_fontsize=attribution_fontsize))


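def save_map(m, filename, resolution=(1600, 1400), headless=True):
    import os  # needed to remove the temporary file below

    # Render the map to a temporary HTML file that the browser can open
    f = tempfile.NamedTemporaryFile(delete=False, suffix='.html')
    f.close()
    m.save(f.name)

    options = webdriver.ChromeOptions()
    options.add_argument('--window-size={1},{0}'.format(*resolution))
    if headless:
        options.add_argument('--headless')
    browser = webdriver.Chrome(options=options)
    browser.get("file://" + f.name)
    time.sleep(1)  # give the page time to render before taking the screenshot
    browser.save_screenshot(filename)
    browser.quit()
    # The original `f.delete` was a bare attribute access and removed nothing
    os.remove(f.name)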
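
The heatmap code in this file colours each region by mapping its pronunciation share onto a colormap, optionally rescaled between `min_percentage` and `max_percentage`. The sketch below illustrates that min-max scaling and the hex formatting in isolation. `scale_percentage` and `toy_cmap` are hypothetical names introduced for illustration, `toy_cmap` stands in for a matplotlib colormap, and the clamping to [0, 1] is an added assumption that the code above does not make explicitly.

```python
def scale_percentage(percentage, min_value=0.0, max_value=1.0):
    """Min-max scale a fraction into [0, 1], clamping values outside the range."""
    if max_value <= min_value:
        raise ValueError('max_value must exceed min_value')
    scaled = (percentage - min_value) / (max_value - min_value)
    return min(1.0, max(0.0, scaled))


def hex_color(rgb):
    """Format an (r, g, b) tuple of floats in [0, 1] as '#rrggbb'."""
    return '#{:02x}{:02x}{:02x}'.format(*(int(255 * c) for c in rgb))


def toy_cmap(value):
    """Hypothetical stand-in for a matplotlib colormap: white at 0, red at 1."""
    return (1.0, 1.0 - value, 1.0 - value)


# A pronunciation used by 30% of a region, on a scale running from 10% to 50%
scaled = scale_percentage(0.30, min_value=0.10, max_value=0.50)
color = hex_color(toy_cmap(scaled))
```

The parentheses around `percentage - min_value` matter: without them, division binds tighter than subtraction and the result is `0.30 - 0.10 / 0.40 = 0.05` rather than the intended midpoint of the scale.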

Some files were not shown because too many files have changed in this diff