Geographical pronunciation tables, simple example
A simple example that creates Gabmap files for two words, each with only a few pronunciations, in two regions.
In [1]:
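import copy
import json
import os
import sys

# Let modules from the parent directory (gabmap, stimmen) be imported
sys.path.append('..')

import MySQLdb
import pandas
from shapely.geometry import shape, Point
from gabmap import create_gabmap_dataframes
from stimmen.geojson import merge_features

# Read the database password from the environment instead of hard-coding it
# (STIMMEN_DB_PASSWORD is a hypothetical variable name)
db = MySQLdb.connect(user='root', passwd=os.environ['STIMMEN_DB_PASSWORD'],
                     db='stimmen', charset='utf8')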
In [2]:
# Neighbourhood (wijk) boundaries of Friesland as a GeoJSON FeatureCollection
with open('../data/Friesland_wijken.geojson') as f:
    regions = json.load(f)
Load and simplify
In [3]:
# Participants' answers on how each word should be pronounced, with their coordinates
answers = pandas.read_sql('''
    SELECT prediction_quiz_id, user_lat, user_lng, question_text, answer_text
    FROM core_surveyresult AS survey
    INNER JOIN core_predictionquizresult AS result
        ON survey.id = result.survey_result_id
    INNER JOIN core_predictionquizresultquestionanswer AS answer
        ON result.id = answer.prediction_quiz_id
''', db)
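The query chains two inner joins. Each survey result is linked to its prediction-quiz result, and each quiz result is linked to its per-question answers, so every output row is one answer together with the participant's coordinates. The sketch below runs the same join on a toy in-memory SQLite database using only the standard library. The reduced schema, the assumption that `user_lat`/`user_lng` live in `core_surveyresult`, and all rows (including the answer strings) are made up for illustration.

```python
import sqlite3

# Toy in-memory tables, reduced to the columns the query uses. The schema is an
# assumption (including that user_lat/user_lng live in core_surveyresult).
conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE core_surveyresult (id INTEGER PRIMARY KEY, user_lat REAL, user_lng REAL);
    CREATE TABLE core_predictionquizresult (id INTEGER PRIMARY KEY, survey_result_id INTEGER);
    CREATE TABLE core_predictionquizresultquestionanswer (
        prediction_quiz_id INTEGER, question_text TEXT, answer_text TEXT);

    INSERT INTO core_surveyresult VALUES (1, 53.17, 5.80), (2, 53.00, 5.98), (3, 52.90, 5.50);
    INSERT INTO core_predictionquizresult VALUES (10, 1), (20, 2);  -- survey 3 has no quiz
    INSERT INTO core_predictionquizresultquestionanswer VALUES
        (10, '"vis"', 'A (fisk)'),
        (20, '"vis"', 'B (fɪs)');
''')

# Same join as the notebook: survey -> quiz result -> per-question answers.
# INNER JOIN drops surveys without a quiz result (survey 3 here).
rows = conn.execute('''
    SELECT prediction_quiz_id, user_lat, user_lng, question_text, answer_text
    FROM core_surveyresult AS survey
    INNER JOIN core_predictionquizresult AS result
        ON survey.id = result.survey_result_id
    INNER JOIN core_predictionquizresultquestionanswer AS answer
        ON result.id = answer.prediction_quiz_id
    ORDER BY prediction_quiz_id
''').fetchall()

for row in rows:
    print(row)
```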
In [4]:
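# Merge all Heerenveen features (by municipality name, GM_NAAM) into one;
# the deepcopy leaves the original regions untouched
regions_simple = merge_features(
    copy.deepcopy(regions),
    condition=lambda feature: feature['properties']['GM_NAAM'] == 'Heerenveen',
)
# Same for Leeuwarden
regions_simple = merge_features(
    regions_simple,
    condition=lambda feature: feature['properties']['GM_NAAM'] == 'Leeuwarden',
)
# Keep only the two merged features, which this code expects to be the last two
regions_simple['features'] = regions_simple['features'][-2:]
regions_simple['features'][0]['properties']['name'] = 'Heerenveen'
regions_simple['features'][1]['properties']['name'] = 'Leeuwarden'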
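`merge_features` comes from the project's own `stimmen.geojson` module, and its implementation is not shown here. The cell relies on the merged feature being appended at the end of the feature list, because it then keeps `[-2:]` and names those features Heerenveen and Leeuwarden in that order. Below is a hypothetical stand-in, `merge_matching_features`, that behaves this way. It collects the matching Polygon/MultiPolygon features into one MultiPolygon without dissolving shared borders. It is a sketch under those assumptions, not the module's actual code.

```python
import copy


def merge_matching_features(collection, condition):
    """Hypothetical stand-in for stimmen.geojson.merge_features.

    Removes every feature for which condition(feature) is true and appends one
    MultiPolygon feature containing all of their polygons. The polygons are
    concatenated, not dissolved, so internal borders are kept.
    """
    matching = [f for f in collection['features'] if condition(f)]
    if not matching:
        return collection
    polygons = []
    for feature in matching:
        geometry = feature['geometry']
        if geometry['type'] == 'Polygon':
            polygons.append(geometry['coordinates'])
        else:  # assume MultiPolygon
            polygons.extend(geometry['coordinates'])
    merged = {
        'type': 'Feature',
        # Keep the first match's properties (so GM_NAAM survives)
        'properties': dict(matching[0]['properties']),
        'geometry': {'type': 'MultiPolygon', 'coordinates': polygons},
    }
    result = dict(collection)
    result['features'] = [f for f in collection['features'] if not condition(f)] + [merged]
    return result


def square(x, y):
    # Closed unit square as a GeoJSON Polygon
    return {'type': 'Polygon',
            'coordinates': [[[x, y], [x + 1, y], [x + 1, y + 1], [x, y + 1], [x, y]]]}


regions = {'type': 'FeatureCollection', 'features': [
    {'type': 'Feature', 'properties': {'GM_NAAM': 'Heerenveen'}, 'geometry': square(0, 0)},
    {'type': 'Feature', 'properties': {'GM_NAAM': 'Leeuwarden'}, 'geometry': square(5, 5)},
    {'type': 'Feature', 'properties': {'GM_NAAM': 'Heerenveen'}, 'geometry': square(1, 0)},
]}

merged = merge_matching_features(
    copy.deepcopy(regions), lambda f: f['properties']['GM_NAAM'] == 'Heerenveen')
merged = merge_matching_features(
    merged, lambda f: f['properties']['GM_NAAM'] == 'Leeuwarden')

names = [f['properties']['GM_NAAM'] for f in merged['features']]
print(names)  # ['Heerenveen', 'Leeuwarden']
```

Because each merge moves the matched features to the end, the second call pushes Leeuwarden behind Heerenveen. This produces the same order the notebook assumes when it names the last two features.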
In [5]:
# Keep only the answers for "blad" (aan een boom) and "vis"
answers_simple = answers[
    answers['question_text'].isin(['"blad" (aan een boom)', '"vis"'])
].copy()
# Strip quotes and asterisks from the word labels
answers_simple['question_text'] = answers_simple['question_text'].map(
    lambda x: x.replace('"', '').replace('*', ''))
# Keep only the transcription between the first pair of parentheses
answers_simple['answer_text'] = answers_simple['answer_text'].map(
    lambda x: x[x.find('('):x.find(')')][1:])
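The two `map` calls clean the labels. They drop quotes and asterisks from the word, and they keep only the transcription between the first pair of parentheses from the answer. Here is a sketch of the same clean-up as named functions, assuming answers look like `spelling (transcription)`. The function names and example strings are made up. The regular expression agrees with the notebook's slicing for answers containing one `(...)` group.

```python
import re


def clean_question(text):
    # Same clean-up as the notebook: drop double quotes and asterisks
    return text.replace('"', '').replace('*', '')


def pronunciation_in_parentheses(answer):
    # Text inside the first "(...)" group, or '' if there is none
    match = re.search(r'\(([^)]*)\)', answer)
    return match.group(1) if match else ''


print(clean_question('"blad" (aan een boom)'))        # blad (aan een boom)
print(pronunciation_in_parentheses('blêd (blɛ:t)'))  # blɛ:t
```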
Two words, blad (aan een boom) and vis, with 4 and 2 distinct pronunciations respectively:
In [6]:
answers_simple.groupby('question_text').agg({'answer_text': lambda x: len(set(x))})
Out[6]:
question_text | answer_text |
---|---|
blad (aan een boom) | 4 |
vis | 2 |
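The `agg` call with `len(set(x))` counts the distinct pronunciations per word. pandas also provides `SeriesGroupBy.nunique` for this. The small comparison below uses made-up data. One difference is that `nunique` skips missing values by default, while a `set` would keep them.

```python
import pandas as pd

# Made-up answers, one row per participant
toy = pd.DataFrame({
    'question_text': ['vis', 'vis', 'vis', 'blad (aan een boom)', 'blad (aan een boom)'],
    'answer_text': ['fisk', 'fɪs', 'fisk', 'blet', 'blɑt'],
})

# Distinct pronunciations per word, as in the notebook cell ...
via_set = toy.groupby('question_text').agg({'answer_text': lambda x: len(set(x))})
# ... and the built-in equivalent
via_nunique = toy.groupby('question_text')['answer_text'].nunique()

print(via_nunique.to_dict())  # {'blad (aan een boom)': 2, 'vis': 2}
```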
In [7]:
# Build the centroid, pronunciation and percentage tables from the regions and the located answers
centroids_example, pronunciations_example, counts_example = create_gabmap_dataframes(
    regions_simple,
    answers_simple,
    latitude_column='user_lat',
    longitude_column='user_lng',
    word_column='question_text',
    pronunciation_column='answer_text',
    region_name_property='name',
)
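`create_gabmap_dataframes` has to decide which region each answer belongs to, based on `user_lat` and `user_lng`. The notebook imports shapely's `shape` and `Point`, which suggests a test like `shape(feature['geometry']).contains(Point(lng, lat))`, though that is an assumption. The dependency-free sketch below implements the same idea with even-odd ray casting. `region_of` and the two rectangular toy regions are hypothetical. GeoJSON stores coordinates as `[longitude, latitude]`, so longitude plays the role of x.

```python
def point_in_ring(lng, lat, ring):
    """Even-odd ray casting: is (lng, lat) inside this closed ring of [lng, lat] pairs?"""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Only edges that straddle the horizontal line through the point can cross the ray
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


def point_in_geometry(lng, lat, geometry):
    # A GeoJSON Polygon is a list of rings (outer first, then holes);
    # a MultiPolygon is a list of such polygons
    if geometry['type'] == 'Polygon':
        polygons = [geometry['coordinates']]
    else:
        polygons = geometry['coordinates']
    for outer, *holes in polygons:
        if point_in_ring(lng, lat, outer) and not any(
                point_in_ring(lng, lat, hole) for hole in holes):
            return True
    return False


def region_of(lat, lng, features, name_property='name'):
    # Name of the first region containing the point, or None if none does
    for feature in features:
        if point_in_geometry(lng, lat, feature['geometry']):
            return feature['properties'][name_property]
    return None


def box(lng_min, lat_min, lng_max, lat_max, name):
    # Hypothetical rectangular region as a GeoJSON Feature
    ring = [[lng_min, lat_min], [lng_max, lat_min], [lng_max, lat_max],
            [lng_min, lat_max], [lng_min, lat_min]]
    return {'type': 'Feature', 'properties': {'name': name},
            'geometry': {'type': 'Polygon', 'coordinates': [ring]}}


features = [box(5.9, 52.9, 6.1, 53.1, 'Heerenveen'), box(5.7, 53.1, 5.9, 53.3, 'Leeuwarden')]
print(region_of(53.0, 6.0, features))  # Heerenveen
print(region_of(53.2, 5.8, features))  # Leeuwarden
```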
Resulting tables
The resulting tables are stored as tab-separated files for Gabmap.
In [8]:
centroids_example
Out[8]:
#name | latitude | longitude |
---|---|---|
Heerenveen | 52.996076 | 5.977925 |
Leeuwarden | 53.169940 | 5.797613 |
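Each region is represented by a single centroid. How `create_gabmap_dataframes` computes it is not shown. A common choice is the area-weighted polygon centroid from the shoelace formula, sketched below. The sketch treats longitude and latitude as planar coordinates, which is a reasonable approximation for municipality-sized areas. It ignores holes for brevity, and the helper names are hypothetical.

```python
def ring_centroid(ring):
    """Centroid (x, y) and signed area of a closed ring of [x, y] pairs (shoelace formula)."""
    area2 = cx = cy = 0.0  # area2 is twice the signed area
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return cx / (3 * area2), cy / (3 * area2), area2 / 2


def geometry_centroid(geometry):
    """Area-weighted centroid (lng, lat) of a Polygon or MultiPolygon; holes are ignored."""
    if geometry['type'] == 'Polygon':
        polygons = [geometry['coordinates']]
    else:
        polygons = geometry['coordinates']
    total = sx = sy = 0.0
    for rings in polygons:
        x, y, area = ring_centroid(rings[0])  # outer ring only
        weight = abs(area)  # rings may be clockwise or counter-clockwise
        total += weight
        sx += x * weight
        sy += y * weight
    return sx / total, sy / total


square_a = [[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]
square_b = [[4, 0], [6, 0], [6, 2], [4, 2], [4, 0]]
print(geometry_centroid({'type': 'Polygon', 'coordinates': [square_a]}))  # (1.0, 1.0)
print(geometry_centroid({'type': 'MultiPolygon', 'coordinates': [[square_a], [square_b]]}))  # (3.0, 1.0)
```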
In [9]:
pronunciations_example
Out[9]:
region | blad (aan een boom) | vis |
---|---|---|
Heerenveen | blet / blɑt / blɔd / blɛ:t | fisk / fɪs |
Leeuwarden | blet / blɑt / blɔd / blɛ:t | fisk / fɪs |
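In the pronunciation table, each cell lists the pronunciations of one word, separated by ` / `. The order shown (blet / blɑt / blɔd / blɛ:t and fisk / fɪs) is consistent with plain Unicode code-point sorting. The sketch below builds such cells per region from a hypothetical list of `(region, word, pronunciation)` tuples. Whether the gabmap module collects pronunciations per region or per word overall cannot be seen here, because both regions show the same list.

```python
from collections import defaultdict


def pronunciation_table(answers):
    """answers: iterable of (region, word, pronunciation) tuples (hypothetical format).

    Returns {region: {word: 'p1 / p2 / ...'}} with the distinct pronunciations
    sorted by Unicode code point.
    """
    seen = defaultdict(set)
    for region, word, pronunciation in answers:
        seen[(region, word)].add(pronunciation)
    table = defaultdict(dict)
    for (region, word), pronunciations in seen.items():
        table[region][word] = ' / '.join(sorted(pronunciations))
    return dict(table)


toy = [
    ('Heerenveen', 'vis', 'fɪs'), ('Heerenveen', 'vis', 'fisk'),
    ('Heerenveen', 'blad (aan een boom)', 'blɛ:t'), ('Heerenveen', 'blad (aan een boom)', 'blet'),
    ('Heerenveen', 'blad (aan een boom)', 'blɔd'), ('Heerenveen', 'blad (aan een boom)', 'blɑt'),
]
print(pronunciation_table(toy)['Heerenveen']['blad (aan een boom)'])  # blet / blɑt / blɔd / blɛ:t
```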
In [10]:
counts_example
Out[10]:
region | blad (aan een boom): blet | blad (aan een boom): blɑt | blad (aan een boom): blɔd | blad (aan een boom): blɛ:t | vis: fisk | vis: fɪs |
---|---|---|---|---|---|---|
Heerenveen | 31.654676 | 2.158273 | 2.158273 | 64.028777 | 52.517986 | 47.482014 |
Leeuwarden | 7.865169 | 7.022472 | 8.707865 | 76.404494 | 75.000000 | 25.000000 |
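The values in this table are percentages rather than raw counts. Within a region, the values for one word sum to 100 (for Heerenveen, vis: 52.517986 + 47.482014). The sketch below shows one way to compute such a table, with columns named `word: pronunciation` as above. The input format and the toy data are assumptions. Pronunciations never given in a region simply get no entry here.

```python
from collections import Counter, defaultdict


def percentage_table(answers):
    """answers: iterable of (region, word, pronunciation) tuples (hypothetical format).

    Returns {region: {'word: pronunciation': percent}}; within a region the
    percentages for one word sum to 100.
    """
    counts = defaultdict(Counter)  # (region, word) -> how often each pronunciation occurs
    for region, word, pronunciation in answers:
        counts[(region, word)][pronunciation] += 1
    table = defaultdict(dict)
    for (region, word), counter in counts.items():
        total = sum(counter.values())
        for pronunciation, n in counter.items():
            table[region][f'{word}: {pronunciation}'] = 100 * n / total
    return dict(table)


toy = ([('Heerenveen', 'vis', 'fisk')] * 3 + [('Heerenveen', 'vis', 'fɪs')]
       + [('Leeuwarden', 'vis', 'fisk')] * 2)
print(percentage_table(toy))
# {'Heerenveen': {'vis: fisk': 75.0, 'vis: fɪs': 25.0}, 'Leeuwarden': {'vis: fisk': 100.0}}
```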
In [12]:
pronunciations_example.to_csv('../data/Pronunciations_example.gabmap.tsv', sep='\t')
counts_example.to_csv('../data/Pronunciation_percentages_example.gabmap.tsv', sep='\t')
# Centroids are written with longitude before latitude
centroids_example.to_csv('../data/Centroids_example.gabmap.tsv', sep='\t',
                         columns=['longitude', 'latitude'])
# Save the two merged regions as GeoJSON
with open('../data/Gabmap_example.geojson', 'w') as f:
    json.dump(regions_simple, f, indent=1)
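Passing `columns=['longitude', 'latitude']` to `to_csv` reorders the columns in the centroid file, presumably because Gabmap expects longitude first. The `#name` index name becomes the first header cell. When `to_csv` is called without a path, it returns the text instead of writing a file, which makes the layout easy to inspect. This sketch rebuilds the centroid table from the values shown above:

```python
import pandas as pd

# Centroid table with the values shown above and the '#name' index name
centroids = pd.DataFrame(
    {'latitude': [52.996076, 53.169940], 'longitude': [5.977925, 5.797613]},
    index=pd.Index(['Heerenveen', 'Leeuwarden'], name='#name'),
)

# Without a path, to_csv returns the text instead of writing a file
tsv = centroids.to_csv(sep='\t', columns=['longitude', 'latitude'])
print(tsv.splitlines()[0])  # header: #name, longitude, latitude (tab-separated)
```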