stimmenfryslan/notebooks/Gabmap Pronunciation Tables...

Geographical pronunciation tables, simple example

Simple example that creates Gabmap files for two words with few pronunciations and two regions.

In [1]:
import sys
sys.path.append('..')

import pandas
import MySQLdb
import json
import copy

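import os

# Read the password from the environment instead of hardcoding it
# (STIMMEN_DB_PASSWORD is an assumed variable name).
db = MySQLdb.connect(user='root', passwd=os.environ['STIMMEN_DB_PASSWORD'], db='stimmen', charset='utf8')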

from shapely.geometry import shape, Point

from gabmap import create_gabmap_dataframes

from stimmen.geojson import merge_features
In [2]:
with open('../data/Friesland_wijken.geojson') as f:
    regions = json.load(f)

Load and simplify

In [3]:
# Answers to how participants state a word should be pronounced

answers = pandas.read_sql('''
SELECT prediction_quiz_id, user_lat, user_lng, question_text, answer_text
FROM       core_surveyresult as survey
INNER JOIN core_predictionquizresult as result ON survey.id = result.survey_result_id
INNER JOIN core_predictionquizresultquestionanswer as answer
    ON result.id = answer.prediction_quiz_id
''', db)
In [4]:
regions_simple = merge_features(copy.deepcopy(regions),
    condition=lambda feature: feature['properties']['GM_NAAM'] == 'Heerenveen',
)

regions_simple = merge_features(
    regions_simple,
    condition=lambda feature: feature['properties']['GM_NAAM'] == 'Leeuwarden',
)
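# Keep only the last two features, which are expected to be the merged
# Heerenveen and Leeuwarden regions (in that order).
regions_simple['features'] = regions_simple['features'][-2:]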

regions_simple['features'][0]['properties']['name'] = 'Heerenveen'
regions_simple['features'][1]['properties']['name'] = 'Leeuwarden'
In [5]:
answers_simple = answers[
    (answers['question_text'] == '"blad" (aan een boom)') |
    (answers['question_text'] == '"vis"')
].copy()

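# Strip quotes and asterisks from the question text
answers_simple['question_text'] = answers_simple['question_text'].map(
    lambda x: x.replace('"', '').replace('*', ''))

# Keep only the text between the first '(' and the first ')' of each answer
answers_simple['answer_text'] = answers_simple['answer_text'].map(
    lambda x: x[x.find('('):x.find(')')][1:])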
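
The slicing in the last `map` call is compact, so here is a small sketch of what it does on its own. The sample strings are hypothetical (the notebook does not show raw `answer_text` values) and the helper name `extract_parenthesized` is introduced only for illustration.

```python
def extract_parenthesized(text):
    """Return the text between the first '(' and the first ')'."""
    start = text.find('(')
    end = text.find(')')
    # Same slicing as the notebook's lambda: x[x.find('('):x.find(')')][1:]
    return text[start:end][1:]


# Hypothetical answer strings; the real survey values may look different.
print(extract_parenthesized('blêd (blɛ:t)'))
print(extract_parenthesized('fisk (fisk)'))
```

Note that a string without any parentheses comes back as an empty string, because `str.find` returns -1 for both lookups.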

Two words, blad and vis, with 4 and 2 pronunciations respectively

In [6]:
answers_simple.groupby('question_text').agg({'answer_text': lambda x: len(set(x))})
Out[6]:
                     answer_text
question_text
blad (aan een boom)            4
vis                            2
In [7]:
centroids_example, pronunciations_example, counts_example = create_gabmap_dataframes(
    regions_simple, answers_simple,
    latitude_column='user_lat', longitude_column='user_lng',
    word_column='question_text', pronunciation_column='answer_text',
    region_name_property='name'
)
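
The internals of `create_gabmap_dataframes` are not shown here. Since it receives regions plus latitude and longitude columns, it presumably has to decide which region each answer falls in; the `shape` and `Point` imports suggest shapely would normally handle that (for example `shape(geometry).contains(Point(lng, lat))`). The sketch below is not that implementation: it swaps in a dependency-free ray-casting test, handles only simple `Polygon` features without holes, and uses made-up square regions. The names `point_in_ring`, `region_of` and `toy_regions` are hypothetical.

```python
def point_in_ring(lng, lat, ring):
    """Ray-casting test: is (lng, lat) inside the closed ring of [lng, lat] pairs?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]
        xj, yj = ring[j][0], ring[j][1]
        # Count the edge only if it straddles the horizontal line through
        # the point and crosses that line to the right of the point.
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lng < x_cross:
                inside = not inside
        j = i
    return inside


def region_of(lng, lat, regions):
    """Return the 'name' property of the first Polygon feature containing the point."""
    for feature in regions['features']:
        geometry = feature['geometry']
        if geometry['type'] != 'Polygon':
            continue  # MultiPolygon features and holes are ignored in this sketch
        exterior = geometry['coordinates'][0]
        if point_in_ring(lng, lat, exterior):
            return feature['properties']['name']
    return None


# Two made-up unit squares side by side, not the Friesland regions
toy_regions = {'type': 'FeatureCollection', 'features': [
    {'type': 'Feature', 'properties': {'name': 'A'},
     'geometry': {'type': 'Polygon',
                  'coordinates': [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}},
    {'type': 'Feature', 'properties': {'name': 'B'},
     'geometry': {'type': 'Polygon',
                  'coordinates': [[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]]}},
]}

print(region_of(0.5, 0.5, toy_regions))
print(region_of(1.5, 0.5, toy_regions))
print(region_of(3.0, 0.5, toy_regions))
```

GeoJSON stores coordinates as `[longitude, latitude]`, which is why the longitude comes first throughout.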

Resulting tables

Stored as tab-separated files for Gabmap.

In [8]:
centroids_example
Out[8]:
             latitude  longitude
#name
Heerenveen  52.996076   5.977925
Leeuwarden  53.169940   5.797613
In [9]:
pronunciations_example
Out[9]:
            blad (aan een boom)          vis
Heerenveen  blet / blɑt / blɔd / blɛ:t   fisk / fɪs
Leeuwarden  blet / blɑt / blɔd / blɛ:t   fisk / fɪs
In [10]:
counts_example
Out[10]:
            blad (aan een boom): blet  blad (aan een boom): blɑt  blad (aan een boom): blɔd  blad (aan een boom): blɛ:t  vis: fisk   vis: fɪs
Heerenveen                  31.654676                   2.158273                   2.158273                   64.028777  52.517986  47.482014
Leeuwarden                   7.865169                   7.022472                   8.707865                   76.404494  75.000000  25.000000
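
Within each region the values for one word add up to 100 (for example 52.517986 + 47.482014 for vis in Heerenveen), so the table appears to hold percentages per region and word rather than raw counts. The sketch below shows one way such percentages could be computed from individual answers. It is an illustration under that assumption, not the code of `create_gabmap_dataframes`; the function name and the sample rows are made up.

```python
from collections import Counter, defaultdict


def pronunciation_percentages(rows):
    """rows: iterable of (region, word, pronunciation) tuples.

    Returns {region: {"word: pronunciation": percentage}}, where the
    percentages for one word within one region add up to 100.
    """
    counts = defaultdict(Counter)  # (region, word) -> Counter of pronunciations
    for region, word, pronunciation in rows:
        counts[(region, word)][pronunciation] += 1

    table = defaultdict(dict)
    for (region, word), counter in counts.items():
        total = sum(counter.values())
        for pronunciation, n in counter.items():
            table[region][f'{word}: {pronunciation}'] = 100.0 * n / total
    return dict(table)


# Made-up answers, not the survey data
rows = [
    ('Heerenveen', 'vis', 'fisk'),
    ('Heerenveen', 'vis', 'fisk'),
    ('Heerenveen', 'vis', 'fisk'),
    ('Heerenveen', 'vis', 'fɪs'),
    ('Leeuwarden', 'vis', 'fɪs'),
]
table = pronunciation_percentages(rows)
print(table['Heerenveen'])
print(table['Leeuwarden'])
```

Normalising per region makes regions with very different numbers of respondents comparable, which is presumably why percentages are exported instead of counts.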
In [12]:
pronunciations_example.to_csv('../data/Pronunciations_example.gabmap.tsv', sep='\t')
counts_example.to_csv('../data/Pronunciation_percentages_example.gabmap.tsv', sep='\t')
centroids_example.to_csv('../data/Centroids_example.gabmap.tsv', sep='\t', columns=['longitude', 'latitude'])
with open('../data/Gabmap_example.geojson', 'w') as f:
    json.dump(regions_simple, f, indent=1)