### Comparing word properties across languages: A case study on ratings for arousal and valence
Annika Tjuka, Robert Forkel, and Mattis List
Words in the World conference (November 27, 2021)
---
# The challenge
----
Psychologists and linguists collect an increasing amount of data for a growing number of languages to describe various properties of words and concepts.
----
But no resource exists yet where one could compare different properties of words across languages.
----
Therefore, we created the Database of Cross-Linguistic Norms, Ratings, and Relations for Words and Concepts (NoRaRe) which combines data from psychology and linguistics.
<img src="https://pad.gwdg.de/uploads/d21977cc-5973-4cb6-8385-74a2fa068e98.png" alt="drawing" width="400"/>
---
# Data overview
<img src="https://pad.gwdg.de/uploads/8b3840a9-cdb4-44c6-9027-4a65fb764e88.jpeg" alt="drawing" width="600"/>
----
## Numbers
In its current version (v0.2), NoRaRe includes **65** unique word and concept properties derived from **98** different data sets across **40** languages.
----
## Norms
Data that are determined by taking samples from a total quantity. They are collected and applied predominantly in the field of psychology.
- word frequency
- lexical decision
----
## Ratings
Data that are based on participant judgments of a given word in a particular language, expressed either on a scale or by other measures.
- age-of-acquisition
- psychological states (e.g., valence, arousal)
- sensory modality (e.g., haptic, visual)
----
## Relations
Data that offer information on the relation between two words or concepts. They are collected in the field of comparative linguistics and Natural Language Processing (NLP).
- colexifications (e.g., CLICS)
- stability rankings
- associations (e.g., WordNet)
----
## Access
Web interface: https://digling.org/norare/
GitHub: https://github.com/concepticon/norare-data
---
# Application
----
## Case study
A comparison of ratings for arousal and valence of words on a 9-point scale across English, Dutch, and Spanish.
----
## Material
- English ([Scott et al. 2019](https://doi.org/10.3758/s13428-018-1099-3))
- Dutch ([Moors et al. 2013](https://doi.org/10.3758/s13428-012-0243-8))
- Spanish ([Stadthagen-González et al. 2017](https://doi.org/10.3758/s13428-015-0700-2))
----
## Results Arousal Ratings
----

----
## Results Valence Ratings
----

----
- The correlation strength varied for arousal:
  - highest Pearson coefficient in the Dutch-Spanish pair (_R_=0.63)
  - lowest Pearson coefficient in the English-Spanish pair (_R_=0.32)
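
A pairwise comparison of this kind can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the concept identifiers (`C1`, `C2`, ...), and the toy ratings are all made up, and the sketch assumes that each language's ratings are already keyed by a shared concept identifier.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(ratings):
    """Correlate every language pair on the concepts both languages share.

    `ratings` maps a language name to a dict {concept_id: mean rating}.
    Returns {(lang_a, lang_b): (coefficient, number_of_shared_concepts)}.
    """
    results = {}
    for lang_a, lang_b in combinations(sorted(ratings), 2):
        shared = sorted(set(ratings[lang_a]) & set(ratings[lang_b]))
        xs = [ratings[lang_a][c] for c in shared]
        ys = [ratings[lang_b][c] for c in shared]
        results[(lang_a, lang_b)] = (pearson(xs, ys), len(shared))
    return results


# Toy values for illustration only; these are NOT taken from the data sets.
toy = {
    "English": {"C1": 2.0, "C2": 4.0, "C3": 6.0, "C4": 8.0},
    "Dutch": {"C1": 1.0, "C2": 2.0, "C3": 3.0, "C4": 4.0},
    "Spanish": {"C1": 8.0, "C2": 6.0, "C3": 4.0, "C5": 5.0},
}

for pair, (coefficient, n_shared) in pairwise_correlations(toy).items():
    print(pair, round(coefficient, 2), n_shared)
```

Only concepts rated in both languages enter each comparison, so the number of shared items is reported alongside the coefficient.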
----
- For valence, the Pearson coefficient was above 0.8 in all language pairs.
----
- The findings for arousal differ from Jackson et al. (2019), who found that closely related languages have more similar emotion semantics.
---
# Discussion
----
The biggest challenge of our project was to transform a large number of different data sets so that they are comparable.
----
Especially for cross-linguistic studies, the NoRaRe database offers a useful starting point, because properties can be compared directly across languages.
----
The comparison of arousal and valence ratings across three languages showed that our approach provides important insights into the comparability of word properties.
----
Yet, there are also limitations to our approach.
----
- Although many studies collected data on the same properties, such as concreteness or imageability, most of them use different scales (e.g., 5-, 7-, or 9-point scales).
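
One common way to put ratings from different scales on a shared range is min-max rescaling to the interval from 0 to 1. The sketch below is an illustration of that general technique, not a procedure taken from NoRaRe: `rescale` is a hypothetical helper, and it assumes that every scale starts at 1 unless stated otherwise.

```python
def rescale(rating, scale_max, scale_min=1):
    """Map a rating from [scale_min, scale_max] onto [0, 1].

    Assumption: the scale is linear and its end points are known.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must be larger than scale_min")
    if not scale_min <= rating <= scale_max:
        raise ValueError("rating lies outside the scale")
    return (rating - scale_min) / (scale_max - scale_min)


# The midpoints of 5-, 7-, and 9-point scales all map to the same value.
midpoints = [rescale(3, 5), rescale(4, 7), rescale(5, 9)]
print(midpoints)
```

Because this is a linear transformation, it does not change Pearson coefficients; it matters when raw values or means are compared directly across data sets.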
----
- The number of items that can be compared is limited by the concepts provided in Concepticon (List et al. 2021).
----
However, this is only the beginning and we will extend the data in the future.
---
<img src="https://pad.gwdg.de/uploads/01b71c50-6d65-4656-828d-8964044704fa.jpeg" alt="drawing" width="75"/> <img src="https://pad.gwdg.de/uploads/695f4580-16f7-4cf3-8924-ce3f164a6d32.jpeg" alt="drawing" width="80"/> <img src="https://pad.gwdg.de/uploads/a8f45403-1e99-4da5-8c73-1881972245cc.jpeg" alt="drawing" width="80"/>
:::info
Tjuka, Annika, Robert Forkel & Johann-Mattis List. 2021. [Linking norms, ratings, and relations of words and concepts across multiple language varieties](https://doi.org/10.3758/s13428-021-01650-1). _Behavior Research Methods_.
:::
---
# Tutorials
----
:::info
Tjuka, Annika. "Adding data sets to NoRaRe: A guide for beginners," [Blog post] in _Computer-Assisted Language Comparison in Practice_, 11/08/2021, https://calc.hypotheses.org/2890.
:::
----
:::info
Tjuka, Annika. "Comparing NoRaRe data sets: Calculation of correlations and creation of plots in R," [Blog post] in _Computer-Assisted Language Comparison in Practice_, 01/11/2021, https://calc.hypotheses.org/3109.
:::
---
# Thank you
Contact:
[@AnnikaTjuka](https://twitter.com/AnnikaTjuka)
tjuka@shh.mpg.de