Research on Modern Latvian Language and Development of Language Technology (LATE)

Year 2022 Dec–2024 Dec
Funding State Research Programme
Letonika – Fostering a Latvian and European Society
VPP-LETONIKA-2021/1-0006
Partners Latvian Language Institute UL, Liepaja University, Faculty of Humanities UL, Institute of Literature, Folklore and Art UL
Abstract The aim of the project is to advance research on the grammatical, lexical-semantic, phonetic and phonological systems of modern Latvian and of Latvian Sign Language using data-driven methods, and to develop sustainable Latvian language resources and tools. To achieve this, a Latvian speech corpus and a pilot corpus of Latvian Sign Language will be developed, and Tezaurs.lv and the “Dictionary of Contemporary Latvian” will be improved. Based on Latvian grammar studies, the “Latvian Treebank” will be enhanced. These resources will be integrated into a single Latvian language research infrastructure, as well as into the CLARIN-LV repository. During the project, the LATE platform for speech transcription and subtitling will be created.
Homepage http://www.digitalhumanities.lv/projects/vpp-late/

Publications

L. Rituma, G. Nespore-Berzkalne, B. Saulite, L. Pretkalnina
Vārdkopas analogi „Latviešu valodas sintaktiski marķētajā korpusā” (Analogue of subordinate phrase in Latvian Treebank)
Valoda: nozīme un forma, 156-173, 2023
PDF, DOI, BibTeX
L. Lauze and I. Auzina
Korpusu un individuālā vākuma salīdzinājums: ģenitīva un nominatīva konkurence saistījumā ar adverbu (A comparison of corpora and individual collection: Genitive and nominative competition in connection with an adverb)
Valoda: nozīme un forma, 12, 111-125, 2023
PDF, DOI, BibTeX
L. Pretkalnina
Formāls latviešu valodas gramatikas modelis un tā realizācija mašīnlasāmā sintakses korpusā (A formal model of Latvian grammar and its implementation in a machine-readable syntax corpus)
2023
PDF, BibTeX
M. Grasmanis, P. Paikens, L. Pretkalnina, L. Rituma, L. Strankale, A. Znotins, N. Gruzitis
Tēzaurs.lv – the experience of building a multifunctional lexical resource
Electronic lexicography in the 21st century (eLex 2023): Invisible Lexicography. Proceedings of the eLex 2023 conference, Lexical Computing CZ s.r.o., 2023
PDF, BibTeX
I. Skadina, I. Auzina, R. Dargis, E. Lasmanis, A. Voitkans
CLARIN-LV: Many Steps till Operation
CLARIN Annual Conference Proceedings, 2022
PDF, BibTeX
B. Saulite, R. Dargis, N. Gruzitis, I. Auzina, K. Levane-Petrova, L. Pretkalnina, L. Rituma, P. Paikens, A. Znotins, L. Strankale et al.
Latvian National Corpora Collection – Korpuss.lv
Proceedings of the 13th Language Resources and Evaluation Conference (LREC), 2022
PDF, BibTeX
R. Dargis, I. Auzina, I. Kaija, K. Levane-Petrova, K. Pokratniece
LaVA – Latvian Language Learner corpus
Proceedings of the 13th Language Resources and Evaluation Conference (LREC), 2022
PDF, BibTeX
P. Paikens, M. Grasmanis, A. Klints, I. Lokmane, L. Pretkalnina, L. Rituma, M. Stade, L. Strankale
Towards Latvian WordNet
Proceedings of the 13th Language Resources and Evaluation Conference (LREC), 2022
PDF, BibTeX