Title:                      Corpus-Based Approaches to the Balkan Languages and Dialects


Date:                     December 5–7, 2016

Location:              Saint Petersburg, Russia


Host Institution:     Institute for Linguistic Studies of the Russian Academy of Sciences

                             199004, Tuchkov pereulok 9, Saint Petersburg, Russia

Organizers:           Alexander Yu. Rusakov, Maria S. Morozova, Maxim L. Kisilier



Linguistic Fields:   Balkan Linguistics, Corpus Linguistics, Phonetics, Morphosyntax, Lexicon


Meeting Description:

The first international conference on corpus-based approaches to the Balkan languages and dialects is intended as a gathering of scholars who conduct corpus-based studies of linguistic phenomena in one or several of the Balkan languages, and/or who are involved in building annotated linguistic corpora, linguistic archives and interactive databases for the Balkan languages and dialects.


Important Dates:

Submission of Abstracts:   April 15, 2016

Notification to the Authors:    May 15, 2016


Invited Speakers:

Ruprecht von Waldenfels

(Department of Slavic Languages and Literatures, University of California, Berkeley)


In recent decades, digital corpora of the Balkan languages have been developed by scholarly institutions and universities in the Balkans and worldwide. Balkan linguists can now draw on a range of projects, such as the Bulgarian National Corpus (developed by the Bulgarian Academy of Sciences), the BCS Gralis Corpus (developed at the University of Graz, Austria), the Croatian National Corpus (developed at the University of Zagreb, Croatia), and the Albanian National Corpus and the Corpus of Modern Greek (both developed by the Russian Academy of Sciences). These corpora are a valuable tool for many kinds of linguistic research for which unstructured electronic text collections do not suffice or are not available.


One of the aims of the conference is to share experience in developing electronic corpora and interactive databases of the Balkan languages and dialects (corpora of written language, parallel corpora of the Balkan languages, corpora of spoken language, dialect corpora, etc.). Participants are invited to present completed or ongoing projects, to report on the theoretical and practical challenges they have encountered or expect to encounter when developing a corpus of one or more Balkan languages, and to discuss possible solutions to these problems. Potential domains of inquiry include:

- corpus structure (subcorpora, types of texts), selection of texts and their presentation in the corpus (transcription, translations, use of standard orthographies);

- development of linguistic (morphological, syntactic and semantic) annotation standards and metadata descriptions.

Another major aim is to present case studies facilitated by existing corpora and interactive databases of the Balkan languages and dialects. Participants are expected to share the results of their own corpus-based research in various linguistic domains. The investigations may include:

- phonetic, morphosyntactic and lexical research using available corpora of the Balkan languages and dialects;

- diachronic and synchronic studies using language corpora;

- investigations of written language, as well as corpus-based studies of spoken language covering various aspects of spontaneous speech analysis and language acquisition.


The languages of the conference are English, French, German, and Russian.

Speaking time is 20 minutes, plus 10 minutes for discussion.


Abstracts of 400–500 words in MS Word (.doc/.docx/.rtf) format should be submitted by email to the organizers by April 15, 2016.

Make sure that the abstract you submit contains the exact title of your paper and the following information about the author(s): (1) the first name and surname of every author, (2) the affiliation of every author, (3) the email address of every author.

Abstracts will be evaluated and selected by the Programme Committee. We will notify you of the Committee’s decision by May 15, 2016.


There is no participation fee.

The organisers will not be able to cover participants’ travel and hotel expenses.



References:

Dobrić, Nikola. 2012. Language Corpora in the West Balkans – History, Current State and Future Perspective. Slavistična revija 60/4, pp. 677–692.

Goutsos, Dionysis. 2010. The Corpus of Greek Texts: A reference corpus for Modern Greek. Corpora 5/1, pp. 29–44.

Morozova, Maria S., Alexander Yu. Rusakov. 2015. Albanian National Corpus: Composition, Text Processing and Corpus-Oriented Grammar Development. In Bardhyl Demiraj (ed.), Sprache und Kultur der Albaner. Zeitliche und räumliche Dimensionen. Akten der 5. Deutsch-albanischen kulturwissenschaftlichen Tagung (5.–8. Juni 2014, Buçimas bei Pogradec, Albanien), Albanische Forschungen 37, Wiesbaden, Harrassowitz, pp. 270–308.

Tadić, Marko. 2009. New Version of the Croatian National Corpus. In Hlaváčková, Dana, Aleš Horák, Klara Osolsobě, Pavel Rychlý (eds.), After Half a Century of Slavonic Natural Language Processing, Masaryk University, Brno, pp. 199–205.

Albanian National Corpus. (accessed on January 29, 2016)

Bulgarian National Corpus. (accessed on January 29, 2016)

Corpus of Modern Greek. (accessed on January 29, 2016)

Croatian National Corpus. (accessed on January 29, 2016)

Gralis Text-Corpus. (accessed on January 29, 2016)