(May 7th 2018, Miyazaki; part of the LREC-2018 workshop structure)

Special Topic

Interoperability of corpus query and analysis systems


Large corpora require careful design, licensing, collecting, cleaning, encoding, annotation, management, storage, retrieval, analysis, and curation to unfold their potential for a wide range of research questions and users, across a number of disciplines. Apart from the usual CMLC topics that fall into these areas, CMLC-6 will have a special focus on corpus query and analysis systems and specifically on goals concerning their interoperability.

In the past 5 years, a whole new generation of corpus query engines that overcome limitations on the number of tokens and annotation layers has started to emerge in several places. While there seems to be a consensus that no single corpus tool can fulfill the needs of all communities and that a degree of heterogeneity is required, the time seems ripe to discuss whether (further, unrestricted) divergence should be avoided in order to allow for some interoperability and reusability, and how this can be achieved. The two most prominent areas where interoperability seems highly desirable are query languages and software components for corpus analysis. The former issue is already partially addressed by the proposed ISO standard Corpus Query Lingua Franca (CQLF). Components for corpus analysis, on the other hand, should in an ideal world be exchangeable and reusable across different platforms, not only to avoid redundancies, but also to foster replicability and a canonization of methodology in NLP and corpus linguistics.

The 6th edition of the workshop will devote much of its time to these issues, including an expert panel discussion with representatives of tool development teams and power users.


09.00 – 10.30     Session 1: Management and Search

10.30 – 11.00     Coffee Break

11.00 – 12.00     Session 2: Query and Interoperability

12.00 – 13.00     Panel Discussion: Interoperability and Extensibility of Analysis Components in Corpus Query Tools

Programme Committee

Organizing Committee

Institut für Deutsche Sprache, Mannheim

Piotr Bański, Marc Kupietz, Andreas Witt

Academiae Corpora, Austrian Academy of Sciences, Vienna

Adrien Barbaresi, Hanno Biber, Evelyn Breiteneder

Institute of Computational Linguistics, University of Zurich

Simon Clematide

Friedrich-Alexander-Universität Erlangen-Nürnberg

Stefan Evert


The CMLC series homepage is located at http://corpora.ids-mannheim.de/cmlc.html