The origins and development of computational linguistics in Uppsala

The concept of a university computing center in Uppsala was focussed on the installation and operation of a powerful electronic computer facility. But it also included a ‘support programming group’, whose task was to initiate the use of such facilities in academic fields other than the natural sciences.

The University Computing Centers were also tasked with using this support programming group to spread awareness of the new technology to the regional economy and the public sector, as well as to provide assistance – on commercial terms – with the use of technology.

It was this part of the infrastructure that laid the foundations for some of the most essential and enduring innovations rooted in Uppsala. Computational Linguistics, or Language Technology as it is also called nowadays, is one of these.

The introduction of computer use into research areas where this had not yet occurred took the form of ‘support program seminars’. The UDAC director, Werner Schneider, would contact a particular department with a proposal for a series of at least three seminars, organized as follows:

– The power of the new research tools was presented, where possible with reference to work already under way elsewhere in the world.

– A representative of the department presented its ongoing and planned research, together with ideas about areas where the new tools might make possible what had hitherto, unfortunately, been impossible.

– Based on the first two meetings, potential pilot projects were discussed. Schneider stipulated that there should be at least two of them. It was of course impossible at this stage to judge whether a given project would succeed, but at least one in three ought to.

In the fall of 1965, Schneider offered seminar series in this format, together with support for launching two to three pilot projects, to various linguistics departments at Uppsala University. The Department of Slavic Languages showed the most interest, and its pioneer was Associate Professor Carin Davidsson. One advantage was that Werner Schneider had already been involved in starting similar activities at the University of Aarhus.

The first project to be defined, which began almost immediately, was a “backwards sorted” Czech dictionary, that is, a word list ordered alphabetically by word endings rather than by word beginnings.
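To illustrate what “backwards sorted” means, the following minimal Python sketch orders a word list by each word's reversed spelling, so that words sharing an ending end up next to each other. It is not the program used in the 1960s project; the function name `backwards_sorted` and the sample words are hypothetical, and plain Unicode code-point order stands in for proper Czech alphabetical order.

```python
# Illustrative sketch only, not the original 1960s program.
# A "backwards sorted" (reverse) dictionary orders words by their spelling
# read from the last letter to the first, so shared endings cluster together.
# Assumption: Unicode code-point order replaces real Czech collation
# (in Czech, for example, "ch" is sorted after "h").


def backwards_sorted(words):
    """Return the words sorted by their reversed spelling (hypothetical helper)."""
    return sorted(words, key=lambda word: word[::-1])


if __name__ == "__main__":
    sample = ["kniha", "voda", "škola", "hora", "ruka", "noha"]  # hypothetical sample
    for word in backwards_sorted(sample):
        # Right-align each word so that the shared endings line up in a column.
        print(word.rjust(8))
```

For the sample list, sorting by the reversed string groups the words by ending (-da, -ha, -ka, -la, -ra), which is what makes such dictionaries useful for studying suffixes and inflection.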

The second project was much more ambitious. The goal was to develop computational tools for a variety of linguistic specialties, such as language analysis, with machine translation of texts, especially between Russian and Swedish, as the overarching aim. Carin Davidsson suggested someone she considered well suited to lead such a project, supported by an assistant programmer: Anna-Lena Sågvall, a young researcher with a pioneering spirit, who had just begun graduate studies in Slavic languages in Leningrad. Her studies focused on machine translation of Russian, initially the analysis of Russian words and their component parts.

When Anna-Lena returned to Uppsala, Schneider hired her as a half-time System Administrator at UDAC. He had managed to convince his Treasury-appointed superior, University Counsellor Gunnar Wijkman, that the optimal support programmer in the linguistics field was a combination of a half-time specialist linguist, Anna-Lena, and a half-time systems analyst/programmer. Bengt Castman was employed to help her carry out the first study. It resulted in an analysis program that could recognize and analyze verbs in continuous Russian text, an important step towards the machine translation of Russian texts. The work became her master's thesis, and she completed the degree in 1970.

Activities in Uppsala since 1971

Anna-Lena Sågvall combined her part-time position at UDAC with a part-time post at the Slavic Institute at Uppsala University, where, among other things, she taught Russian text interpretation and grammar. Working in two places proved fragmented and strenuous, however, so she decided to commit herself fully to UDAC, which in her view had the “higher ceiling”. She asked Werner Schneider for permission to work full time, and this was no problem: UDAC already had an agreement with the linguistics section of the Faculty of Arts under which the computer center would support linguists who needed computers. Such an arrangement was far from common at the time.

An important factor was that Sågvall had learned computer programming through both her university studies and in-house training. She had started with machine code and Algol, soon mastered FORTRAN, and eventually learned other programming languages such as PL/I and LISP.

Anna-Lena Sågvall now expanded her research to include the analysis of all the Russian parts of speech. Meanwhile, a small working group was formed under her direction to do what we would today call Computational Linguistics.

The work on Russian parts of speech was documented in an extensive manuscript, and the Humanities Research Council approved funding for it to be printed as a monograph. In 1971, before the book was printed, Sågvall met Sture Allén, Professor of Linguistic Computing, for the first time. He read the manuscript and advised her to present it as her doctoral thesis. The Slavic specialists agreed, and after some additions she was awarded her Ph.D. in Slavic Languages for this thesis in 1974.

Sågvall’s research group, located at UDAC, now consisted of three people (“the girls”), who, with pedagogical support from UKÄ, developed a teaching tool that facilitated the learning of Russian through computer-based systematization of vocabulary. The tool was used in the Slavic Department’s undergraduate program for more than ten years, and for many years also in Russian teaching at the Interpreter School. The group also supported philologists working on languages other than Russian, such as English, Finnish, Estonian, Bulgarian and French.

Eventually the group was integrated into the linguistics department as the Centre for Computational Linguistics, with Anna-Lena Sågvall, later Anna Sågvall Hein, as director. The group nevertheless continued to work closely with UDAC and, with Werner Schneider’s support, was able to take part in the many computational linguistics conferences that were now springing up around the world. This was also when relational databases were coming into use, and UDAC was at the forefront of research in this field through its development of Mimer.

Because of her research focus, Anna Sågvall Hein had one foot in linguistics and the other in computer science. This culminated in 1981 in her appointment as associate professor of computational linguistics.

The Centre for Computational Linguistics (UCDL) conducted its own research and development but also assisted linguists who sought computer help for their own projects. It offered stand-alone courses as well, for instance under the heading “Language and Computers”, and it took part in an EU project, CONNECT, aimed at developing language technology skills in small and medium-sized businesses.

The year 1986 was a turning point for Anna Sågvall Hein. Sture Allén became Permanent Secretary of the Swedish Academy, and he asked her if she would be willing to deputize for him as Acting Professor of Computational Linguistics in Gothenburg. She accepted and, as a single mother of two, moved to Gothenburg. With the greater resources there, she could concentrate on research and teaching.

The professorship and academic development

Meanwhile, computational linguistics was developing as a science in Sweden and abroad, and in 1987 Uppsala University announced a professorship in Computational Linguistics. Anna Sågvall Hein applied for the position and took it up on 1 January 1990.

The professorship was originally based at the Centre for Computational Linguistics, but on a proposal from the Department of Philology the Centre merged with the Institute of Linguistics in 1992. The Department of Philology later became a separate faculty, and Sågvall Hein was appointed its dean in 2002. As dean, she reorganized the faculty's nine institutes into four, one of which was Linguistics and Philology, which brought together linguistics, classical languages and Afro-Asiatic languages as well as computational linguistics.

In the international evaluation of research at Uppsala University conducted in 2007 (KoF07), Computational Linguistics achieved the highest quality score (http://www.uu.se/digitalAssets/80/80671_KoF07_kort.pdf).

According to the evaluation, “The strongest points are automatic syntactic analysis, advanced methodology in machine translation research and corpus-based research for applied purposes like machine translation, e-learning and automatic language understanding.” The positive outcome led to the research team being awarded funds to hire a visiting professor, and at the re-evaluation of research at Uppsala University in 2011 (KoF11), the group once again received the highest quality rating: “We grade the quality of the research as top-quality, world-leading […]. It is a relatively small (<20) but extremely productive group with an impact comparable to that of some of the best CL groups at Berkeley and MIT.” By then Anna Sågvall Hein had retired (in 2008) and been succeeded by Joakim Nivre.

Today there are professorships in Computational Linguistics at many universities, in Sweden and internationally. What is unique about the Uppsala professorship is that it originated in the operation of a university computer center, something made possible by Werner Schneider’s creative and forward-thinking leadership. Other professorships in Scandinavia have grown out of more traditional subject-based university environments in either linguistics or computer science.

The research involves, among other things, exploring languages and the relationships between them by examining large computerized text databases. One major application area is machine translation, where texts together with their translations (parallel texts) are particularly interesting. Automatic methods link the original and the translation sentence by sentence and word by word, revealing additional translation alternatives. Adding information about the grammatical properties of the texts, both automatically and manually, creates a basis for building translation dictionaries and for philological studies of various kinds, as well as a valuable resource for machine translation itself. Even today, the Mimer relational database and the international SQL standard have proved well suited to research in computational linguistics: the stored data is in principle language-independent, and any kind of letter or other character can be included without problems.
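The sentence-by-sentence linking of texts and their translations can be sketched with a simple length-based aligner, loosely inspired by Gale and Church's method: sentences of similar length in the two texts are assumed to be translations of each other, and dynamic programming finds the cheapest sequence of one-to-one, one-to-zero, zero-to-one, two-to-one and one-to-two links. This is an illustrative assumption, not a description of the Uppsala group's tools; the cost function, the `PENALTY` value and the function name `align` are all hypothetical.

```python
# Illustrative sketch only: a simplified length-based sentence aligner.
# Assumption: the cost of a link is the difference in character length
# between the grouped sentences, plus a fixed (hypothetical) penalty for
# any link that is not one-to-one. Real aligners use probabilistic costs.

PENALTY = 10  # hypothetical extra cost for non one-to-one links

# Allowed link types: (source sentences consumed, target sentences consumed)
MOVES = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]


def align(source, target):
    """Align two sentence lists; return a list of (source_group, target_group)."""
    n, m = len(source), len(target)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    # Cells are visited in an order where every predecessor is already final.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in MOVES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                src_len = sum(len(s) for s in source[i:ni])
                tgt_len = sum(len(t) for t in target[j:nj])
                step = abs(src_len - tgt_len)
                if (di, dj) != (1, 1):
                    step += PENALTY
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Walk back from the final cell to recover the aligned sentence groups.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((source[pi:i], target[pj:j]))
        i, j = pi, pj
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    swedish = ["Hon bor i Uppsala.", "Hon studerar ryska."]
    english = ["She lives in Uppsala and studies Russian."]
    for src, tgt in align(swedish, english):
        print(src, "<->", tgt)
```

In this example the two short Swedish sentences are linked to the single longer English sentence as a two-to-one group. Word-level linking, and storing the aligned pairs in a database, would be further steps not shown here.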

Convertus AB and commercial development

In 2006 Sågvall Hein and her group founded a language technology company, Convertus AB (www.convertus.se), specializing in machine translation, language editing and terminology control. The company’s main product is a service for machine translation of academic curricula from Swedish into English, and most of the major Swedish universities are among its customers. The company has also initiated, and takes part as a partner in, an EU project, the Bologna Project (http://www.bologna-translation.eu/), whose goal is to extend Convertus technology to translation from new languages (German, French, Spanish, Portuguese, Dutch, Finnish and Turkish) into English and from English into Chinese. Convertus technology offers a holistic solution to the translation task, covering review of the source text, the translation itself, and an environment for editing translations and reusing previous edits. This allows the service to be trained for a specific user and improved gradually.

The technology combines statistical and linguistic methods; for instance, linguistic methods are used to refine the output of statistical machine translation systems (e.g. Google Translate).

Conclusion

Computational Linguistics remains strong in Uppsala today, from both an academic and an applied, commercial perspective. Its creation and development is one of many examples of innovative and lasting development that sprang from Werner Schneider’s visionary leadership and the nurturing environment of UDAC. Without these efforts, Computational Linguistics would most likely not exist in Uppsala today.