News

Doctoral Research Seminar on "Linguistic Graphs - Small Language Models?"


The next Doctoral Research Seminar, taking place next Monday, will feature a presentation titled "Linguistic Graphs - Small Language Models?" by Dr. Jakob Prange.

Time: December 2nd, 10:30-11:30
Place: Seminar Room 2026 (Karlstr. 45)

Language models such as the GPT series are getting larger and larger. On some tasks and metrics, large language models (LLMs) outperform humans; on many others, they are easily fooled.

It is beyond debate that modern LLM capabilities are extremely impressive, but also that LLMs do not handle language the way humans do. Linguistic theories, supported by psychological evidence, maintain that humans process language hierarchically, and this hierarchical structure can be encoded computationally as trees or DAGs. In the face of the impressive performance of LLMs, the question arises whether linguistic graphs can tell us something about the structure of language that sequential neural networks cannot. Put more precisely: can combined neuro-symbolic models be better models of language than purely neural ones? After summarizing several recent publications that address this problem from different angles, I arrive at the question of what a "good model of language" is, or should be.

In Prange et al. (TACL, 2021), we propose novel methods for top-down tree-structured prediction that account for the internal structure of linguistic categories called CCG supertags. Traditionally treated as opaque labels, supertags form an open-ended and sparse distribution. Our best model needs only a fraction of the parameters of state-of-the-art alternatives to match their performance on frequent tags, and additionally recovers a sizeable portion of rare and even unseen ones.

In a different series of studies (Prange et al., NAACL-HLT 2022; Prange and Chersoni, *SEM 2023), we examine whether and how different linguistic graph representations can complement and improve the GPT-2 model. We develop several neural graph encoding methods, following the maxim "simple but effective". The final representation requires only a handful of parameters per token and is twice as fast as R-GCN, a popular graph-convolution-based method, while still showing a perplexity advantage over the purely neural baseline.
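To make the idea of "a handful of parameters per token" concrete, here is a minimal, hypothetical Python sketch of turning a dependency-style linguistic graph into small per-token feature vectors via a shared label-embedding table. It is not code from the cited papers; the edge list, labels, and dimensions are invented for illustration only.

```python
import numpy as np

# Hypothetical dependency-style graph for "the cat sat on the mat".
# Nodes are token indices; edges are (head, dependent, label).
tokens = ["the", "cat", "sat", "on", "the", "mat"]
edges = [
    (1, 0, "det"),    # cat -> the
    (2, 1, "nsubj"),  # sat -> cat
    (2, 3, "prep"),   # sat -> on
    (3, 5, "pobj"),   # on  -> mat
    (5, 4, "det"),    # mat -> the
]

# One small vector per edge label, shared across all tokens, so the graph
# encoding only adds a few parameters per label (here: 8 dimensions each).
labels = sorted({lab for _, _, lab in edges})
rng = np.random.default_rng(0)
label_emb = {lab: rng.normal(size=8) for lab in labels}

def encode_graph(n_tokens, edges):
    """Sum the label embeddings of each token's incident edges.

    Returns an (n_tokens, 8) array that could be concatenated with, or added
    to, contextual token embeddings from a language model such as GPT-2.
    """
    enc = np.zeros((n_tokens, 8))
    for head, dep, lab in edges:
        enc[dep] += label_emb[lab]   # dependent sees its incoming relation
        enc[head] += label_emb[lab]  # head sees the same relation
    return enc

graph_features = encode_graph(len(tokens), edges)
print(graph_features.shape)  # (6, 8)
```

Because the only learned parameters in such a scheme are the small per-label vectors, it stays far cheaper than full graph-convolution stacks like R-GCN, which is the kind of trade-off the abstract alludes to.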