Can language models pre-pretrained on linear context-free rewriting systems (LCFRSs) acquire linguistic biases that let them outperform models pre-pretrained on strongly context-sensitive languages?
Prior work (Papadimitriou and Jurafsky, 2020; Chiang and Lee, 2022; McCoy and Griffiths, 2025; Hu et al., 2025) has established that pre-pretraining language models on formal languages speeds up their subsequent acquisition of natural language by imparting linguistic biases. Interestingly, this effect does not appear when natural languages are used for the same purpose. Context-sensitive languages (CSLs) have proven better for this task than context-free languages (Papadimitriou and Jurafsky, 2023), which mirrors the fact that natural language (morpho)syntax is not purely context-free. However, there are intermediate classes between the two, most notably the languages generated by linear context-free rewriting systems (LCFRSs), which are described as "mildly context-sensitive". LCFRSs can capture many of the context-sensitive aspects of (morpho)syntax while remaining computationally simpler than CSLs at large: LCFRSs are parsable in polynomial time (Ivliev, 2020), whereas known recognition algorithms for CSLs at large take exponential time in the worst case.
Given that LCFRSs seem to be more tightly linked to natural language (morpho)syntax than CSLs at large, might restricting the pre-pretraining set of formal languages to LCFRSs induce better linguistic biases than pre-pretraining on CSLs that are more strongly context-sensitive?
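To make the notion of "mildly context-sensitive" concrete, here is a minimal sketch of sampling from one small LCFRS. It is an illustration only, not a proposed pre-pretraining corpus. The grammar is the textbook fan-out-2 system for the copy language {ww}, which is not context-free but is generated by an LCFRS. The function names, the two-symbol alphabet, and the `stop_prob` parameter are assumptions made for this example.

```python
import random

# Hypothetical toy LCFRS with fan-out 2 for the copy language {ww}:
#   S(x y)       -> A(x, y)
#   A(s x, s y)  -> A(x, y)    for each symbol s in the alphabet
#   A(eps, eps)
# The nonterminal A derives a *pair* of strings that grow in lockstep,
# which is what lets the grammar express the cross-serial pattern.

def derive_A(rng, alphabet, stop_prob):
    """Sample a pair (x, y) derived from the fan-out-2 nonterminal A."""
    if rng.random() < stop_prob:
        return [], []                      # rule A(eps, eps)
    sym = rng.choice(alphabet)
    x, y = derive_A(rng, alphabet, stop_prob)
    return [sym] + x, [sym] + y            # rule A(s x, s y) -> A(x, y)

def sample_sentence(rng, alphabet=("a", "b"), stop_prob=0.3):
    """Sample one string from the start symbol: S(x y) -> A(x, y)."""
    x, y = derive_A(rng, alphabet, stop_prob)
    return x + y

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for _ in range(5):
        print(" ".join(sample_sentence(rng)))
```

Every sampled string has the form ww by construction. A pre-pretraining set restricted to LCFRSs would presumably draw on many such grammars with varying fan-out and rule structure rather than on this single example.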
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{chinnappa-prepretraining-on-lcfrss-2026,
author = {Chinnappa, Surya},
title = {Pre-pretraining on LCFRSs vs Strongly CSLs},
year = {2026},
url = {https://hypogenic.ai/ideahub/idea/fPWrG5mBJfJ3KFhy5lP8}
}