Theory and Implementation of a Head-Driven Phrase Structure Grammar for Persian

Grant Agencies

DFG and ANR (Grant Number MU 2822/3-1)

Principal Investigators

Stefan Müller and Pollet Samvelian (Université Paris III Sorbonne Nouvelle)




Pollet Samvelian (Paris: descriptive work, syntax, morphology, semantics)
Stefan Müller (Berlin: syntax, morphology, semantics, integration with German, Danish, Maltese, and Mandarin grammar)
Masood Ghayoomi (Berlin: descriptive work, syntax, morphology, semantics)
Olivier Bonami (Paris)
Lionel Clément (Paris)
Kim Gerdes (Paris)
Benoît Sagot (Paris)
Soha Safaï (Paris)
N.N. (Paris)

Web page of the project in Paris


The goal of this project is the description of central phenomena in Persian and the development of a non-trivial grammar fragment in the framework of HPSG. This grammar will cover a subset of the phenomena that are covered in existing computational grammars of German: long-distance dependencies, local reorderings (scrambling), passive, and control. In addition, the project will model the nominal domain of Persian, which is quite different from what is known from German, and the complex noun-verb predicates, which constitute a central phenomenon in the Persian lexicon-grammar.

In parallel, the project includes the development of various lexical resources: a) a full-form lexicon of verbs and common nouns, b) valency frames for verbs, and c) the most common Light Verb Constructions (LVCs), including idiomatic preverb light verb combinations.

The project aims for a tight integration of theory and implementation. The analysis will build on existing implementations of grammar fragments for German, Maltese, and Mandarin Chinese. These grammar fragments were implemented so that they share a large common core, or common parts that represent certain language classes.

The grammar development aims to avoid language-specific rules or features. However, if stipulating such rules or features turns out to be unavoidable for the description of certain phenomena, this provides evidence for typological differences that will be the basis of descriptive and theoretical publications.


  • Bijankhan, Mahmood and Javad Sheykhzadegan and Mohammad Bahrani and Masood Ghayoomi (2011) "Lessons from building a Persian written corpus: Peykare" In Language Resources and Evaluation, 45 (2): 143-164. Springer.
  • Ghayoomi, Masood (2010) "Using Variance as a Stopping Criterion for Active Learning of Frame Assignment" In Proceedings of the NAACL-HLT 2010 Workshop on Active Learning for Natural Language Processing, Los Angeles, USA, 6 June 2010, pp: 1-9.
  • Ghayoomi, Masood (2012) "Bootstrapping the Development of an HPSG-based Treebank for Persian" In Linguistic Issues in Language Technology, 7 (1). CSLI Publications.
  • Ghayoomi, Masood (2012) "From Grammar Rule Extraction to Treebanking: A Bootstrapping Approach" In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), 23-25 May, 2012; Istanbul, Turkey, pp: 1912-1919.
  • Ghayoomi, Masood and Saeedeh Momtazi and Mahmood Bijankhan (2010) "A Study of Corpus Development for Persian" In International Journal on Asian Language Processing, 20 (1): 17–33.
  • Müller, Stefan (2010) Persian Complex Predicates and the Limits of Inheritance-Based Analyses. Journal of Linguistics 46(3):601–655.
  • Müller, Stefan and Masood Ghayoomi (2010) PerGram: A TRALE Implementation of an HPSG Fragment of Persian. In Proceedings of 2010 IEEE International Multiconference on Computer Science and Information Technology – Computational Linguistics Applications (CLA'10), Wisła, Poland, pp: 461–467, 18–20 October 2010.
  • Sagot, Benoît and Géraldine Walther (2010) Développement de Ressources pour le Persan: Lexique Morphologique et Chaîne de Traitements de Surface. In 17e Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2010), Montreal, 19–23 July 2010.
  • Sagot, Benoît and Géraldine Walther (2010) A Morphological Lexicon for the Persian Language. In Proceedings of the 7th Conference on International Language Resources and Evaluation (LREC'10). Valletta, Malta, pp: 300–303, 17–23 May 2010.
  • Samvelian, Pollet and Jesse Tseng (2010) Persian Object Clitics and the Syntax-morphology Interface. In 17th International Conference on Head-Driven Phrase Structure Grammar (HPSG 2010). Paris, pp: 212–232, 7–10 July 2010.