Presentations at the doctoral seminar on 28 November at 14:10

At the doctoral seminar on 28 November, the following presentations will be given in room P105, starting at 14:10:

Jakub Sláma: On the Efficiency of Manual and Semi-automatic Detection of Neologisms

The paper presents a simple semi-automatic neologism detection procedure. A short Python script processes a text file using a Czech morphological tagger and extracts every word the tagger does not recognize as a potential neologism. A human must then check this candidate list (hence "semi-automatic"). The method was applied to a set of texts that had also been analyzed in the traditional way, using the “reading and marking” technique (the current practice at the Czech Language Institute). The comparison shows that the simple semi-automatic procedure clearly outperforms current practice in both speed and efficiency.
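The candidate-extraction step can be sketched roughly as follows. This is a minimal illustration, not the author's script. In the described procedure, a Czech morphological tagger decides whether a word is known. Here, a plain set of known word forms (`known_forms`, a hypothetical stand-in) plays that role, and the function name `neologism_candidates` is also illustrative.

```python
import re
from collections import Counter

# Runs of Unicode letters only (no digits, no underscores), so Czech
# diacritics such as "ý" or "ř" stay inside a single token.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def neologism_candidates(text, known_forms):
    """Return word forms missing from `known_forms`, most frequent first.

    Assumption: `known_forms` is a hypothetical stand-in for the tagger's
    dictionary; a real tagger also recognises inflected forms it has rules for.
    The result is only a candidate list: a human still decides which items
    are genuine neologisms and which are typos, proper names, etc.
    """
    counts = Counter(
        token.lower()
        for token in TOKEN_RE.findall(text)
        if token.lower() not in known_forms
    )
    # Sort by descending frequency, then alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    lexicon = {"to", "je", "nový", "nová", "a", "ten"}  # hypothetical toy lexicon
    sample = "To je nový covidista a ten covidista je nová influencerka."
    print(neologism_candidates(sample, lexicon))
```

Sorting by frequency is a design choice made for the sketch. It puts the most recurrent unknown forms at the top of the list the human reviewer checks first.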

Veronika Raušová: Non-standard functions of “like” in spoken discourse: a diachronic view

A pilot study for the dissertation. It examines 100 examples from each of two corpora (BNC1994 and BNC2014) to find out whether the use of “like” in British English has changed.

Denisa Šebestová: N-gram-based methodologies in genre analysis: Characterising children’s fiction

Background:
  • data-driven, frequency-based phraseology
  • n-grams = repeated sequences of n words, not necessarily structured grammatically or semantically, vs. patterns = structured sequences
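The frequency-based notion of an n-gram above can be illustrated with a small sketch. It counts every contiguous n-word sequence and keeps only the repeated ones, with no grammatical or semantic filtering. The function name `repeated_ngrams` and the `min_freq=2` threshold are illustrative assumptions, not taken from the study. Whitespace splitting stands in for proper tokenization.

```python
from collections import Counter


def repeated_ngrams(tokens, n, min_freq=2):
    """Count contiguous n-word sequences; keep those seen at least `min_freq` times.

    No grammatical or semantic structure is required, so these are n-grams
    in the frequency-based sense rather than structured patterns.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # If the text is shorter than n, the range is empty and nothing is counted.
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram: freq for gram, freq in grams.items() if freq >= min_freq}


if __name__ == "__main__":
    text = "once upon a time there was a cat and once upon a time there was a dog"
    tokens = text.split()  # assumption: naive whitespace tokenization
    # Varying n shows how the inventory of repeated sequences shrinks as n grows.
    for n in range(2, 9):
        print(n, len(repeated_ngrams(tokens, n)))
```

On this toy sentence, the number of repeated n-grams falls from 6 at n = 2 to 0 at n = 8. This is one simple way to experiment with which lengths still capture recurring material.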

Aims:
  • methodologically – exploring different uses of n-grams
  • typologically – what caveats do n-grams present in English and Czech? Which typological features stand out?
  • genre-wise – what genre characteristics can n-grams reveal?

Methods:
  • experimenting with different n-gram lengths: which lengths reflect which text features
  • n-grams as a starting point towards the identification of patterns (linguistic building blocks)
  • semantic classification of patterns, with focus on temporal expressions
  • identifying co-occurrences in the text: how do frequent word combinations contribute to structuring the text?
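One common way to identify co-occurrences is to count word pairs that appear within a fixed window of each other. The study does not specify its co-occurrence measure, so the sketch below is an assumption. Both the name `window_cooccurrences` and the symmetric, unordered window are illustrative choices.

```python
from collections import Counter


def window_cooccurrences(tokens, window=3):
    """Count unordered word pairs occurring within `window` tokens of each other.

    Assumption: a simple fixed-size window; association measures (e.g. MI,
    log-likelihood) could be computed on top of these raw counts.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    pairs = Counter()
    for i, left in enumerate(tokens):
        # Look only ahead, so each pair of positions is counted exactly once.
        for right in tokens[i + 1:i + 1 + window]:
            if left != right:  # skip pairs of identical word forms
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    for pair, freq in window_cooccurrences(tokens, window=4).most_common(3):
        print(pair, freq)
```

Storing each pair in sorted order makes the count independent of which word comes first. A directional study would keep the order instead.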

By combining these methods, I aim to provide a comprehensive corpus-driven characterisation of children’s literature as a genre.

