As part of the doctoral seminar on 28 November, the following presentations will be given from 14:10 in room P105:
Jakub Sláma: K efektivitě manuální a poloautomatické excerpce neologismů (On the Efficiency of Manual and Semi-automatic Detection of Neologisms)
The paper presents a simple semi-automatic neologism detection procedure: a trivial Python script processes a text file, making use of a Czech morphological tagger, and extracts all words unrecognized by the tagger as potential neologisms. The list of these candidates has to be checked by a human (hence semi-automatic). This method was applied to a set of texts that were also analyzed in a more traditional way, by the “reading and marking” technique (i.e. the current practice at the Czech Language Institute). The comparison of the two methods has revealed that the simple semi-automatic procedure clearly outperforms the current practice both in speed and in efficiency.
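The candidate-extraction step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the script used in the paper: the abstract does not name the tagger, so the function `is_known` and the toy lexicon below are hypothetical stand-ins for a real Czech morphological analyser, and the tokenisation rule is likewise an assumption.

```python
import re
from typing import Callable, List


def candidate_neologisms(text: str, is_known: Callable[[str], bool]) -> List[str]:
    """Return unique lower-cased word forms that the analyser does not recognise.

    `is_known` is a hypothetical stand-in for a morphological tagger lookup.
    """
    # Assumed tokenisation: runs of Unicode letters, optionally hyphenated.
    tokens = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text)
    seen = set()
    candidates = []
    for token in tokens:
        form = token.lower()
        if form in seen:
            continue  # report each form only once
        seen.add(form)
        if not is_known(form):
            candidates.append(form)  # left for a human to confirm or reject
    return candidates


# Toy lexicon standing in for the tagger's dictionary (an assumption).
TOY_LEXICON = {"vláda", "včera", "schválila", "nový", "zákon", "o"}


def toy_is_known(form: str) -> bool:
    return form in TOY_LEXICON


sample = "Vláda včera schválila nový zákon o kryptoměnách."
print(candidate_neologisms(sample, toy_is_known))
```

The returned list is only a set of candidates. As the abstract notes, a human still has to check it, which is what makes the procedure semi-automatic.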
Veronika Raušová: Non-standard functions of like in spoken discourse: a diachronic view
A pilot study for the dissertation, examining on 100 + 100 examples from two corpora (BNC1994 and BNC2014) whether anything has changed in the use of “like” in British English.
Denisa Šebestová: N-gram-based methodologies in genre analysis: Characterising children’s fiction
- data-driven, frequency-based phraseology
- n-grams = repeated sequences of n words, not necessarily structured in terms of grammar or semantics, vs. patterns = structured sequences
- methodologically – exploring different uses of n-grams
- typologically – what caveats do n-grams present in EN & CZ? what typological features stand out?
- genre-wise – what genre characteristics can n-grams reveal?
- experimenting with different n-gram lengths: which lengths reflect which text features
- n-grams as a starting point towards the identification of patterns (linguistic building blocks)
- semantic classification of patterns, with focus on temporal expressions
- identifying co-occurrences in text: how do frequent word combinations contribute to structuring the text?
By combining these methods, I aim to provide a comprehensive corpus-driven characterisation of the genre of children’s literature.
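The working definition of n-grams in the outline above (repeated sequences of n words) can be illustrated with a minimal sketch. The function names, the whitespace tokenisation, and the frequency threshold `min_freq` are assumptions made for illustration and are not taken from the talk.

```python
from collections import Counter
from typing import Dict, List, Tuple


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count every contiguous sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def repeated_ngrams(tokens: List[str], n: int, min_freq: int = 2) -> Dict[Tuple[str, ...], int]:
    """Keep only n-grams occurring at least `min_freq` times (assumed threshold)."""
    return {gram: count for gram, count in ngram_counts(tokens, n).items()
            if count >= min_freq}


# Toy text, tokenised by whitespace (an assumption; real corpora need proper tokenisation).
tokens = ("once upon a time there was a king and "
          "once upon a time there was a queen").split()

# Trying several lengths shows how the set of repeated sequences changes with n.
for n in (3, 7, 8):
    print(n, repeated_ngrams(tokens, n))
```

Varying `n` in this way mirrors the outline's point about experimenting with different n-gram lengths: shorter n-grams yield many overlapping repeated fragments, while longer ones isolate fewer, more formulaic sequences.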