koinenlp API reference


Return the given text with final sigmas (ς) converted to regular sigmas (σ).
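As a rough illustration of what this normalization means, the sketch below replaces the final-sigma code point with the regular one using only the standard library. The helper name normalize_final_sigma is hypothetical and is not part of koinenlp; the library's own implementation may differ.

```python
def normalize_final_sigma(text: str) -> str:
    """Hypothetical helper (not koinenlp's code): map final sigma to sigma."""
    # U+03C2 is GREEK SMALL LETTER FINAL SIGMA, U+03C3 is GREEK SMALL LETTER SIGMA.
    return text.replace("\u03c2", "\u03c3")


# The last letter of the word is replaced; every other letter is untouched.
print(normalize_final_sigma("\u03bb\u03cc\u03b3\u03bf\u03c2"))
```

Treating both forms as the same letter is useful for indexing, because the two differ only by position in the word.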


Return the given text in lowercase.


Return the given text in a normalized form suitable for indexing.

Specifically, the text is converted to lowercase, diacritics are removed, final sigma is converted to sigma, elisions are expanded to their full forms, and Unicode normalization is applied.
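The steps listed above can be approximated with the standard library alone. The following is a minimal sketch under stated assumptions: index_form is a hypothetical name, the order of the steps is a guess, and elision expansion is left out because it needs knowledge of the elided word forms. It is not koinenlp's implementation.

```python
import unicodedata


def index_form(text: str) -> str:
    """Hypothetical sketch of an indexing normalization (not koinenlp's code)."""
    # 1. Lowercase the text.
    text = text.lower()
    # 2. Decompose characters, then drop combining marks (accents, breathings,
    #    iota subscript), which leaves only the base letters.
    decomposed = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 3. Convert final sigma (U+03C2) to regular sigma (U+03C3).
    text = text.replace("\u03c2", "\u03c3")
    # 4. Elision expansion is intentionally omitted in this sketch.
    # 5. Apply Unicode normalization (NFKC) to the result.
    return unicodedata.normalize("NFKC", text)


print(index_form("\u1f10\u03bd \u1f00\u03c1\u03c7\u1fc7"))
```

Two spellings of the same word that differ only in case, accents or sigma form map to the same string, which is the property an index needs.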

koinenlp.remove_elision(text, diacritics=False)

Return the given text with all instances of elision removed.

Pass diacritics=True if the input text contains diacritics. They must be removed before elisions can be detected and removed.


Return the given text with punctuation removed.
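One way to think about punctuation removal is in terms of Unicode general categories. The sketch below drops every character whose category starts with "P", which covers the Greek ano teleia and the Greek question mark as well as ASCII punctuation. The name strip_punctuation is hypothetical, and koinenlp may use a different set of characters.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Hypothetical helper (not koinenlp's code): drop Unicode punctuation."""
    # General categories beginning with "P" (Po, Pd, Ps, Pe, Pi, Pf, Pc)
    # are the punctuation classes.
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


print(strip_punctuation("a, b."))
```

Note that this approach also removes apostrophes, so in this sketch it should run only after any elision handling that relies on them.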


Simplify the given tag, returning only the POS portion.

This function may be given as the tag_mapping_function to the nltk.corpus.reader.TaggedCorpusReader (or similar) class. This allows the argument simplify_tags=True to be passed to tagged_* methods on corpora.
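The reference does not state the tag format. Purely as an illustration, the sketch below assumes tags of the form POS-MORPHOLOGY (for example "V-PAI-3S"), where the part before the first hyphen is the part of speech. Both that format and the name simplify_tag_sketch are assumptions, not koinenlp's documented behavior.

```python
def simplify_tag_sketch(tag: str) -> str:
    """Hypothetical helper: keep only the part of a tag before the first hyphen.

    Assumes a "POS-MORPHOLOGY" tag layout, which the reference does not confirm.
    """
    # split with maxsplit=1 returns the whole tag when there is no hyphen.
    return tag.split("-", 1)[0]


print(simplify_tag_sketch("V-PAI-3S"))
print(simplify_tag_sketch("CONJ"))
```

A function with this shape, taking one tag string and returning one tag string, is what a tag-mapping hook on a corpus reader would typically expect.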


Return the given text string with Unicode diacritics removed.
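A common standard-library technique for removing diacritics is to decompose the text and discard the combining marks. The sketch below shows that technique. The name strip_diacritics is hypothetical, and koinenlp's own approach may differ.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical helper (not koinenlp's code): remove combining marks."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns a non-zero class for accents, breathings and
    # the iota subscript, so those characters are filtered out.
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose so the result is in a single, predictable form.
    return unicodedata.normalize("NFC", base_only)


print(strip_diacritics("\u1f00\u03c1\u03c7\u1fc7"))
```

Case and the final-sigma form are left as they are; only the marks are removed.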


Return the given text normalized to Unicode NFKC.
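NFKC matters for Greek because the same visible letter can be encoded in several ways. The sketch below uses the standard library's unicodedata.normalize to show three encodings of alpha with an acute accent collapsing to one. The wrapper name to_nfkc is hypothetical and only illustrates the normalization form named above.

```python
import unicodedata


def to_nfkc(text: str) -> str:
    """Hypothetical wrapper around the standard NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


tonos = "\u03ac"           # GREEK SMALL LETTER ALPHA WITH TONOS (precomposed)
oxia = "\u1f71"            # GREEK SMALL LETTER ALPHA WITH OXIA (polytonic block)
combining = "\u03b1\u0301"  # alpha followed by COMBINING ACUTE ACCENT

# All three compare equal once normalized, although the raw strings differ.
print(to_nfkc(tonos) == to_nfkc(oxia) == to_nfkc(combining))
```

Normalizing before comparison or indexing avoids mismatches between texts that were typed or digitized with different input conventions.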