koinenlp API reference
-
koinenlp.final_sigma(text)
Return the given text with final sigmas (ς) converted to regular sigmas (σ).
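To illustrate the conversion described above, here is a minimal sketch using only the Python standard library. It is not koinenlp's implementation, and the name final_sigma_sketch is hypothetical.

```python
FINAL_SIGMA = "\u03c2"  # ς, GREEK SMALL LETTER FINAL SIGMA
SIGMA = "\u03c3"        # σ, GREEK SMALL LETTER SIGMA


def final_sigma_sketch(text):
    """Hypothetical stand-in: replace every final sigma with a regular sigma."""
    return text.replace(FINAL_SIGMA, SIGMA)


# The trailing ς of the input becomes σ.
print(final_sigma_sketch("λόγος"))
```

Mapping both forms to one character means that a search index does not need to care whether a sigma happened to fall at the end of a word.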
-
koinenlp.lowercase(text)
Return the given text in lowercase.
-
koinenlp.normalize(text)
Return the given text in a normalized form suitable for indexing.
Namely, the text is converted to lowercase, diacritics are removed, final sigma is converted to regular sigma, elisions are expanded to their full forms, and Unicode normalization is applied.
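The steps listed above can be sketched with the standard library alone. This is an illustrative approximation, not koinenlp's code: the name normalize_sketch is hypothetical, the order of the steps is an assumption, and elision expansion is left out because it needs a table of elided forms.

```python
import unicodedata


def normalize_sketch(text):
    """Hypothetical stand-in covering four of the five documented steps."""
    # 1. Lowercase.
    text = text.lower()
    # 2. Remove diacritics: decompose, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 3. Convert final sigma (U+03C2) to regular sigma (U+03C3).
    text = text.replace("\u03c2", "\u03c3")
    # 4. Apply Unicode normalization (NFKC).
    #    Elision expansion is intentionally omitted from this sketch.
    return unicodedata.normalize("NFKC", text)


print(normalize_sketch("Ἐν ἀρχῇ ἦν ὁ λόγος"))  # prints: εν αρχη ην ο λογοσ
```

Two spellings of the same word that differ only in case, accents, or sigma form then produce the same index key.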
-
koinenlp.remove_elision(text, diacritics=False)
Return the given text with all instances of elision removed.
Pass diacritics=True if the input text contains diacritics. The diacritics must be stripped before elisions can be detected and removed.
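The following sketch shows one way elision expansion of this kind could work, and why diacritics get in the way of a plain lookup. Everything in it is an assumption made for illustration: the tiny ELISION_TABLE, the choice of apostrophe characters, and the name remove_elision_sketch are hypothetical and are not taken from koinenlp.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny table mapping elided forms (without
# diacritics) to full forms. A usable table would be far more complete.
ELISION_TABLE = {
    "δι": "δια",
    "αλλ": "αλλα",
    "απ": "απο",
    "επ": "επι",
    "κατ": "κατα",
    "μετ": "μετα",
    "παρ": "παρα",
}

# Assumption: elision is marked with ' (U+0027) or ’ (U+2019).
ELIDED_WORD = re.compile(r"(\w+)['\u2019]")


def strip_marks(text):
    """Drop combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def remove_elision_sketch(text, diacritics=False):
    """Hypothetical stand-in: expand elided words found in ELISION_TABLE."""
    if diacritics:
        # The table holds unaccented forms, so accents are stripped first.
        text = strip_marks(text)

    def expand(match):
        # Leave the word and its apostrophe untouched when it is unknown.
        return ELISION_TABLE.get(match.group(1), match.group(0))

    return ELIDED_WORD.sub(expand, text)


print(remove_elision_sketch("δι\u2019 αὐτοῦ", diacritics=True))  # prints: δια αυτου
```

In this sketch an accented input passed without diacritics=True comes back unchanged, because the accented stem does not match any key in the table.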
-
koinenlp.remove_punctuation(text)
Return the given text with punctuation removed.
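As a rough illustration, punctuation can be removed by dropping every character whose Unicode general category starts with "P". This is a sketch under that assumption, not koinenlp's implementation; the exact set of characters the library treats as punctuation is not specified here, and remove_punctuation_sketch is a hypothetical name.

```python
import unicodedata


def remove_punctuation_sketch(text):
    """Hypothetical stand-in: drop characters in the Unicode P* categories."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P")
    )


# U+0387 is the Greek ano teleia, U+037E the Greek question mark.
print(remove_punctuation_sketch("λόγος, θεός\u0387 τί\u037e"))
```

Note that this category test also removes apostrophes such as U+2019, so in this sketch any elision markers would be lost along with the other punctuation.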
-
koinenlp.simplify_tag(tag)
Simplify the given tag, returning only the POS portion.
This function may be passed as the tag_mapping_function argument to nltk.corpus.reader.TaggedCorpusReader (or a similar class), which allows simplify_tags=True to be passed to the tagged_* methods on corpora.
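The tag format is not described in this reference, so the sketch below rests on an assumption: that a tag is a hyphen-separated string such as "N-NSM" whose first field is the part of speech. The function name simplify_tag_sketch and the separator parameter are hypothetical, and the real tags handled by koinenlp may look different.

```python
def simplify_tag_sketch(tag, separator="-"):
    """Hypothetical stand-in: keep only the text before the first separator."""
    return tag.split(separator, 1)[0]


print(simplify_tag_sketch("N-NSM"))  # prints: N
print(simplify_tag_sketch("CONJ"))   # prints: CONJ
```

A function of this shape takes one tag string and returns one tag string, which is what a tag-mapping callback is expected to do.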
-
koinenlp.strip_diacritics(text)
Return the given text string with Unicode diacritics removed.
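A common way to remove diacritics with the standard library is to decompose the text and discard the combining marks. The sketch below assumes that approach; it is not taken from koinenlp, and strip_diacritics_sketch is a hypothetical name.

```python
import unicodedata


def strip_diacritics_sketch(text):
    """Hypothetical stand-in: decompose, drop combining marks, recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch)  # combining class 0 means "keep"
    )
    return unicodedata.normalize("NFC", stripped)


print(strip_diacritics_sketch("ἐν ἀρχῇ"))  # prints: εν αρχη
```

In this sketch breathings, accents, and the iota subscript are all combining marks after decomposition, so all of them are removed.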
-
koinenlp.unicode_normalize(text)
Return the given text normalized to Unicode NFKC.
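NFKC normalization is available in the standard library as unicodedata.normalize. The sketch below shows why it matters for Greek text, where the same visible letter can be encoded in several ways. The wrapper name unicode_normalize_sketch is hypothetical and stands in for the library function.

```python
import unicodedata


def unicode_normalize_sketch(text):
    """Hypothetical stand-in: normalize text to Unicode NFKC."""
    return unicodedata.normalize("NFKC", text)


decomposed = "\u03b1\u0301"  # α followed by COMBINING ACUTE ACCENT
oxia = "\u1f71"              # GREEK SMALL LETTER ALPHA WITH OXIA
tonos = "\u03ac"             # GREEK SMALL LETTER ALPHA WITH TONOS

# All three encodings collapse to the same single code point.
print(unicode_normalize_sketch(decomposed) == tonos)  # prints: True
print(unicode_normalize_sketch(oxia) == tonos)        # prints: True

# Compatibility characters are folded too: ϐ (U+03D0) becomes β (U+03B2).
print(unicode_normalize_sketch("\u03d0") == "\u03b2")  # prints: True
```

Without this step, two strings that look identical on screen can fail an equality check or land under different index keys.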