koinenlp API reference
-
koinenlp.final_sigma(text)
Return the given text with final sigmas normalized to normal sigmas.
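As an illustration of the behaviour described here, the following is a minimal sketch using only the Python standard library. It is not koinenlp's implementation; the helper name sketch_final_sigma is hypothetical. It also shows why such a step is useful: Python's own str.lower() produces a final sigma at the end of a word.

```python
def sketch_final_sigma(text: str) -> str:
    """Hypothetical stand-in: replace final sigma (U+03C2) with sigma (U+03C3)."""
    return text.replace("\u03c2", "\u03c3")


# str.lower() applies the Unicode final-sigma rule, so the last letter
# of the lowercased word is U+03C2 rather than U+03C3.
lowered = "ΛΟΓΟΣ".lower()
print(lowered[-1] == "\u03c2")      # True
print(sketch_final_sigma(lowered))  # λογοσ
```

With every sigma mapped to the same code point, a word matches the same index key regardless of where the sigma falls.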
-
koinenlp.lowercase(text)
Return the given text in lowercase.
-
koinenlp.normalize(text)
Return the given text in a normalized form suitable for indexing.
Namely, the text is converted to lowercase, diacritics are removed, final sigmas are converted to normal sigmas, elided forms are expanded to their full forms, and the result is Unicode-normalized.
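The steps listed above can be sketched with the standard library alone. This is an assumption-laden illustration, not koinenlp's code: the name sketch_normalize is hypothetical, the order of the steps is a guess, and elision expansion is left out because it needs a lexical table.

```python
import unicodedata


def sketch_normalize(text: str) -> str:
    """Hypothetical approximation of an indexing-oriented normalization."""
    # 1. Lowercase.
    text = text.lower()
    # 2. Strip diacritics: decompose, then drop combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 3. Map final sigma (U+03C2) to sigma (U+03C3).
    text = text.replace("\u03c2", "\u03c3")
    # 4. Unicode-normalize the result (elision expansion omitted here).
    return unicodedata.normalize("NFKC", text)


print(sketch_normalize("Ἐν ἀρχῇ ἦν ὁ λόγος"))
```

Two spellings of the same word that differ only in case, accents, or sigma form should come out identical, which is the property an index needs.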
-
koinenlp.remove_elision(text, diacritics=False)
Return the given text with all instances of elision removed.
Pass diacritics=True if the input text contains diacritics. The diacritics must be stripped before elisions can be detected and removed.
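One way to see why diacritics get in the way is a lookup-table approach keyed on undiacritized forms. The sketch below is purely illustrative: the table, the set of apostrophe characters, and the name sketch_remove_elision are all assumptions, and koinenlp may detect elision quite differently.

```python
# Hypothetical table: elided form (no diacritics, no apostrophe) -> full form.
ELISIONS = {
    "απ": "απο",
    "δι": "δια",
    "αλλ": "αλλα",
    "κατ": "κατα",
    "μετ": "μετα",
}

# Characters commonly used to mark elision (an assumption, not exhaustive).
APOSTROPHES = "'\u2019\u02bc\u1fbd"


def sketch_remove_elision(text: str) -> str:
    """Expand elided words found in the hypothetical ELISIONS table."""
    words = []
    for word in text.split():
        stem = word.rstrip(APOSTROPHES)
        # Only expand words that ended in an apostrophe and are in the table.
        if stem != word and stem in ELISIONS:
            word = ELISIONS[stem]
        words.append(word)
    # Note: this sketch collapses runs of whitespace to single spaces.
    return " ".join(words)


print(sketch_remove_elision("απ\u2019 αρχης"))  # απο αρχης
```

A table like this only matches when the input carries no accents or breathings, so text with diacritics has to be stripped first.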
-
koinenlp.remove_punctuation(text)
Return the given text with punctuation removed.
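A rough equivalent can be written with Unicode general categories. This is a sketch under the assumption that "punctuation" means any character in a P* category; the helper name sketch_remove_punctuation is hypothetical and koinenlp's actual character set may differ.

```python
import unicodedata


def sketch_remove_punctuation(text: str) -> str:
    """Drop every character whose Unicode general category starts with 'P'."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


print(sketch_remove_punctuation("λογος, και θεος."))  # λογος και θεος
```

Under this definition the right single quotation mark (U+2019), often used to mark elision, also counts as punctuation and would be removed.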
-
koinenlp.simplify_tag(tag)
Simplify the given tag, returning only the POS portion.
This function may be given as the tag_mapping_function to the nltk.corpus.reader.TaggedCorpusReader (or similar) class. This allows the argument simplify_tags=True to be passed to tagged_* methods on corpora.
-
koinenlp.strip_diacritics(text)
Return the given text string with Unicode diacritics removed.
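A common standard-library technique for this is to decompose the text and discard combining marks. The sketch below assumes that approach; sketch_strip_diacritics is a hypothetical name and the library may implement the step differently.

```python
import unicodedata


def sketch_strip_diacritics(text: str) -> str:
    """Decompose to NFD, drop combining marks, and recompose to NFC."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return unicodedata.normalize("NFC", stripped)


print(sketch_strip_diacritics("ἐν ἀρχῇ ἦν ὁ λόγος"))
```

Breathings, accents, and the iota subscript are all combining characters after decomposition, so one filter handles polytonic and monotonic input alike.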
-
koinenlp.unicode_normalize(text)
Return the given text normalized to Unicode normalization form NFKC.
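The effect of NFKC on Greek text can be seen with the standard library's unicodedata.normalize, which is presumably what a function like this wraps (an assumption). The example shows three encodings of an accented alpha collapsing to one code point, and a compatibility character being replaced.

```python
import unicodedata

# Three encodings that render as an accented alpha:
oxia = "\u1f71"          # GREEK SMALL LETTER ALPHA WITH OXIA
tonos = "\u03ac"         # GREEK SMALL LETTER ALPHA WITH TONOS
combining = "\u03b1\u0301"  # alpha followed by COMBINING ACUTE ACCENT

forms = {unicodedata.normalize("NFKC", s) for s in (oxia, tonos, combining)}
print(len(forms))  # 1

# NFKC also folds compatibility characters, e.g. the beta symbol (U+03D0)
# becomes the ordinary small beta (U+03B2).
print(unicodedata.normalize("NFKC", "\u03d0") == "\u03b2")  # True
```

Without this step, visually identical strings can compare unequal and end up under different index keys.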