Usage Details
=============

The ``Rake`` constructor accepts the following parameters. Varying them lets this implementation of the RAKE algorithm serve several different purposes.

* stopwords : List of words that are treated as phrase delimiters when splitting sentences into phrases. Defaults to ``None``, in which case the nltk stopwords for ``language`` are used.
* punctuations : List of punctuation marks that are treated as phrase delimiters. Defaults to ``None``, in which case a default punctuation set is used.
* language : Language to use for tokenization and stopwords. Defaults to ``english``.
* ranking_metric : Metric used to rank phrases. One of:

  * Ratio of the degree of a word to its frequency, d(w)/f(w). (Default)
  * Degree of the word only, d(w).
  * Frequency of the word only, f(w).

* max_length : Maximum number of words in a phrase to consider. Defaults to ``100000``.
* min_length : Minimum number of words in a phrase to consider. Defaults to ``1``.
* include_repeated_phrases : Whether repeated phrases are kept. Defaults to ``True``.
* sentence_tokenizer : Callable that splits text into sentences. If not given, ``nltk.tokenize.sent_tokenize`` is used.
* word_tokenizer : Callable that splits a sentence into words. If not given, ``nltk.tokenize.wordpunct_tokenize`` is used.

To use it with a specific language supported by nltk
----------------------------------------------------

.. code:: python

    from rake_nltk import Rake

    # Any language for which nltk provides a stopword list, e.g. "german".
    r = Rake(language="german")

The implementation automatically picks up the nltk stopwords for that language and the default punctuation set.

To provide your own list of stopwords and punctuations
------------------------------------------------------

.. code:: python

    from rake_nltk import Rake

    # Both stopwords and punctuation marks act as phrase delimiters.
    r = Rake(
        stopwords=["is", "a", "was", "in", "the"],
        punctuations=[".", ",", "!", "?"],
    )

To control the metric for ranking
---------------------------------

.. code:: python

    from rake_nltk import Metric, Rake

    # The RAKE paper uses d(w)/f(w) as the final metric. This API supports
    # the following metrics:

    # 1. d(w)/f(w) (default): ratio of the degree of a word to its frequency.
    r = Rake(ranking_metric=Metric.DEGREE_TO_FREQUENCY_RATIO)

    # 2. d(w): degree of the word only.
    r = Rake(ranking_metric=Metric.WORD_DEGREE)

    # 3. f(w): frequency of the word only.
    r = Rake(ranking_metric=Metric.WORD_FREQUENCY)

To control the max or min words in a phrase
-------------------------------------------

Only phrases whose length in words lies between ``min_length`` and ``max_length`` are considered for ranking.

.. code:: python

    from rake_nltk import Rake

    # Consider only phrases of 2 to 4 words.
    r = Rake(min_length=2, max_length=4)

To control whether or not to include repeated phrases in text
-------------------------------------------------------------

Choose whether to keep every phrase generated from the text, including repeats, or to keep each distinct phrase only once. For example, in "Magic systems is a company. Magic systems was founded in a garage" the phrase (magic, systems) occurs twice.

.. code:: python

    from rake_nltk import Rake

    # To include all phrases, even the repeated ones.
    r = Rake()  # Equivalent to Rake(include_repeated_phrases=True)

    # To include each phrase only once and ignore the repetitions.
    r = Rake(include_repeated_phrases=False)

To control the sentence tokenizer
---------------------------------

Choose which sentence tokenizer is used to split the text into sentences.

.. code:: python

    from typing import List

    from rake_nltk import Rake

    # To use the default `nltk.tokenize.sent_tokenize` tokenizer.
    r = Rake()  # Equivalent to Rake(sentence_tokenizer=nltk.tokenize.sent_tokenize)

    # To use a custom tokenizer: any callable that takes the text and
    # returns a list of sentences.
    def custom_tokenizer(text: str) -> List[str]:
        ...  # Your sentence-splitting logic goes here.

    r = Rake(sentence_tokenizer=custom_tokenizer)

To control the word tokenizer
-----------------------------

Choose which word tokenizer is used to split each sentence into words.

.. code:: python

    from typing import List

    from rake_nltk import Rake

    # To use the default `nltk.tokenize.wordpunct_tokenize` tokenizer.
    r = Rake()  # Equivalent to Rake(word_tokenizer=nltk.tokenize.wordpunct_tokenize)

    # To use a custom tokenizer: any callable that takes a sentence and
    # returns a list of words.
    def custom_tokenizer(text: str) -> List[str]:
        ...  # Your word-splitting logic goes here.

    r = Rake(word_tokenizer=custom_tokenizer)
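
The ``custom_tokenizer`` stubs above leave the splitting logic to you. As a minimal sketch, assuming only that ``Rake`` calls each tokenizer with a string and expects a list of strings back (the signature the stubs use), the two hypothetical helpers below split sentences after ``.``, ``!`` or ``?`` followed by whitespace, and split words with a regex similar to the one behind ``nltk.tokenize.wordpunct_tokenize``. Neither is a drop-in replacement for nltk's tokenizers; the sentence splitter, for example, also breaks after abbreviations such as "e.g.".

.. code:: python

    import re
    from typing import List

    # Hypothetical helpers (not part of rake_nltk). Both follow the
    # (text: str) -> List[str] shape of the custom_tokenizer stubs.

    # A sentence ends at '.', '!' or '?' followed by whitespace.
    _SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
    # Runs of word characters, or runs of non-space punctuation.
    _WORD_OR_PUNCT = re.compile(r"\w+|[^\w\s]+")


    def regex_sentence_tokenizer(text: str) -> List[str]:
        """Naively split text into sentences, dropping empty pieces."""
        return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


    def regex_word_tokenizer(text: str) -> List[str]:
        """Split a sentence into words and punctuation tokens."""
        return _WORD_OR_PUNCT.findall(text)

Either helper can then be passed to the constructor, e.g. ``Rake(sentence_tokenizer=regex_sentence_tokenizer, word_tokenizer=regex_word_tokenizer)``. A plain regex keeps the example dependency-free, at the cost of the abbreviation handling that nltk's Punkt-based ``sent_tokenize`` provides.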