n-gram

Introduction:
Wikipedia defines an n-gram as a subsequence of n items from a given sequence. The items in question can be phonemes, syllables, letters, words or base pairs, depending on the application.
An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (or, less commonly, a "digram"); size 3 is a "trigram"; and size 4 or more is simply called an "n-gram".
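To make the definition concrete, here is a minimal Python sketch that lists the n-grams of a sequence. The function name `ngrams` is my own choice for illustration, and the sketch assumes the usual reading in which the n items are contiguous (adjacent in the original sequence).

```python
def ngrams(items, n):
    """Return every contiguous run of n items as a tuple, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of length L has L - n + 1 n-grams (none if L < n).
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


words = "serve as the index".split()

print(ngrams(words, 1))   # unigrams: one tuple per word
print(ngrams(words, 2))   # bigrams: pairs of adjacent words
print(ngrams("gram", 3))  # letter-level trigrams of the string "gram"
```

The same function covers both the word-level and the letter-level case because it only relies on slicing, so any list, tuple or string can be passed in.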
Usages:
n-grams are used in various areas of statistical natural language processing and genetic sequence analysis.
Examples:
The following are examples of word-level 3-grams and 4-grams from the Google n-gram corpus. The number in parentheses is the number of times each one appeared.
3-grams
  • ceramics collectables collectibles (55)
  • ceramics collectables fine (130)
  • ceramics collected by (52)
  • ceramics collectible pottery (50)
  • ceramics collectibles cooking (45)
4-grams
  • serve as the incoming (92)
  • serve as the incubator (99)
  • serve as the independent (794)
  • serve as the index (223)
  • serve as the indication (72)
  • serve as the indicator (120)
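Counts like the ones above can be produced by tallying every n-gram in a body of text. The following is a rough Python sketch of that idea. It is not how the Google corpus was actually built, the helper name `count_ngrams` is hypothetical, and the sample sentence is made up for illustration. It also assumes that splitting on whitespace is a good enough way to find words.

```python
from collections import Counter


def count_ngrams(text, n):
    """Count how often each word-level n-gram occurs in text."""
    # Assumption: lowercasing and whitespace splitting are adequate tokenization.
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


# Made-up sample text, not taken from any real corpus.
sample = "serve as the index and serve as the indicator"
counts = count_ngrams(sample, 3)

# Print the most frequent 3-gram in the same "words (count)" style as the lists above.
for gram, count in counts.most_common(1):
    print(" ".join(gram), f"({count})")
```

On a real corpus the text would be far too large to hold in a single string, so the counting would have to be done in pieces and the totals merged, but the tallying step stays the same.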
References:
1. Wikipedia: N-Gram
