Bigrams and Trigrams and Tokens - Oh My!


This post is for the SEOs, word geeks and other webrepreneurs who find their way here.

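Here's how the Linguistic Data Consortium at the University of Pennsylvania (http://www.ldc.upenn.edu) describes the new data set: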

“This data set, contributed by Google Inc., contains English word n-grams and their observed frequency counts. The length of the n-grams ranges from unigrams (single words) to five-grams. We expect this data will be useful for statistical language modeling, e.g., for machine translation or speech recognition, as well as for other uses.”
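To make the terms concrete: an n-gram is just a run of n consecutive words, and the data set pairs each one with the number of times it was seen. Here's a minimal Python sketch of that idea on a single sentence. The tokenizer is a deliberately simple assumption of mine (lowercase letters, digits and apostrophes); it is not how Google tokenized its corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Very simple tokenizer (an assumption for illustration): lowercase words and digits."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(text, n):
    """Count how often each n-gram of length n occurs in text."""
    return Counter(ngrams(tokenize(text), n))


# Example: unigrams, bigrams and trigrams of one short sentence.
# sample = "the cat sat on the mat and the cat slept"
# count_ngrams(sample, 2).most_common(3)
```

Run it over billions of words of web text instead of one sentence and you have, in spirit, the kind of table Google handed over.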

Personally, I'm interested in this data for the "other uses" they mention. I might have to ask Santa Claus to send me the bigram, trigram and four-gram lists. If you have these lists, send me an email and tell me how stoked you are.

For those of you still not sure what I'm rambling on about: simply put, this is a huge list of commonly used words and phrases, each with a count of how often it appears, compiled and released by Google. For keyword research the value of this data is pretty amazing, and I'll leave it at that.
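If you do get your hands on the lists, here's one way an SEO might mine them: pull out every phrase that contains a seed keyword and rank the results by count. This is a rough sketch that assumes each line holds an n-gram, a tab, then its count; check the real file layout before relying on it. The function name `keyword_phrases` and the `trigrams.txt` filename are hypothetical.

```python
from typing import Iterable, List, Tuple


def keyword_phrases(lines: Iterable[str], seed: str, min_count: int = 0) -> List[Tuple[str, int]]:
    """Return (phrase, count) pairs whose words include `seed`, most frequent first.

    Assumes (hypothetically) that each line looks like "some n gram<TAB>count".
    """
    seed = seed.lower()
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        # Split on the last tab: everything before it is the phrase, after it the count.
        phrase, _, count_text = line.rpartition("\t")
        if not phrase:
            continue  # skip blank lines and lines without a tab separator
        count = int(count_text)
        if count >= min_count and seed in phrase.lower().split():
            hits.append((phrase, count))
    # Most frequent phrases first.
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits


# Usage with a (hypothetical) trigram file:
# with open("trigrams.txt", encoding="utf-8") as f:
#     top = keyword_phrases(f, "shoes", min_count=1000)[:20]
```

Because it reads the file line by line, it never has to load a whole list into memory, which matters with files this size.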

Enjoy & type-atcha-later,

Sean

Posted in The Internet

