sebsauvage

Bayesian filtering

Bayesian filtering is the latest buzzword in spam fighting. And it works very well indeed!

Reverend is a free Bayesian module for Python. You can download it from http://divmod.org/trac/wiki/DivmodReverend

Here's an example: Recognizing the language of a text.

First, train it on a few sentences:

    from reverend.thomas import Bayes
    guesser = Bayes()
    guesser.train('french', 'La souris est rentrée dans son trou.')
    guesser.train('english', 'my tailor is rich.')
    guesser.train('french', 'Je ne sais pas si je viendrai demain.')
    guesser.train('english', 'I do not plan to update my website soon.')

And now let it guess the language:

    print guesser.guess('Jumping out of cliffs it not a good idea.')
    [('english', 0.99990000000000001), ('french', 9.9999999999988987e-005)]

The Bayesian filter says: "It's English, with a 99.99% probability."

Let's try another one:

    print guesser.guess('Demain il fera très probablement chaud.')
    [('french', 0.99990000000000001), ('english', 9.9999999999988987e-005)]

It says: "It's French, with a 99.99% probability." Not bad, is it?

You can train it on even more languages at the same time. You can also train it to classify any kind of text.
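If you are curious about the idea behind this kind of guesser, here is a minimal sketch of a naive Bayes classifier in plain Python 3, using only the standard library. It is not Reverend's actual code, and Reverend may well combine word probabilities differently. The class name TinyBayes and everything inside it are made up for illustration; only the train/guess method names mirror the example above. It counts words per label, then scores a new text by multiplying the (smoothed) probability of each of its words under each label.

```python
import math
import re
from collections import Counter, defaultdict


class TinyBayes:
    """Toy multinomial naive Bayes classifier (illustration only, NOT Reverend's code)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> occurrences
        self.doc_counts = Counter()              # label -> number of training texts

    @staticmethod
    def tokenize(text):
        # Lowercase the text and split it into runs of word characters.
        return re.findall(r"\w+", text.lower())

    def train(self, label, text):
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def guess(self, text):
        words = self.tokenize(text)
        vocabulary = set()
        for counts in self.word_counts.values():
            vocabulary.update(counts)
        total_docs = sum(self.doc_counts.values())

        log_scores = {}
        for label, counts in self.word_counts.items():
            # log P(label) + sum of log P(word | label), with add-one smoothing
            # so that a word never seen for a label does not zero out the score.
            score = math.log(self.doc_counts[label] / total_docs)
            denominator = sum(counts.values()) + len(vocabulary)
            for word in words:
                score += math.log((counts[word] + 1) / denominator)
            log_scores[label] = score

        # Turn the log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return sorted(((label, w / norm) for label, w in weights.items()),
                      key=lambda pair: pair[1], reverse=True)


toy = TinyBayes()
toy.train('french', 'La souris est rentrée dans son trou.')
toy.train('english', 'my tailor is rich.')
toy.train('french', 'Je ne sais pas si je viendrai demain.')
toy.train('english', 'I do not plan to update my website soon.')

print(toy.guess('Demain il fera très probablement chaud.'))
```

With only four training sentences, this toy version still ranks 'french' first for that sentence, but its numbers will not match Reverend's. The scores are added as logarithms rather than multiplied directly, which avoids underflow on long texts.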