dspam

dspam is a seriously powerful tool that analyzes spam purely with statistical methods. After testing it, SpamAssassin looks like a joke by comparison…

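The basic idea: since it works statistically, at the start it naturally assumes everything is innocent (what we call ham in SA), and then you train it. So this is not software you can install and forget; you have to train it on some examples before it becomes useful. (Training is done with the dspam_corpus program.)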
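To make "starts out assuming innocent, then learns from examples" concrete, here is a toy token-statistics filter. It is a minimal naive-Bayes-style sketch, NOT dspam's actual algorithm; the class name `TinyStatFilter` and the `min_examples` threshold are hypothetical. Until both classes have seen enough training mail it simply answers "innocent", which mirrors the untrained behaviour described above.

```python
import math
import re
from collections import Counter


class TinyStatFilter:
    """Toy token-statistics spam filter (hypothetical, not dspam's algorithm)."""

    def __init__(self, min_examples=200):
        # Hypothetical threshold: never say "spam" until each class
        # has been trained on at least this many messages.
        self.min_examples = min_examples
        self.tokens = {"spam": Counter(), "innocent": Counter()}
        self.messages = {"spam": 0, "innocent": 0}

    @staticmethod
    def tokenize(text):
        # Each distinct lowercase word counts once per message.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def train(self, text, cls):
        # cls is "spam" or "innocent", as with dspam's --class option.
        self.messages[cls] += 1
        self.tokens[cls].update(self.tokenize(text))

    def log_odds(self, text):
        ns, ni = self.messages["spam"], self.messages["innocent"]
        score = math.log(ns / ni)  # prior odds of spam vs. innocent
        for tok in self.tokenize(text):
            # Laplace-smoothed share of each class's messages containing tok.
            ps = (self.tokens["spam"][tok] + 1) / (ns + 2)
            pi = (self.tokens["innocent"][tok] + 1) / (ni + 2)
            score += math.log(ps / pi)
        return score

    def classify(self, text):
        # Untrained (or under-trained): default to innocent, like a fresh dspam.
        if min(self.messages.values()) < self.min_examples:
            return "innocent"
        return "spam" if self.log_odds(text) > 0 else "innocent"


f = TinyStatFilter(min_examples=2)
print(f.classify("cheap pills"))  # nothing trained yet
f.train("buy cheap pills now", "spam")
f.train("cheap viagra pills", "spam")
f.train("meeting notes for monday", "innocent")
f.train("lunch on monday?", "innocent")
print(f.classify("cheap pills here"))
```

With only two examples per class the verdicts are meaningless in practice; the point is just the shape: count tokens per class, compare, and fall back to innocent when there is not enough data.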

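You could also run SpamAssassin periodically and feed the messages with very high scores (say, 20 points) into dspam for learning. I haven't done that part, since I've only just started playing with it :P But I did change the .muttrc settings I use to report false positives and false negatives: they used to train SpamAssassin, and now they train dspam: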
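The SpamAssassin-to-dspam feeding idea (which, as said above, I haven't actually set up) could look roughly like this sketch. Several things here are assumptions: that SpamAssassin has already added an `X-Spam-Status` header with `score=` (older releases wrote `hits=`), that your dspam build accepts `--source=corpus` for feeding known examples (check `dspam --help`), and the helper names `sa_score` and `feed_if_spammy`, which are hypothetical.

```python
import re
import subprocess

# Threshold from the text: treat SpamAssassin scores of 20 or more as spam.
SCORE_THRESHOLD = 20.0


def sa_score(headers):
    """Return the SpamAssassin score from an X-Spam-Status header, or None.

    Assumes a header like 'X-Spam-Status: Yes, score=23.4 required=5.0 ...';
    'hits=' (older SpamAssassin) is accepted too.
    """
    m = re.search(r"^X-Spam-Status:.*?\b(?:score|hits)=(-?\d+(?:\.\d+)?)",
                  headers, re.MULTILINE | re.IGNORECASE)
    return float(m.group(1)) if m else None


def dspam_corpus_command(user):
    # Assumption: --source=corpus is accepted by this dspam installation.
    return ["dspam", "--mode=toe", "--user", user,
            "--class=spam", "--source=corpus"]


def feed_if_spammy(message, user, run=subprocess.run):
    """Pipe a raw message (bytes) into dspam as spam if SA scored it high enough."""
    # Headers end at the first blank line (assumes '\n' line endings, as in mbox).
    header_block = message.split(b"\n\n", 1)[0].decode("latin-1")
    score = sa_score(header_block)
    if score is None or score < SCORE_THRESHOLD:
        return False
    run(dspam_corpus_command(user), input=message, check=True)
    return True
```

The `run` parameter is only there so the dspam call can be swapped out when trying the logic without dspam installed.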

# X: report a spam that dspam missed, then delete it
macro index X "<pipe-entry>dspam --mode=toe --user gslin --class=spam --source=error\n<delete-message>" "mark as spam"
macro pager X "<pipe-entry>dspam --mode=toe --user gslin --class=spam --source=error\n<delete-message>" "mark as spam"
# Z: report a legitimate message that dspam wrongly flagged (false positive)
macro index Z "<pipe-entry>dspam --mode=toe --user gslin --class=innocent --source=error\n" "mark as innocent"
macro pager Z "<pipe-entry>dspam --mode=toe --user gslin --class=innocent --source=error\n" "mark as innocent"
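The macros use `--mode=toe` (train-on-error): dspam only learns when you tell it, via `--source=error`, that a verdict was wrong. The toy sketch below illustrates that policy; `ToeTrainer` and its word-count scoring are hypothetical and much cruder than dspam's real statistics. Normal processing never changes the counts; only an error report does, which is what the X and Z keys trigger.

```python
from collections import Counter


class ToeTrainer:
    """Toy train-on-error policy (hypothetical illustration, not dspam code)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "innocent": Counter()}

    def classify(self, text):
        words = text.lower().split()
        spam = sum(self.counts["spam"][w] for w in words)
        ham = sum(self.counts["innocent"][w] for w in words)
        # Ties (including the untrained state) default to innocent.
        return "spam" if spam > ham else "innocent"

    def process(self, text):
        # Normal delivery in TOE mode: classify only, learn nothing.
        return self.classify(text)

    def report_error(self, text, correct_class):
        # What the X / Z macros do: the user says the verdict was wrong,
        # so learn the message under its correct class. This toy version
        # only adds counts; a real retraining scheme may also undo
        # earlier learning under the wrong class.
        self.counts[correct_class].update(text.lower().split())


t = ToeTrainer()
print(t.process("cheap pills"))       # untrained, so innocent
t.report_error("cheap pills", "spam")  # like pressing X in mutt
print(t.process("cheap pills"))
```

The appeal of TOE is that nothing is written to the statistics for correctly classified mail, which keeps training cheap; the cost is that the filter only improves as fast as you report mistakes.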

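This software is quite hard to install (even though ports already saves a lot of the work). It's hard to install not because the code is badly written but because the documentation is: bad enough that there are undocumented options (damn, how was I supposed to know?), so you have to subscribe to the mailing list or dig through the source code to find out about them. Still, since it's roughly ten or more times faster than SpamAssassin, it's really tempting :P (the actual numbers still need testing…)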

Anyway, when I have time I'll post installation and usage instructions for the single-user version so everyone can play with it :P

3 thoughts on “dspam”

goretex says:

November 16, 2004 at 10:39 am

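Oh, I installed it on my Trustix box too. Installation is a bit of a hassle: a plain ./configure ; make ; make install fails, but switching the storage driver to MySQL lets it install successfully. I've only just got it installed; I hope you can post some basic configuration or a few tips!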
joshua says:

December 30, 2004 at 11:01 am

How many examples do I need to train it on before it's enough?
Is it because I have too few examples that everything gets classified as innocent?
gslin says:

January 1, 2005 at 12:28 pm

It starts to become effective after roughly 200 of each. These are my current numbers:

gslin:
TS Total Spam: 28251
TI Total Innocent: 73883
SM Spam Misclassified: 752
IM Innocent Misclassified: 545
SC Spam Corpusfed: 560
IC Innocent Corpusfed: 1707
TL Training Left: 0
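To put those counters in perspective, here is a rough error-rate calculation. It assumes that TS and TI are the totals of each true class and that SM (spam misclassified as innocent) and IM (innocent misclassified as spam) are errors counted within those totals; dspam's own dspam_stats tool may compute its percentages differently, so treat this as back-of-the-envelope arithmetic.

```python
# Counters copied from the dspam statistics quoted above (user gslin).
stats = {"TS": 28251, "TI": 73883, "SM": 752, "IM": 545,
         "SC": 560, "IC": 1707, "TL": 0}


def error_rates(s):
    """Rough error rates, assuming SM/IM are errors inside the TS/TI totals."""
    total = s["TS"] + s["TI"]
    return {
        "missed_spam": s["SM"] / s["TS"],      # spam delivered as innocent
        "false_positive": s["IM"] / s["TI"],   # innocent mail flagged as spam
        "overall_error": (s["SM"] + s["IM"]) / total,
    }


for name, rate in error_rates(stats).items():
    print(f"{name}: {rate:.2%}")
```

Under that assumption this prints a missed-spam rate of 2.66%, a false-positive rate of 0.74%, and an overall error rate of 1.27%.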


Gea-Suan Lin's BLOG


Gea-Suan Lin's technical note and murmuring :)