From: jw schultz <jw@pegasys.ws>
To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [OT] Confirmation Spam Blocking was: List 'linux-dvb' closed to public posts
Date: Thu, 22 Jan 2004 14:18:02 -0800
Message-ID: <20040122221802.GD12666@pegasys.ws>
In-Reply-To: <40101B1E.3030908@blue-labs.org>

On Thu, Jan 22, 2004 at 01:49:02PM -0500, David Ford wrote:
> I've been amusing myself once or twice a week by studying some of these
> emails. Due to the use of common words just like your email below,
> bayesian score is far too low (granting it a negative point value in SA).
>
> The problem is that properly trained is too fluid. It'd be far more
> achievable if I only talked geek.. Or if I only talked automotive. Or
> that I only talked medical. However, my "vocabulary" is far too varied
> to train a bayesian filter that the use of medical terms, computer
> terms, or a given topic, is taboo.
>
> It cuts the gray area far too close to the middle of the road and thus
> makes marking the email as probable spam useless. All I'm doing now is
> wasting CPU because in the end I'm doing the job of dealing with the
> spam myself.

Most of the spam using that technique gets flagged by other
rules, so it scores at least 8, but I've been considering
writing a rule to catch it and raise the score. Bayes is the
wrong approach for those blocks of random dictionary words.

Those I've seen seem to be a long string of words, all longer
than 4 characters. A rule that assigned a score based on the
number of consecutive words longer than some number of
characters would catch those fairly easily. If I get
annoyed enough I may figure out how to write such a rule.
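
A minimal sketch of what such a rule might look like, written in
Python rather than as an actual SpamAssassin rule. The function
names, the run threshold, and the point values are all assumptions
made up for illustration; none of them come from an existing
ruleset, and they would need tuning against real mail.

```python
import re

# Hypothetical tuning knobs (assumptions, not from any real ruleset).
MIN_WORD_LEN = 5        # "longer than 4 characters"
RUN_THRESHOLD = 40      # runs at or below this length score nothing
POINTS_PER_WORD = 0.1   # points added per word beyond the threshold
MAX_SCORE = 5.0         # cap so one rule cannot dominate the total

# Words are runs of ASCII letters; punctuation and digits are ignored.
WORD_RE = re.compile(r"[A-Za-z]+")


def longest_long_word_run(text, min_len=MIN_WORD_LEN):
    """Return the length of the longest run of consecutive words
    that are each at least min_len characters long."""
    longest = 0
    current = 0
    for word in WORD_RE.findall(text):
        if len(word) >= min_len:
            current += 1
            longest = max(longest, current)
        else:
            # Any short word ends the current run.
            current = 0
    return longest


def long_word_run_score(text):
    """Map the longest run of long words to a capped spam score."""
    excess = longest_long_word_run(text) - RUN_THRESHOLD
    if excess <= 0:
        return 0.0
    return min(MAX_SCORE, excess * POINTS_PER_WORD)
```

With these made-up values, a block of 100 consecutive long words
hits the cap, while ordinary prose scores nothing because a single
short word anywhere resets the run.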

On the downside, once a rule to catch these random word lists
becomes common, the spammers will start salting the lists
with short common words. Then, when we get something that
somehow measures semantic content, they will shift to random
sentence construction and/or quotations.

What we need is a bounty on these scum. A $1000 fine per
reported recipient, with half going to the reporter, would be
nice.

> David Lang wrote:
>
> >On Thu, 22 Jan 2004, David Ford wrote:
> >
> >
> >>Considering that Bayesian filters are useless against the new spam that
> >>is proliferating these days, that's laughable. Spam now comes with a
> >>good 5-10K of random dictionary words.
> >>
> >>
> >so we need to extend the Bayesian filters to deal with multi-word combos,
> >how many legit mail has those dictionary words in them? properly trained
> >their presence should help identify the spam.
> >
> >not that you will ever see this (other than through the list) as I won't
> >respond to your confirmation message.
> >
> >David Lang
> >
> >
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw@pegasys.ws
Remember Cernan and Schmitt