git.vger.kernel.org archive mirror
* Bogofilter is my emails
@ 2006-09-03  4:02 Shawn Pearce
  2006-09-03 22:53 ` Davide Libenzi
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Shawn Pearce @ 2006-09-03  4:02 UTC (permalink / raw)
  To: git

Bogofilter ate two messages today that I tried to send to this
mailing list.  At least I got back bounce messages from it.

I'm not quite sure how to fix either message to get them to the list.
Neither email was a patch so I'm not going to try resending them
but I'm certainly a little curious as to how my email writing style
twice tripped bogofilter's spam switch.

-- 
Shawn.

-- 
VGER BF report: S 1


* Re: Bogofilter is my emails
  2006-09-03  4:02 Bogofilter is my emails Shawn Pearce
@ 2006-09-03 22:53 ` Davide Libenzi
  2006-09-03 23:09 ` Linus Torvalds
  2006-09-04  5:46 ` Martin Langhoff
  2 siblings, 0 replies; 5+ messages in thread
From: Davide Libenzi @ 2006-09-03 22:53 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git

On Sun, 3 Sep 2006, Shawn Pearce wrote:

> Bogofilter ate two messages today that I tried to send to this
> mailing list.  At least I got back bounce messages from it.
>
> I'm not quite sure how to fix either message to get them to the list.
> Neither email was a patch so I'm not going to try resending them
> but I'm certainly a little curious as to how my email writing style
> twice tripped bogofilter's spam switch.

Maybe Matti trained the filter using the lkml corpus. Just sprinkle a few 
linux-kernel hot words in your posts ;)



- Davide



-- 
VGER BF report: U 0.926352


* Re: Bogofilter is my emails
  2006-09-03  4:02 Bogofilter is my emails Shawn Pearce
  2006-09-03 22:53 ` Davide Libenzi
@ 2006-09-03 23:09 ` Linus Torvalds
  2006-09-04  1:23   ` Junio C Hamano
  2006-09-04  5:46 ` Martin Langhoff
  2 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2006-09-03 23:09 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git



On Sun, 3 Sep 2006, Shawn Pearce wrote:
> 
> I'm not quite sure how to fix either message to get them to the list.
> Neither email was a patch so I'm not going to try resending them
> but I'm certainly a little curious as to how my email writing style
> twice tripped bogofilter's spam switch.

I'm surprised and disgusted that vger started using bogofilter.

Last I saw, the bogofilter approach was totally bogus, using purely 
single-word frequencies (or, more strictly, a "does word X exist or not", 
where X has often gone through what a linguist would probably call a 
"lemmatizer", ie something that turns different forms of the same word 
into its canonical word, aka "lemma") for its "bayesian" filtering.

Maybe they've enhanced it enough since, but it certainly used to be 
fairly easy to fool, since it at least originally didn't take any 
account at all of any more complex structure.

There are even some papers about how the bayesian thing does not work well 
(even when extended to do some phrases and with lemmatization) if the 
cut-off is hard.
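
As a rough illustration of the approach described above (this is not
bogofilter's actual code or scoring formula), a presence-only word filter
with a hard cut-off might look like the Python sketch below. The tokenizer,
the add-one smoothing, the function names and the 0.5 cut-off are all
assumptions made for the example, and there is no lemmatizer.

```python
import math
import re


def tokens(text):
    # Presence only: a set, so a word repeated ten times counts once.
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def train(messages):
    # messages: iterable of (text, is_spam) pairs.
    counts = {True: {}, False: {}}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        totals[is_spam] += 1
        for tok in tokens(text):
            counts[is_spam][tok] = counts[is_spam].get(tok, 0) + 1
    return counts, totals


def spam_score(text, model):
    counts, totals = model
    # Start from the class prior, then add each word's log-odds independently.
    log_odds = math.log((totals[True] + 1) / (totals[False] + 1))
    for tok in tokens(text):
        p_spam = (counts[True].get(tok, 0) + 1) / (totals[True] + 2)
        p_ham = (counts[False].get(tok, 0) + 1) / (totals[False] + 2)
        log_odds += math.log(p_spam / p_ham)
    log_odds = max(-50.0, min(50.0, log_odds))  # keep exp() in range
    return 1 / (1 + math.exp(-log_odds))  # 0.0 = ham, 1.0 = spam


def is_spam(text, model, cutoff=0.5):
    # Hard cut-off (hypothetical value): one threshold, no other evidence.
    return spam_score(text, model) >= cutoff


model = train([
    ("buy cheap pills now", True),
    ("cheap pills cheap offer", True),
    ("git merge patch review", False),
    ("please review this patch", False),
])
```

Because every word is scored on its own, padding a spammy message with
words that are common in list traffic pulls the score down, which is the
"easy to fool" weakness.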

I think the bogofilter is probably an acceptable input as _one_ of many 
rules for a real spam-filter (ie as one of many spamassassin rules), but 
not for what vger does.

Hard rules at mail acceptance are much better if they use some really hard 
datum. For example, checking that the sending site actually also receives 
email, and that it resolves back to itself. That's one thing that OSDL 
does, for example, and it means that you can only send me email if your 
machine is actually designated as a MX gateway. That cuts down on a _lot_ 
of spam.

(I'd love to speak of the details, but I wouldn't know. Kees Cook set it 
all up at osdl, and I can just say that it works beautifully.)
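
The thread gives no detail on the OSDL setup, so the following is only a
guess at the "resolves back to itself" part: a forward-confirmed reverse
DNS check in Python. The function name and the injectable `reverse` and
`forward` parameters are hypothetical, the lookup is IPv4-only, and the MX
test is left out because the standard library has no MX query.

```python
import socket


def resolves_back_to_itself(ip,
                            reverse=socket.gethostbyaddr,
                            forward=socket.gethostbyname_ex):
    """Return True if ip -> hostname -> addresses leads back to ip."""
    try:
        # Reverse (PTR) lookup: which name does the sending IP claim?
        hostname, _aliases, _addrs = reverse(ip)
        # Forward (A) lookup: does that name really map to this IP?
        _name, _aliases, addresses = forward(hostname)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False
    return ip in addresses
```

A mail server would call this with the connecting peer's address at
acceptance time; passing fake lookup functions makes it testable without a
network.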

		Linus

-- 
VGER BF report: U 0.720981


* Re: Bogofilter is my emails
  2006-09-03 23:09 ` Linus Torvalds
@ 2006-09-04  1:23   ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2006-09-04  1:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> On Sun, 3 Sep 2006, Shawn Pearce wrote:
>> 
>> I'm not quite sure how to fix either message to get them to the list.
>> Neither email was a patch so I'm not going to try resending them
>> but I'm certainly a little curious as to how my email writing style
>> twice tripped bogofilter's spam switch.
>
> I'm surprised and disgusted that vger started using bogofilter.
>...
> 		Linus
>
> -- 
> VGER BF report: U 0.720981

Hehe, and it still considers that your message is highly spammy
(1.0 being definite spam, 0.0 being definite ham, if I am not
mistaken).

The filter at least should learn to recognize "^\t\tLinus$" ;-).
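
For what it's worth, that pattern needs multi-line matching to hit a
sign-off line inside a message body. A small Python sketch (the helper name
is made up for the example):

```python
import re

# ^ and $ match at line boundaries only with re.MULTILINE.
SIGNOFF = re.compile(r"^\t\tLinus$", re.MULTILINE)


def has_linus_signoff(body):
    # True if some line is exactly two tabs followed by "Linus".
    return SIGNOFF.search(body) is not None
```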


-- 
VGER BF report: U 0.669453


* Re: Bogofilter is my emails
  2006-09-03  4:02 Bogofilter is my emails Shawn Pearce
  2006-09-03 22:53 ` Davide Libenzi
  2006-09-03 23:09 ` Linus Torvalds
@ 2006-09-04  5:46 ` Martin Langhoff
  2 siblings, 0 replies; 5+ messages in thread
From: Martin Langhoff @ 2006-09-04  5:46 UTC (permalink / raw)
  To: git

On 9/3/06, Shawn Pearce <spearce@spearce.org> wrote:
> I'm not quite sure how to fix either message to get them to the list.
> Neither email was a patch so I'm not going to try resending them

Well, it's just eaten 2 trivial patches of mine. Grumble.

I am resending them via a different smtp host on the assumption that
the rules may be blocking emails sent via localhost on the initial hop,
or that some other smtp-routing-related rule is involved.

Discussion however seems to imply that bogofilter is only
email-content based? The vger mta admins need to get a grip on 2006
and use a combination of weighted rules to play the spam blocking game
with at least some hope.

Simple rulesets nowadays are triggered by ham way more often than spam.
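
To make "a combination of weighted rules" concrete, here is a minimal
Python sketch. The rules, weights, message fields and threshold are all
invented for illustration and are not taken from spamassassin or from
anything vger runs.

```python
def weighted_score(message, rules):
    # Sum the weights of every rule whose test fires on this message.
    return sum(weight for test, weight in rules if test(message))


# Hypothetical rules: (test on a message dict, weight).
RULES = [
    (lambda m: m["bayes"] >= 0.9, 2.0),   # content filter says "spammy"
    (lambda m: not m["rdns_ok"], 1.5),    # sender fails a DNS sanity check
    (lambda m: m["is_patch"], -1.0),      # looks like a patch: likely ham
]
THRESHOLD = 3.0  # hypothetical rejection threshold


def rejected(message, rules=RULES, threshold=THRESHOLD):
    return weighted_score(message, rules) >= threshold
```

With these made-up numbers a high content score alone is not enough to
bounce a message; it takes a second, independent signal.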

cheers,



martin

-- 
VGER BF report: U 0.550887

