* Re: [mlmmj] Potential mail loss in postfix?
2010-09-28 23:21 [mlmmj] Potential mail loss in postfix? Robin H. Johnson
@ 2010-11-11 3:58 ` Ben Schmidt
2010-11-11 4:55 ` Ben Schmidt
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Ben Schmidt @ 2010-11-11 3:58 UTC
To: mlmmj
I might have found this bug.
If init_sockfd() fails (e.g. because Postfix has shut down so there is no smtpd
listening) it calls exit(). Mail would then fail to be archived or requeued. It
will be in a queue file only until mlmmj-maintd cleans it up (which it will do as
soon as it finds it, as it won't have accompanying .mailfrom etc. files).
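A minimal sketch of the alternative, where the connect helper reports failure to its caller instead of calling exit(), so the caller still has a chance to keep the mail for a later retry. This is not mlmmj's actual code: try_connect(), next_action() and the send_action enum are hypothetical names used only for illustration, and the real init_sockfd() signature is not shown in this thread.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a connected socket, or -1 with errno set.
 * Unlike the behaviour described above, it never calls exit(). */
int try_connect(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    int fd;

    assert(ip != NULL);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        errno = EINVAL;          /* not a valid IPv4 address string */
        return -1;
    }

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;               /* the "Could not get socket" case */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int saved = errno;       /* e.g. ECONNREFUSED when no smtpd listens */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Hypothetical decision the caller can make once failure is reported. */
enum send_action { SEND_NOW, KEEP_FOR_RETRY };

enum send_action next_action(int sockfd)
{
    return sockfd >= 0 ? SEND_NOW : KEEP_FOR_RETRY;
}
```

A caller would do something like next_action(try_connect("127.0.0.1", 25)) and, on KEEP_FOR_RETRY, leave the queue file and its companion files in place rather than exiting.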
Do you have logs from when this happened? Do you see "Could not get socket" or
"Could not connect to %s, exiting..." (%s probably is 127.0.0.1) in them?
Ben.
On 29/09/10 9:21 AM, Robin H. Johnson wrote:
> Hi
>
> Noticed something, and unfortunately I don't have a test case for it yet
> or a suitable setup to re-test on. Instead I've got my analysis of the
> problem, which has now occurred twice.
>
> - Using verp and postfix together first of all (string 'postfix' in the
> verp file, '100' in maxverprecips).
> - Pick a list with a lot of subscribers.
> - This leads to a case where the mlmmj-send invocation takes several
> minutes to complete for a normal list mail.
> - (optional) set postfix to hold incoming mail, and you can release it
> just at the right moment to see it delivered to mlmmj-recieve.
> - The postfix log will show delivery to mlmmj-recieve.
> - mlmmj.operation.log will contain a line from mlmmj-process stating
> that the message was allowed (by your access rules).
> - Now, while mlmmj-send is running, you're going to execute a normal
> shutdown of postfix: 'postfix stop' [1]
> - The mail will be lost completely now. There is no record of it in
> archive, or any of the queues :-(.
>
> [1] The description for 'postfix stop': Stop the Postfix mail system in
> an orderly fashion. If possible, running processes are allowed to
> terminate at their earliest convenience.
>
* Re: [mlmmj] Potential mail loss in postfix?
From: Ben Schmidt @ 2010-11-11 4:55 UTC
To: mlmmj
On 11/11/10 2:58 PM, Ben Schmidt wrote:
> I might have found this bug.
>
> If init_sockfd() fails (e.g. because Postfix has shut down so there is no smtpd
> listening) it calls exit(). Mail would then fail to be archived or requeued. It
> will be in a queue file only until mlmmj-maintd cleans it up (which it will do as
> soon as it finds it, as it won't have accompanying .mailfrom etc. files).
>
> Do you have logs from when this happened? Do you see "Could not get socket" or
> "Could not connect to %s, exiting..." (%s probably is 127.0.0.1) in them?
"Could not connect to %s, exiting ..."
Omitted a space before. Correcting myself, just in case you search for
that part of the string and don't find it because of my error. :-)
> Ben.
>
>
>
> On 29/09/10 9:21 AM, Robin H. Johnson wrote:
>> Hi
>>
>> Noticed something, and unfortunately I don't have a test case for it yet
>> or a suitable setup to re-test on. Instead I've got my analysis of the
>> problem, which has now occurred twice.
>>
>> - Using verp and postfix together first of all (string 'postfix' in the
>> verp file, '100' in maxverprecips).
>> - Pick a list with a lot of subscribers.
>> - This leads to a case where the mlmmj-send invocation takes several
>> minutes to complete for a normal list mail.
>> - (optional) set postfix to hold incoming mail, and you can release it
>> just at the right moment to see it delivered to mlmmj-recieve.
>> - The postfix log will show delivery to mlmmj-recieve.
>> - mlmmj.operation.log will contain a line from mlmmj-process stating
>> that the message was allowed (by your access rules).
>> - Now, while mlmmj-send is running, you're going to execute a normal
>> shutdown of postfix: 'postfix stop' [1]
>> - The mail will be lost completely now. There is no record of it in
>> archive, or any of the queues :-(.
>>
>> [1] The description for 'postfix stop': Stop the Postfix mail system in
>> an orderly fashion. If possible, running processes are allowed to
>> terminate at their earliest convenience.
>>
>
>
>
* Re: [mlmmj] Potential mail loss in postfix?
From: Robin H. Johnson @ 2010-11-11 5:12 UTC
To: mlmmj
(No need to CC me, just send to the list)
On Thu, Nov 11, 2010 at 03:55:06PM +1100, Ben Schmidt wrote:
> On 11/11/10 2:58 PM, Ben Schmidt wrote:
> > I might have found this bug.
> >
> > If init_sockfd() fails (e.g. because Postfix has shut down so there is no smtpd
> > listening) it calls exit(). Mail would then fail to be archived or requeued. It
> > will be in a queue file only until mlmmj-maintd cleans it up (which it will do as
> > soon as it finds it, as it won't have accompanying .mailfrom etc. files).
> >
> > Do you have logs from when this happened? Do you see "Could not get socket" or
> > "Could not connect to %s, exiting..." (%s probably is 127.0.0.1) in them?
>
> "Could not connect to %s, exiting ..."
>
> Omitted a space before. Correcting myself, just in case you search for
> that part of the string and don't find it because of my error. :-)
I don't find it in the last month of syslog or mlmmj logfiles
(incidentally, would be really nice to have them go to syslog...).
I've left a much larger trawl of syslog data for that box running, I'll
check for any hits in the morning (~120GiB worth of logs takes a
while...).
--
Robin Hugh Johnson
Gentoo Linux: Developer, Trustee & Infrastructure Lead
E-Mail : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
* Re: [mlmmj] Potential mail loss in postfix?
From: Ben Schmidt @ 2010-11-11 12:15 UTC
To: mlmmj
On 11/11/10 4:12 PM, Robin H. Johnson wrote:
> (No need to CC me, just send to the list)
I'll try to remember that.
> I don't find it in the last month of syslog or mlmmj logfiles
> (incidentally, would be really nice to have them go to syslog...).
Yes. That'll be one of my highest priorities after getting 1.2.18 out.
> I've left a much larger trawl of syslog data for that box running, I'll
> check for any hits in the morning (~120GiB worth of logs takes a
> while...).
Ta.
Ben.
* Re: [mlmmj] Potential mail loss in postfix?
From: Robin H. Johnson @ 2010-11-11 21:13 UTC
To: mlmmj
On Thu, Nov 11, 2010 at 11:15:06PM +1100, Ben Schmidt wrote:
> On 11/11/10 4:12 PM, Robin H. Johnson wrote:
> > (No need to CC me, just send to the list)
>
> I'll try to remember that.
>
> > I don't find it in the last month of syslog or mlmmj logfiles
> > (incidentally, would be really nice to have them go to syslog...).
>
> Yes. That'll be one of my highest priorities after getting 1.2.18 out.
>
> > I've left a much larger trawl of syslog data for that box running, I'll
> > check for any hits in the morning (~120GiB worth of logs takes a
> > while...).
Confirmed, here's a line from around when I sent the first email in this
thread:
Sep 28 21:35:38 pigeon /usr/bin/mlmmj-send[6681]: init_sockfd.c:55: Could not connect to 127.0.0.1, exiting ... : Connection refused
It happened 1333 times, in the span of 21:35:31 to 21:35:38 (all UTC)
that day.
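For anyone repeating this kind of trawl, here is a rough sketch of counting log lines that contain the error text. It is only an illustration: count_matching_lines() and line_contains() are made-up helper names, not part of mlmmj, and the search is a plain substring match. Note that the needle has to use the exact spacing from the log ("exiting ...", with a space before the dots), or nothing will match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the first `len` bytes of `line` contain `needle`, else 0. */
int line_contains(const char *line, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    size_t i;

    if (n == 0)
        return 1;
    if (n > len)
        return 0;
    for (i = 0; i + n <= len; i++)
        if (memcmp(line + i, needle, n) == 0)
            return 1;
    return 0;
}

/* Count the newline-separated lines of `log` that contain `needle`.
 * Each line is counted at most once, however often the needle appears. */
size_t count_matching_lines(const char *log, const char *needle)
{
    size_t hits = 0;
    const char *line = log;

    assert(log != NULL && needle != NULL);

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (line_contains(line, len, needle))
            hits++;
        if (end == NULL)
            break;               /* last line had no trailing newline */
        line = end + 1;
    }
    return hits;
}
```

Fed the contents of a log file as one string, count_matching_lines(text, "Could not connect to") gives the number of affected lines.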
--
Robin Hugh Johnson
Gentoo Linux: Developer, Trustee & Infrastructure Lead
E-Mail : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85