From: David Woodhouse <dwmw2@infradead.org>
To: Johannes Berg <johannes@sipsolutions.net>
Cc: David Miller <davem@davemloft.net>,
torvalds@linux-foundation.org, marcel@holtmann.org,
sfeldma@gmail.com, netdev@vger.kernel.org, teg@jklm.no
Subject: Re: Problem with patch "make nlmsg_end() and genlmsg_end() void"
Date: Tue, 09 Jun 2015 14:34:24 +0100
Message-ID: <1433856864.19447.25.camel@infradead.org>
In-Reply-To: <1428498482.2809.10.camel@sipsolutions.net>
On Wed, 2015-04-08 at 15:08 +0200, Johannes Berg wrote:
> On Wed, 2015-04-08 at 13:03 +0100, David Woodhouse wrote:
>
> > I'm not sure if this is entirely fixed. In Fedora 22 (4.0.0-rc5-git4)
> > I'm occasionally seeing glibc deadlock in __check_pf() on a netlink
> > recvmsg(), here:
> > https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/check_pf.c;h=162606d7;hb=glibc-2.21#l166
> >
> > As I understand it, this shouldn't happen. Even if messages are
> > dropped (which surely shouldn't happen as often as I'm seeing this),
> > glibc should get ENOBUFS from the recvmsg() call.
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1209433
> >
> > I haven't bisected and proved that it *was* this commit which
> > introduced the problem, as it only happens after a day or two of
> > running Evolution and I haven't managed to trigger it more reliably.
>
> I don't see the connection to this change.
>
> The issue with my patch was that some code for NLM_F_DUMP would have
> this pattern:
>
> int fill_function(...)
> {
>         ...
>         return nlmsg_end(...);
> }
>
> loop (...) {
>         if (fill_function() <= 0)
>                 break; /* continue in next dump */
> }
>
> and that all had to be converted to be just "< 0" now.
>
> Additionally, the failure mode of this was the process running out of
> memory due to receiving the same results over and over again - does that
> happen for you? It seems it was stuck in recvmsg(), but that may just be
> a side effect of happening to interrupt at that point?
>
I don't think the problem was introduced by your change. At
https://github.com/nahi/httpclient/issues/232 it seems to have been
observed even in November of last year.
I've added some debugging, and it seems that when it deadlocks, glibc
doesn't get *any* response to its RTM_GETADDR request. I know we'd get
ENOBUFS if a *response* was dropped... but what about when the request
itself is dropped? Does userspace get any hint of that? Is this purely
a glibc bug, for assuming its request got delivered and unconditionally
waiting for a response?
I don't know why it suddenly started happening to me in the 4.0 kernel
when I'd never seen it before, but it's still happening. I've put a
poll() in the glibc code (referenced above), and made it fail after a
5-second timeout. That will at least prevent me from throwing my
computer out the window for the time being...
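For illustration, a poll()-before-receive guard of the kind described above
might look like the sketch below. This is not the actual change made to
glibc's check_pf.c: the helper name recv_with_timeout() and the choice to
report a timeout as ETIMEDOUT are assumptions made for this example.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not glibc code): wait up to timeout_ms for fd to
 * become readable, then read one message from it.
 * Returns the number of bytes read, or -1 with errno set; a timeout is
 * reported as ETIMEDOUT (a convention chosen for this sketch).
 */
static ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready;

    /* Retry if a signal interrupts poll(); note that each retry
     * restarts the full timeout, which is acceptable for a sketch. */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready < 0)
        return -1;          /* poll() itself failed; errno is already set */
    if (ready == 0) {
        errno = ETIMEDOUT;  /* nothing arrived within the timeout */
        return -1;
    }

    /* Socket is readable or has a pending error; recv() reports either. */
    return recv(fd, buf, len, 0);
}
```

With a guard like this, a caller that never receives a reply to its request
gets an error after the timeout instead of blocking forever; for the case in
this thread the timeout argument would be 5000 ms.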
--
dwmw2