public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Glibc recvmsg from kernel netlink socket hangs forever
@ 2015-09-25  0:29 Steven Schlansker
  2015-09-25  4:36 ` Guenter Roeck
  0 siblings, 1 reply; 16+ messages in thread
From: Steven Schlansker @ 2015-09-25  0:29 UTC (permalink / raw)
  To: linux-kernel

Hello linux-kernel,

I write to you on behalf of many developers at my company, who
are having trouble with their applications endlessly locking up
inside of libc code, with no hope of recovery.

Currently it affects our Mono and Node processes mostly, and the
symptoms are the same:  user code invokes getaddrinfo, and libc
attempts to determine whether ipv4 or ipv6 is appropriate, by using
the RTM_GETADDR netlink message.  The write into the netlink socket
succeeds, and it immediately reads back the results ... and waits
forever.  The read never returns.  The stack looks like this:

#0  0x00007fd7d8d214ad in recvmsg () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fd7d8d3e44d in make_request (fd=fd@entry=13, pid=1) at ../sysdeps/unix/sysv/linux/check_pf.c:177
#2  0x00007fd7d8d3e9a4 in __check_pf (seen_ipv4=seen_ipv4@entry=0x7fd7d37fdd00, seen_ipv6=seen_ipv6@entry=0x7fd7d37fdd10, 
    in6ai=in6ai@entry=0x7fd7d37fdd40, in6ailen=in6ailen@entry=0x7fd7d37fdd50) at ../sysdeps/unix/sysv/linux/check_pf.c:341
#3  0x00007fd7d8cf64e1 in __GI_getaddrinfo (name=0x31216e0 "mesos-slave4-prod-uswest2.otsql.opentable.com", service=0x0, 
    hints=0x31216b0, pai=0x31f09e8) at ../sysdeps/posix/getaddrinfo.c:2355
#4  0x0000000000e101c8 in uv__getaddrinfo_work (w=0x31f09a0) at ../deps/uv/src/unix/getaddrinfo.c:102
#5  0x0000000000e09179 in worker (arg=<optimized out>) at ../deps/uv/src/threadpool.c:91
#6  0x0000000000e16eb1 in uv__thread_start (arg=<optimized out>) at ../deps/uv/src/unix/thread.c:49
#7  0x00007fd7d8ff3182 in start_thread (arg=0x7fd7d37fe700) at pthread_create.c:312
#8  0x00007fd7d8d2047d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

(libuv is part of Node and makes DNS lookups "asynchronous" by having
a thread pool in the background working)

The applications will run for hours or days successfully, until eventually hanging with
no apparent pattern or cause.  And once this hang happens it hangs badly, because
check_pf is holding a lock during the problematic recvmsg call.

I raised this issue on the libc-help mailing list, but I'm hoping that lkml will
have a higher number of people familiar with netlink that may better offer advice.
The original thread is here:
https://sourceware.org/ml/libc-help/2015-09/msg00014.html

Looking at the getaddrinfo / check_pf source code:
https://fossies.org/dox/glibc-2.22/sysdeps_2unix_2sysv_2linux_2check__pf_8c_source.html

146  if (TEMP_FAILURE_RETRY (__sendto (fd, (void *) &req, sizeof (req), 0,
147      (struct sockaddr *) &nladdr,
148      sizeof (nladdr))) < 0)
149    goto out_fail;
150 
151  bool done = false;
152 
153  bool seen_ipv4 = false;
154  bool seen_ipv6 = false;
155 
156  do
157  {
158    struct msghdr msg =
159    {
160      (void *) &nladdr, sizeof (nladdr),
161      &iov, 1,
162      NULL, 0,
163      0
164    };
165 
166  ssize_t read_len = TEMP_FAILURE_RETRY (__recvmsg (fd, &msg, 0));
167  if (read_len <= 0)
168    goto out_fail;
169 
170  if (msg.msg_flags & MSG_TRUNC)
171    goto out_fail;
172 

I notice that there is possibility that if messages are dropped either on send
or receive side, maybe this code will hang forever?  The netlink(7) man page makes
me slightly worried:

> Netlink is not a reliable protocol.  It tries its best to deliver a message to its destination(s), but may drop messages when an out-of-memory  condition  or  other error occurs.
> However,  reliable  transmissions from kernel to user are impossible in any case.  The kernel can't send a netlink message if the socket buffer is full: the message will be dropped and the kernel and the user-space process will no longer have the same view of kernel state.  It is up to the application to detect when  this  happens (via the ENOBUFS error returned by recvmsg(2)) and resynchronize.


I have taken the glibc code and created a simple(r) program to attempt to reproduce this issue.
I inserted some simple polling between the sendto and recvmsg calls to make the failure case more evident:

      struct pollfd pfd;
      pfd.fd = fd;
      pfd.events = POLLIN;
      pfd.revents = 0;

      int pollresult = poll(&pfd, 1, 1000);
      if (pollresult < 0) {
        perror("glibc: check_pf: poll");
        abort();
      } else if (pollresult == 0 || pfd.revents & POLLIN == 0) {
        fprintf(stderr, "[%ld] glibc: check_pf: netlink socket read timeout\n", gettid());
        abort();
      }

I have placed the full source code and strace output here:
https://gist.github.com/stevenschlansker/6ad46c5ccb22bc4f3473

The process quickly sends off hundreds of threads which sit in a
loop attempting this RTM_GETADDR message exchange.

The code may be compiled as "gcc -o pf_dump -pthread pf_dump.c"

An example invocation that quickly fails:

root@24bf2e440b5e:/# strace -ff -o pfd ./pf_dump 
[3700] exit success
glibc: check_pf: netlink socket read timeout
Aborted (core dumped)

Interestingly, this seems to be very easy to reproduce using pthreads, but much less
common with fork() or clone()d threads.  I'm not sure if this is just an artifact
of how I am testing or an actual clue, but I figured I'd mention it.

I have tested this program on vanilla kernels 4.0.4 and 4.2.1 -- the 4.0.4 version
reliably crashes, but I am having trouble reproducing on 4.2.1

So usually I would upgrade to 4.2.1 and be happy, except we ran into serious problems
with 4.1.2 and are now a little shy about upgrading:

https://bugzilla.xamarin.com/show_bug.cgi?id=29212

So my questions from here are:

* Is this glibc code correct?
* What are the situations where a recvmsg from a netlink socket can hang as it does here?
* Is the potential "fix" in 4.2.1 due to any particular commit?  I checked the changelogs and nothing caught my eye.

We'll be testing out 4.2.1 more thoroughly over the coming days but I am hoping someone here
can shed some light on our problem.

Thanks for reading,
Steven


^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: Glibc recvmsg from kernel netlink socket hangs forever
@ 2016-03-11 19:33 Jun Wang
  2016-03-11 23:50 ` Guenter Roeck
  0 siblings, 1 reply; 16+ messages in thread
From: Jun Wang @ 2016-03-11 19:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: Guenter Roeck, Herbert Xu, Cong Wang

> On 09/25/2015 08:55 AM, Herbert Xu wrote:
>> On Thu, Sep 24, 2015 at 10:34:10PM -0700, Guenter Roeck wrote:
>>>
>>> Any idea what may be needed for 4.1 ?
>>> I am currently trying https://patchwork.ozlabs.org/patch/473041/,
>>
>> This patch should not make any difference on 4.1 and later because
>> 4.1 is where I rewrote rhashtable resizing and it should work (or
>> if it is broken then the latest kernel should be broken too).
>>
> Yes, applying (only) the above patch to 4.1 didn't help.
>
>>> but I have no idea if that will help with the problem we are seeing there.
>>
>> Having looked at your message agin I don't think the issue I
>> alluded to is relevant since the symptom there ought to be a
>> straight kernel lock-up as opposed to just a user-space one because
>> you will end up with the kernel sending a message to itself.
>>
>> And the fact that 4.2 works is more indicative as the bug is
>> present in both 4.1 and 4.2.
>>
>> I'll try to reproduce this in 4.1 as time permits but no promises.
>>
>
> I applied your patches (and a few additional netlink changes from 4.2)
> to our 4.1 branch. I'll let you know if it makes a difference for us.
>
> Thanks,
> Guenter

Guenter,

Which additional netlink changes from 4.2 did you patch? We still see
the problem with your test program with 4.1.12 which have the
following two patches mentioned by Herbert Xu on this thread.


> https://patchwork.ozlabs.org/patch/519245/
> https://patchwork.ozlabs.org/patch/520824/

Thanks,
Jun

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-03-12  1:25 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-25  0:29 Glibc recvmsg from kernel netlink socket hangs forever Steven Schlansker
2015-09-25  4:36 ` Guenter Roeck
2015-09-25  4:58   ` Herbert Xu
2015-09-25  5:34     ` Guenter Roeck
2015-09-25 15:55       ` Herbert Xu
2015-09-25 16:14         ` Guenter Roeck
2015-09-26  3:45         ` Guenter Roeck
2015-09-25 21:37       ` Steven Schlansker
2015-09-25 21:54         ` Steven Schlansker
2015-09-26  2:58         ` Guenter Roeck
2015-10-05 23:26           ` Steven Schlansker
2015-10-05 23:30             ` Guenter Roeck
  -- strict thread matches above, loose matches on Subject: below --
2016-03-11 19:33 Jun Wang
2016-03-11 23:50 ` Guenter Roeck
2016-03-12  0:13   ` Cong Wang
2016-03-12  1:25     ` Guenter Roeck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox