public inbox for netdev@vger.kernel.org
From: Stephen Hemminger <stephen@networkplumber.org>
To: netdev@vger.kernel.org
Subject: Fw: [Bug 99461] New: recvfrom SYSCALL infinite loop/deadlock chewing 100% CPU [was __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33]
Date: Tue, 9 Jun 2015 17:05:06 -0700
Message-ID: <20150609170506.7789783d@urahara>



Begin forwarded message:

Date: Fri, 5 Jun 2015 12:39:38 +0000
From: "bugzilla-daemon@bugzilla.kernel.org" <bugzilla-daemon@bugzilla.kernel.org>
To: "shemminger@linux-foundation.org" <shemminger@linux-foundation.org>
Subject: [Bug 99461] New: recvfrom SYSCALL infinite loop/deadlock chewing 100% CPU [was __libc_recv (fd=fd@entry=300, buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33]


https://bugzilla.kernel.org/show_bug.cgi?id=99461

            Bug ID: 99461
           Summary: recvfrom SYSCALL infinite loop/deadlock chewing 100%
                    CPU [was __libc_recv (fd=fd@entry=300,
                    buf=buf@entry=0x7f6042880600, n=n@entry=5, flags=-1,
                    flags@entry=258) at
                    ../sysdeps/unix/sysv/linux/x86_64/recv.c:33]
           Product: Networking
           Version: 2.5
    Kernel Version: 3.13.0
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: Other
          Assignee: shemminger@linux-foundation.org
          Reporter: dan@censornet.com
        Regression: No

This is a repost of a bug I reported initially to the GNU libc bug list
(https://sourceware.org/bugzilla/show_bug.cgi?id=18493).

I was advised by Andreas Schwab that the __libc_recv function is just a thin
wrapper around the recvfrom system call, and to report this to "the kernel
people", which I assume is you people.

Here's a summary of the problem:

In a multi-threaded pthreads process (over 1000 threads) running on Ubuntu
14.04 AMD64 with real-time FIFO scheduling, we occasionally see calls to
recv() with flags (MSG_PEEK | MSG_WAITALL) get stuck in an infinite loop or
deadlock. The affected threads lock up inside recv(), chewing as much CPU as
they can (due to FIFO scheduling).

Here's an example gdb back trace:

[Switching to thread 4 (Thread 0x7f6040546700 (LWP 27251))]
#0  0x00007f6231d2f7eb in __libc_recv (fd=fd@entry=146,
buf=buf@entry=0x7f6040543600, n=n@entry=5, flags=-1, flags@entry=258) at
../sysdeps/unix/sysv/linux/x86_64/recv.c:33
33      ../sysdeps/unix/sysv/linux/x86_64/recv.c: No such file or directory.
(gdb) bt
#0  0x00007f6231d2f7eb in __libc_recv (fd=fd@entry=146,
buf=buf@entry=0x7f6040543600, n=n@entry=5, flags=-1, flags@entry=258) at
../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1  0x0000000000421945 in recv (__flags=258, __n=5, __buf=0x7f6040543600,
__fd=146) at /usr/include/x86_64-linux-gnu/bits/socket2.h:44
[snip]
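
For readers decoding the backtrace: on Linux, MSG_PEEK is 0x2 and MSG_WAITALL
is 0x100, so flags@entry=258 (0x102) is the MSG_PEEK | MSG_WAITALL combination
described above. A minimal sketch that decodes the value; describe_recv_flags
is a hypothetical helper used only for illustration and is not part of the
affected program:

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: write the names of the two recv() flags relevant
 * to this report into `out`. Other flag bits are ignored. Returns the
 * value returned by snprintf(). */
static int describe_recv_flags(int flags, char *out, size_t outlen)
{
    return snprintf(out, outlen, "%s%s",
                    (flags & MSG_PEEK) ? "MSG_PEEK " : "",
                    (flags & MSG_WAITALL) ? "MSG_WAITALL" : "");
}
```

Passing 258 on a Linux system fills the buffer with both flag names.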

The socket is a TCP socket in blocking mode. The recv() call is inside an
outer loop with a counter; I've checked the counter with gdb and it's always
at 1, so I'm sure that the outer loop isn't the problem: the thread is indeed
deadlocked inside the recv() internals.

Other notes:
* There always seem to be 2 or more threads deadlocked in the same place (same
recv() call but with distinct FDs)
* The threads calling recv() have cancellation disabled by previously
executing: pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);

I've even tried adding a poll() call for POLLRDNORM on the socket before
calling recv() with MSG_PEEK | MSG_WAITALL flags to try to make sure there's
data available on the socket before calling *recv()*, but it makes no
difference.

So, I don't know what is wrong here. I've read all the recv() documentation
and believe that recv() is being used correctly. The only conclusion I can
come to is that there is a bug in libc recv() when using flags
MSG_PEEK | MSG_WAITALL with thousands of pthreads running.

-- 
You are receiving this mail because:
You are the assignee for the bug.
