epoll with ONESHOT possibly fails to deliver events

linux-fsdevel.vger.kernel.org archive mirror

* epoll with ONESHOT possibly fails to deliver events
@ 2012-12-11 22:23 Andreas Voellmy
  2012-12-12 23:49 ` Andreas Voellmy
  0 siblings, 1 reply; 16+ messages in thread
From: Andreas Voellmy @ 2012-12-11 22:23 UTC (permalink / raw)
  To: viro, linux-fsdevel, linux-kernel; +Cc: Andreas Voellmy

Hi list,

I am using epoll for the Linux (version 3.4.0) implementation of the event notification subsystem of GHC's (Glasgow Haskell Compiler) RTS (runtime system). I am running into a bug that only shows up with many cores (> 16) and under a particular kind of load. I've been debugging for a couple of days now, and I can't find the error in the way that I am using epoll. I'm starting to wonder whether I am misunderstanding the semantics of epoll and TCP sockets (likely) or whether there is a bug in epoll itself (less likely). 

Here is a simplified version of my epoll usage: My program is a multithreaded web server. I have one thread per TCP socket and each socket is marked non-blocking. Each thread serving a client socket repeats the following: 

1. receive a single http request's worth of bytes. 
2. send an http response.

For both steps, the thread performs a non-blocking operation (either recv or send), and if and only if the call fails with EWOULDBLOCK or EAGAIN, it calls epoll_ctl to register the socket and then blocks on a condition variable. When the condition variable is signaled, it continues where it left off (either about to recv or about to send). epoll_ctl is called with EPOLL_CTL_ADD the first time the socket is registered and with EPOLL_CTL_MOD afterwards. The events field is EPOLLIN | EPOLLET | EPOLLONESHOT. 
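A minimal C sketch of one worker step as described above, assuming a non-blocking socket and a per-socket `registered` flag. The helper names `arm_oneshot_read()` and `try_recv_or_arm()` are hypothetical (not taken from GHC's RTS), and blocking on the condition variable is left to the caller.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* (Re)arm `fd` for one edge-triggered, one-shot read notification.
 * The first call uses EPOLL_CTL_ADD; later calls use EPOLL_CTL_MOD,
 * because a one-shot fd stays in the epoll set (disabled) after firing. */
static int arm_oneshot_read(int epfd, int fd, bool *registered)
{
    struct epoll_event ev = {
        .events = EPOLLIN | EPOLLET | EPOLLONESHOT,
        .data.fd = fd,
    };
    int op = *registered ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;

    if (epoll_ctl(epfd, op, fd, &ev) == -1)
        return -1;                  /* errno set by epoll_ctl */
    *registered = true;
    return 0;
}

/* One non-blocking recv attempt. Returns 0 if data (or EOF, *got == 0)
 * was read, 1 if recv hit EAGAIN/EWOULDBLOCK and the fd was armed (the
 * caller should now block on its condition variable), -1 on error. */
int try_recv_or_arm(int epfd, int fd, bool *registered,
                    char *buf, size_t len, ssize_t *got)
{
    *got = recv(fd, buf, len, 0);
    if (*got >= 0)
        return 0;
    if (errno != EAGAIN && errno != EWOULDBLOCK)
        return -1;
    return arm_oneshot_read(epfd, fd, registered) == 0 ? 1 : -1;
}
```

Arming only after EAGAIN is what keeps a worker from sleeping while data is already queued.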

Another thread, distinct from all of the threads serving particular sockets, performs epoll_wait calls. When epoll_wait returns a socket as ready, this thread signals the condition variable for that socket. Since I am using EPOLLONESHOT, I assume there is no need to also call epoll_ctl with EPOLL_CTL_DEL here. 

This guarantees that I only wait for epoll to signal a file's readiness if (a) a recv or send hit EAGAIN or EWOULDBLOCK, and (b) I called epoll_ctl to re-arm the socket (or, the first time, to arm it).
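A minimal sketch of the hand-off between the epoll_wait thread and a worker, assuming each registration stores a pointer to a hypothetical `struct waiter` in `data.ptr`. The `ready` flag is the important part: POSIX `pthread_cond_signal()` has no effect when no thread is blocked yet, so without a predicate, a readiness signal that lands between the worker's `epoll_ctl()` and its `pthread_cond_wait()` would be lost and the worker could sleep forever.

```c
#include <pthread.h>
#include <stdbool.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Per-socket wait state (hypothetical). `ready` is the predicate that
 * remembers a notification even if it arrives before the worker waits. */
struct waiter {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool ready;
};

/* Dispatcher thread: one epoll_wait round. Because the fds are
 * registered with EPOLLONESHOT, each reported fd is now disabled, so no
 * EPOLL_CTL_DEL is needed. Returns the number of events, or -1. */
int dispatch_once(int epfd, int timeout_ms)
{
    struct epoll_event evs[64];
    int n = epoll_wait(epfd, evs, 64, timeout_ms);

    for (int i = 0; i < n; i++) {
        struct waiter *w = evs[i].data.ptr;
        pthread_mutex_lock(&w->lock);
        w->ready = true;                 /* set the predicate first... */
        pthread_cond_signal(&w->cond);   /* ...then wake a sleeper, if any */
        pthread_mutex_unlock(&w->lock);
    }
    return n;
}

/* Worker thread: call only after arming the fd. Sleeps until the
 * dispatcher marks it ready, consumes the flag, then retries recv(). */
ssize_t wait_then_recv(struct waiter *w, int fd, char *buf, size_t len)
{
    pthread_mutex_lock(&w->lock);
    while (!w->ready)                    /* also absorbs spurious wakeups */
        pthread_cond_wait(&w->cond, &w->lock);
    w->ready = false;
    pthread_mutex_unlock(&w->lock);
    return recv(fd, buf, len, 0);
}
```

A single-threaded check: register one end of a non-blocking `socketpair()` with `data.ptr = &w`, `send()` a byte from the other end, and call `dispatch_once()`; `wait_then_recv()` then returns immediately because the flag remembered the signal.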

The problem I am encountering is that sometimes a thread will block waiting for the readiness signal and will never get notified, even though there is data to be read. This behavior seems to go away when I remove EPOLLONESHOT flag when registering the event. 

Is my use of epoll (as I described here) OK? Is the following sequence possible? 

1. epoll reports activity on socket previously registered with ONESHOT; now socket is deactivated in epoll.
2. call to recv on socket returns EAGAIN or EWOULDBLOCK
3. data arrives on socket
4. epoll_ctl call rearms socket with epoll (with ONESHOT flag).
5. epoll_wait never returns the socket as being ready.

Do I need to first call epoll_ctl and then call recv until I get to EAGAIN, or is it correct to call epoll_ctl for the file only after I've hit EAGAIN on a recv? 

I have looked over the epoll source here: http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=fs/eventpoll.c;h=c0b3c70ee87a2b8e0e46c01a87d63ac692aecc71;hb=refs/heads/linux-3.4.y and I don't see how EPOLLONESHOT could result in the event sequence above, but I'm not that familiar with the code, so it would be great if others can confirm as well. 
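The five-step sequence above can be replayed directly. This sketch assumes a local `AF_UNIX` socketpair as a stand-in for the TCP socket, and the function name is hypothetical. In fs/eventpoll.c, `EPOLL_CTL_MOD` polls the file's current state and queues the item if it is already ready, so data that arrived while the socket was disarmed (step 3) should still be reported after re-arming (step 4); the function returns true when that happens.

```c
#include <stdbool.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Replays steps 1-5 on a socketpair. Returns true if epoll_wait reports
 * the socket after it was re-armed *after* data had already arrived. */
bool rearm_after_arrival_is_reported(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0, sv) == -1)
        return false;
    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET | EPOLLONESHOT,
                              .data.fd = sv[0] };
    struct epoll_event out;
    char buf[64];
    bool reported = false;

    if (epfd == -1 || epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == -1)
        goto done;

    /* 1. epoll reports the socket; EPOLLONESHOT now disables it. */
    send(sv[1], "a", 1, 0);
    if (epoll_wait(epfd, &out, 1, 1000) != 1)
        goto done;
    /* 2. recv until it fails with EAGAIN. */
    while (recv(sv[0], buf, sizeof buf, 0) > 0)
        ;
    /* 3. Data arrives while the fd is disarmed: nothing gets queued. */
    send(sv[1], "b", 1, 0);
    /* 4. Re-arm; EPOLL_CTL_MOD re-checks the socket's current readiness... */
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, sv[0], &ev) == -1)
        goto done;
    /* 5. ...so the pending data is reported instead of being missed. */
    reported = epoll_wait(epfd, &out, 1, 0) == 1 && out.data.fd == sv[0];
done:
    if (epfd != -1)
        close(epfd);
    close(sv[0]);
    close(sv[1]);
    return reported;
}
```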

I am not subscribed to the kernel list, so please include my email on replies.

Cheers,
Andi

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: epoll with ONESHOT possibly fails to deliver events
  2012-12-11 22:23 epoll with ONESHOT possibly fails to deliver events Andreas Voellmy
@ 2012-12-12 23:49 ` Andreas Voellmy
  2012-12-13  9:32   ` Eric Wong
  0 siblings, 1 reply; 16+ messages in thread
From: Andreas Voellmy @ 2012-12-12 23:49 UTC (permalink / raw)
  To: viro, linux-fsdevel, linux-kernel

Hi list, 

Using strace, I checked that my program uses the epoll API as I described. Here is a fragment of the strace output that demonstrates my usage: 

recvfrom(161, "GET / HTTP/1.1\r\nHost: 10.12.0.1:"..., 90, 0, NULL, NULL) = 90
sendto(161, "HTTP/1.1 200 OK\r\nDate: Tue, 09 O"..., 323, 0, NULL, 0) = 323
write(6, "\1\0\0\0\0\0\0\0", 8)         = 8
recvfrom(161, 0x7f05ef6c3070, 90, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(7, EPOLL_CTL_MOD, 161, {EPOLLIN|EPOLLONESHOT|EPOLLET, {u32=161, u64=4294967457}}) = 0
epoll_wait(7, {{EPOLLIN, {u32=161, u64=4294967457}}, {EPOLLIN, {u32=160, u64=16673999036704882848}}, {EPOLLIN, {u32=162, u64=22028646743015586}}}, 64, 0) = 3

I.e., we (1) receive until EAGAIN, then (2) re-arm the socket with epoll_ctl. In addition, epoll_wait is called repeatedly, often right after (2), as in the fragment above.

Is this considered a correct usage of the epoll API? If not, what is wrong with this usage?
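The drain-then-re-arm step in the strace fragment above can be sketched as follows, assuming the same flags. `drain_then_rearm()` is a hypothetical name, and `data.fd` is used for simplicity, whereas the strace shows the original packing extra bits into the 64-bit `data` field (u64=4294967457 is 2^32 + 161).

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Mirrors the strace fragment: recvfrom() until EAGAIN, then
 * epoll_ctl(EPOLL_CTL_MOD) with the same one-shot flags. `buf` is reused
 * on each read; a real server would parse the bytes before reading more.
 * Returns total bytes read (without re-arming if the peer closed),
 * or -1 on error. */
ssize_t drain_then_rearm(int epfd, int fd, char *buf, size_t len)
{
    ssize_t total = 0;

    for (;;) {
        ssize_t n = recv(fd, buf, len, 0);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0)
            return total;          /* peer closed: nothing to wait for */
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;
        break;                     /* socket drained */
    }

    struct epoll_event ev = {
        .events = EPOLLIN | EPOLLONESHOT | EPOLLET,
        .data.fd = fd,
    };
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev) == -1 ? -1 : total;
}
```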

Thanks,
Andi

* Re: epoll with ONESHOT possibly fails to deliver events
  2012-12-12 23:49 ` Andreas Voellmy
@ 2012-12-13  9:32   ` Eric Wong
  2012-12-13 15:29     ` Andreas Voellmy
  2012-12-14  0:08     ` Phil Turmel
  0 siblings, 2 replies; 16+ messages in thread
From: Eric Wong @ 2012-12-13  9:32 UTC (permalink / raw)
  To: Andreas Voellmy; +Cc: viro, linux-fsdevel, linux-kernel

Andreas Voellmy <andreas.voellmy@yale.edu> wrote:
> Using strace, I checked that my program is using epoll api as I
> described. Here is a fragment of the strace output that demonstrates
> my use: 
> 
> recvfrom(161, "GET / HTTP/1.1\r\nHost: 10.12.0.1:"..., 90, 0, NULL, NULL) = 90
> sendto(161, "HTTP/1.1 200 OK\r\nDate: Tue, 09 O"..., 323, 0, NULL, 0) = 323
> write(6, "\1\0\0\0\0\0\0\0", 8)         = 8
> recvfrom(161, 0x7f05ef6c3070, 90, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_ctl(7, EPOLL_CTL_MOD, 161, {EPOLLIN|EPOLLONESHOT|EPOLLET, {u32=161, u64=4294967457}}) = 0
> epoll_wait(7, {{EPOLLIN, {u32=161, u64=4294967457}}, {EPOLLIN, {u32=160, u64=16673999036704882848}}, {EPOLLIN, {u32=162, u64=22028646743015586}}}, 64, 0) = 3
> 
> I.e. we do the following (1) receive until EAGAIN, (2) register socket
> with epoll_ctl. In addition epoll_wait is called repeatedly, often
> following (2), as in the fragment above.
> 
> Is this considered a correct usage of the epoll API? If not, what is
> wrong with this usage?

It looks right to me.

> On Dec 11, 2012, at 5:23 PM, Andreas Voellmy <andreas.voellmy@yale.edu> wrote:
> > I am using epoll for the Linux (version 3.4.0) implementation of the
> > event notification subsystem of GHC's (Glasgow Haskell Compiler) RTS
> > (runtime system). I am running into a bug that has only popped up
> > using many cores (> 16) and under particular kind of load. I've been
> > debugging for a couple of days now, and I can't find the error in
> > the way that I am using epoll. I'm starting to wonder whether I am
> > either misunderstanding the semantics of epoll and TCP sockets
> > (likely) or there may be a bug in epoll itself (less likely). 

Everything you describe with your epoll usage seems valid and lines up
with my use of it.

> > Another thread, distinct from all of the threads serving particular
> > sockets, is performing epoll_wait calls. When sockets are returned as
> > being ready from an epoll_wait call, the thread signals to the
> > condition variable for the socket.

Perhaps there is a bug in the way your epoll_wait thread
uses the condition variable to notify other threads?

Fwiw, I just use epoll_wait(maxevents=1) in my normal threads (right
after I call epoll_ctl()).  This means I can avoid both the condition
variable and also avoid using a dedicated thread calling epoll_wait().
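A sketch of the pattern described above, under assumptions: all worker threads call `serve_one()` on one shared epoll descriptor whose sockets are registered with `EPOLLIN | EPOLLET | EPOLLONESHOT` and `data.fd` set. With maxevents = 1 and one-shot registrations, each ready socket is handed to a single thread until that thread re-arms it, so neither a condition variable nor a dedicated epoll_wait thread is needed. `serve_one()` and the optional `on_data` callback are hypothetical.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Handles at most one ready socket on the shared epoll fd: wait with
 * maxevents = 1, drain until EAGAIN, then re-arm the one-shot
 * registration. Returns the fd handled, -2 on timeout, or -1 on error. */
int serve_one(int epfd, int timeout_ms,
              void (*on_data)(int fd, const char *buf, ssize_t n))
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    if (n <= 0)
        return n == 0 ? -2 : -1;

    int fd = ev.data.fd;
    char buf[4096];
    ssize_t r;
    while ((r = recv(fd, buf, sizeof buf, 0)) > 0) {
        if (on_data)
            on_data(fd, buf, r);
    }

    if (r == 0 || (errno != EAGAIN && errno != EWOULDBLOCK)) {
        close(fd);      /* EOF or error: end this connection */
        return fd;
    }
    ev.events = EPOLLIN | EPOLLET | EPOLLONESHOT;   /* ev.data still holds fd */
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev) == 0 ? fd : -1;
}
```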

> > Since I am using EPOLLONESHOT, I assume that there is no need to
> > also perform epoll_ctl with EPOLL_CTL_DEL here. 

Correct.

> > The problem I am encountering is that sometimes a thread will block
> > waiting for the readiness signal and will never get notified, even
> > though there is data to be read. This behavior seems to go away when
> > I remove EPOLLONESHOT flag when registering the event. 

Is the thread the one waiting on the condition variable or epoll_wait?
In your situation (stream I/O via multiple threads, single epoll
descriptor), I think EPOLLONESHOT is the /only/ sane thing to do.

* Re: epoll with ONESHOT possibly fails to deliver events
  2012-12-13  9:32   ` Eric Wong
@ 2012-12-13 15:29     ` Andreas Voellmy
  2012-12-14  0:16       ` Andreas Voellmy
  2012-12-14  0:08     ` Phil Turmel
  1 sibling, 1 reply; 16+ messages in thread
From: Andreas Voellmy @ 2012-12-13 15:29 UTC (permalink / raw)
  To: Eric Wong; +Cc: viro, linux-fsdevel, linux-kernel

Hi Eric, 

On Dec 13, 2012, at 4:32 AM, Eric Wong <normalperson@yhbt.net> wrote:

> Andreas Voellmy <andreas.voellmy@yale.edu> wrote:
> 
>>> Another thread, distinct from all of the threads serving particular
>>> sockets, is performing epoll_wait calls. When sockets are returned as
>>> being ready from an epoll_wait call, the thread signals to the
>>> condition variable for the socket.
> 
> Perhaps there is a bug in the way your epoll_wait thread
> uses the condition variable to notify other threads?
> 

This is possible; I've tried very hard (e.g. I added assertions to check various error conditions) to ensure that there is no problem in signaling the other threads. From everything I can tell, it is working properly.

> 
>>> The problem I am encountering is that sometimes a thread will block
>>> waiting for the readiness signal and will never get notified, even
>>> though there is data to be read. This behavior seems to go away when
>>> I remove EPOLLONESHOT flag when registering the event. 
> 
> Is the thread the one waiting on the condition variable or epoll_wait?
> In your situation (stream I/O via multiple threads, single epoll
> descriptor), I think EPOLLONESHOT is the /only/ sane thing to do.

The one waiting on the condition variable.

I think I've narrowed down the problem a bit more. My program has multiple epoll instances. Most of them monitor sockets; one monitors an eventfd that other threads write to. The problem only occurs when I write to the eventfd after servicing each HTTP request on a socket, i.e. when the epoll instance monitoring the eventfd returns from a blocking epoll_wait call very frequently. If I don't do that write, or if I use a different notification facility (for example poll) to monitor the eventfd, the problem goes away. So it looks like different epoll instances may somehow interfere with each other. 

Probably this setup sounds weird to you, but I'm trying to spare you from understanding my whole application;  this is part of a multicore runtime system for a programming language with user-level threads and to explain the full story of this would probably take more time than you want to spend.   But I can provide more detail if you like. 
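A sketch of the eventfd notification path described above, with hypothetical helper names: workers add 1 to an eventfd after each request, and a separate epoll instance that watches no sockets waits on it. The `write(6, "\1\0\0\0\0\0\0\0", 8)` line in the earlier strace is consistent with such an increment (an 8-byte little-endian 1), though reading fd 6 as the eventfd is an inference.

```c
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Creates the eventfd and an epoll instance that watches only it.
 * Returns the epoll fd and stores the eventfd in *efd_out, or -1. */
int make_wakeup_epoll(int *efd_out)
{
    int efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    if (efd == -1)
        return -1;
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };
    if (epfd == -1 || epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) == -1) {
        if (epfd != -1)
            close(epfd);
        close(efd);
        return -1;
    }
    *efd_out = efd;
    return epfd;
}

/* Worker side: adds 1 to the counter via an 8-byte write. */
int notify(int efd)
{
    return eventfd_write(efd, 1);
}

/* Wakeup-thread side: wait, then read (and reset) the counter. Returns
 * the number of coalesced notifications, 0 on timeout, -1 on error. */
long long wait_wakeups(int epfd, int efd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    if (n <= 0)
        return n;
    eventfd_t count;
    if (eventfd_read(efd, &count) == -1)
        return -1;
    return (long long)count;
}
```

Several writes before the waker runs are coalesced into one readiness event and one counter value.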

-Andi

* Re: epoll with ONESHOT possibly fails to deliver events
  2012-12-13 15:29     ` Andreas Voellmy
@ 2012-12-14  0:16       ` Andreas Voellmy
  2012-12-15 14:50         ` Andreas Voellmy
  2012-12-20 21:32         ` Eric Wong
  0 siblings, 2 replies; 16+ messages in thread
From: Andreas Voellmy @ 2012-12-14  0:16 UTC (permalink / raw)
  To: Eric Wong
  Cc: viro, linux-fsdevel, linux-kernel, junchang.wang@yale.edu Wang,
	Andreas Voellmy

I believe I have found a bug in epoll. It causes the behavior I described in earlier emails, and it arises from the interaction of epoll instances that have no files in common. 

I wrote a C program that behaves similarly to my original program and triggers the bug. The bug only arises when I use enough cores and threads (about 16). The program is here: https://github.com/AndreasVoellmy/epollbug/blob/master/epollbug.c  It is a heavily stripped-down HTTP server. It uses a number of threads that serve requests, each with its own epoll instance. There is also a "wakeup" thread that simply monitors an eventfd and reads from it when woken. All the worker threads write to the eventfd when they process a request. This probably seems like a strange program, but something like it came up in a real system. 

I test the program using the weighttp HTTP request generator (http://redmine.lighttpd.net/projects/weighttp/wiki). You need enough requests, enough concurrent clients, and enough worker threads to trigger the problem. For example, I run './weighttp -n 400000 -c 500 -t 6 -k "10.12.0.1:8080"'. With 16 cores for the server program (epollbug.c), this workload triggers the bug about once every 3 runs. The server (epollbug.c) is hardcoded to work with the specific request that weighttp sends it, so you need to find out what weighttp sends from your test machine and put that at the top of epollbug.c (you will see where it goes). To do this, uncomment the SHOW_DEBUG flag at the top of the program and run weighttp against it; it will print the request weighttp is sending. Then update EXPECTED_HTTP_REQUEST with whatever you get.

I am running Linux 3.4.0.0. 

Cheers, 
Andi

* Re: epoll with ONESHOT possibly fails to deliver events
  2012-12-14  0:16       ` Andreas Voellmy
@ 2012-12-15 14:50         ` Andreas Voellmy
  2012-12-18  2:07           ` Eric Wong
  2012-12-20 21:32         ` Eric Wong
  1 sibling, 1 reply; 16+ messages in thread
From: Andreas Voellmy @ 2012-12-15 14:50 UTC (permalink / raw)
  To: Eric Wong; +Cc: viro, linux-fsdevel, linux-kernel, junchang.wang@yale.edu Wang

There were a couple of errors in the code when I posted my last message. I have fixed those. The epoll bug still occurs. 

-Andi

* Re: epoll with ONESHOT possibly fails to deliver events
  2012-12-15 14:50         ` Andreas Voellmy
@ 2012-12-18  2:07           ` Eric Wong
  2012-12-18  2:35             ` Andreas Voellmy
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Wong @ 2012-12-18  2:07 UTC (permalink / raw)
  To: Andreas Voellmy
  Cc: viro, linux-fsdevel, linux-kernel, junchang.wang@yale.edu Wang

Andreas Voellmy <andreas.voellmy@yale.edu> wrote:
> There were a couple of errors in the code when I posted my last
> message. I have fixed those. The epoll bug still occurs. 

Sorry I haven't gotten around to this.

Can you reproduce this with fewer cores? (I only have 4 at most).
Have you tried the latest stable kernel version?

Can you reproduce this over TCP loopback, or only across two machines?
If the latter, it could also be a driver or firmware bug...

* Re: epoll with ONESHOT possibly fails to deliver events
  2012-12-18  2:07           ` Eric Wong
@ 2012-12-18  2:35             ` Andreas Voellmy
  2012-12-18 17:27               ` Andreas Voellmy
  0 siblings, 1 reply; 16+ messages in thread
From: Andreas Voellmy @ 2012-12-18  2:35 UTC (permalink / raw)
  To: Eric Wong; +Cc: viro, linux-fsdevel, linux-kernel, junchang.wang@yale.edu Wang


On Dec 17, 2012, at 9:07 PM, Eric Wong <normalperson@yhbt.net> wrote:

> Andreas Voellmy <andreas.voellmy@yale.edu> wrote:
>> There were a couple of errors in the code when I posted my last
>> message. I have fixed those. The epoll bug still occurs. 
> 
> Sorry I haven't gotten around to this.
> 
> Can you reproduce this with fewer cores? (I only have 4 at most).

I've been able to reproduce it on as few as 8 cores. I've never seen it occur with fewer than that.

> Have you tried the latest stable kernel version?

No, I've only tried 3.4.0.0.

> 
> Can you reproduce this over TCP loopback, or only across two machines?
> If the latter, it could also be a driver or firmware bug...

Yes, it also occurs when I run the http request generator on the same machine on the loopback interface.

-Andi

* Re: epoll with ONESHOT possibly fails to deliver events
  2012-12-18  2:35             ` Andreas Voellmy
@ 2012-12-18 17:27               ` Andreas Voellmy
  2012-12-19 19:39                 ` Andreas Voellmy
  0 siblings, 1 reply; 16+ messages in thread
From: Andreas Voellmy @ 2012-12-18 17:27 UTC (permalink / raw)
  To: Eric Wong; +Cc: viro, linux-fsdevel, linux-kernel, junchang.wang@yale.edu Wang

BTW, I simplified the test program a bit: I removed the loop that epoll_waits on the eventfd fd and reads from it (I also removed the epoll instance in that loop).  The bug still occurs with this removed. Now the bug is triggered simply by adding the call to eventfd_write after processing each request. 

I pushed the update to my github project for the test program.

-Andi

* Re: epoll with ONESHOT possibly fails to deliver events
  2012-12-18 17:27               ` Andreas Voellmy
@ 2012-12-19 19:39                 ` Andreas Voellmy
  0 siblings, 0 replies; 16+ messages in thread
From: Andreas Voellmy @ 2012-12-19 19:39 UTC (permalink / raw)
  To: Eric Wong
  Cc: viro, linux-fsdevel, linux-kernel, junchang.wang@yale.edu Wang,
	e1000-devel

We (i.e. I together with my colleague Jason Wang, cc'ed) installed the latest stable kernel (3.7.1) and verified that the bug still occurs.  The bug occurs when testing the program across a network link and when testing on the loopback interface.  

We also noticed that when testing across the network, the hardware IRQ affinity settings (in the /proc/irq/ files) affect how often the bug occurs. With the default IRQ settings the bug does not seem to occur, whereas with the settings suggested by Intel (applied with the set_irq_affinity.sh script) it occurs more frequently. 

Cheers, 
Andi

* Re: epoll with ONESHOT possibly fails to deliver events
  2012-12-14  0:16       ` Andreas Voellmy
  2012-12-15 14:50         ` Andreas Voellmy
@ 2012-12-20 21:32         ` Eric Wong
  2012-12-20 22:25           ` Junchang(Jason) Wang
  1 sibling, 1 reply; 16+ messages in thread
From: Eric Wong @ 2012-12-20 21:32 UTC (permalink / raw)
  To: Andreas Voellmy
  Cc: viro, linux-fsdevel, linux-kernel, junchang.wang@yale.edu Wang

Andreas Voellmy <andreas.voellmy@yale.edu> wrote:
> I wrote a C program that behaves similar to my original program and
> triggers the bug. The bug only arises when I use enough cores and
> threads (about 16). The program is here:
> https://github.com/AndreasVoellmy/epollbug/blob/master/epollbug.c

I finally took a closer look at your code.   I think your socketCheck()
thread is draining socket and causing the normal threads to miss
events.

Use the FIONREAD ioctl() to get the number of unread bytes instead of recv().
If you want to recv() without draining the socket, you can also use
the MSG_PEEK flag.
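The two suggestions above, sketched as helpers (names hypothetical). `FIONREAD` reports how many bytes are queued without consuming them; `recv()` with `MSG_PEEK` copies bytes out but leaves them in the receive queue. Either way, a checker thread does not take data away from the worker threads.

```c
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Number of bytes queued for reading on socket `fd`, without consuming
 * them, or -1 on error. */
int unread_bytes(int fd)
{
    int n = 0;
    return ioctl(fd, FIONREAD, &n) == -1 ? -1 : n;
}

/* Copies up to `len` queued bytes into `buf` but leaves them in the
 * receive queue; MSG_DONTWAIT keeps the check from blocking. */
ssize_t peek_bytes(int fd, char *buf, size_t len)
{
    return recv(fd, buf, len, MSG_PEEK | MSG_DONTWAIT);
}
```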

* Re: epoll with ONESHOT possibly fails to deliver events
  2012-12-20 21:32         ` Eric Wong
@ 2012-12-20 22:25           ` Junchang(Jason) Wang
  2012-12-21 15:32             ` Andreas Voellmy
  2012-12-22  2:54             ` Eric Wong
  0 siblings, 2 replies; 16+ messages in thread
From: Junchang(Jason) Wang @ 2012-12-20 22:25 UTC (permalink / raw)
  To: Eric Wong; +Cc: Andreas Voellmy, viro, linux-fsdevel, linux-kernel

On Thu, Dec 20, 2012 at 4:32 PM, Eric Wong <normalperson@yhbt.net> wrote:
> Andreas Voellmy <andreas.voellmy@yale.edu> wrote:
>> I wrote a C program that behaves similar to my original program and
>> triggers the bug. The bug only arises when I use enough cores and
>> threads (about 16). The program is here:
>> https://github.com/AndreasVoellmy/epollbug/blob/master/epollbug.c
>
> I finally took a closer look at your code.  I think your socketCheck()
> thread is draining the sockets and causing the normal threads to miss
> events.
>

Hi Wong,

Thank you so much for your response. But I think you may have
misunderstood socketCheck(), which we use at the end of the program to
check whether the bug has occurred (by checking whether any data
remains in the sockets). Please note the sleep(10) (line 237) at the
very beginning of socketCheck(). In other words, we can guarantee that
recv() (line 239) doesn't execute while the normal threads are running.

We still believe this is a bug in the epoll system, even though we
haven't been able to prove it so far. Both Andi and I are very
interested in this problem and in helping you experts solve it. Just
let us know if we can help.

Thanks.

> Use the FIONREAD ioctl() instead of recv() to get the number of unread bytes.
> If you want to recv() without draining the socket, you can also use
> the MSG_PEEK flag.

* Re: epoll with ONESHOT possibly fails to deliver events
  2012-12-20 22:25           ` Junchang(Jason) Wang
@ 2012-12-21 15:32             ` Andreas Voellmy
  2012-12-22  2:54             ` Eric Wong
  1 sibling, 0 replies; 16+ messages in thread
From: Andreas Voellmy @ 2012-12-21 15:32 UTC (permalink / raw)
  To: Eric Wong; +Cc: viro, linux-fsdevel, linux-kernel, Junchang(Jason) Wang

Hi Eric, 

Thanks again for looking at our bug report. I agree with Jason's comments: the bug occurs independently of socketCheck(). That function waits long enough for the server to stop serving requests and then checks which sockets still have unread data (and therefore should have been served). Still, I like your suggestion, so I updated socketCheck() to use the FIONREAD ioctl rather than recv(). As we expected, the bug still occurs.

Cheers, 
Andi

On Dec 20, 2012, at 5:25 PM, "Junchang(Jason) Wang" <junchang.wang@yale.edu> wrote:

> On Thu, Dec 20, 2012 at 4:32 PM, Eric Wong <normalperson@yhbt.net> wrote:
>> Andreas Voellmy <andreas.voellmy@yale.edu> wrote:
>>> I wrote a C program that behaves similar to my original program and
>>> triggers the bug. The bug only arises when I use enough cores and
>>> threads (about 16). The program is here:
>>> https://github.com/AndreasVoellmy/epollbug/blob/master/epollbug.c
>> 
>> I finally took a closer look at your code.  I think your socketCheck()
>> thread is draining the sockets and causing the normal threads to miss
>> events.
>> 
> 
> Hi Wong,
> 
> Thank you so much for your response. But I think you may have
> misunderstood socketCheck(), which we use at the end of the program to
> check whether the bug has occurred (by checking whether any data
> remains in the sockets). Please note the sleep(10) (line 237) at the
> very beginning of socketCheck(). In other words, we can guarantee that
> recv() (line 239) doesn't execute while the normal threads are running.
> 
> We still believe this is a bug in the epoll system, even though we
> haven't been able to prove it so far. Both Andi and I are very
> interested in this problem and in helping you experts solve it. Just
> let us know if we can help.
> 
> 
> Thanks.
> 
>> Use the FIONREAD ioctl() instead of recv() to get the number of unread bytes.
>> If you want to recv() without draining the socket, you can also use
>> the MSG_PEEK flag.


* Re: epoll with ONESHOT possibly fails to deliver events
  2012-12-20 22:25           ` Junchang(Jason) Wang
  2012-12-21 15:32             ` Andreas Voellmy
@ 2012-12-22  2:54             ` Eric Wong
  1 sibling, 0 replies; 16+ messages in thread
From: Eric Wong @ 2012-12-22  2:54 UTC (permalink / raw)
  To: Junchang(Jason) Wang; +Cc: Andreas Voellmy, viro, linux-fsdevel, linux-kernel

"Junchang(Jason) Wang" <junchang.wang@yale.edu> wrote:
> We still believe this is a bug in the epoll system, even though we
> haven't been able to prove it so far. Both Andi and I are very
> interested in this problem and in helping you experts solve it. Just
> let us know if we can help.

I'm just another epoll user, definitely not an expert.  Hopefully
somebody else can figure this out, because I'm unable to reproduce the
problem with your code and I haven't spotted any bugs from reading
through the kernel.

Incidentally, I also have a multi-threaded HTTP server that is somewhat
similar (multi-threaded, with two epoll descriptors, only one of which is
heavily used).  I run it on 2- and 4-core systems and haven't hit any
issues with epoll.

If you want to test it, it should be easy to build from the tarball:

  http://bogomips.org/cmogstored/files/cmogstored-1.0.0.tar.gz
  ./configure && make
  ./cmogstored --httplisten=8080 --docroot=/path/to/whatever

More info here: http://bogomips.org/cmogstored/README
git clone http://bogomips.org/cmogstored.git

* Re: epoll with ONESHOT possibly fails to deliver events
  2012-12-13  9:32   ` Eric Wong
  2012-12-13 15:29     ` Andreas Voellmy
@ 2012-12-14  0:08     ` Phil Turmel
  2012-12-14  0:15       ` Phil Turmel
  1 sibling, 1 reply; 16+ messages in thread
From: Phil Turmel @ 2012-12-14  0:08 UTC (permalink / raw)
  To: Eric Wong; +Cc: Andreas Voellmy, viro, linux-fsdevel, linux-kernel

On 12/13/2012 04:32 AM, Eric Wong wrote:
> Andreas Voellmy <andreas.voellmy@yale.edu> wrote:

[trim /]

>>> Another thread, distinct from all of the threads serving particular
>>> sockets, is perfoming epoll_wait calls. When sockets are returned as
>>> being ready from an epoll_wait call, the thread signals to the
>>> condition variable for the socket.
> 
> Perhaps there is a bug in the way your epoll_wait thread
> uses the condition variable to notify other threads?

Have you considered the possibility that data is arriving between
epoll_ctl and pthread_cond_wait?  If your monitoring thread returns
from epoll_wait within this race window, it will call
pthread_cond_signal while the first thread is not yet waiting for it,
and that signal is lost.  With the one-shot flag, the next iteration
of epoll_wait won't see that socket's new data either, so nothing
wakes the reading thread again.

Phil

* Re: epoll with ONESHOT possibly fails to deliver events
  2012-12-14  0:08     ` Phil Turmel
@ 2012-12-14  0:15       ` Phil Turmel
  0 siblings, 0 replies; 16+ messages in thread
From: Phil Turmel @ 2012-12-14  0:15 UTC (permalink / raw)
  To: Eric Wong; +Cc: Andreas Voellmy, viro, linux-fsdevel, linux-kernel

On 12/13/2012 07:08 PM, Phil Turmel wrote:
> On 12/13/2012 04:32 AM, Eric Wong wrote:
>> Andreas Voellmy <andreas.voellmy@yale.edu> wrote:
> 
> [trim /]
> 
>>>> Another thread, distinct from all of the threads serving particular
>>>> sockets, is perfoming epoll_wait calls. When sockets are returned as
>>>> being ready from an epoll_wait call, the thread signals to the
>>>> condition variable for the socket.
>>
>> Perhaps there is a bug in the way your epoll_wait thread
>> uses the condition variable to notify other threads?
> 
> Have you considered the possibility that data is arriving between
> epoll_ctl and pthread_cond_wait?  If your monitoring thread returns
> from epoll_wait within this race window, it will call
> pthread_cond_signal while the first thread is not yet waiting for it,
> and that signal is lost.  With the one-shot flag, the next iteration
> of epoll_wait won't see that socket's new data either, so nothing
> wakes the reading thread again.

Let me clarify:

The read thread must perform the epoll_ctl between pthread_mutex_lock
and pthread_cond_wait, while the monitoring thread must hold the mutex
lock when signaling.

pthread_cond_signal and pthread_cond_broadcast don't require the caller
to hold the mutex in general, but your app needs to hold it here.

Phil

end of thread

Thread overview: 16+ messages
2012-12-11 22:23 epoll with ONESHOT possibly fails to deliver events Andreas Voellmy
2012-12-12 23:49 ` Andreas Voellmy
2012-12-13  9:32   ` Eric Wong
2012-12-13 15:29     ` Andreas Voellmy
2012-12-14  0:16       ` Andreas Voellmy
2012-12-15 14:50         ` Andreas Voellmy
2012-12-18  2:07           ` Eric Wong
2012-12-18  2:35             ` Andreas Voellmy
2012-12-18 17:27               ` Andreas Voellmy
2012-12-19 19:39                 ` Andreas Voellmy
2012-12-20 21:32         ` Eric Wong
2012-12-20 22:25           ` Junchang(Jason) Wang
2012-12-21 15:32             ` Andreas Voellmy
2012-12-22  2:54             ` Eric Wong
2012-12-14  0:08     ` Phil Turmel
2012-12-14  0:15       ` Phil Turmel
