public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Deadlock in 2.2 sock_alloc_send_skb?
@ 2001-05-10 12:34 Ulrich.Weigand
  2001-05-10 12:57 ` Alan Cox
  0 siblings, 1 reply; 7+ messages in thread
From: Ulrich.Weigand @ 2001-05-10 12:34 UTC (permalink / raw)
  To: alan; +Cc: linux-kernel, schwidefsky



Hi Alan,

we've experienced deadlocks that appear to be caused by the loop in
sock_alloc_send_skb().  To trigger this, you need to combine heavy
network load with memory pressure.  In this situation, sock_wmalloc()
can fail because it really can't allocate any more memory, even though
the send buffer limit for this socket is not yet exhausted.

If that happens, and the socket uses GFP_ATOMIC allocation, the while (1)
loop in sock_alloc_send_skb() will endlessly spin, without ever calling
schedule(), and all the time holding the kernel lock ...

Do you agree that this is a problem?  What do you think about this fix:

diff -ur linux-2.2.19/net/core/sock.c linux-2.2.19-s390/net/core/sock.c
--- linux-2.2.19/net/core/sock.c   Sun Mar 25 18:37:41 2001
+++ linux-2.2.19-s390/net/core/sock.c    Thu May 10 14:02:11 2001
@@ -752,9 +752,11 @@
                    break;
               try_size = fallback;
          }
-         skb = sock_wmalloc(sk, try_size, 0, sk->allocation);
+         skb = sock_wmalloc_err(sk, try_size, 0, sk->allocation, &err);
          if (skb)
               break;
+         if (err)
+              goto failure;

          /*
           *   This means we have too many buffers for this socket already.


Test case is simple:  keep spawning lots of long-running 'ping' processes
until physical memory is exhausted.


Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

--
  Dr. Ulrich Weigand
  Linux for S/390 Design & Development
  IBM Deutschland Entwicklung GmbH, Schoenaicher Str. 220, 71032 Boeblingen
  Phone: +49-7031/16-3727   ---   Email: Ulrich.Weigand@de.ibm.com



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Deadlock in 2.2 sock_alloc_send_skb?
  2001-05-10 12:34 Deadlock in 2.2 sock_alloc_send_skb? Ulrich.Weigand
@ 2001-05-10 12:57 ` Alan Cox
  2001-05-10 17:30   ` Andi Kleen
  0 siblings, 1 reply; 7+ messages in thread
From: Alan Cox @ 2001-05-10 12:57 UTC (permalink / raw)
  To: Ulrich.Weigand; +Cc: alan, linux-kernel, schwidefsky

> If that happens, and the socket uses GFP_ATOMIC allocation, the while (1)
> loop in sock_alloc_send_skb() will endlessly spin, without ever calling
> schedule(), and all the time holding the kernel lock ...

If the socket is using GFP_ATOMIC allocation it should never loop. That is
-not-allowed-. 

> Do you agree that this is a problem?  What do you think about this fix:

Looks good


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Deadlock in 2.2 sock_alloc_send_skb?
  2001-05-10 12:57 ` Alan Cox
@ 2001-05-10 17:30   ` Andi Kleen
  2001-05-10 21:13     ` Andrea Arcangeli
  0 siblings, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2001-05-10 17:30 UTC (permalink / raw)
  To: Alan Cox; +Cc: Ulrich.Weigand, linux-kernel, schwidefsky

On Thu, May 10, 2001 at 01:57:49PM +0100, Alan Cox wrote:
> > If that happens, and the socket uses GFP_ATOMIC allocation, the while (1)
> > loop in sock_alloc_send_skb() will endlessly spin, without ever calling
> > schedule(), and all the time holding the kernel lock ...
> 
> If the socket is using GFP_ATOMIC allocation it should never loop. That is
> -not-allowed-. 

It is just not clear why any socket should use GFP_ATOMIC. I can understand
it using GFP_BUFFER e.g. for nbd, but GFP_ATOMIC seems to be rather radical
and fragile.

-Andi


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Deadlock in 2.2 sock_alloc_send_skb?
  2001-05-10 17:30   ` Andi Kleen
@ 2001-05-10 21:13     ` Andrea Arcangeli
  2001-05-10 21:17       ` Andi Kleen
  0 siblings, 1 reply; 7+ messages in thread
From: Andrea Arcangeli @ 2001-05-10 21:13 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Alan Cox, Ulrich.Weigand, linux-kernel, schwidefsky

On Thu, May 10, 2001 at 07:30:47PM +0200, Andi Kleen wrote:
> On Thu, May 10, 2001 at 01:57:49PM +0100, Alan Cox wrote:
> > > If that happens, and the socket uses GFP_ATOMIC allocation, the while (1)
> > > loop in sock_alloc_send_skb() will endlessly spin, without ever calling
> > > schedule(), and all the time holding the kernel lock ...
> > 
> > If the socket is using GFP_ATOMIC allocation it should never loop. That is
> > -not-allowed-. 
> 
> It is just not clear why any socket should use GFP_ATOMIC. I can understand
> it using GFP_BUFFER e.g. for nbd, but GFP_ATOMIC seems to be rather radical
> and fragile.

side note, the only legal use of GFP_ATOMIC in sock_alloc_send_skb is
with noblock set and fallback zero, remeber GFP_BUFFER will sleep, it
may not sleep in vanilla 2.2.19 but the necessary lowlatency hooks in
the memory balancing that for example I have on my 2.2 tree will make it
to sleep.

The patch self contained looks perfect (I didn't checked that the
callers are happy with a -ENOMEM errorcode though), if alloc_skb()
failed that's a plain -ENOMEM. The caller must _not_ try again, they
must take a recovery fail path instead.

Andrea

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Deadlock in 2.2 sock_alloc_send_skb?
  2001-05-10 21:13     ` Andrea Arcangeli
@ 2001-05-10 21:17       ` Andi Kleen
  2001-05-10 21:32         ` Andrea Arcangeli
  0 siblings, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2001-05-10 21:17 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andi Kleen, Alan Cox, Ulrich.Weigand, linux-kernel, schwidefsky

On Thu, May 10, 2001 at 11:13:00PM +0200, Andrea Arcangeli wrote:
> On Thu, May 10, 2001 at 07:30:47PM +0200, Andi Kleen wrote:
> > On Thu, May 10, 2001 at 01:57:49PM +0100, Alan Cox wrote:
> > > > If that happens, and the socket uses GFP_ATOMIC allocation, the while (1)
> > > > loop in sock_alloc_send_skb() will endlessly spin, without ever calling
> > > > schedule(), and all the time holding the kernel lock ...
> > > 
> > > If the socket is using GFP_ATOMIC allocation it should never loop. That is
> > > -not-allowed-. 
> > 
> > It is just not clear why any socket should use GFP_ATOMIC. I can understand
> > it using GFP_BUFFER e.g. for nbd, but GFP_ATOMIC seems to be rather radical
> > and fragile.
> 
> side note, the only legal use of GFP_ATOMIC in sock_alloc_send_skb is
> with noblock set and fallback zero, remeber GFP_BUFFER will sleep, it
> may not sleep in vanilla 2.2.19 but the necessary lowlatency hooks in
> the memory balancing that for example I have on my 2.2 tree will make it
> to sleep.

Even with nonblock set the socket code will sleep in some circumstances
(e.g. when aquiring the socket lock) so interrupt operation is out anyways.


> The patch self contained looks perfect (I didn't checked that the
> callers are happy with a -ENOMEM errorcode though), if alloc_skb()
> failed that's a plain -ENOMEM. The caller must _not_ try again, they
> must take a recovery fail path instead.

I think the callers are likely broken.
The patch is still good of course, but not for GFP_ATOMIC. 


-Andi


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Deadlock in 2.2 sock_alloc_send_skb?
  2001-05-10 21:17       ` Andi Kleen
@ 2001-05-10 21:32         ` Andrea Arcangeli
  2001-05-11  8:36           ` Andi Kleen
  0 siblings, 1 reply; 7+ messages in thread
From: Andrea Arcangeli @ 2001-05-10 21:32 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Alan Cox, Ulrich.Weigand, linux-kernel, schwidefsky

On Thu, May 10, 2001 at 11:17:17PM +0200, Andi Kleen wrote:
> On Thu, May 10, 2001 at 11:13:00PM +0200, Andrea Arcangeli wrote:
> > On Thu, May 10, 2001 at 07:30:47PM +0200, Andi Kleen wrote:
> > > On Thu, May 10, 2001 at 01:57:49PM +0100, Alan Cox wrote:
> > > > > If that happens, and the socket uses GFP_ATOMIC allocation, the while (1)
> > > > > loop in sock_alloc_send_skb() will endlessly spin, without ever calling
> > > > > schedule(), and all the time holding the kernel lock ...
> > > > 
> > > > If the socket is using GFP_ATOMIC allocation it should never loop. That is
> > > > -not-allowed-. 
> > > 
> > > It is just not clear why any socket should use GFP_ATOMIC. I can understand
> > > it using GFP_BUFFER e.g. for nbd, but GFP_ATOMIC seems to be rather radical
> > > and fragile.
> > 
> > side note, the only legal use of GFP_ATOMIC in sock_alloc_send_skb is
> > with noblock set and fallback zero, remeber GFP_BUFFER will sleep, it
> > may not sleep in vanilla 2.2.19 but the necessary lowlatency hooks in
> > the memory balancing that for example I have on my 2.2 tree will make it
> > to sleep.
> 
> Even with nonblock set the socket code will sleep in some circumstances
> (e.g. when aquiring the socket lock) so interrupt operation is out anyways.
> 
> 
> > The patch self contained looks perfect (I didn't checked that the
> > callers are happy with a -ENOMEM errorcode though), if alloc_skb()
> > failed that's a plain -ENOMEM. The caller must _not_ try again, they
> > must take a recovery fail path instead.
> 
> I think the callers are likely broken.
> The patch is still good of course, but not for GFP_ATOMIC. 

you said interrupt won't call that function so I don't see the
GFP_ATOMIC issue.

I also don't what's the issue with GFP_ATOMIC regardless somebody uses
it or not, the patch itself has nothing to do with GFP_ATOMIC. All
gfpmasks can fail, allock_skb can fail regardless of the gfpmask, not
only GFP_ATOMIC will fail, of course GFP_ATOMIC can fail even if the
machine is not totally out of memory but you never know and you cannot
assume anything and when alloc_skb fails you must assume the machine is
totally out of memory or you will deadlock, so if alloc_skb fails we
must return -ENOMEM and take the fail path and the patch does the right
thing in such case as well.

Andrea

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Deadlock in 2.2 sock_alloc_send_skb?
  2001-05-10 21:32         ` Andrea Arcangeli
@ 2001-05-11  8:36           ` Andi Kleen
  0 siblings, 0 replies; 7+ messages in thread
From: Andi Kleen @ 2001-05-11  8:36 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andi Kleen, Alan Cox, Ulrich.Weigand, linux-kernel, schwidefsky

On Thu, May 10, 2001 at 11:32:25PM +0200, Andrea Arcangeli wrote:
> you said interrupt won't call that function so I don't see the
> GFP_ATOMIC issue.

I said interrupts should not call it, but apparently somebody tries to 
call it with GFP_ATOMIC and I'm suspecting that caller is broken (whatever
it is, I don't think it is in the main tree)

> 
> I also don't what's the issue with GFP_ATOMIC regardless somebody uses

[...]

No argument about that it should handle oom anyways.



-Andi


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2001-05-11  8:37 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-05-10 12:34 Deadlock in 2.2 sock_alloc_send_skb? Ulrich.Weigand
2001-05-10 12:57 ` Alan Cox
2001-05-10 17:30   ` Andi Kleen
2001-05-10 21:13     ` Andrea Arcangeli
2001-05-10 21:17       ` Andi Kleen
2001-05-10 21:32         ` Andrea Arcangeli
2001-05-11  8:36           ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox