netdev.vger.kernel.org archive mirror
* [PATCH 2.6.35.7] net: Fix the condition passed to sk_wait_event()
@ 2010-10-02  2:55 Nagendra Tomar
  2010-10-02  4:27 ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Nagendra Tomar @ 2010-10-02  2:55 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, davem



The conditions (3rd arg) passed to sk_wait_event() in sk_stream_wait_memory() and
sk_stream_wait_connect() are incorrect.
The incorrect check in sk_stream_wait_memory() causes the following soft lockup
in tcp_sendmsg() when the global TCP memory pool is exhausted. The check in
sk_stream_wait_connect() was found by code audit.
   
>>> snip <<<
localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429]
localhost kernel: CPU 3:
localhost kernel: 
localhost kernel: Call Trace:
localhost kernel:  [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200
localhost kernel:  [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel:  [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0
localhost kernel:  [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140
localhost kernel:  [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130
localhost kernel:  [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel:  [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170
localhost kernel:  [vfs_write+0x185/0x190] vfs_write+0x185/0x190
localhost kernel:  [sys_write+0x50/0x90] sys_write+0x50/0x90
localhost kernel:  [system_call+0x7e/0x83] system_call+0x7e/0x83
>>> snip <<<

The sk_wait_event() condition passed from sk_stream_wait_memory() evaluates to
true when global TCP memory is exhausted. This is because both
sk_stream_memory_free() and vm_wait are true, which causes sk_wait_event() to
*not* call schedule_timeout().
Hence sk_stream_wait_memory() returns immediately to the caller without sleeping.
The caller then retries the allocation, which fails again, and calls
sk_stream_wait_memory() again, and so on.

Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com>
---
--- linux-2.6.35.7/net/core/stream.c.orig 2010-03-23 23:46:45.000000000 +0530
+++ linux-2.6.35.7/net/core/stream.c 2010-03-24 00:21:09.000000000 +0530
@@ -73,9 +73,8 @@ int sk_stream_wait_connect(struct sock *
   prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
   sk->sk_write_pending++;
   done = sk_wait_event(sk, timeo_p,
-         !sk->sk_err &&
-         !((1 << sk->sk_state) &
-           ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)));
+         ((1 << sk->sk_state) &
+           (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)));
   finish_wait(sk_sleep(sk), &wait);
   sk->sk_write_pending--;
  } while (!done);
@@ -144,10 +143,9 @@ int sk_stream_wait_memory(struct sock *s
 
   set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
   sk->sk_write_pending++;
-  sk_wait_event(sk, &current_timeo, !sk->sk_err &&
-        !(sk->sk_shutdown & SEND_SHUTDOWN) &&
-        sk_stream_memory_free(sk) &&
-        vm_wait);
+  sk_wait_event(sk, &current_timeo, sk->sk_err ||
+        (sk->sk_shutdown & SEND_SHUTDOWN) ||
+        (sk_stream_memory_free(sk) && !vm_wait));
   sk->sk_write_pending--;
 
   if (vm_wait) {



      


* Re: [PATCH 2.6.35.7] net: Fix the condition passed to sk_wait_event()
  2010-10-02  2:55 [PATCH 2.6.35.7] net: Fix the condition passed to sk_wait_event() Nagendra Tomar
@ 2010-10-02  4:27 ` David Miller
  0 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2010-10-02  4:27 UTC (permalink / raw)
  To: tomer_iisc; +Cc: netdev, linux-kernel


Your patch has been corrupted by your email client, it changed
tab characters into spaces.

Please correct this and fully resubmit your patch.

Thank you.


* Re: [PATCH 2.6.35.7] net: Fix the condition passed to sk_wait_event()
@ 2010-10-02  8:22 Nagendra Tomar
  2010-10-02  8:27 ` Eric Dumazet
  2010-10-02 20:27 ` David Miller
  0 siblings, 2 replies; 7+ messages in thread
From: Nagendra Tomar @ 2010-10-02  8:22 UTC (permalink / raw)
  To: netdev; +Cc: davem, linux-kernel

Resending ...


The conditions (3rd arg) passed to sk_wait_event() in sk_stream_wait_memory() and sk_stream_wait_connect() are incorrect.
The incorrect check in sk_stream_wait_memory() causes the following soft lockup in tcp_sendmsg() when the global TCP memory pool is exhausted. The check in sk_stream_wait_connect() was found by code audit.
   

>>> snip <<<

localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429]
localhost kernel: CPU 3:
localhost kernel: RIP: 0010:[sk_stream_wait_memory+0xcd/0x200]  [sk_stream_wait_memory+0xcd/0x200] sk_stream_wait_memory+0xcd/0x200
localhost kernel: Call Trace:
localhost kernel:  [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200
localhost kernel:  [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel:  [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0
localhost kernel:  [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140
localhost kernel:  [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130
localhost kernel:  [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel:  [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170
localhost kernel:  [vfs_write+0x185/0x190] vfs_write+0x185/0x190
localhost kernel:  [sys_write+0x50/0x90] sys_write+0x50/0x90
localhost kernel:  [system_call+0x7e/0x83] system_call+0x7e/0x83

>>> snip <<<

The sk_wait_event() condition passed from sk_stream_wait_memory() evaluates to
true when global TCP memory is exhausted. This is because both
sk_stream_memory_free() and vm_wait are true, which causes sk_wait_event() to
*not* call schedule_timeout().
Hence sk_stream_wait_memory() returns immediately to the caller without sleeping.
The caller then retries the allocation, which fails again, and calls
sk_stream_wait_memory() again, and so on.


Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com>
---
--- linux-2.6.35.7/net/core/stream.c.orig	2010-03-23 23:46:45.000000000 +0530
+++ linux-2.6.35.7/net/core/stream.c	2010-03-24 00:21:09.000000000 +0530
@@ -73,9 +73,8 @@ int sk_stream_wait_connect(struct sock *
 		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 		sk->sk_write_pending++;
 		done = sk_wait_event(sk, timeo_p,
-				     !sk->sk_err &&
-				     !((1 << sk->sk_state) &
-				       ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)));
+				     ((1 << sk->sk_state) &
+				       (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)));
 		finish_wait(sk_sleep(sk), &wait);
 		sk->sk_write_pending--;
 	} while (!done);
@@ -144,10 +143,9 @@ int sk_stream_wait_memory(struct sock *s
 
 		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 		sk->sk_write_pending++;
-		sk_wait_event(sk, &current_timeo, !sk->sk_err &&
-						  !(sk->sk_shutdown & SEND_SHUTDOWN) &&
-						  sk_stream_memory_free(sk) &&
-						  vm_wait);
+		sk_wait_event(sk, &current_timeo, sk->sk_err ||
+						  (sk->sk_shutdown & SEND_SHUTDOWN) ||
+						  (sk_stream_memory_free(sk) && !vm_wait));
 		sk->sk_write_pending--;
 
 		if (vm_wait) {




      


* Re: [PATCH 2.6.35.7] net: Fix the condition passed to sk_wait_event()
  2010-10-02  8:22 Nagendra Tomar
@ 2010-10-02  8:27 ` Eric Dumazet
  2010-10-02  8:34   ` Eric Dumazet
  2010-10-02 20:27 ` David Miller
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2010-10-02  8:27 UTC (permalink / raw)
  To: Nagendra Tomar; +Cc: netdev, davem, linux-kernel

On Saturday, 02 October 2010 at 01:22 -0700, Nagendra Tomar wrote:
> Resending ...
> 
> 
> The conditions (3rd arg) passed to sk_wait_event() in sk_stream_wait_memory() and sk_stream_wait_connect() are incorrect.
> The incorrect check in sk_stream_wait_memory() causes the following soft lockup in tcp_sendmsg() when the global TCP memory pool is exhausted. The check in sk_stream_wait_connect() was found by code audit.
>    
> 
> >>> snip <<<
> 
> localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429]
> localhost kernel: CPU 3:
> localhost kernel: RIP: 0010:[sk_stream_wait_memory+0xcd/0x200]  [sk_stream_wait_memory+0xcd/0x200] sk_stream_wait_memory+0xcd/0x200
> localhost kernel: Call Trace:
> localhost kernel:  [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200
> localhost kernel:  [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
> localhost kernel:  [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0
> localhost kernel:  [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140
> localhost kernel:  [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130
> localhost kernel:  [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
> localhost kernel:  [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170
> localhost kernel:  [vfs_write+0x185/0x190] vfs_write+0x185/0x190
> localhost kernel:  [sys_write+0x50/0x90] sys_write+0x50/0x90
> localhost kernel:  [system_call+0x7e/0x83] system_call+0x7e/0x83
> 
> >>> snip <<<
> 
> The sk_wait_event() condition passed from sk_stream_wait_memory() evaluates to
> true when global TCP memory is exhausted. This is because both
> sk_stream_memory_free() and vm_wait are true, which causes sk_wait_event() to
> *not* call schedule_timeout().
> Hence sk_stream_wait_memory() returns immediately to the caller without sleeping.
> The caller then retries the allocation, which fails again, and calls
> sk_stream_wait_memory() again, and so on.
> 
> 

Hi Nagendra


> Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com>
> ---
> --- linux-2.6.35.7/net/core/stream.c.orig	2010-03-23 23:46:45.000000000 +0530
> +++ linux-2.6.35.7/net/core/stream.c	2010-03-24 00:21:09.000000000 +0530
> @@ -73,9 +73,8 @@ int sk_stream_wait_connect(struct sock *
>  		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
>  		sk->sk_write_pending++;
>  		done = sk_wait_event(sk, timeo_p,
> -				     !sk->sk_err &&
> -				     !((1 << sk->sk_state) &
> -				       ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)));
> +				     ((1 << sk->sk_state) &
> +				       (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)));

Just wondering why you removed the test on sk->sk_err?

We want to break the loop if sk->sk_err is set, or the state is ESTABLISHED
or CLOSE_WAIT.

>  		finish_wait(sk_sleep(sk), &wait);
>  		sk->sk_write_pending--;
>  	} while (!done);
> @@ -144,10 +143,9 @@ int sk_stream_wait_memory(struct sock *s
>  
>  		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
>  		sk->sk_write_pending++;
> -		sk_wait_event(sk, &current_timeo, !sk->sk_err &&
> -						  !(sk->sk_shutdown & SEND_SHUTDOWN) &&
> -						  sk_stream_memory_free(sk) &&
> -						  vm_wait);
> +		sk_wait_event(sk, &current_timeo, sk->sk_err ||
> +						  (sk->sk_shutdown & SEND_SHUTDOWN) ||
> +						  (sk_stream_memory_free(sk) && !vm_wait));
>  		sk->sk_write_pending--;
>  
>  		if (vm_wait) {
> 
> 
> 

Thanks !


* Re: [PATCH 2.6.35.7] net: Fix the condition passed to sk_wait_event()
  2010-10-02  8:27 ` Eric Dumazet
@ 2010-10-02  8:34   ` Eric Dumazet
  2010-10-02 11:49     ` Nagendra Tomar
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2010-10-02  8:34 UTC (permalink / raw)
  To: Nagendra Tomar; +Cc: netdev, davem, linux-kernel

On Saturday, 02 October 2010 at 10:27 +0200, Eric Dumazet wrote:

> Just wondering why you removed the test on sk->sk_err?
> 
> We want to break the loop if sk->sk_err is set, or the state is ESTABLISHED
> or CLOSE_WAIT.

Hmm, reading the code again, I can see sk_err is tested in the loop, so
your code is better (sk_stream_wait_connect() returns an error after
your patch, instead of returning 0)

Could you please split your patch into two patches?

The sk_stream_wait_connect() problem comes from commit
c1cbe4b7ad0bc4b1d9 ([NET]: Avoid atomic xchg() for non-error case)






* Re: [PATCH 2.6.35.7] net: Fix the condition passed to sk_wait_event()
  2010-10-02  8:34   ` Eric Dumazet
@ 2010-10-02 11:49     ` Nagendra Tomar
  0 siblings, 0 replies; 7+ messages in thread
From: Nagendra Tomar @ 2010-10-02 11:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, linux-kernel



--- On Sat, 2/10/10, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> > Just wondering why you removed the test on sk->sk_err?
> > 
> > We want to break the loop if sk->sk_err is set, or the state is ESTABLISHED
> > or CLOSE_WAIT.
> 
> Hmm, reading the code again, I can see sk_err is tested in the loop, so
> your code is better (sk_stream_wait_connect() returns an error after
> your patch, instead of returning 0)

Exactly.

> 
> Could you please split your patch into two patches?
> 

OK, I'll send them soon.

Thanks,
Tomar




      


* Re: [PATCH 2.6.35.7] net: Fix the condition passed to sk_wait_event()
  2010-10-02  8:22 Nagendra Tomar
  2010-10-02  8:27 ` Eric Dumazet
@ 2010-10-02 20:27 ` David Miller
  1 sibling, 0 replies; 7+ messages in thread
From: David Miller @ 2010-10-02 20:27 UTC (permalink / raw)
  To: tomer_iisc; +Cc: netdev, linux-kernel

From: Nagendra Tomar <tomer_iisc@yahoo.com>
Date: Sat, 2 Oct 2010 01:22:16 -0700 (PDT)

> Resending ...

It's still corrupted, see my other reply.

