netdev.vger.kernel.org archive mirror
* [Patch] net: fix incorrect counting in __scm_destroy()
From: Amerigo Wang @ 2009-11-04 10:04 UTC
  To: linux-kernel; +Cc: netdev, Amerigo Wang, David S. Miller


It seems that in __scm_destroy() we forgot to decrement
->count after fput(->fp[i]). This may cause problems
when fput() is called recursively.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: David S. Miller <davem@davemloft.net>

---
diff --git a/net/core/scm.c b/net/core/scm.c
index b7ba91b..fa53219 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -120,8 +120,10 @@ void __scm_destroy(struct scm_cookie *scm)
 				fpl = list_first_entry(&work_list, struct scm_fp_list, list);
 
 				list_del(&fpl->list);
-				for (i=fpl->count-1; i>=0; i--)
+				for (i = fpl->count-1; i >= 0; i--) {
 					fput(fpl->fp[i]);
+					fpl->count--;
+				}
 				kfree(fpl);
 			}
 

* Re: [Patch] net: fix incorrect counting in __scm_destroy()
From: Eric Dumazet @ 2009-11-04 10:29 UTC
  To: Amerigo Wang; +Cc: linux-kernel, netdev, David S. Miller

Amerigo Wang wrote:
> It seems that in __scm_destroy() we forgot to decrement
> ->count after fput(->fp[i]). This may cause problems
> when fput() is called recursively.
> 
> Signed-off-by: WANG Cong <amwang@redhat.com>
> Cc: David S. Miller <davem@davemloft.net>
> 
> ---
> diff --git a/net/core/scm.c b/net/core/scm.c
> index b7ba91b..fa53219 100644
> --- a/net/core/scm.c
> +++ b/net/core/scm.c
> @@ -120,8 +120,10 @@ void __scm_destroy(struct scm_cookie *scm)
>  				fpl = list_first_entry(&work_list, struct scm_fp_list, list);
>  
>  				list_del(&fpl->list);
> -				for (i=fpl->count-1; i>=0; i--)
> +				for (i = fpl->count-1; i >= 0; i--) {
>  					fput(fpl->fp[i]);
> +					fpl->count--;
> +				}
>  				kfree(fpl);
>  			}
>  

Hmm, your patch seems suspicious.

Are you fixing a real crash/bug, or is it something you discovered in a code review?

Given that we kfree(fpl) at the end of the loop, we cannot recursively call
__scm_destroy() on the same fpl; that would be a bug anyway, wouldn't it?

So you probably need something better, such as checking that fpl->list has not
been re-added to current->scm_work_list before kfree()ing it.

* Re: [Patch] net: fix incorrect counting in __scm_destroy()
From: David Miller @ 2009-11-04 12:41 UTC
  To: eric.dumazet; +Cc: amwang, linux-kernel, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 04 Nov 2009 11:29:05 +0100

> Given that we kfree(fpl) at the end of the loop, we cannot recursively
> call __scm_destroy() on the same fpl; that would be a bug anyway,
> wouldn't it?
> 
> So you probably need something better, such as checking that fpl->list
> has not been re-added to current->scm_work_list before kfree()ing it.

I can't even see what the problem is.

The code is designed such that the ->count only matters for
the top level.

If we recursively fput() and get back here, we'll see that
there is someone higher in the call chain already running
the fput() loop and we'll just list_add_tail().

The inner while() loop will make sure we process such
entries once we get back to the top level and exit the
for() loop.

Amerigo, please show us the problematic code path where the counts go
wrong and this causes problems.

Thanks.
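
The deferral scheme described above can be sketched in user-space C.
This is a simplified model under stated assumptions, not the kernel
code: file_stub, fp_list, put_file(), destroy(), work_head and draining
are hypothetical stand-ins for struct file, struct scm_fp_list, fput(),
__scm_destroy() and current->scm_work_list, and a plain singly linked
list replaces the kernel list helpers.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_FDS 4

struct fp_list;

/* Hypothetical stand-in for struct file: releasing it may release
 * a nested list of in-flight descriptors. */
struct file_stub {
	struct fp_list *queued;		/* nested list, or NULL */
};

/* Hypothetical stand-in for struct scm_fp_list. */
struct fp_list {
	struct fp_list *next;		/* work-list link */
	int count;
	struct file_stub *fp[MAX_FDS];
};

static struct fp_list *work_head;	/* pending lists, FIFO order */
static int draining;			/* models current->scm_work_list != NULL */
static int depth, max_depth, lists_freed;

static void destroy(struct fp_list *fpl);

/* Models fput(): dropping the file re-enters destroy(). */
static void put_file(struct file_stub *f)
{
	struct fp_list *nested = f->queued;

	free(f);
	if (nested)
		destroy(nested);
}

/* Models list_add_tail(). */
static void work_append(struct fp_list *fpl)
{
	struct fp_list **pp = &work_head;

	while (*pp)
		pp = &(*pp)->next;
	fpl->next = NULL;
	*pp = fpl;
}

static void destroy(struct fp_list *fpl)
{
	int i;

	if (!fpl)
		return;
	if (++depth > max_depth)
		max_depth = depth;

	if (draining) {
		/* Someone higher in the call chain runs the loop: defer. */
		work_append(fpl);
	} else {
		draining = 1;
		work_append(fpl);
		while (work_head) {
			fpl = work_head;
			work_head = fpl->next;	/* unlink before releasing files */
			/* count is read once here; re-entrant calls never see fpl */
			for (i = fpl->count - 1; i >= 0; i--)
				put_file(fpl->fp[i]);
			free(fpl);
			lists_freed++;
		}
		draining = 0;
	}
	depth--;
}

/* Build n lists, each holding one file that owns the next list. */
static struct fp_list *make_chain(int n)
{
	struct fp_list *head = NULL;

	while (n-- > 0) {
		struct fp_list *fpl = calloc(1, sizeof(*fpl));
		struct file_stub *f = calloc(1, sizeof(*f));

		assert(fpl && f);
		f->queued = head;
		fpl->fp[0] = f;
		fpl->count = 1;
		head = fpl;
	}
	return head;
}

/* Destroy a chain of n nested lists; return the deepest destroy() nesting. */
int demo_max_depth(int n)
{
	depth = max_depth = lists_freed = 0;
	destroy(make_chain(n));
	assert(lists_freed == n);
	return max_depth;
}
```

In this model, demo_max_depth(1000) frees all 1000 nested lists while
destroy() never nests more than two levels deep: the re-entrant call
only appends to the work list and returns. Each list is unlinked before
its files are released, so nothing else can observe its count field
while the top-level loop runs.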

* Re: [Patch] net: fix incorrect counting in __scm_destroy()
From: Cong Wang @ 2009-11-10  6:12 UTC
  To: David Miller; +Cc: eric.dumazet, linux-kernel, netdev

David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 04 Nov 2009 11:29:05 +0100
> 
>> Given that we kfree(fpl) at the end of the loop, we cannot recursively
>> call __scm_destroy() on the same fpl; that would be a bug anyway,
>> wouldn't it?
>>
>> So you probably need something better, such as checking that fpl->list
>> has not been re-added to current->scm_work_list before kfree()ing it.
> 
> I can't even see what the problem is.
> 
> The code is designed such that the ->count only matters for
> the top level.
> 
> If we recursively fput() and get back here, we'll see that
> there is someone higher in the call chain already running
> the fput() loop and we'll just list_add_tail().
> 
> The inner while() loop will make sure we process such
> entries once we get back to the top level and exit the
> for() loop.
> 
> Amerigo, please show us the problematic code path where the counts go
> wrong and this causes problems.

Hi, all.

Thanks for your replies.

I hit a soft lockup around this code on ia64, with a trace like:

  [<a0000001006394e0>] unix_gc+0x240/0x760
                                 sp=e0000260f002fd70 bsp=e0000260f0029560
  [<a000000100634500>] unix_release_sock+0x440/0x460
                                 sp=e0000260f002fdb0 bsp=e0000260f0029508
  [<a000000100634560>] unix_release+0x40/0x60
                                 sp=e0000260f002fdb0 bsp=e0000260f00294e8
  [<a00000010051fba0>] sock_release+0x80/0x1c0
                                 sp=e0000260f002fdb0 bsp=e0000260f00294c0
  [<a00000010051fd60>] sock_close+0x80/0xa0
                                 sp=e0000260f002fdc0 bsp=e0000260f0029498
  [<a000000100172280>] __fput+0x1a0/0x420
                                 sp=e0000260f002fdc0 bsp=e0000260f0029458
  [<a000000100172540>] fput+0x40/0x60
                                 sp=e0000260f002fdc0 bsp=e0000260f0029438
  [<a000000100534a30>] __scm_destroy+0x130/0x1e0
                                 sp=e0000260f002fdc0 bsp=e0000260f0029410
  [<a000000100636370>] unix_destruct_fds+0x70/0xa0
                                 sp=e0000260f002fdd0 bsp=e0000260f00293e8
  [<a00000010052da30>] __kfree_skb+0x1f0/0x320
                                 sp=e0000260f002fe00 bsp=e0000260f00293c0
  [<a00000010052dbf0>] kfree_skb+0x90/0xc0
                                 sp=e0000260f002fe00 bsp=e0000260f00293a0
  [<a000000100634420>] unix_release_sock+0x360/0x460
                                 sp=e0000260f002fe00 bsp=e0000260f0029348
  [<a000000100634560>] unix_release+0x40/0x60
                                 sp=e0000260f002fe00 bsp=e0000260f0029328
  [<a00000010051fba0>] sock_release+0x80/0x1c0
                                 sp=e0000260f002fe00 bsp=e0000260f0029300
  [<a00000010051fd60>] sock_close+0x80/0xa0
                                 sp=e0000260f002fe10 bsp=e0000260f00292d8
  [<a000000100172280>] __fput+0x1a0/0x420
                                 sp=e0000260f002fe10 bsp=e0000260f0029298
  [<a000000100172540>] fput+0x40/0x60
                                 sp=e0000260f002fe10 bsp=e0000260f0029278


Yes, this even happens after commit f8d570a47.

But after bisecting, we found that a separate hrtimer patch fixes
this problem, so it is not a bug in __scm_destroy().

Sorry for the noise.

Thanks.

* Re: [Patch] net: fix incorrect counting in __scm_destroy()
From: Eric Dumazet @ 2009-11-10  6:33 UTC
  To: Cong Wang; +Cc: David Miller, linux-kernel, netdev

Cong Wang wrote:
> 
> Yes, this even happens after commit f8d570a47.
> 
> But after bisecting, we found that a separate hrtimer patch fixes
> this problem, so it is not a bug in __scm_destroy().
> 
> Sorry for the noise.
> 

Thanks for the explanation!
