* [Patch] net: fix incorrect counting in __scm_destroy()
@ 2009-11-04 10:04 Amerigo Wang
2009-11-04 10:29 ` Eric Dumazet
0 siblings, 1 reply; 5+ messages in thread
From: Amerigo Wang @ 2009-11-04 10:04 UTC (permalink / raw)
To: linux-kernel; +Cc: netdev, Amerigo Wang, David S. Miller
It seems that in __scm_destroy() we forgot to decrease
->count after fput(->fp[i]); this may cause problems
when we recursively call fput() again.
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
---
diff --git a/net/core/scm.c b/net/core/scm.c
index b7ba91b..fa53219 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -120,8 +120,10 @@ void __scm_destroy(struct scm_cookie *scm)
 				fpl = list_first_entry(&work_list, struct scm_fp_list, list);

 				list_del(&fpl->list);
-				for (i=fpl->count-1; i>=0; i--)
+				for (i = fpl->count-1; i >= 0; i--) {
 					fput(fpl->fp[i]);
+					fpl->count--;
+				}
 				kfree(fpl);
 			}
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [Patch] net: fix incorrect counting in __scm_destroy()
2009-11-04 10:04 [Patch] net: fix incorrect counting in __scm_destroy() Amerigo Wang
@ 2009-11-04 10:29 ` Eric Dumazet
2009-11-04 12:41 ` David Miller
0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2009-11-04 10:29 UTC (permalink / raw)
To: Amerigo Wang; +Cc: linux-kernel, netdev, David S. Miller
Amerigo Wang wrote:
> It seems that in __scm_destroy() we forgot to decrease
> ->count after fput(->fp[i]); this may cause problems
> when we recursively call fput() again.
>
> Signed-off-by: WANG Cong <amwang@redhat.com>
> Cc: David S. Miller <davem@davemloft.net>
>
> ---
> diff --git a/net/core/scm.c b/net/core/scm.c
> index b7ba91b..fa53219 100644
> --- a/net/core/scm.c
> +++ b/net/core/scm.c
> @@ -120,8 +120,10 @@ void __scm_destroy(struct scm_cookie *scm)
>  				fpl = list_first_entry(&work_list, struct scm_fp_list, list);
>
>  				list_del(&fpl->list);
> -				for (i=fpl->count-1; i>=0; i--)
> +				for (i = fpl->count-1; i >= 0; i--) {
>  					fput(fpl->fp[i]);
> +					fpl->count--;
> +				}
>  				kfree(fpl);
>  			}
>
Hmm, your patch seems suspicious.
Are you fixing a real crash/bug, or is it something you discovered in a code review?
Given that we kfree(fpl) at the end of the loop, we cannot recursively call __scm_destroy()
on the same fpl; it would be a bug anyway?
So you probably need something better, like testing that fpl->list has not been re-included
in current->scm_work_list before kfree()ing it.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Patch] net: fix incorrect counting in __scm_destroy()
2009-11-04 10:29 ` Eric Dumazet
@ 2009-11-04 12:41 ` David Miller
2009-11-10 6:12 ` Cong Wang
0 siblings, 1 reply; 5+ messages in thread
From: David Miller @ 2009-11-04 12:41 UTC (permalink / raw)
To: eric.dumazet; +Cc: amwang, linux-kernel, netdev
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 04 Nov 2009 11:29:05 +0100
> Given that we kfree(fpl) at the end of the loop, we cannot recursively call
> __scm_destroy() on the same fpl; it would be a bug anyway?
>
> So you probably need something better, like testing that fpl->list has
> not been re-included in current->scm_work_list before kfree()ing it.
I can't even see what the problem is.
The code is designed such that the ->count only matters for
the top level.
If we recursively fput() and get back here, we'll see that
there is someone higher in the call chain already running
the fput() loop and we'll just list_add_tail().
The inner while() loop will make sure we process such
entries once we get back to the top level and exit the
for() loop.
Amerigo, please show us the problematic code path where the counts go
wrong and this causes problems.
Thanks.
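
The deferral scheme described in this message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel source: `struct file`, `struct fp_list`, `model_fput()`, `model_destroy()`, `build_chain()` and the `pending`/`draining` globals are hypothetical stand-ins for `struct file`, `struct scm_fp_list`, `fput()`, `__scm_destroy()` and `current->scm_work_list`. It shows that a nested call only queues its list and returns, that the top-level caller drains the queue, and that `count` is read only by the top-level loop and never needs to be decremented.

```c
#include <assert.h>
#include <stdlib.h>

struct fp_list;

/* Hypothetical stand-in for struct file: releasing it may destroy a list. */
struct file {
	struct fp_list *in_flight;	/* list torn down on release, or NULL */
};

/* Hypothetical stand-in for struct scm_fp_list. */
struct fp_list {
	struct fp_list *next;		/* link in the pending queue */
	int count;			/* number of valid entries in fp[] */
	struct file *fp[4];
};

static struct fp_list *pending;		/* models the per-task work list */
static int draining;			/* nonzero while a top-level call drains */
static int depth, max_depth, lists_freed;	/* instrumentation only */

static void model_destroy(struct fp_list *fpl);

/* Models fput(): dropping the last reference may re-enter model_destroy(). */
static void model_fput(struct file *f)
{
	struct fp_list *inner = f->in_flight;

	free(f);
	if (inner)
		model_destroy(inner);
}

static void model_destroy(struct fp_list *fpl)
{
	struct fp_list **tail = &pending;
	int i;

	assert(fpl != NULL);
	if (++depth > max_depth)
		max_depth = depth;

	/* Always append to the tail of the queue (like list_add_tail()). */
	while (*tail)
		tail = &(*tail)->next;
	fpl->next = NULL;
	*tail = fpl;

	/* A nested call stops here; the top-level caller does the work. */
	if (!draining) {
		draining = 1;
		while (pending) {
			fpl = pending;
			pending = fpl->next;	/* unlink before releasing */
			for (i = fpl->count - 1; i >= 0; i--)
				model_fput(fpl->fp[i]);	/* may queue more lists */
			free(fpl);		/* count is never touched again */
			lists_freed++;
		}
		draining = 0;
	}
	depth--;
}

/* Builds n lists where releasing list k's only file destroys list k+1. */
static struct fp_list *build_chain(int n)
{
	struct fp_list *head = NULL;

	while (n-- > 0) {
		struct fp_list *fpl = calloc(1, sizeof(*fpl));
		struct file *f = calloc(1, sizeof(*f));

		if (!fpl || !f)
			abort();
		f->in_flight = head;
		fpl->fp[0] = f;
		fpl->count = 1;
		head = fpl;
	}
	return head;
}
```

In this model, calling `model_destroy(build_chain(1000))` frees all 1000 lists while `model_destroy()` never nests more than two deep: the top-level call plus one nested call that queues its list and returns immediately.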
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Patch] net: fix incorrect counting in __scm_destroy()
2009-11-04 12:41 ` David Miller
@ 2009-11-10 6:12 ` Cong Wang
2009-11-10 6:33 ` Eric Dumazet
0 siblings, 1 reply; 5+ messages in thread
From: Cong Wang @ 2009-11-10 6:12 UTC (permalink / raw)
To: David Miller; +Cc: eric.dumazet, linux-kernel, netdev
David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 04 Nov 2009 11:29:05 +0100
>
>> Given that we kfree(fpl) at the end of the loop, we cannot recursively call
>> __scm_destroy() on the same fpl; it would be a bug anyway?
>>
>> So you probably need something better, like testing that fpl->list has
>> not been re-included in current->scm_work_list before kfree()ing it.
>
> I can't even see what the problem is.
>
> The code is designed such that the ->count only matters for
> the top level.
>
> If we recursively fput() and get back here, we'll see that
> there is someone higher in the call chain already running
> the fput() loop and we'll just list_add_tail().
>
> The inner while() loop will make sure we process such
> entries once we get back to the top level and exit the
> for() loop.
>
> Amerigo, please show us the problematic code path where the counts go
> wrong and this causes problems.
Hi, all.
Thanks for your replies.
I hit a soft lockup around this code on ia64, with a trace like this:
[<a0000001006394e0>] unix_gc+0x240/0x760
sp=e0000260f002fd70 bsp=e0000260f0029560
[<a000000100634500>] unix_release_sock+0x440/0x460
sp=e0000260f002fdb0 bsp=e0000260f0029508
[<a000000100634560>] unix_release+0x40/0x60
sp=e0000260f002fdb0 bsp=e0000260f00294e8
[<a00000010051fba0>] sock_release+0x80/0x1c0
sp=e0000260f002fdb0 bsp=e0000260f00294c0
[<a00000010051fd60>] sock_close+0x80/0xa0
sp=e0000260f002fdc0 bsp=e0000260f0029498
[<a000000100172280>] __fput+0x1a0/0x420
sp=e0000260f002fdc0 bsp=e0000260f0029458
[<a000000100172540>] fput+0x40/0x60
sp=e0000260f002fdc0 bsp=e0000260f0029438
[<a000000100534a30>] __scm_destroy+0x130/0x1e0
sp=e0000260f002fdc0 bsp=e0000260f0029410
[<a000000100636370>] unix_destruct_fds+0x70/0xa0
sp=e0000260f002fdd0 bsp=e0000260f00293e8
[<a00000010052da30>] __kfree_skb+0x1f0/0x320
sp=e0000260f002fe00 bsp=e0000260f00293c0
[<a00000010052dbf0>] kfree_skb+0x90/0xc0
sp=e0000260f002fe00 bsp=e0000260f00293a0
[<a000000100634420>] unix_release_sock+0x360/0x460
sp=e0000260f002fe00 bsp=e0000260f0029348
[<a000000100634560>] unix_release+0x40/0x60
sp=e0000260f002fe00 bsp=e0000260f0029328
[<a00000010051fba0>] sock_release+0x80/0x1c0
sp=e0000260f002fe00 bsp=e0000260f0029300
[<a00000010051fd60>] sock_close+0x80/0xa0
sp=e0000260f002fe10 bsp=e0000260f00292d8
[<a000000100172280>] __fput+0x1a0/0x420
sp=e0000260f002fe10 bsp=e0000260f0029298
[<a000000100172540>] fput+0x40/0x60
sp=e0000260f002fe10 bsp=e0000260f0029278
Yes, this happens even after commit f8d570a47.
But after bisecting, we found that another hrtimer patch fixes this
problem, so it is not a bug in __scm_destroy().
Sorry for the noise.
Thanks.
^ permalink raw reply [flat|nested] 5+ messages in thread