netdev.vger.kernel.org archive mirror
* Freeing alive fib_info caused by ebc0ffae5
@ 2010-11-04 10:23 Michael Ellerman
  2010-11-04 10:30 ` Eric Dumazet
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Ellerman @ 2010-11-04 10:23 UTC
  To: netdev; +Cc: eric.dumazet

Hi all,

I'm running Linus' latest or thereabouts (ff8b16d), and I'm seeing
"Freeing alive fib_info" messages, from free_fib_info().

Actually I only get one per boot, when network interfaces come up.
Seemingly related, I am getting refcount problems when I shut down, i.e.
unregister_netdevice() sees a usage count of 1, which never decrements.

Bisect says it's ebc0ffae5 which causes the problem, or makes it appear.

    fib: RCU conversion of fib_lookup()
    
    fib_lookup() converted to be called in RCU protected context, no
    reference taken and released on a contended cache line (fib_clntref)
    

Is this a bug in that commit, or a driver bug exposed?

cheers


* Re: Freeing alive fib_info caused by ebc0ffae5
  2010-11-04 10:23 Freeing alive fib_info caused by ebc0ffae5 Michael Ellerman
@ 2010-11-04 10:30 ` Eric Dumazet
  2010-11-04 10:46   ` Eric Dumazet
  2010-11-04 11:21   ` Eric Dumazet
  0 siblings, 2 replies; 7+ messages in thread
From: Eric Dumazet @ 2010-11-04 10:30 UTC
  To: michael; +Cc: netdev

On Thursday, 4 November 2010 at 21:23 +1100, Michael Ellerman wrote:
> Bisect says it's ebc0ffae5 which causes the problem, or makes it appear.
> 
> Is this a bug in that commit, or a driver bug exposed?

Hi Michael, thanks for the report (and painful bisection I guess)

That's hard to say... Is it reproducible on my machine?

Thanks




* Re: Freeing alive fib_info caused by ebc0ffae5
  2010-11-04 10:30 ` Eric Dumazet
@ 2010-11-04 10:46   ` Eric Dumazet
  2010-11-04 11:21   ` Eric Dumazet
  1 sibling, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2010-11-04 10:46 UTC
  To: michael; +Cc: netdev

On Thursday, 4 November 2010 at 11:30 +0100, Eric Dumazet wrote:
> Hi Michael, thanks for the report (and painful bisection I guess)
> 
> That's hard to say... Is it reproducible on my machine?

You could possibly add a stack trace; this might help to spot the bug.

Thanks

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 3e0da3e..8039db0 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -159,6 +159,7 @@ void free_fib_info(struct fib_info *fi)
 {
 	if (fi->fib_dead == 0) {
 		pr_warning("Freeing alive fib_info %p\n", fi);
+		WARN_ON_ONCE(1);
 		return;
 	}
 	change_nexthops(fi) {
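
For readers unfamiliar with WARN_ON_ONCE(), the user-space sketch below
approximates its once-per-call-site behaviour next to a message that is
logged every time. It is only an illustration: MODEL_WARN_ON_ONCE,
model_free_path and the two counters are hypothetical names, and the real
kernel macro also dumps a stack trace, which this sketch does not attempt.

```c
#include <stdio.h>

static int model_log_count;   /* how many times the plain message was logged */
static int model_warn_count;  /* how many times the one-shot warning fired */

/* Hypothetical stand-in for WARN_ON_ONCE(): fires at most once per call site. */
#define MODEL_WARN_ON_ONCE(cond)                                        \
	do {                                                            \
		static int warned_;                                     \
		if ((cond) && !warned_) {                               \
			warned_ = 1;                                    \
			model_warn_count++;                             \
			fprintf(stderr, "WARNING: at %s:%d %s()\n",     \
				__FILE__, __LINE__, __func__);          \
		}                                                       \
	} while (0)

/* Mirrors the shape of the patched check: log every time, warn only once. */
static void model_free_path(int fib_dead)
{
	if (fib_dead == 0) {
		model_log_count++;
		fprintf(stderr, "Freeing alive fib_info\n");
		MODEL_WARN_ON_ONCE(1);
		return;
	}
	/* A dead entry would be released here; nothing to do in the model. */
}
```

Calling model_free_path(0) repeatedly increments model_log_count each time,
while model_warn_count stays at 1 after the first call.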





* Re: Freeing alive fib_info caused by ebc0ffae5
  2010-11-04 10:30 ` Eric Dumazet
  2010-11-04 10:46   ` Eric Dumazet
@ 2010-11-04 11:21   ` Eric Dumazet
  2010-11-04 11:23     ` Michael Ellerman
  2010-11-04 11:35     ` Michael Ellerman
  1 sibling, 2 replies; 7+ messages in thread
From: Eric Dumazet @ 2010-11-04 11:21 UTC
  To: michael; +Cc: netdev

On Thursday, 4 November 2010 at 11:30 +0100, Eric Dumazet wrote:
> Hi Michael, thanks for the report (and painful bisection I guess)
> 
> That's hard to say... Is it reproducible on my machine?

Hmm, a review of the code spotted a bug in fib_result_assign().

Please try the following patch:

Thanks again!

[PATCH] fib: fib_result_assign() should not change fib refcounts

After commit ebc0ffae5 (RCU conversion of fib_lookup()),
fib_result_assign() should not change fib refcounts anymore.

Thanks to Michael who did the bisection and bug report.

Reported-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/fib_lookup.h |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
index a29edf2..c079cc0 100644
--- a/net/ipv4/fib_lookup.h
+++ b/net/ipv4/fib_lookup.h
@@ -47,11 +47,8 @@ extern int fib_detect_death(struct fib_info *fi, int order,
 static inline void fib_result_assign(struct fib_result *res,
 				     struct fib_info *fi)
 {
-	if (res->fi != NULL)
-		fib_info_put(res->fi);
+	/* we used to play games with refcounts, but we now use RCU */
 	res->fi = fi;
-	if (fi != NULL)
-		atomic_inc(&fi->fib_clntref);
 }
 
 #endif /* _FIB_LOOKUP_H */
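
The effect of the removed lines can be illustrated with a small user-space
model. This is a sketch under stated assumptions, not the kernel code: every
model_ name is hypothetical, plain ints stand in for the atomic counters, and
each entry is assumed to start with a single reference held by the routing
table. Since lookups no longer take a reference after ebc0ffae5, the old
helper appears to drop a reference it never owned on the previous entry
(which can reach the "Freeing alive" check) and to take one on the new entry
that nothing releases later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, stripped-down stand-in for struct fib_info. */
struct model_fib_info {
	int clntref;      /* stands in for fib_clntref */
	int dead;         /* stands in for fib_dead */
	int freed_alive;  /* counts "Freeing alive" events for this entry */
};

/* Hypothetical stand-in for struct fib_result. */
struct model_fib_result {
	struct model_fib_info *fi;
};

static void model_free_fib_info(struct model_fib_info *fi)
{
	if (fi->dead == 0) {
		printf("Freeing alive fib_info %p\n", (void *)fi);
		fi->freed_alive++;
		return;
	}
	/* A dead entry would be released here; nothing to do in the model. */
}

static void model_fib_info_put(struct model_fib_info *fi)
{
	if (--fi->clntref == 0)
		model_free_fib_info(fi);
}

/* Pre-fix shape: swap the result and adjust both reference counts. */
static void model_assign_refcounted(struct model_fib_result *res,
				    struct model_fib_info *fi)
{
	assert(res != NULL);
	if (res->fi != NULL)
		model_fib_info_put(res->fi);
	res->fi = fi;
	if (fi != NULL)
		fi->clntref++;
}

/* Post-fix shape: plain pointer assignment, counts untouched. */
static void model_assign_rcu(struct model_fib_result *res,
			     struct model_fib_info *fi)
{
	assert(res != NULL);
	res->fi = fi;
}
```

With two live entries a and b that each start at one reference, and a result
pointing at a, model_assign_refcounted(&res, &b) leaves a at 0 (hitting the
"Freeing alive" path) and b at 2, whereas model_assign_rcu() leaves both at 1.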




* Re: Freeing alive fib_info caused by ebc0ffae5
  2010-11-04 11:21   ` Eric Dumazet
@ 2010-11-04 11:23     ` Michael Ellerman
  2010-11-04 11:35     ` Michael Ellerman
  1 sibling, 0 replies; 7+ messages in thread
From: Michael Ellerman @ 2010-11-04 11:23 UTC
  To: Eric Dumazet; +Cc: netdev

On Thu, 2010-11-04 at 12:21 +0100, Eric Dumazet wrote:
> Hmm, a review of the code spotted a bug in fib_result_assign()

Aha, I was just adding some debug in there. Let me test the patch.

cheers



* Re: Freeing alive fib_info caused by ebc0ffae5
  2010-11-04 11:21   ` Eric Dumazet
  2010-11-04 11:23     ` Michael Ellerman
@ 2010-11-04 11:35     ` Michael Ellerman
  2010-11-04 19:06       ` David Miller
  1 sibling, 1 reply; 7+ messages in thread
From: Michael Ellerman @ 2010-11-04 11:35 UTC
  To: Eric Dumazet; +Cc: netdev

On Thu, 2010-11-04 at 12:21 +0100, Eric Dumazet wrote:
> [PATCH] fib: fib_result_assign() should not change fib refcounts
> 
> After commit ebc0ffae5 (RCU conversion of fib_lookup()),
> fib_result_assign() should not change fib refcounts anymore.
> [...]

Perfect, that fixes it, thanks!

cheers




* Re: Freeing alive fib_info caused by ebc0ffae5
  2010-11-04 11:35     ` Michael Ellerman
@ 2010-11-04 19:06       ` David Miller
  0 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2010-11-04 19:06 UTC
  To: michael; +Cc: eric.dumazet, netdev

From: Michael Ellerman <michael@ellerman.id.au>
Date: Thu, 04 Nov 2010 22:35:26 +1100

> On Thu, 2010-11-04 at 12:21 +0100, Eric Dumazet wrote:
>> [PATCH] fib: fib_result_assign() should not change fib refcounts
>> 
>> After commit ebc0ffae5 (RCU conversion of fib_lookup()),
>> fib_result_assign() should not change fib refcounts anymore.
>> 
>> Thanks to Michael who did the bisection and bug report.
 ...
> Perfect, that fixes it, thanks!

Applied, thanks everyone!


end of thread, other threads:[~2010-11-04 19:05 UTC | newest]

Thread overview: 7+ messages
2010-11-04 10:23 Freeing alive fib_info caused by ebc0ffae5 Michael Ellerman
2010-11-04 10:30 ` Eric Dumazet
2010-11-04 10:46   ` Eric Dumazet
2010-11-04 11:21   ` Eric Dumazet
2010-11-04 11:23     ` Michael Ellerman
2010-11-04 11:35     ` Michael Ellerman
2010-11-04 19:06       ` David Miller
