netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* RCU problems in fib_table_insert
@ 2010-03-21 20:25 Andi Kleen
  2010-03-21 21:25 ` Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Andi Kleen @ 2010-03-21 20:25 UTC (permalink / raw)
  To: robert.olsson, netdev; +Cc: paulmck

Hi,

I got the following warning at boot with a 2.6.34-rc2ish git kernel
with RCU debugging and preemption enabled.

It seems the problem is that not all callers of fib_find_node
call it with rcu_read_lock() to stabilize access to the fib. 

I tried to fix it, but especially for fib_table_insert() that's rather 
tricky: it does a lot of memory allocations and also route flushing and 
other blocking operations while assuming the original fa is RCU stable.

I first tried to move some allocations to the beginning and keep
preemption disabled in the rest, but it's difficult with all of them.
No patch because of that.

Does the fa need an additional reference count for this problem?
Or perhaps some optimistic locking?

-Andi


==================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
/home/lsrc/git/linux-2.6/net/ipv4/fib_trie.c:964 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 0
2 locks held by ip/4521:
 #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816466af>] rtnetlink_rcv+0x1f/0x40
 #1:  ((inetaddr_chain).rwsem){.+.+.+}, at: [<ffffffff8107cde7>] __blocking_notifier_call_chain+0x47/0x90

stack backtrace:
Pid: 4521, comm: ip Not tainted 2.6.34-rc2 #5
Call Trace:
 [<ffffffff8108b7e9>] lockdep_rcu_dereference+0xb9/0xc0
 [<ffffffff81696a05>] fib_find_node+0x185/0x1b0
 [<ffffffff8101155f>] ? save_stack_trace+0x2f/0x50
 [<ffffffff81699b1c>] fib_table_insert+0xdc/0xa90
 [<ffffffff8107cde7>] ? __blocking_notifier_call_chain+0x47/0x90
 [<ffffffff8108edb5>] ? __lock_acquire+0x1485/0x1d50
 [<ffffffff816926b0>] fib_magic+0xc0/0xd0
 [<ffffffff81692738>] fib_add_ifaddr+0x78/0x1a0
 [<ffffffff81692e60>] fib_inetaddr_event+0x50/0x2a0
 [<ffffffff8173152d>] notifier_call_chain+0x6d/0xb0
 [<ffffffff8107cdfd>] __blocking_notifier_call_chain+0x5d/0x90
 [<ffffffff8107ce46>] blocking_notifier_call_chain+0x16/0x20
 [<ffffffff81688c0a>] __inet_insert_ifa+0xea/0x180
 [<ffffffff8168971d>] inetdev_event+0x43d/0x490
 [<ffffffff8173152d>] notifier_call_chain+0x6d/0xb0
 [<ffffffff8107cb06>] raw_notifier_call_chain+0x16/0x20
 [<ffffffff81639f00>] __dev_notify_flags+0x40/0xa0
 [<ffffffff81639fa5>] dev_change_flags+0x45/0x70
 [<ffffffff81645c2c>] do_setlink+0x2fc/0x4a0
 [<ffffffff81294176>] ? nla_parse+0x36/0x110
 [<ffffffff81646d54>] rtnl_newlink+0x444/0x540
 [<ffffffff8108c44d>] ? mark_held_locks+0x6d/0x90
 [<ffffffff8172b8c5>] ? mutex_lock_nested+0x335/0x3c0
 [<ffffffff8164685e>] rtnetlink_rcv_msg+0x18e/0x240
 [<ffffffff816466d0>] ? rtnetlink_rcv_msg+0x0/0x240
 [<ffffffff816520b9>] netlink_rcv_skb+0x89/0xb0
 [<ffffffff816466be>] rtnetlink_rcv+0x2e/0x40
 [<ffffffff81651b6b>] ? netlink_unicast+0x11b/0x2f0
 [<ffffffff81651d2c>] netlink_unicast+0x2dc/0x2f0
 [<ffffffff81630a3c>] ? memcpy_fromiovec+0x7c/0xa0
 [<ffffffff81652643>] netlink_sendmsg+0x1d3/0x2e0
 [<ffffffff81624e20>] sock_sendmsg+0xc0/0xf0
 [<ffffffff8108f9cd>] ? lock_release_non_nested+0x9d/0x340
 [<ffffffff810fa33b>] ? might_fault+0x7b/0xd0
 [<ffffffff810fa33b>] ? might_fault+0x7b/0xd0
 [<ffffffff810fa386>] ? might_fault+0xc6/0xd0
 [<ffffffff810fa33b>] ? might_fault+0x7b/0xd0
 [<ffffffff81630bfc>] ? verify_iovec+0x4c/0xe0
 [<ffffffff81625c3e>] sys_sendmsg+0x1ae/0x360
 [<ffffffff810fadf9>] ? __do_fault+0x3f9/0x550
 [<ffffffff810fd143>] ? handle_mm_fault+0x1a3/0x790
 [<ffffffff8112cc77>] ? fget_light+0xe7/0x2f0
 [<ffffffff8108c735>] ? trace_hardirqs_on_caller+0x135/0x180
 [<ffffffff8172ccc2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff810030db>] system_call_fastpath+0x16/0x1b





-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RCU problems in fib_table_insert
  2010-03-21 20:25 RCU problems in fib_table_insert Andi Kleen
@ 2010-03-21 21:25 ` Eric Dumazet
  2010-03-21 21:38   ` Paul E. McKenney
  2010-03-21 21:37 ` Paul E. McKenney
  2010-03-22  6:18 ` Robert Olsson
  2 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2010-03-21 21:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: robert.olsson, netdev, paulmck

Le dimanche 21 mars 2010 à 21:25 +0100, Andi Kleen a écrit :
> Hi,
> 
> I got the following warning at boot with a 2.6.34-rc2ish git kernel
> with RCU debugging and preemption enabled.
> 
> It seems the problem is that not all callers of fib_find_node
> call it with rcu_read_lock() to stabilize access to the fib. 
> 
> I tried to fix it, but especially for fib_table_insert() that's rather 
> tricky: it does a lot of memory allocations and also route flushing and 
> other blocking operations while assuming the original fa is RCU stable.
> 
> I first tried to move some allocations to the beginning and keep
> preemption disabled in the rest, but it's difficult with all of them.
> No patch because of that.
> 
> Does the fa need an additional reference count for this problem?
> Or perhaps some optimistic locking?
> 
> -Andi

No real changes needed, only a lockdep warning...

Probably a rcu_dereference() should be changed to
rcu_dereference_check() like we did for __in6_dev_get()

We hold RTNL or rcu_read_lock

[PATCH] net: fib_find_node() rcu check

We hold rcu read lock or RTNL when fib_find_node() is called.
Shutup lockdep complain.

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index af5d897..471fe07 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -961,7 +961,8 @@ fib_find_node(struct trie *t, u32 key)
 	struct node *n;
 
 	pos = 0;
-	n = rcu_dereference(t->trie);
+	n = rcu_dereference_check(t->trie,
+				  rcu_read_lock_held() || lockdep_rtnl_is_held());
 
 	while (n != NULL &&  NODE_TYPE(n) == T_TNODE) {
 		tn = (struct tnode *) n;



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: RCU problems in fib_table_insert
  2010-03-21 20:25 RCU problems in fib_table_insert Andi Kleen
  2010-03-21 21:25 ` Eric Dumazet
@ 2010-03-21 21:37 ` Paul E. McKenney
  2010-03-22  1:01   ` David Miller
  2010-03-22  6:18 ` Robert Olsson
  2 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2010-03-21 21:37 UTC (permalink / raw)
  To: Andi Kleen; +Cc: robert.olsson, netdev

On Sun, Mar 21, 2010 at 09:25:25PM +0100, Andi Kleen wrote:
> Hi,
> 
> I got the following warning at boot with a 2.6.34-rc2ish git kernel
> with RCU debugging and preemption enabled.
> 
> It seems the problem is that not all callers of fib_find_node
> call it with rcu_read_lock() to stabilize access to the fib. 
> 
> I tried to fix it, but especially for fib_table_insert() that's rather 
> tricky: it does a lot of memory allocations and also route flushing and 
> other blocking operations while assuming the original fa is RCU stable.
> 
> I first tried to move some allocations to the beginning and keep
> preemption disabled in the rest, but it's difficult with all of them.
> No patch because of that.
> 
> Does the fa need an additional reference count for this problem?
> Or perhaps some optimistic locking?
> 
> -Andi
> 
> 
> ==================================================
> [ INFO: suspicious rcu_dereference_check() usage. ]
> ---------------------------------------------------
> /home/lsrc/git/linux-2.6/net/ipv4/fib_trie.c:964 invoked rcu_dereference_check() without protection!
> 
> other info that might help us debug this:
> 
> 
> rcu_scheduler_active = 1, debug_locks = 0
> 2 locks held by ip/4521:
>  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816466af>] rtnetlink_rcv+0x1f/0x40
>  #1:  ((inetaddr_chain).rwsem){.+.+.+}, at: [<ffffffff8107cde7>] __blocking_notifier_call_chain+0x47/0x90

Looks to me like a false positive: If I rememeber correctly, it is OK
to invoke the fib-trie functions either inside an RCU read-side critical
section or with RTNL held.  However, I must defer to the networking guys.
For one thing, things might have changed since I last looked at this code.

But if I am correct, the following patch should work.  If I am wrong,
this patch will instead incorrectly enforce my misconceptions.  ;-)

							Thanx, Paul

------------------------------------------------------------------------

net: suppress lockdep-RCU false positive in FIB trie.

Allow fib_find_node() to be called either under rcu_read_lock()
protection or with RTNL held.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 fib_trie.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index af5d897..01ef8ba 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -961,7 +961,9 @@ fib_find_node(struct trie *t, u32 key)
 	struct node *n;
 
 	pos = 0;
-	n = rcu_dereference(t->trie);
+	n = rcu_dereference_check(t->trie,
+				  rcu_read_lock_held() ||
+				  lockdep_rtnl_is_held());
 
 	while (n != NULL &&  NODE_TYPE(n) == T_TNODE) {
 		tn = (struct tnode *) n;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: RCU problems in fib_table_insert
  2010-03-21 21:25 ` Eric Dumazet
@ 2010-03-21 21:38   ` Paul E. McKenney
  2010-03-21 21:49     ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2010-03-21 21:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Andi Kleen, robert.olsson, netdev

On Sun, Mar 21, 2010 at 10:25:52PM +0100, Eric Dumazet wrote:
> Le dimanche 21 mars 2010 à 21:25 +0100, Andi Kleen a écrit :
> > Hi,
> > 
> > I got the following warning at boot with a 2.6.34-rc2ish git kernel
> > with RCU debugging and preemption enabled.
> > 
> > It seems the problem is that not all callers of fib_find_node
> > call it with rcu_read_lock() to stabilize access to the fib. 
> > 
> > I tried to fix it, but especially for fib_table_insert() that's rather 
> > tricky: it does a lot of memory allocations and also route flushing and 
> > other blocking operations while assuming the original fa is RCU stable.
> > 
> > I first tried to move some allocations to the beginning and keep
> > preemption disabled in the rest, but it's difficult with all of them.
> > No patch because of that.
> > 
> > Does the fa need an additional reference count for this problem?
> > Or perhaps some optimistic locking?
> > 
> > -Andi
> 
> No real changes needed, only a lockdep warning...
> 
> Probably a rcu_dereference() should be changed to
> rcu_dereference_check() like we did for __in6_dev_get()
> 
> We hold RTNL or rcu_read_lock
> 
> [PATCH] net: fib_find_node() rcu check
> 
> We hold rcu read lock or RTNL when fib_find_node() is called.
> Shutup lockdep complain.

You beat me to it, Eric.  ;-)

So:

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Reported-by: Andi Kleen <andi@firstfloor.org>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index af5d897..471fe07 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c
> @@ -961,7 +961,8 @@ fib_find_node(struct trie *t, u32 key)
>  	struct node *n;
> 
>  	pos = 0;
> -	n = rcu_dereference(t->trie);
> +	n = rcu_dereference_check(t->trie,
> +				  rcu_read_lock_held() || lockdep_rtnl_is_held());
> 
>  	while (n != NULL &&  NODE_TYPE(n) == T_TNODE) {
>  		tn = (struct tnode *) n;
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RCU problems in fib_table_insert
  2010-03-21 21:38   ` Paul E. McKenney
@ 2010-03-21 21:49     ` Eric Dumazet
  2010-03-21 22:57       ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2010-03-21 21:49 UTC (permalink / raw)
  To: paulmck; +Cc: Andi Kleen, robert.olsson, netdev

Le dimanche 21 mars 2010 à 14:38 -0700, Paul E. McKenney a écrit :

> You beat me to it, Eric.  ;-)
> 
> So:
> 
> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Well, to be fair, I typed less text than you did ;)

Thanks



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RCU problems in fib_table_insert
  2010-03-21 21:49     ` Eric Dumazet
@ 2010-03-21 22:57       ` Paul E. McKenney
  0 siblings, 0 replies; 12+ messages in thread
From: Paul E. McKenney @ 2010-03-21 22:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Andi Kleen, robert.olsson, netdev

On Sun, Mar 21, 2010 at 10:49:46PM +0100, Eric Dumazet wrote:
> Le dimanche 21 mars 2010 à 14:38 -0700, Paul E. McKenney a écrit :
> 
> > You beat me to it, Eric.  ;-)
> > 
> > So:
> > 
> > Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Well, to be fair, I typed less text than you did ;)

;-) ;-) ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RCU problems in fib_table_insert
  2010-03-21 21:37 ` Paul E. McKenney
@ 2010-03-22  1:01   ` David Miller
  2010-03-22  6:51     ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2010-03-22  1:01 UTC (permalink / raw)
  To: paulmck; +Cc: andi, robert.olsson, netdev

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Sun, 21 Mar 2010 14:37:03 -0700

> net: suppress lockdep-RCU false positive in FIB trie.
> 
> Allow fib_find_node() to be called either under rcu_read_lock()
> protection or with RTNL held.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Thanks guys, I applied this copy and added Eric's signoff
since the two patches were essentially identical :-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RCU problems in fib_table_insert
  2010-03-21 20:25 RCU problems in fib_table_insert Andi Kleen
  2010-03-21 21:25 ` Eric Dumazet
  2010-03-21 21:37 ` Paul E. McKenney
@ 2010-03-22  6:18 ` Robert Olsson
  2010-03-22  6:51   ` Paul E. McKenney
  2 siblings, 1 reply; 12+ messages in thread
From: Robert Olsson @ 2010-03-22  6:18 UTC (permalink / raw)
  To: Andi Kleen; +Cc: robert.olsson, netdev, paulmck



Seems like Paul and Eric fixed this problem... We use fib_trie with 
major infrastructure but always disable preempt. It was unsafe w.
preempt at least before Jareks P. patches about a year ago. I havn't
tested w. preempt after that but maybe someone else have...

Cheers
					--ro

Andi Kleen writes:
 > Hi,
 > 
 > I got the following warning at boot with a 2.6.34-rc2ish git kernel
 > with RCU debugging and preemption enabled.
 > 
 > It seems the problem is that not all callers of fib_find_node
 > call it with rcu_read_lock() to stabilize access to the fib. 
 > 
 > I tried to fix it, but especially for fib_table_insert() that's rather 
 > tricky: it does a lot of memory allocations and also route flushing and 
 > other blocking operations while assuming the original fa is RCU stable.
 > 
 > I first tried to move some allocations to the beginning and keep
 > preemption disabled in the rest, but it's difficult with all of them.
 > No patch because of that.
 > 
 > Does the fa need an additional reference count for this problem?
 > Or perhaps some optimistic locking?
 > 
 > -Andi
 > 
 > 
 > ==================================================
 > [ INFO: suspicious rcu_dereference_check() usage. ]
 > ---------------------------------------------------
 > /home/lsrc/git/linux-2.6/net/ipv4/fib_trie.c:964 invoked rcu_dereference_check() without protection!
 > 
 > other info that might help us debug this:
 > 
 > 
 > rcu_scheduler_active = 1, debug_locks = 0
 > 2 locks held by ip/4521:
 >  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816466af>] rtnetlink_rcv+0x1f/0x40
 >  #1:  ((inetaddr_chain).rwsem){.+.+.+}, at: [<ffffffff8107cde7>] __blocking_notifier_call_chain+0x47/0x90
 > 
 > stack backtrace:
 > Pid: 4521, comm: ip Not tainted 2.6.34-rc2 #5
 > Call Trace:
 >  [<ffffffff8108b7e9>] lockdep_rcu_dereference+0xb9/0xc0
 >  [<ffffffff81696a05>] fib_find_node+0x185/0x1b0
 >  [<ffffffff8101155f>] ? save_stack_trace+0x2f/0x50
 >  [<ffffffff81699b1c>] fib_table_insert+0xdc/0xa90
 >  [<ffffffff8107cde7>] ? __blocking_notifier_call_chain+0x47/0x90
 >  [<ffffffff8108edb5>] ? __lock_acquire+0x1485/0x1d50
 >  [<ffffffff816926b0>] fib_magic+0xc0/0xd0
 >  [<ffffffff81692738>] fib_add_ifaddr+0x78/0x1a0
 >  [<ffffffff81692e60>] fib_inetaddr_event+0x50/0x2a0
 >  [<ffffffff8173152d>] notifier_call_chain+0x6d/0xb0
 >  [<ffffffff8107cdfd>] __blocking_notifier_call_chain+0x5d/0x90
 >  [<ffffffff8107ce46>] blocking_notifier_call_chain+0x16/0x20
 >  [<ffffffff81688c0a>] __inet_insert_ifa+0xea/0x180
 >  [<ffffffff8168971d>] inetdev_event+0x43d/0x490
 >  [<ffffffff8173152d>] notifier_call_chain+0x6d/0xb0
 >  [<ffffffff8107cb06>] raw_notifier_call_chain+0x16/0x20
 >  [<ffffffff81639f00>] __dev_notify_flags+0x40/0xa0
 >  [<ffffffff81639fa5>] dev_change_flags+0x45/0x70
 >  [<ffffffff81645c2c>] do_setlink+0x2fc/0x4a0
 >  [<ffffffff81294176>] ? nla_parse+0x36/0x110
 >  [<ffffffff81646d54>] rtnl_newlink+0x444/0x540
 >  [<ffffffff8108c44d>] ? mark_held_locks+0x6d/0x90
 >  [<ffffffff8172b8c5>] ? mutex_lock_nested+0x335/0x3c0
 >  [<ffffffff8164685e>] rtnetlink_rcv_msg+0x18e/0x240
 >  [<ffffffff816466d0>] ? rtnetlink_rcv_msg+0x0/0x240
 >  [<ffffffff816520b9>] netlink_rcv_skb+0x89/0xb0
 >  [<ffffffff816466be>] rtnetlink_rcv+0x2e/0x40
 >  [<ffffffff81651b6b>] ? netlink_unicast+0x11b/0x2f0
 >  [<ffffffff81651d2c>] netlink_unicast+0x2dc/0x2f0
 >  [<ffffffff81630a3c>] ? memcpy_fromiovec+0x7c/0xa0
 >  [<ffffffff81652643>] netlink_sendmsg+0x1d3/0x2e0
 >  [<ffffffff81624e20>] sock_sendmsg+0xc0/0xf0
 >  [<ffffffff8108f9cd>] ? lock_release_non_nested+0x9d/0x340
 >  [<ffffffff810fa33b>] ? might_fault+0x7b/0xd0
 >  [<ffffffff810fa33b>] ? might_fault+0x7b/0xd0
 >  [<ffffffff810fa386>] ? might_fault+0xc6/0xd0
 >  [<ffffffff810fa33b>] ? might_fault+0x7b/0xd0
 >  [<ffffffff81630bfc>] ? verify_iovec+0x4c/0xe0
 >  [<ffffffff81625c3e>] sys_sendmsg+0x1ae/0x360
 >  [<ffffffff810fadf9>] ? __do_fault+0x3f9/0x550
 >  [<ffffffff810fd143>] ? handle_mm_fault+0x1a3/0x790
 >  [<ffffffff8112cc77>] ? fget_light+0xe7/0x2f0
 >  [<ffffffff8108c735>] ? trace_hardirqs_on_caller+0x135/0x180
 >  [<ffffffff8172ccc2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 >  [<ffffffff810030db>] system_call_fastpath+0x16/0x1b
 > 
 > 
 > 
 > 
 > 
 > -- 
 > ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RCU problems in fib_table_insert
  2010-03-22  6:18 ` Robert Olsson
@ 2010-03-22  6:51   ` Paul E. McKenney
  2010-04-07  6:10     ` Robert Olsson
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2010-03-22  6:51 UTC (permalink / raw)
  To: Robert Olsson; +Cc: Andi Kleen, robert.olsson, netdev

On Mon, Mar 22, 2010 at 07:18:34AM +0100, Robert Olsson wrote:
> 
> Seems like Paul and Eric fixed this problem... We use fib_trie with 
> major infrastructure but always disable preempt. It was unsafe w.
> preempt at least before Jareks P. patches about a year ago. I havn't
> tested w. preempt after that but maybe someone else have...

Well, if some code path fails either to do rcu_read_lock() or
to acquire RTNL, we will see lockdep splats.

Though I must admit that I would be surprised if there wasn't
more adjustment required in net/ipv4/fib_trie.c -- lots of
rcu_dereference()s in there.

						Thanx, Paul

> Cheers
> 					--ro
> 
> Andi Kleen writes:
>  > Hi,
>  > 
>  > I got the following warning at boot with a 2.6.34-rc2ish git kernel
>  > with RCU debugging and preemption enabled.
>  > 
>  > It seems the problem is that not all callers of fib_find_node
>  > call it with rcu_read_lock() to stabilize access to the fib. 
>  > 
>  > I tried to fix it, but especially for fib_table_insert() that's rather 
>  > tricky: it does a lot of memory allocations and also route flushing and 
>  > other blocking operations while assuming the original fa is RCU stable.
>  > 
>  > I first tried to move some allocations to the beginning and keep
>  > preemption disabled in the rest, but it's difficult with all of them.
>  > No patch because of that.
>  > 
>  > Does the fa need an additional reference count for this problem?
>  > Or perhaps some optimistic locking?
>  > 
>  > -Andi
>  > 
>  > 
>  > ==================================================
>  > [ INFO: suspicious rcu_dereference_check() usage. ]
>  > ---------------------------------------------------
>  > /home/lsrc/git/linux-2.6/net/ipv4/fib_trie.c:964 invoked rcu_dereference_check() without protection!
>  > 
>  > other info that might help us debug this:
>  > 
>  > 
>  > rcu_scheduler_active = 1, debug_locks = 0
>  > 2 locks held by ip/4521:
>  >  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816466af>] rtnetlink_rcv+0x1f/0x40
>  >  #1:  ((inetaddr_chain).rwsem){.+.+.+}, at: [<ffffffff8107cde7>] __blocking_notifier_call_chain+0x47/0x90
>  > 
>  > stack backtrace:
>  > Pid: 4521, comm: ip Not tainted 2.6.34-rc2 #5
>  > Call Trace:
>  >  [<ffffffff8108b7e9>] lockdep_rcu_dereference+0xb9/0xc0
>  >  [<ffffffff81696a05>] fib_find_node+0x185/0x1b0
>  >  [<ffffffff8101155f>] ? save_stack_trace+0x2f/0x50
>  >  [<ffffffff81699b1c>] fib_table_insert+0xdc/0xa90
>  >  [<ffffffff8107cde7>] ? __blocking_notifier_call_chain+0x47/0x90
>  >  [<ffffffff8108edb5>] ? __lock_acquire+0x1485/0x1d50
>  >  [<ffffffff816926b0>] fib_magic+0xc0/0xd0
>  >  [<ffffffff81692738>] fib_add_ifaddr+0x78/0x1a0
>  >  [<ffffffff81692e60>] fib_inetaddr_event+0x50/0x2a0
>  >  [<ffffffff8173152d>] notifier_call_chain+0x6d/0xb0
>  >  [<ffffffff8107cdfd>] __blocking_notifier_call_chain+0x5d/0x90
>  >  [<ffffffff8107ce46>] blocking_notifier_call_chain+0x16/0x20
>  >  [<ffffffff81688c0a>] __inet_insert_ifa+0xea/0x180
>  >  [<ffffffff8168971d>] inetdev_event+0x43d/0x490
>  >  [<ffffffff8173152d>] notifier_call_chain+0x6d/0xb0
>  >  [<ffffffff8107cb06>] raw_notifier_call_chain+0x16/0x20
>  >  [<ffffffff81639f00>] __dev_notify_flags+0x40/0xa0
>  >  [<ffffffff81639fa5>] dev_change_flags+0x45/0x70
>  >  [<ffffffff81645c2c>] do_setlink+0x2fc/0x4a0
>  >  [<ffffffff81294176>] ? nla_parse+0x36/0x110
>  >  [<ffffffff81646d54>] rtnl_newlink+0x444/0x540
>  >  [<ffffffff8108c44d>] ? mark_held_locks+0x6d/0x90
>  >  [<ffffffff8172b8c5>] ? mutex_lock_nested+0x335/0x3c0
>  >  [<ffffffff8164685e>] rtnetlink_rcv_msg+0x18e/0x240
>  >  [<ffffffff816466d0>] ? rtnetlink_rcv_msg+0x0/0x240
>  >  [<ffffffff816520b9>] netlink_rcv_skb+0x89/0xb0
>  >  [<ffffffff816466be>] rtnetlink_rcv+0x2e/0x40
>  >  [<ffffffff81651b6b>] ? netlink_unicast+0x11b/0x2f0
>  >  [<ffffffff81651d2c>] netlink_unicast+0x2dc/0x2f0
>  >  [<ffffffff81630a3c>] ? memcpy_fromiovec+0x7c/0xa0
>  >  [<ffffffff81652643>] netlink_sendmsg+0x1d3/0x2e0
>  >  [<ffffffff81624e20>] sock_sendmsg+0xc0/0xf0
>  >  [<ffffffff8108f9cd>] ? lock_release_non_nested+0x9d/0x340
>  >  [<ffffffff810fa33b>] ? might_fault+0x7b/0xd0
>  >  [<ffffffff810fa33b>] ? might_fault+0x7b/0xd0
>  >  [<ffffffff810fa386>] ? might_fault+0xc6/0xd0
>  >  [<ffffffff810fa33b>] ? might_fault+0x7b/0xd0
>  >  [<ffffffff81630bfc>] ? verify_iovec+0x4c/0xe0
>  >  [<ffffffff81625c3e>] sys_sendmsg+0x1ae/0x360
>  >  [<ffffffff810fadf9>] ? __do_fault+0x3f9/0x550
>  >  [<ffffffff810fd143>] ? handle_mm_fault+0x1a3/0x790
>  >  [<ffffffff8112cc77>] ? fget_light+0xe7/0x2f0
>  >  [<ffffffff8108c735>] ? trace_hardirqs_on_caller+0x135/0x180
>  >  [<ffffffff8172ccc2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  >  [<ffffffff810030db>] system_call_fastpath+0x16/0x1b
>  > 
>  > 
>  > 
>  > 
>  > 
>  > -- 
>  > ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RCU problems in fib_table_insert
  2010-03-22  1:01   ` David Miller
@ 2010-03-22  6:51     ` Paul E. McKenney
  0 siblings, 0 replies; 12+ messages in thread
From: Paul E. McKenney @ 2010-03-22  6:51 UTC (permalink / raw)
  To: David Miller; +Cc: andi, robert.olsson, netdev

On Sun, Mar 21, 2010 at 06:01:40PM -0700, David Miller wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Date: Sun, 21 Mar 2010 14:37:03 -0700
> 
> > net: suppress lockdep-RCU false positive in FIB trie.
> > 
> > Allow fib_find_node() to be called either under rcu_read_lock()
> > protection or with RTNL held.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Thanks guys, I applied this copy and added Eric's signoff
> since the two patches were essentially identical :-)

Great minds thinking alike and all that.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RCU problems in fib_table_insert
  2010-03-22  6:51   ` Paul E. McKenney
@ 2010-04-07  6:10     ` Robert Olsson
  2010-04-07  6:45       ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Robert Olsson @ 2010-04-07  6:10 UTC (permalink / raw)
  To: paulmck; +Cc: Robert Olsson, Andi Kleen, robert.olsson, netdev, Jens.Laas


Paul E. McKenney writes:
 > On Mon, Mar 22, 2010 at 07:18:34AM +0100, Robert Olsson wrote:
 > > 
 > > Seems like Paul and Eric fixed this problem... We use fib_trie with 
 > > major infrastructure but always disable preempt. It was unsafe w.
 > > preempt at least before Jareks P. patches about a year ago. I havn't
 > > tested w. preempt after that but maybe someone else have...


 > Though I must admit that I would be surprised if there wasn't
 > more adjustment required in net/ipv4/fib_trie.c -- lots of
 > rcu_dereference()s in there.

 Hi, a follow on this thread. 

 Maybe I was to pessimistic... we've setup  for stress test running during 
 easter vacation

 Testing the fib_trie with preempt enabled. Continuesly loading/flushing  
 "full BGP" via a test script, while routing without the route cache 
 to further stress locking. Average load about 300 kpps. Some more test 
 details below.

 The test was manually stopped after 6 days. So it seems like preempt/rcu 
 has been improved with fib_trie.
 
 Thanks
					--ro

HW
--
CPU  Opteron 2 * 6174, TYAN S8230. Intel 82599

Kernel from net-next-2.6
------------------------
Version 2.6.34-rc1bifrost-x86_64 (gcc version 4.3.2 (GCC) ) #4 SMP PREEMPT
CONFIG_PREEMPT=y
CONFIG_DEBUG_PREEMPT=y
CONFIG_IP_FIB_TRIE=y
CONFIG_FIB_RULES=y
CONFIG_TREE_RCU=y
CONFIG_RCU_FANOUT=64

IP table
--------
"BGP" 279 k routes

Test duration
-------------
09:34:14 up 6 days, 20:57,  4 users,  load average: 5.28, 5.87, 5.53

Test script
-----------
#! /bin/bash
while(true) do
  ip -batch /etc/inet_route_add.bat; 
  ip route list | wc -l;  ( 279 k routes )
  ifconfig eth1 down 
  ifconfig eth1 10.10.11.1 netmask 255.255.255.0
  ip route list | wc -l;  ( 3 routes )
done;

/proc/net/softnet_stat

0000d503 00000000 00000014 00000000 00000000 00000000 00000000 00000000 00000000 00000000
4cc1ecfb 00000000 02bb1bcb 00000000 00000000 00000000 00000000 00000000 00000000 00000000
4cb95da1 00000000 02cc6574 00000000 00000000 00000000 00000000 00000000 00000000 00000000
4cb17932 00000000 02e1abdf 00000000 00000000 00000000 00000000 00000000 00000000 00000000
4cc40ad9 00000000 02f53db0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
4cbfe34f 00000000 0306a88d 00000000 00000000 00000000 00000000 00000000 00000000 00000000
4ccb10e9 00000000 03d3a11d 00000000 00000000 00000000 00000000 00000000 00000000 00000000
4ca84c29 00000000 03be0ca1 00000000 00000000 00000000 00000000 00000000 00000000 00000000
4cb710f8 00000000 04070289 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0b505d55 00000000 038a9bc0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0b54a7be 00000000 03a5de6d 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0b5be81f 00000000 03be018c 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0b51fdd0 00000000 0559fdd6 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0b4fa6fd 00000000 056311dc 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0b4a61d3 00000000 056a9276 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0b454970 00000000 0572add1 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

FYI. CPU0 is resvered for BGP/ssh/stats etc

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RCU problems in fib_table_insert
  2010-04-07  6:10     ` Robert Olsson
@ 2010-04-07  6:45       ` Paul E. McKenney
  0 siblings, 0 replies; 12+ messages in thread
From: Paul E. McKenney @ 2010-04-07  6:45 UTC (permalink / raw)
  To: Robert Olsson; +Cc: Andi Kleen, robert.olsson, netdev, Jens.Laas

On Wed, Apr 07, 2010 at 08:10:13AM +0200, Robert Olsson wrote:
> 
> Paul E. McKenney writes:
>  > On Mon, Mar 22, 2010 at 07:18:34AM +0100, Robert Olsson wrote:
>  > > 
>  > > Seems like Paul and Eric fixed this problem... We use fib_trie with 
>  > > major infrastructure but always disable preempt. It was unsafe w.
>  > > preempt at least before Jareks P. patches about a year ago. I havn't
>  > > tested w. preempt after that but maybe someone else have...
> 
> 
>  > Though I must admit that I would be surprised if there wasn't
>  > more adjustment required in net/ipv4/fib_trie.c -- lots of
>  > rcu_dereference()s in there.
> 
>  Hi, a follow on this thread. 
> 
>  Maybe I was to pessimistic... we've setup  for stress test running during 
>  easter vacation
> 
>  Testing the fib_trie with preempt enabled. Continuesly loading/flushing  
>  "full BGP" via a test script, while routing without the route cache 
>  to further stress locking. Average load about 300 kpps. Some more test 
>  details below.
> 
>  The test was manually stopped after 6 days. So it seems like preempt/rcu 
>  has been improved with fib_trie.

Always happy to be pleasantly surprised.  ;-)

							Thanx, Paul

>  Thanks
> 					--ro
> 
> HW
> --
> CPU  Opteron 2 * 6174, TYAN S8230. Intel 82599
> 
> Kernel from net-next-2.6
> ------------------------
> Version 2.6.34-rc1bifrost-x86_64 (gcc version 4.3.2 (GCC) ) #4 SMP PREEMPT
> CONFIG_PREEMPT=y
> CONFIG_DEBUG_PREEMPT=y
> CONFIG_IP_FIB_TRIE=y
> CONFIG_FIB_RULES=y
> CONFIG_TREE_RCU=y
> CONFIG_RCU_FANOUT=64
> 
> IP table
> --------
> "BGP" 279 k routes
> 
> Test duration
> -------------
> 09:34:14 up 6 days, 20:57,  4 users,  load average: 5.28, 5.87, 5.53
> 
> Test script
> -----------
> #! /bin/bash
> while(true) do
>   ip -batch /etc/inet_route_add.bat; 
>   ip route list | wc -l;  ( 279 k routes )
>   ifconfig eth1 down 
>   ifconfig eth1 10.10.11.1 netmask 255.255.255.0
>   ip route list | wc -l;  ( 3 routes )
> done;
> 
> /proc/net/softnet_stat
> 
> 0000d503 00000000 00000014 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 4cc1ecfb 00000000 02bb1bcb 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 4cb95da1 00000000 02cc6574 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 4cb17932 00000000 02e1abdf 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 4cc40ad9 00000000 02f53db0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 4cbfe34f 00000000 0306a88d 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 4ccb10e9 00000000 03d3a11d 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 4ca84c29 00000000 03be0ca1 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 4cb710f8 00000000 04070289 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 0b505d55 00000000 038a9bc0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 0b54a7be 00000000 03a5de6d 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 0b5be81f 00000000 03be018c 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 0b51fdd0 00000000 0559fdd6 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 0b4fa6fd 00000000 056311dc 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 0b4a61d3 00000000 056a9276 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 0b454970 00000000 0572add1 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 
> FYI. CPU0 is resvered for BGP/ssh/stats etc

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2010-04-07  6:45 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-21 20:25 RCU problems in fib_table_insert Andi Kleen
2010-03-21 21:25 ` Eric Dumazet
2010-03-21 21:38   ` Paul E. McKenney
2010-03-21 21:49     ` Eric Dumazet
2010-03-21 22:57       ` Paul E. McKenney
2010-03-21 21:37 ` Paul E. McKenney
2010-03-22  1:01   ` David Miller
2010-03-22  6:51     ` Paul E. McKenney
2010-03-22  6:18 ` Robert Olsson
2010-03-22  6:51   ` Paul E. McKenney
2010-04-07  6:10     ` Robert Olsson
2010-04-07  6:45       ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).