netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] NETDEV_UNREGISTER_BATCH seems unused nowaday ?
@ 2012-08-10  9:27 Eric Dumazet
  2012-08-10 10:42 ` David Miller
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2012-08-10  9:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric W. Biederman

NETDEV_UNREGISTER_BATCH seems unused we can probably remove it.

I am tracking a device refcount issue, delaying net device dismantle by
1 second in netdev_wait_allrefs()

I guess we need to add a notifier called _after_ the final
synchronize_net() in rollback_registered_many()

[   52.022261] calling rtnetlink_dev_notifier msg/val 9 data ffff880610bea000 refcnt 15->15
[   52.022266] calling fib_rules_notifier msg/val 9 data ffff880610bea000 refcnt 15->15
[   52.022270] calling arp_netdev_notifier msg/val 9 data ffff880610bea000 refcnt 15->15
[   52.022274] calling ip_netdev_notifier msg/val 9 data ffff880610bea000 refcnt 15->15
[   52.022277] calling fib_netdev_notifier msg/val 9 data ffff880610bea000 refcnt 15->15
[   52.022280] calling ip_mr_notifier msg/val 9 data ffff880610bea000 refcnt 15->15
[   52.022284] calling ndisc_netdev_notifier msg/val 9 data ffff880610bea000 refcnt 15->15
[   52.022287] calling ip6_route_dev_notifier msg/val 9 data ffff880610bea000 refcnt 15->15
[   52.022290] calling ipv6_dev_notf msg/val 9 data ffff880610bea000 refcnt 15->15
[   52.022293] calling packet_netdev_notifier msg/val 9 data ffff880610bea000 refcnt 15->15
[   52.022297] calling dst_dev_notifier msg/val 9 data ffff880610bea000 refcnt 15->15
[   52.022331] calling rtnetlink_dev_notifier msg/val 2 data ffff880610bea000 refcnt 15->15
[   52.022335] calling fib_rules_notifier msg/val 2 data ffff880610bea000 refcnt 15->15
[   52.022340] calling arp_netdev_notifier msg/val 2 data ffff880610bea000 refcnt 15->15
[   52.022350] calling ip_netdev_notifier msg/val 2 data ffff880610bea000 refcnt 15->15
[   52.022374] arp: arp_ifdown dev ffff880610bea000 refcnt 15
[   52.022382] calling fib_netdev_notifier msg/val 2 data ffff880610bea000 refcnt 15->14
[   52.022931] IPv4: rt_free(ffff8802fee7e180)
[   52.023394] calling ip_mr_notifier msg/val 2 data ffff880610bea000 refcnt 14->14
[   52.023402] IPv6: dst_free(ffff88060b680480) refcnt 0
[   52.023404] IPv6: dst_free(ffff880307f7cf00) refcnt 0
[   52.023407] IPv6: dst_free(ffff880307f7cd80) refcnt 0
[   52.023409] IPv6: dst_free(ffff880307f7cc00) refcnt 0
[   52.023410] IPv6: dst_free(ffff88060b680180) refcnt 0
[   52.023412] IPv6: dst_free(ffff88060810c780) refcnt 0
[   52.023414] IPv6: dst_free(ffff88060810c600) refcnt 0
[   52.023415] IPv6: dst_free(ffff88060810c480) refcnt 0
[   52.023421] calling ndisc_netdev_notifier msg/val 2 data ffff880610bea000 refcnt 14->14
[   52.023424] calling ip6_route_dev_notifier msg/val 2 data ffff880610bea000 refcnt 14->14
[   52.023432] calling ipv6_dev_notf msg/val 2 data ffff880610bea000 refcnt 14->14
[   52.023436] calling packet_netdev_notifier msg/val 2 data ffff880610bea000 refcnt 14->14
[   52.023522] dst_ifdown dst ffff8802fee7e000 dev ffff880610bea000 dst->dev ffff880311b33000 unreg 0 dev->refcnt 14
[   52.023525] calling dst_dev_notifier msg/val 2 data ffff880610bea000 refcnt 14->14
[   52.024267] calling rtnetlink_dev_notifier msg/val 6 data ffff880610bea000 refcnt 13->13
[   52.024272] calling fib_rules_notifier msg/val 6 data ffff880610bea000 refcnt 13->13
[   52.024277] calling arp_netdev_notifier msg/val 6 data ffff880610bea000 refcnt 13->13
[   52.024279] inetdev_destroy()
[   52.024315] arp: arp_ifdown dev ffff880610bea000 refcnt 13
[   52.024421] arp: arp_ifdown dev ffff880610bea000 refcnt 12
[   52.024425] calling ip_netdev_notifier msg/val 6 data ffff880610bea000 refcnt 13->12
[   52.024430] arp: arp_ifdown dev ffff880610bea000 refcnt 12
[   52.024434] calling fib_netdev_notifier msg/val 6 data ffff880610bea000 refcnt 12->12
[   52.024437] calling ip_mr_notifier msg/val 6 data ffff880610bea000 refcnt 12->12
[   52.024440] calling ndisc_netdev_notifier msg/val 6 data ffff880610bea000 refcnt 12->12
[   52.024443] calling ip6_route_dev_notifier msg/val 6 data ffff880610bea000 refcnt 12->12
[   52.024458] calling ipv6_dev_notf msg/val 6 data ffff880610bea000 refcnt 12->10
[   52.024462] calling packet_netdev_notifier msg/val 6 data ffff880610bea000 refcnt 10->9
[   52.024465] dst_ifdown dst ffff8802fee7e000 dev ffff880610bea000 dst->dev ffff880311b33000 unreg 1 dev->refcnt 9
[   52.024468] calling dst_dev_notifier msg/val 6 data ffff880610bea000 refcnt 9->9
[   52.024564] calling rtnetlink_dev_notifier msg/val 17 data ffff880610bea000 refcnt 7->7
[   52.024569] calling fib_rules_notifier msg/val 17 data ffff880610bea000 refcnt 7->7
[   52.024574] calling arp_netdev_notifier msg/val 17 data ffff880610bea000 refcnt 7->7
[   52.024579] calling ip_netdev_notifier msg/val 17 data ffff880610bea000 refcnt 7->7
[   52.024583] calling fib_netdev_notifier msg/val 17 data ffff880610bea000 refcnt 7->7
[   52.024586] calling ip_mr_notifier msg/val 17 data ffff880610bea000 refcnt 7->7
[   52.024589] calling ndisc_netdev_notifier msg/val 17 data ffff880610bea000 refcnt 7->7
[   52.024592] calling ip6_route_dev_notifier msg/val 17 data ffff880610bea000 refcnt 7->7
[   52.024595] calling ipv6_dev_notf msg/val 17 data ffff880610bea000 refcnt 7->7
[   52.024598] calling packet_netdev_notifier msg/val 17 data ffff880610bea000 refcnt 7->7
[   52.024601] calling dst_dev_notifier msg/val 17 data ffff880610bea000 refcnt 7->7
[   52.028214] IPv4: dst_free(ffff8802fee7e180) refcnt 0
[   52.033207] dst_free(ffff88060816e0c0) refcnt 1
[   52.033212] adding dst ffff88060816e0c0 to garbage
[   52.033214] ------------[ cut here ]------------
[   52.033221] WARNING: at net/core/dst.c:215 __dst_free+0x57/0xf0()
[   52.033224] Hardware name: HP Z600 Workstation
[   52.033225] Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables xt_state ipt_REJECT fuse iptable_mangle iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bridge stp nfsd auth_rpcgss exportfs nfs lockd sunrpc asix usbnet rt61pci crc_itu_t rt2x00pci rt2x00lib eeprom_93cx6 tg3 ixgbe mdio igb
[   52.033254] Pid: 0, comm: swapper/6 Tainted: G        W    3.5.0+ #764
[   52.033255] Call Trace:
[   52.033257]  <IRQ>  [<ffffffff8103de0f>] warn_slowpath_common+0x7f/0xc0
[   52.033266]  [<ffffffff8103de6a>] warn_slowpath_null+0x1a/0x20
[   52.033270]  [<ffffffff8155d247>] __dst_free+0x57/0xf0
[   52.033277]  [<ffffffff815c96ba>] free_fib_info_rcu+0x18a/0x220
[   52.033282]  [<ffffffff810c4897>] rcu_process_callbacks+0x1f7/0x580
[   52.033287]  [<ffffffff81066869>] ? enqueue_hrtimer+0x39/0xe0
[   52.033292]  [<ffffffff81046d60>] __do_softirq+0xc0/0x260
[   52.033296]  [<ffffffff81067939>] ? hrtimer_interrupt+0x169/0x260
[   52.033301]  [<ffffffff816d006c>] call_softirq+0x1c/0x30
[   52.033306]  [<ffffffff81004225>] do_softirq+0x55/0x90
[   52.033309]  [<ffffffff8104720e>] irq_exit+0x9e/0xc0
[   52.033313]  [<ffffffff816d07ee>] smp_apic_timer_interrupt+0x6e/0x99
[   52.033316]  [<ffffffff816cf987>] apic_timer_interrupt+0x67/0x70
[   52.033318]  <EOI>  [<ffffffff81300230>] ? intel_idle+0xf0/0x150
[   52.033326]  [<ffffffff8130020f>] ? intel_idle+0xcf/0x150
[   52.033331]  [<ffffffff814ac779>] cpuidle_enter+0x19/0x20
[   52.033335]  [<ffffffff814ace0c>] cpuidle_idle_call+0xac/0x310
[   52.033340]  [<ffffffff8100b6f5>] cpu_idle+0x85/0xd0
[   52.033345]  [<ffffffff816b65de>] start_secondary+0x21b/0x21d
[   52.033347] ---[ end trace 555e476df6e4a6f3 ]---
[   52.033357] dst_free(ffff88060816e000) refcnt 1
[   52.033359] adding dst ffff88060816e000 to garbage
[   52.033360] ------------[ cut here ]------------
[   52.033363] WARNING: at net/core/dst.c:215 __dst_free+0x57/0xf0()
[   52.033365] Hardware name: HP Z600 Workstation
[   52.033366] Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables xt_state ipt_REJECT fuse iptable_mangle iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bridge stp nfsd auth_rpcgss exportfs nfs lockd sunrpc asix usbnet rt61pci crc_itu_t rt2x00pci rt2x00lib eeprom_93cx6 tg3 ixgbe mdio igb
[   52.033390] Pid: 0, comm: swapper/6 Tainted: G        W    3.5.0+ #764
[   52.033392] Call Trace:
[   52.033393]  <IRQ>  [<ffffffff8103de0f>] warn_slowpath_common+0x7f/0xc0
[   52.033399]  [<ffffffff8103de6a>] warn_slowpath_null+0x1a/0x20
[   52.033402]  [<ffffffff8155d247>] __dst_free+0x57/0xf0
[   52.033406]  [<ffffffff815c9740>] free_fib_info_rcu+0x210/0x220
[   52.033410]  [<ffffffff810c4897>] rcu_process_callbacks+0x1f7/0x580
[   52.033414]  [<ffffffff81066869>] ? enqueue_hrtimer+0x39/0xe0
[   52.033418]  [<ffffffff81046d60>] __do_softirq+0xc0/0x260
[   52.033421]  [<ffffffff81067939>] ? hrtimer_interrupt+0x169/0x260
[   52.033425]  [<ffffffff816d006c>] call_softirq+0x1c/0x30
[   52.033429]  [<ffffffff81004225>] do_softirq+0x55/0x90
[   52.033432]  [<ffffffff8104720e>] irq_exit+0x9e/0xc0
[   52.033436]  [<ffffffff816d07ee>] smp_apic_timer_interrupt+0x6e/0x99
[   52.033439]  [<ffffffff816cf987>] apic_timer_interrupt+0x67/0x70
[   52.033440]  <EOI>  [<ffffffff81300230>] ? intel_idle+0xf0/0x150
[   52.033447]  [<ffffffff8130020f>] ? intel_idle+0xcf/0x150
[   52.033451]  [<ffffffff814ac779>] cpuidle_enter+0x19/0x20
[   52.033455]  [<ffffffff814ace0c>] cpuidle_idle_call+0xac/0x310
[   52.033458]  [<ffffffff8100b6f5>] cpu_idle+0x85/0xd0
[   52.033462]  [<ffffffff816b65de>] start_secondary+0x21b/0x21d
[   52.033464] ---[ end trace 555e476df6e4a6f4 ]---
[   52.132982] dst_gc_task()
[   52.132986] delayed 3 performed 0
[   52.633325] IPv4: rt_free(ffff8802fee7e3c0)
[   52.633827] IPv4: rt_free(ffff8802fee7e240)
[   52.641632] IPv4: dst_free(ffff8802fee7e3c0) refcnt 0
[   52.641634] IPv4: dst_free(ffff8802fee7e240) refcnt 0
[   52.732498] dst_gc_task()
[   52.732502] delayed 3 performed 0
[   53.039739] calling rtnetlink_dev_notifier msg/val 6 data ffff880610bea000 refcnt 1->1
[   53.039744] calling fib_rules_notifier msg/val 6 data ffff880610bea000 refcnt 1->1
[   53.039748] calling arp_netdev_notifier msg/val 6 data ffff880610bea000 refcnt 1->1
[   53.039751] calling ip_netdev_notifier msg/val 6 data ffff880610bea000 refcnt 1->1
[   53.039758] arp: arp_ifdown dev ffff880610bea000 refcnt 1
[   53.039762] calling fib_netdev_notifier msg/val 6 data ffff880610bea000 refcnt 1->1
[   53.039766] calling ip_mr_notifier msg/val 6 data ffff880610bea000 refcnt 1->1
[   53.039769] calling ndisc_netdev_notifier msg/val 6 data ffff880610bea000 refcnt 1->1
[   53.039772] calling ip6_route_dev_notifier msg/val 6 data ffff880610bea000 refcnt 1->1
[   53.039778] calling ipv6_dev_notf msg/val 6 data ffff880610bea000 refcnt 1->1
[   53.039781] calling packet_netdev_notifier msg/val 6 data ffff880610bea000 refcnt 1->1
[   53.039785] dst_ifdown dst ffff8802fee7e000 dev ffff880610bea000 dst->dev ffff880311b33000 unreg 1 dev->refcnt 1
[   53.039788] dst_ifdown dst ffff88060816e000 dev ffff880610bea000 dst->dev ffff880311b33000 unreg 1 dev->refcnt 1
[   53.039791] dst_ifdown dst ffff88060816e0c0 dev ffff880610bea000 dst->dev ffff880610bea000 unreg 1 dev->refcnt 1
[   53.039794] calling dst_dev_notifier msg/val 6 data ffff880610bea000 refcnt 1->0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] NETDEV_UNREGISTER_BATCH seems unused nowaday ?
  2012-08-10  9:27 [RFC] NETDEV_UNREGISTER_BATCH seems unused nowaday ? Eric Dumazet
@ 2012-08-10 10:42 ` David Miller
  2012-08-10 11:06   ` Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2012-08-10 10:42 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, ebiederm

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 10 Aug 2012 11:27:04 +0200

> NETDEV_UNREGISTER_BATCH seems unused we can probably remove it.

Indeed, the routing cache was the final real user.

> I am tracking a device refcount issue, delaying net device dismantle by
> 1 second in netdev_wait_allrefs()
> 
> I guess we need to add a notifier called _after_ the final
> synchronize_net() in rollback_registered_many()

It's essentially caused by DST_GC_INC, right?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] NETDEV_UNREGISTER_BATCH seems unused nowaday ?
  2012-08-10 10:42 ` David Miller
@ 2012-08-10 11:06   ` Eric Dumazet
  2012-08-10 14:45     ` Eric W. Biederman
  2012-08-10 14:46     ` [PATCH] net: remove delay at device dismantle Eric Dumazet
  0 siblings, 2 replies; 14+ messages in thread
From: Eric Dumazet @ 2012-08-10 11:06 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, ebiederm

On Fri, 2012-08-10 at 03:42 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 10 Aug 2012 11:27:04 +0200
> 
> > NETDEV_UNREGISTER_BATCH seems unused we can probably remove it.
> 
> Indeed, the routing cache was the final real user.
> 
> > I am tracking a device refcount issue, delaying net device dismantle by
> > 1 second in netdev_wait_allrefs()
> > 
> > I guess we need to add a notifier called _after_ the final
> > synchronize_net() in rollback_registered_many()
> 
> It's essentially caused by DST_GC_INC, right?

No, we in fact need a rcu_barrier(), then another call to
dst_dev_event().

rcu_barrier() is needed so that in-flight call_rcu() of routes (from
rt_free()) are completed. Or else we miss these dst in the
dst_dev_event().

I have a working patch, adding the rcu_barrier() and one additional
NETDEV_UNREGISTER_FINAL event.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] NETDEV_UNREGISTER_BATCH seems unused nowaday ?
  2012-08-10 11:06   ` Eric Dumazet
@ 2012-08-10 14:45     ` Eric W. Biederman
  2012-08-10 15:01       ` Eric Dumazet
  2012-08-10 17:55       ` Paul E. McKenney
  2012-08-10 14:46     ` [PATCH] net: remove delay at device dismantle Eric Dumazet
  1 sibling, 2 replies; 14+ messages in thread
From: Eric W. Biederman @ 2012-08-10 14:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

Eric Dumazet <eric.dumazet@gmail.com> writes:

> On Fri, 2012-08-10 at 03:42 -0700, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Fri, 10 Aug 2012 11:27:04 +0200
>> 
>> > NETDEV_UNREGISTER_BATCH seems unused we can probably remove it.
>> 
>> Indeed, the routing cache was the final real user.
>> 
>> > I am tracking a device refcount issue, delaying net device dismantle by
>> > 1 second in netdev_wait_allrefs()
>> > 
>> > I guess we need to add a notifier called _after_ the final
>> > synchronize_net() in rollback_registered_many()
>> 
>> It's essentially caused by DST_GC_INC, right?
>
> No, we in fact need a rcu_barrier(), then another call to
> dst_dev_event().
>
> rcu_barrier() is needed so that in-flight call_rcu() of routes (from
> rt_free()) are completed. Or else we miss these dst in the
> dst_dev_event().

> I have a working patch, adding the rcu_barrier() and one additional
> NETDEV_UNREGISTER_FINAL event.

Can someone help bring me up to speed.  What has changed in the
dst ref counting that has invalidated our previous solutions?

As for the idea of putting an rcu_barrier inside of the rtnl_lock.  I
really don't like it. You are trading off a 1000ms singled threaded wait
without locks for extending the hold times of rtnl lock by 12ms or so.

We already have an rcu_barrier on that path in netdev_run_todo,
so we can reorganize things to use that barrier I would be much
happer.  Furthermore I talked to Paul McKenney a while ago
about creating an rcu_barrier expedited and he really did not
like the idea.

Reading through the code we really should get dst_rcu_free
out of the header and make it non-line.  dst_rcu_free can't
possibly be called from a location where it can be inlined.

Trying to understand your analysis I have stared at the code for
a while and I am definitely not seeing any rcu callbacks that
result in calling rt_free.  So one of us is missing something.

All I am seeing from your trace is one call of rtnetlink_dev_notifier
the refcount is at 7 and the next call of rtnetlink_dev_notifier the
refcnt has dropped to 1.

Eric

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] net: remove delay at device dismantle
  2012-08-10 11:06   ` Eric Dumazet
  2012-08-10 14:45     ` Eric W. Biederman
@ 2012-08-10 14:46     ` Eric Dumazet
  2012-08-10 15:42       ` [PATCH net-next v2] " Eric Dumazet
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2012-08-10 14:46 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Tom Herbert, Mahesh Bandewar

From: Eric Dumazet <edumazet@google.com>

I noticed extra one second delay in device dismantle, tracked down to
a call to dst_dev_event() while some call_rcu() are still in RCU queues.

These call_rcu() were posted by rt_free(struct rtable *rt) calls.

We then wait a little (but one second) in netdev_wait_allrefs() before
kicking again NETDEV_UNREGISTER.

As the call_rcu() are now completed, dst_dev_event() can do the needed
device swap on busy dst.

To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called
in rollback_registered_many() after a rcu_barrier().

Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL

Also remove NETDEV_UNREGISTER_BATCH, as its not used anymore after
IP cache removal.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
---
 include/linux/netdevice.h |    2 +-
 net/core/dev.c            |   23 +++++++----------------
 net/core/dst.c            |    2 +-
 net/core/rtnetlink.c      |    2 +-
 net/ipv4/fib_frontend.c   |    2 --
 5 files changed, 10 insertions(+), 21 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a9db4f3..e3a43fd 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1550,7 +1550,7 @@ struct packet_type {
 #define NETDEV_PRE_TYPE_CHANGE	0x000E
 #define NETDEV_POST_TYPE_CHANGE	0x000F
 #define NETDEV_POST_INIT	0x0010
-#define NETDEV_UNREGISTER_BATCH 0x0011
+#define NETDEV_UNREGISTER_FINAL 0x0011
 #define NETDEV_RELEASE		0x0012
 #define NETDEV_NOTIFY_PEERS	0x0013
 #define NETDEV_JOIN		0x0014
diff --git a/net/core/dev.c b/net/core/dev.c
index a39354e..be3a5af 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1394,7 +1394,6 @@ rollback:
 				nb->notifier_call(nb, NETDEV_DOWN, dev);
 			}
 			nb->notifier_call(nb, NETDEV_UNREGISTER, dev);
-			nb->notifier_call(nb, NETDEV_UNREGISTER_BATCH, dev);
 		}
 	}
 
@@ -1436,7 +1435,6 @@ int unregister_netdevice_notifier(struct notifier_block *nb)
 				nb->notifier_call(nb, NETDEV_DOWN, dev);
 			}
 			nb->notifier_call(nb, NETDEV_UNREGISTER, dev);
-			nb->notifier_call(nb, NETDEV_UNREGISTER_BATCH, dev);
 		}
 	}
 unlock:
@@ -5307,14 +5305,14 @@ static void rollback_registered_many(struct list_head *head)
 		netdev_unregister_kobject(dev);
 	}
 
-	/* Process any work delayed until the end of the batch */
-	dev = list_first_entry(head, struct net_device, unreg_list);
-	call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, dev);
-
 	synchronize_net();
+	/* Wait for rcu callbacks to finish before next phase */
+	rcu_barrier();
 
-	list_for_each_entry(dev, head, unreg_list)
+	list_for_each_entry(dev, head, unreg_list) {
+		call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
 		dev_put(dev);
+	}
 }
 
 static void rollback_registered(struct net_device *dev)
@@ -5757,9 +5755,8 @@ static void netdev_wait_allrefs(struct net_device *dev)
 
 			/* Rebroadcast unregister notification */
 			call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
-			/* don't resend NETDEV_UNREGISTER_BATCH, _BATCH users
-			 * should have already handle it the first time */
-
+			rcu_barrier();
+			call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
 			if (test_bit(__LINK_STATE_LINKWATCH_PENDING,
 				     &dev->state)) {
 				/* We must not have linkwatch events
@@ -5821,11 +5818,6 @@ void netdev_run_todo(void)
 
 	__rtnl_unlock();
 
-	/* Wait for rcu callbacks to finish before attempting to drain
-	 * the device list.  This usually avoids a 250ms wait.
-	 */
-	if (!list_empty(&list))
-		rcu_barrier();
 
 	while (!list_empty(&list)) {
 		struct net_device *dev
@@ -6226,7 +6218,6 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	   the device is just moving and can keep their slaves up.
 	*/
 	call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
-	call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, dev);
 	rtmsg_ifinfo(RTM_DELLINK, dev, ~0U);
 
 	/*
diff --git a/net/core/dst.c b/net/core/dst.c
index 56d6361..f6593d2 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -374,7 +374,7 @@ static int dst_dev_event(struct notifier_block *this, unsigned long event,
 	struct dst_entry *dst, *last = NULL;
 
 	switch (event) {
-	case NETDEV_UNREGISTER:
+	case NETDEV_UNREGISTER_FINAL:
 	case NETDEV_DOWN:
 		mutex_lock(&dst_gc_mutex);
 		for (dst = dst_busy_list; dst; dst = dst->next) {
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2c5a0a0..30f5a06 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2356,7 +2356,7 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi
 	case NETDEV_PRE_TYPE_CHANGE:
 	case NETDEV_GOING_DOWN:
 	case NETDEV_UNREGISTER:
-	case NETDEV_UNREGISTER_BATCH:
+	case NETDEV_UNREGISTER_FINAL:
 	case NETDEV_RELEASE:
 	case NETDEV_JOIN:
 		break;
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index c43ae3f..4fe50184 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1071,8 +1071,6 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
 	case NETDEV_CHANGE:
 		rt_cache_flush(dev_net(dev), 0);
 		break;
-	case NETDEV_UNREGISTER_BATCH:
-		break;
 	}
 	return NOTIFY_DONE;
 }

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC] NETDEV_UNREGISTER_BATCH seems unused nowaday ?
  2012-08-10 14:45     ` Eric W. Biederman
@ 2012-08-10 15:01       ` Eric Dumazet
  2012-08-10 17:55       ` Paul E. McKenney
  1 sibling, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2012-08-10 15:01 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: David Miller, netdev

On Fri, 2012-08-10 at 07:45 -0700, Eric W. Biederman wrote:

> Can someone help bring me up to speed.  What has changed in the
> dst ref counting that has invalidated our previous solutions?
> 

In fact your patch (850a545bd8a416484 net: Move rcu_barrier from
rollback_registered_many to netdev_run_todo.) reinstated the problem
again. You didnt notice, but other people can see the problem.

> As for the idea of putting an rcu_barrier inside of the rtnl_lock.  I
> really don't like it. You are trading off a 1000ms singled threaded wait
> without locks for extending the hold times of rtnl lock by 12ms or so.
> 

We probably can keep the rcu_barrier() in netdev_run_todo() and kick the
UNREGISTER_FINAL in netdev_run_todo() too.

I'll try that.


> We already have an rcu_barrier on that path in netdev_run_todo,
> so we can reorganize things to use that barrier I would be much
> happer.  Furthermore I talked to Paul McKenney a while ago
> about creating an rcu_barrier expedited and he really did not
> like the idea.
> 
> Reading through the code we really should get dst_rcu_free
> out of the header and make it non-line.  dst_rcu_free can't
> possibly be called from a location where it can be inlined.
> 
> Trying to understand your analysis I have stared at the code for
> a while and I am definitely not seeing any rcu callbacks that
> result in calling rt_free.  So one of us is missing something.

Recent commits add the rt_free() calls.

Of course, if you unregister a dummy device you wont see the problem.

If you unregister a real device, you definitely hit the problem.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH net-next v2] net: remove delay at device dismantle
  2012-08-10 14:46     ` [PATCH] net: remove delay at device dismantle Eric Dumazet
@ 2012-08-10 15:42       ` Eric Dumazet
  2012-08-11  5:54         ` Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2012-08-10 15:42 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Tom Herbert, Mahesh Bandewar, Eric W. Biederman

From: Eric Dumazet <edumazet@google.com>

I noticed extra one second delay in device dismantle, tracked down to
a call to dst_dev_event() while some call_rcu() are still in RCU queues.

These call_rcu() were posted by rt_free(struct rtable *rt) calls.

We then wait a little (but one second) in netdev_wait_allrefs() before
kicking again NETDEV_UNREGISTER.

As the call_rcu() are now completed, dst_dev_event() can do the needed
device swap on busy dst.

To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called
after a rcu_barrier(), but outside of RTNL lock.

Use NETDEV_UNREGISTER_FINAL with care !

Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL

Also remove NETDEV_UNREGISTER_BATCH, as its not used anymore after
IP cache removal.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
---
v2: NETDEV_UNREGISTER_FINAL called outside of rtnl lock
    as its more risky, base this patch on net-next

 include/linux/netdevice.h |    2 +-
 net/core/dev.c            |   22 ++++++++--------------
 net/core/dst.c            |    2 +-
 net/core/fib_rules.c      |    3 ++-
 net/core/rtnetlink.c      |    2 +-
 net/ipv4/devinet.c        |    6 +++++-
 net/ipv4/fib_frontend.c   |    2 --
 7 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a9db4f3..e3a43fd 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1550,7 +1550,7 @@ struct packet_type {
 #define NETDEV_PRE_TYPE_CHANGE	0x000E
 #define NETDEV_POST_TYPE_CHANGE	0x000F
 #define NETDEV_POST_INIT	0x0010
-#define NETDEV_UNREGISTER_BATCH 0x0011
+#define NETDEV_UNREGISTER_FINAL 0x0011
 #define NETDEV_RELEASE		0x0012
 #define NETDEV_NOTIFY_PEERS	0x0013
 #define NETDEV_JOIN		0x0014
diff --git a/net/core/dev.c b/net/core/dev.c
index 1f06df8..4c0f837 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1391,7 +1391,6 @@ rollback:
 				nb->notifier_call(nb, NETDEV_DOWN, dev);
 			}
 			nb->notifier_call(nb, NETDEV_UNREGISTER, dev);
-			nb->notifier_call(nb, NETDEV_UNREGISTER_BATCH, dev);
 		}
 	}
 
@@ -1433,7 +1432,6 @@ int unregister_netdevice_notifier(struct notifier_block *nb)
 				nb->notifier_call(nb, NETDEV_DOWN, dev);
 			}
 			nb->notifier_call(nb, NETDEV_UNREGISTER, dev);
-			nb->notifier_call(nb, NETDEV_UNREGISTER_BATCH, dev);
 		}
 	}
 unlock:
@@ -1453,7 +1451,8 @@ EXPORT_SYMBOL(unregister_netdevice_notifier);
 
 int call_netdevice_notifiers(unsigned long val, struct net_device *dev)
 {
-	ASSERT_RTNL();
+	if (val != NETDEV_UNREGISTER_FINAL)
+		ASSERT_RTNL();
 	return raw_notifier_call_chain(&netdev_chain, val, dev);
 }
 EXPORT_SYMBOL(call_netdevice_notifiers);
@@ -5304,10 +5303,6 @@ static void rollback_registered_many(struct list_head *head)
 		netdev_unregister_kobject(dev);
 	}
 
-	/* Process any work delayed until the end of the batch */
-	dev = list_first_entry(head, struct net_device, unreg_list);
-	call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, dev);
-
 	synchronize_net();
 
 	list_for_each_entry(dev, head, unreg_list)
@@ -5759,9 +5754,8 @@ static void netdev_wait_allrefs(struct net_device *dev)
 
 			/* Rebroadcast unregister notification */
 			call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
-			/* don't resend NETDEV_UNREGISTER_BATCH, _BATCH users
-			 * should have already handle it the first time */
-
+			rcu_barrier();
+			call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
 			if (test_bit(__LINK_STATE_LINKWATCH_PENDING,
 				     &dev->state)) {
 				/* We must not have linkwatch events
@@ -5823,9 +5817,8 @@ void netdev_run_todo(void)
 
 	__rtnl_unlock();
 
-	/* Wait for rcu callbacks to finish before attempting to drain
-	 * the device list.  This usually avoids a 250ms wait.
-	 */
+
+	/* Wait for rcu callbacks to finish before next phase */
 	if (!list_empty(&list))
 		rcu_barrier();
 
@@ -5834,6 +5827,8 @@ void netdev_run_todo(void)
 			= list_first_entry(&list, struct net_device, todo_list);
 		list_del(&dev->todo_list);
 
+		call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
+
 		if (unlikely(dev->reg_state != NETREG_UNREGISTERING)) {
 			pr_err("network todo '%s' but state %d\n",
 			       dev->name, dev->reg_state);
@@ -6228,7 +6223,6 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	   the device is just moving and can keep their slaves up.
 	*/
 	call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
-	call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, dev);
 	rtmsg_ifinfo(RTM_DELLINK, dev, ~0U);
 
 	/*
diff --git a/net/core/dst.c b/net/core/dst.c
index 069d51d..a2d3624 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -366,7 +366,7 @@ static int dst_dev_event(struct notifier_block *this, unsigned long event,
 	struct dst_entry *dst, *last = NULL;
 
 	switch (event) {
-	case NETDEV_UNREGISTER:
+	case NETDEV_UNREGISTER_FINAL:
 	case NETDEV_DOWN:
 		mutex_lock(&dst_gc_mutex);
 		for (dst = dst_busy_list; dst; dst = dst->next) {
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index ab7db83..5850937 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -711,15 +711,16 @@ static int fib_rules_event(struct notifier_block *this, unsigned long event,
 	struct net *net = dev_net(dev);
 	struct fib_rules_ops *ops;
 
-	ASSERT_RTNL();
 
 	switch (event) {
 	case NETDEV_REGISTER:
+		ASSERT_RTNL();
 		list_for_each_entry(ops, &net->rules_ops, list)
 			attach_rules(&ops->rules_list, dev);
 		break;
 
 	case NETDEV_UNREGISTER:
+		ASSERT_RTNL();
 		list_for_each_entry(ops, &net->rules_ops, list)
 			detach_rules(&ops->rules_list, dev);
 		break;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 34d975b..c64efcf 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2358,7 +2358,7 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi
 	case NETDEV_PRE_TYPE_CHANGE:
 	case NETDEV_GOING_DOWN:
 	case NETDEV_UNREGISTER:
-	case NETDEV_UNREGISTER_BATCH:
+	case NETDEV_UNREGISTER_FINAL:
 	case NETDEV_RELEASE:
 	case NETDEV_JOIN:
 		break;
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index adf273f..6a5e6e4 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1147,8 +1147,12 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
 			 void *ptr)
 {
 	struct net_device *dev = ptr;
-	struct in_device *in_dev = __in_dev_get_rtnl(dev);
+	struct in_device *in_dev;
 
+	if (event == NETDEV_UNREGISTER_FINAL)
+		goto out;
+
+	in_dev = __in_dev_get_rtnl(dev);
 	ASSERT_RTNL();
 
 	if (!in_dev) {
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 7f073a3..43d6fea 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1071,8 +1071,6 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
 	case NETDEV_CHANGE:
 		rt_cache_flush(dev_net(dev), 0);
 		break;
-	case NETDEV_UNREGISTER_BATCH:
-		break;
 	}
 	return NOTIFY_DONE;
 }

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC] NETDEV_UNREGISTER_BATCH seems unused nowaday ?
  2012-08-10 14:45     ` Eric W. Biederman
  2012-08-10 15:01       ` Eric Dumazet
@ 2012-08-10 17:55       ` Paul E. McKenney
  1 sibling, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2012-08-10 17:55 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Eric Dumazet, David Miller, netdev

On Fri, Aug 10, 2012 at 07:45:48AM -0700, Eric W. Biederman wrote:
> Eric Dumazet <eric.dumazet@gmail.com> writes:
> 
> > On Fri, 2012-08-10 at 03:42 -0700, David Miller wrote:
> >> From: Eric Dumazet <eric.dumazet@gmail.com>
> >> Date: Fri, 10 Aug 2012 11:27:04 +0200
> >> 
> >> > NETDEV_UNREGISTER_BATCH seems unused we can probably remove it.
> >> 
> >> Indeed, the routing cache was the final real user.
> >> 
> >> > I am tracking a device refcount issue, delaying net device dismantle by
> >> > 1 second in netdev_wait_allrefs()
> >> > 
> >> > I guess we need to add a notifier called _after_ the final
> >> > synchronize_net() in rollback_registered_many()
> >> 
> >> It's essentially caused by DST_GC_INC, right?
> >
> > No, we in fact need a rcu_barrier(), then another call to
> > dst_dev_event().
> >
> > rcu_barrier() is needed so that in-flight call_rcu() of routes (from
> > rt_free()) are completed. Or else we miss these dst in the
> > dst_dev_event().
> 
> > I have a working patch, adding the rcu_barrier() and one additional
> > NETDEV_UNREGISTER_FINAL event.
> 
> Can someone help bring me up to speed.  What has changed in the
> dst ref counting that has invalidated our previous solutions?
> 
> As for the idea of putting an rcu_barrier inside of the rtnl_lock.  I
> really don't like it. You are trading off a 1000ms singled threaded wait
> without locks for extending the hold times of rtnl lock by 12ms or so.
> 
> We already have an rcu_barrier on that path in netdev_run_todo,
> so we can reorganize things to use that barrier I would be much
> happer.  Furthermore I talked to Paul McKenney a while ago
> about creating an rcu_barrier expedited and he really did not
> like the idea.

For whatever it is worth, I do have rcu_barrier_expedited() on my list
of things that at least one person has expressed interest in, but that
I do not yet have a good solution for.  Obstacles include the following:

1.	If a given CPU has lots of callbacks, but is running a real-time
	process, what do you do?  (a) Hammer the real-time process?
	(b) Make rcu_barrier_expedited() wait (current likely choice)?
	(c) Handle via a set priority for callback processing (which
	might be the case for BOOST_RCU builds)?  (d) Force migration
	of the callbacks (mmmaybe...)?  (e) Force migration of the
	real-time process (ouch!)?

2.	Ditto, but huge numbers of callbacks and non-realtime processes.
	Similar solution space.

3.	There will be some real-time disruption from any reasonable
	implementation of rcu_barrier_expedited().  Maybe the RT
	guys choose to map it to rcu_barrier()?

On the other hand, in the same email thread back in May you were also
looking for a kmem_cache_free_rcu().  At the time I didn't have a
good solution for this, but I do believe that I have one now.  ;-)

							Thanx, Paul

> Reading through the code we really should get dst_rcu_free
> out of the header and make it non-line.  dst_rcu_free can't
> possibly be called from a location where it can be inlined.
> 
> Trying to understand your analysis I have stared at the code for
> a while and I am definitely not seeing any rcu callbacks that
> result in calling rt_free.  So one of us is missing something.
> 
> All I am seeing from your trace is one call of rtnetlink_dev_notifier
> the refcount is at 7 and the next call of rtnetlink_dev_notifier the
> refcnt has dropped to 1.
> 
> Eric
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v2] net: remove delay at device dismantle
  2012-08-10 15:42       ` [PATCH net-next v2] " Eric Dumazet
@ 2012-08-11  5:54         ` Eric Dumazet
  2012-08-11  5:57           ` David Miller
                             ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Eric Dumazet @ 2012-08-11  5:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Tom Herbert, Mahesh Bandewar, Eric W. Biederman

On Fri, 2012-08-10 at 17:42 +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> I noticed extra one second delay in device dismantle, tracked down to
> a call to dst_dev_event() while some call_rcu() are still in RCU queues.
> 
...
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Tom Herbert <therbert@google.com>
> Cc: Mahesh Bandewar <maheshb@google.com>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
> v2: NETDEV_UNREGISTER_FINAL called outside of rtnl lock
>     as its more risky, base this patch on net-next

Also I am leaving for a one week vacation with no access to the
Internet, so better hold this patch until my return ;)

Thanks

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v2] net: remove delay at device dismantle
  2012-08-11  5:54         ` Eric Dumazet
@ 2012-08-11  5:57           ` David Miller
  2012-08-23  2:18           ` David Miller
  2012-08-23  2:25           ` Gao feng
  2 siblings, 0 replies; 14+ messages in thread
From: David Miller @ 2012-08-11  5:57 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, therbert, maheshb, ebiederm

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 11 Aug 2012 07:54:47 +0200

> On Fri, 2012-08-10 at 17:42 +0200, Eric Dumazet wrote:
>> From: Eric Dumazet <edumazet@google.com>
>> 
>> I noticed extra one second delay in device dismantle, tracked down to
>> a call to dst_dev_event() while some call_rcu() are still in RCU queues.
>> 
> ...
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Cc: Tom Herbert <therbert@google.com>
>> Cc: Mahesh Bandewar <maheshb@google.com>
>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>> v2: NETDEV_UNREGISTER_FINAL called outside of rtnl lock
>>     as its more risky, base this patch on net-next
> 
> Also I am leaving for a one week vacation with no access to the
> Internet, so better hold this patch until my return ;)

Ok :)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v2] net: remove delay at device dismantle
  2012-08-11  5:54         ` Eric Dumazet
  2012-08-11  5:57           ` David Miller
@ 2012-08-23  2:18           ` David Miller
  2012-08-23  2:25           ` Gao feng
  2 siblings, 0 replies; 14+ messages in thread
From: David Miller @ 2012-08-23  2:18 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, therbert, maheshb, ebiederm

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 11 Aug 2012 07:54:47 +0200

> On Fri, 2012-08-10 at 17:42 +0200, Eric Dumazet wrote:
>> From: Eric Dumazet <edumazet@google.com>
>> 
>> I noticed extra one second delay in device dismantle, tracked down to
>> a call to dst_dev_event() while some call_rcu() are still in RCU queues.
>> 
> ...
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Cc: Tom Herbert <therbert@google.com>
>> Cc: Mahesh Bandewar <maheshb@google.com>
>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>> v2: NETDEV_UNREGISTER_FINAL called outside of rtnl lock
>>     as its more risky, base this patch on net-next
> 
> Also I am leaving for a one week vacation with no access to the
> Internet, so better hold this patch until my return ;)

Since you're back, I've applied this now.

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v2] net: remove delay at device dismantle
  2012-08-11  5:54         ` Eric Dumazet
  2012-08-11  5:57           ` David Miller
  2012-08-23  2:18           ` David Miller
@ 2012-08-23  2:25           ` Gao feng
  2012-08-23  2:51             ` David Miller
  2 siblings, 1 reply; 14+ messages in thread
From: Gao feng @ 2012-08-23  2:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Tom Herbert, Mahesh Bandewar,
	Eric W. Biederman

于 2012年08月11日 13:54, Eric Dumazet 写道:
> On Fri, 2012-08-10 at 17:42 +0200, Eric Dumazet wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> I noticed extra one second delay in device dismantle, tracked down to
>> a call to dst_dev_event() while some call_rcu() are still in RCU queues.
>>
> ...
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Cc: Tom Herbert <therbert@google.com>
>> Cc: Mahesh Bandewar <maheshb@google.com>
>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>> v2: NETDEV_UNREGISTER_FINAL called outside of rtnl lock
>>     as its more risky, base this patch on net-next
> 
> Also I am leaving for a one week vacation with no access to the
> Internet, so better hold this patch until my return ;)
> 

Hi Eric

I got this warning message with your patch applied.

Aug 22 18:32:56 Donkey kernel: [  124.733048] ===============================
Aug 22 18:32:56 Donkey kernel: [  124.733051] [ INFO: suspicious RCU usage. ]
Aug 22 18:32:56 Donkey kernel: [  124.733054] 3.6.0-rc1+ #14 Not tainted
Aug 22 18:32:56 Donkey kernel: [  124.733057] -------------------------------
Aug 22 18:32:56 Donkey kernel: [  124.733060] include/linux/inetdevice.h:229 suspicious rcu_dereference_protected() usage!
Aug 22 18:32:56 Donkey kernel: [  124.733062]
Aug 22 18:32:56 Donkey kernel: [  124.733062] other info that might help us debug this:
Aug 22 18:32:56 Donkey kernel: [  124.733062]
Aug 22 18:32:56 Donkey kernel: [  124.733066]
Aug 22 18:32:56 Donkey kernel: [  124.733066] rcu_scheduler_active = 1, debug_locks = 1
Aug 22 18:32:56 Donkey kernel: [  124.733069] 3 locks held by kworker/u:1/39:
Aug 22 18:32:56 Donkey kernel: [  124.733072]  #0:  (netns){.+.+.+}, at: [<ffffffff8107e117>] process_one_work+0x137/0x5d0
Aug 22 18:32:56 Donkey kernel: [  124.733086]  #1:  (net_cleanup_work){+.+.+.}, at: [<ffffffff8107e117>] process_one_work+0x137/0x5d0
Aug 22 18:32:56 Donkey kernel: [  124.733095]  #2:  (net_mutex){+.+.+.}, at: [<ffffffff81542fd5>] cleanup_net+0x85/0x1a0
Aug 22 18:32:56 Donkey kernel: [  124.733106]
Aug 22 18:32:56 Donkey kernel: [  124.733106] stack backtrace:
Aug 22 18:32:56 Donkey kernel: [  124.733110] Pid: 39, comm: kworker/u:1 Not tainted 3.6.0-rc1+ #14
Aug 22 18:32:56 Donkey kernel: [  124.733112] Call Trace:
Aug 22 18:32:56 Donkey kernel: [  124.733120]  [<ffffffff810c2312>] lockdep_rcu_suspicious+0xe2/0x130
Aug 22 18:32:56 Donkey kernel: [  124.733127]  [<ffffffff815d35d4>] fib_netdev_event+0x114/0x180
Aug 22 18:32:56 Donkey kernel: [  124.733133]  [<ffffffff816914cc>] notifier_call_chain+0x5c/0x120
Aug 22 18:32:56 Donkey kernel: [  124.733139]  [<ffffffff8108d6b6>] raw_notifier_call_chain+0x16/0x20
Aug 22 18:32:56 Donkey kernel: [  124.733144]  [<ffffffff8154585c>] call_netdevice_notifiers+0x3c/0x70
Aug 22 18:32:56 Donkey kernel: [  124.733149]  [<ffffffff8154f88c>] netdev_run_todo+0x8c/0x280
Aug 22 18:32:56 Donkey kernel: [  124.733154]  [<ffffffff81544c9d>] ? synchronize_net+0x2d/0x40
Aug 22 18:32:56 Donkey kernel: [  124.733159]  [<ffffffff8155b90e>] rtnl_unlock+0xe/0x10
Aug 22 18:32:56 Donkey kernel: [  124.733164]  [<ffffffff81546c1a>] default_device_exit_batch+0xba/0xd0
Aug 22 18:32:56 Donkey kernel: [  124.733170]  [<ffffffff81542275>] ops_exit_list+0x55/0x60
Aug 22 18:32:56 Donkey kernel: [  124.733175]  [<ffffffff8154304b>] cleanup_net+0xfb/0x1a0
Aug 22 18:32:56 Donkey kernel: [  124.733180]  [<ffffffff8107e178>] process_one_work+0x198/0x5d0
Aug 22 18:32:56 Donkey kernel: [  124.733185]  [<ffffffff8107e117>] ? process_one_work+0x137/0x5d0
Aug 22 18:32:56 Donkey kernel: [  124.733191]  [<ffffffff8168b2b8>] ? __schedule+0x428/0x910
Aug 22 18:32:56 Donkey kernel: [  124.733196]  [<ffffffff81080951>] ? worker_thread+0x2a1/0x4b0
Aug 22 18:32:56 Donkey kernel: [  124.733201]  [<ffffffff81542f50>] ? net_drop_ns+0x50/0x50
Aug 22 18:32:56 Donkey kernel: [  124.733207]  [<ffffffff8108083a>] worker_thread+0x18a/0x4b0
Aug 22 18:32:56 Donkey kernel: [  124.733212]  [<ffffffff810806b0>] ? manage_workers+0x2a0/0x2a0
Aug 22 18:32:56 Donkey kernel: [  124.733217]  [<ffffffff810867de>] kthread+0xae/0xc0
Aug 22 18:32:56 Donkey kernel: [  124.733222]  [<ffffffff810c39cd>] ? trace_hardirqs_on+0xd/0x10
Aug 22 18:32:56 Donkey kernel: [  124.733227]  [<ffffffff81697044>] kernel_thread_helper+0x4/0x10
Aug 22 18:32:56 Donkey kernel: [  124.733233]  [<ffffffff8168d570>] ? retint_restore_args+0x13/0x13
Aug 22 18:32:56 Donkey kernel: [  124.733237]  [<ffffffff81086730>] ? flush_kthread_work+0x1a0/0x1a0
Aug 22 18:32:56 Donkey kernel: [  124.733242]  [<ffffffff81697040>] ? gs_change+0x13/0x13
Aug 22 18:32:56 Donkey kernel: [  124.733248]
Aug 22 18:32:56 Donkey kernel: [  124.733250] ===============================

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v2] net: remove delay at device dismantle
  2012-08-23  2:25           ` Gao feng
@ 2012-08-23  2:51             ` David Miller
  2012-08-23  2:58               ` Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2012-08-23  2:51 UTC (permalink / raw)
  To: gaofeng; +Cc: eric.dumazet, netdev, therbert, maheshb, ebiederm

From: Gao feng <gaofeng@cn.fujitsu.com>
Date: Thu, 23 Aug 2012 10:25:00 +0800

> 于 2012年08月11日 13:54, Eric Dumazet 写道:
>> On Fri, 2012-08-10 at 17:42 +0200, Eric Dumazet wrote:
>>> From: Eric Dumazet <edumazet@google.com>
>>>
>>> I noticed extra one second delay in device dismantle, tracked down to
>>> a call to dst_dev_event() while some call_rcu() are still in RCU queues.
>>>
>> ...
>>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>>> Cc: Tom Herbert <therbert@google.com>
>>> Cc: Mahesh Bandewar <maheshb@google.com>
>>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>>> ---
>>> v2: NETDEV_UNREGISTER_FINAL called outside of rtnl lock
>>>     as its more risky, base this patch on net-next
>> 
>> Also I am leaving for a one week vacation with no access to the
>> Internet, so better hold this patch until my return ;)
>> 
> 
> Hi Eric
> 
> I got this warning message with your patch applied.

Ok I won't push this out until this is resolved.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v2] net: remove delay at device dismantle
  2012-08-23  2:51             ` David Miller
@ 2012-08-23  2:58               ` Eric Dumazet
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2012-08-23  2:58 UTC (permalink / raw)
  To: David Miller; +Cc: gaofeng, netdev, therbert, maheshb, ebiederm

On Wed, 2012-08-22 at 19:51 -0700, David Miller wrote:
> From: Gao feng <gaofeng@cn.fujitsu.com>

> > Hi Eric
> > 
> > I got this warning message with your patch applied.
> 
> Ok I won't push this out until this is resolved.

Thanks, I'll send a v2

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-08-23  2:58 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-08-10  9:27 [RFC] NETDEV_UNREGISTER_BATCH seems unused nowaday ? Eric Dumazet
2012-08-10 10:42 ` David Miller
2012-08-10 11:06   ` Eric Dumazet
2012-08-10 14:45     ` Eric W. Biederman
2012-08-10 15:01       ` Eric Dumazet
2012-08-10 17:55       ` Paul E. McKenney
2012-08-10 14:46     ` [PATCH] net: remove delay at device dismantle Eric Dumazet
2012-08-10 15:42       ` [PATCH net-next v2] " Eric Dumazet
2012-08-11  5:54         ` Eric Dumazet
2012-08-11  5:57           ` David Miller
2012-08-23  2:18           ` David Miller
2012-08-23  2:25           ` Gao feng
2012-08-23  2:51             ` David Miller
2012-08-23  2:58               ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).