netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [0/8] netpoll/bridge fixes
@ 2010-06-10 12:40 Herbert Xu
  2010-06-10 12:42 ` [PATCH 1/7] netpoll: Set npinfo to NULL even with ndo_netpoll_cleanup Herbert Xu
                   ` (8 more replies)
  0 siblings, 9 replies; 63+ messages in thread
From: Herbert Xu @ 2010-06-10 12:40 UTC (permalink / raw)
  To: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen 

[-- Attachment #1: Type: text/plain, Size: 716 bytes --]

Hi:

Qianfeng Zhang reported that he was seeing crashes with the
attached backtrace.

I tracked this down to the recently added netpoll support in
the bridge device.  It's a classic use-after-free problem.

Trying to solve it brought out a host of other issues, some of
which existed prior to the new bridge code.  The following patches
attempt to address some of these issues.

Warning, this is completely untested (apart from compiling with
everything enabled) so please look but don't merge :)

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: dmesg.txt --]
[-- Type: text/plain, Size: 5337 bytes --]

BUG: unable to handle kernel NULL pointer dereference at 0000000000000400
IP: [<ffffffffa0456824>] __br_deliver+0x64/0xe0 [bridge]
PGD 31e038067 PUD 31e5d6067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/kernel/mm/ksm/run
CPU 3 
Modules linked in: vhost_net(U) macvtap(U) macvlan(U) tun(U) ip6table_filter(U) ip6_tables(U) ebtable_nat(U) ebtables(U) ipt_MASQUERADE(U) iptable_nat(U) nf_nat(U) nfsd(U) exportfs(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) auth_rpcgss(U) sunrpc(U) cpufreq_ondemand(U) acpi_cpufreq(U) freq_table(U)
bridge(U) stp(U) llc(U) ipv6(U) dm_mirror(U) dm_region_hash(U) dm_log(U) kvm_intel(U) kvm(U) snd_hda_codec_realtek(U) snd_hda_intel(U) snd_hda_codec(U) snd_hwdep(U) snd_seq(U) snd_seq_device(U) snd_pcm(U) igb(U) i7core_edac(U) edac_core(U) iTCO_wdt(U) dca(U) snd_timer(U) snd(U) soundcore(U)
snd_page_alloc(U) iTCO_vendor_support(U) sr_mod(U) tg3(U) sg(U) cdrom(U) serio_raw(U) wmi(U) ext4(U) mbcache(U) jbd2(U) sd_mod(U) mptsas(U) crc_t10dif(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) firewire_ohci(U) firewire_core(U) crc_itu_t(U) ahci(U) nouveau(U) ttm(U) drm_kms_helper(U) drm(U)
i2c_algo_bit(U) i2c_core(U) dm_mod(U) [last unloaded: microcode]

Modules linked in: vhost_net(U) macvtap(U) macvlan(U) tun(U) ip6table_filter(U) ip6_tables(U) ebtable_nat(U) ebtables(U) ipt_MASQUERADE(U) iptable_nat(U) nf_nat(U) nfsd(U) exportfs(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) auth_rpcgss(U) sunrpc(U) cpufreq_ondemand(U) acpi_cpufreq(U) freq_table(U)
bridge(U) stp(U) llc(U) ipv6(U) dm_mirror(U) dm_region_hash(U) dm_log(U) kvm_intel(U) kvm(U) snd_hda_codec_realtek(U) snd_hda_intel(U) snd_hda_codec(U) snd_hwdep(U) snd_seq(U) snd_seq_device(U) snd_pcm(U) igb(U) i7core_edac(U) edac_core(U) iTCO_wdt(U) dca(U) snd_timer(U) snd(U) soundcore(U)
snd_page_alloc(U) iTCO_vendor_support(U) sr_mod(U) tg3(U) sg(U) cdrom(U) serio_raw(U) wmi(U) ext4(U) mbcache(U) jbd2(U) sd_mod(U) mptsas(U) crc_t10dif(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) firewire_ohci(U) firewire_core(U) crc_itu_t(U) ahci(U) nouveau(U) ttm(U) drm_kms_helper(U) drm(U)
i2c_algo_bit(U) i2c_core(U) dm_mod(U) [last unloaded: microcode]
Pid: 2234, comm: netserver Tainted: G        W  2.6.32-31.el6.x86_64 #1 HP Z800 Workstation
RIP: 0010:[<ffffffffa0456824>]  [<ffffffffa0456824>] __br_deliver+0x64/0xe0 [bridge]
RSP: 0018:ffff880320ab7698  EFLAGS: 00010292
RAX: 0000000000000000 RBX: ffff88032035a6c0 RCX: ffff880320056140
RDX: 000000000000a971 RSI: 0000000000000282 RDI: ffff88031af1769c
RBP: ffff880320ab76b8 R08: ffff88031af1769c R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88031d17aa80
R13: ffff88031d17aab8 R14: ffff88031af7e8ce R15: ffff88032035a000
FS:  00007ff18f067700(0000) GS:ffff880028260000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000400 CR3: 000000031f101000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process netserver (pid: 2234, threadinfo ffff880320ab6000, task ffff88031af62a60)
Stack:
 0000000380000000 ffff88032056f300 ffff88031d17aa80 ffff88032035a6c0
<0> ffff880320ab76c8 ffffffffa04568d5 ffff880320ab76f8 ffffffffa04555ac
<0> ffffffff818bbde0 0000000000000003 ffffffff818bbe20 ffff88031d17aa80
Call Trace:
 [<ffffffffa04568d5>] br_deliver+0x35/0x40 [bridge]
 [<ffffffffa04555ac>] br_dev_xmit+0xbc/0x100 [bridge]
 [<ffffffff8140f708>] dev_hard_start_xmit+0x2b8/0x370
 [<ffffffff81412c3e>] dev_queue_xmit+0x3be/0x4a0
 [<ffffffff81417545>] neigh_resolve_output+0x105/0x370
 [<ffffffff8144e9f0>] ? ip_finish_output+0x0/0x310
 [<ffffffff8144eb2c>] ip_finish_output+0x13c/0x310
 [<ffffffff8144edb8>] ip_output+0xb8/0xc0
 [<ffffffff8144dd0f>] ? __ip_local_out+0x9f/0xb0
 [<ffffffff8144dd45>] ip_local_out+0x25/0x30
 [<ffffffff8144e590>] ip_queue_xmit+0x190/0x420
 [<ffffffff814630e1>] tcp_transmit_skb+0x3f1/0x790
 [<ffffffff814649f9>] tcp_send_ack+0xd9/0x120
 [<ffffffff8145c31e>] __tcp_ack_snd_check+0x5e/0xa0
 [<ffffffff81461401>] tcp_rcv_established+0x271/0x820
 [<ffffffff814694b3>] tcp_v4_do_rcv+0x2e3/0x430
 [<ffffffff814694b3>] ? tcp_v4_do_rcv+0x2e3/0x430
 [<ffffffff814003ad>] release_sock+0x5d/0xc0
 [<ffffffff81458744>] tcp_recvmsg+0x864/0xe80
 [<ffffffff81013c8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff813ffa79>] sock_common_recvmsg+0x39/0x50
 [<ffffffff813fd42e>] ? sock_recvmsg+0x13e/0x160
 [<ffffffff813fd423>] sock_recvmsg+0x133/0x160
 [<ffffffff8110c31e>] ? filemap_fault+0xbe/0x510
 [<ffffffff8108ff80>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8110aa87>] ? unlock_page+0x27/0x30
 [<ffffffff81131869>] ? __do_fault+0x439/0x500
 [<ffffffff81013c8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff814003ff>] ? release_sock+0xaf/0xc0
 [<ffffffff813fd771>] sys_recvfrom+0xe1/0x170
 [<ffffffff8101187e>] ? __switch_to+0x26e/0x320
 [<ffffffff810d267e>] ? audit_syscall_entry+0x2e/0x280
 [<ffffffff810d28a2>] ? audit_syscall_entry+0x252/0x280
 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b
Code: c9 49 c7 c1 30 66 45 a0 4c 89 e2 be 03 00 00 00 bf 07 00 00 00 c7 04 24 00 00 00 80 e8 c6 eb fd e0 83 f8 01 74 31 49 8b 44 24 20 <48> 8b 80 00 04 00 00 48 85 c0 74 0e 48 8b 80 b8 00 00 00 48 8b 
RIP  [<ffffffffa0456824>] __br_deliver+0x64/0xe0 [bridge]
 RSP <ffff880320ab7698>
CR2: 0000000000000400


^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH 1/7] netpoll: Set npinfo to NULL even with ndo_netpoll_cleanup
  2010-06-10 12:40 [0/8] netpoll/bridge fixes Herbert Xu
@ 2010-06-10 12:42 ` Herbert Xu
  2010-06-10 12:42 ` [PATCH 2/7] bridge: Remove redundant npinfo NULL setting Herbert Xu
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 63+ messages in thread
From: Herbert Xu @ 2010-06-10 12:42 UTC (permalink / raw)
  To: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen 

netpoll: Set npinfo to NULL even with ndo_netpoll_cleanup

Sinec we have to null npinfo regardless of whether there is a
ndo_netpoll_cleanup, it makes sense to do this unconditionally
in netpoll_cleanup rather than having every driver do it by
themselves.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/core/netpoll.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 94825b1..748c930 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -898,8 +898,7 @@ void netpoll_cleanup(struct netpoll *np)
 				ops = np->dev->netdev_ops;
 				if (ops->ndo_netpoll_cleanup)
 					ops->ndo_netpoll_cleanup(np->dev);
-				else
-					np->dev->npinfo = NULL;
+				np->dev->npinfo = NULL;
 			}
 		}
 

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 2/7] bridge: Remove redundant npinfo NULL setting
  2010-06-10 12:40 [0/8] netpoll/bridge fixes Herbert Xu
  2010-06-10 12:42 ` [PATCH 1/7] netpoll: Set npinfo to NULL even with ndo_netpoll_cleanup Herbert Xu
@ 2010-06-10 12:42 ` Herbert Xu
  2010-06-10 12:42 ` [PATCH 3/7] netpoll: Fix RCU usage Herbert Xu
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 63+ messages in thread
From: Herbert Xu @ 2010-06-10 12:42 UTC (permalink / raw)
  To: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen 

bridge: Remove redundant npinfo NULL setting

Now that netpoll always zaps npinfo we no longer needs to do it
in bridge.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_device.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index eedf2c9..dce0611 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -231,7 +231,6 @@ void br_netpoll_cleanup(struct net_device *dev)
 	struct net_bridge_port *p, *n;
 	const struct net_device_ops *ops;
 
-	br->dev->npinfo = NULL;
 	list_for_each_entry_safe(p, n, &br->port_list, list) {
 		if (p->dev) {
 			ops = p->dev->netdev_ops;

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 3/7] netpoll: Fix RCU usage
  2010-06-10 12:40 [0/8] netpoll/bridge fixes Herbert Xu
  2010-06-10 12:42 ` [PATCH 1/7] netpoll: Set npinfo to NULL even with ndo_netpoll_cleanup Herbert Xu
  2010-06-10 12:42 ` [PATCH 2/7] bridge: Remove redundant npinfo NULL setting Herbert Xu
@ 2010-06-10 12:42 ` Herbert Xu
  2010-06-10 12:42 ` [PATCH 4/7] netpoll: Add locking for netpoll_setup/cleanup Herbert Xu
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 63+ messages in thread
From: Herbert Xu @ 2010-06-10 12:42 UTC (permalink / raw)
  To: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen 

netpoll: Fix RCU usage

The use of RCU in netpoll is incorrect in a number of places:

1) The initial setting is lacking a write barrier.
2) The synchronize_rcu is in the wrong place.
3) Read barriers are missing.
4) Some places are even missing rcu_read_lock.
5) npinfo is zeroed after freeing.

This patch fixes those issues.  As most users are in BH context,
this also converts the RCU usage to the BH variant.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/netpoll.h |   13 ++++++++-----
 net/core/netpoll.c      |   20 ++++++++++++--------
 2 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index e9e2312..95c9f7e 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -57,12 +57,15 @@ void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb);
 #ifdef CONFIG_NETPOLL
 static inline bool netpoll_rx(struct sk_buff *skb)
 {
-	struct netpoll_info *npinfo = skb->dev->npinfo;
+	struct netpoll_info *npinfo;
 	unsigned long flags;
 	bool ret = false;
 
+	rcu_read_lock_bh();
+	npinfo = rcu_dereference(skb->dev->npinfo);
+
 	if (!npinfo || (list_empty(&npinfo->rx_np) && !npinfo->rx_flags))
-		return false;
+		goto out;
 
 	spin_lock_irqsave(&npinfo->rx_lock, flags);
 	/* check rx_flags again with the lock held */
@@ -70,12 +73,14 @@ static inline bool netpoll_rx(struct sk_buff *skb)
 		ret = true;
 	spin_unlock_irqrestore(&npinfo->rx_lock, flags);
 
+out:
+	rcu_read_unlock_bh();
 	return ret;
 }
 
 static inline int netpoll_rx_on(struct sk_buff *skb)
 {
-	struct netpoll_info *npinfo = skb->dev->npinfo;
+	struct netpoll_info *npinfo = rcu_dereference(skb->dev->npinfo);
 
 	return npinfo && (!list_empty(&npinfo->rx_np) || npinfo->rx_flags);
 }
@@ -91,7 +96,6 @@ static inline void *netpoll_poll_lock(struct napi_struct *napi)
 {
 	struct net_device *dev = napi->dev;
 
-	rcu_read_lock(); /* deal with race on ->npinfo */
 	if (dev && dev->npinfo) {
 		spin_lock(&napi->poll_lock);
 		napi->poll_owner = smp_processor_id();
@@ -108,7 +112,6 @@ static inline void netpoll_poll_unlock(void *have)
 		napi->poll_owner = -1;
 		spin_unlock(&napi->poll_lock);
 	}
-	rcu_read_unlock();
 }
 
 #else
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 748c930..4b7623d 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -292,6 +292,7 @@ void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
 	unsigned long tries;
 	struct net_device *dev = np->dev;
 	const struct net_device_ops *ops = dev->netdev_ops;
+	/* It is up to the caller to keep npinfo alive. */
 	struct netpoll_info *npinfo = np->dev->npinfo;
 
 	if (!npinfo || !netif_running(dev) || !netif_device_present(dev)) {
@@ -841,10 +842,7 @@ int netpoll_setup(struct netpoll *np)
 	refill_skbs();
 
 	/* last thing to do is link it to the net device structure */
-	ndev->npinfo = npinfo;
-
-	/* avoid racing with NAPI reading npinfo */
-	synchronize_rcu();
+	rcu_assign_pointer(ndev->npinfo, npinfo);
 
 	return 0;
 
@@ -888,6 +886,16 @@ void netpoll_cleanup(struct netpoll *np)
 
 			if (atomic_dec_and_test(&npinfo->refcnt)) {
 				const struct net_device_ops *ops;
+
+				ops = np->dev->netdev_ops;
+				if (ops->ndo_netpoll_cleanup)
+					ops->ndo_netpoll_cleanup(np->dev);
+
+				rcu_assign_pointer(np->dev->npinfo, NULL);
+
+				/* avoid racing with NAPI reading npinfo */
+				synchronize_rcu_bh();
+
 				skb_queue_purge(&npinfo->arp_tx);
 				skb_queue_purge(&npinfo->txq);
 				cancel_rearming_delayed_work(&npinfo->tx_work);
@@ -895,10 +903,6 @@ void netpoll_cleanup(struct netpoll *np)
 				/* clean after last, unfinished work */
 				__skb_queue_purge(&npinfo->txq);
 				kfree(npinfo);
-				ops = np->dev->netdev_ops;
-				if (ops->ndo_netpoll_cleanup)
-					ops->ndo_netpoll_cleanup(np->dev);
-				np->dev->npinfo = NULL;
 			}
 		}
 

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 4/7] netpoll: Add locking for netpoll_setup/cleanup
  2010-06-10 12:40 [0/8] netpoll/bridge fixes Herbert Xu
                   ` (2 preceding siblings ...)
  2010-06-10 12:42 ` [PATCH 3/7] netpoll: Fix RCU usage Herbert Xu
@ 2010-06-10 12:42 ` Herbert Xu
  2010-06-10 12:42 ` [PATCH 5/7] netpoll: Add ndo_netpoll_setup Herbert Xu
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 63+ messages in thread
From: Herbert Xu @ 2010-06-10 12:42 UTC (permalink / raw)
  To: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen 

netpoll: Add locking for netpoll_setup/cleanup

As it stands, netpoll_setup and netpoll_cleanup have no locking
protection whatsoever.  So chaos ensures if two entities try to
perform them on the same device.

This patch adds RTNL to the equation.  The code has be rearranged
so that bits that do not need RTNL protection are now moved to
the top of netpoll_setup.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/core/netpoll.c |  151 ++++++++++++++++++++++++++---------------------------
 1 file changed, 76 insertions(+), 75 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 4b7623d..9dcd767 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -729,7 +729,6 @@ int netpoll_setup(struct netpoll *np)
 	struct net_device *ndev = NULL;
 	struct in_device *in_dev;
 	struct netpoll_info *npinfo;
-	struct netpoll *npe, *tmp;
 	unsigned long flags;
 	int err;
 
@@ -741,38 +740,6 @@ int netpoll_setup(struct netpoll *np)
 		return -ENODEV;
 	}
 
-	np->dev = ndev;
-	if (!ndev->npinfo) {
-		npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
-		if (!npinfo) {
-			err = -ENOMEM;
-			goto put;
-		}
-
-		npinfo->rx_flags = 0;
-		INIT_LIST_HEAD(&npinfo->rx_np);
-
-		spin_lock_init(&npinfo->rx_lock);
-		skb_queue_head_init(&npinfo->arp_tx);
-		skb_queue_head_init(&npinfo->txq);
-		INIT_DELAYED_WORK(&npinfo->tx_work, queue_process);
-
-		atomic_set(&npinfo->refcnt, 1);
-	} else {
-		npinfo = ndev->npinfo;
-		atomic_inc(&npinfo->refcnt);
-	}
-
-	npinfo->netpoll = np;
-
-	if ((ndev->priv_flags & IFF_DISABLE_NETPOLL) ||
-	    !ndev->netdev_ops->ndo_poll_controller) {
-		printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
-		       np->name, np->dev_name);
-		err = -ENOTSUPP;
-		goto release;
-	}
-
 	if (!netif_running(ndev)) {
 		unsigned long atmost, atleast;
 
@@ -786,7 +753,7 @@ int netpoll_setup(struct netpoll *np)
 		if (err) {
 			printk(KERN_ERR "%s: failed to open %s\n",
 			       np->name, ndev->name);
-			goto release;
+			goto put;
 		}
 
 		atleast = jiffies + HZ/10;
@@ -823,7 +790,7 @@ int netpoll_setup(struct netpoll *np)
 			printk(KERN_ERR "%s: no IP address for %s, aborting\n",
 			       np->name, np->dev_name);
 			err = -EDESTADDRREQ;
-			goto release;
+			goto put;
 		}
 
 		np->local_ip = in_dev->ifa_list->ifa_local;
@@ -831,6 +798,43 @@ int netpoll_setup(struct netpoll *np)
 		printk(KERN_INFO "%s: local IP %pI4\n", np->name, &np->local_ip);
 	}
 
+	np->dev = ndev;
+
+	/* fill up the skb queue */
+	refill_skbs();
+
+	rtnl_lock();
+	if ((ndev->priv_flags & IFF_DISABLE_NETPOLL) ||
+	    !ndev->netdev_ops->ndo_poll_controller) {
+		printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
+		       np->name, np->dev_name);
+		err = -ENOTSUPP;
+		goto unlock;
+	}
+
+	if (!ndev->npinfo) {
+		npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
+		if (!npinfo) {
+			err = -ENOMEM;
+			goto unlock;
+		}
+
+		npinfo->rx_flags = 0;
+		INIT_LIST_HEAD(&npinfo->rx_np);
+
+		spin_lock_init(&npinfo->rx_lock);
+		skb_queue_head_init(&npinfo->arp_tx);
+		skb_queue_head_init(&npinfo->txq);
+		INIT_DELAYED_WORK(&npinfo->tx_work, queue_process);
+
+		atomic_set(&npinfo->refcnt, 1);
+	} else {
+		npinfo = ndev->npinfo;
+		atomic_inc(&npinfo->refcnt);
+	}
+
+	npinfo->netpoll = np;
+
 	if (np->rx_hook) {
 		spin_lock_irqsave(&npinfo->rx_lock, flags);
 		npinfo->rx_flags |= NETPOLL_RX_ENABLED;
@@ -838,24 +842,14 @@ int netpoll_setup(struct netpoll *np)
 		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
 	}
 
-	/* fill up the skb queue */
-	refill_skbs();
-
 	/* last thing to do is link it to the net device structure */
 	rcu_assign_pointer(ndev->npinfo, npinfo);
+	rtnl_unlock();
 
 	return 0;
 
- release:
-	if (!ndev->npinfo) {
-		spin_lock_irqsave(&npinfo->rx_lock, flags);
-		list_for_each_entry_safe(npe, tmp, &npinfo->rx_np, rx) {
-			npe->dev = NULL;
-		}
-		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
-
-		kfree(npinfo);
-	}
+unlock:
+	rtnl_unlock();
 put:
 	dev_put(ndev);
 	return err;
@@ -872,43 +866,50 @@ void netpoll_cleanup(struct netpoll *np)
 {
 	struct netpoll_info *npinfo;
 	unsigned long flags;
+	int free = 0;
 
-	if (np->dev) {
-		npinfo = np->dev->npinfo;
-		if (npinfo) {
-			if (!list_empty(&npinfo->rx_np)) {
-				spin_lock_irqsave(&npinfo->rx_lock, flags);
-				list_del(&np->rx);
-				if (list_empty(&npinfo->rx_np))
-					npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
-				spin_unlock_irqrestore(&npinfo->rx_lock, flags);
-			}
+	if (!np->dev)
+		return;
 
-			if (atomic_dec_and_test(&npinfo->refcnt)) {
-				const struct net_device_ops *ops;
+	rtnl_lock();
+	npinfo = np->dev->npinfo;
+	if (npinfo) {
+		if (!list_empty(&npinfo->rx_np)) {
+			spin_lock_irqsave(&npinfo->rx_lock, flags);
+			list_del(&np->rx);
+			if (list_empty(&npinfo->rx_np))
+				npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
+			spin_unlock_irqrestore(&npinfo->rx_lock, flags);
+		}
 
-				ops = np->dev->netdev_ops;
-				if (ops->ndo_netpoll_cleanup)
-					ops->ndo_netpoll_cleanup(np->dev);
+		free = atomic_dec_and_test(&npinfo->refcnt);
+		if (free) {
+			const struct net_device_ops *ops;
 
-				rcu_assign_pointer(np->dev->npinfo, NULL);
+			ops = np->dev->netdev_ops;
+			if (ops->ndo_netpoll_cleanup)
+				ops->ndo_netpoll_cleanup(np->dev);
 
-				/* avoid racing with NAPI reading npinfo */
-				synchronize_rcu_bh();
+			rcu_assign_pointer(np->dev->npinfo, NULL);
+		}
+	}
+	rtnl_unlock();
 
-				skb_queue_purge(&npinfo->arp_tx);
-				skb_queue_purge(&npinfo->txq);
-				cancel_rearming_delayed_work(&npinfo->tx_work);
+	if (free) {
+		/* avoid racing with NAPI reading npinfo */
+		synchronize_rcu_bh();
 
-				/* clean after last, unfinished work */
-				__skb_queue_purge(&npinfo->txq);
-				kfree(npinfo);
-			}
-		}
+		skb_queue_purge(&npinfo->arp_tx);
+		skb_queue_purge(&npinfo->txq);
+		cancel_rearming_delayed_work(&npinfo->tx_work);
 
-		dev_put(np->dev);
+		/* clean after last, unfinished work */
+		__skb_queue_purge(&npinfo->txq);
+		kfree(npinfo);
 	}
 
+	dev_put(np->dev);
+
 	np->dev = NULL;
 }
 

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 5/7] netpoll: Add ndo_netpoll_setup
  2010-06-10 12:40 [0/8] netpoll/bridge fixes Herbert Xu
                   ` (3 preceding siblings ...)
  2010-06-10 12:42 ` [PATCH 4/7] netpoll: Add locking for netpoll_setup/cleanup Herbert Xu
@ 2010-06-10 12:42 ` Herbert Xu
  2010-06-10 12:42 ` [PATCH 6/7] netpoll: Allow netpoll_setup/cleanup recursion Herbert Xu
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 63+ messages in thread
From: Herbert Xu @ 2010-06-10 12:42 UTC (permalink / raw)
  To: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen 

netpoll: Add ndo_netpoll_setup

This patch adds ndo_netpoll_setup as the initialisation primitive
to complement ndo_netpoll_cleanup.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/netdevice.h |    2 ++
 net/core/netpoll.c        |   10 ++++++++++
 2 files changed, 12 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 40291f3..619d3f1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -728,6 +728,8 @@ struct net_device_ops {
 						        unsigned short vid);
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	void                    (*ndo_poll_controller)(struct net_device *dev);
+	int			(*ndo_netpoll_setup)(struct net_device *dev,
+						     struct netpoll_info *info);
 	void			(*ndo_netpoll_cleanup)(struct net_device *dev);
 #endif
 	int			(*ndo_set_vf_mac)(struct net_device *dev,
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 9dcd767..c445896 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -729,6 +729,7 @@ int netpoll_setup(struct netpoll *np)
 	struct net_device *ndev = NULL;
 	struct in_device *in_dev;
 	struct netpoll_info *npinfo;
+	const struct net_device_ops *ops;
 	unsigned long flags;
 	int err;
 
@@ -828,6 +829,13 @@ int netpoll_setup(struct netpoll *np)
 		INIT_DELAYED_WORK(&npinfo->tx_work, queue_process);
 
 		atomic_set(&npinfo->refcnt, 1);
+
+		ops = np->dev->netdev_ops;
+		if (ops->ndo_netpoll_setup) {
+			err = ops->ndo_netpoll_setup(ndev, npinfo);
+			if (err)
+				goto free_npinfo;
+		}
 	} else {
 		npinfo = ndev->npinfo;
 		atomic_inc(&npinfo->refcnt);
@@ -848,6 +856,8 @@ int netpoll_setup(struct netpoll *np)
 
 	return 0;
 
+free_npinfo:
+	kfree(npinfo);
 unlock:
 	rtnl_unlock();
 put:

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 6/7] netpoll: Allow netpoll_setup/cleanup recursion
  2010-06-10 12:40 [0/8] netpoll/bridge fixes Herbert Xu
                   ` (4 preceding siblings ...)
  2010-06-10 12:42 ` [PATCH 5/7] netpoll: Add ndo_netpoll_setup Herbert Xu
@ 2010-06-10 12:42 ` Herbert Xu
  2010-06-10 12:42 ` [PATCH 7/7] bridge: Fix netpoll support Herbert Xu
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 63+ messages in thread
From: Herbert Xu @ 2010-06-10 12:42 UTC (permalink / raw)
  To: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen 

netpoll: Allow netpoll_setup/cleanup recursion

This patch adds the functions __netpoll_setup/__netpoll_cleanup
which is designed to be called recursively through ndo_netpoll_seutp.

They must be called with RTNL held, and the caller must initialise
np->dev and ensure that it has a valid reference count.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/netpoll.h |    2 
 net/core/netpoll.c      |  176 ++++++++++++++++++++++++++----------------------
 2 files changed, 99 insertions(+), 79 deletions(-)

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index 95c9f7e..f3ad74a 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -46,9 +46,11 @@ void netpoll_poll(struct netpoll *np);
 void netpoll_send_udp(struct netpoll *np, const char *msg, int len);
 void netpoll_print_options(struct netpoll *np);
 int netpoll_parse_options(struct netpoll *np, char *opt);
+int __netpoll_setup(struct netpoll *np);
 int netpoll_setup(struct netpoll *np);
 int netpoll_trap(void);
 void netpoll_set_trap(int trap);
+void __netpoll_cleanup(struct netpoll *np);
 void netpoll_cleanup(struct netpoll *np);
 int __netpoll_rx(struct sk_buff *skb);
 void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb);
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index c445896..f728208 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -724,15 +724,78 @@ int netpoll_parse_options(struct netpoll *np, char *opt)
 	return -1;
 }
 
-int netpoll_setup(struct netpoll *np)
+int __netpoll_setup(struct netpoll *np)
 {
-	struct net_device *ndev = NULL;
-	struct in_device *in_dev;
+	struct net_device *ndev = np->dev;
 	struct netpoll_info *npinfo;
 	const struct net_device_ops *ops;
 	unsigned long flags;
 	int err;
 
+	if ((ndev->priv_flags & IFF_DISABLE_NETPOLL) ||
+	    !ndev->netdev_ops->ndo_poll_controller) {
+		printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
+		       np->name, np->dev_name);
+		err = -ENOTSUPP;
+		goto out;
+	}
+
+	if (!ndev->npinfo) {
+		npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
+		if (!npinfo) {
+			err = -ENOMEM;
+			goto out;
+		}
+
+		npinfo->rx_flags = 0;
+		INIT_LIST_HEAD(&npinfo->rx_np);
+
+		spin_lock_init(&npinfo->rx_lock);
+		skb_queue_head_init(&npinfo->arp_tx);
+		skb_queue_head_init(&npinfo->txq);
+		INIT_DELAYED_WORK(&npinfo->tx_work, queue_process);
+
+		atomic_set(&npinfo->refcnt, 1);
+
+		ops = np->dev->netdev_ops;
+		if (ops->ndo_netpoll_setup) {
+			err = ops->ndo_netpoll_setup(ndev, npinfo);
+			if (err)
+				goto free_npinfo;
+		}
+	} else {
+		npinfo = ndev->npinfo;
+		atomic_inc(&npinfo->refcnt);
+	}
+
+	npinfo->netpoll = np;
+
+	if (np->rx_hook) {
+		spin_lock_irqsave(&npinfo->rx_lock, flags);
+		npinfo->rx_flags |= NETPOLL_RX_ENABLED;
+		list_add_tail(&np->rx, &npinfo->rx_np);
+		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
+	}
+
+	/* last thing to do is link it to the net device structure */
+	rcu_assign_pointer(ndev->npinfo, npinfo);
+	rtnl_unlock();
+
+	return 0;
+
+free_npinfo:
+	kfree(npinfo);
+out:
+	return err;
+}
+EXPORT_SYMBOL_GPL(__netpoll_setup);
+
+int netpoll_setup(struct netpoll *np)
+{
+	struct net_device *ndev = NULL;
+	struct in_device *in_dev;
+	int err;
+
 	if (np->dev_name)
 		ndev = dev_get_by_name(&init_net, np->dev_name);
 	if (!ndev) {
@@ -805,61 +868,14 @@ int netpoll_setup(struct netpoll *np)
 	refill_skbs();
 
 	rtnl_lock();
-	if ((ndev->priv_flags & IFF_DISABLE_NETPOLL) ||
-	    !ndev->netdev_ops->ndo_poll_controller) {
-		printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
-		       np->name, np->dev_name);
-		err = -ENOTSUPP;
-		goto unlock;
-	}
-
-	if (!ndev->npinfo) {
-		npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
-		if (!npinfo) {
-			err = -ENOMEM;
-			goto unlock;
-		}
-
-		npinfo->rx_flags = 0;
-		INIT_LIST_HEAD(&npinfo->rx_np);
-
-		spin_lock_init(&npinfo->rx_lock);
-		skb_queue_head_init(&npinfo->arp_tx);
-		skb_queue_head_init(&npinfo->txq);
-		INIT_DELAYED_WORK(&npinfo->tx_work, queue_process);
-
-		atomic_set(&npinfo->refcnt, 1);
-
-		ops = np->dev->netdev_ops;
-		if (ops->ndo_netpoll_setup) {
-			err = ops->ndo_netpoll_setup(ndev, npinfo);
-			if (err)
-				goto free_npinfo;
-		}
-	} else {
-		npinfo = ndev->npinfo;
-		atomic_inc(&npinfo->refcnt);
-	}
-
-	npinfo->netpoll = np;
-
-	if (np->rx_hook) {
-		spin_lock_irqsave(&npinfo->rx_lock, flags);
-		npinfo->rx_flags |= NETPOLL_RX_ENABLED;
-		list_add_tail(&np->rx, &npinfo->rx_np);
-		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
-	}
-
-	/* last thing to do is link it to the net device structure */
-	rcu_assign_pointer(ndev->npinfo, npinfo);
+	err = __netpoll_setup(np);
 	rtnl_unlock();
 
+	if (err)
+		goto put;
+
 	return 0;
 
-free_npinfo:
-	kfree(npinfo);
-unlock:
-	rtnl_unlock();
 put:
 	dev_put(ndev);
 	return err;
@@ -872,40 +888,32 @@ static int __init netpoll_init(void)
 }
 core_initcall(netpoll_init);
 
-void netpoll_cleanup(struct netpoll *np)
+void __netpoll_cleanup(struct netpoll *np)
 {
 	struct netpoll_info *npinfo;
 	unsigned long flags;
-	int free = 0;
 
-	if (!np->dev)
+	npinfo = np->dev->npinfo;
+	if (!npinfo)
 		return;
 
-	rtnl_lock();
-	npinfo = np->dev->npinfo;
-	if (npinfo) {
-		if (!list_empty(&npinfo->rx_np)) {
-			spin_lock_irqsave(&npinfo->rx_lock, flags);
-			list_del(&np->rx);
-			if (list_empty(&npinfo->rx_np))
-				npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
-			spin_unlock_irqrestore(&npinfo->rx_lock, flags);
-		}
+	if (!list_empty(&npinfo->rx_np)) {
+		spin_lock_irqsave(&npinfo->rx_lock, flags);
+		list_del(&np->rx);
+		if (list_empty(&npinfo->rx_np))
+			npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
+		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
+	}
 
-		free = atomic_dec_and_test(&npinfo->refcnt);
-		if (free) {
-			const struct net_device_ops *ops;
+	if (atomic_dec_and_test(&npinfo->refcnt)) {
+		const struct net_device_ops *ops;
 
-			ops = np->dev->netdev_ops;
-			if (ops->ndo_netpoll_cleanup)
-				ops->ndo_netpoll_cleanup(np->dev);
+		ops = np->dev->netdev_ops;
+		if (ops->ndo_netpoll_cleanup)
+			ops->ndo_netpoll_cleanup(np->dev);
 
-			rcu_assign_pointer(np->dev->npinfo, NULL);
-		}
-	}
-	rtnl_unlock();
+		rcu_assign_pointer(np->dev->npinfo, NULL);
 
-	if (free) {
 		/* avoid racing with NAPI reading npinfo */
 		synchronize_rcu_bh();
 
@@ -917,9 +925,19 @@ void netpoll_cleanup(struct netpoll *np)
 		__skb_queue_purge(&npinfo->txq);
 		kfree(npinfo);
 	}
+}
+EXPORT_SYMBOL_GPL(__netpoll_cleanup);
 
-	dev_put(np->dev);
+void netpoll_cleanup(struct netpoll *np)
+{
+	if (!np->dev)
+		return;
 
+	rtnl_lock();
+	__netpoll_cleanup(np);
+	rtnl_unlock();
+
+	dev_put(np->dev);
 	np->dev = NULL;
 }
 

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 7/7] bridge: Fix netpoll support
  2010-06-10 12:40 [0/8] netpoll/bridge fixes Herbert Xu
                   ` (5 preceding siblings ...)
  2010-06-10 12:42 ` [PATCH 6/7] netpoll: Allow netpoll_setup/cleanup recursion Herbert Xu
@ 2010-06-10 12:42 ` Herbert Xu
  2010-06-10 14:49 ` [0/8] netpoll/bridge fixes Stephen Hemminger
  2010-06-29 12:53 ` Yanko Kaneti
  8 siblings, 0 replies; 63+ messages in thread
From: Herbert Xu @ 2010-06-10 12:42 UTC (permalink / raw)
  To: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen 

bridge: Fix netpoll support

There are multiple problems with the newly added netpoll support:

1) Use-after-free on each netpoll packet.
2) Invoking unsafe code on netpoll/IRQ path.
3) Breaks when netpoll is enabled on the underlying device.

This patch fixes all of these problems.  In particular, we now
allocate proper netpoll structures for each underlying device.

We only allow netpoll to be enabled on the bridge when all the
devices underneath it support netpoll.  Once it is enabled, we
do not allow non-netpoll devices to join the bridge (until netpoll
is disabled again).

This allows us to do away with the npinfo juggling that caused
problem number 1.

Incidentally this patch fixes number 2 by bypassing unsafe code
such as multicast snooping and netfilter.

Reported-by: Qianfeng Zhang <frzhang@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_device.c  |   99 +++++++++++++++++++++++++-----------------------
 net/bridge/br_forward.c |   33 ++++++----------
 net/bridge/br_if.c      |   12 +++--
 net/bridge/br_private.h |   31 +++++++++++----
 4 files changed, 95 insertions(+), 80 deletions(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index dce0611..4a572f7 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -47,6 +47,12 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 	skb_pull(skb, ETH_HLEN);
 
 	if (is_multicast_ether_addr(dest)) {
+#ifdef CONFIG_NET_POLL_CONTROLLER
+		if (unlikely(in_irq())) {
+			br_flood_deliver(br, skb);
+			goto out;
+		}
+#endif
 		if (br_multicast_rcv(br, NULL, skb))
 			goto out;
 
@@ -199,72 +205,70 @@ static int br_set_tx_csum(struct net_device *dev, u32 data)
 }
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
-static bool br_devices_support_netpoll(struct net_bridge *br)
+static void br_poll_controller(struct net_device *br_dev)
 {
-	struct net_bridge_port *p;
-	bool ret = true;
-	int count = 0;
-	unsigned long flags;
-
-	spin_lock_irqsave(&br->lock, flags);
-	list_for_each_entry(p, &br->port_list, list) {
-		count++;
-		if ((p->dev->priv_flags & IFF_DISABLE_NETPOLL) ||
-		    !p->dev->netdev_ops->ndo_poll_controller)
-			ret = false;
-	}
-	spin_unlock_irqrestore(&br->lock, flags);
-	return count != 0 && ret;
 }
 
-static void br_poll_controller(struct net_device *br_dev)
+static void br_netpoll_cleanup(struct net_device *dev)
 {
-	struct netpoll *np = br_dev->npinfo->netpoll;
+	struct net_bridge *br = netdev_priv(dev);
+	struct net_bridge_port *p, *n;
 
-	if (np->real_dev != br_dev)
-		netpoll_poll_dev(np->real_dev);
+	list_for_each_entry_safe(p, n, &br->port_list, list) {
+		br_netpoll_disable(p);
+	}
 }
 
-void br_netpoll_cleanup(struct net_device *dev)
+static int br_netpoll_setup(struct net_device *dev, struct netpoll_info *ni)
 {
 	struct net_bridge *br = netdev_priv(dev);
 	struct net_bridge_port *p, *n;
-	const struct net_device_ops *ops;
+	int err = 0;
 
 	list_for_each_entry_safe(p, n, &br->port_list, list) {
-		if (p->dev) {
-			ops = p->dev->netdev_ops;
-			if (ops->ndo_netpoll_cleanup)
-				ops->ndo_netpoll_cleanup(p->dev);
-			else
-				p->dev->npinfo = NULL;
-		}
+		if (!p->dev)
+			continue;
+
+		err = br_netpoll_enable(p);
+		if (err)
+			goto fail;
 	}
+
+out:
+	return err;
+
+fail:
+	br_netpoll_cleanup(dev);
+	goto out;
 }
 
-void br_netpoll_disable(struct net_bridge *br,
-			struct net_device *dev)
+int br_netpoll_enable(struct net_bridge_port *p)
 {
-	if (br_devices_support_netpoll(br))
-		br->dev->priv_flags &= ~IFF_DISABLE_NETPOLL;
-	if (dev->netdev_ops->ndo_netpoll_cleanup)
-		dev->netdev_ops->ndo_netpoll_cleanup(dev);
-	else
-		dev->npinfo = NULL;
+	int err = 0;
+
+	p->np = kzalloc(sizeof(*p->np), GFP_KERNEL);
+	err = -ENOMEM;
+	if (!p->np)
+		goto out;
+
+	p->np->dev = p->dev;
+
+	err = __netpoll_setup(p->np);
+	if (err)
+		kfree(p->np);
+
+out:
+	return err;
 }
 
-void br_netpoll_enable(struct net_bridge *br,
-		       struct net_device *dev)
+void br_netpoll_disable(struct net_bridge_port *p)
 {
-	if (br_devices_support_netpoll(br)) {
-		br->dev->priv_flags &= ~IFF_DISABLE_NETPOLL;
-		if (br->dev->npinfo)
-			dev->npinfo = br->dev->npinfo;
-	} else if (!(br->dev->priv_flags & IFF_DISABLE_NETPOLL)) {
-		br->dev->priv_flags |= IFF_DISABLE_NETPOLL;
-		br_info(br,"new device %s does not support netpoll (disabling)",
-			dev->name);
-	}
+	if (!p->np)
+		return;
+
+	__netpoll_cleanup(p->np);
+	kfree(p->np);
+	p->np = NULL;
 }
 
 #endif
@@ -293,6 +297,7 @@ static const struct net_device_ops br_netdev_ops = {
 	.ndo_change_mtu		 = br_change_mtu,
 	.ndo_do_ioctl		 = br_dev_ioctl,
 #ifdef CONFIG_NET_POLL_CONTROLLER
+	.ndo_netpoll_setup	 = br_netpoll_setup,
 	.ndo_netpoll_cleanup	 = br_netpoll_cleanup,
 	.ndo_poll_controller	 = br_poll_controller,
 #endif
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index a98ef13..707b92a 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -50,14 +50,7 @@ int br_dev_queue_push_xmit(struct sk_buff *skb)
 			kfree_skb(skb);
 		else {
 			skb_push(skb, ETH_HLEN);
-
-#ifdef CONFIG_NET_POLL_CONTROLLER
-			if (unlikely(skb->dev->priv_flags & IFF_IN_NETPOLL)) {
-				netpoll_send_skb(skb->dev->npinfo->netpoll, skb);
-				skb->dev->priv_flags &= ~IFF_IN_NETPOLL;
-			} else
-#endif
-				dev_queue_xmit(skb);
+			dev_queue_xmit(skb);
 		}
 	}
 
@@ -73,23 +66,23 @@ int br_forward_finish(struct sk_buff *skb)
 
 static void __br_deliver(const struct net_bridge_port *to, struct sk_buff *skb)
 {
+	skb->dev = to->dev;
+
 #ifdef CONFIG_NET_POLL_CONTROLLER
-	struct net_bridge *br = to->br;
-	if (unlikely(br->dev->priv_flags & IFF_IN_NETPOLL)) {
-		struct netpoll *np;
-		to->dev->npinfo = skb->dev->npinfo;
-		np = skb->dev->npinfo->netpoll;
-		np->real_dev = np->dev = to->dev;
-		to->dev->priv_flags |= IFF_IN_NETPOLL;
+	if (unlikely(in_irq())) {
+		if ((packet_length(skb) > skb->dev->mtu && !skb_is_gso(skb)) ||
+		    !to->np)
+			kfree_skb(skb);
+		else {
+			skb_push(skb, ETH_HLEN);
+			netpoll_send_skb(to->np, skb);
+		}
+		return;
 	}
 #endif
-	skb->dev = to->dev;
+
 	NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
 		br_forward_finish);
-#ifdef CONFIG_NET_POLL_CONTROLLER
-	if (skb->dev->npinfo)
-		skb->dev->npinfo->netpoll->dev = br->dev;
-#endif
 }
 
 static void __br_forward(const struct net_bridge_port *to, struct sk_buff *skb)
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 18b245e..13102e0 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -154,7 +154,8 @@ static void del_nbp(struct net_bridge_port *p)
 	kobject_uevent(&p->kobj, KOBJ_REMOVE);
 	kobject_del(&p->kobj);
 
-	br_netpoll_disable(br, dev);
+	br_netpoll_disable(p);
+
 	call_rcu(&p->rcu, destroy_nbp_rcu);
 }
 
@@ -167,8 +168,6 @@ static void del_br(struct net_bridge *br, struct list_head *head)
 		del_nbp(p);
 	}
 
-	br_netpoll_cleanup(br->dev);
-
 	del_timer_sync(&br->gc_timer);
 
 	br_sysfs_delbr(br->dev);
@@ -428,6 +427,9 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
 	if (err)
 		goto err2;
 
+	if (br_netpoll_info(br) && ((err = br_netpoll_enable(p))))
+		goto err3;
+
 	rcu_assign_pointer(dev->br_port, p);
 	dev_disable_lro(dev);
 
@@ -448,9 +450,9 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
 
 	kobject_uevent(&p->kobj, KOBJ_ADD);
 
-	br_netpoll_enable(br, dev);
-
 	return 0;
+err3:
+	sysfs_remove_link(br->ifobj, p->dev->name);
 err2:
 	br_fdb_delete_by_port(br, p, 1);
 err1:
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 0f4a74b..1c55c98 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -143,6 +143,10 @@ struct net_bridge_port
 #ifdef CONFIG_SYSFS
 	char				sysfs_name[IFNAMSIZ];
 #endif
+
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	struct netpoll			*np;
+#endif
 };
 
 struct br_cpu_netstats {
@@ -273,16 +277,27 @@ extern void br_dev_setup(struct net_device *dev);
 extern netdev_tx_t br_dev_xmit(struct sk_buff *skb,
 			       struct net_device *dev);
 #ifdef CONFIG_NET_POLL_CONTROLLER
-extern void br_netpoll_cleanup(struct net_device *dev);
-extern void br_netpoll_enable(struct net_bridge *br,
-			      struct net_device *dev);
-extern void br_netpoll_disable(struct net_bridge *br,
-			       struct net_device *dev);
+static inline struct netpoll_info *br_netpoll_info(struct net_bridge *br)
+{
+	return br->dev->npinfo;
+}
+
+extern int br_netpoll_enable(struct net_bridge_port *p);
+extern void br_netpoll_disable(struct net_bridge_port *p);
 #else
-#define br_netpoll_cleanup(br)
-#define br_netpoll_enable(br, dev)
-#define br_netpoll_disable(br, dev)
+static inline struct netpoll_info *br_netpoll_info(struct net_bridge *br)
+{
+	return NULL;
+}
+
+static inline int br_netpoll_enable(struct net_bridge_port *p)
+{
+	return 0;
+}
 
+static inline void br_netpoll_disable(struct net_bridge_port *p)
+{
+}
 #endif
 
 /* br_fdb.c */

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-10 12:40 [0/8] netpoll/bridge fixes Herbert Xu
                   ` (6 preceding siblings ...)
  2010-06-10 12:42 ` [PATCH 7/7] bridge: Fix netpoll support Herbert Xu
@ 2010-06-10 14:49 ` Stephen Hemminger
  2010-06-10 21:56   ` Herbert Xu
  2010-06-29 12:53 ` Yanko Kaneti
  8 siblings, 1 reply; 63+ messages in thread
From: Stephen Hemminger @ 2010-06-10 14:49 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Matt Mackall

On Thu, 10 Jun 2010 22:40:47 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> Hi:
> 
> Qianfeng Zhang reported that he was seeing crashes with the
> attached backtrace.
> 
> I tracked this down to the recently added netpoll support in
> the bridge device.  It's a classic use-after-free problem.
> 
> Trying to solve it brought out a host of other issues, some of
> which existed prior to the new bridge code.  The following patches
> attempt to address some of these issues.
> 
> Warning, this is completely untested (apart from compiling with
> everything enabled) so please look but don't merge :)

Thanks for looking at this, and yes the original code does
look buggy. Not sure if I like the use of in_irq() to handle the netpoll
packets as special case. Is in_irq() really reliable for this?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-10 14:49 ` [0/8] netpoll/bridge fixes Stephen Hemminger
@ 2010-06-10 21:56   ` Herbert Xu
  2010-06-10 21:59     ` Stephen Hemminger
  0 siblings, 1 reply; 63+ messages in thread
From: Herbert Xu @ 2010-06-10 21:56 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Matt Mackall

On Thu, Jun 10, 2010 at 07:49:12AM -0700, Stephen Hemminger wrote:
> 
> Thanks for looking at this, and yes the original code does
> look buggy. Not sure if I like the use of in_irq() to handle the netpoll
> packets as special case. Is in_irq() really reliable for this?

Yes because netpoll always calls ndo_start_xmit with IRQs disabled,
while on the normal TX path IRQs must be enabled (as otherwise
enabling BH would warn).

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-10 21:56   ` Herbert Xu
@ 2010-06-10 21:59     ` Stephen Hemminger
  2010-06-10 22:48       ` Herbert Xu
  0 siblings, 1 reply; 63+ messages in thread
From: Stephen Hemminger @ 2010-06-10 21:59 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Matt Mackall

On Fri, 11 Jun 2010 07:56:43 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Thu, Jun 10, 2010 at 07:49:12AM -0700, Stephen Hemminger wrote:
> > 
> > Thanks for looking at this, and yes the original code does
> > look buggy. Not sure if I like the use of in_irq() to handle the netpoll
> > packets as special case. Is in_irq() really reliable for this?
> 
> Yes because netpoll always calls ndo_start_xmit with IRQs disabled,
> while on the normal TX path IRQs must be enabled (as otherwise
> enabling BH would warn).

Okay, then add a comment where in_irq is used?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-10 21:59     ` Stephen Hemminger
@ 2010-06-10 22:48       ` Herbert Xu
  2010-06-11  2:11         ` Herbert Xu
  0 siblings, 1 reply; 63+ messages in thread
From: Herbert Xu @ 2010-06-10 22:48 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Matt Mackall

On Thu, Jun 10, 2010 at 02:59:15PM -0700, Stephen Hemminger wrote:
>
> Okay, then add a comment where in_irq is used?

Actually let me put it into a wrapper.  I'll respin the patches.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-10 22:48       ` Herbert Xu
@ 2010-06-11  2:11         ` Herbert Xu
  2010-06-11  2:12           ` [PATCH 1/8] netpoll: Set npinfo to NULL even with ndo_netpoll_cleanup Herbert Xu
                             ` (11 more replies)
  0 siblings, 12 replies; 63+ messages in thread
From: Herbert Xu @ 2010-06-11  2:11 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Matt Mackall

On Fri, Jun 11, 2010 at 08:48:39AM +1000, Herbert Xu wrote:
> On Thu, Jun 10, 2010 at 02:59:15PM -0700, Stephen Hemminger wrote:
> >
> > Okay, then add a comment where in_irq is used?
> 
> Actually let me put it into a wrapper.  I'll respin the patches.

OK here is a repost.  And this time it really is 8 patches :)
I've tested it lightly.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH 1/8] netpoll: Set npinfo to NULL even with ndo_netpoll_cleanup
  2010-06-11  2:11         ` Herbert Xu
@ 2010-06-11  2:12           ` Herbert Xu
  2010-06-11  2:12           ` [PATCH 2/8] bridge: Remove redundant npinfo NULL setting Herbert Xu
                             ` (10 subsequent siblings)
  11 siblings, 0 replies; 63+ messages in thread
From: Herbert Xu @ 2010-06-11  2:12 UTC (permalink / raw)
  To: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen 

netpoll: Set npinfo to NULL even with ndo_netpoll_cleanup

Sinec we have to null npinfo regardless of whether there is a
ndo_netpoll_cleanup, it makes sense to do this unconditionally
in netpoll_cleanup rather than having every driver do it by
themselves.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/core/netpoll.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 94825b1..748c930 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -898,8 +898,7 @@ void netpoll_cleanup(struct netpoll *np)
 				ops = np->dev->netdev_ops;
 				if (ops->ndo_netpoll_cleanup)
 					ops->ndo_netpoll_cleanup(np->dev);
-				else
-					np->dev->npinfo = NULL;
+				np->dev->npinfo = NULL;
 			}
 		}
 

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 2/8] bridge: Remove redundant npinfo NULL setting
  2010-06-11  2:11         ` Herbert Xu
  2010-06-11  2:12           ` [PATCH 1/8] netpoll: Set npinfo to NULL even with ndo_netpoll_cleanup Herbert Xu
@ 2010-06-11  2:12           ` Herbert Xu
  2010-06-11  2:12           ` [PATCH 3/8] netpoll: Fix RCU usage Herbert Xu
                             ` (9 subsequent siblings)
  11 siblings, 0 replies; 63+ messages in thread
From: Herbert Xu @ 2010-06-11  2:12 UTC (permalink / raw)
  To: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen 

bridge: Remove redundant npinfo NULL setting

Now that netpoll always zaps npinfo we no longer needs to do it
in bridge.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_device.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index eedf2c9..dce0611 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -231,7 +231,6 @@ void br_netpoll_cleanup(struct net_device *dev)
 	struct net_bridge_port *p, *n;
 	const struct net_device_ops *ops;
 
-	br->dev->npinfo = NULL;
 	list_for_each_entry_safe(p, n, &br->port_list, list) {
 		if (p->dev) {
 			ops = p->dev->netdev_ops;

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 3/8] netpoll: Fix RCU usage
  2010-06-11  2:11         ` Herbert Xu
  2010-06-11  2:12           ` [PATCH 1/8] netpoll: Set npinfo to NULL even with ndo_netpoll_cleanup Herbert Xu
  2010-06-11  2:12           ` [PATCH 2/8] bridge: Remove redundant npinfo NULL setting Herbert Xu
@ 2010-06-11  2:12           ` Herbert Xu
  2010-06-11 23:10             ` Paul E. McKenney
  2010-06-11  2:12           ` [PATCH 4/8] netpoll: Add locking for netpoll_setup/cleanup Herbert Xu
                             ` (8 subsequent siblings)
  11 siblings, 1 reply; 63+ messages in thread
From: Herbert Xu @ 2010-06-11  2:12 UTC (permalink / raw)
  To: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen 

netpoll: Fix RCU usage

The use of RCU in netpoll is incorrect in a number of places:

1) The initial setting is lacking a write barrier.
2) The synchronize_rcu is in the wrong place.
3) Read barriers are missing.
4) Some places are even missing rcu_read_lock.
5) npinfo is zeroed after freeing.

This patch fixes those issues.  As most users are in BH context,
this also converts the RCU usage to the BH variant.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/netpoll.h |   13 ++++++++-----
 net/core/netpoll.c      |   20 ++++++++++++--------
 2 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index e9e2312..95c9f7e 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -57,12 +57,15 @@ void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb);
 #ifdef CONFIG_NETPOLL
 static inline bool netpoll_rx(struct sk_buff *skb)
 {
-	struct netpoll_info *npinfo = skb->dev->npinfo;
+	struct netpoll_info *npinfo;
 	unsigned long flags;
 	bool ret = false;
 
+	rcu_read_lock_bh();
+	npinfo = rcu_dereference(skb->dev->npinfo);
+
 	if (!npinfo || (list_empty(&npinfo->rx_np) && !npinfo->rx_flags))
-		return false;
+		goto out;
 
 	spin_lock_irqsave(&npinfo->rx_lock, flags);
 	/* check rx_flags again with the lock held */
@@ -70,12 +73,14 @@ static inline bool netpoll_rx(struct sk_buff *skb)
 		ret = true;
 	spin_unlock_irqrestore(&npinfo->rx_lock, flags);
 
+out:
+	rcu_read_unlock_bh();
 	return ret;
 }
 
 static inline int netpoll_rx_on(struct sk_buff *skb)
 {
-	struct netpoll_info *npinfo = skb->dev->npinfo;
+	struct netpoll_info *npinfo = rcu_dereference(skb->dev->npinfo);
 
 	return npinfo && (!list_empty(&npinfo->rx_np) || npinfo->rx_flags);
 }
@@ -91,7 +96,6 @@ static inline void *netpoll_poll_lock(struct napi_struct *napi)
 {
 	struct net_device *dev = napi->dev;
 
-	rcu_read_lock(); /* deal with race on ->npinfo */
 	if (dev && dev->npinfo) {
 		spin_lock(&napi->poll_lock);
 		napi->poll_owner = smp_processor_id();
@@ -108,7 +112,6 @@ static inline void netpoll_poll_unlock(void *have)
 		napi->poll_owner = -1;
 		spin_unlock(&napi->poll_lock);
 	}
-	rcu_read_unlock();
 }
 
 #else
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 748c930..4b7623d 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -292,6 +292,7 @@ void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
 	unsigned long tries;
 	struct net_device *dev = np->dev;
 	const struct net_device_ops *ops = dev->netdev_ops;
+	/* It is up to the caller to keep npinfo alive. */
 	struct netpoll_info *npinfo = np->dev->npinfo;
 
 	if (!npinfo || !netif_running(dev) || !netif_device_present(dev)) {
@@ -841,10 +842,7 @@ int netpoll_setup(struct netpoll *np)
 	refill_skbs();
 
 	/* last thing to do is link it to the net device structure */
-	ndev->npinfo = npinfo;
-
-	/* avoid racing with NAPI reading npinfo */
-	synchronize_rcu();
+	rcu_assign_pointer(ndev->npinfo, npinfo);
 
 	return 0;
 
@@ -888,6 +886,16 @@ void netpoll_cleanup(struct netpoll *np)
 
 			if (atomic_dec_and_test(&npinfo->refcnt)) {
 				const struct net_device_ops *ops;
+
+				ops = np->dev->netdev_ops;
+				if (ops->ndo_netpoll_cleanup)
+					ops->ndo_netpoll_cleanup(np->dev);
+
+				rcu_assign_pointer(np->dev->npinfo, NULL);
+
+				/* avoid racing with NAPI reading npinfo */
+				synchronize_rcu_bh();
+
 				skb_queue_purge(&npinfo->arp_tx);
 				skb_queue_purge(&npinfo->txq);
 				cancel_rearming_delayed_work(&npinfo->tx_work);
@@ -895,10 +903,6 @@ void netpoll_cleanup(struct netpoll *np)
 				/* clean after last, unfinished work */
 				__skb_queue_purge(&npinfo->txq);
 				kfree(npinfo);
-				ops = np->dev->netdev_ops;
-				if (ops->ndo_netpoll_cleanup)
-					ops->ndo_netpoll_cleanup(np->dev);
-				np->dev->npinfo = NULL;
 			}
 		}
 

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 4/8] netpoll: Add locking for netpoll_setup/cleanup
  2010-06-11  2:11         ` Herbert Xu
                             ` (2 preceding siblings ...)
  2010-06-11  2:12           ` [PATCH 3/8] netpoll: Fix RCU usage Herbert Xu
@ 2010-06-11  2:12           ` Herbert Xu
  2010-06-11  2:12           ` [PATCH 5/8] netpoll: Add ndo_netpoll_setup Herbert Xu
                             ` (7 subsequent siblings)
  11 siblings, 0 replies; 63+ messages in thread
From: Herbert Xu @ 2010-06-11  2:12 UTC (permalink / raw)
  To: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen 

netpoll: Add locking for netpoll_setup/cleanup

As it stands, netpoll_setup and netpoll_cleanup have no locking
protection whatsoever.  So chaos ensures if two entities try to
perform them on the same device.

This patch adds RTNL to the equation.  The code has be rearranged
so that bits that do not need RTNL protection are now moved to
the top of netpoll_setup.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/core/netpoll.c |  151 ++++++++++++++++++++++++++---------------------------
 1 file changed, 76 insertions(+), 75 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 4b7623d..9dcd767 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -729,7 +729,6 @@ int netpoll_setup(struct netpoll *np)
 	struct net_device *ndev = NULL;
 	struct in_device *in_dev;
 	struct netpoll_info *npinfo;
-	struct netpoll *npe, *tmp;
 	unsigned long flags;
 	int err;
 
@@ -741,38 +740,6 @@ int netpoll_setup(struct netpoll *np)
 		return -ENODEV;
 	}
 
-	np->dev = ndev;
-	if (!ndev->npinfo) {
-		npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
-		if (!npinfo) {
-			err = -ENOMEM;
-			goto put;
-		}
-
-		npinfo->rx_flags = 0;
-		INIT_LIST_HEAD(&npinfo->rx_np);
-
-		spin_lock_init(&npinfo->rx_lock);
-		skb_queue_head_init(&npinfo->arp_tx);
-		skb_queue_head_init(&npinfo->txq);
-		INIT_DELAYED_WORK(&npinfo->tx_work, queue_process);
-
-		atomic_set(&npinfo->refcnt, 1);
-	} else {
-		npinfo = ndev->npinfo;
-		atomic_inc(&npinfo->refcnt);
-	}
-
-	npinfo->netpoll = np;
-
-	if ((ndev->priv_flags & IFF_DISABLE_NETPOLL) ||
-	    !ndev->netdev_ops->ndo_poll_controller) {
-		printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
-		       np->name, np->dev_name);
-		err = -ENOTSUPP;
-		goto release;
-	}
-
 	if (!netif_running(ndev)) {
 		unsigned long atmost, atleast;
 
@@ -786,7 +753,7 @@ int netpoll_setup(struct netpoll *np)
 		if (err) {
 			printk(KERN_ERR "%s: failed to open %s\n",
 			       np->name, ndev->name);
-			goto release;
+			goto put;
 		}
 
 		atleast = jiffies + HZ/10;
@@ -823,7 +790,7 @@ int netpoll_setup(struct netpoll *np)
 			printk(KERN_ERR "%s: no IP address for %s, aborting\n",
 			       np->name, np->dev_name);
 			err = -EDESTADDRREQ;
-			goto release;
+			goto put;
 		}
 
 		np->local_ip = in_dev->ifa_list->ifa_local;
@@ -831,6 +798,43 @@ int netpoll_setup(struct netpoll *np)
 		printk(KERN_INFO "%s: local IP %pI4\n", np->name, &np->local_ip);
 	}
 
+	np->dev = ndev;
+
+	/* fill up the skb queue */
+	refill_skbs();
+
+	rtnl_lock();
+	if ((ndev->priv_flags & IFF_DISABLE_NETPOLL) ||
+	    !ndev->netdev_ops->ndo_poll_controller) {
+		printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
+		       np->name, np->dev_name);
+		err = -ENOTSUPP;
+		goto unlock;
+	}
+
+	if (!ndev->npinfo) {
+		npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
+		if (!npinfo) {
+			err = -ENOMEM;
+			goto unlock;
+		}
+
+		npinfo->rx_flags = 0;
+		INIT_LIST_HEAD(&npinfo->rx_np);
+
+		spin_lock_init(&npinfo->rx_lock);
+		skb_queue_head_init(&npinfo->arp_tx);
+		skb_queue_head_init(&npinfo->txq);
+		INIT_DELAYED_WORK(&npinfo->tx_work, queue_process);
+
+		atomic_set(&npinfo->refcnt, 1);
+	} else {
+		npinfo = ndev->npinfo;
+		atomic_inc(&npinfo->refcnt);
+	}
+
+	npinfo->netpoll = np;
+
 	if (np->rx_hook) {
 		spin_lock_irqsave(&npinfo->rx_lock, flags);
 		npinfo->rx_flags |= NETPOLL_RX_ENABLED;
@@ -838,24 +842,14 @@ int netpoll_setup(struct netpoll *np)
 		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
 	}
 
-	/* fill up the skb queue */
-	refill_skbs();
-
 	/* last thing to do is link it to the net device structure */
 	rcu_assign_pointer(ndev->npinfo, npinfo);
+	rtnl_unlock();
 
 	return 0;
 
- release:
-	if (!ndev->npinfo) {
-		spin_lock_irqsave(&npinfo->rx_lock, flags);
-		list_for_each_entry_safe(npe, tmp, &npinfo->rx_np, rx) {
-			npe->dev = NULL;
-		}
-		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
-
-		kfree(npinfo);
-	}
+unlock:
+	rtnl_unlock();
 put:
 	dev_put(ndev);
 	return err;
@@ -872,43 +866,50 @@ void netpoll_cleanup(struct netpoll *np)
 {
 	struct netpoll_info *npinfo;
 	unsigned long flags;
+	int free = 0;
 
-	if (np->dev) {
-		npinfo = np->dev->npinfo;
-		if (npinfo) {
-			if (!list_empty(&npinfo->rx_np)) {
-				spin_lock_irqsave(&npinfo->rx_lock, flags);
-				list_del(&np->rx);
-				if (list_empty(&npinfo->rx_np))
-					npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
-				spin_unlock_irqrestore(&npinfo->rx_lock, flags);
-			}
+	if (!np->dev)
+		return;
 
-			if (atomic_dec_and_test(&npinfo->refcnt)) {
-				const struct net_device_ops *ops;
+	rtnl_lock();
+	npinfo = np->dev->npinfo;
+	if (npinfo) {
+		if (!list_empty(&npinfo->rx_np)) {
+			spin_lock_irqsave(&npinfo->rx_lock, flags);
+			list_del(&np->rx);
+			if (list_empty(&npinfo->rx_np))
+				npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
+			spin_unlock_irqrestore(&npinfo->rx_lock, flags);
+		}
 
-				ops = np->dev->netdev_ops;
-				if (ops->ndo_netpoll_cleanup)
-					ops->ndo_netpoll_cleanup(np->dev);
+		free = atomic_dec_and_test(&npinfo->refcnt);
+		if (free) {
+			const struct net_device_ops *ops;
 
-				rcu_assign_pointer(np->dev->npinfo, NULL);
+			ops = np->dev->netdev_ops;
+			if (ops->ndo_netpoll_cleanup)
+				ops->ndo_netpoll_cleanup(np->dev);
 
-				/* avoid racing with NAPI reading npinfo */
-				synchronize_rcu_bh();
+			rcu_assign_pointer(np->dev->npinfo, NULL);
+		}
+	}
+	rtnl_unlock();
 
-				skb_queue_purge(&npinfo->arp_tx);
-				skb_queue_purge(&npinfo->txq);
-				cancel_rearming_delayed_work(&npinfo->tx_work);
+	if (free) {
+		/* avoid racing with NAPI reading npinfo */
+		synchronize_rcu_bh();
 
-				/* clean after last, unfinished work */
-				__skb_queue_purge(&npinfo->txq);
-				kfree(npinfo);
-			}
-		}
+		skb_queue_purge(&npinfo->arp_tx);
+		skb_queue_purge(&npinfo->txq);
+		cancel_rearming_delayed_work(&npinfo->tx_work);
 
-		dev_put(np->dev);
+		/* clean after last, unfinished work */
+		__skb_queue_purge(&npinfo->txq);
+		kfree(npinfo);
 	}
 
+	dev_put(np->dev);
+
 	np->dev = NULL;
 }
 

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 5/8] netpoll: Add ndo_netpoll_setup
  2010-06-11  2:11         ` Herbert Xu
                             ` (3 preceding siblings ...)
  2010-06-11  2:12           ` [PATCH 4/8] netpoll: Add locking for netpoll_setup/cleanup Herbert Xu
@ 2010-06-11  2:12           ` Herbert Xu
  2010-06-11  2:12           ` [PATCH 6/8] netpoll: Allow netpoll_setup/cleanup recursion Herbert Xu
                             ` (6 subsequent siblings)
  11 siblings, 0 replies; 63+ messages in thread
From: Herbert Xu @ 2010-06-11  2:12 UTC (permalink / raw)
  To: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen 

netpoll: Add ndo_netpoll_setup

This patch adds ndo_netpoll_setup as the initialisation primitive
to complement ndo_netpoll_cleanup.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/netdevice.h |    2 ++
 net/core/netpoll.c        |   10 ++++++++++
 2 files changed, 12 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 40291f3..619d3f1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -728,6 +728,8 @@ struct net_device_ops {
 						        unsigned short vid);
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	void                    (*ndo_poll_controller)(struct net_device *dev);
+	int			(*ndo_netpoll_setup)(struct net_device *dev,
+						     struct netpoll_info *info);
 	void			(*ndo_netpoll_cleanup)(struct net_device *dev);
 #endif
 	int			(*ndo_set_vf_mac)(struct net_device *dev,
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 9dcd767..c445896 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -729,6 +729,7 @@ int netpoll_setup(struct netpoll *np)
 	struct net_device *ndev = NULL;
 	struct in_device *in_dev;
 	struct netpoll_info *npinfo;
+	const struct net_device_ops *ops;
 	unsigned long flags;
 	int err;
 
@@ -828,6 +829,13 @@ int netpoll_setup(struct netpoll *np)
 		INIT_DELAYED_WORK(&npinfo->tx_work, queue_process);
 
 		atomic_set(&npinfo->refcnt, 1);
+
+		ops = np->dev->netdev_ops;
+		if (ops->ndo_netpoll_setup) {
+			err = ops->ndo_netpoll_setup(ndev, npinfo);
+			if (err)
+				goto free_npinfo;
+		}
 	} else {
 		npinfo = ndev->npinfo;
 		atomic_inc(&npinfo->refcnt);
@@ -848,6 +856,8 @@ int netpoll_setup(struct netpoll *np)
 
 	return 0;
 
+free_npinfo:
+	kfree(npinfo);
 unlock:
 	rtnl_unlock();
 put:

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 6/8] netpoll: Allow netpoll_setup/cleanup recursion
  2010-06-11  2:11         ` Herbert Xu
                             ` (4 preceding siblings ...)
  2010-06-11  2:12           ` [PATCH 5/8] netpoll: Add ndo_netpoll_setup Herbert Xu
@ 2010-06-11  2:12           ` Herbert Xu
  2010-06-25  1:21             ` Andrew Morton
  2010-06-11  2:12           ` [PATCH 7/8] netpoll: Add netpoll_tx_running Herbert Xu
                             ` (5 subsequent siblings)
  11 siblings, 1 reply; 63+ messages in thread
From: Herbert Xu @ 2010-06-11  2:12 UTC (permalink / raw)
  To: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen 

netpoll: Allow netpoll_setup/cleanup recursion

This patch adds the functions __netpoll_setup/__netpoll_cleanup
which is designed to be called recursively through ndo_netpoll_seutp.

They must be called with RTNL held, and the caller must initialise
np->dev and ensure that it has a valid reference count.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/netpoll.h |    2 
 net/core/netpoll.c      |  176 ++++++++++++++++++++++++++----------------------
 2 files changed, 99 insertions(+), 79 deletions(-)

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index 95c9f7e..f3ad74a 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -46,9 +46,11 @@ void netpoll_poll(struct netpoll *np);
 void netpoll_send_udp(struct netpoll *np, const char *msg, int len);
 void netpoll_print_options(struct netpoll *np);
 int netpoll_parse_options(struct netpoll *np, char *opt);
+int __netpoll_setup(struct netpoll *np);
 int netpoll_setup(struct netpoll *np);
 int netpoll_trap(void);
 void netpoll_set_trap(int trap);
+void __netpoll_cleanup(struct netpoll *np);
 void netpoll_cleanup(struct netpoll *np);
 int __netpoll_rx(struct sk_buff *skb);
 void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb);
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index c445896..f728208 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -724,15 +724,78 @@ int netpoll_parse_options(struct netpoll *np, char *opt)
 	return -1;
 }
 
-int netpoll_setup(struct netpoll *np)
+int __netpoll_setup(struct netpoll *np)
 {
-	struct net_device *ndev = NULL;
-	struct in_device *in_dev;
+	struct net_device *ndev = np->dev;
 	struct netpoll_info *npinfo;
 	const struct net_device_ops *ops;
 	unsigned long flags;
 	int err;
 
+	if ((ndev->priv_flags & IFF_DISABLE_NETPOLL) ||
+	    !ndev->netdev_ops->ndo_poll_controller) {
+		printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
+		       np->name, np->dev_name);
+		err = -ENOTSUPP;
+		goto out;
+	}
+
+	if (!ndev->npinfo) {
+		npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
+		if (!npinfo) {
+			err = -ENOMEM;
+			goto out;
+		}
+
+		npinfo->rx_flags = 0;
+		INIT_LIST_HEAD(&npinfo->rx_np);
+
+		spin_lock_init(&npinfo->rx_lock);
+		skb_queue_head_init(&npinfo->arp_tx);
+		skb_queue_head_init(&npinfo->txq);
+		INIT_DELAYED_WORK(&npinfo->tx_work, queue_process);
+
+		atomic_set(&npinfo->refcnt, 1);
+
+		ops = np->dev->netdev_ops;
+		if (ops->ndo_netpoll_setup) {
+			err = ops->ndo_netpoll_setup(ndev, npinfo);
+			if (err)
+				goto free_npinfo;
+		}
+	} else {
+		npinfo = ndev->npinfo;
+		atomic_inc(&npinfo->refcnt);
+	}
+
+	npinfo->netpoll = np;
+
+	if (np->rx_hook) {
+		spin_lock_irqsave(&npinfo->rx_lock, flags);
+		npinfo->rx_flags |= NETPOLL_RX_ENABLED;
+		list_add_tail(&np->rx, &npinfo->rx_np);
+		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
+	}
+
+	/* last thing to do is link it to the net device structure */
+	rcu_assign_pointer(ndev->npinfo, npinfo);
+	rtnl_unlock();
+
+	return 0;
+
+free_npinfo:
+	kfree(npinfo);
+out:
+	return err;
+}
+EXPORT_SYMBOL_GPL(__netpoll_setup);
+
+int netpoll_setup(struct netpoll *np)
+{
+	struct net_device *ndev = NULL;
+	struct in_device *in_dev;
+	int err;
+
 	if (np->dev_name)
 		ndev = dev_get_by_name(&init_net, np->dev_name);
 	if (!ndev) {
@@ -805,61 +868,14 @@ int netpoll_setup(struct netpoll *np)
 	refill_skbs();
 
 	rtnl_lock();
-	if ((ndev->priv_flags & IFF_DISABLE_NETPOLL) ||
-	    !ndev->netdev_ops->ndo_poll_controller) {
-		printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
-		       np->name, np->dev_name);
-		err = -ENOTSUPP;
-		goto unlock;
-	}
-
-	if (!ndev->npinfo) {
-		npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
-		if (!npinfo) {
-			err = -ENOMEM;
-			goto unlock;
-		}
-
-		npinfo->rx_flags = 0;
-		INIT_LIST_HEAD(&npinfo->rx_np);
-
-		spin_lock_init(&npinfo->rx_lock);
-		skb_queue_head_init(&npinfo->arp_tx);
-		skb_queue_head_init(&npinfo->txq);
-		INIT_DELAYED_WORK(&npinfo->tx_work, queue_process);
-
-		atomic_set(&npinfo->refcnt, 1);
-
-		ops = np->dev->netdev_ops;
-		if (ops->ndo_netpoll_setup) {
-			err = ops->ndo_netpoll_setup(ndev, npinfo);
-			if (err)
-				goto free_npinfo;
-		}
-	} else {
-		npinfo = ndev->npinfo;
-		atomic_inc(&npinfo->refcnt);
-	}
-
-	npinfo->netpoll = np;
-
-	if (np->rx_hook) {
-		spin_lock_irqsave(&npinfo->rx_lock, flags);
-		npinfo->rx_flags |= NETPOLL_RX_ENABLED;
-		list_add_tail(&np->rx, &npinfo->rx_np);
-		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
-	}
-
-	/* last thing to do is link it to the net device structure */
-	rcu_assign_pointer(ndev->npinfo, npinfo);
+	err = __netpoll_setup(np);
 	rtnl_unlock();
 
+	if (err)
+		goto put;
+
 	return 0;
 
-free_npinfo:
-	kfree(npinfo);
-unlock:
-	rtnl_unlock();
 put:
 	dev_put(ndev);
 	return err;
@@ -872,40 +888,32 @@ static int __init netpoll_init(void)
 }
 core_initcall(netpoll_init);
 
-void netpoll_cleanup(struct netpoll *np)
+void __netpoll_cleanup(struct netpoll *np)
 {
 	struct netpoll_info *npinfo;
 	unsigned long flags;
-	int free = 0;
 
-	if (!np->dev)
+	npinfo = np->dev->npinfo;
+	if (!npinfo)
 		return;
 
-	rtnl_lock();
-	npinfo = np->dev->npinfo;
-	if (npinfo) {
-		if (!list_empty(&npinfo->rx_np)) {
-			spin_lock_irqsave(&npinfo->rx_lock, flags);
-			list_del(&np->rx);
-			if (list_empty(&npinfo->rx_np))
-				npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
-			spin_unlock_irqrestore(&npinfo->rx_lock, flags);
-		}
+	if (!list_empty(&npinfo->rx_np)) {
+		spin_lock_irqsave(&npinfo->rx_lock, flags);
+		list_del(&np->rx);
+		if (list_empty(&npinfo->rx_np))
+			npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
+		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
+	}
 
-		free = atomic_dec_and_test(&npinfo->refcnt);
-		if (free) {
-			const struct net_device_ops *ops;
+	if (atomic_dec_and_test(&npinfo->refcnt)) {
+		const struct net_device_ops *ops;
 
-			ops = np->dev->netdev_ops;
-			if (ops->ndo_netpoll_cleanup)
-				ops->ndo_netpoll_cleanup(np->dev);
+		ops = np->dev->netdev_ops;
+		if (ops->ndo_netpoll_cleanup)
+			ops->ndo_netpoll_cleanup(np->dev);
 
-			rcu_assign_pointer(np->dev->npinfo, NULL);
-		}
-	}
-	rtnl_unlock();
+		rcu_assign_pointer(np->dev->npinfo, NULL);
 
-	if (free) {
 		/* avoid racing with NAPI reading npinfo */
 		synchronize_rcu_bh();
 
@@ -917,9 +925,19 @@ void netpoll_cleanup(struct netpoll *np)
 		__skb_queue_purge(&npinfo->txq);
 		kfree(npinfo);
 	}
+}
+EXPORT_SYMBOL_GPL(__netpoll_cleanup);
 
-	dev_put(np->dev);
+void netpoll_cleanup(struct netpoll *np)
+{
+	if (!np->dev)
+		return;
 
+	rtnl_lock();
+	__netpoll_cleanup(np);
+	rtnl_unlock();
+
+	dev_put(np->dev);
 	np->dev = NULL;
 }
 

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 7/8] netpoll: Add netpoll_tx_running
  2010-06-11  2:11         ` Herbert Xu
                             ` (5 preceding siblings ...)
  2010-06-11  2:12           ` [PATCH 6/8] netpoll: Allow netpoll_setup/cleanup recursion Herbert Xu
@ 2010-06-11  2:12           ` Herbert Xu
  2010-06-11  2:12           ` [PATCH 8/8] bridge: Fix netpoll support Herbert Xu
                             ` (4 subsequent siblings)
  11 siblings, 0 replies; 63+ messages in thread
From: Herbert Xu @ 2010-06-11  2:12 UTC (permalink / raw)
  To: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen 

netpoll: Add netpoll_tx_running

This patch adds the helper netpoll_tx_running for use within
ndo_start_xmit.  It returns non-zero if ndo_start_xmit is being
invoked by netpoll, and zero otherwise.

This is currently implemented by simply looking at the hardirq
count.  This is because for all non-netpoll uses of ndo_start_xmit,
IRQs must be enabled while netpoll always disables IRQs before
calling ndo_start_xmit.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/netpoll.h |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index f3ad74a..4c77fe7 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -116,6 +116,11 @@ static inline void netpoll_poll_unlock(void *have)
 	}
 }
 
+static inline int netpoll_tx_running(struct net_device *dev)
+{
+	return irqs_disabled();
+}
+
 #else
 static inline int netpoll_rx(struct sk_buff *skb)
 {
@@ -139,6 +144,10 @@ static inline void netpoll_poll_unlock(void *have)
 static inline void netpoll_netdev_init(struct net_device *dev)
 {
 }
+static inline int netpoll_tx_running(struct net_device *dev)
+{
+	return 0;
+}
 #endif
 
 #endif

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 8/8] bridge: Fix netpoll support
  2010-06-11  2:11         ` Herbert Xu
                             ` (6 preceding siblings ...)
  2010-06-11  2:12           ` [PATCH 7/8] netpoll: Add netpoll_tx_running Herbert Xu
@ 2010-06-11  2:12           ` Herbert Xu
  2010-06-11  3:08             ` fired a bug report on bugzilla.redhat.com Qianfeng Zhang
  2010-06-15 10:28             ` [PATCH 8/8] bridge: Fix netpoll support Cong Wang
  2010-06-11 20:03           ` [0/8] netpoll/bridge fixes Matt Mackall
                             ` (3 subsequent siblings)
  11 siblings, 2 replies; 63+ messages in thread
From: Herbert Xu @ 2010-06-11  2:12 UTC (permalink / raw)
  To: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen 

bridge: Fix netpoll support

There are multiple problems with the newly added netpoll support:

1) Use-after-free on each netpoll packet.
2) Invoking unsafe code on netpoll/IRQ path.
3) Breaks when netpoll is enabled on the underlying device.

This patch fixes all of these problems.  In particular, we now
allocate proper netpoll structures for each underlying device.

We only allow netpoll to be enabled on the bridge when all the
devices underneath it support netpoll.  Once it is enabled, we
do not allow non-netpoll devices to join the bridge (until netpoll
is disabled again).

This allows us to do away with the npinfo juggling that caused
problem number 1.

Incidentally this patch fixes number 2 by bypassing unsafe code
such as multicast snooping and netfilter.

Reported-by: Qianfeng Zhang <frzhang@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_device.c  |  108 +++++++++++++++++++++++++++---------------------
 net/bridge/br_forward.c |   34 +++++----------
 net/bridge/br_if.c      |   12 +++--
 net/bridge/br_private.h |   46 ++++++++++++++++----
 4 files changed, 118 insertions(+), 82 deletions(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index dce0611..cd4f07a 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -47,6 +47,10 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 	skb_pull(skb, ETH_HLEN);
 
 	if (is_multicast_ether_addr(dest)) {
+		if (unlikely(netpoll_tx_running(dev))) {
+			br_flood_deliver(br, skb);
+			goto out;
+		}
 		if (br_multicast_rcv(br, NULL, skb))
 			goto out;
 
@@ -199,72 +203,81 @@ static int br_set_tx_csum(struct net_device *dev, u32 data)
 }
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
-static bool br_devices_support_netpoll(struct net_bridge *br)
+static void br_poll_controller(struct net_device *br_dev)
 {
-	struct net_bridge_port *p;
-	bool ret = true;
-	int count = 0;
-	unsigned long flags;
-
-	spin_lock_irqsave(&br->lock, flags);
-	list_for_each_entry(p, &br->port_list, list) {
-		count++;
-		if ((p->dev->priv_flags & IFF_DISABLE_NETPOLL) ||
-		    !p->dev->netdev_ops->ndo_poll_controller)
-			ret = false;
-	}
-	spin_unlock_irqrestore(&br->lock, flags);
-	return count != 0 && ret;
 }
 
-static void br_poll_controller(struct net_device *br_dev)
+static void br_netpoll_cleanup(struct net_device *dev)
 {
-	struct netpoll *np = br_dev->npinfo->netpoll;
+	struct net_bridge *br = netdev_priv(dev);
+	struct net_bridge_port *p, *n;
 
-	if (np->real_dev != br_dev)
-		netpoll_poll_dev(np->real_dev);
+	list_for_each_entry_safe(p, n, &br->port_list, list) {
+		br_netpoll_disable(p);
+	}
 }
 
-void br_netpoll_cleanup(struct net_device *dev)
+static int br_netpoll_setup(struct net_device *dev, struct netpoll_info *ni)
 {
 	struct net_bridge *br = netdev_priv(dev);
 	struct net_bridge_port *p, *n;
-	const struct net_device_ops *ops;
+	int err = 0;
 
 	list_for_each_entry_safe(p, n, &br->port_list, list) {
-		if (p->dev) {
-			ops = p->dev->netdev_ops;
-			if (ops->ndo_netpoll_cleanup)
-				ops->ndo_netpoll_cleanup(p->dev);
-			else
-				p->dev->npinfo = NULL;
-		}
+		if (!p->dev)
+			continue;
+
+		err = br_netpoll_enable(p);
+		if (err)
+			goto fail;
 	}
+
+out:
+	return err;
+
+fail:
+	br_netpoll_cleanup(dev);
+	goto out;
 }
 
-void br_netpoll_disable(struct net_bridge *br,
-			struct net_device *dev)
+int br_netpoll_enable(struct net_bridge_port *p)
 {
-	if (br_devices_support_netpoll(br))
-		br->dev->priv_flags &= ~IFF_DISABLE_NETPOLL;
-	if (dev->netdev_ops->ndo_netpoll_cleanup)
-		dev->netdev_ops->ndo_netpoll_cleanup(dev);
-	else
-		dev->npinfo = NULL;
+	struct netpoll *np;
+	int err = 0;
+
+	np = kzalloc(sizeof(*p->np), GFP_KERNEL);
+	err = -ENOMEM;
+	if (!np)
+		goto out;
+
+	np->dev = p->dev;
+
+	err = __netpoll_setup(np);
+	if (err) {
+		kfree(np);
+		goto out;
+	}
+
+	p->np = np;
+
+out:
+	return err;
 }
 
-void br_netpoll_enable(struct net_bridge *br,
-		       struct net_device *dev)
+void br_netpoll_disable(struct net_bridge_port *p)
 {
-	if (br_devices_support_netpoll(br)) {
-		br->dev->priv_flags &= ~IFF_DISABLE_NETPOLL;
-		if (br->dev->npinfo)
-			dev->npinfo = br->dev->npinfo;
-	} else if (!(br->dev->priv_flags & IFF_DISABLE_NETPOLL)) {
-		br->dev->priv_flags |= IFF_DISABLE_NETPOLL;
-		br_info(br,"new device %s does not support netpoll (disabling)",
-			dev->name);
-	}
+	struct netpoll *np = p->np;
+
+	if (!np)
+		return;
+
+	p->np = NULL;
+
+	/* Wait for transmitting packets to finish before freeing. */
+	synchronize_rcu_bh();
+
+	__netpoll_cleanup(np);
+	kfree(np);
 }
 
 #endif
@@ -293,6 +306,7 @@ static const struct net_device_ops br_netdev_ops = {
 	.ndo_change_mtu		 = br_change_mtu,
 	.ndo_do_ioctl		 = br_dev_ioctl,
 #ifdef CONFIG_NET_POLL_CONTROLLER
+	.ndo_netpoll_setup	 = br_netpoll_setup,
 	.ndo_netpoll_cleanup	 = br_netpoll_cleanup,
 	.ndo_poll_controller	 = br_poll_controller,
 #endif
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index a98ef13..6e97711 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -50,14 +50,7 @@ int br_dev_queue_push_xmit(struct sk_buff *skb)
 			kfree_skb(skb);
 		else {
 			skb_push(skb, ETH_HLEN);
-
-#ifdef CONFIG_NET_POLL_CONTROLLER
-			if (unlikely(skb->dev->priv_flags & IFF_IN_NETPOLL)) {
-				netpoll_send_skb(skb->dev->npinfo->netpoll, skb);
-				skb->dev->priv_flags &= ~IFF_IN_NETPOLL;
-			} else
-#endif
-				dev_queue_xmit(skb);
+			dev_queue_xmit(skb);
 		}
 	}
 
@@ -73,23 +66,20 @@ int br_forward_finish(struct sk_buff *skb)
 
 static void __br_deliver(const struct net_bridge_port *to, struct sk_buff *skb)
 {
-#ifdef CONFIG_NET_POLL_CONTROLLER
-	struct net_bridge *br = to->br;
-	if (unlikely(br->dev->priv_flags & IFF_IN_NETPOLL)) {
-		struct netpoll *np;
-		to->dev->npinfo = skb->dev->npinfo;
-		np = skb->dev->npinfo->netpoll;
-		np->real_dev = np->dev = to->dev;
-		to->dev->priv_flags |= IFF_IN_NETPOLL;
-	}
-#endif
 	skb->dev = to->dev;
+
+	if (unlikely(netpoll_tx_running(to->dev))) {
+		if (packet_length(skb) > skb->dev->mtu && !skb_is_gso(skb))
+			kfree_skb(skb);
+		else {
+			skb_push(skb, ETH_HLEN);
+			br_netpoll_send_skb(to, skb);
+		}
+		return;
+	}
+
 	NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
 		br_forward_finish);
-#ifdef CONFIG_NET_POLL_CONTROLLER
-	if (skb->dev->npinfo)
-		skb->dev->npinfo->netpoll->dev = br->dev;
-#endif
 }
 
 static void __br_forward(const struct net_bridge_port *to, struct sk_buff *skb)
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 18b245e..13102e0 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -154,7 +154,8 @@ static void del_nbp(struct net_bridge_port *p)
 	kobject_uevent(&p->kobj, KOBJ_REMOVE);
 	kobject_del(&p->kobj);
 
-	br_netpoll_disable(br, dev);
+	br_netpoll_disable(p);
+
 	call_rcu(&p->rcu, destroy_nbp_rcu);
 }
 
@@ -167,8 +168,6 @@ static void del_br(struct net_bridge *br, struct list_head *head)
 		del_nbp(p);
 	}
 
-	br_netpoll_cleanup(br->dev);
-
 	del_timer_sync(&br->gc_timer);
 
 	br_sysfs_delbr(br->dev);
@@ -428,6 +427,9 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
 	if (err)
 		goto err2;
 
+	if (br_netpoll_info(br) && ((err = br_netpoll_enable(p))))
+		goto err3;
+
 	rcu_assign_pointer(dev->br_port, p);
 	dev_disable_lro(dev);
 
@@ -448,9 +450,9 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
 
 	kobject_uevent(&p->kobj, KOBJ_ADD);
 
-	br_netpoll_enable(br, dev);
-
 	return 0;
+err3:
+	sysfs_remove_link(br->ifobj, p->dev->name);
 err2:
 	br_fdb_delete_by_port(br, p, 1);
 err1:
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 0f4a74b..29f6d66 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -15,6 +15,7 @@
 
 #include <linux/netdevice.h>
 #include <linux/if_bridge.h>
+#include <linux/netpoll.h>
 #include <net/route.h>
 
 #define BR_HASH_BITS 8
@@ -143,6 +144,10 @@ struct net_bridge_port
 #ifdef CONFIG_SYSFS
 	char				sysfs_name[IFNAMSIZ];
 #endif
+
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	struct netpoll			*np;
+#endif
 };
 
 struct br_cpu_netstats {
@@ -273,16 +278,41 @@ extern void br_dev_setup(struct net_device *dev);
 extern netdev_tx_t br_dev_xmit(struct sk_buff *skb,
 			       struct net_device *dev);
 #ifdef CONFIG_NET_POLL_CONTROLLER
-extern void br_netpoll_cleanup(struct net_device *dev);
-extern void br_netpoll_enable(struct net_bridge *br,
-			      struct net_device *dev);
-extern void br_netpoll_disable(struct net_bridge *br,
-			       struct net_device *dev);
+static inline struct netpoll_info *br_netpoll_info(struct net_bridge *br)
+{
+	return br->dev->npinfo;
+}
+
+static inline void br_netpoll_send_skb(const struct net_bridge_port *p,
+				       struct sk_buff *skb)
+{
+	struct netpoll *np = p->np;
+
+	if (np)
+		netpoll_send_skb(np, skb);
+}
+
+extern int br_netpoll_enable(struct net_bridge_port *p);
+extern void br_netpoll_disable(struct net_bridge_port *p);
 #else
-#define br_netpoll_cleanup(br)
-#define br_netpoll_enable(br, dev)
-#define br_netpoll_disable(br, dev)
+static inline struct netpoll_info *br_netpoll_info(struct net_bridge *br)
+{
+	return NULL;
+}
+
+static inline void br_netpoll_send_skb(struct net_bridge_port *p,
+				       struct sk_buff *skb)
+{
+}
 
+static inline int br_netpoll_enable(struct net_bridge_port *p)
+{
+	return 0;
+}
+
+static inline void br_netpoll_disable(struct net_bridge_port *p)
+{
+}
 #endif
 
 /* br_fdb.c */

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* fired  a  bug report  on  bugzilla.redhat.com
  2010-06-11  2:12           ` [PATCH 8/8] bridge: Fix netpoll support Herbert Xu
@ 2010-06-11  3:08             ` Qianfeng Zhang
  2010-06-15 10:28             ` [PATCH 8/8] bridge: Fix netpoll support Cong Wang
  1 sibling, 0 replies; 63+ messages in thread
From: Qianfeng Zhang @ 2010-06-11  3:08 UTC (permalink / raw)
  To: Herbert Xu, Michael S. Tsirkin, Stephen Hemminger
  Cc: David S. Miller, netdev, WANG Cong, Matt Mackall,
	Paul E. McKenney

Hi
Thanks everybody. I justly logged a bug report, here it is

https://bugzilla.redhat.com/show_bug.cgi?id=602927


Qianfeng  Zhang (张前锋) 
Senior  Solution  Architect
Linux Platform & Infrastructure
RedHat  Software,  Beijing
Phone:    010-65339374
Mobile:   13911823595
Email:   frzhang@redhat.com





^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-11  2:11         ` Herbert Xu
                             ` (7 preceding siblings ...)
  2010-06-11  2:12           ` [PATCH 8/8] bridge: Fix netpoll support Herbert Xu
@ 2010-06-11 20:03           ` Matt Mackall
  2010-06-15 10:17           ` Cong Wang
                             ` (2 subsequent siblings)
  11 siblings, 0 replies; 63+ messages in thread
From: Matt Mackall @ 2010-06-11 20:03 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Stephen Hemminger, Michael S. Tsirkin, Qianfeng Zhang,
	David S. Miller, netdev, WANG Cong

On Fri, 2010-06-11 at 12:11 +1000, Herbert Xu wrote:
> On Fri, Jun 11, 2010 at 08:48:39AM +1000, Herbert Xu wrote:
> > On Thu, Jun 10, 2010 at 02:59:15PM -0700, Stephen Hemminger wrote:
> > >
> > > Okay, then add a comment where in_irq is used?
> > 
> > Actually let me put it into a wrapper.  I'll respin the patches.
> 
> OK here is a repost.  And this time it really is 8 patches :)
> I've tested it lightly.

These all look very nice to me, thanks for digging into this. 

Acked-by: Matt Mackall <mpm@selenic.com>

Hopefully we'll also get an ack from Wang Cong on them and another test
result.

-- 
Mathematics is the supreme nostalgia of our time.



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 3/8] netpoll: Fix RCU usage
  2010-06-11  2:12           ` [PATCH 3/8] netpoll: Fix RCU usage Herbert Xu
@ 2010-06-11 23:10             ` Paul E. McKenney
  0 siblings, 0 replies; 63+ messages in thread
From: Paul E. McKenney @ 2010-06-11 23:10 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen Hemminger, Matt Mackall

On Fri, Jun 11, 2010 at 12:12:44PM +1000, Herbert Xu wrote:
> netpoll: Fix RCU usage
> 
> The use of RCU in netpoll is incorrect in a number of places:
> 
> 1) The initial setting is lacking a write barrier.
> 2) The synchronize_rcu is in the wrong place.
> 3) Read barriers are missing.
> 4) Some places are even missing rcu_read_lock.
> 5) npinfo is zeroed after freeing.
> 
> This patch fixes those issues.  As most users are in BH context,
> this also converts the RCU usage to the BH variant.

Looks good for me from an RCU viewpoint, but as usual I confess much
ignorance of the surrounding code.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> ---
> 
>  include/linux/netpoll.h |   13 ++++++++-----
>  net/core/netpoll.c      |   20 ++++++++++++--------
>  2 files changed, 20 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
> index e9e2312..95c9f7e 100644
> --- a/include/linux/netpoll.h
> +++ b/include/linux/netpoll.h
> @@ -57,12 +57,15 @@ void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb);
>  #ifdef CONFIG_NETPOLL
>  static inline bool netpoll_rx(struct sk_buff *skb)
>  {
> -	struct netpoll_info *npinfo = skb->dev->npinfo;
> +	struct netpoll_info *npinfo;
>  	unsigned long flags;
>  	bool ret = false;
> 
> +	rcu_read_lock_bh();
> +	npinfo = rcu_dereference(skb->dev->npinfo);
> +
>  	if (!npinfo || (list_empty(&npinfo->rx_np) && !npinfo->rx_flags))
> -		return false;
> +		goto out;
> 
>  	spin_lock_irqsave(&npinfo->rx_lock, flags);
>  	/* check rx_flags again with the lock held */
> @@ -70,12 +73,14 @@ static inline bool netpoll_rx(struct sk_buff *skb)
>  		ret = true;
>  	spin_unlock_irqrestore(&npinfo->rx_lock, flags);
> 
> +out:
> +	rcu_read_unlock_bh();
>  	return ret;
>  }
> 
>  static inline int netpoll_rx_on(struct sk_buff *skb)
>  {
> -	struct netpoll_info *npinfo = skb->dev->npinfo;
> +	struct netpoll_info *npinfo = rcu_dereference(skb->dev->npinfo);
> 
>  	return npinfo && (!list_empty(&npinfo->rx_np) || npinfo->rx_flags);
>  }
> @@ -91,7 +96,6 @@ static inline void *netpoll_poll_lock(struct napi_struct *napi)
>  {
>  	struct net_device *dev = napi->dev;
> 
> -	rcu_read_lock(); /* deal with race on ->npinfo */
>  	if (dev && dev->npinfo) {
>  		spin_lock(&napi->poll_lock);
>  		napi->poll_owner = smp_processor_id();
> @@ -108,7 +112,6 @@ static inline void netpoll_poll_unlock(void *have)
>  		napi->poll_owner = -1;
>  		spin_unlock(&napi->poll_lock);
>  	}
> -	rcu_read_unlock();
>  }
> 
>  #else
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index 748c930..4b7623d 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -292,6 +292,7 @@ void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
>  	unsigned long tries;
>  	struct net_device *dev = np->dev;
>  	const struct net_device_ops *ops = dev->netdev_ops;
> +	/* It is up to the caller to keep npinfo alive. */
>  	struct netpoll_info *npinfo = np->dev->npinfo;
> 
>  	if (!npinfo || !netif_running(dev) || !netif_device_present(dev)) {
> @@ -841,10 +842,7 @@ int netpoll_setup(struct netpoll *np)
>  	refill_skbs();
> 
>  	/* last thing to do is link it to the net device structure */
> -	ndev->npinfo = npinfo;
> -
> -	/* avoid racing with NAPI reading npinfo */
> -	synchronize_rcu();
> +	rcu_assign_pointer(ndev->npinfo, npinfo);
> 
>  	return 0;
> 
> @@ -888,6 +886,16 @@ void netpoll_cleanup(struct netpoll *np)
> 
>  			if (atomic_dec_and_test(&npinfo->refcnt)) {
>  				const struct net_device_ops *ops;
> +
> +				ops = np->dev->netdev_ops;
> +				if (ops->ndo_netpoll_cleanup)
> +					ops->ndo_netpoll_cleanup(np->dev);
> +
> +				rcu_assign_pointer(np->dev->npinfo, NULL);
> +
> +				/* avoid racing with NAPI reading npinfo */
> +				synchronize_rcu_bh();
> +
>  				skb_queue_purge(&npinfo->arp_tx);
>  				skb_queue_purge(&npinfo->txq);
>  				cancel_rearming_delayed_work(&npinfo->tx_work);
> @@ -895,10 +903,6 @@ void netpoll_cleanup(struct netpoll *np)
>  				/* clean after last, unfinished work */
>  				__skb_queue_purge(&npinfo->txq);
>  				kfree(npinfo);
> -				ops = np->dev->netdev_ops;
> -				if (ops->ndo_netpoll_cleanup)
> -					ops->ndo_netpoll_cleanup(np->dev);
> -				np->dev->npinfo = NULL;
>  			}
>  		}
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-11  2:11         ` Herbert Xu
                             ` (8 preceding siblings ...)
  2010-06-11 20:03           ` [0/8] netpoll/bridge fixes Matt Mackall
@ 2010-06-15 10:17           ` Cong Wang
  2010-06-15 18:39           ` David Miller
  2010-07-19 10:19           ` Michael S. Tsirkin
  11 siblings, 0 replies; 63+ messages in thread
From: Cong Wang @ 2010-06-15 10:17 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Stephen Hemminger, Michael S. Tsirkin, Qianfeng Zhang,
	David S. Miller, netdev, Matt Mackall

On 06/11/10 10:11, Herbert Xu wrote:
> On Fri, Jun 11, 2010 at 08:48:39AM +1000, Herbert Xu wrote:
>> On Thu, Jun 10, 2010 at 02:59:15PM -0700, Stephen Hemminger wrote:
>>>
>>> Okay, then add a comment where in_irq is used?
>>
>> Actually let me put it into a wrapper.  I'll respin the patches.
>
> OK here is a repost.  And this time it really is 8 patches :)
> I've tested it lightly.
>

(Sorry for the delay, I was on vacation.)

Thanks much for your nice work!

Patch 1-7 look great for me, so,

Reviewed-by: WANG Cong <amwang@redhat.com>

I don't quite understand patch 8, will reply to it separately.

Thank you.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 8/8] bridge: Fix netpoll support
  2010-06-11  2:12           ` [PATCH 8/8] bridge: Fix netpoll support Herbert Xu
  2010-06-11  3:08             ` fired a bug report on bugzilla.redhat.com Qianfeng Zhang
@ 2010-06-15 10:28             ` Cong Wang
  2010-06-17 10:38               ` Herbert Xu
  1 sibling, 1 reply; 63+ messages in thread
From: Cong Wang @ 2010-06-15 10:28 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	Stephen Hemminger, Matt Mackall, Paul E. McKenney

On 06/11/10 10:12, Herbert Xu wrote:
> bridge: Fix netpoll support
>
> There are multiple problems with the newly added netpoll support:
>
> 1) Use-after-free on each netpoll packet.
> 2) Invoking unsafe code on netpoll/IRQ path.
> 3) Breaks when netpoll is enabled on the underlying device.
>
> This patch fixes all of these problems.  In particular, we now
> allocate proper netpoll structures for each underlying device.
>
> We only allow netpoll to be enabled on the bridge when all the
> devices underneath it support netpoll.  Once it is enabled, we
> do not allow non-netpoll devices to join the bridge (until netpoll
> is disabled again).
>


This is a good idea!

> This allows us to do away with the npinfo juggling that caused
> problem number 1.
>
> Incidentally this patch fixes number 2 by bypassing unsafe code
> such as multicast snooping and netfilter.


Not sure if I understand problem 2) and 3), this patch is not easy
to review. So, what's the point of adding ->np to struct net_bridge_port?
since we already have p->dev->npinfo->netpoll?

Thanks!


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-11  2:11         ` Herbert Xu
                             ` (9 preceding siblings ...)
  2010-06-15 10:17           ` Cong Wang
@ 2010-06-15 18:39           ` David Miller
  2010-06-16  2:58             ` Eric Dumazet
  2010-07-19 10:19           ` Michael S. Tsirkin
  11 siblings, 1 reply; 63+ messages in thread
From: David Miller @ 2010-06-15 18:39 UTC (permalink / raw)
  To: herbert; +Cc: shemminger, mst, frzhang, netdev, amwang, mpm

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 11 Jun 2010 12:11:42 +1000

> On Fri, Jun 11, 2010 at 08:48:39AM +1000, Herbert Xu wrote:
>> On Thu, Jun 10, 2010 at 02:59:15PM -0700, Stephen Hemminger wrote:
>> >
>> > Okay, then add a comment where in_irq is used?
>> 
>> Actually let me put it into a wrapper.  I'll respin the patches.
> 
> OK here is a repost.  And this time it really is 8 patches :)
> I've tested it lightly.

All applied to net-next-2.6, thanks Herbert.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-15 18:39           ` David Miller
@ 2010-06-16  2:58             ` Eric Dumazet
  2010-06-16  3:03               ` Eric Dumazet
  2010-06-16  5:08               ` Paul E. McKenney
  0 siblings, 2 replies; 63+ messages in thread
From: Eric Dumazet @ 2010-06-16  2:58 UTC (permalink / raw)
  To: David Miller, Paul E. McKenney
  Cc: herbert, shemminger, mst, frzhang, netdev, amwang, mpm

Le mardi 15 juin 2010 à 11:39 -0700, David Miller a écrit :
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Fri, 11 Jun 2010 12:11:42 +1000
> 
> > On Fri, Jun 11, 2010 at 08:48:39AM +1000, Herbert Xu wrote:
> >> On Thu, Jun 10, 2010 at 02:59:15PM -0700, Stephen Hemminger wrote:
> >> >
> >> > Okay, then add a comment where in_irq is used?
> >> 
> >> Actually let me put it into a wrapper.  I'll respin the patches.
> > 
> > OK here is a repost.  And this time it really is 8 patches :)
> > I've tested it lightly.
> 
> All applied to net-next-2.6, thanks Herbert.

Well...

[   52.914014] ===================================================
[   52.914018] [ INFO: suspicious rcu_dereference_check() usage. ]
[   52.914020] ---------------------------------------------------
[   52.914024] include/linux/netpoll.h:67 invoked rcu_dereference_check() without protection!
[   52.914027] 
[   52.914027] other info that might help us debug this:
[   52.914029] 
[   52.914031] 
[   52.914032] rcu_scheduler_active = 1, debug_locks = 1
[   52.914035] 4 locks held by swapper/0:
[   52.914037]  #0:  (&n->timer){+.-...}, at: [<c103fd95>] run_timer_softirq+0x1b8/0x419
[   52.914052]  #1:  (slock-AF_INET){+.....}, at: [<c12f2b3d>] icmp_send+0x149/0x58b
[   52.914063]  #2:  (rcu_read_lock_bh){.+....}, at: [<c129978d>] dev_queue_xmit+0xf7/0x5df
[   52.914073]  #3:  (rcu_read_lock_bh){.+....}, at: [<c12977ae>] netif_rx+0x0/0x195
[   52.914081] 
[   52.914081] stack backtrace:
[   52.914086] Pid: 0, comm: swapper Not tainted 2.6.35-rc1-00508-gdbe3a24-dirty #78
[   52.914089] Call Trace:
[   52.914095]  [<c132cf0c>] ? printk+0xf/0x13
[   52.914103]  [<c1059ac6>] lockdep_rcu_dereference+0x74/0x7d
[   52.914107]  [<c1297819>] netif_rx+0x6b/0x195
[   52.914111]  [<c129978d>] ? dev_queue_xmit+0xf7/0x5df
[   52.914117]  [<c1240775>] loopback_xmit+0x4a/0x70
[   52.914122]  [<c12995cf>] dev_hard_start_xmit+0x25b/0x322
[   52.914126]  [<c1299b5b>] dev_queue_xmit+0x4c5/0x5df
[   52.914131]  [<c105ccf7>] ? trace_hardirqs_on+0xb/0xd
[   52.914135]  [<c129f611>] neigh_resolve_output+0x2e8/0x33f
[   52.914142]  [<c12a8b2a>] ? eth_header+0x0/0x8e
[   52.914147]  [<c12d3dbb>] ip_finish_output+0x323/0x3b1
[   52.914152]  [<c103955f>] ? local_bh_enable_ip+0x97/0xad
[   52.914156]  [<c12d485d>] ip_output+0xe2/0xfe
[   52.914160]  [<c12d3ff5>] ip_local_out+0x41/0x55
[   52.914164]  [<c12d5755>] ip_push_pending_frames+0x284/0x2fa
[   52.914169]  [<c12f218d>] icmp_push_reply+0xe8/0xf3
[   52.914174]  [<c12f2f36>] icmp_send+0x542/0x58b
[   52.914181]  [<c102b6af>] ? find_busiest_group+0x1c9/0x631
[   52.914188]  [<c12cb280>] ipv4_link_failure+0x17/0x7b
[   52.914193]  [<c12f0841>] arp_error_report+0x46/0x61
[   52.914197]  [<c129f8e0>] neigh_invalidate+0x68/0x80
[   52.914201]  [<c12a0bef>] neigh_timer_handler+0x124/0x1d2
[   52.914206]  [<c103fe7b>] run_timer_softirq+0x29e/0x419
[   52.914210]  [<c12a0acb>] ? neigh_timer_handler+0x0/0x1d2
[   52.914215]  [<c1039a21>] __do_softirq+0x126/0x277
[   52.914219]  [<c10398fb>] ? __do_softirq+0x0/0x277
[   52.914222]  <IRQ>  [<c1039c0d>] ? irq_exit+0x38/0x74
[   52.914230]  [<c1003d1f>] ? do_IRQ+0x87/0x9b
[   52.914235]  [<c1002d2e>] ? common_interrupt+0x2e/0x34
[   52.914241]  [<c105007b>] ? sched_clock_local+0x3f/0x11f
[   52.914249]  [<c11ba45b>] ? acpi_idle_enter_bm+0x271/0x2a0
[   52.914256]  [<c12797bd>] ? cpuidle_idle_call+0x76/0x151
[   52.914261]  [<c1001565>] ? cpu_idle+0x49/0x76
[   52.914266]  [<c1319ece>] ? rest_init+0xd6/0xdb
[   52.914274]  [<c156579f>] ? start_kernel+0x31b/0x320
[   52.914278]  [<c15650c9>] ? i386_start_kernel+0xc9/0xd0


Paul, could you please explain if current lockdep rules are correct, or could be relaxed ?

I thought :

rcu_read_lock_bh();

was a shorthand to

local_disable_bh();
rcu_read_lock();

Why lockdep is not able to make a correct diagnostic ?

Thanks

[PATCH net-next-2.6] netpoll: Fix one rcu_dereference() lockdep splat

lockdep doesnt allow yet following  construct :

rcu_read_lock_bh();
npinfo = rcu_dereference(skb->dev->npinfo);

Fix lockdep splat using rcu_dereference_bh()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/netpoll.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index 4c77fe7..472365e 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -64,7 +64,7 @@ static inline bool netpoll_rx(struct sk_buff *skb)
 	bool ret = false;
 
 	rcu_read_lock_bh();
-	npinfo = rcu_dereference(skb->dev->npinfo);
+	npinfo = rcu_dereference_bh(skb->dev->npinfo);
 
 	if (!npinfo || (list_empty(&npinfo->rx_np) && !npinfo->rx_flags))
 		goto out;



^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-16  2:58             ` Eric Dumazet
@ 2010-06-16  3:03               ` Eric Dumazet
  2010-06-16  3:33                 ` Herbert Xu
  2010-06-16  5:08               ` Paul E. McKenney
  1 sibling, 1 reply; 63+ messages in thread
From: Eric Dumazet @ 2010-06-16  3:03 UTC (permalink / raw)
  To: David Miller
  Cc: Paul E. McKenney, herbert, shemminger, mst, frzhang, netdev,
	amwang, mpm

Le mercredi 16 juin 2010 à 04:59 +0200, Eric Dumazet a écrit :
> Le mardi 15 juin 2010 à 11:39 -0700, David Miller a écrit :
> > From: Herbert Xu <herbert@gondor.apana.org.au>
> > Date: Fri, 11 Jun 2010 12:11:42 +1000
> > 
> > > On Fri, Jun 11, 2010 at 08:48:39AM +1000, Herbert Xu wrote:
> > >> On Thu, Jun 10, 2010 at 02:59:15PM -0700, Stephen Hemminger wrote:
> > >> >
> > >> > Okay, then add a comment where in_irq is used?
> > >> 
> > >> Actually let me put it into a wrapper.  I'll respin the patches.
> > > 
> > > OK here is a repost.  And this time it really is 8 patches :)
> > > I've tested it lightly.
> > 
> > All applied to net-next-2.6, thanks Herbert.
> 

For this second splat, I dont know yet how to fix it, its 5 in the
morning here, I need a sleep ;)

At this point, no rcu_lock is held.

I wonder how these patches were tested, Herbert ?

[   74.431712] 
[   74.431713] ===================================================
[   74.431717] [ INFO: suspicious rcu_dereference_check() usage. ]
[   74.431719] ---------------------------------------------------
[   74.431722] include/linux/netpoll.h:85 invoked rcu_dereference_check() without protection!
[   74.431725] 
[   74.431726] other info that might help us debug this:
[   74.431727] 
[   74.431730] 
[   74.431730] rcu_scheduler_active = 1, debug_locks = 1
[   74.431733] no locks held by swapper/0.
[   74.431735] 
[   74.431736] stack backtrace:
[   74.431739] Pid: 0, comm: swapper Not tainted 2.6.35-rc1-00508-gdbe3a24-dirty #78
[   74.431742] Call Trace:
[   74.431748]  [<c132cf0c>] ? printk+0xf/0x13
[   74.431754]  [<c1059ac6>] lockdep_rcu_dereference+0x74/0x7d
[   74.431759]  [<c1297628>] __napi_gro_receive+0x4d/0xf6
[   74.431764]  [<c12977a3>] napi_gro_receive+0x19/0x24
[   74.431775]  [<f805d83f>] bnx2x_rx_int+0x101b/0x124e [bnx2x]
[   74.431781]  [<c1050ffc>] ? async_thread+0x198/0x1de
[   74.431787]  [<c129580f>] ? net_tx_action+0x9a/0x12a
[   74.431797]  [<f805f267>] bnx2x_poll+0x5d/0x18b [bnx2x]
[   74.431801]  [<c1297360>] ? net_rx_action+0x1e4/0x21a
[   74.431805]  [<c105ccb2>] ? trace_hardirqs_on_caller+0xe2/0x11c
[   74.431810]  [<c1297218>] net_rx_action+0x9c/0x21a
[   74.431814]  [<c1039a21>] __do_softirq+0x126/0x277
[   74.431819]  [<c10398fb>] ? __do_softirq+0x0/0x277
[   74.431821]  <IRQ>  [<c1039c0d>] ? irq_exit+0x38/0x74
[   74.431828]  [<c1003d1f>] ? do_IRQ+0x87/0x9b
[   74.431833]  [<c1002d2e>] ? common_interrupt+0x2e/0x34
[   74.431838]  [<c105007b>] ? sched_clock_local+0x3f/0x11f
[   74.431843]  [<c11ba45b>] ? acpi_idle_enter_bm+0x271/0x2a0
[   74.431848]  [<c12797bd>] ? cpuidle_idle_call+0x76/0x151
[   74.431852]  [<c1001565>] ? cpu_idle+0x49/0x76
[   74.431857]  [<c1319ece>] ? rest_init+0xd6/0xdb
[   74.431861]  [<c156579f>] ? start_kernel+0x31b/0x320
[   74.431865]  [<c15650c9>] ? i386_start_kernel+0xc9/0xd0




^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-16  3:03               ` Eric Dumazet
@ 2010-06-16  3:33                 ` Herbert Xu
  2010-06-16  4:47                   ` David Miller
  2010-06-16  6:16                   ` Eric Dumazet
  0 siblings, 2 replies; 63+ messages in thread
From: Herbert Xu @ 2010-06-16  3:33 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Paul E. McKenney, shemminger, mst, frzhang, netdev,
	amwang, mpm

On Wed, Jun 16, 2010 at 05:03:20AM +0200, Eric Dumazet wrote:
>
> I wonder how these patches were tested, Herbert ?

You know, not everyone enables RCU debugging...

Anyway, this patch should fix the problems you've spotted.

netpoll: Use correct primitives for RCU dereferencing

Now that RCU debugging checks for matching rcu_dereference calls
and rcu_read_lock, we need to use the correct primitives or face
nasty warnings.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index 4c77fe7..413742c 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -64,7 +64,7 @@ static inline bool netpoll_rx(struct sk_buff *skb)
 	bool ret = false;
 
 	rcu_read_lock_bh();
-	npinfo = rcu_dereference(skb->dev->npinfo);
+	npinfo = rcu_dereference_bh(skb->dev->npinfo);
 
 	if (!npinfo || (list_empty(&npinfo->rx_np) && !npinfo->rx_flags))
 		goto out;
@@ -82,7 +82,7 @@ out:
 
 static inline int netpoll_rx_on(struct sk_buff *skb)
 {
-	struct netpoll_info *npinfo = rcu_dereference(skb->dev->npinfo);
+	struct netpoll_info *npinfo = rcu_dereference_bh(skb->dev->npinfo);
 
 	return npinfo && (!list_empty(&npinfo->rx_np) || npinfo->rx_flags);
 }

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-16  3:33                 ` Herbert Xu
@ 2010-06-16  4:47                   ` David Miller
  2010-06-16 23:02                     ` Paul E. McKenney
  2010-06-16  6:16                   ` Eric Dumazet
  1 sibling, 1 reply; 63+ messages in thread
From: David Miller @ 2010-06-16  4:47 UTC (permalink / raw)
  To: herbert
  Cc: eric.dumazet, paulmck, shemminger, mst, frzhang, netdev, amwang,
	mpm

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 16 Jun 2010 13:33:36 +1000

> On Wed, Jun 16, 2010 at 05:03:20AM +0200, Eric Dumazet wrote:
>>
>> I wonder how these patches were tested, Herbert ?
> 
> You know, not everyone enables RCU debugging...

Even though I'm as guilty as you, I have to agree with Eric that
especially us core folks should be running with the various lock
debugging options on all the time.

Maybe someone should add the RCU debugging config option to
Documentation/SubmitChecklist :-)

> Anyway, this patch should fix the problems you've spotted.

> netpoll: Use correct primitives for RCU dereferencing
> 
> Now that RCU debugging checks for matching rcu_dereference calls
> and rcu_read_lock, we need to use the correct primitives or face
> nasty warnings.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied, thanks guys!


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-16  2:58             ` Eric Dumazet
  2010-06-16  3:03               ` Eric Dumazet
@ 2010-06-16  5:08               ` Paul E. McKenney
  2010-06-16  6:21                 ` Eric Dumazet
  1 sibling, 1 reply; 63+ messages in thread
From: Paul E. McKenney @ 2010-06-16  5:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, herbert, shemminger, mst, frzhang, netdev, amwang,
	mpm

On Wed, Jun 16, 2010 at 04:58:59AM +0200, Eric Dumazet wrote:
> Le mardi 15 juin 2010 à 11:39 -0700, David Miller a écrit :
> > From: Herbert Xu <herbert@gondor.apana.org.au>
> > Date: Fri, 11 Jun 2010 12:11:42 +1000
> > 
> > > On Fri, Jun 11, 2010 at 08:48:39AM +1000, Herbert Xu wrote:
> > >> On Thu, Jun 10, 2010 at 02:59:15PM -0700, Stephen Hemminger wrote:
> > >> >
> > >> > Okay, then add a comment where in_irq is used?
> > >> 
> > >> Actually let me put it into a wrapper.  I'll respin the patches.
> > > 
> > > OK here is a repost.  And this time it really is 8 patches :)
> > > I've tested it lightly.
> > 
> > All applied to net-next-2.6, thanks Herbert.
> 
> Well...
> 
> [   52.914014] ===================================================
> [   52.914018] [ INFO: suspicious rcu_dereference_check() usage. ]
> [   52.914020] ---------------------------------------------------
> [   52.914024] include/linux/netpoll.h:67 invoked rcu_dereference_check() without protection!
> [   52.914027] 
> [   52.914027] other info that might help us debug this:
> [   52.914029] 
> [   52.914031] 
> [   52.914032] rcu_scheduler_active = 1, debug_locks = 1
> [   52.914035] 4 locks held by swapper/0:
> [   52.914037]  #0:  (&n->timer){+.-...}, at: [<c103fd95>] run_timer_softirq+0x1b8/0x419
> [   52.914052]  #1:  (slock-AF_INET){+.....}, at: [<c12f2b3d>] icmp_send+0x149/0x58b
> [   52.914063]  #2:  (rcu_read_lock_bh){.+....}, at: [<c129978d>] dev_queue_xmit+0xf7/0x5df
> [   52.914073]  #3:  (rcu_read_lock_bh){.+....}, at: [<c12977ae>] netif_rx+0x0/0x195
> [   52.914081] 
> [   52.914081] stack backtrace:
> [   52.914086] Pid: 0, comm: swapper Not tainted 2.6.35-rc1-00508-gdbe3a24-dirty #78
> [   52.914089] Call Trace:
> [   52.914095]  [<c132cf0c>] ? printk+0xf/0x13
> [   52.914103]  [<c1059ac6>] lockdep_rcu_dereference+0x74/0x7d
> [   52.914107]  [<c1297819>] netif_rx+0x6b/0x195
> [   52.914111]  [<c129978d>] ? dev_queue_xmit+0xf7/0x5df
> [   52.914117]  [<c1240775>] loopback_xmit+0x4a/0x70
> [   52.914122]  [<c12995cf>] dev_hard_start_xmit+0x25b/0x322
> [   52.914126]  [<c1299b5b>] dev_queue_xmit+0x4c5/0x5df
> [   52.914131]  [<c105ccf7>] ? trace_hardirqs_on+0xb/0xd
> [   52.914135]  [<c129f611>] neigh_resolve_output+0x2e8/0x33f
> [   52.914142]  [<c12a8b2a>] ? eth_header+0x0/0x8e
> [   52.914147]  [<c12d3dbb>] ip_finish_output+0x323/0x3b1
> [   52.914152]  [<c103955f>] ? local_bh_enable_ip+0x97/0xad
> [   52.914156]  [<c12d485d>] ip_output+0xe2/0xfe
> [   52.914160]  [<c12d3ff5>] ip_local_out+0x41/0x55
> [   52.914164]  [<c12d5755>] ip_push_pending_frames+0x284/0x2fa
> [   52.914169]  [<c12f218d>] icmp_push_reply+0xe8/0xf3
> [   52.914174]  [<c12f2f36>] icmp_send+0x542/0x58b
> [   52.914181]  [<c102b6af>] ? find_busiest_group+0x1c9/0x631
> [   52.914188]  [<c12cb280>] ipv4_link_failure+0x17/0x7b
> [   52.914193]  [<c12f0841>] arp_error_report+0x46/0x61
> [   52.914197]  [<c129f8e0>] neigh_invalidate+0x68/0x80
> [   52.914201]  [<c12a0bef>] neigh_timer_handler+0x124/0x1d2
> [   52.914206]  [<c103fe7b>] run_timer_softirq+0x29e/0x419
> [   52.914210]  [<c12a0acb>] ? neigh_timer_handler+0x0/0x1d2
> [   52.914215]  [<c1039a21>] __do_softirq+0x126/0x277
> [   52.914219]  [<c10398fb>] ? __do_softirq+0x0/0x277
> [   52.914222]  <IRQ>  [<c1039c0d>] ? irq_exit+0x38/0x74
> [   52.914230]  [<c1003d1f>] ? do_IRQ+0x87/0x9b
> [   52.914235]  [<c1002d2e>] ? common_interrupt+0x2e/0x34
> [   52.914241]  [<c105007b>] ? sched_clock_local+0x3f/0x11f
> [   52.914249]  [<c11ba45b>] ? acpi_idle_enter_bm+0x271/0x2a0
> [   52.914256]  [<c12797bd>] ? cpuidle_idle_call+0x76/0x151
> [   52.914261]  [<c1001565>] ? cpu_idle+0x49/0x76
> [   52.914266]  [<c1319ece>] ? rest_init+0xd6/0xdb
> [   52.914274]  [<c156579f>] ? start_kernel+0x31b/0x320
> [   52.914278]  [<c15650c9>] ? i386_start_kernel+0xc9/0xd0
> 
> 
> Paul, could you please explain if current lockdep rules are correct, or could be relaxed ?
> 
> I thought :
> 
> rcu_read_lock_bh();
> 
> was a shorthand to
> 
> local_disable_bh();
> rcu_read_lock();

In CONFIG_TREE_RCU and CONFIG_TINY_RCU, rcu_read_lock_bh() is actually
shorthand for only local_disable_bh().  Therefore, rcu_dereference()
will scream if only rcu_read_lock_bh() is held.

However, in CONFIG_PREEMPT_TREE_RCU, rcu_read_lock_bh() is its own
mechanism that does local_disable_bh() but has its own set of grace
periods, independent of those of rcu_read_lock().

> Why lockdep is not able to make a correct diagnostic ?

Here is the situation I am concerned about:

o	Task 0 does rcu_read_lock(), then p=rcu_dereference_bh().
	If we make the change you are asking for, rcu_dereference_bh()
	is OK with this.

o	Task 0 now is preempted before finishing its RCU read-side
	critical section.

o	Task 1 removes the data element referenced by pointer p,
	then invokes synchronize_rcu_bh().

o	Task 0 does not block synchronize_rcu_bh(), so the grace
	period completes.

o	Task 1 frees up the data element referenced by pointer p,
	which might be reallocated as some other type, unmapped,
	or whatever else.

o	Task 0 resumes, and is sadly disappointed when the data
	element referenced by pointer p has been swept out from
	under it.

Or am I missing something here?

							Thanx, Paul

> Thanks
> 
> [PATCH net-next-2.6] netpoll: Fix one rcu_dereference() lockdep splat
> 
> lockdep doesnt allow yet following  construct :
> 
> rcu_read_lock_bh();
> npinfo = rcu_dereference(skb->dev->npinfo);
> 
> Fix lockdep splat using rcu_dereference_bh()
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  include/linux/netpoll.h |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
> index 4c77fe7..472365e 100644
> --- a/include/linux/netpoll.h
> +++ b/include/linux/netpoll.h
> @@ -64,7 +64,7 @@ static inline bool netpoll_rx(struct sk_buff *skb)
>  	bool ret = false;
> 
>  	rcu_read_lock_bh();
> -	npinfo = rcu_dereference(skb->dev->npinfo);
> +	npinfo = rcu_dereference_bh(skb->dev->npinfo);
> 
>  	if (!npinfo || (list_empty(&npinfo->rx_np) && !npinfo->rx_flags))
>  		goto out;
> 
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-16  3:33                 ` Herbert Xu
  2010-06-16  4:47                   ` David Miller
@ 2010-06-16  6:16                   ` Eric Dumazet
  1 sibling, 0 replies; 63+ messages in thread
From: Eric Dumazet @ 2010-06-16  6:16 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David Miller, Paul E. McKenney, shemminger, mst, frzhang, netdev,
	amwang, mpm

Le mercredi 16 juin 2010 à 13:33 +1000, Herbert Xu a écrit :
> On Wed, Jun 16, 2010 at 05:03:20AM +0200, Eric Dumazet wrote:
> >
> > I wonder how these patches were tested, Herbert ?
> 
> You know, not everyone enables RCU debugging...
> 

Hmm, this was not an attack Herbert, just a suggestion, I now turn on
RCU lockdep debugging when doing RCU changes, it can helps ;)

> Anyway, this patch should fix the problems you've spotted.
> 

Thanks


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-16  5:08               ` Paul E. McKenney
@ 2010-06-16  6:21                 ` Eric Dumazet
  2010-06-16 16:01                   ` Paul E. McKenney
  0 siblings, 1 reply; 63+ messages in thread
From: Eric Dumazet @ 2010-06-16  6:21 UTC (permalink / raw)
  To: paulmck
  Cc: David Miller, herbert, shemminger, mst, frzhang, netdev, amwang,
	mpm

Le mardi 15 juin 2010 à 22:08 -0700, Paul E. McKenney a écrit :
> On Wed, Jun 16, 2010 at 04:58:59AM +0200, Eric Dumazet wrote:
> > 
> > Paul, could you please explain if current lockdep rules are correct, or could be relaxed ?
> > 
> > I thought :
> > 
> > rcu_read_lock_bh();
> > 
> > was a shorthand to
> > 
> > local_disable_bh();
> > rcu_read_lock();
> 
> In CONFIG_TREE_RCU and CONFIG_TINY_RCU, rcu_read_lock_bh() is actually
> shorthand for only local_disable_bh().  Therefore, rcu_dereference()
> will scream if only rcu_read_lock_bh() is held.
> 
> However, in CONFIG_PREEMPT_TREE_RCU, rcu_read_lock_bh() is its own
> mechanism that does local_disable_bh() but has its own set of grace
> periods, independent of those of rcu_read_lock().
> 
> > Why lockdep is not able to make a correct diagnostic ?
> 
> Here is the situation I am concerned about:
> 
> o	Task 0 does rcu_read_lock(), then p=rcu_dereference_bh().
> 	If we make the change you are asking for, rcu_dereference_bh()
> 	is OK with this.
> 
> o	Task 0 now is preempted before finishing its RCU read-side
> 	critical section.
> 
> o	Task 1 removes the data element referenced by pointer p,
> 	then invokes synchronize_rcu_bh().
> 
> o	Task 0 does not block synchronize_rcu_bh(), so the grace
> 	period completes.
> 
> o	Task 1 frees up the data element referenced by pointer p,
> 	which might be reallocated as some other type, unmapped,
> 	or whatever else.
> 
> o	Task 0 resumes, and is sadly disappointed when the data
> 	element referenced by pointer p has been swept out from
> 	under it.
> 
> Or am I missing something here?
> 

Nice thing with RCU is that I learn new things every day ;)

Thanks Paul, I'll try to remember all the details ! ;)




^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-16  6:21                 ` Eric Dumazet
@ 2010-06-16 16:01                   ` Paul E. McKenney
  0 siblings, 0 replies; 63+ messages in thread
From: Paul E. McKenney @ 2010-06-16 16:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, herbert, shemminger, mst, frzhang, netdev, amwang,
	mpm

On Wed, Jun 16, 2010 at 08:21:21AM +0200, Eric Dumazet wrote:
> Le mardi 15 juin 2010 à 22:08 -0700, Paul E. McKenney a écrit :
> > On Wed, Jun 16, 2010 at 04:58:59AM +0200, Eric Dumazet wrote:
> > > 
> > > Paul, could you please explain if current lockdep rules are correct, or could be relaxed ?
> > > 
> > > I thought :
> > > 
> > > rcu_read_lock_bh();
> > > 
> > > was a shorthand to
> > > 
> > > local_disable_bh();
> > > rcu_read_lock();
> > 
> > In CONFIG_TREE_RCU and CONFIG_TINY_RCU, rcu_read_lock_bh() is actually
> > shorthand for only local_disable_bh().  Therefore, rcu_dereference()
> > will scream if only rcu_read_lock_bh() is held.
> > 
> > However, in CONFIG_PREEMPT_TREE_RCU, rcu_read_lock_bh() is its own
> > mechanism that does local_disable_bh() but has its own set of grace
> > periods, independent of those of rcu_read_lock().
> > 
> > > Why lockdep is not able to make a correct diagnostic ?
> > 
> > Here is the situation I am concerned about:
> > 
> > o	Task 0 does rcu_read_lock(), then p=rcu_dereference_bh().
> > 	If we make the change you are asking for, rcu_dereference_bh()
> > 	is OK with this.
> > 
> > o	Task 0 now is preempted before finishing its RCU read-side
> > 	critical section.
> > 
> > o	Task 1 removes the data element referenced by pointer p,
> > 	then invokes synchronize_rcu_bh().
> > 
> > o	Task 0 does not block synchronize_rcu_bh(), so the grace
> > 	period completes.
> > 
> > o	Task 1 frees up the data element referenced by pointer p,
> > 	which might be reallocated as some other type, unmapped,
> > 	or whatever else.
> > 
> > o	Task 0 resumes, and is sadly disappointed when the data
> > 	element referenced by pointer p has been swept out from
> > 	under it.
> > 
> > Or am I missing something here?
> > 
> 
> Nice thing with RCU is that I learn new things every day ;)
> 
> Thanks Paul, I'll try to remember all the details ! ;)

;-)

But just to be clear...  All but one use of RCU-bh is in networking,
so if you guys need something different from RCU-bh, let's talk!

And I learn something new about RCU every day as well.  One of today's
lessons is that networking is no longer the only user of RCU-bh.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-16  4:47                   ` David Miller
@ 2010-06-16 23:02                     ` Paul E. McKenney
  2010-06-17 10:18                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 63+ messages in thread
From: Paul E. McKenney @ 2010-06-16 23:02 UTC (permalink / raw)
  To: David Miller
  Cc: herbert, eric.dumazet, shemminger, mst, frzhang, netdev, amwang,
	mpm

On Tue, Jun 15, 2010 at 09:47:02PM -0700, David Miller wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Wed, 16 Jun 2010 13:33:36 +1000
> 
> > On Wed, Jun 16, 2010 at 05:03:20AM +0200, Eric Dumazet wrote:
> >>
> >> I wonder how these patches were tested, Herbert ?
> > 
> > You know, not everyone enables RCU debugging...
> 
> Even though I'm as guilty as you, I have to agree with Eric that
> especially us core folks should be running with the various lock
> debugging options on all the time.
> 
> Maybe someone should add the RCU debugging config option to
> Documentation/SubmitChecklist :-)

How about the following added to Documentation/RCU/checklist.txt?

The first is in mainline, the second partly there, and the third
is still languishing in my tree.  I did manage to remove a dependency
on other maintainers, so things will hopefully move a bit faster.

							Thanx, Paul

diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 790d1a8..c7c6788 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -365,3 +365,26 @@ over a rather long period of time, but improvements are always welcome!
 	and the compiler to freely reorder code into and out of RCU
 	read-side critical sections.  It is the responsibility of the
 	RCU update-side primitives to deal with this.
+
+17.	Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and
+	the __rcu sparse checks to validate your RCU code.  These
+	can help find problems as follows:
+
+	CONFIG_PROVE_RCU: check that accesses to RCU-protected data
+		structures are carried out under the proper RCU
+		read-side critical section, while holding the right
+		combination of locks, or whatever other conditions
+		are appropriate.
+
+	CONFIG_DEBUG_OBJECTS_RCU_HEAD: check that you don't pass the
+		same object to call_rcu() (or friends) before an RCU
+		grace period has elapsed since the last time that you
+		passed that same object to call_rcu() (or friends).
+
+	__rcu sparse checks: tag the pointer to the RCU-protected data
+		structure with __rcu, and sparse will warn you if you
+		access that pointer without the services of one of the
+		variants of rcu_dereference().
+
+	These debugging aids can help you find problems that are
+	otherwise extremely difficult to spot.

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-16 23:02                     ` Paul E. McKenney
@ 2010-06-17 10:18                       ` Michael S. Tsirkin
  2010-06-17 21:26                         ` Paul E. McKenney
  0 siblings, 1 reply; 63+ messages in thread
From: Michael S. Tsirkin @ 2010-06-17 10:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: David Miller, herbert, eric.dumazet, shemminger, frzhang, netdev,
	amwang, mpm

On Wed, Jun 16, 2010 at 04:02:49PM -0700, Paul E. McKenney wrote:
> On Tue, Jun 15, 2010 at 09:47:02PM -0700, David Miller wrote:
> > From: Herbert Xu <herbert@gondor.apana.org.au>
> > Date: Wed, 16 Jun 2010 13:33:36 +1000
> > 
> > > On Wed, Jun 16, 2010 at 05:03:20AM +0200, Eric Dumazet wrote:
> > >>
> > >> I wonder how these patches were tested, Herbert ?
> > > 
> > > You know, not everyone enables RCU debugging...
> > 
> > Even though I'm as guilty as you, I have to agree with Eric that
> > especially us core folks should be running with the various lock
> > debugging options on all the time.
> > 
> > Maybe someone should add the RCU debugging config option to
> > Documentation/SubmitChecklist :-)
> 
> How about the following added to Documentation/RCU/checklist.txt?
> 
> The first is in mainline, the second partly there, and the third
> is still languishing in my tree.  I did manage to remove a dependency
> on other maintainers, so things will hopefully move a bit faster.
> 
> 							Thanx, Paul
> 
> diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
> index 790d1a8..c7c6788 100644
> --- a/Documentation/RCU/checklist.txt
> +++ b/Documentation/RCU/checklist.txt
> @@ -365,3 +365,26 @@ over a rather long period of time, but improvements are always welcome!
>  	and the compiler to freely reorder code into and out of RCU
>  	read-side critical sections.  It is the responsibility of the
>  	RCU update-side primitives to deal with this.
> +
> +17.	Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and
> +	the __rcu sparse checks to validate your RCU code.  These
> +	can help find problems as follows:
> +
> +	CONFIG_PROVE_RCU: check that accesses to RCU-protected data
> +		structures are carried out under the proper RCU
> +		read-side critical section, while holding the right
> +		combination of locks, or whatever other conditions
> +		are appropriate.
> +
> +	CONFIG_DEBUG_OBJECTS_RCU_HEAD: check that you don't pass the
> +		same object to call_rcu() (or friends) before an RCU
> +		grace period has elapsed since the last time that you
> +		passed that same object to call_rcu() (or friends).
> +

Cool, will this also work with synchronize etc?

> +	__rcu sparse checks: tag the pointer to the RCU-protected data
> +		structure with __rcu, and sparse will warn you if you
> +		access that pointer without the services of one of the
> +		variants of rcu_dereference().
> +
> +	These debugging aids can help you find problems that are
> +	otherwise extremely difficult to spot.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 8/8] bridge: Fix netpoll support
  2010-06-15 10:28             ` [PATCH 8/8] bridge: Fix netpoll support Cong Wang
@ 2010-06-17 10:38               ` Herbert Xu
  2010-06-17 10:57                 ` Cong Wang
  0 siblings, 1 reply; 63+ messages in thread
From: Herbert Xu @ 2010-06-17 10:38 UTC (permalink / raw)
  To: Cong Wang
  Cc: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	Stephen Hemminger, Matt Mackall, Paul E. McKenney

On Tue, Jun 15, 2010 at 06:28:17PM +0800, Cong Wang wrote:
>
>> This allows us to do away with the npinfo juggling that caused
>> problem number 1.
>>
>> Incidentally this patch fixes number 2 by bypassing unsafe code
>> such as multicast snooping and netfilter.
>
> Not sure if I understand problem 2) and 3), this patch is not easy
> to review. So, what's the point of adding ->np to struct net_bridge_port?
> since we already have p->dev->npinfo->netpoll?

A netpoll_info structure always maps to one and only one net_device
structure.  While each net_device may have multiple netpoll
structures attached.

You must not share netpoll or netpoll_info structures between
devices since that breaks other users (netconsole or anything
else) from attaching to those devices.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 8/8] bridge: Fix netpoll support
  2010-06-17 10:57                 ` Cong Wang
@ 2010-06-17 10:55                   ` Herbert Xu
  2010-06-18  3:06                     ` Cong Wang
  0 siblings, 1 reply; 63+ messages in thread
From: Herbert Xu @ 2010-06-17 10:55 UTC (permalink / raw)
  To: Cong Wang
  Cc: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	Stephen Hemminger, Matt Mackall, Paul E. McKenney

On Thu, Jun 17, 2010 at 06:57:28PM +0800, Cong Wang wrote:
>
> Hmm, I get it now. So this helps to fix problem 3)?

Yes, by allocating real netpoll structures for each device that
we're polling, instead of sharing a single one amongst all of
them.

This is also the basis of the solution of the use-after-free bug.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 8/8] bridge: Fix netpoll support
  2010-06-17 10:38               ` Herbert Xu
@ 2010-06-17 10:57                 ` Cong Wang
  2010-06-17 10:55                   ` Herbert Xu
  0 siblings, 1 reply; 63+ messages in thread
From: Cong Wang @ 2010-06-17 10:57 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	Stephen Hemminger, Matt Mackall, Paul E. McKenney

On 06/17/10 18:38, Herbert Xu wrote:
> On Tue, Jun 15, 2010 at 06:28:17PM +0800, Cong Wang wrote:
>>
>>> This allows us to do away with the npinfo juggling that caused
>>> problem number 1.
>>>
>>> Incidentally this patch fixes number 2 by bypassing unsafe code
>>> such as multicast snooping and netfilter.
>>
>> Not sure if I understand problem 2) and 3), this patch is not easy
>> to review. So, what's the point of adding ->np to struct net_bridge_port?
>> since we already have p->dev->npinfo->netpoll?
>
> A netpoll_info structure always maps to one and only one net_device
> structure.  While each net_device may have multiple netpoll
> structures attached.
>
> You must not share netpoll or netpoll_info structures between
> devices since that breaks other users (netconsole or anything
> else) from attaching to those devices.
>

Thanks for explanation.

Hmm, I get it now. So this helps to fix problem 3)?



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-17 10:18                       ` Michael S. Tsirkin
@ 2010-06-17 21:26                         ` Paul E. McKenney
  0 siblings, 0 replies; 63+ messages in thread
From: Paul E. McKenney @ 2010-06-17 21:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, herbert, eric.dumazet, shemminger, frzhang, netdev,
	amwang, mpm

On Thu, Jun 17, 2010 at 01:18:30PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 16, 2010 at 04:02:49PM -0700, Paul E. McKenney wrote:
> > On Tue, Jun 15, 2010 at 09:47:02PM -0700, David Miller wrote:
> > > From: Herbert Xu <herbert@gondor.apana.org.au>
> > > Date: Wed, 16 Jun 2010 13:33:36 +1000
> > > 
> > > > On Wed, Jun 16, 2010 at 05:03:20AM +0200, Eric Dumazet wrote:
> > > >>
> > > >> I wonder how these patches were tested, Herbert ?
> > > > 
> > > > You know, not everyone enables RCU debugging...
> > > 
> > > Even though I'm as guilty as you, I have to agree with Eric that
> > > especially us core folks should be running with the various lock
> > > debugging options on all the time.
> > > 
> > > Maybe someone should add the RCU debugging config option to
> > > Documentation/SubmitChecklist :-)
> > 
> > How about the following added to Documentation/RCU/checklist.txt?
> > 
> > The first is in mainline, the second partly there, and the third
> > is still languishing in my tree.  I did manage to remove a dependency
> > on other maintainers, so things will hopefully move a bit faster.
> > 
> > 							Thanx, Paul
> > 
> > diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
> > index 790d1a8..c7c6788 100644
> > --- a/Documentation/RCU/checklist.txt
> > +++ b/Documentation/RCU/checklist.txt
> > @@ -365,3 +365,26 @@ over a rather long period of time, but improvements are always welcome!
> >  	and the compiler to freely reorder code into and out of RCU
> >  	read-side critical sections.  It is the responsibility of the
> >  	RCU update-side primitives to deal with this.
> > +
> > +17.	Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and
> > +	the __rcu sparse checks to validate your RCU code.  These
> > +	can help find problems as follows:
> > +
> > +	CONFIG_PROVE_RCU: check that accesses to RCU-protected data
> > +		structures are carried out under the proper RCU
> > +		read-side critical section, while holding the right
> > +		combination of locks, or whatever other conditions
> > +		are appropriate.
> > +
> > +	CONFIG_DEBUG_OBJECTS_RCU_HEAD: check that you don't pass the
> > +		same object to call_rcu() (or friends) before an RCU
> > +		grace period has elapsed since the last time that you
> > +		passed that same object to call_rcu() (or friends).
> > +
> 
> Cool, will this also work with synchronize etc?

Unfortunately, it will not.  With call_rcu() and friends you can tag
the struct rcu_head and track it.  With synchronize_rcu() and friends,
there is nothing to track.  :-(

							Thanx, Paul

> > +	__rcu sparse checks: tag the pointer to the RCU-protected data
> > +		structure with __rcu, and sparse will warn you if you
> > +		access that pointer without the services of one of the
> > +		variants of rcu_dereference().
> > +
> > +	These debugging aids can help you find problems that are
> > +	otherwise extremely difficult to spot.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 8/8] bridge: Fix netpoll support
  2010-06-17 10:55                   ` Herbert Xu
@ 2010-06-18  3:06                     ` Cong Wang
  0 siblings, 0 replies; 63+ messages in thread
From: Cong Wang @ 2010-06-18  3:06 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	Stephen Hemminger, Matt Mackall, Paul E. McKenney

On 06/17/10 18:55, Herbert Xu wrote:
> On Thu, Jun 17, 2010 at 06:57:28PM +0800, Cong Wang wrote:
>>
>> Hmm, I get it now. So this helps to fix problem 3)?
>
> Yes, by allocating real netpoll structures for each device that
> we're polling, instead of sharing a single one amongst all of
> them.
>
> This is also the basis of the solution of the use-after-free bug.
>

Looks reasonable.
Thank you for your patch!

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 6/8] netpoll: Allow netpoll_setup/cleanup recursion
  2010-06-11  2:12           ` [PATCH 6/8] netpoll: Allow netpoll_setup/cleanup recursion Herbert Xu
@ 2010-06-25  1:21             ` Andrew Morton
  2010-06-25  3:01               ` Herbert Xu
  2010-06-25  3:30               ` David Miller
  0 siblings, 2 replies; 63+ messages in thread
From: Andrew Morton @ 2010-06-25  1:21 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen Hemminger, Matt Mackall, Paul E. McKenney,
	Peter Zijlstra, Ingo Molnar

On Fri, 11 Jun 2010 12:12:48 +1000
Herbert Xu <herbert@gondor.hengli.com.au> wrote:

> netpoll: Allow netpoll_setup/cleanup recursion
> 
> This patch adds the functions __netpoll_setup/__netpoll_cleanup
> which is designed to be called recursively through ndo_netpoll_seutp.
> 
> They must be called with RTNL held, and the caller must initialise
> np->dev and ensure that it has a valid reference count.
> 
> ...
>
> -int netpoll_setup(struct netpoll *np)
> +int __netpoll_setup(struct netpoll *np)
>  {
> -	struct net_device *ndev = NULL;
> -	struct in_device *in_dev;
> +	struct net_device *ndev = np->dev;
>  	struct netpoll_info *npinfo;
>  	const struct net_device_ops *ops;
>  	unsigned long flags;
>  	int err;
>  
> +	if ((ndev->priv_flags & IFF_DISABLE_NETPOLL) ||
> +	    !ndev->netdev_ops->ndo_poll_controller) {
> +		printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
> +		       np->name, np->dev_name);
> +		err = -ENOTSUPP;
> +		goto out;
> +	}
> +
> +	if (!ndev->npinfo) {
> +		npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
> +		if (!npinfo) {
> +			err = -ENOMEM;
> +			goto out;
> +		}
> +
> +		npinfo->rx_flags = 0;
> +		INIT_LIST_HEAD(&npinfo->rx_np);
> +
> +		spin_lock_init(&npinfo->rx_lock);
> +		skb_queue_head_init(&npinfo->arp_tx);
> +		skb_queue_head_init(&npinfo->txq);
> +		INIT_DELAYED_WORK(&npinfo->tx_work, queue_process);
> +
> +		atomic_set(&npinfo->refcnt, 1);
> +
> +		ops = np->dev->netdev_ops;
> +		if (ops->ndo_netpoll_setup) {
> +			err = ops->ndo_netpoll_setup(ndev, npinfo);
> +			if (err)
> +				goto free_npinfo;
> +		}
> +	} else {
> +		npinfo = ndev->npinfo;
> +		atomic_inc(&npinfo->refcnt);
> +	}
> +
> +	npinfo->netpoll = np;
> +
> +	if (np->rx_hook) {
> +		spin_lock_irqsave(&npinfo->rx_lock, flags);
> +		npinfo->rx_flags |= NETPOLL_RX_ENABLED;
> +		list_add_tail(&np->rx, &npinfo->rx_np);
> +		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
> +	}
> +
> +	/* last thing to do is link it to the net device structure */
> +	rcu_assign_pointer(ndev->npinfo, npinfo);
> +	rtnl_unlock();

And there it is, an unbalanced rtnl_unlock().

This stupid little thing took me over a day's work to find - it's just
been awful.

The user-visible symptom was that a bug in the netpoll code causes the
machine to hang after loading ipv6 (!), because
addrconf_fixup_forwarding()'s rtnl_trylock() kept on failing, and the
restart_syscall() kept on getting restarted, so an initscripts procfs
write just kept banging its head against the excessively-unlocked
mutex.

The mutex code handles an excessively-unlocked mutex (mutex.count==2)
really badly.  Some API functions say "its locked", others say "it
isn't", etc.

Maybe it's better with mutex debugging enabled - didn't try that. 
Things get pretty user-unfriendly when there's a bug within the
netconsole code itself.

Enabling lockdep simply made the bug cure itself - I suspect the mutex
code's handling of mutexes is different if lockdep is enabled.  That
would be pretty bad behaviour from the lockdep code.

I just removed the rtnl_unlock() - I couldn't see much in there which
needed rtnl_locking..

Dave, the fixup should be folded into the original patch please -
otherwise we'll have a machine-hangs-up bisection hole which spans two
weeks work of commits.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 6/8] netpoll: Allow netpoll_setup/cleanup recursion
  2010-06-25  1:21             ` Andrew Morton
@ 2010-06-25  3:01               ` Herbert Xu
  2010-06-25  3:30               ` David Miller
  1 sibling, 0 replies; 63+ messages in thread
From: Herbert Xu @ 2010-06-25  3:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michael S. Tsirkin, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Stephen Hemminger, Matt Mackall, Paul E. McKenney,
	Peter Zijlstra, Ingo Molnar

On Thu, Jun 24, 2010 at 06:21:23PM -0700, Andrew Morton wrote:
>
> I just removed the rtnl_unlock() - I couldn't see much in there which
> needed rtnl_locking..

Yes that's the correct fix.  I forgot to remove it from the
recursive setup path after moving the RTNL lock out.  Sorry!
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 6/8] netpoll: Allow netpoll_setup/cleanup recursion
  2010-06-25  1:21             ` Andrew Morton
  2010-06-25  3:01               ` Herbert Xu
@ 2010-06-25  3:30               ` David Miller
  2010-06-25  3:50                 ` Andrew Morton
  1 sibling, 1 reply; 63+ messages in thread
From: David Miller @ 2010-06-25  3:30 UTC (permalink / raw)
  To: akpm
  Cc: herbert, mst, frzhang, netdev, amwang, shemminger, mpm, paulmck,
	a.p.zijlstra, mingo

From: Andrew Morton <akpm@linux-foundation.org>
Date: Thu, 24 Jun 2010 18:21:23 -0700

> Dave, the fixup should be folded into the original patch please -
> otherwise we'll have a machine-hangs-up bisection hole which spans two
> weeks work of commits.

I'm not unwrapping my tree that far back, and there's already been
a dependent tree pull or two which depend upon the tree past that
commit, one of it was a rather large wireless merge.

It's going to screw over too many people.

We'll just merge the bug fix in, and be done with it.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 6/8] netpoll: Allow netpoll_setup/cleanup recursion
  2010-06-25  3:30               ` David Miller
@ 2010-06-25  3:50                 ` Andrew Morton
  2010-06-25  4:27                   ` David Miller
  0 siblings, 1 reply; 63+ messages in thread
From: Andrew Morton @ 2010-06-25  3:50 UTC (permalink / raw)
  To: David Miller
  Cc: herbert, mst, frzhang, netdev, amwang, shemminger, mpm, paulmck,
	a.p.zijlstra, mingo

On Thu, 24 Jun 2010 20:30:06 -0700 (PDT) David Miller <davem@davemloft.net> wrote:

> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Thu, 24 Jun 2010 18:21:23 -0700
> 
> > Dave, the fixup should be folded into the original patch please -
> > otherwise we'll have a machine-hangs-up bisection hole which spans two
> > weeks work of commits.
> 
> I'm not unwrapping my tree that far back, and there's already been
> a dependent tree pull or two which depend upon the tree past that
> commit, one of it was a rather large wireless merge.
> 
> It's going to screw over too many people.
> 
> We'll just merge the bug fix in, and be done with it.

Sigh.  That really sucks.

What happens if you want to actually *drop* a patch from net-next? 
Surely that happens?


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 6/8] netpoll: Allow netpoll_setup/cleanup recursion
  2010-06-25  3:50                 ` Andrew Morton
@ 2010-06-25  4:27                   ` David Miller
  2010-06-25  4:42                     ` Andrew Morton
  0 siblings, 1 reply; 63+ messages in thread
From: David Miller @ 2010-06-25  4:27 UTC (permalink / raw)
  To: akpm
  Cc: herbert, mst, frzhang, netdev, amwang, shemminger, mpm, paulmck,
	a.p.zijlstra, mingo

From: Andrew Morton <akpm@linux-foundation.org>
Date: Thu, 24 Jun 2010 20:50:59 -0700

> What happens if you want to actually *drop* a patch from net-next? 
> Surely that happens?

I've only respun the tree on two or three occasions and that was
because I made some significant error myself and screwed up the
GIT tree somehow.

We've fixed much worse bugs than this one weeks after the changes
causing them went in, life goes on.

And the fact that it took two weeks of it being in -next before
anyone even reported it says how wide a net this particular bug
covers :-)  (hint: personally I've still never used netconsole
even one single time, and it's been around for what, something
like 6 years?)


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 6/8] netpoll: Allow netpoll_setup/cleanup recursion
  2010-06-25  4:27                   ` David Miller
@ 2010-06-25  4:42                     ` Andrew Morton
  2010-06-25  4:52                       ` David Miller
                                         ` (3 more replies)
  0 siblings, 4 replies; 63+ messages in thread
From: Andrew Morton @ 2010-06-25  4:42 UTC (permalink / raw)
  To: David Miller
  Cc: herbert, mst, frzhang, netdev, amwang, shemminger, mpm, paulmck,
	a.p.zijlstra, mingo

On Thu, 24 Jun 2010 21:27:13 -0700 (PDT) David Miller <davem@davemloft.net> wrote:

> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Thu, 24 Jun 2010 20:50:59 -0700
> 
> > What happens if you want to actually *drop* a patch from net-next? 
> > Surely that happens?
> 
> I've only respun the tree on two or three occasions and that was
> because I made some significant error myself and screwed up the
> GIT tree somehow.
> 
> We've fixed much worse bugs than this one weeks after the changes
> causing them went in, life goes on.

Still sucks - this is a quite ugly drawback to how we're using git. 
I've hit bisection holes several times which held up the show. 
Sometimes you can make them go away by fiddling the .config, other
times I've hunted down the fix and manually applied it for each
iteration.  It makes me feel all guilty each time I ask some poor sap
to bisect a bug for us.

> And the fact that it took two weeks of it being in -next before
> anyone even reported it says how wide a net this particular bug
> covers :-)  (hint: personally I've still never used netconsole
> even one single time, and it's been around for what, something
> like 6 years?)

I'd imagine that netconsole would get in the way rather a lot for net
developers, but it's really useful!


That being said, I wonder why Herbert didn't hit this in his testing. 
I suspect that he'd enabled lockdep, which hid the bug.  I haven't
worked out _why_ lockdep hides the double-mutex_unlock bug, but it's a
pretty bad thing to do.

Presumably mutex debugging _would_ have found it, but because the bug
was in netconsole code, the mutex-debugging blurt of course didn't come
out.  We don't replay the log buffer when netconsole is brought up -
perhaps we should.

And that machine has a screwy USB keyboard on which I've never managed
to invoke the vt-srcoll-backwards thing, so it would have been darned
hard for me to see and mutex-debugging warnings anyway.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 6/8] netpoll: Allow netpoll_setup/cleanup recursion
  2010-06-25  4:42                     ` Andrew Morton
@ 2010-06-25  4:52                       ` David Miller
  2010-06-25  8:08                       ` Peter Zijlstra
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 63+ messages in thread
From: David Miller @ 2010-06-25  4:52 UTC (permalink / raw)
  To: akpm
  Cc: herbert, mst, frzhang, netdev, amwang, shemminger, mpm, paulmck,
	a.p.zijlstra, mingo

From: Andrew Morton <akpm@linux-foundation.org>
Date: Thu, 24 Jun 2010 21:42:04 -0700

> I'd imagine that netconsole would get in the way rather a lot for net
> developers, but it's really useful!

I have a fully logging stable serial or hypervisor console on every
one of my boxes, why would I even bother?

> That being said, I wonder why Herbert didn't hit this in his testing. 
> I suspect that he'd enabled lockdep, which hid the bug.  I haven't
> worked out _why_ lockdep hides the double-mutex_unlock bug, but it's a
> pretty bad thing to do.

Yes, definitely something to look into.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 6/8] netpoll: Allow netpoll_setup/cleanup recursion
  2010-06-25  4:42                     ` Andrew Morton
  2010-06-25  4:52                       ` David Miller
@ 2010-06-25  8:08                       ` Peter Zijlstra
  2010-06-25  8:42                         ` Andrew Morton
  2010-06-25  8:46                       ` Ingo Molnar
  2010-06-25 10:08                       ` Nick Piggin
  3 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2010-06-25  8:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, herbert, mst, frzhang, netdev, amwang, shemminger,
	mpm, paulmck, mingo

On Thu, 2010-06-24 at 21:42 -0700, Andrew Morton wrote:
> That being said, I wonder why Herbert didn't hit this in his testing. 
> I suspect that he'd enabled lockdep, which hid the bug.  I haven't
> worked out _why_ lockdep hides the double-mutex_unlock bug, but it's a
> pretty bad thing to do. 

Most weird indeed, lockdep is supposed so shout its lungs out when
someone wants to unlock a lock that isn't actually owned by him (and it
not being locked at all certainly implies you're not the owner).

In fact, the below patch results in the below splat -- its also
something that's tested by the locking self-test:

/*
 * Double unlock:
 */
#define E()                                     \
                                                \
        LOCK(A);                                \
        UNLOCK(A);                              \
        UNLOCK(A); /* fail */


---
 kernel/timer.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index ee3f116..0496f71 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1334,6 +1334,8 @@ SYSCALL_DEFINE0(getpid)
 	return task_tgid_vnr(current);
 }
 
+static DEFINE_MUTEX(foo);
+
 /*
  * Accessing ->real_parent is not SMP-safe, it could
  * change from under us. However, we can use a stale
@@ -1344,6 +1346,10 @@ SYSCALL_DEFINE0(getppid)
 {
 	int pid;
 
+	mutex_lock(&foo);
+	mutex_unlock(&foo);
+	mutex_unlock(&foo);
+
 	rcu_read_lock();
 	pid = task_tgid_vnr(current->real_parent);
 	rcu_read_unlock();



=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
init/1 is trying to release lock (foo) at:
[<ffffffff815bf3b6>] mutex_unlock+0xe/0x10
but there are no more locks to release!

other info that might help us debug this:
no locks held by init/1.

stack backtrace:
Pid: 1, comm: init Not tainted 2.6.35-rc3-tip-01034-g5205803-dirty #399
Call Trace:
 [<ffffffff815bf3b6>] ? mutex_unlock+0xe/0x10
 [<ffffffff8106d718>] print_unlock_inbalance_bug+0xd6/0xe1
 [<ffffffff815bf3b6>] ? mutex_unlock+0xe/0x10
 [<ffffffff8106e8c6>] lock_release+0xdb/0x196
 [<ffffffff815bf32f>] __mutex_unlock_slowpath+0xc1/0x13a
 [<ffffffff815bf3b6>] mutex_unlock+0xe/0x10
 [<ffffffff8104f262>] sys_getppid+0x34/0xd8
 [<ffffffff81002cdb>] system_call_fastpath+0x16/0x1b


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH 6/8] netpoll: Allow netpoll_setup/cleanup recursion
  2010-06-25  8:08                       ` Peter Zijlstra
@ 2010-06-25  8:42                         ` Andrew Morton
  2010-06-25  9:45                           ` Peter Zijlstra
  0 siblings, 1 reply; 63+ messages in thread
From: Andrew Morton @ 2010-06-25  8:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Miller, herbert, mst, frzhang, netdev, amwang, shemminger,
	mpm, paulmck, mingo

On Fri, 25 Jun 2010 10:08:56 +0200 Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, 2010-06-24 at 21:42 -0700, Andrew Morton wrote:
> > That being said, I wonder why Herbert didn't hit this in his testing. 
> > I suspect that he'd enabled lockdep, which hid the bug.  I haven't
> > worked out _why_ lockdep hides the double-mutex_unlock bug, but it's a
> > pretty bad thing to do. 
> 
> Most weird indeed, lockdep is supposed so shout its lungs out when
> someone wants to unlock a lock that isn't actually owned by him (and it
> not being locked at all certainly implies you're not the owner).
> 
> In fact, the below patch results in the below splat -- its also
> something that's tested by the locking self-test:

When I enabled lockdep, the bug actually went away.  Is it possible
that when lockdep detects this bug, it prevents mutex.count from going
from 1 to 2?

It could be that lockdep _did_ detect (and correct!) the bug.  But
because I had no usable console output at the time, I didn't see it.

I did notice that the taint output was "G W".  So something warned
about something, but I don't know what.  But that was happening with
lockdep disabled.


> @@ -1344,6 +1346,10 @@ SYSCALL_DEFINE0(getppid)
>  {
>  	int pid;
>  
> +	mutex_lock(&foo);
> +	mutex_unlock(&foo);
> +	mutex_unlock(&foo);
> +
>  	rcu_read_lock();
>  	pid = task_tgid_vnr(current->real_parent);
>  	rcu_read_unlock();

It'd be interesting to add

	printk("%d:%d\n", __LINE__, atomic_read(&foo.count));

after the mutex_unlock()s.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 6/8] netpoll: Allow netpoll_setup/cleanup recursion
  2010-06-25  4:42                     ` Andrew Morton
  2010-06-25  4:52                       ` David Miller
  2010-06-25  8:08                       ` Peter Zijlstra
@ 2010-06-25  8:46                       ` Ingo Molnar
  2010-06-25 10:08                       ` Nick Piggin
  3 siblings, 0 replies; 63+ messages in thread
From: Ingo Molnar @ 2010-06-25  8:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, herbert, mst, frzhang, netdev, amwang, shemminger,
	mpm, paulmck, a.p.zijlstra, Linus Torvalds, linux-kernel


* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 24 Jun 2010 21:27:13 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
> 
> > From: Andrew Morton <akpm@linux-foundation.org>
> > Date: Thu, 24 Jun 2010 20:50:59 -0700
> > 
> > > What happens if you want to actually *drop* a patch from net-next? 
> > > Surely that happens?
> > 
> > I've only respun the tree on two or three occasions and that was
> > because I made some significant error myself and screwed up the
> > GIT tree somehow.
> > 
> > We've fixed much worse bugs than this one weeks after the changes
> > causing them went in, life goes on.
> 
> Still sucks - this is a quite ugly drawback to how we're using git. 
> I've hit bisection holes several times which held up the show. 
> Sometimes you can make them go away by fiddling the .config, other
> times I've hunted down the fix and manually applied it for each
> iteration.  It makes me feel all guilty each time I ask some poor sap
> to bisect a bug for us.

One solution would be, for really grave bugs that introduce a significant 
window of bisection breakage, to add a special tag:

  Fixes-Bug: SHA1

Which could then be parsed by a special variant of git bisect and which would 
try to cherry-pick all Fixes-Bug commits into the bisection point.

But there are numerous disadvantages that:

 - The bisection itself would be slower (maybe not a big issue for most 
   bisection sessions)

 - Such cherry-picked trees wouldnt be precisely the same thing as the tree 
   as-it-was-there.

 - Awkward combination bugs could ensue

 - The cherry-pick may fail or may mis-apply things.

 - If the Fixes-Bug tag is typoed or misconstrued then that might not be found 
   until someone tries a bisection and fails ... leading to awkward 
   add-the-tag-again-to-some-commit scenarios.

My gut feeling is that we really dont want to go there. As painful as 
bisection breakages are, they are relatively rare (compared to regular 
regressions), and the value of testing _that precise_ sha1 is a fundamental 
thing - git bisect shouldnt interact and reorder fixes.

If a bug wasnt found sooner then it either didnt matter that much, or the tree 
QA was bad. Neither of those factors forces a change in the development model.

	Ingo

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 6/8] netpoll: Allow netpoll_setup/cleanup recursion
  2010-06-25  8:42                         ` Andrew Morton
@ 2010-06-25  9:45                           ` Peter Zijlstra
  0 siblings, 0 replies; 63+ messages in thread
From: Peter Zijlstra @ 2010-06-25  9:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, herbert, mst, frzhang, netdev, amwang, shemminger,
	mpm, paulmck, mingo

On Fri, 2010-06-25 at 01:42 -0700, Andrew Morton wrote:
> On Fri, 25 Jun 2010 10:08:56 +0200 Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Thu, 2010-06-24 at 21:42 -0700, Andrew Morton wrote:
> > > That being said, I wonder why Herbert didn't hit this in his testing. 
> > > I suspect that he'd enabled lockdep, which hid the bug.  I haven't
> > > worked out _why_ lockdep hides the double-mutex_unlock bug, but it's a
> > > pretty bad thing to do. 
> > 
> > Most weird indeed, lockdep is supposed so shout its lungs out when
> > someone wants to unlock a lock that isn't actually owned by him (and it
> > not being locked at all certainly implies you're not the owner).
> > 
> > In fact, the below patch results in the below splat -- its also
> > something that's tested by the locking self-test:
> 
> When I enabled lockdep, the bug actually went away.  Is it possible
> that when lockdep detects this bug, it prevents mutex.count from going
> from 1 to 2?

Not lockdep itself but the DEBUG_MUTEXES code (forced by lockdep).

The difference between the normal and the debug code is that the debug
code disables all fast-path code.

The x86 fast-path code does:

 LOCK incl &lock->count
 jg done:
 call slowpath
done:

Since 1++ is >0 it will complete without calling the slow-path, would
do:

 if (__mutex_slowpath_needs_to_unlock()) /* 1 regardless of DEBUG_MUTEX */
   atomic_set(&lock->count, 1);

The question I guess is, do we want double unlocks to go silently
unnoticed? In that case we need to touch the fastpath asm.

> It could be that lockdep _did_ detect (and correct!) the bug.  But
> because I had no usable console output at the time, I didn't see it.
> 
> I did notice that the taint output was "G W".  So something warned
> about something, but I don't know what.  But that was happening with
> lockdep disabled.

Hrmm,. yeah without console output lockdep isn't going to help much,
should we maybe use the speaker to read out the dmesg :-)

> It'd be interesting to add
> 
> 	printk("%d:%d\n", __LINE__, atomic_read(&foo.count));
> 
> after the mutex_unlock()s.

1352:1



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 6/8] netpoll: Allow netpoll_setup/cleanup recursion
  2010-06-25  4:42                     ` Andrew Morton
                                         ` (2 preceding siblings ...)
  2010-06-25  8:46                       ` Ingo Molnar
@ 2010-06-25 10:08                       ` Nick Piggin
  3 siblings, 0 replies; 63+ messages in thread
From: Nick Piggin @ 2010-06-25 10:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, herbert, mst, frzhang, netdev, amwang, shemminger,
	mpm, paulmck, a.p.zijlstra, mingo

On Thu, Jun 24, 2010 at 09:42:04PM -0700, Andrew Morton wrote:
> On Thu, 24 Jun 2010 21:27:13 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
> 
> > From: Andrew Morton <akpm@linux-foundation.org>
> > Date: Thu, 24 Jun 2010 20:50:59 -0700
> > 
> > > What happens if you want to actually *drop* a patch from net-next? 
> > > Surely that happens?
> > 
> > I've only respun the tree on two or three occasions and that was
> > because I made some significant error myself and screwed up the
> > GIT tree somehow.
> > 
> > We've fixed much worse bugs than this one weeks after the changes
> > causing them went in, life goes on.
> 
> Still sucks - this is a quite ugly drawback to how we're using git. 
> I've hit bisection holes several times which held up the show. 
> Sometimes you can make them go away by fiddling the .config, other
> times I've hunted down the fix and manually applied it for each
> iteration.  It makes me feel all guilty each time I ask some poor sap
> to bisect a bug for us.

This might be somewhat improved by tagging a fix with the id of the
patch causing the regression, so that bisect could go through and try
to apply a fix out of order.

This could be done with a specific note format in the changelog, but
I don't know whether it's realistic to support in git format or if
you'd have to have a tool that builds up a mapping going the other
way...

Could also be useful for distros backporting things.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-10 12:40 [0/8] netpoll/bridge fixes Herbert Xu
                   ` (7 preceding siblings ...)
  2010-06-10 14:49 ` [0/8] netpoll/bridge fixes Stephen Hemminger
@ 2010-06-29 12:53 ` Yanko Kaneti
  8 siblings, 0 replies; 63+ messages in thread
From: Yanko Kaneti @ 2010-06-29 12:53 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Michael S.Tsirkin, Qianfeng Zhang, David S.Miller, netdev,
	WANG Cong, Stephen Hemminger, Matt Mackall

On Thu, 2010-06-10 at 22:40 +1000, Herbert Xu wrote:
> Hi:
> 
> Qianfeng Zhang reported that he was seeing crashes with the
> attached backtrace.
> 
> I tracked this down to the recently added netpoll support in
> the bridge device.  It's a classic use-after-free problem.
> 
> Trying to solve it brought out a host of other issues, some of
> which existed prior to the new bridge code.  The following patches
> attempt to address some of these issues.
> 
> Warning, this is completely untested (apart from compiling with
> everything enabled) so please look but don't merge :)

FWIW 2.6.35-0.2.rc3.git0.fc14.x86_64 and later rawhide kernels are
causing quite reproducible __br_deliver crashes on routine f13
netinstalls in a kvm guest here.

To test I cherry picked this series +
netpoll-Use-correct-primitives-for-RCU-dereferencing and
net-fix-netpoll-Allow-netpoll_setup-cleanup-recursion from net-next on
top of 2.6.35-0.15.rc3.git3.fc14.x86_64 (which is todays linus tree) and
it seems to fix the crashes for me.

Perhaps the netpoll fixes should find their way to a rc before 2.6.35
goes golden.

Regards
Yanko


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-06-11  2:11         ` Herbert Xu
                             ` (10 preceding siblings ...)
  2010-06-15 18:39           ` David Miller
@ 2010-07-19 10:19           ` Michael S. Tsirkin
  2010-07-19 10:53             ` Herbert Xu
  11 siblings, 1 reply; 63+ messages in thread
From: Michael S. Tsirkin @ 2010-07-19 10:19 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Stephen Hemminger, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Matt Mackall

On Fri, Jun 11, 2010 at 12:11:42PM +1000, Herbert Xu wrote:
> On Fri, Jun 11, 2010 at 08:48:39AM +1000, Herbert Xu wrote:
> > On Thu, Jun 10, 2010 at 02:59:15PM -0700, Stephen Hemminger wrote:
> > >
> > > Okay, then add a comment where in_irq is used?
> > 
> > Actually let me put it into a wrapper.  I'll respin the patches.
> 
> OK here is a repost.  And this time it really is 8 patches :)
> I've tested it lightly.
> 
> Cheers,

Meanwhile, should we just disable netpoll for bridge in 2.6.35 and -stable?
We are getting crash reports in virtualization which I suspect are
related to this:
https://bugzilla.kernel.org/show_bug.cgi?id=16413

> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-07-19 10:19           ` Michael S. Tsirkin
@ 2010-07-19 10:53             ` Herbert Xu
  2010-07-19 11:54               ` Herbert Xu
  0 siblings, 1 reply; 63+ messages in thread
From: Herbert Xu @ 2010-07-19 10:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stephen Hemminger, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Matt Mackall

On Mon, Jul 19, 2010 at 01:19:04PM +0300, Michael S. Tsirkin wrote:
> On Fri, Jun 11, 2010 at 12:11:42PM +1000, Herbert Xu wrote:
> > On Fri, Jun 11, 2010 at 08:48:39AM +1000, Herbert Xu wrote:
> > > On Thu, Jun 10, 2010 at 02:59:15PM -0700, Stephen Hemminger wrote:
> > > >
> > > > Okay, then add a comment where in_irq is used?
> > > 
> > > Actually let me put it into a wrapper.  I'll respin the patches.
> > 
> > OK here is a repost.  And this time it really is 8 patches :)
> > I've tested it lightly.
> > 
> > Cheers,
> 
> Meanwhile, should we just disable netpoll for bridge in 2.6.35 and -stable?
> We are getting crash reports in virtualization which I suspect are
> related to this:
> https://bugzilla.kernel.org/show_bug.cgi?id=16413

I think that's probably the best solution, Dave?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-07-19 10:53             ` Herbert Xu
@ 2010-07-19 11:54               ` Herbert Xu
  2010-07-19 16:05                 ` David Miller
  0 siblings, 1 reply; 63+ messages in thread
From: Herbert Xu @ 2010-07-19 11:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stephen Hemminger, Qianfeng Zhang, David S. Miller, netdev,
	WANG Cong, Matt Mackall

On Mon, Jul 19, 2010 at 06:53:00PM +0800, Herbert Xu wrote:
>
> > Meanwhile, should we just disable netpoll for bridge in 2.6.35 and -stable?
> > We are getting crash reports in virtualization which I suspect are
> > related to this:
> > https://bugzilla.kernel.org/show_bug.cgi?id=16413
> 
> I think that's probably the best solution, Dave?

I take that back :)

It turns out that 16413 has nothing to do with bridge netpoll
(which was not merged until after 2.6.34) since he's running
2.6.34.

Still, it might be a good idea to disable bridge netpoll in
2.6.35.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-07-19 11:54               ` Herbert Xu
@ 2010-07-19 16:05                 ` David Miller
  2010-07-19 16:52                   ` Eric Dumazet
  2010-07-20  5:26                   ` Herbert Xu
  0 siblings, 2 replies; 63+ messages in thread
From: David Miller @ 2010-07-19 16:05 UTC (permalink / raw)
  To: herbert; +Cc: mst, shemminger, frzhang, netdev, amwang, mpm

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 19 Jul 2010 19:54:11 +0800

> Still, it might be a good idea to disable bridge netpoll in
> 2.6.35.

I thought we did that already.... oh I see, we did it for bonding:

commit c22d7ac844f1cb9c6a5fd20f89ebadc2feef891b
Author: Andy Gospodarek <andy@greyhouse.net>
Date:   Fri Jun 25 09:50:44 2010 +0000

    bonding: prevent netpoll over bonded interfaces

I'm fine with disabling it for bridging too, just send me a patch
similar to the bonding one.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-07-19 16:05                 ` David Miller
@ 2010-07-19 16:52                   ` Eric Dumazet
  2010-07-19 20:35                     ` David Miller
  2010-07-20  5:26                   ` Herbert Xu
  1 sibling, 1 reply; 63+ messages in thread
From: Eric Dumazet @ 2010-07-19 16:52 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, mst, shemminger, frzhang, netdev, amwang, mpm

Le lundi 19 juillet 2010 à 09:05 -0700, David Miller a écrit :

> I thought we did that already.... oh I see, we did it for bonding:
> 
> commit c22d7ac844f1cb9c6a5fd20f89ebadc2feef891b
> Author: Andy Gospodarek <andy@greyhouse.net>
> Date:   Fri Jun 25 09:50:44 2010 +0000
> 

BTW, this added following warning :


[PATCH] bonding: avoid a warning

drivers/net/bonding/bond_main.c:179:12: warning: ‘disable_netpoll’
defined but not used

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 8228088..20f45cb 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -176,7 +176,9 @@ static int arp_ip_count;
 static int bond_mode	= BOND_MODE_ROUNDROBIN;
 static int xmit_hashtype = BOND_XMIT_POLICY_LAYER2;
 static int lacp_fast;
+#ifdef CONFIG_NET_POLL_CONTROLLER
 static int disable_netpoll = 1;
+#endif
 
 const struct bond_parm_tbl bond_lacp_tbl[] = {
 {	"slow",		AD_LACP_SLOW},



^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-07-19 16:52                   ` Eric Dumazet
@ 2010-07-19 20:35                     ` David Miller
  0 siblings, 0 replies; 63+ messages in thread
From: David Miller @ 2010-07-19 20:35 UTC (permalink / raw)
  To: eric.dumazet; +Cc: herbert, mst, shemminger, frzhang, netdev, amwang, mpm

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 19 Jul 2010 18:52:36 +0200

> [PATCH] bonding: avoid a warning
> 
> drivers/net/bonding/bond_main.c:179:12: warning: ‘disable_netpoll’
> defined but not used
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied to net-next-2.6, thanks Eric.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-07-19 16:05                 ` David Miller
  2010-07-19 16:52                   ` Eric Dumazet
@ 2010-07-20  5:26                   ` Herbert Xu
  2010-07-20  6:28                     ` David Miller
  1 sibling, 1 reply; 63+ messages in thread
From: Herbert Xu @ 2010-07-20  5:26 UTC (permalink / raw)
  To: David Miller; +Cc: mst, shemminger, frzhang, netdev, amwang, mpm

On Mon, Jul 19, 2010 at 09:05:03AM -0700, David Miller wrote:
>
> I thought we did that already.... oh I see, we did it for bonding:
> 
> commit c22d7ac844f1cb9c6a5fd20f89ebadc2feef891b
> Author: Andy Gospodarek <andy@greyhouse.net>
> Date:   Fri Jun 25 09:50:44 2010 +0000
> 
>     bonding: prevent netpoll over bonded interfaces
> 
> I'm fine with disabling it for bridging too, just send me a patch
> similar to the bonding one.

OK, here is a minimal patch to remove the offending bits of the
bridge netpoll support.

Note that this patch needs to be reverted in net-next-2.6 as
bridge netpoll works correctly there.

bridge: Partially disable netpoll support

The new netpoll code in bridging contains use-after-free bugs
that are non-trivial to fix.

This patch fixes this by removing the code that uses skbs after
they're freed.

As a consequence, this means that we can no longer call bridge
from the netpoll path, so this patch also removes the controller
function in order to disable netpoll.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index eedf2c9..753fc42 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -217,14 +217,6 @@ static bool br_devices_support_netpoll(struct net_bridge *br)
 	return count != 0 && ret;
 }
 
-static void br_poll_controller(struct net_device *br_dev)
-{
-	struct netpoll *np = br_dev->npinfo->netpoll;
-
-	if (np->real_dev != br_dev)
-		netpoll_poll_dev(np->real_dev);
-}
-
 void br_netpoll_cleanup(struct net_device *dev)
 {
 	struct net_bridge *br = netdev_priv(dev);
@@ -295,7 +287,6 @@ static const struct net_device_ops br_netdev_ops = {
 	.ndo_do_ioctl		 = br_dev_ioctl,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_netpoll_cleanup	 = br_netpoll_cleanup,
-	.ndo_poll_controller	 = br_poll_controller,
 #endif
 };
 
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index a4e72a8..595da45 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -50,14 +50,7 @@ int br_dev_queue_push_xmit(struct sk_buff *skb)
 			kfree_skb(skb);
 		else {
 			skb_push(skb, ETH_HLEN);
-
-#ifdef CONFIG_NET_POLL_CONTROLLER
-			if (unlikely(skb->dev->priv_flags & IFF_IN_NETPOLL)) {
-				netpoll_send_skb(skb->dev->npinfo->netpoll, skb);
-				skb->dev->priv_flags &= ~IFF_IN_NETPOLL;
-			} else
-#endif
-				dev_queue_xmit(skb);
+			dev_queue_xmit(skb);
 		}
 	}
 
@@ -73,23 +66,9 @@ int br_forward_finish(struct sk_buff *skb)
 
 static void __br_deliver(const struct net_bridge_port *to, struct sk_buff *skb)
 {
-#ifdef CONFIG_NET_POLL_CONTROLLER
-	struct net_bridge *br = to->br;
-	if (unlikely(br->dev->priv_flags & IFF_IN_NETPOLL)) {
-		struct netpoll *np;
-		to->dev->npinfo = skb->dev->npinfo;
-		np = skb->dev->npinfo->netpoll;
-		np->real_dev = np->dev = to->dev;
-		to->dev->priv_flags |= IFF_IN_NETPOLL;
-	}
-#endif
 	skb->dev = to->dev;
 	NF_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
 		br_forward_finish);
-#ifdef CONFIG_NET_POLL_CONTROLLER
-	if (skb->dev->npinfo)
-		skb->dev->npinfo->netpoll->dev = br->dev;
-#endif
 }
 
 static void __br_forward(const struct net_bridge_port *to, struct sk_buff *skb)

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [0/8] netpoll/bridge fixes
  2010-07-20  5:26                   ` Herbert Xu
@ 2010-07-20  6:28                     ` David Miller
  0 siblings, 0 replies; 63+ messages in thread
From: David Miller @ 2010-07-20  6:28 UTC (permalink / raw)
  To: herbert; +Cc: mst, shemminger, frzhang, netdev, amwang, mpm

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 20 Jul 2010 13:26:45 +0800

> Note that this patch needs to be reverted in net-next-2.6 as
> bridge netpoll works correctly there.

Ok, will do.

> bridge: Partially disable netpoll support
> 
> The new netpoll code in bridging contains use-after-free bugs
> that are non-trivial to fix.
> 
> This patch fixes this by removing the code that uses skbs after
> they're freed.
> 
> As a consequence, this means that we can no longer call bridge
> from the netpoll path, so this patch also removes the controller
> function in order to disable netpoll.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied, thanks Herbert.

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2010-07-20  6:28 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-06-10 12:40 [0/8] netpoll/bridge fixes Herbert Xu
2010-06-10 12:42 ` [PATCH 1/7] netpoll: Set npinfo to NULL even with ndo_netpoll_cleanup Herbert Xu
2010-06-10 12:42 ` [PATCH 2/7] bridge: Remove redundant npinfo NULL setting Herbert Xu
2010-06-10 12:42 ` [PATCH 3/7] netpoll: Fix RCU usage Herbert Xu
2010-06-10 12:42 ` [PATCH 4/7] netpoll: Add locking for netpoll_setup/cleanup Herbert Xu
2010-06-10 12:42 ` [PATCH 5/7] netpoll: Add ndo_netpoll_setup Herbert Xu
2010-06-10 12:42 ` [PATCH 6/7] netpoll: Allow netpoll_setup/cleanup recursion Herbert Xu
2010-06-10 12:42 ` [PATCH 7/7] bridge: Fix netpoll support Herbert Xu
2010-06-10 14:49 ` [0/8] netpoll/bridge fixes Stephen Hemminger
2010-06-10 21:56   ` Herbert Xu
2010-06-10 21:59     ` Stephen Hemminger
2010-06-10 22:48       ` Herbert Xu
2010-06-11  2:11         ` Herbert Xu
2010-06-11  2:12           ` [PATCH 1/8] netpoll: Set npinfo to NULL even with ndo_netpoll_cleanup Herbert Xu
2010-06-11  2:12           ` [PATCH 2/8] bridge: Remove redundant npinfo NULL setting Herbert Xu
2010-06-11  2:12           ` [PATCH 3/8] netpoll: Fix RCU usage Herbert Xu
2010-06-11 23:10             ` Paul E. McKenney
2010-06-11  2:12           ` [PATCH 4/8] netpoll: Add locking for netpoll_setup/cleanup Herbert Xu
2010-06-11  2:12           ` [PATCH 5/8] netpoll: Add ndo_netpoll_setup Herbert Xu
2010-06-11  2:12           ` [PATCH 6/8] netpoll: Allow netpoll_setup/cleanup recursion Herbert Xu
2010-06-25  1:21             ` Andrew Morton
2010-06-25  3:01               ` Herbert Xu
2010-06-25  3:30               ` David Miller
2010-06-25  3:50                 ` Andrew Morton
2010-06-25  4:27                   ` David Miller
2010-06-25  4:42                     ` Andrew Morton
2010-06-25  4:52                       ` David Miller
2010-06-25  8:08                       ` Peter Zijlstra
2010-06-25  8:42                         ` Andrew Morton
2010-06-25  9:45                           ` Peter Zijlstra
2010-06-25  8:46                       ` Ingo Molnar
2010-06-25 10:08                       ` Nick Piggin
2010-06-11  2:12           ` [PATCH 7/8] netpoll: Add netpoll_tx_running Herbert Xu
2010-06-11  2:12           ` [PATCH 8/8] bridge: Fix netpoll support Herbert Xu
2010-06-11  3:08             ` fired a bug report on bugzilla.redhat.com Qianfeng Zhang
2010-06-15 10:28             ` [PATCH 8/8] bridge: Fix netpoll support Cong Wang
2010-06-17 10:38               ` Herbert Xu
2010-06-17 10:57                 ` Cong Wang
2010-06-17 10:55                   ` Herbert Xu
2010-06-18  3:06                     ` Cong Wang
2010-06-11 20:03           ` [0/8] netpoll/bridge fixes Matt Mackall
2010-06-15 10:17           ` Cong Wang
2010-06-15 18:39           ` David Miller
2010-06-16  2:58             ` Eric Dumazet
2010-06-16  3:03               ` Eric Dumazet
2010-06-16  3:33                 ` Herbert Xu
2010-06-16  4:47                   ` David Miller
2010-06-16 23:02                     ` Paul E. McKenney
2010-06-17 10:18                       ` Michael S. Tsirkin
2010-06-17 21:26                         ` Paul E. McKenney
2010-06-16  6:16                   ` Eric Dumazet
2010-06-16  5:08               ` Paul E. McKenney
2010-06-16  6:21                 ` Eric Dumazet
2010-06-16 16:01                   ` Paul E. McKenney
2010-07-19 10:19           ` Michael S. Tsirkin
2010-07-19 10:53             ` Herbert Xu
2010-07-19 11:54               ` Herbert Xu
2010-07-19 16:05                 ` David Miller
2010-07-19 16:52                   ` Eric Dumazet
2010-07-19 20:35                     ` David Miller
2010-07-20  5:26                   ` Herbert Xu
2010-07-20  6:28                     ` David Miller
2010-06-29 12:53 ` Yanko Kaneti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).