[BUG] NULL pointer dereference in skb

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [BUG] NULL pointer dereference in skb_dequeue
@ 2008-08-01 23:40 Jeff Kirsher
  2008-08-02  1:03 ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff Kirsher @ 2008-08-01 23:40 UTC (permalink / raw)
  To: David Miller, Linux Netdev List, Emil Tantilov

When trying to change the MTU size during traffic, we get a kernel
panic with the latest net-2.6 tree.

Steps to reproduce:

1. Load the driver, bring network up.
2. Start traffic (in my case on all 4 ports mixed TCP/UDP IPv4/6).
3. Reset the port (ethtool -r) or change the MTU (ifconfig eth0 mtu 9216)

NOTES:
ethtool -r doesn't always panic, you may have to do it multiple times.
Changing MTU seems to always trigger the panic.

Here is the kernel trace:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff8047b8f3>] __skb_dequeue+0x2d/0x35
PGD 0
Oops: 0002 [1] SMP
CPU 2
Modules linked in: igb nfsd lockd exportfs sunrpc pci_slot inet_lro
[last unloaded: igb]
Pid: 13, comm: events/2 Not tainted 2.6.27-rc1-igb #1
RIP: 0010:[<ffffffff8047b8f3>]  [<ffffffff8047b8f3>] __skb_dequeue+0x2d/0x35
RSP: 0018:ffff8801af949de8  EFLAGS: 00010202
RAX: ffff8801af92e8f8 RBX: ffff8801af92e8f8 RCX: ffff88019b4f3280
RDX: 0000000000000000 RSI: ffff88019e1c6cc0 RDI: ffff8801af92e8f8
RBP: ffff8801af92e800 R08: 0000000000000000 R09: ffff880028055be0
R10: ffff88019e1c6cc0 R11: ffffffff8047bab5 R12: ffff8801abdd2440
R13: ffffffff80671500 R14: ffffffff806bb200 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8801af8686c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 000000019b522000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process events/2 (pid: 13, threadinfo ffff8801af948000, task ffff8801af947710)
Stack:  ffffffff8047bad4 00000000ffffc553 ffff8801af92e85c ffff8801af92e800
 ffffffff8047bbbb 0000000000000000 ffff88019e1f0000 0000000000000004
 ffffffff80671500 ffffffff8047bb89 ffffffff8047b928 0000000000000000
Call Trace:
 [<ffffffff8047bad4>] ? pfifo_fast_reset+0x1f/0x4c
 [<ffffffff8047bbbb>] ? dev_deactivate_queue+0x32/0x4a
 [<ffffffff8047bb89>] ? dev_deactivate_queue+0x0/0x4a
 [<ffffffff8047b928>] ? netdev_for_each_tx_queue+0x2d/0x42
 [<ffffffff8047bf3e>] ? dev_deactivate+0x17/0xb1
 [<ffffffff80477d0b>] ? __linkwatch_run_queue+0x149/0x184
 [<ffffffff80477d46>] ? linkwatch_event+0x0/0x30
 [<ffffffff80477d70>] ? linkwatch_event+0x2a/0x30
 [<ffffffff8023eaeb>] ? run_workqueue+0x81/0x10a
 [<ffffffff8023f3e7>] ? worker_thread+0xd3/0xde
 [<ffffffff80241bbe>] ? autoremove_wake_function+0x0/0x2e
 [<ffffffff8023f314>] ? worker_thread+0x0/0xde
 [<ffffffff80241a57>] ? kthread+0x47/0x74
 [<ffffffff8022c410>] ? schedule_tail+0x28/0x60
 [<ffffffff8020c229>] ? child_rip+0xa/0x11
 [<ffffffff80241a10>] ? kthread+0x0/0x74
 [<ffffffff8020c21f>] ? child_rip+0x0/0x11


Code: 0f 48 39 f9 75 04 31 c9 eb 25 48 85 c9 74 20 ff 4f 10 48 8b 11
48 8b 41 08 48 c7 01 00 00 00 00 48 c7 41 08 00 00 00 00 48 89 10 <48>
89 42 08 48 89 c8 c3 41 55 49 89 f5 41 54 49 89 d4 55 31 ed
RIP  [<ffffffff8047b8f3>] __skb_dequeue+0x2d/0x35
 RSP <ffff8801af949de8>
CR2: 0000000000000008
---[ end trace 01618db2121afa07 ]---


-- 
Cheers,
Jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-01 23:40 [BUG] NULL pointer dereference in skb_dequeue Jeff Kirsher
@ 2008-08-02  1:03 ` David Miller
  2008-08-02  1:20   ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: David Miller @ 2008-08-02  1:03 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, emil.s.tantilov

From: "Jeff Kirsher" <jeffrey.t.kirsher@intel.com>
Date: Fri, 1 Aug 2008 16:40:04 -0700

> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: [<ffffffff8047b8f3>] __skb_dequeue+0x2d/0x35
> PGD 0
> Oops: 0002 [1] SMP
> CPU 2
> Modules linked in: igb nfsd lockd exportfs sunrpc pci_slot inet_lro
> [last unloaded: igb]
> Pid: 13, comm: events/2 Not tainted 2.6.27-rc1-igb #1
> RIP: 0010:[<ffffffff8047b8f3>]  [<ffffffff8047b8f3>] __skb_dequeue+0x2d/0x35
 ...
>  [<ffffffff8047bad4>] ? pfifo_fast_reset+0x1f/0x4c

Looks like two threads are accessing the qdisc SKB lists but one of
them isn't taking the proper qdisc locks.

I can't see how this can happen currently but I'll try to figure it
out.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-02  1:03 ` David Miller
@ 2008-08-02  1:20   ` David Miller
  2008-08-02  9:36     ` Tantilov, Emil S
  0 siblings, 1 reply; 28+ messages in thread
From: David Miller @ 2008-08-02  1:20 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, emil.s.tantilov

From: David Miller <davem@davemloft.net>
Date: Fri, 01 Aug 2008 18:03:37 -0700 (PDT)

> Looks like two threads are accessing the qdisc SKB lists but one of
> them isn't taking the proper qdisc locks.
> 
> I can't see how this can happen currently but I'll try to figure it
> out.

I see what's going on.

Once we decide on a root qdisc to process, we shouldn't use
qdisc_root_lock() since that will resample qdisc->dev_queue->qdisc
which might be different.

This points out a core problem, and I might need to add a
root_qdisc backpointer to struct Qdisc to make this all work
out sanely for all cases.

Anyways, please try this patch:

pkt_sched: Use qdisc_lock() on already sampled root qdisc.

Don't use qdisc_root_lock() in these cases as the root
qdisc could have been changed, and we'd thus lock the
wrong object.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 9c9cd4d..113b6b0 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -29,7 +29,7 @@
 /* Main transmission queue. */
 
 /* Modifications to data participating in scheduling must be protected with
- * qdisc_root_lock(qdisc) spinlock.
+ * qdisc_lock(qdisc) spinlock.
  *
  * The idea is the following:
  * - enqueue, dequeue are serialized via qdisc root lock
@@ -126,7 +126,7 @@ static inline int qdisc_restart(struct Qdisc *q)
 	if (unlikely((skb = dequeue_skb(q)) == NULL))
 		return 0;
 
-	root_lock = qdisc_root_lock(q);
+	root_lock = qdisc_lock(q);
 
 	/* And release qdisc */
 	spin_unlock(root_lock);
@@ -659,7 +659,7 @@ static bool some_qdisc_is_running(struct net_device *dev, int lock)
 
 		dev_queue = netdev_get_tx_queue(dev, i);
 		q = dev_queue->qdisc;
-		root_lock = qdisc_root_lock(q);
+		root_lock = qdisc_lock(q);
 
 		if (lock)
 			spin_lock_bh(root_lock);

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* RE: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-02  1:20   ` David Miller
@ 2008-08-02  9:36     ` Tantilov, Emil S
  2008-08-02 13:37       ` Jarek Poplawski
  0 siblings, 1 reply; 28+ messages in thread
From: Tantilov, Emil S @ 2008-08-02  9:36 UTC (permalink / raw)
  To: David Miller, Kirsher, Jeffrey T; +Cc: netdev@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 877 bytes --]

David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Fri, 01 Aug 2008 18:03:37 -0700 (PDT)
>
>> Looks like two threads are accessing the qdisc SKB lists but one of
>> them isn't taking the proper qdisc locks.
>>
>> I can't see how this can happen currently but I'll try to figure it
>> out.
>
> I see what's going on.
>
> Once we decide on a root qdisc to process, we shouldn't use
> qdisc_root_lock() since that will resample qdisc->dev_queue->qdisc
> which might be different.
>
> This points out a core problem, and I might need to add a
> root_qdisc backpointer to struct Qdisc to make this all work
> out sanely for all cases.
>
> Anyways, please try this patch:

Still panics. Survived few MTU changes, but eventually I got this (see attached file - sorry for the partial dump, but that's all I can do remotely).

Thanks,
Emil

[-- Attachment #2: skb_dequeue.jpg --]
[-- Type: image/jpeg, Size: 94991 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-02  9:36     ` Tantilov, Emil S
@ 2008-08-02 13:37       ` Jarek Poplawski
  2008-08-02 16:27         ` Jarek Poplawski
  0 siblings, 1 reply; 28+ messages in thread
From: Jarek Poplawski @ 2008-08-02 13:37 UTC (permalink / raw)
  To: Tantilov, Emil S; +Cc: David Miller, Kirsher, Jeffrey T, netdev@vger.kernel.org

Tantilov, Emil S wrote, On 08/02/2008 11:36 AM:

> David Miller wrote:
...
>> Once we decide on a root qdisc to process, we shouldn't use
>> qdisc_root_lock() since that will resample qdisc->dev_queue->qdisc
>> which might be different.
>>
>> This points out a core problem, and I might need to add a
>> root_qdisc backpointer to struct Qdisc to make this all work
>> out sanely for all cases.
>>
>> Anyways, please try this patch:
> 
> Still panics. Survived few MTU changes, but eventually I got this (see attached file - sorry for the partial dump, but that's all I can do remotely).
> 

I guess this "root lock" has to go back to netdev_queue. Alas, I can't
test this, so if it's not a big problem maybe you could try this patch
before David goes back to this? (His patch should be removed before
using this one.)

Thanks,
Jarek P.

---

 include/linux/netdevice.h |    1 +
 include/net/sch_generic.h |    4 +---
 net/core/dev.c            |    1 +
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ee583f6..5c32b70 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -447,6 +447,7 @@ struct netdev_queue {
 	struct net_device	*dev;
 	struct Qdisc		*qdisc;
 	unsigned long		state;
+	spinlock_t		queue_lock;
 	spinlock_t		_xmit_lock;
 	int			xmit_lock_owner;
 	struct Qdisc		*qdisc_sleeping;
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index b5f40d7..97ea112 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -195,9 +195,7 @@ static inline struct Qdisc *qdisc_root(struct Qdisc *qdisc)
 
 static inline spinlock_t *qdisc_root_lock(struct Qdisc *qdisc)
 {
-	struct Qdisc *root = qdisc_root(qdisc);
-
-	return qdisc_lock(root);
+	return &qdisc->dev_queue->queue_lock;
 }
 
 static inline struct net_device *qdisc_dev(struct Qdisc *qdisc)
diff --git a/net/core/dev.c b/net/core/dev.c
index 69320a5..73f3a65 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3861,6 +3861,7 @@ static void __netdev_init_queue_locks_one(struct net_device *dev,
 					  struct netdev_queue *dev_queue,
 					  void *_unused)
 {
+	spin_lock_init(&dev_queue->queue_lock);
 	spin_lock_init(&dev_queue->_xmit_lock);
 	netdev_set_xmit_lockdep_class(&dev_queue->_xmit_lock, dev->type);
 	dev_queue->xmit_lock_owner = -1;

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-02 13:37       ` Jarek Poplawski
@ 2008-08-02 16:27         ` Jarek Poplawski
  2008-08-02 19:18           ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: Jarek Poplawski @ 2008-08-02 16:27 UTC (permalink / raw)
  To: Tantilov, Emil S; +Cc: David Miller, Kirsher, Jeffrey T, netdev@vger.kernel.org

On Sat, Aug 02, 2008 at 03:37:19PM +0200, Jarek Poplawski wrote:
...
> I guess this "root lock" has to go back to netdev_queue. Alas, I can't
> test this, so if it's not a big problem maybe you could try this patch
> before David goes back to this? (His patch should be removed before
> using this one.)

Actually, this patch was incomplete, sorry. Here is a better one,
I hope. But of course, now, even better is to wait for David's
proposal.

Jarek P.

(take 2)
---

 include/linux/netdevice.h |    1 +
 include/net/sch_generic.h |    4 +---
 net/core/dev.c            |    5 +++--
 net/sched/sch_generic.c   |    4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ee583f6..5c32b70 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -447,6 +447,7 @@ struct netdev_queue {
 	struct net_device	*dev;
 	struct Qdisc		*qdisc;
 	unsigned long		state;
+	spinlock_t		queue_lock;
 	spinlock_t		_xmit_lock;
 	int			xmit_lock_owner;
 	struct Qdisc		*qdisc_sleeping;
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index b5f40d7..97ea112 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -195,9 +195,7 @@ static inline struct Qdisc *qdisc_root(struct Qdisc *qdisc)
 
 static inline spinlock_t *qdisc_root_lock(struct Qdisc *qdisc)
 {
-	struct Qdisc *root = qdisc_root(qdisc);
-
-	return qdisc_lock(root);
+	return &qdisc->dev_queue->queue_lock;
 }
 
 static inline struct net_device *qdisc_dev(struct Qdisc *qdisc)
diff --git a/net/core/dev.c b/net/core/dev.c
index 69320a5..79fe03e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2101,9 +2101,9 @@ static int ing_filter(struct sk_buff *skb)
 
 	q = rxq->qdisc;
 	if (q != &noop_qdisc) {
-		spin_lock(qdisc_lock(q));
+		spin_lock(qdisc_root_lock(q));
 		result = qdisc_enqueue_root(skb, q);
-		spin_unlock(qdisc_lock(q));
+		spin_unlock(qdisc_root_lock(q));
 	}
 
 	return result;
@@ -3861,6 +3861,7 @@ static void __netdev_init_queue_locks_one(struct net_device *dev,
 					  struct netdev_queue *dev_queue,
 					  void *_unused)
 {
+	spin_lock_init(&dev_queue->queue_lock);
 	spin_lock_init(&dev_queue->_xmit_lock);
 	netdev_set_xmit_lockdep_class(&dev_queue->_xmit_lock, dev->type);
 	dev_queue->xmit_lock_owner = -1;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 9c9cd4d..330ea54 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -638,12 +638,12 @@ static void dev_deactivate_queue(struct net_device *dev,
 
 	qdisc = dev_queue->qdisc;
 	if (qdisc) {
-		spin_lock_bh(qdisc_lock(qdisc));
+		spin_lock_bh(qdisc_root_lock(qdisc));
 
 		dev_queue->qdisc = qdisc_default;
 		qdisc_reset(qdisc);
 
-		spin_unlock_bh(qdisc_lock(qdisc));
+		spin_unlock_bh(qdisc_root_lock(qdisc));
 	}
 }
 

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-02 16:27         ` Jarek Poplawski
@ 2008-08-02 19:18           ` David Miller
  2008-08-02 19:22             ` David Miller
  2008-08-02 20:19             ` Jarek Poplawski
  0 siblings, 2 replies; 28+ messages in thread
From: David Miller @ 2008-08-02 19:18 UTC (permalink / raw)
  To: jarkao2; +Cc: emil.s.tantilov, jeffrey.t.kirsher, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Sat, 2 Aug 2008 18:27:33 +0200

> On Sat, Aug 02, 2008 at 03:37:19PM +0200, Jarek Poplawski wrote:
> ...
> > I guess this "root lock" has to go back to netdev_queue. Alas, I can't
> > test this, so if it's not a big problem maybe you could try this patch
> > before David goes back to this? (His patch should be removed before
> > using this one.)
> 
> Actually, this patch was incomplete, sorry. Here is a better one,
> I hope. But of course, now, even better is to wait for David's
> proposal.

Jarek, we can't put the root lock back into the netdev_queue, it
breaks all of the RCU handling of qdisc_destroy() which is fundamental
for how all the multiqueue stuff works now.

See my other emails, it isn't even necessary anyways.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-02 19:18           ` David Miller
@ 2008-08-02 19:22             ` David Miller
  2008-08-02 19:45               ` Tantilov, Emil S
  2008-08-02 20:19             ` Jarek Poplawski
  1 sibling, 1 reply; 28+ messages in thread
From: David Miller @ 2008-08-02 19:22 UTC (permalink / raw)
  To: jarkao2; +Cc: emil.s.tantilov, jeffrey.t.kirsher, netdev

From: David Miller <davem@davemloft.net>
Date: Sat, 02 Aug 2008 12:18:15 -0700 (PDT)

> See my other emails, it isn't even necessary anyways.

Sorry, because some idiot took part of the conversation private
my follow-on fixup patch didn't make it to the list, here it is.

It goes on top of the original patch I sent out:

diff --git a/net/core/dev.c b/net/core/dev.c
index 69320a5..da7acac 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1796,7 +1796,7 @@ gso:
 	skb->tc_verd = SET_TC_AT(skb->tc_verd,AT_EGRESS);
 #endif
 	if (q->enqueue) {
-		spinlock_t *root_lock = qdisc_root_lock(q);
+		spinlock_t *root_lock = qdisc_lock(q);
 
 		spin_lock(root_lock);
 
@@ -1995,7 +1995,7 @@ static void net_tx_action(struct softirq_action *h)
 			smp_mb__before_clear_bit();
 			clear_bit(__QDISC_STATE_SCHED, &q->state);
 
-			root_lock = qdisc_root_lock(q);
+			root_lock = qdisc_lock(q);
 			if (spin_trylock(root_lock)) {
 				qdisc_run(q);
 				spin_unlock(root_lock);

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* RE: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-02 19:22             ` David Miller
@ 2008-08-02 19:45               ` Tantilov, Emil S
  2008-08-02 21:46                 ` Tantilov, Emil S
  0 siblings, 1 reply; 28+ messages in thread
From: Tantilov, Emil S @ 2008-08-02 19:45 UTC (permalink / raw)
  To: David Miller, jarkao2@gmail.com
  Cc: Kirsher, Jeffrey T, netdev@vger.kernel.org

Still no luck.

I was able to reset the interfaces and change the MTU multiple times until eventually the system froze. No trace.

Emil

-----Original Message-----
From: David Miller [mailto:davem@davemloft.net]
Sent: Saturday, August 02, 2008 12:23 PM
To: jarkao2@gmail.com
Cc: Tantilov, Emil S; Kirsher, Jeffrey T; netdev@vger.kernel.org
Subject: Re: [BUG] NULL pointer dereference in skb_dequeue

From: David Miller <davem@davemloft.net>
Date: Sat, 02 Aug 2008 12:18:15 -0700 (PDT)

> See my other emails, it isn't even necessary anyways.

Sorry, because some idiot took part of the conversation private
my follow-on fixup patch didn't make it to the list, here it is.

It goes on top of the original patch I sent out:

diff --git a/net/core/dev.c b/net/core/dev.c
index 69320a5..da7acac 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1796,7 +1796,7 @@ gso:
        skb->tc_verd = SET_TC_AT(skb->tc_verd,AT_EGRESS);
 #endif
        if (q->enqueue) {
-               spinlock_t *root_lock = qdisc_root_lock(q);
+               spinlock_t *root_lock = qdisc_lock(q);

                spin_lock(root_lock);

@@ -1995,7 +1995,7 @@ static void net_tx_action(struct softirq_action *h)
                        smp_mb__before_clear_bit();
                        clear_bit(__QDISC_STATE_SCHED, &q->state);

-                       root_lock = qdisc_root_lock(q);
+                       root_lock = qdisc_lock(q);
                        if (spin_trylock(root_lock)) {
                                qdisc_run(q);
                                spin_unlock(root_lock);

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-02 19:18           ` David Miller
  2008-08-02 19:22             ` David Miller
@ 2008-08-02 20:19             ` Jarek Poplawski
  2008-08-03  9:29               ` Jarek Poplawski
  1 sibling, 1 reply; 28+ messages in thread
From: Jarek Poplawski @ 2008-08-02 20:19 UTC (permalink / raw)
  To: David Miller; +Cc: emil.s.tantilov, jeffrey.t.kirsher, netdev

On Sat, Aug 02, 2008 at 12:18:15PM -0700, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Sat, 2 Aug 2008 18:27:33 +0200
> 
> > On Sat, Aug 02, 2008 at 03:37:19PM +0200, Jarek Poplawski wrote:
> > ...
> > > I guess this "root lock" has to go back to netdev_queue. Alas, I can't
> > > test this, so if it's not a big problem maybe you could try this patch
> > > before David goes back to this? (His patch should be removed before
> > > using this one.)
> > 
> > Actually, this patch was incomplete, sorry. Here is a better one,
> > I hope. But of course, now, even better is to wait for David's
> > proposal.
> 
> Jarek, we can't put the root lock back into the netdev_queue, it
> breaks all of the RCU handling of qdisc_destroy() which is fundamental
> for how all the multiqueue stuff works now.
> 
> See my other emails, it isn't even necessary anyways.

My patch was intended to check this "the hard way" since the subtle
method didn't work. I expected the final fix could be different.

Thanks for the explanations: BTW, I think, the qdisc_root_lock is a
bit misleading name, if qdisc_lock is also used for taking root lock.
Of course, this all makes sense, it simply needs more checking.

Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-02 19:45               ` Tantilov, Emil S
@ 2008-08-02 21:46                 ` Tantilov, Emil S
  2008-08-03  2:26                   ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: Tantilov, Emil S @ 2008-08-02 21:46 UTC (permalink / raw)
  To: Tantilov, Emil S, David Miller, jarkao2@gmail.com
  Cc: Kirsher, Jeffrey T, netdev@vger.kernel.org

Tantilov, Emil S wrote:
> Still no luck.
>
> I was able to reset the interfaces and change the MTU multiple times
> until eventually the system froze. No trace.
>
> Emil
>
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Saturday, August 02, 2008 12:23 PM
> To: jarkao2@gmail.com
> Cc: Tantilov, Emil S; Kirsher, Jeffrey T; netdev@vger.kernel.org
> Subject: Re: [BUG] NULL pointer dereference in skb_dequeue
>
> From: David Miller <davem@davemloft.net>
> Date: Sat, 02 Aug 2008 12:18:15 -0700 (PDT)
>
>> See my other emails, it isn't even necessary anyways.
>
> Sorry, because some idiot took part of the conversation private
> my follow-on fixup patch didn't make it to the list, here it is.
>
> It goes on top of the original patch I sent out:
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 69320a5..da7acac 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1796,7 +1796,7 @@ gso:
>         skb->tc_verd = SET_TC_AT(skb->tc_verd,AT_EGRESS);
>  #endif
>         if (q->enqueue) {
> -               spinlock_t *root_lock = qdisc_root_lock(q);
> +               spinlock_t *root_lock = qdisc_lock(q);
>
>                 spin_lock(root_lock);
>
> @@ -1995,7 +1995,7 @@ static void net_tx_action(struct softirq_action
>                         *h) smp_mb__before_clear_bit();
>                         clear_bit(__QDISC_STATE_SCHED, &q->state);
>
> -                       root_lock = qdisc_root_lock(q);
> +                       root_lock = qdisc_lock(q);
>                         if (spin_trylock(root_lock)) {
>                                 qdisc_run(q);
>                                 spin_unlock(root_lock);

Disregard my previous emails. This patch seems to fix the panic. I will leave a longer term stress test running to make sure it'stable.

Thanks,
Emil

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-02 21:46                 ` Tantilov, Emil S
@ 2008-08-03  2:26                   ` David Miller
  2008-08-08 19:38                     ` Tantilov, Emil S
  0 siblings, 1 reply; 28+ messages in thread
From: David Miller @ 2008-08-03  2:26 UTC (permalink / raw)
  To: emil.s.tantilov; +Cc: jarkao2, jeffrey.t.kirsher, netdev

From: "Tantilov, Emil S" <emil.s.tantilov@intel.com>
Date: Sat, 2 Aug 2008 15:46:13 -0600

> Disregard my previous emails. This patch seems to fix the panic. I
> will leave a longer term stress test running to make sure it'stable.

Thanks for testing, I'll commit the final patch and get it to
Linus.

Thanks again.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-02 20:19             ` Jarek Poplawski
@ 2008-08-03  9:29               ` Jarek Poplawski
  2008-08-03  9:50                 ` Jarek Poplawski
  2008-08-03  9:56                 ` David Miller
  0 siblings, 2 replies; 28+ messages in thread
From: Jarek Poplawski @ 2008-08-03  9:29 UTC (permalink / raw)
  To: David Miller; +Cc: emil.s.tantilov, jeffrey.t.kirsher, netdev

On Sat, Aug 02, 2008 at 10:19:44PM +0200, Jarek Poplawski wrote:
> On Sat, Aug 02, 2008 at 12:18:15PM -0700, David Miller wrote:
...
> > Jarek, we can't put the root lock back into the netdev_queue, it
> > breaks all of the RCU handling of qdisc_destroy() which is fundamental
> > for how all the multiqueue stuff works now.
> > 
> > See my other emails, it isn't even necessary anyways.
...
> Thanks for the explanations: BTW, I think, the qdisc_root_lock is a
> bit misleading name, if qdisc_lock is also used for taking root lock.
> Of course, this all makes sense, it simply needs more checking.

After some re-checking one more question: why do you think this
qdisc_root_lock() is safe as sch_tree_lock() (or anywhere else)? It
seems, eg. during deactivation it can get root_lock of qdisc_default,
and proceed with another qdisc?

Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-03  9:29               ` Jarek Poplawski
@ 2008-08-03  9:50                 ` Jarek Poplawski
  2008-08-03  9:56                 ` David Miller
  1 sibling, 0 replies; 28+ messages in thread
From: Jarek Poplawski @ 2008-08-03  9:50 UTC (permalink / raw)
  To: David Miller; +Cc: emil.s.tantilov, jeffrey.t.kirsher, netdev

On Sun, Aug 03, 2008 at 11:29:26AM +0200, Jarek Poplawski wrote:
...
> After some re-checking one more question: why do you think this
> qdisc_root_lock() is safe as sch_tree_lock() (or anywhere else)? It
> seems, eg. during deactivation it can get root_lock of qdisc_default,
> and proceed with another qdisc?

I've just seen this new patch with ASSERT_RTNL(), so now I understand.

Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-03  9:29               ` Jarek Poplawski
  2008-08-03  9:50                 ` Jarek Poplawski
@ 2008-08-03  9:56                 ` David Miller
  2008-08-03 10:08                   ` Jarek Poplawski
  1 sibling, 1 reply; 28+ messages in thread
From: David Miller @ 2008-08-03  9:56 UTC (permalink / raw)
  To: jarkao2; +Cc: emil.s.tantilov, jeffrey.t.kirsher, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Sun, 3 Aug 2008 11:29:26 +0200

> After some re-checking one more question: why do you think this
> qdisc_root_lock() is safe as sch_tree_lock() (or anywhere else)? It
> seems, eg. during deactivation it can get root_lock of qdisc_default,
> and proceed with another qdisc?

We hold RTNL at the time.

If it's the default qdisc, that's fine, we'll reset it and free
up the packets when the RCU handler of the qdisc_destroy() runs.

That's a case where locking the wrong qdisc is OK.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-03  9:56                 ` David Miller
@ 2008-08-03 10:08                   ` Jarek Poplawski
  0 siblings, 0 replies; 28+ messages in thread
From: Jarek Poplawski @ 2008-08-03 10:08 UTC (permalink / raw)
  To: David Miller; +Cc: emil.s.tantilov, jeffrey.t.kirsher, netdev

On Sun, Aug 03, 2008 at 02:56:16AM -0700, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Sun, 3 Aug 2008 11:29:26 +0200
> 
> > After some re-checking one more question: why do you think this
> > qdisc_root_lock() is safe as sch_tree_lock() (or anywhere else)? It
> > seems, eg. during deactivation it can get root_lock of qdisc_default,
> > and proceed with another qdisc?
> 
> We hold RTNL at the time.
> 
> If it's the default qdisc, that's fine, we'll reset it and free
> up the packets when the RCU handler of the qdisc_destroy() runs.
> 
> That's a case where locking the wrong qdisc is OK.

Right! Thanks again,

Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-03  2:26                   ` David Miller
@ 2008-08-08 19:38                     ` Tantilov, Emil S
  2008-08-09  7:29                       ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: Tantilov, Emil S @ 2008-08-08 19:38 UTC (permalink / raw)
  To: David Miller
  Cc: jarkao2@gmail.com, Kirsher, Jeffrey T, netdev@vger.kernel.org

David Miller wrote:
> From: "Tantilov, Emil S" <emil.s.tantilov@intel.com>
> Date: Sat, 2 Aug 2008 15:46:13 -0600
>
>> Disregard my previous emails. This patch seems to fix the panic. I
>> will leave a longer term stress test running to make sure it'stable.
>
> Thanks for testing, I'll commit the final patch and get it to
> Linus.
>
> Thanks again.

I ran into similar panic again. This time on driver reload while passing traffic.

igb: probe of 0000:0c:00.0 failed with error -2
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: [<ffffffff8047bab8>] __skb_dequeue+0x2a/0x35
PGD 0
Oops: 0002 [1] SMP
CPU 1
Modules linked in: igb(-) nfsd lockd exportfs sunrpc pci_slot inet_lro [last unloaded: igb]
Pid: 0, comm: swapper Not tainted 2.6.27-rc2-igb #2
RIP: 0010:[<ffffffff8047bab8>]  [<ffffffff8047bab8>] __skb_dequeue+0x2a/0x35
RSP: 0018:ffff8801af89beb8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88019a5bf6f8 RCX: ffff88018bb01180
RDX: ffff88019a5bf6f8 RSI: ffff88019a5bf6a4 RDI: ffff88019a5bf6f8
RBP: ffff88019a5bf600 R08: ffff880028042140 R09: ffff8801af88f110
R10: ffff880028043d40 R11: ffffffff8021aea9 R12: ffffffff80686c00
R13: 0000000000000008 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8801af868c40(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff8801af896000, task ffff8801af88f110)
Stack:  ffffffff8047bc9c 0000000000000008 ffff88019a5bf600 ffff88019a5bf6b0
 ffffffff8047c6a6 ffff880028043d40 ffff88019ac294b0 0000000000000002
 ffffffff80259b78 0000000000000001 ffffffff8067d0c0 000000000000000a
Call Trace:
 <IRQ>  [<ffffffff8047bc9c>] ? pfifo_fast_reset+0x1f/0x4c
 [<ffffffff8047c6a6>] ? __qdisc_destroy+0x43/0x8b
 [<ffffffff80259b78>] ? __rcu_process_callbacks+0x107/0x16a
 [<ffffffff80259bfe>] ? rcu_process_callbacks+0x23/0x43
 [<ffffffff80234f94>] ? __do_softirq+0x53/0xa0
 [<ffffffff8020c58c>] ? call_softirq+0x1c/0x28
 [<ffffffff8020dd91>] ? do_softirq+0x2c/0x6b
 [<ffffffff80234e98>] ? irq_exit+0x3f/0x90
 [<ffffffff8021b222>] ? smp_apic_timer_interrupt+0x76/0x81
 [<ffffffff8020bfd6>] ? apic_timer_interrupt+0x66/0x70
 <EOI>  [<ffffffff80211668>] ? mwait_idle+0x35/0x38
 [<ffffffff80209e0a>] ? cpu_idle+0x69/0x9b


Code: c3 48 8b 0f 48 39 f9 75 04 31 c9 eb 25 48 85 c9 74 20 ff 4f 10 48 8b 11 48 8b 41 08 48 c7 01 00 00 00 00 48 c7 41 08 00 00 00 00 <48> 89 10 48 89 42 08 48 89 c8 c3 41 55 49 89 f5 41 54 49 89 d4
RIP  [<ffffffff8047bab8>] __skb_dequeue+0x2a/0x35
 RSP <ffff8801af89beb8>
CR2: 0000000000000000
---[ end trace ccfa2c2f2295fda5 ]---
Kernel panic - not syncing: Aiee, killing interrupt handler

And another one (incomplete):

BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
IP: [<ffffffff804ebd9b>] _spin_lock+0x5/0x15

Thanks,
Emil

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-08 19:38                     ` Tantilov, Emil S
@ 2008-08-09  7:29                       ` David Miller
  2008-08-09 22:32                         ` Jarek Poplawski
  0 siblings, 1 reply; 28+ messages in thread
From: David Miller @ 2008-08-09  7:29 UTC (permalink / raw)
  To: emil.s.tantilov; +Cc: jarkao2, jeffrey.t.kirsher, netdev

From: "Tantilov, Emil S" <emil.s.tantilov@intel.com>
Date: Fri, 8 Aug 2008 13:38:32 -0600

> I ran into similar panic again. This time on driver reload while
> passing traffic.

Thanks.  I think I know what's going on here but coming up
with a fix will take a little bit longer than usual.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-09  7:29                       ` David Miller
@ 2008-08-09 22:32                         ` Jarek Poplawski
  2008-08-10 19:04                           ` Jarek Poplawski
  0 siblings, 1 reply; 28+ messages in thread
From: Jarek Poplawski @ 2008-08-09 22:32 UTC (permalink / raw)
  To: David Miller; +Cc: emil.s.tantilov, jeffrey.t.kirsher, netdev

On Sat, Aug 09, 2008 at 12:29:56AM -0700, David Miller wrote:
> From: "Tantilov, Emil S" <emil.s.tantilov@intel.com>
> Date: Fri, 8 Aug 2008 13:38:32 -0600
> 
> > I ran into similar panic again. This time on driver reload while
> > passing traffic.
> 
> Thanks.  I think I know what's going on here but coming up
> with a fix will take a little bit longer than usual.

I guess you're thinking about something bigger, but here is a little,
maybe unrelated idea, which I think looks reasonable: after changing
the qdiscs, we should care more about pending activities on the "real"
one instead of the noop.

Jarek P.
---

 net/sched/sch_generic.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 7cf83b3..7bce88d 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -658,7 +658,7 @@ static bool some_qdisc_is_running(struct net_device *dev, int lock)
 		int val;
 
 		dev_queue = netdev_get_tx_queue(dev, i);
-		q = dev_queue->qdisc;
+		q = dev_queue->qdisc_sleeping;
 		root_lock = qdisc_lock(q);
 
 		if (lock)

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-09 22:32                         ` Jarek Poplawski
@ 2008-08-10 19:04                           ` Jarek Poplawski
  2008-08-11 10:01                             ` Jarek Poplawski
  0 siblings, 1 reply; 28+ messages in thread
From: Jarek Poplawski @ 2008-08-10 19:04 UTC (permalink / raw)
  To: David Miller; +Cc: emil.s.tantilov, jeffrey.t.kirsher, netdev

On Sun, Aug 10, 2008 at 12:32:40AM +0200, Jarek Poplawski wrote:
...
> I guess you're thinking about something bigger, but here is a little,
> maybe unrelated idea, which I think looks reasonable: after changing
> the qdiscs, we should care more about pending activities on the "real"
> one instead of the noop.

Hmm.. Actually, it's completely unreasonable. Let's forget this.

Jarek P.

> ---
> 
>  net/sched/sch_generic.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 7cf83b3..7bce88d 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -658,7 +658,7 @@ static bool some_qdisc_is_running(struct net_device *dev, int lock)
>  		int val;
>  
>  		dev_queue = netdev_get_tx_queue(dev, i);
> -		q = dev_queue->qdisc;
> +		q = dev_queue->qdisc_sleeping;
>  		root_lock = qdisc_lock(q);
>  
>  		if (lock)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-10 19:04                           ` Jarek Poplawski
@ 2008-08-11 10:01                             ` Jarek Poplawski
  2008-08-11 23:26                               ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Jarek Poplawski @ 2008-08-11 10:01 UTC (permalink / raw)
  To: David Miller; +Cc: emil.s.tantilov, jeffrey.t.kirsher, netdev

On 10-08-2008 21:04, Jarek Poplawski wrote:
...
> Hmm.. Actually, it's completely unreasonable. Let's forget this.

But accidentally it might even sometimes work here...

Currently, the most suspicious place to me seems to be
__netif_schedule(). Is it legal to store RCU protected pointers out of
rcu_read_lock() sections?

Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-11 10:01                             ` Jarek Poplawski
@ 2008-08-11 23:26                               ` Paul E. McKenney
  2008-08-12  6:36                                 ` Jarek Poplawski
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2008-08-11 23:26 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, emil.s.tantilov, jeffrey.t.kirsher, netdev

On Mon, Aug 11, 2008 at 10:01:26AM +0000, Jarek Poplawski wrote:
> On 10-08-2008 21:04, Jarek Poplawski wrote:
> ...
> > Hmm.. Actually, it's completely unreasonable. Let's forget this.
> 
> But accidentally it might even sometimes work here...
> 
> Currently, the most suspicious place to me seems to be
> __netif_schedule(). Is it legal to store RCU protected pointers out of
> rcu_read_lock() sections?

Yes, but:

1.	You need to use one of the update-side primitives to do the
	store: rcu_assign_pointer(), list_add_rcu(), etc.

2.	There has to be some way for multiple updaters to coordinate,
	for example:

	a.	Only a single task is permitted to update.

	b.	Locking is used to coordinate among multiple updaters
		(so that only one such updater may proceed at a given
		time).

	c.	Atomic operations are used to coordinate multiple
		updaters.  Here be dragons, go for (a) or (b)
		instead unless you have an extremely good reason
		-and- you have both a proof of correctness and
		a totally brutal and malign test suite.

The main reason to update RCU-protected pointers within rcu_read_lock()
regions is when sharing code between RCU readers and updaters, or when
an RCU reader can see the need to do an update.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-11 23:26                               ` Paul E. McKenney
@ 2008-08-12  6:36                                 ` Jarek Poplawski
  2008-08-12 13:42                                   ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Jarek Poplawski @ 2008-08-12  6:36 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: David Miller, emil.s.tantilov, jeffrey.t.kirsher, netdev

On Mon, Aug 11, 2008 at 04:26:57PM -0700, Paul E. McKenney wrote:
> On Mon, Aug 11, 2008 at 10:01:26AM +0000, Jarek Poplawski wrote:
> > On 10-08-2008 21:04, Jarek Poplawski wrote:
> > ...
> > > Hmm.. Actually, it's completely unreasonable. Let's forget this.
> > 
> > But accidentally it might even sometimes work here...
> > 
> > Currently, the most suspicious place to me seems to be
> > __netif_schedule(). Is it legal to store RCU protected pointers out of
> > rcu_read_lock() sections?
> 
> Yes, but:
> 
> 1.	You need to use one of the update-side primitives to do the
> 	store: rcu_assign_pointer(), list_add_rcu(), etc.
> 
> 2.	There has to be some way for multiple updaters to coordinate,
> 	for example:
> 
> 	a.	Only a single task is permitted to update.
> 
> 	b.	Locking is used to coordinate among multiple updaters
> 		(so that only one such updater may proceed at a given
> 		time).
> 
> 	c.	Atomic operations are used to coordinate multiple
> 		updaters.  Here be dragons, go for (a) or (b)
> 		instead unless you have an extremely good reason
> 		-and- you have both a proof of correctness and
> 		a totally brutal and malign test suite.
> 
> The main reason to update RCU-protected pointers within rcu_read_lock()
> regions is when sharing code between RCU readers and updaters, or when
> an RCU reader can see the need to do an update.

Sure, but I'm concerned here with pure RCU reading:

>From net/sched/sch_generic.c:

void __qdisc_run(struct Qdisc *q)
{
        unsigned long start_time = jiffies;

        while (qdisc_restart(q)) {
                /*
                 * Postpone processing if
                 * 1. another process needs the CPU;
                 * 2. we've been doing it for too long.
                 */
                if (need_resched() || jiffies != start_time) {
                        __netif_schedule(q);

This function is run from dev_queue_xmit() (net/core/dev.c) under
rcu_read_lock_bh(), and this "q" pointer is passed here for later use
(reading) by softirq run net_tx_action(). Alas in net/ RCU primitives
are probably omitted in a few places...

Thanks for the explanation,
Jarek P.

                        break;
                }
        }

        clear_bit(__QDISC_STATE_RUNNING, &q->state);
}

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-12  6:36                                 ` Jarek Poplawski
@ 2008-08-12 13:42                                   ` Paul E. McKenney
  2008-08-12 18:09                                     ` Jarek Poplawski
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2008-08-12 13:42 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, emil.s.tantilov, jeffrey.t.kirsher, netdev

On Tue, Aug 12, 2008 at 06:36:22AM +0000, Jarek Poplawski wrote:
> On Mon, Aug 11, 2008 at 04:26:57PM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 11, 2008 at 10:01:26AM +0000, Jarek Poplawski wrote:
> > > On 10-08-2008 21:04, Jarek Poplawski wrote:
> > > ...
> > > > Hmm.. Actually, it's completely unreasonable. Let's forget this.
> > > 
> > > But accidentally it might even sometimes work here...
> > > 
> > > Currently, the most suspicious place to me seems to be
> > > __netif_schedule(). Is it legal to store RCU protected pointers out of
> > > rcu_read_lock() sections?
> > 
> > Yes, but:
> > 
> > 1.	You need to use one of the update-side primitives to do the
> > 	store: rcu_assign_pointer(), list_add_rcu(), etc.
> > 
> > 2.	There has to be some way for multiple updaters to coordinate,
> > 	for example:
> > 
> > 	a.	Only a single task is permitted to update.
> > 
> > 	b.	Locking is used to coordinate among multiple updaters
> > 		(so that only one such updater may proceed at a given
> > 		time).
> > 
> > 	c.	Atomic operations are used to coordinate multiple
> > 		updaters.  Here be dragons, go for (a) or (b)
> > 		instead unless you have an extremely good reason
> > 		-and- you have both a proof of correctness and
> > 		a totally brutal and malign test suite.
> > 
> > The main reason to update RCU-protected pointers within rcu_read_lock()
> > regions is when sharing code between RCU readers and updaters, or when
> > an RCU reader can see the need to do an update.
> 
> Sure, but I'm concerned here with pure RCU reading:
> 
> >From net/sched/sch_generic.c:
> 
> void __qdisc_run(struct Qdisc *q)
> {
>         unsigned long start_time = jiffies;
> 
>         while (qdisc_restart(q)) {
>                 /*
>                  * Postpone processing if
>                  * 1. another process needs the CPU;
>                  * 2. we've been doing it for too long.
>                  */
>                 if (need_resched() || jiffies != start_time) {
>                         __netif_schedule(q);
> 
> This function is run from dev_queue_xmit() (net/core/dev.c) under
> rcu_read_lock_bh(), and this "q" pointer is passed here for later use
> (reading) by softirq run net_tx_action(). Alas in net/ RCU primitives
> are probably omitted in a few places...

If I understand this code, one way to handle it would be to increment
q->refcnt before passing to netif_schedule(), then decrementing it
(within an RCU read-side critical section) in the softirq handler.

There are probably other ways to handle this as well.

							Thanx, Paul

> Thanks for the explanation,
> Jarek P.
> 
>                         break;
>                 }
>         }
> 
>         clear_bit(__QDISC_STATE_RUNNING, &q->state);
> }

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-12 13:42                                   ` Paul E. McKenney
@ 2008-08-12 18:09                                     ` Jarek Poplawski
  2008-08-12 20:18                                       ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Jarek Poplawski @ 2008-08-12 18:09 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: David Miller, emil.s.tantilov, jeffrey.t.kirsher, netdev

On Tue, Aug 12, 2008 at 06:42:24AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 12, 2008 at 06:36:22AM +0000, Jarek Poplawski wrote:
...
> > >From net/sched/sch_generic.c:
> > 
> > void __qdisc_run(struct Qdisc *q)
> > {
> >         unsigned long start_time = jiffies;
> > 
> >         while (qdisc_restart(q)) {
> >                 /*
> >                  * Postpone processing if
> >                  * 1. another process needs the CPU;
> >                  * 2. we've been doing it for too long.
> >                  */
> >                 if (need_resched() || jiffies != start_time) {
> >                         __netif_schedule(q);
> > 
> > This function is run from dev_queue_xmit() (net/core/dev.c) under
> > rcu_read_lock_bh(), and this "q" pointer is passed here for later use
> > (reading) by softirq run net_tx_action(). Alas in net/ RCU primitives
> > are probably omitted in a few places...
> 
> If I understand this code, one way to handle it would be to increment
> q->refcnt before passing to netif_schedule(), then decrementing it
> (within an RCU read-side critical section) in the softirq handler.
> 
> There are probably other ways to handle this as well.

I understand this similarly (but I'm still trying to find out what's
wrong with reading this again in a separate read-side section).

David gave some additional explanations (which BTW don't look to me
like very "orthodox" RCU) in this thread:
http://marc.info/?l=linux-netdev&m=121851847805942&w=2

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-12 18:09                                     ` Jarek Poplawski
@ 2008-08-12 20:18                                       ` Paul E. McKenney
  2008-08-12 21:15                                         ` Jarek Poplawski
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2008-08-12 20:18 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, emil.s.tantilov, jeffrey.t.kirsher, netdev

On Tue, Aug 12, 2008 at 08:09:27PM +0200, Jarek Poplawski wrote:
> On Tue, Aug 12, 2008 at 06:42:24AM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 12, 2008 at 06:36:22AM +0000, Jarek Poplawski wrote:
> ...
> > > >From net/sched/sch_generic.c:
> > > 
> > > void __qdisc_run(struct Qdisc *q)
> > > {
> > >         unsigned long start_time = jiffies;
> > > 
> > >         while (qdisc_restart(q)) {
> > >                 /*
> > >                  * Postpone processing if
> > >                  * 1. another process needs the CPU;
> > >                  * 2. we've been doing it for too long.
> > >                  */
> > >                 if (need_resched() || jiffies != start_time) {
> > >                         __netif_schedule(q);
> > > 
> > > This function is run from dev_queue_xmit() (net/core/dev.c) under
> > > rcu_read_lock_bh(), and this "q" pointer is passed here for later use
> > > (reading) by softirq run net_tx_action(). Alas in net/ RCU primitives
> > > are probably omitted in a few places...
> > 
> > If I understand this code, one way to handle it would be to increment
> > q->refcnt before passing to netif_schedule(), then decrementing it
> > (within an RCU read-side critical section) in the softirq handler.
> > 
> > There are probably other ways to handle this as well.
> 
> I understand this similarly (but I'm still trying to find out what's
> wrong with reading this again in a separate read-side section).

The usual problem with re-reading in a separate read-side critical section
is that someone might have removed/destroyed it in the meantime.
Consider the following example:

Task 0:

	rcu_read_lock();
	p = rcu_dereference(global_pointer);
	if (p == NULL) {
		rcu_read_unlock();
		goto somewhere_else;
	}
	do_something_with(p);
	rcu_read_unlock();

	do_some_unrelated_stuff();

	rcu_read_lock();
	do_something_else_with(p);	/* BUG!!! */
	rcu_read_unlock();

	somewhere_else:

Task 1:

	spin_lock(&mylock);
	p = global_pointer;
	global_pointer = NULL;
	spin_unlock(&mylock);
	synchronize_rcu();
	kfree(p);

Suppose task 0 picks up the global_pointer just before task 1 NULLs it.
Then Task 1's synchronize_rcu() is within its rights to return as soon
as task 0 executes its first rcu_read_unlock().  This means that task
1's kfree(p) might happen before task 0's do_something_else_with(p),
which could cause general death and destruction.

> David gave some additional explanations (which BTW don't look to me
> like very "orthodox" RCU) in this thread:
> http://marc.info/?l=linux-netdev&m=121851847805942&w=2

It looks to me like Dave believes that there is in fact a problem:
http://marc.info/?l=linux-netdev&m=121851965707714&w=2

	But if it gets postponed into ksoftirqd... the RCU will pass
	too early.

	I'm still thinking about how to fix this without avoiding RCU
	and without adding new synchronization primitives.

The only change to Dave's comment that I would make is to his first
paragraph:

	But if it gets postponed into ksoftirqd or if the kernel has
	been built with CONFIG_PREEMPT_RCU... the RCU will pass too early.

My thought would be to use a reference count as noted earlier, on the
grounds that postponing to softirq should be relatively rare.  But again
I really cannot claim to understand this code.

Or am I missing something here?

							Thanx, Paul

> Thanks,
> Jarek P.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-12 20:18                                       ` Paul E. McKenney
@ 2008-08-12 21:15                                         ` Jarek Poplawski
  2008-08-12 22:33                                           ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Jarek Poplawski @ 2008-08-12 21:15 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: David Miller, emil.s.tantilov, jeffrey.t.kirsher, netdev

On Tue, Aug 12, 2008 at 01:18:58PM -0700, Paul E. McKenney wrote:
> On Tue, Aug 12, 2008 at 08:09:27PM +0200, Jarek Poplawski wrote:
...
> > I understand this similarly (but I'm still trying to find out what's
> > wrong with reading this again in a separate read-side section).
> 
> The usual problem with re-reading in a separate read-side critical section
> is that someone might have removed/destroyed it in the meantime.
> Consider the following example:
> 
> Task 0:
> 
> 	rcu_read_lock();
> 	p = rcu_dereference(global_pointer);
> 	if (p == NULL) {
> 		rcu_read_unlock();
> 		goto somewhere_else;
> 	}
> 	do_something_with(p);
> 	rcu_read_unlock();
> 
> 	do_some_unrelated_stuff();
> 
> 	rcu_read_lock();
> 	do_something_else_with(p);	/* BUG!!! */
> 	rcu_read_unlock();
> 
> 	somewhere_else:
> 
> Task 1:
> 
> 	spin_lock(&mylock);
> 	p = global_pointer;
> 	global_pointer = NULL;
> 	spin_unlock(&mylock);
> 	synchronize_rcu();
> 	kfree(p);
> 
> Suppose task 0 picks up the global_pointer just before task 1 NULLs it.
> Then Task 1's synchronize_rcu() is within its rights to return as soon
> as task 0 executes its first rcu_read_unlock().  This means that task
> 1's kfree(p) might happen before task 0's do_something_else_with(p),
> which could cause general death and destruction.

Of course, I've considered here only re-reading with a separate
rcu_dereference(). BTW, in "our" code we can't have a NULL dereference:
in the "worst" case it points to a noop_qdisc, which is a static
structure with some basic callbacks used during deactivation.

> > David gave some additional explanations (which BTW don't look to me
> > like very "orthodox" RCU) in this thread:
> > http://marc.info/?l=linux-netdev&m=121851847805942&w=2
> 
> It looks to me like Dave believes that there is in fact a problem:
> http://marc.info/?l=linux-netdev&m=121851965707714&w=2
> 
> 	But if it gets postponed into ksoftirqd... the RCU will pass
> 	too early.
> 
> 	I'm still thinking about how to fix this without avoiding RCU
> 	and without adding new synchronization primitives.
> 
> The only change to Dave's comment that I would make is to his first
> paragraph:
> 
> 	But if it gets postponed into ksoftirqd or if the kernel has
> 	been built with CONFIG_PREEMPT_RCU... the RCU will pass too early.

As a matter of fact I wonder if it's 100% safe even without ksoftiqd
or PREEMPT_RCU? Considering that such a softirq handler would be
triggered after rcu_read_unlock_bh(), and maybe after some additional
hard or soft irq handlers, isn't it possible some RCU reclaiming code
running on another cpu could manage to start kfreeing in between?

> My thought would be to use a reference count as noted earlier, on the
> grounds that postponing to softirq should be relatively rare.  But again
> I really cannot claim to understand this code.
> 
> Or am I missing something here?

I don't think so. I guess David've considered this all too, but he
probably wants to re-check for any possible optimizations.

Jarek P.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [BUG] NULL pointer dereference in skb_dequeue
  2008-08-12 21:15                                         ` Jarek Poplawski
@ 2008-08-12 22:33                                           ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2008-08-12 22:33 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, emil.s.tantilov, jeffrey.t.kirsher, netdev

On Tue, Aug 12, 2008 at 11:15:21PM +0200, Jarek Poplawski wrote:
> On Tue, Aug 12, 2008 at 01:18:58PM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 12, 2008 at 08:09:27PM +0200, Jarek Poplawski wrote:
> ...
> > > I understand this similarly (but I'm still trying to find out what's
> > > wrong with reading this again in a separate read-side section).
> > 
> > The usual problem with re-reading in a separate read-side critical section
> > is that someone might have removed/destroyed it in the meantime.
> > Consider the following example:
> > 
> > Task 0:
> > 
> > 	rcu_read_lock();
> > 	p = rcu_dereference(global_pointer);
> > 	if (p == NULL) {
> > 		rcu_read_unlock();
> > 		goto somewhere_else;
> > 	}
> > 	do_something_with(p);
> > 	rcu_read_unlock();
> > 
> > 	do_some_unrelated_stuff();
> > 
> > 	rcu_read_lock();
> > 	do_something_else_with(p);	/* BUG!!! */
> > 	rcu_read_unlock();
> > 
> > 	somewhere_else:
> > 
> > Task 1:
> > 
> > 	spin_lock(&mylock);
> > 	p = global_pointer;
> > 	global_pointer = NULL;
> > 	spin_unlock(&mylock);
> > 	synchronize_rcu();
> > 	kfree(p);
> > 
> > Suppose task 0 picks up the global_pointer just before task 1 NULLs it.
> > Then Task 1's synchronize_rcu() is within its rights to return as soon
> > as task 0 executes its first rcu_read_unlock().  This means that task
> > 1's kfree(p) might happen before task 0's do_something_else_with(p),
> > which could cause general death and destruction.
> 
> Of course, I've considered here only re-reading with a separate
> rcu_dereference(). BTW, in "our" code we can't have a NULL dereference:
> in the "worst" case it points to a noop_qdisc, which is a static
> structure with some basic callbacks used during deactivation.

OK, in that case you would not need the NULL check and goto, but
the example would remain the same otherwise.

> > > David gave some additional explanations (which BTW don't look to me
> > > like very "orthodox" RCU) in this thread:
> > > http://marc.info/?l=linux-netdev&m=121851847805942&w=2
> > 
> > It looks to me like Dave believes that there is in fact a problem:
> > http://marc.info/?l=linux-netdev&m=121851965707714&w=2
> > 
> > 	But if it gets postponed into ksoftirqd... the RCU will pass
> > 	too early.
> > 
> > 	I'm still thinking about how to fix this without avoiding RCU
> > 	and without adding new synchronization primitives.
> > 
> > The only change to Dave's comment that I would make is to his first
> > paragraph:
> > 
> > 	But if it gets postponed into ksoftirqd or if the kernel has
> > 	been built with CONFIG_PREEMPT_RCU... the RCU will pass too early.
> 
> As a matter of fact I wonder if it's 100% safe even without ksoftiqd
> or PREEMPT_RCU? Considering that such a softirq handler would be
> triggered after rcu_read_unlock_bh(), and maybe after some additional
> hard or soft irq handlers, isn't it possible some RCU reclaiming code
> running on another cpu could manage to start kfreeing in between?

Good point -- even if it were impossible in the current implementation,
RCU is certainly within its rights to do the kfreeing in between.  So
the code is at best an accident waiting to happen.

> > My thought would be to use a reference count as noted earlier, on the
> > grounds that postponing to softirq should be relatively rare.  But again
> > I really cannot claim to understand this code.
> > 
> > Or am I missing something here?
> 
> I don't think so. I guess David've considered this all too, but he
> probably wants to re-check for any possible optimizations.

Fair enough!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2008-08-12 22:33 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-01 23:40 [BUG] NULL pointer dereference in skb_dequeue Jeff Kirsher
2008-08-02  1:03 ` David Miller
2008-08-02  1:20   ` David Miller
2008-08-02  9:36     ` Tantilov, Emil S
2008-08-02 13:37       ` Jarek Poplawski
2008-08-02 16:27         ` Jarek Poplawski
2008-08-02 19:18           ` David Miller
2008-08-02 19:22             ` David Miller
2008-08-02 19:45               ` Tantilov, Emil S
2008-08-02 21:46                 ` Tantilov, Emil S
2008-08-03  2:26                   ` David Miller
2008-08-08 19:38                     ` Tantilov, Emil S
2008-08-09  7:29                       ` David Miller
2008-08-09 22:32                         ` Jarek Poplawski
2008-08-10 19:04                           ` Jarek Poplawski
2008-08-11 10:01                             ` Jarek Poplawski
2008-08-11 23:26                               ` Paul E. McKenney
2008-08-12  6:36                                 ` Jarek Poplawski
2008-08-12 13:42                                   ` Paul E. McKenney
2008-08-12 18:09                                     ` Jarek Poplawski
2008-08-12 20:18                                       ` Paul E. McKenney
2008-08-12 21:15                                         ` Jarek Poplawski
2008-08-12 22:33                                           ` Paul E. McKenney
2008-08-02 20:19             ` Jarek Poplawski
2008-08-03  9:29               ` Jarek Poplawski
2008-08-03  9:50                 ` Jarek Poplawski
2008-08-03  9:56                 ` David Miller
2008-08-03 10:08                   ` Jarek Poplawski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).