* [PATCH] usb-gadget-ether: Prevent oops caused by error interrupt race
@ 2007-09-20 14:46 Thomas Gleixner
2007-09-21 22:08 ` David Brownell
0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2007-09-20 14:46 UTC (permalink / raw)
To: Andrew Morton; +Cc: David Brownell, LKML, Benedikt Spranger, Stable Team
From: Benedikt Spranger <bene@linutronix.de>
An USB error interrupt (e.g. disconnect) nukes the pending requests for
an ethernet gadget device asynchronously. This can race against
eth_start_xmit(), where we end up dereferencing the list head itself.
The nuke code is serialized against eth_start_xmit via dev->req_lock,
but we need to check the list for empty first instead of unconditionally
accessing dev->tx_reqs.next.
This is a long standing bug, which should be fixed in stable as well.
Signed-off-by: Benedikt Spranger <bene@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff --git a/drivers/usb/gadget/ether.c b/drivers/usb/gadget/ether.c
index 593e235..f1d7c82 100644
--- a/drivers/usb/gadget/ether.c
+++ b/drivers/usb/gadget/ether.c
@@ -1989,8 +1989,20 @@ static int eth_start_xmit (struct sk_buff *skb, struct net_device *net)
}
spin_lock_irqsave(&dev->req_lock, flags);
+ /*
+ * dev->tx_reqs may be empty due to an error interrupt which
+ * nuked all requests.
+ */
+ if (list_empty(&dev->tx_reqs)) {
+ netif_stop_queue(net);
+ spin_unlock_irqrestore(&dev->req_lock, flags);
+ return 1;
+ }
+
req = container_of (dev->tx_reqs.next, struct usb_request, list);
list_del (&req->list);
+
+ /* last request in list: stop queue */
if (list_empty (&dev->tx_reqs))
netif_stop_queue (net);
spin_unlock_irqrestore(&dev->req_lock, flags);
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] usb-gadget-ether: Prevent oops caused by error interrupt race
2007-09-20 14:46 [PATCH] usb-gadget-ether: Prevent oops caused by error interrupt race Thomas Gleixner
@ 2007-09-21 22:08 ` David Brownell
2007-09-22 17:41 ` [PATCH] usb-gadget-ether: Prevent oops caused by error interrupt race -V2 (comments update) Thomas Gleixner
0 siblings, 1 reply; 9+ messages in thread
From: David Brownell @ 2007-09-21 22:08 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: Andrew Morton, LKML, Benedikt Spranger, Stable Team
On Thursday 20 September 2007, Thomas Gleixner wrote:
> From: Benedikt Spranger <bene@linutronix.de>
>
> An USB error interrupt (e.g. disconnect) nukes the pending requests for
> an ethernet gadget device asynchronously. This can race against
> eth_start_xmit(), where we end up dereferencing the list head itself.
>
> The nuke code is serialized against eth_start_xmit via dev->req_lock,
> but we need to check the list for empty first instead of unconditionally
> accessing dev->tx_reqs.next.
Looks OK, although the comments are confusingly incorrect which
made it hard to see what this was doing:
- There is no "nuke()" method in this driver, even comments
don't use that term.
- When "nuke" is used in the gadget stack, it means the way
that controller drivers scrub out a queue of live requests
queued to an endpoint ... *NOT* recycling memory that sits
on a freelist, which is what's involved here.
- Disconnect is *NOT* an error, it's a routine occurrence
which can happen at essentially any time.
If you update the patch and comments accordingly, ACK.
> This is a long standing bug, which should be fixed in stable as well.
>
> Signed-off-by: Benedikt Spranger <bene@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>
> diff --git a/drivers/usb/gadget/ether.c b/drivers/usb/gadget/ether.c
> index 593e235..f1d7c82 100644
> --- a/drivers/usb/gadget/ether.c
> +++ b/drivers/usb/gadget/ether.c
> @@ -1989,8 +1989,20 @@ static int eth_start_xmit (struct sk_buff *skb, struct net_device *net)
> }
>
> spin_lock_irqsave(&dev->req_lock, flags);
> + /*
> + * dev->tx_reqs may be empty due to an error interrupt which
Please reword: "may be empty because disconnecting an
active device cleans out this freelist".
> + * nuked all requests.
> + */
> + if (list_empty(&dev->tx_reqs)) {
> + netif_stop_queue(net);
> + spin_unlock_irqrestore(&dev->req_lock, flags);
> + return 1;
> + }
> +
> req = container_of (dev->tx_reqs.next, struct usb_request, list);
> list_del (&req->list);
> +
> + /* last request in list: stop queue */
> if (list_empty (&dev->tx_reqs))
> netif_stop_queue (net);
> spin_unlock_irqrestore(&dev->req_lock, flags);
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] usb-gadget-ether: Prevent oops caused by error interrupt race -V2 (comments update)
2007-09-21 22:08 ` David Brownell
@ 2007-09-22 17:41 ` Thomas Gleixner
2007-09-22 19:18 ` David Brownell
0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2007-09-22 17:41 UTC (permalink / raw)
To: David Brownell; +Cc: Andrew Morton, LKML, Benedikt Spranger, Stable Team
From: Benedikt Spranger <bene@linutronix.de>
eth_start_xmit() can race against a disconnect interrupt in the gadget
device driver, which nukes all pending request. Right now we access the
pending request list unconditionally and dereference the request list
head itself in such a case, which results in an Oops.
Check whether the list is empty before actually dereferencing
dev->tx_reqs.next. Also add a comment for the second list_empty check
further down to avoid confusion.
Long standing bug. Patch should be applied to stable as well.
Signed-off-by: Benedikt Spranger <bene@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff --git a/drivers/usb/gadget/ether.c b/drivers/usb/gadget/ether.c
index 593e235..f2a7bd5 100644
--- a/drivers/usb/gadget/ether.c
+++ b/drivers/usb/gadget/ether.c
@@ -1989,8 +1989,21 @@ static int eth_start_xmit (struct sk_buff *skb, struct net_device *net)
}
spin_lock_irqsave(&dev->req_lock, flags);
+ /*
+ * dev->tx_reqs may be empty. We raced against a disconnect
+ * interrupt in the gadget device driver, which nuked all
+ * pending requests.
+ */
+ if (list_empty(&dev->tx_reqs)) {
+ netif_stop_queue(net);
+ spin_unlock_irqrestore(&dev->req_lock, flags);
+ return 1;
+ }
+
req = container_of (dev->tx_reqs.next, struct usb_request, list);
list_del (&req->list);
+
+ /* last request in list: stop queue */
if (list_empty (&dev->tx_reqs))
netif_stop_queue (net);
spin_unlock_irqrestore(&dev->req_lock, flags);
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] usb-gadget-ether: Prevent oops caused by error interrupt race -V2 (comments update)
2007-09-22 17:41 ` [PATCH] usb-gadget-ether: Prevent oops caused by error interrupt race -V2 (comments update) Thomas Gleixner
@ 2007-09-22 19:18 ` David Brownell
2007-09-22 19:30 ` Thomas Gleixner
0 siblings, 1 reply; 9+ messages in thread
From: David Brownell @ 2007-09-22 19:18 UTC (permalink / raw)
To: tglx; +Cc: stable, linux-kernel, bene, akpm
I think you misread my comment. Those requests are **NOT** pending!!
So this update has a *MORE* incorrect description of the issue.
That's just the freelist ... it's a fairly conventional model whereby
there's a pool of "free" request slots which can be issued. When the
pool empties, the TX queue shuts down until one of the requests which
is pending in the hardware completes, and makes a slot free.
The problem you're addressing is that there's a small window where a
disconnect IRQ can shut down the TX queue (and empty that freelist)
after upper layers in the network stack started a transmission on
an active (pre-disconnect) TX queue.
That problem is *NOT* related to any pending requests at all!!
NAK...
> From tglx@linutronix.de Sat Sep 22 10:51:00 2007
> Subject: [PATCH] usb-gadget-ether: Prevent oops caused by error interrupt
> race -V2 (comments update)
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Sat, 22 Sep 2007 19:41:01 +0200
>
> From: Benedikt Spranger <bene@linutronix.de>
>
> eth_start_xmit() can race against a disconnect interrupt in the gadget
> device driver, which nukes all pending request. Right now we access the
> pending request list unconditionally and dereference the request list
> head itself in such a case, which results in an Oops.
>
> Check whether the list is empty before actually dereferencing
> dev->tx_reqs.next. Also add a comment for the second list_empty check
> further down to avoid confusion.
>
> Long standing bug. Patch should be applied to stable as well.
>
> Signed-off-by: Benedikt Spranger <bene@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>
> diff --git a/drivers/usb/gadget/ether.c b/drivers/usb/gadget/ether.c
> index 593e235..f2a7bd5 100644
> ...
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] usb-gadget-ether: Prevent oops caused by error interrupt race -V2 (comments update)
2007-09-22 19:18 ` David Brownell
@ 2007-09-22 19:30 ` Thomas Gleixner
2007-09-22 20:14 ` David Brownell
0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2007-09-22 19:30 UTC (permalink / raw)
To: David Brownell; +Cc: stable, linux-kernel, bene, akpm
On Sat, 2007-09-22 at 12:18 -0700, David Brownell wrote:
> I think you misread my comment. Those requests are **NOT** pending!!
> So this update has a *MORE* incorrect description of the issue.
>
> That's just the freelist ... it's a fairly conventional model whereby
> there's a pool of "free" request slots which can be issued. When the
> pool empties, the TX queue shuts down until one of the requests which
> is pending in the hardware completes, and makes a slot free.
>
> The problem you're addressing is that there's a small window where a
> disconnect IRQ can shut down the TX queue (and empty that freelist)
> after upper layers in the network stack started a transmission on
> an active (pre-disconnect) TX queue.
>
> That problem is *NOT* related to any pending requests at all!!
Sorry, I misunderstood your comment. Can you please add the correct
comment yourself before we play some more rounds of ping pong ?
tglx
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] usb-gadget-ether: Prevent oops caused by error interrupt race -V2 (comments update)
2007-09-22 19:30 ` Thomas Gleixner
@ 2007-09-22 20:14 ` David Brownell
2007-09-22 20:23 ` Thomas Gleixner
0 siblings, 1 reply; 9+ messages in thread
From: David Brownell @ 2007-09-22 20:14 UTC (permalink / raw)
To: tglx; +Cc: stable, linux-kernel, bene, akpm
How's this? Note that the queue should already have been stopped,
so I removed what should be an extra call (as well as fixing the
comments).
- Dave
========
From: Thomas Gleixner <tglx@linutronix.de>
This patch fixes a longstanding race in the Ethernet gadget driver,
which can cause an oops on device disconnect. The fix is just to
make the TX path check whether its freelist is empty. That check
is otherwise not necessary, since the queue is always stopped when
that list empties (and restarted when request completion puts an
entry back on that freelist).
The race window starts when the network code decides to transmit a
packet, and ends when hard_start_xmit() grabs the freelist lock.
If disconnect() is called inside that window, it shuts down the
TX queue and breaks the otherwise-solid assumption that packets are
never sent when the TX queue is stopped.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
--- a/drivers/usb/gadget/ether.c
+++ b/drivers/usb/gadget/ether.c
@@ -1989,8 +1989,20 @@ static int eth_start_xmit (struct sk_buff *skb, struct net_device *net)
}
spin_lock_irqsave(&dev->req_lock, flags);
+ /*
+ * the freelist can be empty if an interrupt triggered disconnect()
+ * and reconfigured the gadget (shutting down this queue) after the
+ * network stack decided to xmit but before we got the spinlock.
+ */
+ if (list_empty(&dev->tx_reqs)) {
+ spin_unlock_irqrestore(&dev->req_lock, flags);
+ return 1;
+ }
+
req = container_of (dev->tx_reqs.next, struct usb_request, list);
list_del (&req->list);
+
+ /* temporarily stop TX queue when the freelist empties */
if (list_empty (&dev->tx_reqs))
netif_stop_queue (net);
spin_unlock_irqrestore(&dev->req_lock, flags);
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] usb-gadget-ether: Prevent oops caused by error interrupt race -V2 (comments update)
2007-09-22 20:14 ` David Brownell
@ 2007-09-22 20:23 ` Thomas Gleixner
2007-09-22 20:53 ` David Brownell
0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2007-09-22 20:23 UTC (permalink / raw)
To: David Brownell; +Cc: stable, linux-kernel, bene, akpm
On Sat, 2007-09-22 at 13:14 -0700, David Brownell wrote:
> How's this? Note that the queue should already have been stopped,
> so I removed what should be an extra call (as well as fixing the
> comments).
Yeah, stop queue should be not necessary.
> - Dave
>
> ========
> From: Thomas Gleixner <tglx@linutronix.de>
Please change to:
From: Benedikt Spranger <bene@linutronix.de>
He did all the grump work of figuring out what's going wrong. I was just
the messenger.
> This patch fixes a longstanding race in the Ethernet gadget driver,
> which can cause an oops on device disconnect. The fix is just to
> make the TX path check whether its freelist is empty. That check
> is otherwise not necessary, since the queue is always stopped when
> that list empties (and restarted when request completion puts an
> entry back on that freelist).
Sigh. I need a real deep look inside that code to understand, why
tx_reqs is not a requestlist but a freelist. Very intuitive naming :)
> The race window starts when the network code decides to transmit a
> packet, and ends when hard_start_xmit() grabs the freelist lock.
> If disconnect() is called inside that window, it shuts down the
> TX queue and breaks the otherwise-solid assumption that packets are
> never sent when the TX queue is stopped.
Please add our signed offs as well
Signed-off-by: Benedikt Spranger <bene@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Thanks,
tglx
> --- a/drivers/usb/gadget/ether.c
> +++ b/drivers/usb/gadget/ether.c
> @@ -1989,8 +1989,20 @@ static int eth_start_xmit (struct sk_buff *skb, struct net_device *net)
> }
>
> spin_lock_irqsave(&dev->req_lock, flags);
> + /*
> + * the freelist can be empty if an interrupt triggered disconnect()
> + * and reconfigured the gadget (shutting down this queue) after the
> + * network stack decided to xmit but before we got the spinlock.
> + */
> + if (list_empty(&dev->tx_reqs)) {
> + spin_unlock_irqrestore(&dev->req_lock, flags);
> + return 1;
> + }
> +
> req = container_of (dev->tx_reqs.next, struct usb_request, list);
> list_del (&req->list);
> +
> + /* temporarily stop TX queue when the freelist empties */
> if (list_empty (&dev->tx_reqs))
> netif_stop_queue (net);
> spin_unlock_irqrestore(&dev->req_lock, flags);
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] usb-gadget-ether: Prevent oops caused by error interrupt race -V2 (comments update)
2007-09-22 20:23 ` Thomas Gleixner
@ 2007-09-22 20:53 ` David Brownell
2007-09-22 23:32 ` Thomas Gleixner
0 siblings, 1 reply; 9+ messages in thread
From: David Brownell @ 2007-09-22 20:53 UTC (permalink / raw)
To: tglx; +Cc: stable, linux-kernel, bene, akpm
> Sigh. I need a real deep look inside that code to understand, why
> tx_reqs is not a requestlist but a freelist. Very intuitive naming :)
It *is* a list of requests: free ones -- the only kind this level of
driver is allowed to remember! ;)
Yeah, I had to go back and read the driver again before I understood
just what problem this patch was trying to fix. Which is why I wanted
to make sure the mismatch between comments and contents was resolved.
- Dave
========== CUT HERE
From: Benedikt Spranger <bene@linutronix.de>
This patch fixes a longstanding race in the Ethernet gadget driver,
which can cause an oops on device disconnect. The fix is just to
make the TX path check whether its freelist is empty. That check
is otherwise not necessary, since the queue is always stopped when
that list empties (and restarted when request completion puts an
entry back on that freelist).
The race window starts when the network code decides to transmit a
packet, and ends when hard_start_xmit() grabs the freelist lock.
When disconnect() is called inside that window, it shuts down the
TX queue and breaks the otherwise-solid assumption that packets are
never sent through a TX queue that's stopped.
Signed-off-by: Benedikt Spranger <bene@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
--- a/drivers/usb/gadget/ether.c
+++ b/drivers/usb/gadget/ether.c
@@ -1989,8 +1989,20 @@ static int eth_start_xmit (struct sk_buff *skb, struct net_device *net)
}
spin_lock_irqsave(&dev->req_lock, flags);
+ /*
+ * this freelist can be empty if an interrupt triggered disconnect()
+ * and reconfigured the gadget (shutting down this queue) after the
+ * network stack decided to xmit but before we got the spinlock.
+ */
+ if (list_empty(&dev->tx_reqs)) {
+ spin_unlock_irqrestore(&dev->req_lock, flags);
+ return 1;
+ }
+
req = container_of (dev->tx_reqs.next, struct usb_request, list);
list_del (&req->list);
+
+ /* temporarily stop TX queue when the freelist empties */
if (list_empty (&dev->tx_reqs))
netif_stop_queue (net);
spin_unlock_irqrestore(&dev->req_lock, flags);
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] usb-gadget-ether: Prevent oops caused by error interrupt race -V2 (comments update)
2007-09-22 20:53 ` David Brownell
@ 2007-09-22 23:32 ` Thomas Gleixner
0 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2007-09-22 23:32 UTC (permalink / raw)
To: David Brownell; +Cc: stable, linux-kernel, bene, akpm
On Sat, 2007-09-22 at 13:53 -0700, David Brownell wrote:
> > Sigh. I need a real deep look inside that code to understand, why
> > tx_reqs is not a requestlist but a freelist. Very intuitive naming :)
>
> It *is* a list of requests: free ones -- the only kind this level of
> driver is allowed to remember! ;)
>
> Yeah, I had to go back and read the driver again before I understood
> just what problem this patch was trying to fix. Which is why I wanted
> to make sure the mismatch between comments and contents was resolved.
Fair enough. Thanks for sanitizing the comments.
tglx
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2007-09-22 23:32 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-09-20 14:46 [PATCH] usb-gadget-ether: Prevent oops caused by error interrupt race Thomas Gleixner
2007-09-21 22:08 ` David Brownell
2007-09-22 17:41 ` [PATCH] usb-gadget-ether: Prevent oops caused by error interrupt race -V2 (comments update) Thomas Gleixner
2007-09-22 19:18 ` David Brownell
2007-09-22 19:30 ` Thomas Gleixner
2007-09-22 20:14 ` David Brownell
2007-09-22 20:23 ` Thomas Gleixner
2007-09-22 20:53 ` David Brownell
2007-09-22 23:32 ` Thomas Gleixner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox