* [PATCH] net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast
@ 2025-09-20 4:50 I Viswanath
2025-09-20 15:30 ` Andrew Lunn
0 siblings, 1 reply; 15+ messages in thread
From: I Viswanath @ 2025-09-20 4:50 UTC (permalink / raw)
To: petkan, andrew+netdev, davem, edumazet, kuba, pabeni
Cc: linux-usb, netdev, linux-kernel, skhan, linux-kernel-mentees,
david.hunter.linux, I Viswanath, syzbot+78cae3f37c62ad092caa
syzbot reported WARNING in rtl8150_start_xmit/usb_submit_urb.
This is a possible sequence of events:
CPU0 (in rtl8150_start_xmit) CPU1 (in rtl8150_start_xmit) CPU2 (in rtl8150_set_multicast)
netif_stop_queue();
netif_stop_queue();
usb_submit_urb();
netif_wake_queue(); <-- Wakes up TX queue before it's ready
netif_stop_queue();
usb_submit_urb(); <-- Warning
freeing urb
Remove netif_wake_queue and corresponding netif_stop_queue in rtl8150_set_multicast to
prevent this sequence of events
Reported-and-tested-by: syzbot+78cae3f37c62ad092caa@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=78cae3f37c62ad092caa
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
---
Relevant logs:
[ 65.779651][ T5648] About to enter stop queue ffff88805061e000, eth4
[ 65.779664][ T5648] After stop queue ffff88805061e000, eth4
[ 65.780296][ T5648] net eth4: eth name:eth4 SUBMIT: tx_urb=ffff888023219000, status=0, transfer_buffer_length=60, dev=ffff88805061ed80, netdev=ffff88805061e000, skb=ffff88804f907b80
[ 65.790962][ T760] About to enter stop queue ffff88805061e000, eth4
[ 65.790978][ T760] After stop queue ffff88805061e000, eth4
[ 65.791874][ T760] net eth4: We are inside Multicast dev:ffff88805061ed80, netdev:ffff88805061e000
[ 65.793259][ T760] About to enter netif_wake_queue ffff88805061e000, eth4
[ 65.793264][ T760] After netif_wake_queue ffff88805061e000, eth4
[ 65.822319][ T5829] About to enter stop queue ffff88805061e000, eth4
[ 65.823135][ T5829] After stop queue ffff88805061e000, eth4
[ 65.823739][ T5829] net eth4: eth name:eth4 SUBMIT: tx_urb=ffff888023219000, status=-115, transfer_buffer_length=90, dev=ffff88805061ed80, netdev=ffff88805061e000, skb=ffff88804b5363c0
drivers/net/usb/rtl8150.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/net/usb/rtl8150.c b/drivers/net/usb/rtl8150.c
index ddff6f19ff98..92add3daadbb 100644
--- a/drivers/net/usb/rtl8150.c
+++ b/drivers/net/usb/rtl8150.c
@@ -664,7 +664,6 @@ static void rtl8150_set_multicast(struct net_device *netdev)
rtl8150_t *dev = netdev_priv(netdev);
u16 rx_creg = 0x9e;
- netif_stop_queue(netdev);
if (netdev->flags & IFF_PROMISC) {
rx_creg |= 0x0001;
dev_info(&netdev->dev, "%s: promiscuous mode\n", netdev->name);
@@ -678,7 +677,6 @@ static void rtl8150_set_multicast(struct net_device *netdev)
rx_creg &= 0x00fc;
}
async_set_registers(dev, RCR, sizeof(rx_creg), rx_creg);
- netif_wake_queue(netdev);
}
static netdev_tx_t rtl8150_start_xmit(struct sk_buff *skb,
--
2.47.3
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH] net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast
2025-09-20 4:50 [PATCH] net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast I Viswanath
@ 2025-09-20 15:30 ` Andrew Lunn
2025-09-20 16:52 ` viswanath
0 siblings, 1 reply; 15+ messages in thread
From: Andrew Lunn @ 2025-09-20 15:30 UTC (permalink / raw)
To: I Viswanath
Cc: petkan, andrew+netdev, davem, edumazet, kuba, pabeni, linux-usb,
netdev, linux-kernel, skhan, linux-kernel-mentees,
david.hunter.linux, syzbot+78cae3f37c62ad092caa
On Sat, Sep 20, 2025 at 10:20:59AM +0530, I Viswanath wrote:
> syzbot reported WARNING in rtl8150_start_xmit/usb_submit_urb.
> This is a possible sequence of events:
>
> CPU0 (in rtl8150_start_xmit) CPU1 (in rtl8150_start_xmit) CPU2 (in rtl8150_set_multicast)
> netif_stop_queue();
> netif_stop_queue();
> usb_submit_urb();
> netif_wake_queue(); <-- Wakes up TX queue before it's ready
> netif_stop_queue();
> usb_submit_urb(); <-- Warning
> freeing urb
>
> Remove netif_wake_queue and corresponding netif_stop_queue in rtl8150_set_multicast to
> prevent this sequence of events
Please expand this sentence with an explanation of why this is
safe. Why are these two calls not needed? The original author of this
code thought they where needed, so you need to explain why they are
not needed.
Andrew
---
pw-bot: cr
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast
2025-09-20 15:30 ` Andrew Lunn
@ 2025-09-20 16:52 ` viswanath
2025-09-20 17:28 ` Andrew Lunn
0 siblings, 1 reply; 15+ messages in thread
From: viswanath @ 2025-09-20 16:52 UTC (permalink / raw)
To: Andrew Lunn
Cc: petkan, andrew+netdev, davem, edumazet, kuba, pabeni, linux-usb,
netdev, linux-kernel, skhan, linux-kernel-mentees,
david.hunter.linux, syzbot+78cae3f37c62ad092caa
On Sat, 20 Sept 2025 at 21:00, Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Sat, Sep 20, 2025 at 10:20:59AM +0530, I Viswanath wrote:
> > syzbot reported WARNING in rtl8150_start_xmit/usb_submit_urb.
> > This is a possible sequence of events:
> >
> > CPU0 (in rtl8150_start_xmit) CPU1 (in rtl8150_start_xmit) CPU2 (in rtl8150_set_multicast)
> > netif_stop_queue();
> > netif_stop_queue();
> > usb_submit_urb();
> > netif_wake_queue(); <-- Wakes up TX queue before it's ready
> > netif_stop_queue();
> > usb_submit_urb(); <-- Warning
> > freeing urb
> >
> > Remove netif_wake_queue and corresponding netif_stop_queue in rtl8150_set_multicast to
> > prevent this sequence of events
>
> Please expand this sentence with an explanation of why this is
> safe. Why are these two calls not needed? The original author of this
> code thought they where needed, so you need to explain why they are
> not needed.
>
> Andrew
>
> ---
> pw-bot: cr
Hello,
Thanks for pointing that out. I wasn't thinking from that point of view.
According to Documentation, rtl8150_set_multicast (the
ndo_set_rx_mode callback) should
rely on the netif_addr_lock spinlock, not the netif_tx_lock
manipulated by netif
stop/start/wake queue functions.
However, There is no need to use the netif_addr_lock in the driver
directly because
the core function (dev_set_rx_mode) invoking this function locks
and unlocks the lock
correctly.
Synchronization is therefore handled by the core, making it safe
to remove that lock.
From what I have seen, every network driver assumes this for the
ndo_set_rx_mode callback.
I am not sure what the historical context was for using the
tx_lock as the synchronization
mechanism here but it's definitely not valid in the modern networking stack.
Thanks
Viswanath
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast
2025-09-20 16:52 ` viswanath
@ 2025-09-20 17:28 ` Andrew Lunn
2025-09-20 18:18 ` [PATCH net v2] " I Viswanath
0 siblings, 1 reply; 15+ messages in thread
From: Andrew Lunn @ 2025-09-20 17:28 UTC (permalink / raw)
To: viswanath
Cc: petkan, andrew+netdev, davem, edumazet, kuba, pabeni, linux-usb,
netdev, linux-kernel, skhan, linux-kernel-mentees,
david.hunter.linux, syzbot+78cae3f37c62ad092caa
> Thanks for pointing that out. I wasn't thinking from that point of view.
>
> According to Documentation, rtl8150_set_multicast (the
> ndo_set_rx_mode callback) should
> rely on the netif_addr_lock spinlock, not the netif_tx_lock
> manipulated by netif
> stop/start/wake queue functions.
>
> However, There is no need to use the netif_addr_lock in the driver
> directly because
> the core function (dev_set_rx_mode) invoking this function locks
> and unlocks the lock
> correctly.
>
> Synchronization is therefore handled by the core, making it safe
> to remove that lock.
>
> From what I have seen, every network driver assumes this for the
> ndo_set_rx_mode callback.
>
> I am not sure what the historical context was for using the
> tx_lock as the synchronization
> mechanism here but it's definitely not valid in the modern networking stack.
Thanks. Please include an explanation in V2. Also, please read:
https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html
Andrew
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH net v2] net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast
2025-09-20 17:28 ` Andrew Lunn
@ 2025-09-20 18:18 ` I Viswanath
2025-09-23 1:07 ` Jakub Kicinski
2025-09-24 7:47 ` Michal Pecio
0 siblings, 2 replies; 15+ messages in thread
From: I Viswanath @ 2025-09-20 18:18 UTC (permalink / raw)
To: andrew
Cc: andrew+netdev, davem, david.hunter.linux, edumazet, kuba,
linux-kernel-mentees, linux-kernel, linux-usb, netdev, pabeni,
petkan, skhan, syzbot+78cae3f37c62ad092caa, viswanathiyyappan
syzbot reported WARNING in rtl8150_start_xmit/usb_submit_urb.
This is the sequence of events that leads to the Warning:
CPU0 (in rtl8150_start_xmit) CPU1 (in rtl8150_start_xmit) CPU2 (in rtl8150_set_multicast)
netif_stop_queue();
netif_stop_queue();
usb_submit_urb();
netif_wake_queue(); <-- Wakes up TX queue before it's ready
netif_stop_queue();
usb_submit_urb(); <-- Warning
freeing urb
rtl8150_set_multicast is rtl8150's implementation of ndo_set_rx_mode and
should not be calling netif_stop_queue and notif_start_queue as these handle
TX queue synchronization.
The net core function dev_set_rx_mode handles the synchronization
for rtl8150_set_multicast making it safe to remove these locks.
Reported-and-tested-by: syzbot+78cae3f37c62ad092caa@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=78cae3f37c62ad092caa
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
---
v2:
- Add explanation why netif_stop_queue/netif_wake_queue can be safely removed
- Add the net prefix to the patch, designating it to the net tree
Relevant logs:
[ 65.779651][ T5648] About to enter stop queue ffff88805061e000, eth4
[ 65.779664][ T5648] After stop queue ffff88805061e000, eth4
[ 65.780296][ T5648] net eth4: eth name:eth4 SUBMIT: tx_urb=ffff888023219000, status=0, transfer_buffer_length=60, dev=ffff88805061ed80, netdev=ffff88805061e000, skb=ffff88804f907b80
[ 65.790962][ T760] About to enter stop queue ffff88805061e000, eth4
[ 65.790978][ T760] After stop queue ffff88805061e000, eth4
[ 65.791874][ T760] net eth4: We are inside Multicast dev:ffff88805061ed80, netdev:ffff88805061e000
[ 65.793259][ T760] About to enter netif_wake_queue ffff88805061e000, eth4
[ 65.793264][ T760] After netif_wake_queue ffff88805061e000, eth4
[ 65.822319][ T5829] About to enter stop queue ffff88805061e000, eth4
[ 65.823135][ T5829] After stop queue ffff88805061e000, eth4
[ 65.823739][ T5829] net eth4: eth name:eth4 SUBMIT: tx_urb=ffff888023219000, status=-115, transfer_buffer_length=90, dev=ffff88805061ed80, netdev=ffff88805061e000, skb=ffff88804b5363c0
drivers/net/usb/rtl8150.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/net/usb/rtl8150.c b/drivers/net/usb/rtl8150.c
index ddff6f19ff98..92add3daadbb 100644
--- a/drivers/net/usb/rtl8150.c
+++ b/drivers/net/usb/rtl8150.c
@@ -664,7 +664,6 @@ static void rtl8150_set_multicast(struct net_device *netdev)
rtl8150_t *dev = netdev_priv(netdev);
u16 rx_creg = 0x9e;
- netif_stop_queue(netdev);
if (netdev->flags & IFF_PROMISC) {
rx_creg |= 0x0001;
dev_info(&netdev->dev, "%s: promiscuous mode\n", netdev->name);
@@ -678,7 +677,6 @@ static void rtl8150_set_multicast(struct net_device *netdev)
rx_creg &= 0x00fc;
}
async_set_registers(dev, RCR, sizeof(rx_creg), rx_creg);
- netif_wake_queue(netdev);
}
static netdev_tx_t rtl8150_start_xmit(struct sk_buff *skb,
--
2.47.3
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH net v2] net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast
2025-09-20 18:18 ` [PATCH net v2] " I Viswanath
@ 2025-09-23 1:07 ` Jakub Kicinski
2025-09-23 7:47 ` Michal Pecio
2025-09-24 7:47 ` Michal Pecio
1 sibling, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2025-09-23 1:07 UTC (permalink / raw)
To: I Viswanath
Cc: andrew, andrew+netdev, davem, david.hunter.linux, edumazet,
linux-kernel-mentees, linux-kernel, linux-usb, netdev, pabeni,
petkan, skhan, syzbot+78cae3f37c62ad092caa
On Sat, 20 Sep 2025 23:48:52 +0530 I Viswanath wrote:
> rtl8150_set_multicast is rtl8150's implementation of ndo_set_rx_mode and
> should not be calling netif_stop_queue and notif_start_queue as these handle
> TX queue synchronization.
>
> The net core function dev_set_rx_mode handles the synchronization
> for rtl8150_set_multicast making it safe to remove these locks.
Last time someone tried to add device ID to this driver was 20 years
ago. Please post a patch to delete this driver completely. If someone
speaks up we'll revert the removal and ask them to test the fix.
--
pw-bot: cr
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net v2] net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast
2025-09-23 1:07 ` Jakub Kicinski
@ 2025-09-23 7:47 ` Michal Pecio
2025-09-23 14:28 ` Jakub Kicinski
0 siblings, 1 reply; 15+ messages in thread
From: Michal Pecio @ 2025-09-23 7:47 UTC (permalink / raw)
To: Jakub Kicinski
Cc: I Viswanath, andrew, andrew+netdev, davem, david.hunter.linux,
edumazet, linux-kernel-mentees, linux-kernel, linux-usb, netdev,
pabeni, petkan, skhan, syzbot+78cae3f37c62ad092caa
On Mon, 22 Sep 2025 18:07:42 -0700, Jakub Kicinski wrote:
> On Sat, 20 Sep 2025 23:48:52 +0530 I Viswanath wrote:
> > rtl8150_set_multicast is rtl8150's implementation of ndo_set_rx_mode and
> > should not be calling netif_stop_queue and notif_start_queue as these handle
> > TX queue synchronization.
> >
> > The net core function dev_set_rx_mode handles the synchronization
> > for rtl8150_set_multicast making it safe to remove these locks.
>
> Last time someone tried to add device ID to this driver was 20 years
> ago. Please post a patch to delete this driver completely. If someone
> speaks up we'll revert the removal and ask them to test the fix.
These were quite common, I still have one.
What sort of testing do you need?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net v2] net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast
2025-09-23 7:47 ` Michal Pecio
@ 2025-09-23 14:28 ` Jakub Kicinski
2025-09-23 23:20 ` Michal Pecio
0 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2025-09-23 14:28 UTC (permalink / raw)
To: Michal Pecio
Cc: I Viswanath, andrew, andrew+netdev, davem, david.hunter.linux,
edumazet, linux-kernel-mentees, linux-kernel, linux-usb, netdev,
pabeni, petkan, skhan, syzbot+78cae3f37c62ad092caa
On Tue, 23 Sep 2025 09:47:11 +0200 Michal Pecio wrote:
> On Mon, 22 Sep 2025 18:07:42 -0700, Jakub Kicinski wrote:
> > On Sat, 20 Sep 2025 23:48:52 +0530 I Viswanath wrote:
> > > rtl8150_set_multicast is rtl8150's implementation of ndo_set_rx_mode and
> > > should not be calling netif_stop_queue and notif_start_queue as these handle
> > > TX queue synchronization.
> > >
> > > The net core function dev_set_rx_mode handles the synchronization
> > > for rtl8150_set_multicast making it safe to remove these locks.
> >
> > Last time someone tried to add device ID to this driver was 20 years
> > ago. Please post a patch to delete this driver completely. If someone
> > speaks up we'll revert the removal and ask them to test the fix.
>
> These were quite common, I still have one.
>
> What sort of testing do you need?
Excellent, could you check if there is any adverse effect of repeatedly
writing the RCR register under heavy Tx traffic (without stopping/waking
the Tx queue)? The driver seems to pause Tx when RCR is written, seems
like an odd thing to do without a reason, but driver authors do the
darndest things.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net v2] net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast
2025-09-23 14:28 ` Jakub Kicinski
@ 2025-09-23 23:20 ` Michal Pecio
2025-09-23 23:37 ` Jakub Kicinski
0 siblings, 1 reply; 15+ messages in thread
From: Michal Pecio @ 2025-09-23 23:20 UTC (permalink / raw)
To: Jakub Kicinski
Cc: I Viswanath, andrew, andrew+netdev, davem, david.hunter.linux,
edumazet, linux-kernel-mentees, linux-kernel, linux-usb, netdev,
pabeni, petkan, skhan, syzbot+78cae3f37c62ad092caa
On Tue, 23 Sep 2025 07:28:09 -0700, Jakub Kicinski wrote:
> Excellent, could you check if there is any adverse effect of
> repeatedly writing the RCR register under heavy Tx traffic (without
> stopping/waking the Tx queue)? The driver seems to pause Tx when RCR
> is written, seems like an odd thing to do without a reason, but
> driver authors do the darndest things.
I don't know what's the point of this, because it doesn't prevent the
async "set RCR" control request racing with an async TX URB submitted
before the queue was stopped or after it was restarted.
Such races could be prevented by net core not calling this while TX
is outstanding and not issuing TX until the control request completes,
but it doesn't look like any of that is the case?
I sucessfully reproduced the double submit bug as follows:
ifconfig eth1 10.9.9.9
arp -s 10.9.8.7 87:87:87:87:87:87 # doesn't actually exist
ping -f 10.9.8.7
while :; do ifconfig eth1 allmulti; ifconfig eth1 -allmulti; done
For some reason I had to use two instances of 'ping -f', not sure why.
Then the double submission warning appears in a few seconds and also
some refcount issues, probably on skbs (dev->tx_skb gets mixed up).
With the patch, it all goes away and doesn't show up even after a few
minutes. I also tried with two TCP streams to a real machine and only
observed a 20KB/s decrease in throughput while the ifconfig allmulti
loop is running, probably due to USB bandwidth. So it looks OK.
But one annoying problem is that those control requests are posted
asynchronously and under my test they seem to accumulate faster than
they drain. I get brief or not so brief lockups when USB core cleans
this up on sudden disconnection. And rtl8150_disconnect() should kill
them, but it doesn't.
Not sure how this is supposed to work in a well-behaved net driver? Is
this callback expected to return without sleeping and have an immediate
effect? I can't see this working with USB.
Regards,
Michal
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net v2] net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast
2025-09-23 23:20 ` Michal Pecio
@ 2025-09-23 23:37 ` Jakub Kicinski
2025-09-25 5:59 ` Deepak Sharma
0 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2025-09-23 23:37 UTC (permalink / raw)
To: Michal Pecio
Cc: I Viswanath, andrew, andrew+netdev, davem, david.hunter.linux,
edumazet, linux-kernel-mentees, linux-kernel, linux-usb, netdev,
pabeni, petkan, skhan, syzbot+78cae3f37c62ad092caa
On Wed, 24 Sep 2025 01:20:39 +0200 Michal Pecio wrote:
> With the patch, it all goes away and doesn't show up even after a few
> minutes. I also tried with two TCP streams to a real machine and only
> observed a 20KB/s decrease in throughput while the ifconfig allmulti
> loop is running, probably due to USB bandwidth. So it looks OK.
Excellent, could you send an official Tested-by tag?
> But one annoying problem is that those control requests are posted
> asynchronously and under my test they seem to accumulate faster than
> they drain. I get brief or not so brief lockups when USB core cleans
> this up on sudden disconnection. And rtl8150_disconnect() should kill
> them, but it doesn't.
>
> Not sure how this is supposed to work in a well-behaved net driver? Is
> this callback expected to return without sleeping and have an immediate
> effect? I can't see this working with USB.
The set_rx_mode callback is annoying because it can't sleep.
Leading to no end of issues in the drivers.
The best way to deal with this IMHO is to do the confirm from a work
item. Don't try to kick off the config asynchronously, instead schedule
a work which takes a snapshot of the config, and then synchronously
configs the device. The work give us the dirty tracking for free
(if a config change is made while work is running it will get
re-scheduled). And obviously if there's only one work we can't build
up a queue, new request before work had a chance to run will do nothing.
We have added a todo to do something along these lines in
the core 3+ years ago but nobody had the time to tackle this.
The work and taking a snapshot of the rx config are not driver-specific,
so it could all be done in the core and then call a (new) driver NDO.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net v2] net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast
2025-09-23 23:37 ` Jakub Kicinski
@ 2025-09-25 5:59 ` Deepak Sharma
0 siblings, 0 replies; 15+ messages in thread
From: Deepak Sharma @ 2025-09-25 5:59 UTC (permalink / raw)
To: kuba
Cc: andrew+netdev, andrew, davem, david.hunter.linux, edumazet,
linux-kernel-mentees, linux-kernel, linux-usb, michal.pecio,
netdev, pabeni, petkan, skhan, syzbot+78cae3f37c62ad092caa,
viswanathiyyappan
Hi Jakub,
I found the topic very interesting. So I looked into the existing drivers and almost all of them seem to be using `usb_submit_urb`s except `lan78xx` and `smsc75xx`, which have a work item to do the configuration. But I see no synchronization between their work and the data that is used to do the configuration (which can involve multiple requests to the device). Is there any synchronization that I am missing here?
Thanks,
Deepak Sharma
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net v2] net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast
2025-09-20 18:18 ` [PATCH net v2] " I Viswanath
2025-09-23 1:07 ` Jakub Kicinski
@ 2025-09-24 7:47 ` Michal Pecio
2025-09-24 8:02 ` viswanath
1 sibling, 1 reply; 15+ messages in thread
From: Michal Pecio @ 2025-09-24 7:47 UTC (permalink / raw)
To: I Viswanath
Cc: andrew, andrew+netdev, davem, david.hunter.linux, edumazet, kuba,
linux-kernel-mentees, linux-kernel, linux-usb, netdev, pabeni,
petkan, skhan, syzbot+78cae3f37c62ad092caa
On Sat, 20 Sep 2025 23:48:52 +0530, I Viswanath wrote:
> syzbot reported WARNING in rtl8150_start_xmit/usb_submit_urb.
> This is the sequence of events that leads to the Warning:
>
> CPU0 (in rtl8150_start_xmit) CPU1 (in rtl8150_start_xmit) CPU2 (in rtl8150_set_multicast)
> netif_stop_queue();
> netif_stop_queue();
> usb_submit_urb();
> netif_wake_queue(); <-- Wakes up TX queue before it's ready
> netif_stop_queue();
> usb_submit_urb(); <-- Warning
> freeing urb
It's not freeing which matters but URB completion in USB subsystem.
I think this description is needlessly complex, the essence is:
rtl8150_start_xmit() {
netif_stop_queue();
usb_submit_urb(dev->tx_urb);
}
rtl8150_set_multicast() {
netif_stop_queue();
netif_wake_queue(); <-- wakes up TX queue before URB is done
}
rtl8150_start_xmit() {
netif_stop_queue();
usb_submit_urb(dev->tx_urb); <-- double submission
}
> rtl8150_set_multicast is rtl8150's implementation of ndo_set_rx_mode and
> should not be calling netif_stop_queue and notif_start_queue as these handle
> TX queue synchronization.
>
> The net core function dev_set_rx_mode handles the synchronization
> for rtl8150_set_multicast making it safe to remove these locks.
>
> Reported-and-tested-by: syzbot+78cae3f37c62ad092caa@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=78cae3f37c62ad092caa
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
Tested-by: Michal Pecio <michal.pecio@gmail.com>
This is instantly triggered on HW simply by running:
ncat remote-host port < /dev/zero &
ifconfig ethX allmulti
and results in:
[ 1253.338536] URB ffff88810ad01240 submitted while active
[ 1253.338616] WARNING: CPU: 2 PID: 2785 at drivers/usb/core/urb.c:379 usb_submit_urb+0x5f1/0x640 [usbcore]
[ 1253.338686] Modules linked in: usbhid uvcvideo rtl8150 xhci_pci xhci_hcd usbcore ext2 uvc videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common snd_pcsp usb_common serio_raw ppdev dm_mod nfnetlink [last unloaded: usbcore]
[ 1253.338724] CPU: 2 UID: 0 PID: 2785 Comm: ifconfig Tainted: G W 6.17.0-rc4 #1 PREEMPT
[ 1253.338734] Tainted: [W]=WARN
[ 1253.338737] Hardware name: HP HP EliteDesk 705 G3 MT/8265, BIOS P06 Ver. 02.45 07/16/2024
[ 1253.338740] RIP: 0010:usb_submit_urb+0x5f1/0x640 [usbcore]
[ 1253.338791] Code: 56 23 a0 e8 b1 17 3f e1 eb da b8 fe ff ff ff e9 fc fd ff ff 48 89 fe 48 c7 c7 88 20 25 a0 c6 05 c0 30 e1 ff 01 e8 cf 3a f0 e0 <0f> 0b eb a0 b8 f8 ff ff ff e9 d8 fd ff ff b8 ea ff ff ff c3 66 2e
[ 1253.338798] RSP: 0018:ffffc90000154e28 EFLAGS: 00010282
[ 1253.338804] RAX: 000000000000002b RBX: ffff88810ad01240 RCX: 0000000000000027
[ 1253.338808] RDX: ffff888226f17e08 RSI: 0000000000000001 RDI: ffff888226f17e00
[ 1253.338812] RBP: ffff88810be0ff00 R08: 00000000fff7ffff R09: ffffffff85a4d628
[ 1253.338816] R10: ffffffff82e4d680 R11: 0000000000000002 R12: ffff888125e19e00
[ 1253.338820] R13: 00000000000005ea R14: ffff88810be0ff00 R15: ffff8881326b4000
[ 1253.338824] FS: 00007fbb30220740(0000) GS:ffff88829ff7d000(0000) knlGS:0000000000000000
[ 1253.338830] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1253.338834] CR2: 00007fbb301e5e38 CR3: 000000010be04000 CR4: 00000000001506f0
[ 1253.338838] Call Trace:
[ 1253.338846] <IRQ>
[ 1253.338855] rtl8150_start_xmit+0xa1/0x100 [rtl8150]
[ 1253.338865] dev_hard_start_xmit+0x59/0x1c0
[ 1253.338875] sch_direct_xmit+0x117/0x280
[ 1253.338883] __qdisc_run+0x136/0x590
[ 1253.338890] net_tx_action+0x1bb/0x2c0
[ 1253.338898] handle_softirqs+0xcd/0x270
[ 1253.338907] do_softirq+0x3b/0x50
[ 1253.338914] </IRQ>
[ 1253.338916] <TASK>
[ 1253.338919] __local_bh_enable_ip+0x54/0x60
[ 1253.338927] __dev_change_flags+0x9a/0x1e0
[ 1253.338933] ? filemap_map_pages+0x3f3/0x620
[ 1253.338941] netif_change_flags+0x22/0x60
[ 1253.338946] dev_change_flags+0x3d/0x70
[ 1253.338951] devinet_ioctl+0x388/0x710
[ 1253.338959] inet_ioctl+0x145/0x190
[ 1253.338966] ? netdev_name_node_lookup_rcu+0x59/0x70
[ 1253.338971] ? netdev_name_node_lookup_rcu+0x59/0x70
[ 1253.338976] ? dev_get_by_name_rcu+0xa/0x20
[ 1253.338982] ? dev_ioctl+0x2fc/0x4b0
[ 1253.338989] sock_do_ioctl+0x2f/0xd0
[ 1253.338996] __x64_sys_ioctl+0x76/0xc0
[ 1253.339005] do_syscall_64+0x42/0x180
[ 1253.339013] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 1253.339019] RIP: 0033:0x7fbb3013fced
[ 1253.339024] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[ 1253.339028] RSP: 002b:00007ffdc876a850 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1253.339035] RAX: ffffffffffffffda RBX: 00007ffdc876a960 RCX: 00007fbb3013fced
[ 1253.339039] RDX: 00007ffdc876a8b0 RSI: 0000000000008914 RDI: 0000000000000004
[ 1253.339042] RBP: 00007ffdc876a8a0 R08: 000000000000000a R09: 000000000000000b
[ 1253.339045] R10: fffffffffffff8cb R11: 0000000000000246 R12: 00007ffdc876a8b0
[ 1253.339048] R13: 0000000000000004 R14: 0000000000000200 R15: 0000000000000000
[ 1253.339054] </TASK>
[ 1253.339056] ---[ end trace 0000000000000000 ]---
[ 1253.339062] net eth1: failed tx_urb -16
[ 1253.339068] net eth1: failed tx_urb -16
[ 1253.339072] net eth1: failed tx_urb -16
[ 1253.339075] net eth1: failed tx_urb -16
[ 1253.339078] net eth1: failed tx_urb -16
[ 1253.339081] net eth1: failed tx_urb -16
[ 1253.339084] net eth1: failed tx_urb -16
[ 1253.339088] net eth1: failed tx_urb -16
[ 1253.339091] net eth1: failed tx_urb -16
[ 1253.339094] net eth1: failed tx_urb -16
[ 1253.339097] net eth1: failed tx_urb -16
[ 1253.339204] net eth1: failed tx_urb -16
[ 1253.339209] net eth1: failed tx_urb -16
[ 1253.339212] net eth1: failed tx_urb -16
[ 1253.339215] net eth1: failed tx_urb -16
[ 1253.339218] net eth1: failed tx_urb -16
[ 1253.339221] net eth1: failed tx_urb -16
[ 1253.339224] net eth1: failed tx_urb -16
[ 1253.339226] net eth1: failed tx_urb -16
[ 1253.339229] net eth1: failed tx_urb -16
[ 1253.339232] net eth1: failed tx_urb -16
[ 1253.339235] net eth1: failed tx_urb -16
[ 1253.339237] net eth1: failed tx_urb -16
[ 1253.339240] net eth1: failed tx_urb -16
[ 1253.339243] net eth1: failed tx_urb -16
[ 1253.339246] net eth1: failed tx_urb -16
[ 1253.339249] net eth1: failed tx_urb -16
[ 1253.339252] net eth1: failed tx_urb -16
[ 1253.339255] net eth1: failed tx_urb -16
[ 1253.339258] net eth1: failed tx_urb -16
[ 1253.339261] net eth1: failed tx_urb -16
[ 1253.339263] net eth1: failed tx_urb -16
[ 1253.339266] net eth1: failed tx_urb -16
[ 1253.339268] net eth1: failed tx_urb -16
[ 1253.339348] rtl8150 1-1:1.0 eth1: entered allmulticast mode
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH net v2] net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast
2025-09-24 7:47 ` Michal Pecio
@ 2025-09-24 8:02 ` viswanath
2025-09-24 9:36 ` Michal Pecio
0 siblings, 1 reply; 15+ messages in thread
From: viswanath @ 2025-09-24 8:02 UTC (permalink / raw)
To: Michal Pecio
Cc: andrew, andrew+netdev, davem, david.hunter.linux, edumazet, kuba,
linux-kernel-mentees, linux-kernel, linux-usb, netdev, pabeni,
petkan, skhan, syzbot+78cae3f37c62ad092caa
On Wed, 24 Sept 2025 at 13:17, Michal Pecio <michal.pecio@gmail.com> wrote:
>
> It's not freeing which matters but URB completion in the USB subsystem.
Does URB completion include both successful and failed completions? I
decided to go
with "free urb" because I wasn't sure of that.
> I think this description is needlessly complex, the essence is:
>
> rtl8150_start_xmit() {
> netif_stop_queue();
> usb_submit_urb(dev->tx_urb);
> }
>
> rtl8150_set_multicast() {
> netif_stop_queue();
> netif_wake_queue(); <-- wakes up TX queue before URB is done
> }
>
> rtl8150_start_xmit() {
> netif_stop_queue();
> usb_submit_urb(dev->tx_urb); <-- double submission
> }
I wasn't sure how to describe the flow of execution in a multi threaded program.
I will resubmit a v3 with this version of the execution flow
> > Reported-and-tested-by: syzbot+78cae3f37c62ad092caa@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=78cae3f37c62ad092caa
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com>
>
> Tested-by: Michal Pecio <michal.pecio@gmail.com>
Thanks,
Viswanath
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH net v2] net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast
2025-09-24 8:02 ` viswanath
@ 2025-09-24 9:36 ` Michal Pecio
2025-09-24 10:25 ` viswanath
0 siblings, 1 reply; 15+ messages in thread
From: Michal Pecio @ 2025-09-24 9:36 UTC (permalink / raw)
To: viswanath
Cc: andrew, andrew+netdev, davem, david.hunter.linux, edumazet, kuba,
linux-kernel-mentees, linux-kernel, linux-usb, netdev, pabeni,
petkan, skhan, syzbot+78cae3f37c62ad092caa
On Wed, 24 Sep 2025 13:32:52 +0530, viswanath wrote:
> On Wed, 24 Sept 2025 at 13:17, Michal Pecio <michal.pecio@gmail.com> wrote:
> >
> > It's not freeing which matters but URB completion in the USB subsystem.
>
> Does URB completion include both successful and failed completions? I
> decided to go with "free urb" because I wasn't sure of that.
I think yes, usually in USB-speak "completion" is when the URB is
finished for any reason, including error or unlink/cancellation.
"Free" could suggest usb_free_urb().
But I see your point. Maybe "finish execution" is less ambiguous?
> I wasn't sure how to describe the flow of execution in a multi threaded program.
> I will resubmit a v3 with this version of the execution flow
I think it's an irrelevant detail which CPU executed which function.
It could all happen sequentially on a single core and it's still the
same bug.
In fact, I just reproduced it with all CPUs offlined except one.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net v2] net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast
2025-09-24 9:36 ` Michal Pecio
@ 2025-09-24 10:25 ` viswanath
0 siblings, 0 replies; 15+ messages in thread
From: viswanath @ 2025-09-24 10:25 UTC (permalink / raw)
To: Michal Pecio
Cc: andrew, andrew+netdev, davem, david.hunter.linux, edumazet, kuba,
linux-kernel-mentees, linux-kernel, linux-usb, netdev, pabeni,
petkan, skhan, syzbot+78cae3f37c62ad092caa
On Wed, 24 Sept 2025 at 15:06, Michal Pecio <michal.pecio@gmail.com> wrote:
>
> I think yes, usually in USB-speak "completion" is when the URB is
> finished for any reason, including error or unlink/cancellation.
> "Free" could suggest usb_free_urb().
>
> But I see your point. Maybe "finish execution" is less ambiguous?
>
I will use completion if it's the standard terminology
> I think it's an irrelevant detail which CPU executed which function.
> It could all happen sequentially on a single core and it's still the
> same bug.
>
> In fact, I just reproduced it with all CPUs offlined except one.
My bad, I see it now. I keep forgetting the actual urb execution
is asynchronous
Thanks
Viswanath
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2025-09-25 6:02 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-20 4:50 [PATCH] net: usb: Remove disruptive netif_wake_queue in rtl8150_set_multicast I Viswanath
2025-09-20 15:30 ` Andrew Lunn
2025-09-20 16:52 ` viswanath
2025-09-20 17:28 ` Andrew Lunn
2025-09-20 18:18 ` [PATCH net v2] " I Viswanath
2025-09-23 1:07 ` Jakub Kicinski
2025-09-23 7:47 ` Michal Pecio
2025-09-23 14:28 ` Jakub Kicinski
2025-09-23 23:20 ` Michal Pecio
2025-09-23 23:37 ` Jakub Kicinski
2025-09-25 5:59 ` Deepak Sharma
2025-09-24 7:47 ` Michal Pecio
2025-09-24 8:02 ` viswanath
2025-09-24 9:36 ` Michal Pecio
2025-09-24 10:25 ` viswanath
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).