* [PATCH v2] xen/netfront: Fix TX response spurious interrupts
@ 2025-07-15 16:11 Anthoine Bourgeois
2025-07-17 14:29 ` Jakub Kicinski
0 siblings, 1 reply; 6+ messages in thread
From: Anthoine Bourgeois @ 2025-07-15 16:11 UTC (permalink / raw)
To: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko, Wei Liu,
Paul Durrant, xen-devel, netdev
Cc: Anthoine Bourgeois, Elliott Mitchell
We found at Vates that there are a lot of spurious interrupts when
benchmarking the PV drivers of Xen. This issue appeared with a patch
that addresses security issue XSA-391 (see Fixes below). On an iperf
benchmark, spurious interrupts can represent up to 50% of the
interrupts.
Spurious interrupts are interrupts that are raised for nothing: there is
no work to do. This happens because the function that handles the
interrupts ("xennet_tx_buf_gc") is also called at the end of the request
path to garbage collect the responses received during the transmission
load.
The request path is doing the work that the interrupt handler should
have done otherwise. This is particularly true when there is more than
one vcpu, and it gets worse linearly with the number of vcpus/queues.
Moreover, this problem is amplified by the penalty imposed by a spurious
interrupt. When an interrupt is found to be spurious, the interrupt chip
will delay the EOI to slow down the backend. This delay allows more
responses to be handled by the request path, which makes it more likely
that the next interrupt will not find any work to do, creating a new
spurious interrupt.
This causes a performance issue. The solution here is to remove the calls
from the request path and let the interrupt handler do the processing of
the responses. This approach removes spurious interrupts (<0.05%) and
also has the benefit of freeing up cycles in the request path, allowing
it to process more work, which improves performance compared to masking
the spurious interrupt one way or another.
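To illustrate the race described above, here is a toy user-space model in C. It is only a sketch, not driver code: `toy_ring`, `toy_stats`, `toy_tx_buf_gc` and `toy_run` are hypothetical names, and the model assumes the worst-case interleaving in which the transmit path always runs between the backend publishing a response and the interrupt being delivered.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy TX ring: the backend advances rsp_prod, the frontend advances rsp_cons. */
struct toy_ring {
	unsigned int rsp_prod;	/* responses published by the backend */
	unsigned int rsp_cons;	/* responses already reaped by the frontend */
};

struct toy_stats {
	unsigned int irqs;	/* interrupts delivered */
	unsigned int spurious;	/* interrupts that found nothing to reap */
};

/* Reap every pending response; return true if there was any work to do. */
static bool toy_tx_buf_gc(struct toy_ring *ring)
{
	bool had_work;

	assert(ring);
	had_work = ring->rsp_cons != ring->rsp_prod;
	ring->rsp_cons = ring->rsp_prod;
	return had_work;
}

/*
 * Send `packets` packets. Assumed (worst-case) ordering per packet:
 *   1. the backend publishes one response and raises an event,
 *   2. the transmit path runs first and, if gc_in_tx_path is set, reaps it,
 *   3. the interrupt handler runs and reaps whatever is left.
 */
static struct toy_stats toy_run(unsigned int packets, bool gc_in_tx_path)
{
	struct toy_ring ring = { 0, 0 };
	struct toy_stats stats = { 0, 0 };
	unsigned int i;

	for (i = 0; i < packets; i++) {
		ring.rsp_prod++;		/* step 1: backend response */
		if (gc_in_tx_path)
			toy_tx_buf_gc(&ring);	/* step 2: request path steals the work */
		stats.irqs++;			/* step 3: interrupt handler */
		if (!toy_tx_buf_gc(&ring))
			stats.spurious++;
	}
	return stats;
}
```

Under this assumed ordering, `toy_run(n, true)` counts every interrupt as spurious, while `toy_run(n, false)` counts none. Real workloads interleave less pessimistically, which is consistent with the "up to 50%" figure reported above rather than 100%.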
Some vif throughput performance figures from HVM guest(s) with 8 vCPUs
and 4GB of RAM:
Without this patch on the :
vm -> dom0: 4.5Gb/s
vm -> vm: 7.0Gb/s
Without XSA-391 patch (revert of b27d47950e48):
vm -> dom0: 8.3Gb/s
vm -> vm: 8.7Gb/s
With XSA-391 and this patch:
vm -> dom0: 11.5Gb/s
vm -> vm: 12.6Gb/s
v2:
- add tags
- resend with the maintainers in the recipients list
Fixes: b27d47950e48 ("xen/netfront: harden netfront against event channel storms")
Signed-off-by: Anthoine Bourgeois <anthoine.bourgeois@vates.tech>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-by: Elliott Mitchell <ehem+xen@m5p.com>
---
drivers/net/xen-netfront.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 9bac50963477..a11a0e949400 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -638,8 +638,6 @@ static int xennet_xdp_xmit_one(struct net_device *dev,
tx_stats->packets++;
u64_stats_update_end(&tx_stats->syncp);
- xennet_tx_buf_gc(queue);
-
return 0;
}
@@ -849,9 +847,6 @@ static netdev_tx_t xennet_start_xmit(struct sk_buff *skb, struct net_device *dev
tx_stats->packets++;
u64_stats_update_end(&tx_stats->syncp);
- /* Note: It is not safe to access skb after xennet_tx_buf_gc()! */
- xennet_tx_buf_gc(queue);
-
if (!netfront_tx_slot_available(queue))
netif_tx_stop_queue(netdev_get_tx_queue(dev, queue->id));
--
2.49.1
Anthoine Bourgeois | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
* Re: [PATCH v2] xen/netfront: Fix TX response spurious interrupts
2025-07-15 16:11 [PATCH v2] xen/netfront: Fix TX response spurious interrupts Anthoine Bourgeois
@ 2025-07-17 14:29 ` Jakub Kicinski
2025-07-18 7:19 ` Jürgen Groß
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Jakub Kicinski @ 2025-07-17 14:29 UTC (permalink / raw)
To: Anthoine Bourgeois
Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko, Wei Liu,
Paul Durrant, xen-devel, netdev, Elliott Mitchell
On Tue, 15 Jul 2025 16:11:29 +0000 Anthoine Bourgeois wrote:
> Fixes: b27d47950e48 ("xen/netfront: harden netfront against event channel storms")
Not entirely sure who you expect to apply this patch, but if networking
then I wouldn't classify this as a fix. The "regression" happened 4
years ago. And this patch doesn't seem to be tuning the logic added by
the cited commit. I think this is an optimization, -next material, and
therefore there should be no Fixes tag here. You can refer to the commit
without the tag.
> @@ -849,9 +847,6 @@ static netdev_tx_t xennet_start_xmit(struct sk_buff *skb, struct net_device *dev
> tx_stats->packets++;
> u64_stats_update_end(&tx_stats->syncp);
>
> - /* Note: It is not safe to access skb after xennet_tx_buf_gc()! */
> - xennet_tx_buf_gc(queue);
> -
> if (!netfront_tx_slot_available(queue))
> netif_tx_stop_queue(netdev_get_tx_queue(dev, queue->id));
I thought normally reaping completions from the Tx path is done
to prevent the queue from filling up, when the device-generated
completions are slow or the queue is short. I say "normally" but
this is a relatively uncommon thing to do in networking.
Maybe it's my lack of Xen knowledge but it would be good to add to
the commit message why these calls were here in the first place.
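The trade-off raised here can be sketched with a toy user-space model in C. This is an illustration under stated assumptions, not the netfront implementation: `toy_txq`, `toy_reap`, `toy_xmit`, `toy_backend_complete` and `TOY_RING_SIZE` are hypothetical names, and the ring is deliberately tiny so that it fills quickly.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RING_SIZE 4u	/* assumed, deliberately short ring */

struct toy_txq {
	unsigned int in_flight;	/* requests sent and not yet reaped */
	unsigned int completed;	/* responses published, not yet reaped */
	bool stopped;		/* true once the ring is full */
};

/* Completion handler: reap responses and wake the queue if slots were freed. */
static unsigned int toy_reap(struct toy_txq *q)
{
	unsigned int n;

	assert(q && q->completed <= q->in_flight);
	n = q->completed;
	q->in_flight -= n;
	q->completed = 0;
	if (n > 0)
		q->stopped = false;
	return n;
}

/* Transmit one packet; returns false when the queue is stopped. */
static bool toy_xmit(struct toy_txq *q, bool reap_in_tx_path)
{
	assert(q);
	if (q->stopped)
		return false;
	q->in_flight++;
	if (reap_in_tx_path)
		toy_reap(q);	/* opportunistic reaping from the Tx path */
	if (q->in_flight == TOY_RING_SIZE)
		q->stopped = true;
	return true;
}

/* Backend finishes one outstanding request; no interrupt is delivered yet. */
static void toy_backend_complete(struct toy_txq *q)
{
	assert(q);
	if (q->completed < q->in_flight)
		q->completed++;
}
```

In this model, if the completion interrupt is delayed, a sender that reaps from the Tx path never stops the queue, whereas a sender that relies on the interrupt alone stops after `TOY_RING_SIZE` packets and resumes only once `toy_reap()` runs from the handler. Whether that stall matters in practice depends on ring size and completion latency, which the model does not try to capture.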
--
pw-bot: cr
* Re: [PATCH v2] xen/netfront: Fix TX response spurious interrupts
2025-07-17 14:29 ` Jakub Kicinski
@ 2025-07-18 7:19 ` Jürgen Groß
2025-07-18 17:12 ` Jakub Kicinski
2025-07-18 8:10 ` Anthoine Bourgeois
2025-07-23 22:23 ` Elliott Mitchell
2 siblings, 1 reply; 6+ messages in thread
From: Jürgen Groß @ 2025-07-18 7:19 UTC (permalink / raw)
To: Jakub Kicinski, Anthoine Bourgeois
Cc: Stefano Stabellini, Oleksandr Tyshchenko, Wei Liu, Paul Durrant,
xen-devel, netdev, Elliott Mitchell
On 17.07.25 16:29, Jakub Kicinski wrote:
> On Tue, 15 Jul 2025 16:11:29 +0000 Anthoine Bourgeois wrote:
>> Fixes: b27d47950e48 ("xen/netfront: harden netfront against event channel storms")
>
> Not entirely sure who you expect to apply this patch, but if networking
> then I wouldn't classify this as a fix. The "regression" happened 4
> years ago. And this patch doesn't seem to be tuning the logic added by
> the cited commit. I think this is an optimization, -next material, and
> therefore there should be no Fixes tag here. You can refer to the commit
> without the tag.
I think in the end it is a fix of the initial xen-netfront.c contribution
(commit 0d160211965b).
I'm fine to change the Fixes: tag and apply the patch via the Xen tree.
>
>> @@ -849,9 +847,6 @@ static netdev_tx_t xennet_start_xmit(struct sk_buff *skb, struct net_device *dev
>> tx_stats->packets++;
>> u64_stats_update_end(&tx_stats->syncp);
>>
>> - /* Note: It is not safe to access skb after xennet_tx_buf_gc()! */
>> - xennet_tx_buf_gc(queue);
>> -
>> if (!netfront_tx_slot_available(queue))
>> netif_tx_stop_queue(netdev_get_tx_queue(dev, queue->id));
>
> I thought normally reaping completions from the Tx path is done
> to prevent the queue from filling up, when the device-generated
> completions are slow or the queue is short. I say "normally" but
> this is a relatively uncommon thing to do in networking.
> Maybe it's my lack of Xen knowledge but it would be good to add to
> the commit message why these calls were here in the first place.
I guess the reason for this addition is unknown (singular, as the XDP related
one was probably just a copy-and-paste), as it has been there since the first
version of the driver.
Juergen
* Re: [PATCH v2] xen/netfront: Fix TX response spurious interrupts
2025-07-17 14:29 ` Jakub Kicinski
2025-07-18 7:19 ` Jürgen Groß
@ 2025-07-18 8:10 ` Anthoine Bourgeois
2025-07-23 22:23 ` Elliott Mitchell
2 siblings, 0 replies; 6+ messages in thread
From: Anthoine Bourgeois @ 2025-07-18 8:10 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko, Wei Liu,
Paul Durrant, xen-devel, netdev, Elliott Mitchell
On Thu, Jul 17, 2025 at 07:29:51AM -0700, Jakub Kicinski wrote:
>On Tue, 15 Jul 2025 16:11:29 +0000 Anthoine Bourgeois wrote:
>> Fixes: b27d47950e48 ("xen/netfront: harden netfront against event channel storms")
>
>Not entirely sure who you expect to apply this patch, but if networking
>then I wouldn't classify this as a fix. The "regression" happened 4
>years ago. And this patch doesn't seem to be tuning the logic added by
>the cited commit. I think this is an optimization, -next material, and
>therefore there should be no Fixes tag here. You can refer to the commit
>without the tag.
Ok, you're right, the cited commit exacerbates a problem that was already
there before.
I will change this in v3.
>> @@ -849,9 +847,6 @@ static netdev_tx_t xennet_start_xmit(struct sk_buff *skb, struct net_device *dev
>> tx_stats->packets++;
>> u64_stats_update_end(&tx_stats->syncp);
>>
>> - /* Note: It is not safe to access skb after xennet_tx_buf_gc()! */
>> - xennet_tx_buf_gc(queue);
>> -
>> if (!netfront_tx_slot_available(queue))
>> netif_tx_stop_queue(netdev_get_tx_queue(dev, queue->id));
>
>I thought normally reaping completions from the Tx path is done
>to prevent the queue from filling up, when the device-generated
>completions are slow or the queue is short. I say "normally" but
>this is a relatively uncommon thing to do in networking.
>Maybe it's my lack of Xen knowledge but it would be good to add to
>the commit message why these calls were here in the first place.
Good to know how it should "normally" work; I'm not an expert.
The patch also has the advantage of standardizing the network driver
with other Xen PV drivers that do not have this response collection
outside of the interrupt handler.
As this part of the code has been here since the driver was upstreamed
and the author no longer works on Xen, I will do my best to add my guess
on why this code was there.
Regards,
Anthoine
* Re: [PATCH v2] xen/netfront: Fix TX response spurious interrupts
2025-07-18 7:19 ` Jürgen Groß
@ 2025-07-18 17:12 ` Jakub Kicinski
0 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2025-07-18 17:12 UTC (permalink / raw)
To: Jürgen Groß
Cc: Anthoine Bourgeois, Stefano Stabellini, Oleksandr Tyshchenko,
Wei Liu, Paul Durrant, xen-devel, netdev, Elliott Mitchell
On Fri, 18 Jul 2025 09:19:17 +0200 Jürgen Groß wrote:
> On 17.07.25 16:29, Jakub Kicinski wrote:
> > On Tue, 15 Jul 2025 16:11:29 +0000 Anthoine Bourgeois wrote:
> >> Fixes: b27d47950e48 ("xen/netfront: harden netfront against event channel storms")
> >
> > Not entirely sure who you expect to apply this patch, but if networking
> > then I wouldn't classify this as a fix. The "regression" happened 4
> > years ago. And this patch doesn't seem to be tuning the logic added by
> > the cited commit. I think this is an optimization, -next material, and
> > therefore there should be no Fixes tag here. You can refer to the commit
> > without the tag.
>
> I think in the end it is a fix of the initial xen-netfront.c contribution
> (commit 0d160211965b).
>
> I'm fine to change the Fixes: tag and apply the patch via the Xen tree.
SGTM, FWIW. But I'd like to reiterate my humble recommendation to treat
this as an optimization, and not add the Fixes tag.
* Re: [PATCH v2] xen/netfront: Fix TX response spurious interrupts
2025-07-17 14:29 ` Jakub Kicinski
2025-07-18 7:19 ` Jürgen Groß
2025-07-18 8:10 ` Anthoine Bourgeois
@ 2025-07-23 22:23 ` Elliott Mitchell
2 siblings, 0 replies; 6+ messages in thread
From: Elliott Mitchell @ 2025-07-23 22:23 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Anthoine Bourgeois, Juergen Gross, Stefano Stabellini,
Oleksandr Tyshchenko, Wei Liu, Paul Durrant, xen-devel, netdev
On Thu, Jul 17, 2025 at 07:29:51AM -0700, Jakub Kicinski wrote:
> On Tue, 15 Jul 2025 16:11:29 +0000 Anthoine Bourgeois wrote:
> > Fixes: b27d47950e48 ("xen/netfront: harden netfront against event channel storms")
>
> Not entirely sure who you expect to apply this patch, but if networking
> then I wouldn't classify this as a fix. The "regression" happened 4
> years ago. And this patch doesn't seem to be tuning the logic added by
> the cited commit. I think this is an optimization, -next material, and
> therefore there should be no Fixes tag here. You can refer to the commit
> without the tag.
Sometimes the line between bugfix and optimization can be unclear. To
me this qualifies as a bugfix since it results in non-zero values in
/sys/devices/vif-*/xenbus/spurious_events. Spurious interrupts should
never occur; as such, I would classify this as a bug.
I do though think "Fixes: 0d160211965b" is more appropriate since that is
where the bug originates. Commit b27d47950e48 merely caused the bug to
result in performance loss and trigger bug/attack detection flags.
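For anyone wanting to check those counters, a small C helper along the following lines could sum them. This is a sketch only: `sum_counters` is a hypothetical name, and it assumes each matching file holds a single unsigned decimal value. The sysfs pattern itself is the one quoted above.

```c
#include <assert.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>

/*
 * Sum the unsigned decimal counters found in every file matching `pattern`.
 * Returns the number of files successfully read (0 if nothing matched),
 * or -1 if glob() itself failed. The sum is stored in *total.
 */
static int sum_counters(const char *pattern, unsigned long long *total)
{
	glob_t matches;
	unsigned long long value;
	int files = 0;
	size_t i;

	assert(pattern && total);
	*total = 0;
	memset(&matches, 0, sizeof(matches));

	switch (glob(pattern, 0, NULL, &matches)) {
	case 0:
		break;
	case GLOB_NOMATCH:
		globfree(&matches);
		return 0;
	default:
		globfree(&matches);
		return -1;
	}

	for (i = 0; i < matches.gl_pathc; i++) {
		FILE *f = fopen(matches.gl_pathv[i], "r");

		if (!f)
			continue;	/* unreadable entries are skipped */
		if (fscanf(f, "%llu", &value) == 1) {
			*total += value;
			files++;
		}
		fclose(f);
	}
	globfree(&matches);
	return files;
}
```

A caller would pass the pattern from this message, for example `sum_counters("/sys/devices/vif-*/xenbus/spurious_events", &total)`, and compare the total before and after a benchmark run.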
--
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) /
\_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445