* Performance regression on lan966x when extracting frames
From: Horatiu Vultur @ 2023-05-15 9:12 UTC
To: edumazet, netdev
Hi,
I have noticed that at the HEAD of net-next [0] there is a performance drop
for lan966x when extracting frames towards the CPU. The lan966x has a
Cortex-A7 CPU. All the tests are done using an iperf3 command like this:
'iperf3 -c 10.97.10.1 -R'
On net-next I see the following:
[ 5] 0.00-10.01 sec 473 MBytes 396 Mbits/sec 456 sender
and the run generates around 97,000 interrupts.

Going back to commit [1], I see the following:
[ 5] 0.00-10.02 sec 632 MBytes 529 Mbits/sec 11 sender
and the run generates around 1,000 interrupts.
After a bit of bisecting I found that commit [2] introduces the regression.
Reverting that commit on net-next and re-running the test gives much better
results, though not quite the original numbers:
[ 5] 0.00-10.01 sec 616 MBytes 516 Mbits/sec 0 sender
and the run generates around 700 interrupts.
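For reference, one simple way to collect such interrupt counts is to diff the
FDMA line in /proc/interrupts around each run (the exact irq name is
board-specific, so 'fdma' here is only an assumption):

grep fdma /proc/interrupts     # note the count before the run
iperf3 -c 10.97.10.1 -R
grep fdma /proc/interrupts     # note the count again; the difference is the number reported above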
So my question is: was I supposed to change something in the lan966x driver,
or is there a bug in the lan966x driver that shows up because of this change?
Any advice would be great. Thanks!
[0] befcc1fce564 ("sfc: fix use-after-free in efx_tc_flower_record_encap_match()")
[1] d4671cb96fa3 ("Merge branch 'lan966x-tx-rx-improve'")
[2] 8b43fd3d1d7d ("net: optimize ____napi_schedule() to avoid extra NET_RX_SOFTIRQ")
--
/Horatiu

* Re: Performance regression on lan966x when extracting frames
From: Eric Dumazet @ 2023-05-15 12:30 UTC
To: Horatiu Vultur; +Cc: netdev

On Mon, May 15, 2023 at 11:12 AM Horatiu Vultur
<horatiu.vultur@microchip.com> wrote:
>
> I have noticed that at the HEAD of net-next [0] there is a performance drop
> for lan966x when extracting frames towards the CPU.
[...]
> After a bit of bisecting I found that commit [2] introduces the regression.
[...]
> [2] 8b43fd3d1d7d ("net: optimize ____napi_schedule() to avoid extra NET_RX_SOFTIRQ")

Hmmm... thanks for the report.

This seems related to softirq (k)scheduling.

Have you tried to apply this recent commit?

Commit-ID:     d15121be7485655129101f3960ae6add40204463
Gitweb:        https://git.kernel.org/tip/d15121be7485655129101f3960ae6add40204463
Author:        Paolo Abeni <pabeni@redhat.com>
AuthorDate:    Mon, 08 May 2023 08:17:44 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 09 May 2023 21:50:27 +02:00

Revert "softirq: Let ksoftirqd do its job"

An alternative would be to try this:

diff --git a/net/core/dev.c b/net/core/dev.c
index b3c13e0419356b943e90b1f46dd7e035c6ec1a9c..f570a3ca00e7aa0e605178715f90bae17b86f071 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6713,8 +6713,8 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
         list_splice(&list, &sd->poll_list);
         if (!list_empty(&sd->poll_list))
                 __raise_softirq_irqoff(NET_RX_SOFTIRQ);
-        else
-                sd->in_net_rx_action = false;
+
+        sd->in_net_rx_action = false;
 
         net_rps_action_and_irq_enable(sd);
 end:;

* Re: Performance regression on lan966x when extracting frames
From: Horatiu Vultur @ 2023-05-16 7:45 UTC
To: Eric Dumazet; +Cc: netdev

The 05/15/2023 14:30, Eric Dumazet wrote:

Hi Eric,

Thanks for looking at this.

> Hmmm... thanks for the report.
>
> This seems related to softirq (k)scheduling.
>
> Have you tried to apply this recent commit?
>
> Commit-ID:     d15121be7485655129101f3960ae6add40204463
[...]
> Revert "softirq: Let ksoftirqd do its job"

I have tried to apply this patch, but the results are the same:
[ 5] 0.00-10.01 sec 478 MBytes 400 Mbits/sec 188 sender
It just gets a slightly higher number of interrupts, around 11,000.

> An alternative would be to try this:
>
> diff --git a/net/core/dev.c b/net/core/dev.c
[...]
> -        else
> -                sd->in_net_rx_action = false;
> +
> +        sd->in_net_rx_action = false;

I have also tried this change, with and without the previous patch, but
the result is the same:
[ 5] 0.00-10.01 sec 478 MBytes 401 Mbits/sec 256 sender
and it is the same number of interrupts.

Is there something else that I should try?

--
/Horatiu

* Re: Performance regression on lan966x when extracting frames
From: Eric Dumazet @ 2023-05-16 8:04 UTC
To: Horatiu Vultur; +Cc: netdev

On Tue, May 16, 2023 at 9:45 AM Horatiu Vultur
<horatiu.vultur@microchip.com> wrote:
[...]
> I have also tried this change, with and without the previous patch, but
> the result is the same:
> [ 5] 0.00-10.01 sec 478 MBytes 401 Mbits/sec 256 sender
> and it is the same number of interrupts.
>
> Is there something else that I should try?

High number of interrupts for a saturated receiver seems wrong.
(Unless it is not saturating the cpu?)

Perhaps hard irqs are not properly disabled by this driver.

You also could try using napi_schedule_prep(), just in case it helps.

diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
index bd72fbc2220f3010afd8b90f3704e261b9d0a98f..4694f4f34e6caf5cf540ada17a472c3c57f10823 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
@@ -628,10 +628,12 @@ irqreturn_t lan966x_fdma_irq_handler(int irq, void *args)
         err = lan_rd(lan966x, FDMA_INTR_ERR);
 
         if (db) {
-                lan_wr(0, lan966x, FDMA_INTR_DB_ENA);
-                lan_wr(db, lan966x, FDMA_INTR_DB);
+                if (napi_schedule_prep(&lan966x->napi)) {
+                        lan_wr(0, lan966x, FDMA_INTR_DB_ENA);
+                        lan_wr(db, lan966x, FDMA_INTR_DB);
 
-                napi_schedule(&lan966x->napi);
+                        __napi_schedule(&lan966x->napi);
+                }
         }
 
         if (err) {
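
For what it's worth, napi_schedule_prep() returns false when NAPI is already
scheduled or running, so the intent of the hunk above is simply that the
FDMA_INTR_DB_ENA/FDMA_INTR_DB writes (and the reschedule) are skipped for any
hard interrupt that fires while a poll is still pending.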

* Re: Performance regression on lan966x when extracting frames
From: Horatiu Vultur @ 2023-05-16 9:27 UTC
To: Eric Dumazet; +Cc: netdev

The 05/16/2023 10:04, Eric Dumazet wrote:
[...]
> High number of interrupts for a saturated receiver seems wrong.
> (Unless it is not saturating the cpu?)

The CPU usage seems to be almost at 100%. This is the output of the top
command:
 149   132 root     R     5032   0%  96% iperf3 -c 10.97.10.1 -R
  12     2 root     SW       0   0%   3% [ksoftirqd/0]
 150   132 root     R     2652   0%   1% top
...

> Perhaps hard irqs are not properly disabled by this driver.
>
> You also could try using napi_schedule_prep(), just in case it helps.
>
> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
[...]
> -                napi_schedule(&lan966x->napi);
> +                        __napi_schedule(&lan966x->napi);
> +                }

I get the same result as before with this:
[ 5] 0.00-10.01 sec 477 MBytes 399 Mbits/sec 177 sender
I applied this change without any of the other changes that you suggested
before. Should I also apply those changes?

--
/Horatiu

* Re: Performance regression on lan966x when extracting frames
From: Eric Dumazet @ 2023-05-16 9:59 UTC
To: Horatiu Vultur; +Cc: netdev

On Tue, May 16, 2023 at 11:27 AM Horatiu Vultur
<horatiu.vultur@microchip.com> wrote:
[...]
> The CPU usage seems to be almost at 100%. This is the output of the top
> command:
>  149   132 root     R     5032   0%  96% iperf3 -c 10.97.10.1 -R
>   12     2 root     SW       0   0%   3% [ksoftirqd/0]
>  150   132 root     R     2652   0%   1% top

Strange... There might be some scheduling artifacts in the TCP stack for
your particular workload, perhaps leading to fewer ACK packets being sent
and slowing down the sender.

It is unclear where the CPU cycles are eaten. Normally the kernel->user
copy should be the limiting factor.

Please try:

perf record -a -g sleep 10
perf report --no-children --stdio

I do not see obvious problems with my commit.

* Re: Performance regression on lan966x when extracting frames
From: Eric Dumazet @ 2023-05-16 10:17 UTC
To: Horatiu Vultur; +Cc: netdev

On Tue, May 16, 2023 at 11:59 AM Eric Dumazet <edumazet@google.com> wrote:
[...]
> It is unclear where the CPU cycles are eaten. Normally the kernel->user
> copy should be the limiting factor.
>
> I do not see obvious problems with my commit.

I suspect the TCP receive queue fills up, holding too many MSS (pages) at
once, and the driver's page recycling strategy is defeated.

You could use "ss -temoi" while iperf3 is running, to look at how big the
receive queue is.

Then you can reduce /proc/sys/net/ipv4/tcp_rmem[2] to limit the number of
pages held by a TCP socket when its receive queue is not drained fast enough,
and restart the iperf3 session:

echo "4096 131072 1048576" >/proc/sys/net/ipv4/tcp_rmem

If this changes performance, you might want to adjust the RX ring size (or
the page_pool capacity), because this driver seems to use 512 slots. 512
slots at standard MTU means that no more than 741376 bytes of payload should
sit in TCP receive queues.
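
For the arithmetic behind that figure: with a standard 1500-byte MTU and TCP
timestamps enabled, each segment carries about 1448 bytes of payload, and
512 * 1448 = 741376.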

* Re: Performance regression on lan966x when extracting frames
From: Paolo Abeni @ 2023-05-16 10:16 UTC
To: Horatiu Vultur, Eric Dumazet; +Cc: netdev

On Tue, 2023-05-16 at 11:27 +0200, Horatiu Vultur wrote:
[...]
> The CPU usage seems to be almost at 100%. This is the output of the top
> command:
>  149   132 root     R     5032   0%  96% iperf3 -c 10.97.10.1 -R
>   12     2 root     SW       0   0%   3% [ksoftirqd/0]
>  150   132 root     R     2652   0%   1% top
> ...

Sorry for the dumb question, is the above with fdma == false? (that is,
no napi?) Why can't lan966x_xtr_irq_handler() be converted to the napi
model regardless of fdma ?!?

Thanks,

Paolo

* Re: Performance regression on lan966x when extracting frames
From: Horatiu Vultur @ 2023-05-16 14:11 UTC
To: Paolo Abeni; +Cc: Eric Dumazet, netdev

The 05/16/2023 12:16, Paolo Abeni wrote:
[...]
> Sorry for the dumb question, is the above with fdma == false? (that is,
> no napi?) Why can't lan966x_xtr_irq_handler() be converted to the napi
> model regardless of fdma ?!?

No, this is with fdma == true, where we use NAPI.

Would there be any advantage in using NAPI for lan966x_xtr_irq_handler()?
For lan966x_xtr_irq_handler() we still need to read each word of the frame,
which I think will be a big drawback compared with
lan966x_fdma_irq_handler(). Or did I misunderstand the question?

--
/Horatiu

* Re: Performance regression on lan966x when extracting frames
From: Paolo Abeni @ 2023-05-16 14:32 UTC
To: Horatiu Vultur; +Cc: Eric Dumazet, netdev

On Tue, 2023-05-16 at 16:11 +0200, Horatiu Vultur wrote:
[...]
> No, this is with fdma == true, where we use NAPI.
>
> Would there be any advantage in using NAPI for lan966x_xtr_irq_handler()?

Using NAPI you will avoid extra queuing and will gain GRO. That should make
quite a difference.

> For lan966x_xtr_irq_handler() we still need to read each word of the frame,
> which I think will be a big drawback compared with
> lan966x_fdma_irq_handler().

I guess/hope all the lan966x_rx_frame_word() work could be moved into the
NAPI poll callback. In any case the fdma == false code path will likely be
quite a bit slower than the fdma == true path, and hopefully faster than
the current code.

> Or did I misunderstand the question?

I think you didn't ;)

Cheers,

Paolo
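
Very roughly, and only as a sketch of the usual conversion pattern (the
xtr_napi field and every lan966x_xtr_*() helper below are placeholders
assumed for illustration, not existing driver code), it could look like:

static irqreturn_t lan966x_xtr_irq_handler(int irq, void *args)
{
        struct lan966x *lan966x = args;

        /* Only mask the extraction interrupt here; defer all FIFO reads. */
        if (napi_schedule_prep(&lan966x->xtr_napi)) {
                lan966x_xtr_irq_disable(lan966x);        /* placeholder */
                __napi_schedule(&lan966x->xtr_napi);
        }

        return IRQ_HANDLED;
}

static int lan966x_xtr_poll(struct napi_struct *napi, int budget)
{
        struct lan966x *lan966x = container_of(napi, struct lan966x, xtr_napi);
        int rx = 0;

        /* Drain the extraction FIFO, at most 'budget' frames per poll. */
        while (rx < budget && lan966x_xtr_frame_pending(lan966x)) { /* placeholder */
                struct sk_buff *skb;

                /* placeholder for the existing word-by-word frame read */
                skb = lan966x_xtr_read_frame(lan966x);
                if (!skb)
                        break;

                napi_gro_receive(napi, skb);
                rx++;
        }

        if (rx < budget && napi_complete_done(napi, rx))
                lan966x_xtr_irq_enable(lan966x);         /* placeholder */

        return rx;
}

plus a netif_napi_add()/napi_enable() pair at probe time for the assumed
xtr_napi instance. The point is that the hard interrupt only masks the
extraction interrupt and schedules NAPI; the word reads, skb construction
and napi_gro_receive() all happen in the poll callback under the budget.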