netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/5] mlxsw: Improve events processing performance
@ 2024-04-26 12:42 Petr Machata
From: Petr Machata @ 2024-04-26 12:42 UTC
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Ido Schimmel, Petr Machata, Amit Cohen, mlxsw

Amit Cohen writes:

Spectrum ASICs only support a single interrupt, which means that all the
events are handled by one IRQ (interrupt request) handler.

Currently, we schedule a tasklet to handle events in the EQ, and we also
use tasklets for the CQ, SDQ and RDQ. A tasklet runs in softIRQ (software
IRQ) context, on the same CPU that scheduled it. This means that today a
single CPU handles all the packets (both network packets and EMADs) from
hardware.

The existing implementation is not efficient and can be improved.

Measuring latency of EMADs in the driver (without the time in FW) shows
that latency increases by a factor of 28 (x28) when network traffic is
handled by the driver.

Measuring throughput in CPU shows that the CPU can handle ~35% fewer
packets of a specific flow when corrupted packets are also handled by the
driver. In some cases these values are even worse: we measured a decrease
of ~44% in packet rate.

This can be improved if network packets and EMADs are handled in parallel
by several CPUs and, beyond that, if different types of traffic are
handled in parallel. We can achieve this using NAPI.

This set converts the driver to process completions from hardware via NAPI.
The idea is to add a NAPI instance per CQ (which is mapped 1:1 to SDQ/RDQ),
which means that each DQ can be handled separately. We have a DQ for EMADs
and a DQ for each trap group (like LLDP, BGP, L3 drops, etc.). See more
details in the commit messages.
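
As a rough illustration of the per-CQ polling idea, here is a minimal
user-space sketch. It is not the code from this set and does not use the
kernel NAPI API; 'struct sim_cq', sim_cq_event() and sim_cq_poll() are
hypothetical names. It only models the usual NAPI contract as I understand
it: the event handler disarms the queue and schedules its poll instance,
and the poll routine processes at most 'budget' completions, re-arming the
queue only when it finished with budget to spare.

```c
#include <stdbool.h>

/* Hypothetical model of one completion queue with its own poll instance. */
struct sim_cq {
	unsigned int pending;	/* completions waiting in this queue */
	bool irq_armed;		/* queue may raise a new event */
	bool scheduled;		/* poll instance is scheduled to run */
};

/* Event path: disarm the queue and schedule its poll instance. */
static void sim_cq_event(struct sim_cq *cq)
{
	if (cq->irq_armed) {
		cq->irq_armed = false;
		cq->scheduled = true;
	}
}

/*
 * Poll path: handle at most 'budget' completions and return the amount of
 * work done. If the queue was drained before the budget ran out, polling
 * is complete and the queue is re-armed; otherwise it stays scheduled.
 */
static unsigned int sim_cq_poll(struct sim_cq *cq, unsigned int budget)
{
	unsigned int work_done = 0;

	while (work_done < budget && cq->pending > 0) {
		cq->pending--;
		work_done++;
	}

	if (work_done < budget) {
		cq->scheduled = false;
		cq->irq_armed = true;
	}

	return work_done;
}
```

Because each queue carries its own state, a lightly loaded queue (for
example the one used for EMADs) can finish and re-arm without waiting for
a busy trap-group queue to drain.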

An additional improvement which is done as part of this set is related to
doorbell ringing. The idea is to handle small chunks of Rx packets (which
is also recommended when using NAPI) and ring doorbells once per chunk.
This reduces accesses to hardware, which are expensive (time-wise) and
might take time because of memory barriers.
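
The effect of ringing once per chunk can be sketched with a small
user-space model. This is only an illustration under stated assumptions,
not the driver code: 'struct sim_queue' and the sim_*() helpers are
hypothetical, and the doorbell is modelled as a counter instead of a real
register write. The chunk size of 64 used in the note below is taken from
the title of patch #1.

```c
/* Hypothetical stand-in for a hardware queue; not the driver's structure. */
struct sim_queue {
	unsigned int pending;		/* completions waiting to be handled */
	unsigned int consumer_index;	/* software consumer counter */
	unsigned int doorbell_writes;	/* simulated doorbell writes so far */
};

/* Simulated doorbell; on real hardware this would be a costly write. */
static void sim_ring_doorbell(struct sim_queue *q)
{
	q->doorbell_writes++;
}

/* Baseline: ring the doorbell after every single completion. */
static unsigned int sim_process_per_completion(struct sim_queue *q,
					       unsigned int budget)
{
	unsigned int done = 0;

	while (done < budget && q->pending > 0) {
		q->pending--;
		q->consumer_index++;
		sim_ring_doorbell(q);
		done++;
	}

	return done;
}

/* Batched: handle up to 'budget' completions, then ring once. */
static unsigned int sim_process_per_chunk(struct sim_queue *q,
					  unsigned int budget)
{
	unsigned int done = 0;

	while (done < budget && q->pending > 0) {
		q->pending--;
		q->consumer_index++;
		done++;
	}

	/* One doorbell covers the whole chunk; skip it if nothing was done. */
	if (done > 0)
		sim_ring_doorbell(q);

	return done;
}
```

In this model, handling 64 pending completions costs 64 doorbell writes in
the baseline and a single write in the batched variant.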

With this set we can see better performance.
To summarize:

EMADs latency:
+------------------------------------------------------------------------+
|                  | Before this set           | Now                     |
|------------------|---------------------------|-------------------------|
| Increased factor | x28                       | x1.5                    |
+------------------------------------------------------------------------+
Note that some measurements even show better latency when traffic is
handled by the driver.

Throughput:
+------------------------------------------------------------------------+
|             | Before this set            | Now                         |
|-------------|----------------------------|-----------------------------|
| Reduced     | 35%                        | 6%                          |
| packet rate |                            |                             |
+------------------------------------------------------------------------+

Additional improvements are planned - use page pool for buffer allocations
and avoid a cache miss for each SKB by using napi_build_skb().

Patch set overview:
Patches #1-#2 improve access to hardware by reducing doorbell rings
Patches #3-#4 are preparations for NAPI usage
Patch #5 converts the driver to use NAPI

Amit Cohen (5):
  mlxsw: pci: Handle up to 64 Rx completions in tasklet
  mlxsw: pci: Ring RDQ and CQ doorbells once per several completions
  mlxsw: pci: Initialize dummy net devices for NAPI
  mlxsw: pci: Reorganize 'mlxsw_pci_queue' structure
  mlxsw: pci: Use NAPI for event processing

 drivers/net/ethernet/mellanox/mlxsw/pci.c | 204 ++++++++++++++++------
 1 file changed, 150 insertions(+), 54 deletions(-)

-- 
2.43.0



Thread overview: 7+ messages
2024-04-26 12:42 [PATCH net-next 0/5] mlxsw: Improve events processing performance Petr Machata
2024-04-26 12:42 ` [PATCH net-next 1/5] mlxsw: pci: Handle up to 64 Rx completions in tasklet Petr Machata
2024-04-26 12:42 ` [PATCH net-next 2/5] mlxsw: pci: Ring RDQ and CQ doorbells once per several completions Petr Machata
2024-04-26 12:42 ` [PATCH net-next 3/5] mlxsw: pci: Initialize dummy net devices for NAPI Petr Machata
2024-04-26 12:42 ` [PATCH net-next 4/5] mlxsw: pci: Reorganize 'mlxsw_pci_queue' structure Petr Machata
2024-04-26 12:42 ` [PATCH net-next 5/5] mlxsw: pci: Use NAPI for event processing Petr Machata
2024-04-29  9:50 ` [PATCH net-next 0/5] mlxsw: Improve events processing performance patchwork-bot+netdevbpf
