public inbox for linux-block@vger.kernel.org
* [PATCH 0/3] scsi/block: NUMA-local allocations and false-sharing fixes
@ 2026-04-02  7:46 Sumit Saxena
From: Sumit Saxena @ 2026-04-02  7:46 UTC (permalink / raw)
  To: martin.petersen, axboe
  Cc: linux-scsi, linux-block, mpi3mr-linuxdrv.pdl, Sumit Saxena

This series contains three performance improvements targeting the SCSI
and block layers on multi-socket NUMA systems.

On such systems we observed extreme I/O throughput variance of 50-60%
between runs.  This series identifies and fixes two root causes:
cross-node memory accesses due to NUMA-unaware allocations in the scan
path, and false sharing between hot atomic counters in
struct request_queue and struct scsi_device.

The first patch makes the SCSI scan path allocate scsi_device and
scsi_target on the NUMA node of the host adapter.

The second patch addresses false sharing in struct request_queue.
It touches include/linux/blkdev.h, so it needs review from linux-block.
An Acked-by from the block maintainer is requested before merging via
the SCSI tree.

The third patch addresses a false-sharing problem in struct
scsi_device.
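
For readers unfamiliar with the problem, here is a minimal userspace
sketch of the kind of layout change patches 2 and 3 describe. It is not
the code from the patches: the struct names, field names and the 64-byte
line size are assumptions for illustration only. In the kernel, this
kind of alignment is normally expressed with an annotation such as
____cacheline_aligned_in_smp rather than C11 alignas.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines (the kernel uses SMP_CACHE_BYTES). */
#define DEMO_CACHELINE 64

/* Hypothetical layout: two hot counters packed side by side. */
struct demo_packed {
	atomic_int submitted;	/* written on the submission path */
	atomic_int completed;	/* written on the completion path */
};

/* Same fields, with the second counter starting a new cache line. */
struct demo_aligned {
	atomic_int submitted;
	alignas(DEMO_CACHELINE) atomic_int completed;
};

/*
 * Nonzero if two byte offsets land in the same cache line, assuming
 * the struct itself starts on a cache line boundary.
 */
static int demo_same_line(size_t a, size_t b)
{
	return a / DEMO_CACHELINE == b / DEMO_CACHELINE;
}

static int demo_packed_shares_line(void)
{
	return demo_same_line(offsetof(struct demo_packed, submitted),
			      offsetof(struct demo_packed, completed));
}

static int demo_aligned_shares_line(void)
{
	return demo_same_line(offsetof(struct demo_aligned, submitted),
			      offsetof(struct demo_aligned, completed));
}
```

In the packed layout both counters share one line, so CPUs updating
either counter keep invalidating each other's copy. In the aligned
layout the second counter sits at offset 64 and the two no longer
contend, at the cost of a larger struct.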

Performance notes:

Tested on a dual-socket NUMA system with an mpi3mr HBA, running fio
(random read, 4K, QD 64, 16 jobs, 60s, direct I/O).
Figures are in KIOPS (thousands of IOPS):

  Configuration                    Avg KIOPS   Range (KIOPS)   Spread
  Baseline                         6,255       4,200 - 6,700   ~37%
  Baseline + patches 2-3 (align)   6,653       6,000 - 7,000   ~15%
  Baseline + all patches (1-3)     6,649       6,400 - 7,000    ~9%

Key findings:
  - Cacheline alignment patches (2-3) raise average IOPS by ~6% and
    cut throughput spread from ~37% to ~15%.
  - Adding the NUMA allocation patch (1) further tightens the spread
    to ~9% with negligible impact on average throughput.
  - The combined effect reduces the observed 50-60% run-to-run variance
    to under 10%, significantly improving workload predictability.
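
The cover letter does not define how Spread is computed. The sketch
below assumes (max - min) / max, which reproduces the table values from
the Range column. The helper names are hypothetical. Measuring the
baseline range against the minimum instead gives roughly 60%, which may
be where the 50-60% figure comes from, but that is a guess.

```c
#include <assert.h>

/* Spread relative to the best run: (max - min) / max, in percent. */
static double demo_spread_vs_max(double lo, double hi)
{
	return (hi - lo) / hi * 100.0;
}

/* Same range relative to the worst run: (max - min) / min, in percent. */
static double demo_spread_vs_min(double lo, double hi)
{
	return (hi - lo) / lo * 100.0;
}

/* Relative change of an average against the baseline, in percent. */
static double demo_gain_pct(double base, double val)
{
	return (val - base) / base * 100.0;
}
```

With the table inputs, demo_spread_vs_max(4200, 6700) comes out just
above 37, demo_spread_vs_max(6400, 7000) just under 9, and
demo_gain_pct(6255, 6653) a little over 6.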

No functional regressions observed.

This patch series is based on Martin's for-next tree.

James Rizzo (3):
  scsi: use NUMA-local allocation for sdev and starget
  block: align nr_active_requests_shared_tags to avoid cache line
    contention
  scsi: align scsi_device iodone_cnt to avoid cache line contention

 drivers/scsi/scsi_scan.c   | 9 ++++++---
 include/linux/blkdev.h     | 4 +++-
 include/scsi/scsi_device.h | 4 +++-
 3 files changed, 12 insertions(+), 5 deletions(-)

-- 
2.43.7

Thread overview: 8+ messages
2026-04-02  7:46 [PATCH 0/3] scsi/block: NUMA-local allocations and false-sharing fixes Sumit Saxena
2026-04-02  7:46 ` [PATCH 1/3] scsi: use NUMA-local allocation for sdev and starget Sumit Saxena
2026-04-02  7:46 ` [PATCH 2/3] block: align nr_active_requests_shared_tags to avoid cache line contention Sumit Saxena
2026-04-02 15:54   ` Bart Van Assche
2026-04-09  6:13     ` Sumit Saxena
2026-04-02  7:46 ` [PATCH 3/3] scsi: align scsi_device iodone_cnt " Sumit Saxena
2026-04-02 15:58   ` Bart Van Assche
2026-04-09  6:17     ` Sumit Saxena
