From: Sumit Saxena <sumit.saxena@broadcom.com>
To: "Martin K . Petersen" <martin.petersen@oracle.com>,
Jens Axboe <axboe@kernel.dk>
Cc: "James E . J . Bottomley" <James.Bottomley@HansenPartnership.com>,
linux-scsi@vger.kernel.org, linux-block@vger.kernel.org,
Adam Radford <aradford@gmail.com>,
Khalid Aziz <khalid@gonehiking.org>,
Adaptec OEM Raid Solutions <aacraid@microsemi.com>,
Matthew Wilcox <willy@infradead.org>,
Hannes Reinecke <hare@suse.com>,
"Juergen E . Fischer" <fischer@norbit.de>,
Russell King <linux@armlinux.org.uk>,
linux-arm-kernel@lists.infradead.org,
Finn Thain <fthain@linux-m68k.org>,
Michael Schmitz <schmitzmic@gmail.com>,
Anil Gurumurthy <anil.gurumurthy@qlogic.com>,
Sudarsana Kalluru <sudarsana.kalluru@qlogic.com>,
Oliver Neukum <oliver@neukum.org>, Ali Akcaagac <aliakc@web.de>,
Jamie Lenehan <lenehan@twibble.org>,
Ram Vegesna <ram.vegesna@broadcom.com>,
target-devel@vger.kernel.org,
Bradley Grove <linuxdrivers@attotech.com>,
Satish Kharat <satishkh@cisco.com>,
Sesidhar Baddela <sebaddel@cisco.com>,
Karan Tilak Kumar <kartilak@cisco.com>,
Yihang Li <liyihang9@h-partners.com>,
Don Brace <don.brace@microchip.com>,
storagedev@microchip.com,
HighPoint Linux Team <linux@highpoint-tech.com>,
Tyrel Datwyler <tyreld@linux.ibm.com>,
Madhavan Srinivasan <maddy@linux.ibm.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Nicholas Piggin <npiggin@gmail.com>,
Christophe Leroy <chleroy@kernel.org>,
linuxppc-dev@lists.ozlabs.org, Brian King <brking@us.ibm.com>,
Lee Duncan <lduncan@suse.com>, Chris Leech <cleech@redhat.com>,
Mike Christie <michael.christie@oracle.com>,
open-iscsi@googlegroups.com, Justin Tee <justin.tee@broadcom.com>,
Paul Ely <paul.ely@broadcom.com>,
Kashyap Desai <kashyap.desai@broadcom.com>,
Shivasharan S <shivasharan.srikanteshwara@broadcom.com>,
Chandrakanth Patil <chandrakanth.patil@broadcom.com>,
megaraidlinux.pdl@broadcom.com,
Sathya Prakash Veerichetty <sathya.prakash@broadcom.com>,
Sreekanth Reddy <sreekanth.reddy@broadcom.com>,
mpi3mr-linuxdrv.pdl@broadcom.com,
Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>,
Ranjan Kumar <ranjan.kumar@broadcom.com>,
MPT-FusionLinux.pdl@broadcom.com,
Daniel Palmer <daniel@thingy.jp>,
GOTO Masanori <gotom@debian.or.jp>,
YOKOTA Hiroshi <yokota@netlab.is.tsukuba.ac.jp>,
Jack Wang <jinpu.wang@cloud.ionos.com>,
Geoff Levand <geoff@infradead.org>, Michael Reed <mdr@sgi.com>,
Nilesh Javali <njavali@marvell.com>,
GR-QLogic-Storage-Upstream@marvell.com,
Narsimhulu Musini <nmusini@cisco.com>,
"K . Y . Srinivasan" <kys@microsoft.com>,
Haiyang Zhang <haiyangz@microsoft.com>,
Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>,
Long Li <longli@microsoft.com>,
linux-hyperv@vger.kernel.org,
"Michael S . Tsirkin" <mst@redhat.com>,
Jason Wang <jasowang@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Eugenio Perez <eperezma@redhat.com>,
virtualization@lists.linux.dev,
Vishal Bhakta <vishal.bhakta@broadcom.com>,
bcm-kernel-feedback-list@broadcom.com,
Juergen Gross <jgross@suse.com>,
Stefano Stabellini <sstabellini@kernel.org>,
Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>,
xen-devel@lists.xenproject.org,
Sumit Saxena <sumit.saxena@broadcom.com>
Subject: [PATCH v3 0/4] scsi/block: NUMA-local scan allocations, shared-tag path cleanup, and SCSI I/O counters
Date: Tue, 9 Jun 2026 17:47:59 +0530
Message-ID: <20260609121806.2121755-1-sumit.saxena@broadcom.com>
This series contains three performance improvements targeting the SCSI
and block layers on multi-socket NUMA and heavily loaded SMP systems.
On multi-socket NUMA systems we observed extreme I/O throughput variance
of 50-60% between runs. This series identifies and fixes two root causes:
cross-node memory accesses due to NUMA-unaware allocations in the scan
path, and false sharing between hot atomic counters in struct request_queue
and struct scsi_device.
Performance notes:
Tested on a dual-socket NUMA system (2x 32-core, 256 GB/socket) with
an mpi3mr HBA, running fio (random read, 4K, QD 64, 16 jobs, 60 s,
direct I/O). IOPS figures are in KIOPS (thousands of IOPS):
Configuration             Avg KIOPS   Range (KIOPS)   Spread
Baseline                  6,255       4,200 - 6,700   ~37%
Baseline + all patches    7,350       7,000 - 7,700   ~10%
Key findings:
Combined, these patches reduce the observed 50-60% run-to-run variance
to under 10%, significantly improving workload predictability, and
improve IOPS by 16-18%.
No functional regressions observed.
Changes in v3
-------------
- Handled feedback from Bart Van Assche and John Garry.
- Added a patch for shost local NUMA allocation.
- Converted ioerr_cnt and iotmo_cnt atomic counters into per-cpu counters.
Changes in v2
-------------
Patch 1 — Same functional goal as v1 patch 1: NUMA-local scsi_device /
scsi_target allocations in the scan path so steady-state I/O does not
habitually touch remote memory when the host has a fixed DMA/NUMA
affinity.
Patch 2 — Replaces v1’s ____cacheline_aligned_in_smp on
nr_active_requests_shared_tags with removal of the shared-tag fairness
throttling machinery (including hctx_may_queue(), blk_mq_hw_ctx.nr_active,
and request_queue.nr_active_requests_shared_tags and their updates).
This follows the earlier standalone proposal by Bart Van Assche [1],
rebased for the current tree; it removes the high-frequency atomic
accounting that motivated the v1 false-sharing workaround and, in our
testing, improves IOPS on the order of roughly 16–18% for the shared-tag
workload exercised.
Patch 3 — Replaces v1’s cache-line padding of iodone_cnt with
percpu_counter for both iorequest_cnt and iodone_cnt, so submission and
completion paths mostly update CPU-local state instead of bouncing a
single cache line, without inflating struct scsi_device for SMP
alignment.
Merge / review hints
--------------------
Patch 3 touches the block layer and should have block maintainer review;
the rest of the patches are SCSI-oriented. Please route or Ack as your
subsystem workflow requires.
Bart Van Assche (1):
  block: drop shared-tag fairness throttling
James Rizzo (1):
  scsi: scan: allocate sdev and starget on the NUMA node of the host
    adapter
Sumit Saxena (2):
  scsi: host: allocate struct Scsi_Host on the NUMA node of the host
    adapter
  scsi: use percpu counters for iostat counters in struct scsi_device
block/blk-core.c | 2 -
block/blk-mq-debugfs.c | 22 ++++-
block/blk-mq-tag.c | 4 -
block/blk-mq.c | 17 +---
block/blk-mq.h | 100 ----------------------
drivers/scsi/3w-9xxx.c | 2 +-
drivers/scsi/3w-sas.c | 2 +-
drivers/scsi/3w-xxxx.c | 2 +-
drivers/scsi/53c700.c | 2 +-
drivers/scsi/BusLogic.c | 2 +-
drivers/scsi/a100u2w.c | 2 +-
drivers/scsi/a2091.c | 2 +-
drivers/scsi/a3000.c | 2 +-
drivers/scsi/aacraid/linit.c | 2 +-
drivers/scsi/advansys.c | 6 +-
drivers/scsi/aha152x.c | 2 +-
drivers/scsi/aha1542.c | 2 +-
drivers/scsi/aha1740.c | 2 +-
drivers/scsi/aic7xxx/aic79xx_osm.c | 2 +-
drivers/scsi/aic7xxx/aic7xxx_osm.c | 2 +-
drivers/scsi/aic94xx/aic94xx_init.c | 2 +-
drivers/scsi/am53c974.c | 2 +-
drivers/scsi/arcmsr/arcmsr_hba.c | 3 +-
drivers/scsi/arm/acornscsi.c | 2 +-
drivers/scsi/arm/arxescsi.c | 2 +-
drivers/scsi/arm/cumana_1.c | 2 +-
drivers/scsi/arm/cumana_2.c | 2 +-
drivers/scsi/arm/eesox.c | 2 +-
drivers/scsi/arm/oak.c | 2 +-
drivers/scsi/arm/powertec.c | 2 +-
drivers/scsi/atari_scsi.c | 2 +-
drivers/scsi/atp870u.c | 2 +-
drivers/scsi/bfa/bfad_im.c | 2 +-
drivers/scsi/csiostor/csio_init.c | 4 +-
drivers/scsi/dc395x.c | 2 +-
drivers/scsi/dmx3191d.c | 2 +-
drivers/scsi/elx/efct/efct_xport.c | 4 +-
drivers/scsi/esas2r/esas2r_main.c | 2 +-
drivers/scsi/fdomain.c | 2 +-
drivers/scsi/fnic/fnic_main.c | 2 +-
drivers/scsi/g_NCR5380.c | 2 +-
drivers/scsi/gvp11.c | 2 +-
drivers/scsi/hisi_sas/hisi_sas_main.c | 2 +-
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 2 +-
drivers/scsi/hosts.c | 6 +-
drivers/scsi/hpsa.c | 2 +-
drivers/scsi/hptiop.c | 2 +-
drivers/scsi/ibmvscsi/ibmvfc.c | 2 +-
drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +-
drivers/scsi/imm.c | 2 +-
drivers/scsi/initio.c | 2 +-
drivers/scsi/ipr.c | 2 +-
drivers/scsi/ips.c | 2 +-
drivers/scsi/isci/init.c | 2 +-
drivers/scsi/jazz_esp.c | 2 +-
drivers/scsi/libiscsi.c | 2 +-
drivers/scsi/lpfc/lpfc_init.c | 2 +-
drivers/scsi/mac53c94.c | 2 +-
drivers/scsi/mac_esp.c | 2 +-
drivers/scsi/mac_scsi.c | 2 +-
drivers/scsi/megaraid.c | 2 +-
drivers/scsi/megaraid/megaraid_mbox.c | 2 +-
drivers/scsi/megaraid/megaraid_sas_base.c | 2 +-
drivers/scsi/mesh.c | 2 +-
drivers/scsi/mpi3mr/mpi3mr_os.c | 2 +-
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 4 +-
drivers/scsi/mvme147.c | 2 +-
drivers/scsi/mvsas/mv_init.c | 2 +-
drivers/scsi/mvumi.c | 2 +-
drivers/scsi/myrb.c | 2 +-
drivers/scsi/myrs.c | 2 +-
drivers/scsi/ncr53c8xx.c | 2 +-
drivers/scsi/nsp32.c | 2 +-
drivers/scsi/pcmcia/nsp_cs.c | 2 +-
drivers/scsi/pcmcia/qlogic_stub.c | 2 +-
drivers/scsi/pcmcia/sym53c500_cs.c | 2 +-
drivers/scsi/pm8001/pm8001_init.c | 2 +-
drivers/scsi/pmcraid.c | 2 +-
drivers/scsi/ppa.c | 2 +-
drivers/scsi/ps3rom.c | 2 +-
drivers/scsi/qla1280.c | 2 +-
drivers/scsi/qla2xxx/qla_mid.c | 2 +-
drivers/scsi/qla2xxx/qla_os.c | 2 +-
drivers/scsi/qlogicfas.c | 2 +-
drivers/scsi/qlogicpti.c | 2 +-
drivers/scsi/scsi_debug.c | 2 +-
drivers/scsi/scsi_error.c | 4 +-
drivers/scsi/scsi_lib.c | 10 +--
drivers/scsi/scsi_scan.c | 15 +++-
drivers/scsi/scsi_sysfs.c | 23 +++--
drivers/scsi/sd.c | 2 +-
drivers/scsi/sgiwd93.c | 2 +-
drivers/scsi/smartpqi/smartpqi_init.c | 2 +-
drivers/scsi/snic/snic_main.c | 2 +-
drivers/scsi/stex.c | 2 +-
drivers/scsi/storvsc_drv.c | 2 +-
drivers/scsi/sun3_scsi.c | 2 +-
drivers/scsi/sun3x_esp.c | 2 +-
drivers/scsi/sun_esp.c | 2 +-
drivers/scsi/sym53c8xx_2/sym_glue.c | 2 +-
drivers/scsi/virtio_scsi.c | 2 +-
drivers/scsi/vmw_pvscsi.c | 2 +-
drivers/scsi/wd719x.c | 2 +-
drivers/scsi/xen-scsifront.c | 2 +-
drivers/scsi/zorro_esp.c | 2 +-
include/linux/blk-mq.h | 6 --
include/linux/blkdev.h | 2 -
include/scsi/libfc.h | 2 +-
include/scsi/scsi_device.h | 9 +-
include/scsi/scsi_host.h | 3 +-
110 files changed, 168 insertions(+), 258 deletions(-)
--
2.43.7
Thread overview: 12+ messages
2026-06-09 12:17 Sumit Saxena [this message]
2026-06-09 12:18 ` [PATCH v3 1/4] scsi: scan: allocate sdev and starget on the NUMA node of the host adapter Sumit Saxena
2026-06-10 6:00 ` Hannes Reinecke
2026-06-09 12:18 ` [PATCH v3 2/4] scsi: host: allocate struct Scsi_Host " Sumit Saxena
2026-06-09 13:03 ` John Garry
2026-06-10 5:59 ` Hannes Reinecke
2026-06-09 12:18 ` [PATCH v3 3/4] block: drop shared-tag fairness throttling Sumit Saxena
2026-06-10 6:14 ` Christoph Hellwig
[not found] ` <CAL2rwxr1uGshb1o=jvP2OnBffNz2cKXj8tHuAUCN5HFuy2vB_g@mail.gmail.com>
2026-06-10 16:35 ` Keith Busch
2026-06-10 6:18 ` Hannes Reinecke
2026-06-09 12:18 ` [PATCH v3 4/4] scsi: use percpu counters for iostat counters in struct scsi_device Sumit Saxena
2026-06-10 6:21 ` Hannes Reinecke