From: Kashyap Desai <kashyap.desai@broadcom.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>, Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org,
Christoph Hellwig <hch@infradead.org>,
Mike Snitzer <snitzer@redhat.com>,
linux-scsi@vger.kernel.org, Arun Easi <arun.easi@cavium.com>,
Omar Sandoval <osandov@fb.com>,
"Martin K . Petersen" <martin.petersen@oracle.com>,
James Bottomley <james.bottomley@hansenpartnership.com>,
Christoph Hellwig <hch@lst.de>,
Don Brace <don.brace@microsemi.com>,
Peter Rivera <peter.rivera@broadcom.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Laurence Oberman <loberman@redhat.com>
Subject: RE: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce force_blk_mq
Date: Tue, 6 Feb 2018 11:33:50 +0530
Message-ID: <4c8162129f2a09dca2d1e7eefd61e272@mail.gmail.com>
In-Reply-To: <20180205101734.GA32558@ming.t460p>
> > We still have more than one reply queue ending up completing on one CPU.
>
> pci_alloc_irq_vectors(PCI_IRQ_AFFINITY) has to be used, which means
> smp_affinity_enable has to be set to 1, but it seems that is the default
> setting.
>
> Please see kernel/irq/affinity.c, especially irq_calc_affinity_vectors(),
> which figures out an optimal number of vectors; the computation is now
> based on cpumask_weight(cpu_possible_mask). If all offline CPUs are mapped
> to some of the reply queues, these queues won't be active (no request is
> submitted to them). The PCI_IRQ_AFFINITY mechanism basically makes sure
> that more than one irq vector won't be handled by the same CPU, and the
> irq vector spread is done in irq_create_affinity_masks().
>
> > Try to reduce the MSI-x vectors of the megaraid_sas or mpt3sas driver via
> > a module parameter to simulate the issue. We need more online CPUs than
> > reply queues.
>
> IMO, you don't need to simulate the issue; pci_alloc_irq_vectors(
> PCI_IRQ_AFFINITY) will handle that for you. You can dump the returned irq
> vector number, num_possible_cpus()/num_online_cpus() and each irq
> vector's affinity assignment.
>
> > We may see completions redirected to the original CPU because of
> > QUEUE_FLAG_SAME_FORCE, but the low-level driver's ISR can keep one CPU
> > busy in its local ISR routine.
>
> Could you dump each irq vector's affinity assignment of your megaraid in
> your test?
To reproduce quickly, I restricted the megaraid_sas driver to a single
MSI-x vector. The system has 16 online CPUs in total.
Output of the affinity dump:
kernel version:
Linux rhel7.3 4.15.0-rc1+ #2 SMP Mon Feb 5 12:13:34 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
PCI name is 83:00.0, dump its irq affinity:
irq 105, cpu list 0-3,8-11
The affinity mask is created properly, but only CPU 0 is overloaded with
interrupt processing.
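The single vector's mask, 0-3,8-11, covers the 8 CPUs of node 0, while 16 CPUs are online. A small helper for counting the CPUs in a kernel cpu-list string makes that comparison explicit. This is a sketch, not part of either script in this thread; count_cpus is a hypothetical name, and it assumes the plain "N" / "A-B" comma-separated format the kernel prints.

```shell
# Count the CPUs in a kernel cpu-list string such as "0-3,8-11", the
# format used by /proc/irq/*/smp_affinity_list and
# /sys/devices/system/cpu/online. (Hypothetical helper, sketch only.)
count_cpus() {
    total=0
    # Split the list on commas; each item is either "N" or "A-B".
    for item in $(echo "$1" | tr ',' ' '); do
        case "$item" in
            *-*)
                lo=${item%-*}   # start of the range
                hi=${item#*-}   # end of the range
                total=$((total + hi - lo + 1))
                ;;
            *)
                total=$((total + 1))
                ;;
        esac
    done
    echo "$total"
}
```

For the mask dumped above, count_cpus "$(cat /proc/irq/105/smp_affinity_list)" yields 8, which can be compared with count_cpus "$(cat /sys/devices/system/cpu/online)".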
# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 8 9 10 11
node 0 size: 47861 MB
node 0 free: 46516 MB
node 1 cpus: 4 5 6 7 12 13 14 15
node 1 size: 64491 MB
node 1 free: 62805 MB
node distances:
node 0 1
0: 10 21
1: 21 10
Output of system activity (sar); %gnice is 100% on CPU 0, and that time is
consumed in the megaraid_sas ISR routine:
12:44:40 PM     CPU    %usr   %nice    %sys %iowait  %steal    %irq   %soft  %guest  %gnice   %idle
12:44:41 PM     all    6.03    0.00   29.98    0.00    0.00    0.00    0.00    0.00    0.00   63.99
12:44:41 PM       0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00       0
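The sar numbers show where the time goes; the interrupt counts themselves can also be checked per CPU. A minimal sketch that pulls one IRQ's row out of /proc/interrupts and lists the busiest CPUs first. irq_per_cpu is a hypothetical helper, not from the thread, and it assumes the usual layout: a header line of CPUn names followed by one "N:" row per IRQ.

```shell
# Print per-CPU interrupt counts for one IRQ, busiest CPU first.
# Usage: irq_per_cpu IRQ [FILE]   (FILE defaults to /proc/interrupts)
# (Hypothetical helper, sketch only.)
irq_per_cpu() {
    irq=$1
    file=${2:-/proc/interrupts}
    awk -v irq="$irq:" '
        # Header line: remember the CPU column names.
        NR == 1 { ncpu = NF; for (i = 1; i <= NF; i++) name[i] = $i; next }
        # Matching IRQ row: field 1 is "N:", then one count per CPU.
        $1 == irq { for (i = 1; i <= ncpu; i++) print name[i], $(i + 1) }
    ' "$file" | sort -k2,2nr
}
```

For the setup above this would be run as irq_per_cpu 105.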
In my test, rq_affinity was set to 2 (QUEUE_FLAG_SAME_FORCE). I also used
the "host_tagset" V2 patch set for megaraid_sas.
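For reference, the three rq_affinity values as the block layer's queue-sysfs documentation describes them, written as a small lookup helper. describe_rq_affinity is a hypothetical name and this is a sketch only; the descriptions paraphrase that documentation.

```shell
# Describe a block queue's rq_affinity value.
# (Hypothetical helper, sketch only.)
describe_rq_affinity() {
    case "$1" in
        0) echo "0: no completion affinity" ;;
        1) echo "1: complete within the submitting CPU's group (CPUs sharing a cache)" ;;
        2) echo "2: force completion on the submitting CPU (QUEUE_FLAG_SAME_FORCE)" ;;
        *) echo "unknown rq_affinity value: $1" >&2; return 1 ;;
    esac
}
```

Usage: describe_rq_affinity "$(cat /sys/block/sda/queue/rq_affinity)", where sda stands for whichever disk is under test; as root, echo 2 > /sys/block/sda/queue/rq_affinity selects the forced mode used in this test.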
With the RFC posted at
https://marc.info/?l=linux-scsi&m=151601833418346&w=2 the lockup is avoided
(notice that the load has shifted from gnice to softirq). Even though CPU 0
is still 100% consumed, there is always an exit from the existing completion
loop because of the irqpoll_weight passed to irq_poll_init().
Average:        CPU    %usr   %nice    %sys %iowait  %steal    %irq   %soft  %guest  %gnice   %idle
Average:        all    4.25    0.00   21.61    0.00    0.00    0.00    6.61    0.00    0.00   67.54
Average:          0    0.00    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
Hope this clarifies. We need a different fix to avoid lockups. Can we
consider using the irq_poll interface when the number of CPUs exceeds the
number of reply queues/MSI-x vectors?
>
> And the following script can do it easily; the PCI address (the 1st
> column of lspci output) needs to be passed, such as 00:1c.4:
>
> #!/bin/sh
> if [ $# -ge 1 ]; then
>     PCID=$1
> else
>     PCID=`lspci | grep "Non-Volatile memory" | cut -c1-7`
> fi
> PCIP=`find /sys/devices -name "*$PCID" | grep pci`
> IRQS=`ls $PCIP/msi_irqs`
>
> echo "kernel version: "
> uname -a
>
> echo "PCI name is $PCID, dump its irq affinity:"
> for IRQ in $IRQS; do
>     CPUS=`cat /proc/irq/$IRQ/smp_affinity_list`
>     printf '\tirq %s, cpu list %s\n' "$IRQ" "$CPUS"
> done
>
>
> Thanks,
> Ming