Linux-NVME Archive on lore.kernel.org
From: axboe@kernel.dk (Jens Axboe)
Subject: [GIT PULL] nvme fix for 4.16-rc6
Date: Thu, 22 Mar 2018 15:32:45 -0600
Message-ID: <71eda309-fa4a-7641-9292-fcc6c77d44af@kernel.dk>
In-Reply-To: <20180322210917.GK12909@localhost.localdomain>

On 3/22/18 3:09 PM, Keith Busch wrote:
> On Wed, Mar 21, 2018 at 03:44:32PM -0600, Jens Axboe wrote:
>>
>> [   30.241598] nvme nvme2: pci function 0000:0b:00.0                            
>> [   30.247205] nvme nvme3: pci function 0000:81:00.0                            
>> [   30.252684] nvme nvme4: pci function 0000:82:00.0                            
>> [   30.258144] nvme nvme5: pci function 0000:83:00.0                            
>> [   30.263606] nvme nvme6: pci function 0000:84:00.0                            
>> [   30.360555] nvme nvme3: could not set timestamp (8194)                       
>> [   30.481649] nvme nvme6: Shutdown timeout set to 8 seconds                    
>> [   38.790949] nvme nvme4: Device not ready; aborting initialisation            
>> [   38.797857] nvme nvme4: Removing after probe failure status: -19             
>> [   60.708816] nvme nvme3: I/O 363 QID 8 timeout, completion polled             
>> [   60.708820] nvme nvme6: I/O 781 QID 7 timeout, completion polled             
>> [   68.068772] nvme nvme2: I/O 769 QID 28 timeout, completion polled            
>> [   91.108626] nvme nvme6: I/O 781 QID 7 timeout, completion polled             
>> [   98.660581] nvme nvme2: I/O 769 QID 28 timeout, completion polled            
>> [  121.702691] nvme nvme6: I/O 100 QID 7 timeout, completion polled             
>> [  128.998648] nvme nvme3: I/O 387 QID 4 timeout, completion polled             
>> [  152.038523] nvme nvme6: I/O 781 QID 7 timeout, completion polled             
>>
>> This is just doing an fdisk -l after load. Looking at /proc/interrupts,
>> no interrupts trigger for the queues that time out.
> 
> So no interrupts triggered for the queues that time out, but you are
> getting interrupts for other queues. Are there by chance many spurious
> interrupts for those other queues?

I picked one device, and I did:

for i in $(seq 0 47); do
    echo cpu $i
    taskset -c $i dd if=/dev/nvme6n1 of=/dev/null bs=4k iflag=direct count=1
done
[...]
cpu 17
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 30.608 s, 0.1 kB/s

Same slowness for CPUs 19, 21, 23, 41, 43, 45, and 47. Looking at dmesg:

[  701.725191] nvme nvme6: I/O 912 QID 7 timeout, completion polled
[  732.124646] nvme nvme6: I/O 896 QID 7 timeout, completion polled
[  762.652479] nvme nvme6: I/O 840 QID 7 timeout, completion polled
[  793.052259] nvme nvme6: I/O 754 QID 7 timeout, completion polled
[  823.644082] nvme nvme6: I/O 339 QID 7 timeout, completion polled
[  853.979878] nvme nvme6: I/O 86 QID 7 timeout, completion polled
[  884.571686] nvme nvme6: I/O 997 QID 7 timeout, completion polled
[  914.907506] nvme nvme6: I/O 694 QID 7 timeout, completion polled

we see them all timing out after 30 seconds, all on QID 7, which is
hctx6. There are 8 hw queues (and interrupts); looking at
/proc/interrupts, only some of them actually seem to trigger.
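
As an aside, one quick way to spot dead vectors is to sum the per-CPU
counters for each nvme6 queue in /proc/interrupts. A sketch (the device
name, and the assumption that the trailing fields of each line are
non-numeric chip/hwirq/action names, are mine; IRQ numbers and counts
will differ per machine):

```shell
# Hypothetical helper: total interrupt count per nvme6 queue vector.
# /proc/interrupts lines look like "IRQ: <per-CPU counts...> <chip> <hwirq> <action>",
# so we sum only the purely numeric fields after the IRQ number.
grep nvme6q /proc/interrupts | awk '{
    irq = $1; sub(":", "", irq)
    total = 0
    for (i = 2; i <= NF; i++)
        if ($i ~ /^[0-9]+$/) total += $i
    print $NF, "(IRQ " irq "):", total, "interrupts"
}'
```

A vector whose total stays at 0 across I/O matches the "never
triggered" symptom here.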

There seems to be a mismatch. nvme6q7 is IRQ 244:

# cat /proc/irq/244/smp_affinity_list 
49,51,53,55,57,59,61,63

and IRQ 243 is nvme6q6:

# cat /proc/irq/243/smp_affinity_list 
17,19,21,23,41,43,45,47
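
The same queue-to-CPU mapping can be dumped for all of the device's
vectors in one go. A sketch (device name assumed; IRQ numbers are
machine-specific):

```shell
# Hypothetical sketch: print IRQ number and CPU affinity for every
# nvme6 queue vector, instead of inspecting /proc/irq/<n> one by one.
for d in /proc/irq/[0-9]*/; do
    n=$(basename "$d")
    # The action (queue) name is the last field of the /proc/interrupts line
    name=$(awk -v n="$n" '$1 == n":" { print $NF }' /proc/interrupts)
    case "$name" in
        nvme6q*)
            printf '%s is IRQ %s, affinity %s\n' \
                "$name" "$n" "$(cat "${d}smp_affinity_list")"
            ;;
    esac
done
```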

IRQ 244 has never triggered. If I do:

# taskset -c 17 dd if=/dev/nvme6n1 of=/dev/null bs=4k iflag=direct count=1

and then look at /proc/interrupts, none of the nvme6-associated
interrupts have triggered.
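
One way to confirm that for a single I/O is to diff the counters around
the read. A sketch (device and CPU are the ones used above; the
existence guard just keeps it harmless on machines without the device):

```shell
# Hypothetical sketch: snapshot nvme6 interrupt counters around one
# direct read; an unchanged snapshot means no nvme6 vector fired.
grep nvme6q /proc/interrupts > /tmp/irq.before || true
if [ -e /dev/nvme6n1 ]; then
    taskset -c 17 dd if=/dev/nvme6n1 of=/dev/null bs=4k iflag=direct count=1
fi
grep nvme6q /proc/interrupts > /tmp/irq.after || true
diff /tmp/irq.before /tmp/irq.after || echo "some nvme6 counter moved"
```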

-- 
Jens Axboe

Thread overview: 13+ messages
2018-03-16 16:01 [GIT PULL] nvme fix for 4.16-rc6 Keith Busch
2018-03-16 16:14 ` Jens Axboe
2018-03-16 16:16   ` Christoph Hellwig
2018-03-16 16:26     ` Jens Axboe
2018-03-16 16:53       ` Keith Busch
2018-03-21 20:59       ` Keith Busch
2018-03-21 21:02         ` Jens Axboe
2018-03-21 21:44           ` Jens Axboe
2018-03-21 22:08             ` Keith Busch
2018-03-22 21:09             ` Keith Busch
2018-03-22 21:32               ` Jens Axboe [this message]
2018-03-22 22:02                 ` Keith Busch
2018-03-22 22:09                   ` Jens Axboe
