From: jthumshirn@suse.de (Johannes Thumshirn)
Subject: I/O Errors due to keepalive timeouts with NVMf RDMA
Date: Fri, 7 Jul 2017 11:48:38 +0200
Message-ID: <20170707094838.GD16648@linux-x5ow.site>

Hi,

In my recent tests I'm facing I/O errors with nvme_rdma because of the
keepalive timer expiring.

This is easily reproducible on hfi1, but also on mlx4, with the following fio
job:

[global]
direct=1
rw=randrw
ioengine=libaio 
size=16g 
norandommap 
time_based
runtime=10m 
group_reporting 
bs=4k 
iodepth=128
numjobs=88

[NVMf-test]
filename=/dev/nvme0n1 


This happens with libaio as well as psync as the I/O engine (I haven't checked
others yet).
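
To get a feel for how much I/O the job above can keep outstanding, here is a
minimal Python sketch. It assumes fio's default 1024-based size suffixes and
treats iodepth as an upper bound per job that is only reached with an
asynchronous engine such as libaio. The helper names parse_size and in_flight
are hypothetical, chosen for illustration only.

```python
import configparser

# The fio job from this report, reproduced verbatim (minus trailing spaces).
FIO_JOB = """
[global]
direct=1
rw=randrw
ioengine=libaio
size=16g
norandommap
time_based
runtime=10m
group_reporting
bs=4k
iodepth=128
numjobs=88

[NVMf-test]
filename=/dev/nvme0n1
"""


def parse_size(text):
    """Convert an fio-style size such as '4k' or '16g' to bytes.

    Assumption: suffixes are powers of 1024 (fio's default kb_base).
    """
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.strip().lower()
    if text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def in_flight(job_text):
    """Return (max outstanding requests, max outstanding bytes) for the job."""
    # allow_no_value lets flag-style options such as 'time_based' parse.
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string(job_text)
    opts = parser["global"]
    requests = int(opts["numjobs"]) * int(opts["iodepth"])
    return requests, requests * parse_size(opts["bs"])


requests, nbytes = in_flight(FIO_JOB)
print(f"up to {requests} requests / {nbytes} bytes in flight")
```

With these parameters that is up to 88 * 128 = 11264 requests of 4 KiB each,
or 44 MiB, queued against the I/O queues at any time. With psync the per-job
depth is effectively 1, so this figure only applies to the libaio runs.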

Here's the dmesg excerpt:
nvme nvme0: failed nvme_keep_alive_end_io error=-5
nvme nvme0: Reconnecting in 10 seconds...
blk_update_request: 31 callbacks suppressed
blk_update_request: I/O error, dev nvme0n1, sector 73391680
blk_update_request: I/O error, dev nvme0n1, sector 52827640
blk_update_request: I/O error, dev nvme0n1, sector 125050288
blk_update_request: I/O error, dev nvme0n1, sector 32099608
blk_update_request: I/O error, dev nvme0n1, sector 65805440
blk_update_request: I/O error, dev nvme0n1, sector 120114368
blk_update_request: I/O error, dev nvme0n1, sector 48812368
nvme0n1: detected capacity change from 68719476736 to -67549595420313600
blk_update_request: I/O error, dev nvme0n1, sector 0
buffer_io_error: 23 callbacks suppressed
Buffer I/O error on dev nvme0n1, logical block 0, async page read
blk_update_request: I/O error, dev nvme0n1, sector 0
Buffer I/O error on dev nvme0n1, logical block 0, async page read
blk_update_request: I/O error, dev nvme0n1, sector 0
Buffer I/O error on dev nvme0n1, logical block 0, async page read
ldm_validate_partition_table(): Disk read failed.
Buffer I/O error on dev nvme0n1, logical block 0, async page read
Buffer I/O error on dev nvme0n1, logical block 0, async page read
Buffer I/O error on dev nvme0n1, logical block 0, async page read
Buffer I/O error on dev nvme0n1, logical block 0, async page read
Buffer I/O error on dev nvme0n1, logical block 0, async page read
Buffer I/O error on dev nvme0n1, logical block 3, async page read
Buffer I/O error on dev nvme0n1, logical block 0, async page read
nvme0n1: unable to read partition table
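
To check where on the device the failed requests landed, the sector numbers
can be pulled out of the log with a few lines of Python. This is only a
sketch: it assumes the block layer's usual 512-byte sector unit, uses the
68719476736-byte capacity reported before the change, and the name
failed_offsets is made up for this example.

```python
import re

# blk_update_request lines copied from the dmesg excerpt above.
DMESG = """\
blk_update_request: I/O error, dev nvme0n1, sector 73391680
blk_update_request: I/O error, dev nvme0n1, sector 52827640
blk_update_request: I/O error, dev nvme0n1, sector 125050288
blk_update_request: I/O error, dev nvme0n1, sector 32099608
blk_update_request: I/O error, dev nvme0n1, sector 65805440
blk_update_request: I/O error, dev nvme0n1, sector 120114368
blk_update_request: I/O error, dev nvme0n1, sector 48812368
blk_update_request: I/O error, dev nvme0n1, sector 0
"""

SECTOR_RE = re.compile(
    r"blk_update_request: I/O error, dev ([^,\s]+), sector (\d+)"
)
SECTOR_SIZE = 512          # assumption: kernel sectors are 512-byte units
CAPACITY = 68719476736     # bytes, the capacity reported before the change


def failed_offsets(log):
    """Return a list of (device, byte offset) for every failed request."""
    return [
        (m.group(1), int(m.group(2)) * SECTOR_SIZE)
        for m in SECTOR_RE.finditer(log)
    ]


offsets = failed_offsets(DMESG)
for dev, off in offsets:
    # Print each offset as a fraction of the original device size.
    print(f"{dev}: offset {off} ({off / CAPACITY:.1%} of the device)")
```

All offsets fall inside the original 64 GiB capacity and are scattered across
it, which is what one would expect from in-flight random I/O being failed at
once rather than from a bad region on the device.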

I'm seeing this on stock v4.12 as well as on our backports.

My current hypothesis is that I saturate the RDMA link, so the keepalives have
no chance to reach the target. Is there a way to prioritize the admin queue
somehow?

Thanks,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
