From: jthumshirn@suse.de (Johannes Thumshirn)
Subject: I/O Errors due to keepalive timeouts with NVMf RDMA
Date: Mon, 10 Jul 2017 13:33:53 +0200
Message-ID: <20170710113353.GG5105@linux-x5ow.site>
In-Reply-To: <77c7d11c-bd67-8663-cc10-da3af8bfcd22@grimberg.me>

On Mon, Jul 10, 2017 at 02:04:37PM +0300, Sagi Grimberg wrote:
> 
> >OK, running a test now. I have a local test patch that cancels and
> >re-schedules the kato work on every mq_ops->complete() for testing as well
> >which I also like to check as a proof of my hypothesis and then I'll report
> >back.
> 
> That won't work as the target is relying to get a keep-alive every
> kato+grace-constant, otherwise it will teardown the controller

Damn, OK. I'll re-think my approach.
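
To make the objection concrete, here is a toy model of the interaction, not the kernel code. It assumes a host kato of 120 seconds and a grace constant of 10 seconds (chosen only so the target timeout matches the 130 seconds in the log), and all function names are hypothetical. The target tears the controller down if no keep-alive arrives within kato + grace; a host that re-arms its keep-alive work on every completion never sends one while I/O keeps completing.

```python
def keepalives_periodic(kato, end_time):
    """Host sends a keep-alive every `kato` seconds, regardless of I/O."""
    return list(range(kato, end_time + 1, kato))


def keepalives_rescheduled(kato, completions, end_time):
    """Host re-arms the keep-alive work on every I/O completion, so a
    keep-alive is only sent after `kato` seconds without any completion."""
    sent = []
    next_fire = kato  # work is first armed at t=0
    for c in sorted(completions):
        # Fire any keep-alives that were due before this completion.
        while next_fire < c:
            sent.append(next_fire)
            next_fire += kato
        next_fire = c + kato  # the completion pushes the work out again
    while next_fire <= end_time:
        sent.append(next_fire)
        next_fire += kato
    return sent


def target_survives(keepalive_times, timeout, end_time):
    """Target-side watchdog: return (True, None) if a keep-alive always
    arrives within `timeout`, else (False, time_of_expiry)."""
    deadline = timeout  # timer armed at t=0
    for t in sorted(keepalive_times):
        if t > deadline:
            return False, deadline
        deadline = t + timeout
    if end_time > deadline:
        return False, deadline
    return True, None


KATO, GRACE, END = 120, 10, 600   # assumed values, in seconds
io_completions = range(1, END)    # one completion per second: a busy queue

periodic = keepalives_periodic(KATO, END)
rescheduled = keepalives_rescheduled(KATO, io_completions, END)

print(target_survives(periodic, KATO + GRACE, END))
print(target_survives(rescheduled, KATO + GRACE, END))
```

In this model the periodic host keeps the controller alive, while the rescheduling host sends no keep-alive at all under steady I/O, so the target's timer runs out.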

Anyways, here are my results:
Target:
[254069.431101] nvmet: adding queue 1 to ctrl 1.
[254069.446254] nvmet: adding queue 2 to ctrl 1.
[...]
[254070.017617] nvmet: adding queue 44 to ctrl 1.
[254190.693126] nvmet: ctrl 1 update keep-alive timer for 130 secs
[254311.910372] nvmet: ctrl 1 update keep-alive timer for 130 secs
[254444.269014] nvmet: ctrl 1 keep-alive timer (130 seconds) expired!
[254444.283809] nvmet: ctrl 1 fatal error occurred!
[254444.298315] nvmet_rdma: freeing queue 0
[254444.308572] nvmet_rdma: freeing queue 1
[...]
[254444.767472] nvmet_rdma: freeing queue 44
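
The gaps between the keep-alive lines above can be read off the timestamps. A minimal sketch that does so, with the three relevant lines copied verbatim from the target log; the regular expression and helper names are mine, and the pattern is only known to match these lines.

```python
import re

TARGET_LOG = """\
[254190.693126] nvmet: ctrl 1 update keep-alive timer for 130 secs
[254311.910372] nvmet: ctrl 1 update keep-alive timer for 130 secs
[254444.269014] nvmet: ctrl 1 keep-alive timer (130 seconds) expired!
"""

# Captures the timestamp and whether the line is a timer update or an expiry.
LINE = re.compile(
    r"^\[(\d+\.\d+)\] nvmet: ctrl \d+ "
    r"(update keep-alive timer|keep-alive timer \(\d+ seconds\) expired)"
)


def keepalive_events(log):
    """Return [(timestamp, 'update' | 'expired'), ...] in log order."""
    events = []
    for line in log.splitlines():
        m = LINE.match(line)
        if m:
            kind = "update" if m.group(2).startswith("update") else "expired"
            events.append((float(m.group(1)), kind))
    return events


def gaps(events):
    """Seconds elapsed between consecutive events, tagged with the later event."""
    return [(round(b[0] - a[0], 3), b[1]) for a, b in zip(events, events[1:])]


for seconds, kind in gaps(keepalive_events(TARGET_LOG)):
    print(f"{seconds:8.3f} s until next event ({kind})")
```

The two updates are roughly 121 seconds apart, and the expiry is logged roughly 132 seconds after the last update, so no keep-alive was processed within the 130-second window.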

Host:
[353698.784927] nvme nvme0: creating 44 I/O queues.
[353699.572467] nvme nvme0: new ctrl: NQN
"nqn.2014-08.org.nvmexpress:NVMf:uuid:c36f2c23-354d-416c-95de-f2b8ec353a82",
addr 1.1.1.2:4420
[353960.804750] nvme nvme0: SEND for CQE 0xffff88011c0cca58 failed with status
transport retry counter exceeded (12)
[353960.840895] nvme nvme0: Reconnecting in 10 seconds...
[353960.853582] blk_update_request: I/O error, dev nvme0n1, sector 14183280
[353960.869599] blk_update_request: I/O error, dev nvme0n1, sector 32251848
[353960.869601] blk_update_request: I/O error, dev nvme0n1, sector 3500872
[353960.869602] blk_update_request: I/O error, dev nvme0n1, sector 3266216
[353960.869603] blk_update_request: I/O error, dev nvme0n1, sector 12926288
[353960.869607] blk_update_request: I/O error, dev nvme0n1, sector 27661040
[353960.869609] blk_update_request: I/O error, dev nvme0n1, sector 32564280
[353960.869610] blk_update_request: I/O error, dev nvme0n1, sector 12912072
[353960.869611] blk_update_request: I/O error, dev nvme0n1, sector 16570728
[353960.869613] blk_update_request: I/O error, dev nvme0n1, sector 33096144
[353961.036738] nvme0n1: detected capacity change from 68719476736 to
-67526893324191744
[353961.055986] Buffer I/O error on dev nvme0n1, logical block 0, async page
read
[353961.073360] Buffer I/O error on dev nvme0n1, logical block 0, async page
read
[353961.090572] Buffer I/O error on dev nvme0n1, logical block 0, async page
read
[353961.090575] ldm_validate_partition_table(): Disk read failed.
[353961.090578] Buffer I/O error on dev nvme0n1, logical block 0, async page
read
[353961.090582] Buffer I/O error on dev nvme0n1, logical block 0, async page
read
[353961.090585] Buffer I/O error on dev nvme0n1, logical block 0, async page
read
[353961.090589] Buffer I/O error on dev nvme0n1, logical block 0, async page
read
[353961.090593] Buffer I/O error on dev nvme0n1, logical block 0, async page
read
[353961.090598] Buffer I/O error on dev nvme0n1, logical block 3, async page
read
[353961.090602] Buffer I/O error on dev nvme0n1, logical block 0, async page
read
[353961.090607]  nvme0n1: unable to read partition table
[353973.021283] nvme nvme0: rdma_resolve_addr wait failed (-104).
[353973.048717] nvme nvme0: Failed reconnect attempt 1
[353973.060073] nvme nvme0: Reconnecting in 10 seconds...
[353983.101337] nvme nvme0: rdma_resolve_addr wait failed (-104).
[353983.128739] nvme nvme0: Failed reconnect attempt 2
[353983.140280] nvme nvme0: Reconnecting in 10 seconds...
[353993.181354] nvme nvme0: rdma_resolve_addr wait failed (-104).
[353993.208714] nvme nvme0: Failed reconnect attempt 3
[353993.208716] nvme nvme0: Reconnecting in 10 seconds...
[354003.229292] nvme nvme0: rdma_resolve_addr wait failed (-104).
[354003.256712] nvme nvme0: Failed reconnect attempt 4
[354003.268189] nvme nvme0: Reconnecting in 10 seconds...
[354013.309211] nvme nvme0: rdma_resolve_addr wait failed (-104).
[354013.336695] nvme nvme0: Failed reconnect attempt 5
[354013.348043] nvme nvme0: Reconnecting in 10 seconds...
[354023.389262] nvme nvme0: rdma_resolve_addr wait failed (-104).
[354023.416682] nvme nvme0: Failed reconnect attempt 6
[354023.428021] nvme nvme0: Reconnecting in 10 seconds...
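
For reference, the spacing of the failed reconnect attempts can be checked the same way. A small sketch over the "Failed reconnect attempt" lines copied from the host log above; the helper names are mine.

```python
import re

HOST_LOG = """\
[353973.048717] nvme nvme0: Failed reconnect attempt 1
[353983.128739] nvme nvme0: Failed reconnect attempt 2
[353993.208714] nvme nvme0: Failed reconnect attempt 3
[354003.256712] nvme nvme0: Failed reconnect attempt 4
[354013.336695] nvme nvme0: Failed reconnect attempt 5
[354023.416682] nvme nvme0: Failed reconnect attempt 6
"""

ATTEMPT = re.compile(r"^\[(\d+\.\d+)\] nvme \w+: Failed reconnect attempt (\d+)")


def failed_attempts(log):
    """Return [(attempt_number, timestamp), ...] in log order."""
    found = []
    for line in log.splitlines():
        m = ATTEMPT.match(line)
        if m:
            found.append((int(m.group(2)), float(m.group(1))))
    return found


def intervals(attempts):
    """Seconds between consecutive failed attempts."""
    times = [t for _, t in attempts]
    return [round(b - a, 3) for a, b in zip(times, times[1:])]


attempts = failed_attempts(HOST_LOG)
print(len(attempts), "failed attempts, intervals:", intervals(attempts))
```

Each failure follows the previous one by slightly more than 10 seconds, in line with the "Reconnecting in 10 seconds..." messages.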


-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

