linux-nvme.lists.infradead.org archive mirror
From: swise@opengridcomputing.com (Steve Wise)
Subject: nvme/rdma initiator stuck on reboot
Date: Thu, 18 Aug 2016 08:59:15 -0500
Message-ID: <012701d1f958$b4953290$1dbf97b0$@opengridcomputing.com>
In-Reply-To: <e2e04664-c374-3745-ecf3-f49ca7a3addf@grimberg.me>

> 
> >> Can this be related to the fact that we use a single-threaded
> >> workqueue for delete/reset/reconnect? (delete cancel_sync the active
> >> reconnect work...)
> >>
> >> Does this untested patch help?
> >
> > That seems to do it!
> 
> Is this a formal tested-by?

Sure, but let me ask a question: the bug was that the delete controller
worker was blocked waiting for the reconnect worker to complete, yes? And the
reconnect worker was never completing? Why is that?

Here are a few tidbits about iWARP connections: address resolution == neighbor
discovery. So if the neighbor is unreachable, it will take a few seconds for
the OS to give up and fail the resolution. If the neigh entry is valid and the
peer becomes unreachable during connection setup, it might take 60 seconds or
so for a connect operation to give up and fail. So this is probably slowing
the reconnect thread down.

But shouldn't the reconnect thread notice that a delete is trying to happen
and bail out?
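
To make the last question concrete, here is a rough userspace sketch of a
reconnect loop that bails out once a delete has been requested. It is only an
analogy in Python, not the nvme-rdma code and not the patch discussed in this
thread. Every name in it (Controller, try_connect, reconnect_worker,
delete_controller, run_demo) is hypothetical, and it assumes the slow connect
can be woken early, which may not hold for a real RDMA connect.

```python
import threading
import time


class Controller:
    """Hypothetical stand-in for a controller (not the kernel structure)."""

    def __init__(self):
        # Set once teardown has been requested.
        self.deleting = threading.Event()
        self.attempts = 0


def try_connect(ctrl, timeout_s):
    """Pretend to connect to an unreachable peer.

    Waits up to timeout_s and then fails. Waiting on the 'deleting' event
    (an assumption of this sketch) lets a delete cut the wait short.
    """
    ctrl.attempts += 1
    ctrl.deleting.wait(timeout_s)
    return False  # the peer never answers in this sketch


def reconnect_worker(ctrl, connect_timeout_s, retry_delay_s):
    """Retry the connect, but stop as soon as a delete is pending."""
    while not ctrl.deleting.is_set():
        if try_connect(ctrl, connect_timeout_s):
            return "connected"
        # Delay before the next attempt; wait() returns True if a delete
        # was requested in the meantime.
        if ctrl.deleting.wait(retry_delay_s):
            break
    return "aborted"


def delete_controller(ctrl, worker):
    """Flag the delete first, then wait for the reconnect worker to exit."""
    ctrl.deleting.set()
    worker.join()


def run_demo():
    ctrl = Controller()
    result = {}
    worker = threading.Thread(
        target=lambda: result.update(
            status=reconnect_worker(ctrl, 60.0, 10.0)))
    start = time.monotonic()
    worker.start()
    time.sleep(0.1)  # let the worker enter its 60 second "connect"
    delete_controller(ctrl, worker)
    return result["status"], time.monotonic() - start


if __name__ == "__main__":
    status, elapsed = run_demo()
    print(status, "after", round(elapsed, 2), "seconds")
```

The point of the sketch is the ordering in delete_controller: the flag is set
before waiting on the worker, so the worker sees it at its next check instead
of running out the full connect timeout while the delete side is blocked.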


Thread overview: 15+ messages
2016-08-16 19:40 nvme/rdma initiator stuck on reboot Steve Wise
2016-08-17 10:23 ` Sagi Grimberg
2016-08-17 14:33   ` Steve Wise
2016-08-17 14:46     ` Sagi Grimberg
2016-08-17 15:13       ` Steve Wise
2016-08-18  7:01         ` Sagi Grimberg
2016-08-18 13:59           ` Steve Wise [this message]
2016-08-18 14:47             ` Steve Wise
2016-08-18 15:21             ` 'Christoph Hellwig'
2016-08-18 17:59               ` Steve Wise
2016-08-18 18:50                 ` Steve Wise
2016-08-18 19:11                   ` Steve Wise
2016-08-19  8:58               ` Sagi Grimberg
2016-08-19 14:22                 ` Steve Wise
     [not found]                 ` <008001d1fa25$0c960fb0$25c22f10$@opengridcomputing.com>
2016-08-19 14:24                   ` Steve Wise
