From: Chris Leech <cleech@redhat.com>
To: linux-nvme@lists.infradead.org, sagi@grimberg.me, hch@lst.de
Cc: lengchao@huawei.com, dwagner@suse.de, hare@suse.de,
	mlombard@redhat.com, jmeneghi@redhat.com
Subject: nvme-multipath: round-robin infinite looping
Date: Mon, 21 Mar 2022 15:43:01 -0700
Message-ID: <20220321224304.955072-1-cleech@redhat.com>

I've been looking at a lockup reported by a partner doing nvme-tcp
testing, and I believe there's an issue between nvme_ns_remove and
nvme_round_robin_path that can result in infinite looping.

It seems like the same concern that was raised by Chao Leng in this
thread: https://lore.kernel.org/all/bd37abd5-759d-efe2-fdcd-8b004a41c75a@huawei.com/

The ordering of nvme_ns_remove is constructed to prevent races that
would re-assign a namespace being removed to the current_path cache.
That leaves a period where a namespace in current_path is not in the
path sibling list.  But nvme_round_robin_path assumes that the "old" ns
taken from current_path is always on the list, and the odd list
traversal with nvme_next_ns isn't safe against an RCU list that can
change while it's being read.
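
For reference, here's roughly what that traversal looks like
(paraphrased from drivers/nvme/host/multipath.c and trimmed to the
relevant parts, so treat it as a sketch rather than a verbatim quote):

static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
		struct nvme_ns *ns)
{
	ns = list_next_or_null_rcu(&head->list, &ns->siblings,
				   struct nvme_ns, siblings);
	if (ns)
		return ns;
	/* end of the sibling list: wrap around to the first entry */
	return list_first_or_null_rcu(&head->list, struct nvme_ns,
				      siblings);
}

	/* inner loop of nvme_round_robin_path() */
	for (ns = nvme_next_ns(head, old);
	     ns && ns != old;
	     ns = nvme_next_ns(head, ns)) {
		/* ... pick a usable path ... */
	}

Once "old" has been list_del_rcu'd, its siblings pointers still lead
into the list, so nvme_next_ns(head, old) returns a live entry; but the
walk can never reach "old" again, and it never sees NULL while the list
is non-empty, so the "ns != old" termination condition is never met.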

I'm not convinced there is a way to meet all of these assumptions by
looking only at the list and current_path. I think it can be done if
the NVME_NS_READY flag is taken into account, though possibly with an
additional synchronization point.
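
As an illustrative sketch only (not a tested patch): nvme_ns_remove
clears NVME_NS_READY before the list_del_rcu, so nvme_find_path could
decline to hand a not-ready cached entry to the round-robin walk as a
traversal anchor, e.g.:

	/* in nvme_find_path(), after fetching the cached path */
	ns = srcu_dereference(head->current_path[node], &head->srcu);
	if (unlikely(!ns))
		return __nvme_find_path(head, node);

	/*
	 * Sketch: a namespace with NVME_NS_READY cleared is on its way
	 * out and may already be off the sibling list, so don't use it
	 * as the starting point for the round-robin walk.
	 */
	if (!test_bit(NVME_NS_READY, &ns->flags))
		return __nvme_find_path(head, node);

On its own that still leaves a window where the flag clears between the
test and the walk, which is why I suspect that extra synchronization
point is needed.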

I'm following this email with details from a kdump analysis that shows
this happening, with a current_path entry partially removed from the
list (pointing into the list, but not on it, as list_del_rcu does) and a
CPU stuck in the inner loop of nvme_round_robin_path.

And then a couple of suggestions: one that tries to fix this in
nvme_ns_remove, and an easy backstop for nvme_round_robin_path that
would prevent endless looping without fixing the underlying race.
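
To give a rough idea of the backstop (simplified here; the actual RFC
follows separately): remember the first node the walk visits and bail
out if it ever comes around a second time, which can only happen when
"old" is unreachable:

	struct nvme_ns *ns, *first = NULL;

	for (ns = nvme_next_ns(head, old);
	     ns && ns != old;
	     ns = nvme_next_ns(head, ns)) {
		if (ns == first)
			break;	/* a full lap without meeting "old" */
		if (!first)
			first = ns;
		/* ... existing path selection logic ... */
	}

This doesn't close the race; it just guarantees the scan terminates
after at most one pass over the sibling list.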

Just looking for discussion on these right now; we're working on
getting them tested.

- Chris



Thread overview: 16+ messages
2022-03-21 22:43 Chris Leech [this message]
2022-03-21 22:43 ` kdump details (Re: nvme-multipath: round-robin infinite looping) Chris Leech
2022-03-22 11:16   ` Christoph Hellwig
2022-03-22 15:36     ` Chris Leech
2022-03-21 22:43 ` [RFC PATCH] nvme-multipath: break endless loop in nvme_round_robin_path Chris Leech
2022-03-22 11:17   ` Christoph Hellwig
2022-03-22 12:07     ` Daniel Wagner
2022-03-22 15:42     ` Chris Leech
2022-03-21 22:43 ` [RFC PATCH] nvme: fix RCU hole that allowed for endless looping in multipath round robin Chris Leech
2022-03-23 14:54   ` Sagi Grimberg
2022-03-23 15:34     ` Christoph Hellwig
2022-03-23 19:07       ` John Meneghini
2022-04-05 13:14         ` John Meneghini
2022-03-25  6:36   ` Christoph Hellwig
2022-03-25 12:42   ` Sagi Grimberg
2022-04-05 17:25     ` Chris Leech
