From: guenther@tum.de (Stephan Günther)
Subject: nvme: controller resets
Date: Wed, 11 Nov 2015 23:09:57 +0100
Message-ID: <e3daa6a6ab157228a1f5606a776b01ea@localhost>
In-Reply-To: <CAKajsGNGvWw1+B7X8HxdvP0MyvAawb_PStu_adOC8wogOF-Fbg@mail.gmail.com>
On 2015/November/12 03:26, Vedant Lath wrote:
> On Wed, Nov 11, 2015 at 3:58 AM, Vedant Lath <vedant@lath.in> wrote:
> > On Tue, Nov 10, 2015 at 9:21 PM, Keith Busch <keith.busch@intel.com> wrote:
> >> Not sure really. Normally I file a f/w bug for this kind of thing. :)
> >>
> >> But I'll throw out some potential ideas. Try throttling driver capabilities
> >> and see if anything improves: reduce queue count to 1 and depth to 2
> >> (requires code change).
> >>
> >> If you're able to recreate with reduced settings, then your controller's
> >> failure can be caused by a single command, and it's hopefully just a
> >> matter of finding that command.
> >>
> >> If the problem is not reproducible with reduced settings, then perhaps
> >> it's related to concurrent queue usage or high depth, and you can play
> >> with either to see if you discover anything interesting.
> >>
> >> Of course, I could be way off...
> >
> > Is there any way to monitor all the commands going through the wire?
> > Wouldn't that help? That would at least tell us which NVMe command
> > results in a reset, and the flow of the commands leading up to the
> > reset can give us more context into the error.
>
> Reducing I/O queue depth to 2 fixes the crash. Increasing I/O queue
> depth to 3 again results in a crash.
The device fails to initialize with those settings for me. However,
I think I found the problem:
@@ -2273,7 +2276,7 @@ static void nvme_alloc_ns(struct nvme_dev *dev, unsigned nsid)
 	if (dev->stripe_size)
 		blk_queue_chunk_sectors(ns->queue, dev->stripe_size >> 9);
 	if (dev->vwc & NVME_CTRL_VWC_PRESENT)
-		blk_queue_flush(ns->queue, REQ_FLUSH | REQ_FUA);
+		blk_queue_flush(ns->queue, REQ_FUA);
 	blk_queue_virt_boundary(ns->queue, dev->page_size - 1);

 	disk->major = nvme_major;
With these changes I was able to create a btrfs, copy several GiB of
data, umount, remount, scrub, and balance.
The problem is *not* the flush itself (issuing the ioctl does not
provoke the error). It is either a combination of flush with other
commands or some flags issued together with a flush.