* Deprecating NVME_IOCTL_SUBSYS_RESET
@ 2018-05-10 15:06 Alex G.
2018-05-10 16:13 ` Keith Busch
0 siblings, 1 reply; 4+ messages in thread
From: Alex G. @ 2018-05-10 15:06 UTC (permalink / raw)
Hi,
I've been getting reports that nvme subsystem resets end up taking down
the entire machine. That's very easy to do with PCIe drives, since a
NSSR also brings down the PCIe link. Any in-flight posted requests can
generate unsupported request errors, and non-posted requests can
generate completion timeouts, or Fatal MCEs on some PCIe root ports.
In a perfect world, PCIe errors would be handled by their respective
layers, and we wouldn't need to care. Unfortunately, PCIe error handling
is still an ill-conceived idea and an afterthought. What concerns me is the
potential of NSSR to propagate outside of nvme. I suspect other fabrics
have much better error handling, but I wouldn't be surprised to see
similar failures.
There are ways to harden the IOCTL by quiescing all IO before issuing
the actual reset. Such safeguards are implemented everywhere else in the
driver. Is NVME_IOCTL_SUBSYS_RESET used in the real world? I think it's
too big of an attack surface, and we're better off with -EOPNOTSUPP.
I don't see any benefit in keeping it around. Thoughts?
Alex
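For readers unfamiliar with the interface under discussion, here is a minimal sketch of how a userspace tool might reach this ioctl. The request number matches the definition in <linux/nvme_ioctl.h>; the function name nvme_subsys_reset() and the choice to return a negative errno are assumptions made for illustration, not part of any existing tool.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Normally provided by <linux/nvme_ioctl.h>; repeated here so the sketch
 * builds even when that header is not installed. */
#ifndef NVME_IOCTL_SUBSYS_RESET
#define NVME_IOCTL_SUBSYS_RESET _IO('N', 0x45)
#endif

/*
 * Hypothetical helper: request an NVM subsystem reset through the
 * controller character device (for example /dev/nvme0).
 * Returns 0 on success or a negative errno value on failure.
 */
int nvme_subsys_reset(const char *ctrl_path)
{
	int fd, ret;

	assert(ctrl_path != NULL);

	fd = open(ctrl_path, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* The ioctl takes no argument; the driver performs the reset. */
	ret = ioctl(fd, NVME_IOCTL_SUBSYS_RESET);
	if (ret < 0)
		ret = -errno;

	close(fd);
	return ret;
}
```

Note that nothing in this path quiesces I/O first: the caller only needs access to the controller node, which is the exposure the message above is concerned about.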
* Deprecating NVME_IOCTL_SUBSYS_RESET
2018-05-10 15:06 Deprecating NVME_IOCTL_SUBSYS_RESET Alex G.
@ 2018-05-10 16:13 ` Keith Busch
2018-05-10 16:18 ` Alex_Gagniuc
0 siblings, 1 reply; 4+ messages in thread
From: Keith Busch @ 2018-05-10 16:13 UTC (permalink / raw)
On Thu, May 10, 2018 at 10:06:42AM -0500, Alex G. wrote:
> There are ways to harden the IOCTL by quiescing all IO before issuing the
> actual reset. Such safeguards are implemented everywhere else in the driver.
> Is NVME_IOCTL_SUBSYS_RESET used in the real-world? I think it's too big of
> an attack surface, and we're better off with -EOPNOTSUPP.
Quiescing here is not really a solution since an NVMe Subsystem Reset
resets all the controllers in that subsystem, some of which may be
connected to different hosts: we can't quiesce those other hosts from
the driver level on the host issuing the reset.
I'm not sure we want to get rid of this feature either since this is
sometimes the only option for completing a firmware upgrade.
That said, I have heard enough cases where this reset method is not
successful, so there's some work to do here. Most failures seem to be
around the handling of the rapid link down-up sequence, and success
seems very dependent on the platform and the device used.
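As background on why the driver cannot fence this off locally: on PCIe, the reset is requested with a single register write, after which the whole subsystem acts on it. The sketch below models that step against an in-memory stand-in for BAR0. The register offsets, the CAP.NSSRS bit position and the "NVMe" magic value follow the NVMe specification as I understand it; struct fake_bar, the accessor helpers, the macro names and the -ENOTTY return are assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define NVME_REG_CAP        0x00         /* Controller Capabilities, 64-bit */
#define NVME_REG_NSSR       0x20         /* NVM Subsystem Reset, 32-bit */
#define NVME_NSSR_MAGIC     0x4E564D65u  /* ASCII "NVMe" */
#define NVME_CAP_NSSRS(cap) (((cap) >> 36) & 0x1)

/* Hypothetical stand-in for a mapped BAR0; a real driver uses MMIO accessors. */
struct fake_bar {
	uint8_t regs[0x40];
};

static inline uint64_t bar_read64(const struct fake_bar *bar, unsigned int off)
{
	uint64_t v;

	memcpy(&v, &bar->regs[off], sizeof(v));
	return v;
}

static inline uint32_t bar_read32(const struct fake_bar *bar, unsigned int off)
{
	uint32_t v;

	memcpy(&v, &bar->regs[off], sizeof(v));
	return v;
}

static inline void bar_write32(struct fake_bar *bar, unsigned int off, uint32_t v)
{
	memcpy(&bar->regs[off], &v, sizeof(v));
}

/*
 * Request a subsystem reset if the controller advertises support.
 * Returns 0 if the request was written, -ENOTTY (an arbitrary choice
 * here) if CAP.NSSRS is clear.
 */
int nvme_request_subsys_reset(struct fake_bar *bar)
{
	if (!NVME_CAP_NSSRS(bar_read64(bar, NVME_REG_CAP)))
		return -ENOTTY;

	bar_write32(bar, NVME_REG_NSSR, NVME_NSSR_MAGIC);
	return 0;
}
```

Once that write lands, every controller in the subsystem is reset, including any attached to other hosts, so host-local quiescing can only ever cover part of the affected I/O.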
* Deprecating NVME_IOCTL_SUBSYS_RESET
2018-05-10 16:13 ` Keith Busch
@ 2018-05-10 16:18 ` Alex_Gagniuc
2018-05-10 16:52 ` Keith Busch
0 siblings, 1 reply; 4+ messages in thread
From: Alex_Gagniuc @ 2018-05-10 16:18 UTC (permalink / raw)
On 5/10/2018 11:11 AM, Keith Busch wrote:
> On Thu, May 10, 2018 at 10:06:42AM -0500, Alex G. wrote:
>> There are ways to harden the IOCTL by quiescing all IO before issuing the
>> actual reset. Such safeguards are implemented everywhere else in the driver.
>> Is NVME_IOCTL_SUBSYS_RESET used in the real-world? I think it's too big of
>> an attack surface, and we're better off with -EOPNOTSUPP.
>
> Quiescing here is not really a solution since an NVMe Subsystem Reset
> resets all the controllers in that subsystem, some of which may be
> connected to different hosts: we can't quiesce those other hosts from
> the driver level on the host issuing the reset.
>
> I'm not sure we want to get rid of this feature either since this is
> sometimes the only option for completing a firmware upgrade.
According to (my reading of) the NVMe spec, an NSSR and PCIe link reset
should be equivalent. Are we hitting non-compliant drives where that is
not the case, or can those FW upgrades just use the PCIe link reset?
> That said, I have heard enough cases where this reset method is not
> successful, so there's some work to do here. Most failures seem to be
> around the handling of the rapid link down-up sequence, and success
> seems very dependent on the platform and the device used.
There is also the controller reset, which seems less intrusive, though I
haven't looked into it. Can that be used instead of the subsystem reset
for those cases where a subsystem reset is allegedly needed?
Alex
* Deprecating NVME_IOCTL_SUBSYS_RESET
2018-05-10 16:18 ` Alex_Gagniuc
@ 2018-05-10 16:52 ` Keith Busch
0 siblings, 0 replies; 4+ messages in thread
From: Keith Busch @ 2018-05-10 16:52 UTC (permalink / raw)
On Thu, May 10, 2018 at 04:18:06PM +0000, Alex_Gagniuc@Dellteam.com wrote:
>
> According to (my reading of) the NVMe spec, an NSSR and PCIe link reset
> should be equivalent.
That can't possibly be the case. A PCIe link reset affects a single
device on that PCIe link, but a NSSR resets ALL controllers in the
subsystem, some of which may be on different links.
The NSSR does indeed lead to a PCIe link reset, but there's a lot more
to it than that.
> > That said, I have heard enough cases where this reset method is not
> > successful, so there's some work to do here. Most failures seem to be
> > around the handling of the rapid link down-up sequence, and success
> > seems very dependent on the platform and the device used.
>
> There is also the controller reset, which seems less intrusive, though I
> haven't looked into it. Can that be used instead of the subsystem reset
> for those cases where a subsystem reset is allegedly needed?
According to the spec (section 5.11 Firmware Commit), a device may
require a subsystem reset in order to complete a firmware upgrade. No
other type of reset will do the trick.
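To make the firmware case concrete, here is a rough sketch of how a tool could map the completion status of a Firmware Commit command to the kind of reset the device is asking for. The status values are the command-specific codes as I recall them from the Firmware Commit section of the spec, combined with the status code type in the same way the Linux driver does; please check them against the spec revision in use. The enum and function names are made up for this example.

```c
#include <stdint.h>

/* Status code type 1 (command specific) << 8 | status code.
 * Values as recalled from the spec; verify before relying on them. */
#define FW_SC_NEEDS_CONV_RESET    0x10b  /* conventional (PCIe) reset */
#define FW_SC_NEEDS_SUBSYS_RESET  0x110  /* NVM subsystem reset */
#define FW_SC_NEEDS_CTRL_RESET    0x111  /* controller-level reset */

/* Hypothetical classification used only in this sketch. */
enum fw_reset_kind {
	FW_RESET_NONE,          /* commit completed, no reset requested */
	FW_RESET_CONTROLLER,
	FW_RESET_CONVENTIONAL,
	FW_RESET_SUBSYSTEM,
	FW_RESET_OTHER_STATUS,  /* some other status, not a reset request */
};

/* Map a Firmware Commit completion status to the reset it calls for. */
enum fw_reset_kind fw_commit_required_reset(uint16_t status)
{
	switch (status & 0x7ff) {   /* keep SCT and SC, drop the other bits */
	case 0:
		return FW_RESET_NONE;
	case FW_SC_NEEDS_CTRL_RESET:
		return FW_RESET_CONTROLLER;
	case FW_SC_NEEDS_CONV_RESET:
		return FW_RESET_CONVENTIONAL;
	case FW_SC_NEEDS_SUBSYS_RESET:
		return FW_RESET_SUBSYSTEM;
	default:
		return FW_RESET_OTHER_STATUS;
	}
}
```

A device that reports the subsystem-reset status is the case described above, where neither a controller reset nor a link reset is enough to activate the new image.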