* abort question
@ 2015-06-11 10:46 Christoph Hellwig
2015-06-11 13:12 ` Matthew Wilcox
0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2015-06-11 10:46 UTC (permalink / raw)
Don't we need to reserve a request and SQ entry so that we can
always send an abort? Otherwise a locked up controller will never
send an abort and always just reset the timer, and never escalate
to a controller reset.
* abort question
From: Matthew Wilcox @ 2015-06-11 13:12 UTC (permalink / raw)
On Thu, Jun 11, 2015 at 03:46:03AM -0700, Christoph Hellwig wrote:
> Don't we need to reserve a request and SQ entry so that we can
> always send an abort? Otherwise a locked up controller will never
> send an abort and always just reset the timer, and never escalate
> to a controller reset.
Aborts are sent on the admin queue, not the IO queue. There should
always be plenty of space on the admin queue.
* abort question
From: Christoph Hellwig @ 2015-06-11 13:24 UTC (permalink / raw)
On Thu, Jun 11, 2015 at 09:12:54AM -0400, Matthew Wilcox wrote:
> On Thu, Jun 11, 2015 at 03:46:03AM -0700, Christoph Hellwig wrote:
> > Don't we need to reserve a request and SQ entry so that we can
> > always send an abort? Otherwise a locked up controller will never
> > send an abort and always just reset the timer, and never escalate
> > to a controller reset.
>
> Aborts are sent on the admin queue, not the IO queue. There should
> always be plenty of space on the admin queue.
The default admin queue has 256 entries, of which we reserve one for the
AEN command. I've been hacking up an NVMe command fuzzer that sends
semi-random [1] commands to a device, and I managed to reproduce a case
where it seems like aborts don't make progress. I haven't fully sorted
it out yet, but it seems like aborts don't happen.
[1] I had to blacklist commands like I/O CQ/SQ deletion as that crashes
the driver pretty reliably.
* abort question
From: Keith Busch @ 2015-06-11 15:44 UTC (permalink / raw)
On Thu, 11 Jun 2015, Christoph Hellwig wrote:
> On Thu, Jun 11, 2015 at 09:12:54AM -0400, Matthew Wilcox wrote:
>> On Thu, Jun 11, 2015 at 03:46:03AM -0700, Christoph Hellwig wrote:
>>> Don't we need to reserve a request and SQ entry so that we can
>>> always send an abort? Otherwise a locked up controller will never
>>> send an abort and always just reset the timer, and never escalate
>>> to a controller reset.
>>
>> Aborts are sent on the admin queue, not the IO queue. There should
>> always be plenty of space on the admin queue.
>
> The default admin queue has 256 entries, of which we reserve one for the
> AEN command. I've been hacking up an NVMe command fuzzer that sends
> semi-random [1] commands to a device, and I managed to reproduce a case
> where it seems like aborts don't make progress. I haven't fully sorted
> it out yet, but it seems like aborts don't happen.
The AEN is special. We want to submit one, but we can't leave the request
"active" without deadlocking blk-mq's hot-cpu notification, so it's the
only reserved command in the admin tagset for this special treatment. We
can't reserve another without risking tag collisions.
If an admin command times out, we go straight to the heavy hammer and
reset the controller, so we don't need an available tag to issue abort.
If you've managed to exhaust all 254 general-purpose admin tags and an IO
request times out, we've got a problem, but it should fix itself eventually
when one of the admin commands completes or times out.
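To make the policy described above concrete, here is a minimal sketch in
C. It is not the driver's code: on_timeout(), the enum, and the macro
names are hypothetical and exist only for illustration. The 254 figure
is reproduced as 256 - 1 - 1, which assumes one ring slot is always
left empty so that a full queue can be told apart from an empty one,
plus the one tag reserved for the AEN.

```c
#include <assert.h>
#include <stdbool.h>

/* Default admin queue size, as stated in the thread. */
#define ADMIN_QUEUE_ENTRIES 256
/* Assumption: one slot stays unused so "full" differs from "empty". */
#define ADMIN_USABLE_TAGS   (ADMIN_QUEUE_ENTRIES - 1)
/* The single tag reserved for the AEN command. */
#define ADMIN_RESERVED_TAGS 1
/* What is left for ordinary admin commands, including aborts. */
#define ADMIN_GENERAL_TAGS  (ADMIN_USABLE_TAGS - ADMIN_RESERVED_TAGS)

static_assert(ADMIN_GENERAL_TAGS == 254, "matches the figure in the thread");

/* Hypothetical names for the three outcomes discussed in the thread. */
enum timeout_action {
    ACTION_RESET_CONTROLLER, /* the "heavy hammer" */
    ACTION_SEND_ABORT,       /* abort goes out on the admin queue */
    ACTION_RESET_TIMER       /* no admin tag free: wait and retry later */
};

/*
 * Decide what to do when a command times out.
 *
 * is_admin_cmd:      true if the timed-out command was an admin command.
 * admin_tags_in_use: general-purpose admin tags currently outstanding.
 */
enum timeout_action on_timeout(bool is_admin_cmd, int admin_tags_in_use)
{
    /* Admin timeouts skip the abort step entirely, so no tag is needed. */
    if (is_admin_cmd)
        return ACTION_RESET_CONTROLLER;

    /* An IO timeout needs a free admin tag to carry the abort. */
    if (admin_tags_in_use < ADMIN_GENERAL_TAGS)
        return ACTION_SEND_ABORT;

    /*
     * All general-purpose admin tags are busy. The IO timer is re-armed;
     * progress resumes once an admin command completes or itself times
     * out, which then takes the reset path above.
     */
    return ACTION_RESET_TIMER;
}
```

In this model the stuck case is on_timeout(false, 254): the IO request
cannot get an abort out, but any of the outstanding admin commands
timing out lands in the first branch and resets the controller.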
> [1] I had to blacklist commands like I/O CQ/SQ deletion as that crashes
> the driver pretty reliably.
There are ways to crash the system with the passthru. The IOCTL is a
privileged command: with great power comes great responsibility. :)