* abort question
From: Christoph Hellwig @ 2015-06-11 10:46 UTC

Don't we need to reserve a request and SQ entry so that we can
always send an abort?  Otherwise a locked-up controller will never
send an abort and always just reset the timer, and never escalate
to a controller reset.
* abort question
From: Matthew Wilcox @ 2015-06-11 13:12 UTC

On Thu, Jun 11, 2015 at 03:46:03AM -0700, Christoph Hellwig wrote:
> Don't we need to reserve a request and SQ entry so that we can
> always send an abort?  Otherwise a locked-up controller will never
> send an abort and always just reset the timer, and never escalate
> to a controller reset.

Aborts are sent on the admin queue, not the IO queue.  There should
always be plenty of space on the admin queue.
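For readers unfamiliar with the command under discussion, the following is a minimal sketch of how an Abort identifies its target. In the NVMe specification, Abort is admin opcode 08h, and Command Dword 10 carries the submission queue ID in bits 15:0 and the command ID in bits 31:16. The names `abort_request`, `make_abort`, `abort_target_sqid`, `abort_target_cid` and `abort_roundtrip_check` are hypothetical and only illustrate the encoding; they are not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Admin opcode for Abort in the NVMe specification. */
#define NVME_ADMIN_ABORT_CMD 0x08

/* Hypothetical, simplified view of the fields an Abort carries. */
struct abort_request {
    uint8_t  opcode; /* always NVME_ADMIN_ABORT_CMD */
    uint32_t cdw10;  /* bits 15:0 = SQID, bits 31:16 = CID of the target */
};

/* Build an Abort for command `cid` that was submitted on queue `sqid`. */
struct abort_request make_abort(uint16_t sqid, uint16_t cid)
{
    struct abort_request req;

    req.opcode = NVME_ADMIN_ABORT_CMD;
    req.cdw10 = ((uint32_t)cid << 16) | sqid;
    return req;
}

/* Recover the target submission queue ID from Command Dword 10. */
uint16_t abort_target_sqid(const struct abort_request *req)
{
    return (uint16_t)(req->cdw10 & 0xffff);
}

/* Recover the target command ID from Command Dword 10. */
uint16_t abort_target_cid(const struct abort_request *req)
{
    return (uint16_t)(req->cdw10 >> 16);
}

/* Usage: abort command 0x1234 that is stuck on I/O queue 3. */
void abort_roundtrip_check(void)
{
    struct abort_request req = make_abort(3, 0x1234);

    assert(abort_target_sqid(&req) == 3);
    assert(abort_target_cid(&req) == 0x1234);
}
```

The Abort itself travels on the admin queue regardless of which I/O queue the target command sits on, which is why a full I/O queue does not by itself prevent an abort from being issued.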
* abort question
From: Christoph Hellwig @ 2015-06-11 13:24 UTC

On Thu, Jun 11, 2015 at 09:12:54AM -0400, Matthew Wilcox wrote:
> On Thu, Jun 11, 2015 at 03:46:03AM -0700, Christoph Hellwig wrote:
> > Don't we need to reserve a request and SQ entry so that we can
> > always send an abort?  Otherwise a locked-up controller will never
> > send an abort and always just reset the timer, and never escalate
> > to a controller reset.
>
> Aborts are sent on the admin queue, not the IO queue.  There should
> always be plenty of space on the admin queue.

The default admin queue has 256 entries, of which we reserve one for the
AEN command.  I've been hacking up an NVMe command fuzzer that sends
semi-random [1] commands to a device, and I managed to reproduce a case
where it seems like aborts don't make progress.  I haven't fully sorted
it out yet, but it seems like aborts don't happen.

[1] I had to blacklist commands like I/O CQ/SQ deletion, as that crashes
    the driver pretty reliably.
* abort question
From: Keith Busch @ 2015-06-11 15:44 UTC

On Thu, 11 Jun 2015, Christoph Hellwig wrote:
> On Thu, Jun 11, 2015 at 09:12:54AM -0400, Matthew Wilcox wrote:
>> On Thu, Jun 11, 2015 at 03:46:03AM -0700, Christoph Hellwig wrote:
>>> Don't we need to reserve a request and SQ entry so that we can
>>> always send an abort?  Otherwise a locked-up controller will never
>>> send an abort and always just reset the timer, and never escalate
>>> to a controller reset.
>>
>> Aborts are sent on the admin queue, not the IO queue.  There should
>> always be plenty of space on the admin queue.
>
> The default admin queue has 256 entries, of which we reserve one for the
> AEN command.  I've been hacking up an NVMe command fuzzer that sends
> semi-random [1] commands to a device, and I managed to reproduce a case
> where it seems like aborts don't make progress.  I haven't fully sorted
> it out yet, but it seems like aborts don't happen.

The AEN is special.  We want to submit one, but we can't leave the
request "active" without deadlocking blk-mq's hot-cpu notification, so
it's the only reserved command in the admin tagset for this special
treatment.  We can't reserve another without risking tag collisions.

If an admin command times out, we go straight to the heavy hammer and
reset the controller, so we don't need an available tag to issue an
abort.  If you've managed to exhaust all 254 general purpose admin tags
and an IO request times out, we've got a problem, but it should fix
itself eventually when one of the admin commands completes or times out.

> [1] I had to blacklist commands like I/O CQ/SQ deletion, as that crashes
>     the driver pretty reliably.

There are ways to crash the system with the passthru.  The IOCTL is a
privileged command: with great power comes great responsibility. :)
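The escalation policy Keith describes can be sketched as a small decision function. This is a toy model under stated assumptions, not the driver's code: every name here (`admin_tags`, `admin_tag_get`, `admin_tag_put`, `handle_timeout`, the `AQ_*` constants, the `ACTION_*` values) is hypothetical. The arithmetic that turns 256 entries into 254 general purpose tags assumes one queue slot must stay empty to tell a full queue from an empty one, plus the single AEN reservation mentioned in the thread.

```c
#include <assert.h>
#include <stdbool.h>

enum {
    AQ_DEPTH        = 256,                     /* admin queue entries (from the thread) */
    AQ_USABLE       = AQ_DEPTH - 1,            /* assumption: one slot always stays empty */
    AQ_RESERVED     = 1,                       /* the AEN command */
    AQ_GENERAL_TAGS = AQ_USABLE - AQ_RESERVED  /* general purpose admin tags */
};

enum timeout_action {
    ACTION_RESET_CONTROLLER, /* admin command timed out: heavy hammer */
    ACTION_SEND_ABORT,       /* IO timed out and an admin tag was free */
    ACTION_RESET_TIMER       /* IO timed out, no admin tag: retry later */
};

/* Hypothetical stand-in for the admin tagset: counts tags in use. */
struct admin_tags {
    int in_use;
};

/* Try to take one general purpose admin tag; false if none are left. */
bool admin_tag_get(struct admin_tags *t)
{
    if (t->in_use >= AQ_GENERAL_TAGS)
        return false;
    t->in_use++;
    return true;
}

/* Return a tag when an admin command completes or times out. */
void admin_tag_put(struct admin_tags *t)
{
    assert(t->in_use > 0);
    t->in_use--;
}

/* Decide what to do when a command times out. */
enum timeout_action handle_timeout(bool is_admin_cmd, struct admin_tags *t)
{
    if (is_admin_cmd)
        return ACTION_RESET_CONTROLLER; /* no tag needed for this path */
    if (admin_tag_get(t))
        return ACTION_SEND_ABORT;       /* abort rides on the admin queue */
    return ACTION_RESET_TIMER;          /* wait for an admin tag to free up */
}
```

In this model the only way to stay stuck in `ACTION_RESET_TIMER` is for all general purpose admin tags to remain in use. Since a timed-out admin command leads straight to a controller reset, that state should be temporary, which matches the "should fix itself eventually" argument above.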