* [LSF/MM/BPF BOF] Userspace command abouts @ 2023-02-16 11:50 Hannes Reinecke 2023-02-16 16:40 ` Keith Busch 0 siblings, 1 reply; 22+ messages in thread From: Hannes Reinecke @ 2023-02-16 11:50 UTC (permalink / raw) To: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc Hi all, it has come up in other threads, so it might be worthwhile to have its own topic: Userspace command aborts As it stands we cannot abort I/O commands from userspace. This is hitting us when running in a virtual machine: The VM sets a timeout when submitting a command, but that information can't be transmitted to the VM host. The VM host then issues a different command (with another timeout), and again that timeout can't be transmitted to the attached devices. So when the VM detects a timeout, it will try to issue an abort, but that goes nowhere as the VM host has no way to abort commands from userspace. So in the end the VM has to wait for the command to complete, causing stalls in the VM if the host had to undergo error recovery or something. With io_uring or CDL we now have some mechanisms which look as if they would allow us to implement command aborts. So this BoF will discuss how aborts from userspace could be implemented, whether any of the above methods are suitable, or whether there are other ideas on how that could be done. Cheers, Hannes -- Still without a .sig on this computer ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-16 11:50 [LSF/MM/BPF BOF] Userspace command abouts Hannes Reinecke @ 2023-02-16 16:40 ` Keith Busch 2023-02-17 18:53 ` Chaitanya Kulkarni 2023-02-20 11:24 ` [LSF/MM/BPF BOF] Userspace command abouts Sagi Grimberg 0 siblings, 2 replies; 22+ messages in thread From: Keith Busch @ 2023-02-16 16:40 UTC (permalink / raw) To: Hannes Reinecke Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc On Thu, Feb 16, 2023 at 12:50:03PM +0100, Hannes Reinecke wrote: > Hi all, > > it has come up in other threads, so it might be worthwhile to have its own > topic: > > Userspace command aborts > > As it stands we cannot abort I/O commands from userspace. > This is hitting us when running in a virtual machine: > The VM sets a timeout when submitting a command, but that > information can't be transmitted to the VM host. The VM host > then issues a different command (with another timeout), and > again that timeout can't be transmitted to the attached devices. > So when the VM detects a timeout, it will try to issue an abort, > but that goes nowhere as the VM host has no way to abort commands > from userspace. > So in the end the VM has to wait for the command to complete, causing > stalls in the VM if the host had to undergo error recovery or something. Aborts are racy. A lot of hardware implements these as a no-op, too. > With io_uring or CDL we now have some mechanism which look as if they > would allow us to implement command aborts. CDL on the other hand sounds more promising. > So this BoF will be around discussions on how aborts from userspace could be > implemented, whether any of the above methods are suitable, or whether there > are other ideas on how that could be done. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-16 16:40 ` Keith Busch @ 2023-02-17 18:53 ` Chaitanya Kulkarni 2023-02-18 9:50 ` [LSF/MM/BPF BOF] Userspace command aborts Hannes Reinecke 2023-02-20 11:24 ` [LSF/MM/BPF BOF] Userspace command abouts Sagi Grimberg 1 sibling, 1 reply; 22+ messages in thread From: Chaitanya Kulkarni @ 2023-02-17 18:53 UTC (permalink / raw) To: Keith Busch, Hannes Reinecke Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org On 2/16/23 08:40, Keith Busch wrote: > On Thu, Feb 16, 2023 at 12:50:03PM +0100, Hannes Reinecke wrote: >> Hi all, >> >> it has come up in other threads, so it might be worthwhile to have its own >> topic: >> >> Userspace command aborts >> >> As it stands we cannot abort I/O commands from userspace. >> This is hitting us when running in a virtual machine: >> The VM sets a timeout when submitting a command, but that >> information can't be transmitted to the VM host. The VM host >> then issues a different command (with another timeout), and >> again that timeout can't be transmitted to the attached devices. >> So when the VM detects a timeout, it will try to issue an abort, >> but that goes nowhere as the VM host has no way to abort commands >> from userspace. >> So in the end the VM has to wait for the command to complete, causing >> stalls in the VM if the host had to undergo error recovery or something. > > Aborts are racy. A lot of hardware implements these as a no-op, too. > I'd avoid implementing userspace aborts and fix things in spec first. -ck ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [LSF/MM/BPF BOF] Userspace command aborts 2023-02-17 18:53 ` Chaitanya Kulkarni @ 2023-02-18 9:50 ` Hannes Reinecke 2023-02-21 18:15 ` Chaitanya Kulkarni 0 siblings, 1 reply; 22+ messages in thread From: Hannes Reinecke @ 2023-02-18 9:50 UTC (permalink / raw) To: Chaitanya Kulkarni, Keith Busch Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org On 2/17/23 19:53, Chaitanya Kulkarni wrote: > On 2/16/23 08:40, Keith Busch wrote: >> On Thu, Feb 16, 2023 at 12:50:03PM +0100, Hannes Reinecke wrote: >>> Hi all, >>> >>> it has come up in other threads, so it might be worthwhile to have its own >>> topic: >>> >>> Userspace command aborts >>> >>> As it stands we cannot abort I/O commands from userspace. >>> This is hitting us when running in a virtual machine: >>> The VM sets a timeout when submitting a command, but that >>> information can't be transmitted to the VM host. The VM host >>> then issues a different command (with another timeout), and >>> again that timeout can't be transmitted to the attached devices. >>> So when the VM detects a timeout, it will try to issue an abort, >>> but that goes nowhere as the VM host has no way to abort commands >>> from userspace. >>> So in the end the VM has to wait for the command to complete, causing >>> stalls in the VM if the host had to undergo error recovery or something. >> >> Aborts are racy. A lot of hardware implements these as a no-op, too. >> > > I'd avoid implementing userspace aborts and fix things in spec first. > What's there to fix in the spec for aborts? You can't avoid the fact that aborts might be sent just at the time when the completion arrives ... Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 
5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [LSF/MM/BPF BOF] Userspace command aborts 2023-02-18 9:50 ` [LSF/MM/BPF BOF] Userspace command aborts Hannes Reinecke @ 2023-02-21 18:15 ` Chaitanya Kulkarni 0 siblings, 0 replies; 22+ messages in thread From: Chaitanya Kulkarni @ 2023-02-21 18:15 UTC (permalink / raw) To: Hannes Reinecke, Keith Busch Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org On 2/18/2023 1:50 AM, Hannes Reinecke wrote: > On 2/17/23 19:53, Chaitanya Kulkarni wrote: >> On 2/16/23 08:40, Keith Busch wrote: >>> On Thu, Feb 16, 2023 at 12:50:03PM +0100, Hannes Reinecke wrote: >>>> Hi all, >>>> >>>> it has come up in other threads, so it might be worthwhile to have >>>> its own >>>> topic: >>>> >>>> Userspace command aborts >>>> >>>> As it stands we cannot abort I/O commands from userspace. >>>> This is hitting us when running in a virtual machine: >>>> The VM sets a timeout when submitting a command, but that >>>> information can't be transmitted to the VM host. The VM host >>>> then issues a different command (with another timeout), and >>>> again that timeout can't be transmitted to the attached devices. >>>> So when the VM detects a timeout, it will try to issue an abort, >>>> but that goes nowhere as the VM host has no way to abort commands >>>> from userspace. >>>> So in the end the VM has to wait for the command to complete, causing >>>> stalls in the VM if the host had to undergo error recovery or >>>> something. >>> >>> Aborts are racy. A lot of hardware implements these as a no-op, too. >> >> I'd avoid implementing userspace aborts and fix things in spec first. >> > What's there to fix in the spec for aborts? You can't avoid the fact > that aborts might be sent just at the time when the completion arrives ... > Given the racy nature of aborts, I am not sure we can do anything in the spec that would let us deal with the racy scenario(s) and allow userspace aborts. 
Also, we already issue an abort command from the timeout handler for NVMe PCIe, and I think the different combinations of userspace abort, timeout handler abort, and completion arrival at the time of userspace abort submission can lead to an unclear implementation and more confusion for userspace applications. -ck
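To make the race discussed in this sub-thread concrete, here is a minimal sketch in Python (not kernel code, and not how any driver actually does it). It models a command that can be claimed exactly once, either by the completion path or by an abort; whichever arrives second finds nothing left to do. `Request`, `try_complete`, `try_abort` and `race_once` are hypothetical names chosen for illustration.

```python
import threading
from enum import Enum


class State(Enum):
    IN_FLIGHT = "in_flight"
    COMPLETED = "completed"
    ABORTED = "aborted"


class Request:
    """Hypothetical in-flight command; not a kernel structure."""

    def __init__(self, tag):
        self.tag = tag
        self.state = State.IN_FLIGHT
        self._lock = threading.Lock()

    def _claim(self, new_state):
        # Only the first caller may move the request out of IN_FLIGHT.
        with self._lock:
            if self.state is not State.IN_FLIGHT:
                return False
            self.state = new_state
            return True

    def try_complete(self):
        """Completion path: returns False if an abort already won."""
        return self._claim(State.COMPLETED)

    def try_abort(self):
        """Abort path: returns False if the completion already arrived."""
        return self._claim(State.ABORTED)


def race_once(tag):
    """Run a completion and an abort concurrently; return the final state."""
    req = Request(tag)
    results = {}

    def run(name, fn):
        results[name] = fn()

    threads = [
        threading.Thread(target=run, args=("complete", req.try_complete)),
        threading.Thread(target=run, args=("abort", req.try_abort)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Exactly one of the two paths wins; the caller cannot know which in advance.
    assert results["complete"] != results["abort"]
    return req.state
```

The point of the sketch is only that the outcome is decided by arrival order, so a userspace caller issuing an abort has to be prepared for the "too late, already completed" answer every time.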
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-16 16:40 ` Keith Busch 2023-02-17 18:53 ` Chaitanya Kulkarni @ 2023-02-20 11:24 ` Sagi Grimberg 2023-02-21 16:25 ` Douglas Gilbert 1 sibling, 1 reply; 22+ messages in thread From: Sagi Grimberg @ 2023-02-20 11:24 UTC (permalink / raw) To: Keith Busch, Hannes Reinecke Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc >> Hi all, >> >> it has come up in other threads, so it might be worthwhile to have its own >> topic: >> >> Userspace command aborts >> >> As it stands we cannot abort I/O commands from userspace. >> This is hitting us when running in a virtual machine: >> The VM sets a timeout when submitting a command, but that >> information can't be transmitted to the VM host. The VM host >> then issues a different command (with another timeout), and >> again that timeout can't be transmitted to the attached devices. >> So when the VM detects a timeout, it will try to issue an abort, >> but that goes nowhere as the VM host has no way to abort commands >> from userspace. >> So in the end the VM has to wait for the command to complete, causing >> stalls in the VM if the host had to undergo error recovery or something. > > Aborts are racy. A lot of hardware implements these as a no-op, too. Indeed. >> With io_uring or CDL we now have some mechanism which look as if they >> would allow us to implement command aborts. > > CDL on the other hand sounds more promising. > >> So this BoF will be around discussions on how aborts from userspace could be >> implemented, whether any of the above methods are suitable, or whether there >> are other ideas on how that could be done. I did not understand what is the relationship between aborts and CDL. Sounds to me that this would tie in to something like Time Limited Error Recovery (TLER) and LR bit set based on ioprio? I am unclear where do aborts come into play here. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-20 11:24 ` [LSF/MM/BPF BOF] Userspace command abouts Sagi Grimberg @ 2023-02-21 16:25 ` Douglas Gilbert 2023-02-22 14:37 ` Sagi Grimberg 0 siblings, 1 reply; 22+ messages in thread From: Douglas Gilbert @ 2023-02-21 16:25 UTC (permalink / raw) To: Sagi Grimberg, Keith Busch, Hannes Reinecke Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc On 2023-02-20 06:24, Sagi Grimberg wrote: > >>> Hi all, >>> >>> it has come up in other threads, so it might be worthwhile to have its own >>> topic: >>> >>> Userspace command aborts >>> >>> As it stands we cannot abort I/O commands from userspace. >>> This is hitting us when running in a virtual machine: >>> The VM sets a timeout when submitting a command, but that >>> information can't be transmitted to the VM host. The VM host >>> then issues a different command (with another timeout), and >>> again that timeout can't be transmitted to the attached devices. >>> So when the VM detects a timeout, it will try to issue an abort, >>> but that goes nowhere as the VM host has no way to abort commands >>> from userspace. >>> So in the end the VM has to wait for the command to complete, causing >>> stalls in the VM if the host had to undergo error recovery or something. >> >> Aborts are racy. A lot of hardware implements these as a no-op, too. > > Indeed. > >>> With io_uring or CDL we now have some mechanism which look as if they >>> would allow us to implement command aborts. >> >> CDL on the other hand sounds more promising. >> >>> So this BoF will be around discussions on how aborts from userspace could be >>> implemented, whether any of the above methods are suitable, or whether there >>> are other ideas on how that could be done. > > I did not understand what is the relationship between aborts and CDL. > Sounds to me that this would tie in to something like Time Limited Error > Recovery (TLER) and LR bit set based on ioprio? 
> > I am unclear where do aborts come into play here. CDL: Command Duration Limits One use case is reading from storage for audio-visual output. An application only wants to wait so long (e.g. one or two frames on the video output) before it wants to forget about the current read (i.e. "abort" it) and move on to the next read. An alert viewer might notice a momentary freeze frame. The SCSI CDL mechanism uses the DL0, DL1 and DL2 bits in the READ(16,32) commands. CDL also depends on the CDLP and RWCDLP fields in the REPORT SUPPORTED OPERATION CODES command and one of the CDL mode pages. So there may be some additional "wiring" needed in the SCSI subsystem.
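As an illustration of the three duration-limit bits mentioned above, here is a small Python sketch that packs a descriptor index (0 for "no limit", 1 to 7 to select a descriptor) into a READ(16) CDB. The bit positions are an assumption based on my reading of SBC-4, where the bits are named DLD0 to DLD2: DLD2 in byte 1 bit 0, DLD1 and DLD0 in byte 14 bits 7 and 6. Please check them against the standard before relying on this. `build_read16_cdb` and `dld_from_cdb` are hypothetical helper names.

```python
import struct

READ_16 = 0x88  # READ(16) operation code


def build_read16_cdb(lba, num_blocks, dld=0):
    """Build a 16-byte READ(16) CDB carrying a duration limit descriptor index.

    Assumed layout (verify against SBC-4):
      byte 1 bit 0      -> DLD2
      byte 14 bits 7, 6 -> DLD1, DLD0
    All other flag bits (DPO, FUA, group number, ...) are left at zero.
    """
    if not 0 <= dld <= 7:
        raise ValueError("dld must be in the range 0..7")
    if not 0 <= lba < 2**64:
        raise ValueError("lba must fit in 64 bits")
    if not 0 <= num_blocks < 2**32:
        raise ValueError("num_blocks must fit in 32 bits")

    cdb = bytearray(16)
    cdb[0] = READ_16
    cdb[1] = (dld >> 2) & 0x01                   # DLD2
    cdb[2:10] = struct.pack(">Q", lba)           # big-endian 64-bit LBA
    cdb[10:14] = struct.pack(">I", num_blocks)   # big-endian transfer length
    cdb[14] = (dld & 0x03) << 6                  # DLD1, DLD0
    cdb[15] = 0                                  # CONTROL byte
    return bytes(cdb)


def dld_from_cdb(cdb):
    """Recover the descriptor index from a CDB built by build_read16_cdb()."""
    return ((cdb[1] & 0x01) << 2) | (cdb[14] >> 6)
```

For example, `build_read16_cdb(0x1234, 8, dld=5)` sets the low bit of byte 1 and puts `0x40` in byte 14. What a given index means in terms of time is defined by the device's CDL mode pages, which this sketch does not model.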
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-21 16:25 ` Douglas Gilbert @ 2023-02-22 14:37 ` Sagi Grimberg 2023-02-22 14:53 ` Keith Busch 0 siblings, 1 reply; 22+ messages in thread From: Sagi Grimberg @ 2023-02-22 14:37 UTC (permalink / raw) To: dgilbert, Keith Busch, Hannes Reinecke Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc >> I did not understand what is the relationship between aborts and CDL. >> Sounds to me that this would tie in to something like Time Limited Error >> Recovery (TLER) and LR bit set based on ioprio? >> >> I am unclear where do aborts come into play here. > > CDL: Command Duration Limits > > One use case is reading from storage for audio visual output. > An application only wants to wait so long (e.g. one or two frames > on the video output) before it wants to forget about the current > read (i.e. "abort" it) and move onto the next read. An alert viewer > might notice a momentary freeze frame. > > The SCSI CDL mechanism uses the DL0, DL1 and DL2 bits in the READ(16,32) > commands. CDL also depends on the CDLP and RWCDLP fields in the > REPORT SUPPORTED OPERATION CODES command and one of the CDL > mode pages. So there may be some additional "wiring" needed in the > SCSI subsystem. I still don't understand where issuing aborts from userspace come into play here... ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-22 14:37 ` Sagi Grimberg @ 2023-02-22 14:53 ` Keith Busch 2023-02-23 15:35 ` Sagi Grimberg 0 siblings, 1 reply; 22+ messages in thread From: Keith Busch @ 2023-02-22 14:53 UTC (permalink / raw) To: Sagi Grimberg Cc: dgilbert, Hannes Reinecke, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc On Wed, Feb 22, 2023 at 04:37:51PM +0200, Sagi Grimberg wrote: > > > > I did not understand what is the relationship between aborts and CDL. > > > Sounds to me that this would tie in to something like Time Limited Error > > > Recovery (TLER) and LR bit set based on ioprio? > > > > > > I am unclear where do aborts come into play here. > > > > CDL: Command Duration Limits > > > > One use case is reading from storage for audio visual output. > > An application only wants to wait so long (e.g. one or two frames > > on the video output) before it wants to forget about the current > > read (i.e. "abort" it) and move onto the next read. An alert viewer > > might notice a momentary freeze frame. > > > > The SCSI CDL mechanism uses the DL0, DL1 and DL2 bits in the READ(16,32) > > commands. CDL also depends on the CDLP and RWCDLP fields in the > > REPORT SUPPORTED OPERATION CODES command and one of the CDL > > mode pages. So there may be some additional "wiring" needed in the > > SCSI subsystem. > > I still don't understand where issuing aborts from userspace come into > play here... The only connection is that aborts are obsolete and unnecessary if you have a working CDL implementation. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-22 14:53 ` Keith Busch @ 2023-02-23 15:35 ` Sagi Grimberg 2023-02-24 23:54 ` Chaitanya Kulkarni 0 siblings, 1 reply; 22+ messages in thread From: Sagi Grimberg @ 2023-02-23 15:35 UTC (permalink / raw) To: Keith Busch Cc: dgilbert, Hannes Reinecke, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc >>>> I did not understand what is the relationship between aborts and CDL. >>>> Sounds to me that this would tie in to something like Time Limited Error >>>> Recovery (TLER) and LR bit set based on ioprio? >>>> >>>> I am unclear where do aborts come into play here. >>> >>> CDL: Command Duration Limits >>> >>> One use case is reading from storage for audio visual output. >>> An application only wants to wait so long (e.g. one or two frames >>> on the video output) before it wants to forget about the current >>> read (i.e. "abort" it) and move onto the next read. An alert viewer >>> might notice a momentary freeze frame. >>> >>> The SCSI CDL mechanism uses the DL0, DL1 and DL2 bits in the READ(16,32) >>> commands. CDL also depends on the CDLP and RWCDLP fields in the >>> REPORT SUPPORTED OPERATION CODES command and one of the CDL >>> mode pages. So there may be some additional "wiring" needed in the >>> SCSI subsystem. >> >> I still don't understand where issuing aborts from userspace come into >> play here... > > The only connection is that aborts are obsolete and unnecessary if > you have a working CDL implementation. OK, that makes sense. Indeed I *think* that nvme can support CDL and if this is useful for userspace then this is an interesting path to take. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-23 15:35 ` Sagi Grimberg @ 2023-02-24 23:54 ` Chaitanya Kulkarni 2023-02-25 1:51 ` Keith Busch 0 siblings, 1 reply; 22+ messages in thread From: Chaitanya Kulkarni @ 2023-02-24 23:54 UTC (permalink / raw) To: Sagi Grimberg, Keith Busch, hch@lst.de, martin.petersen@oracle.com, Damien Le Moal Cc: dgilbert@interlog.com, Hannes Reinecke, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org (+Martin, Damien) On 2/23/2023 7:35 AM, Sagi Grimberg wrote: > >>>>> I did not understand what is the relationship between aborts and CDL. >>>>> Sounds to me that this would tie in to something like Time Limited >>>>> Error >>>>> Recovery (TLER) and LR bit set based on ioprio? >>>>> >>>>> I am unclear where do aborts come into play here. >>>> >>>> CDL: Command Duration Limits >>>> >>>> One use case is reading from storage for audio visual output. >>>> An application only wants to wait so long (e.g. one or two frames >>>> on the video output) before it wants to forget about the current >>>> read (i.e. "abort" it) and move onto the next read. An alert viewer >>>> might notice a momentary freeze frame. >>>> >>>> The SCSI CDL mechanism uses the DL0, DL1 and DL2 bits in the >>>> READ(16,32) >>>> commands. CDL also depends on the CDLP and RWCDLP fields in the >>>> REPORT SUPPORTED OPERATION CODES command and one of the CDL >>>> mode pages. So there may be some additional "wiring" needed in the >>>> SCSI subsystem. >>> >>> I still don't understand where issuing aborts from userspace come into >>> play here... >> >> The only connection is that aborts are obsolete and unnecessary if >> you have a working CDL implementation. > > OK, that makes sense. Indeed I *think* that nvme can support CDL and if > this is useful for userspace then this is an interesting path to take. 
I do think that we should work on CDL for NVMe, as it will solve some of the timeout-related problems more effectively than using aborts or any other mechanism. -ck
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-24 23:54 ` Chaitanya Kulkarni @ 2023-02-25 1:51 ` Keith Busch 2023-02-25 4:15 ` Damien Le Moal 2023-02-27 8:20 ` Hannes Reinecke 0 siblings, 2 replies; 22+ messages in thread From: Keith Busch @ 2023-02-25 1:51 UTC (permalink / raw) To: Chaitanya Kulkarni Cc: Sagi Grimberg, hch@lst.de, martin.petersen@oracle.com, Damien Le Moal, dgilbert@interlog.com, Hannes Reinecke, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org On Fri, Feb 24, 2023 at 11:54:39PM +0000, Chaitanya Kulkarni wrote: > I do think that we should work on CDL for NVMe as it will solve some of > the timeout related problems effectively than using aborts or any other > mechanism. That proposal exists in NVMe TWG, but doesn't appear to have recent activity. The last I heard, one point of contention was where the duration limit property exists: within the command, or the queue. From my perspective, if it's not at the queue level, the limit becomes meaningless, but hey, it's not up to me. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-25 1:51 ` Keith Busch @ 2023-02-25 4:15 ` Damien Le Moal 2023-02-25 16:14 ` James Smart 2023-02-27 16:33 ` Sagi Grimberg 2023-02-27 8:20 ` Hannes Reinecke 1 sibling, 2 replies; 22+ messages in thread From: Damien Le Moal @ 2023-02-25 4:15 UTC (permalink / raw) To: Keith Busch, Chaitanya Kulkarni Cc: Sagi Grimberg, hch@lst.de, martin.petersen@oracle.com, dgilbert@interlog.com, Hannes Reinecke, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org On 2/25/23 10:51, Keith Busch wrote: > On Fri, Feb 24, 2023 at 11:54:39PM +0000, Chaitanya Kulkarni wrote: >> I do think that we should work on CDL for NVMe as it will solve some of >> the timeout related problems effectively than using aborts or any other >> mechanism. > > That proposal exists in NVMe TWG, but doesn't appear to have recent activity. > The last I heard, one point of contention was where the duration limit property > exists: within the command, or the queue. From my perspective, if it's not at > the queue level, the limit becomes meaningless, but hey, it's not up to me. Limit attached to the command makes things more flexible and easier for the host, so personally, I prefer that. But this has an impact on the controller: the device needs to pull in *all* commands to be able to know the limits and do scheduling/aborts appropriately. That is not something that the device designers like, for obvious reasons (device internal resources...). On the other hand, limits attached to queues could lead to either a serious increase in the number of queues (PCI space & number of IRQ vectors limits), or, loss of performance as a particular queue with the desired limit would be accessed from multiple CPUs on the host (lock contention). Tricky problem I think with lots of compromises. -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 22+ messages in thread
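The queue-count trade-off described above can be put into rough numbers. The following Python sketch assumes a scheme in which every CDL class needs its own submission queue; the CPU counts, class counts and queue budgets in the usage note are made up for illustration, and `submission_queues_needed` and `cpus_per_queue` are hypothetical helpers, not part of any driver.

```python
def submission_queues_needed(num_cpus, num_cdl_classes, per_cpu=True):
    """Submission queues required if each CDL class gets its own queue.

    per_cpu=True : one queue per (CPU, class) pair, so no cross-CPU sharing.
    per_cpu=False: one queue per class, shared (and contended) by all CPUs.
    """
    if num_cpus < 1 or num_cdl_classes < 1:
        raise ValueError("need at least one CPU and one CDL class")
    return num_cpus * num_cdl_classes if per_cpu else num_cdl_classes


def cpus_per_queue(num_cpus, num_cdl_classes, queue_budget):
    """How many CPUs must share each queue when the device caps the total."""
    queues_per_class = queue_budget // num_cdl_classes
    if queues_per_class == 0:
        raise ValueError("queue budget is smaller than the number of classes")
    queues_per_class = min(queues_per_class, num_cpus)
    # Ceiling division: the worst-case number of CPUs contending for one queue.
    return -(-num_cpus // queues_per_class)
```

With 64 CPUs and 4 classes, `submission_queues_needed(64, 4)` gives 256 queues. If the device only offers 128, `cpus_per_queue(64, 4, 128)` shows 2 CPUs sharing each queue, which is where the lock contention concern comes from. In NVMe, several submission queues may post to the same completion queue, so the interrupt vector count does not have to grow at the same rate.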
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-25 4:15 ` Damien Le Moal @ 2023-02-25 16:14 ` James Smart 2023-02-27 16:33 ` Sagi Grimberg 1 sibling, 0 replies; 22+ messages in thread From: James Smart @ 2023-02-25 16:14 UTC (permalink / raw) To: Damien Le Moal, Keith Busch, Chaitanya Kulkarni Cc: Sagi Grimberg, hch@lst.de, martin.petersen@oracle.com, dgilbert@interlog.com, Hannes Reinecke, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org On 2/24/2023 8:15 PM, Damien Le Moal wrote: > On 2/25/23 10:51, Keith Busch wrote: >> On Fri, Feb 24, 2023 at 11:54:39PM +0000, Chaitanya Kulkarni wrote: >>> I do think that we should work on CDL for NVMe as it will solve some of >>> the timeout related problems effectively than using aborts or any other >>> mechanism. >> >> That proposal exists in NVMe TWG, but doesn't appear to have recent activity. >> The last I heard, one point of contention was where the duration limit property >> exists: within the command, or the queue. From my perspective, if it's not at >> the queue level, the limit becomes meaningless, but hey, it's not up to me. > > Limit attached to the command makes things more flexible and easier for the > host, so personally, I prefer that. But this has an impact on the controller: > the device needs to pull in *all* commands to be able to know the limits and do > scheduling/aborts appropriately. That is not something that the device designers > like, for obvious reasons (device internal resources...). > > On the other hand, limits attached to queues could lead to either a serious > increase in the number of queues (PCI space & number of IRQ vectors limits), or, > loss of performance as a particular queue with the desired limit would be > accessed from multiple CPUs on the host (lock contention). Tricky problem I > think with lots of compromises. > From a fabrics perspective: - at the command: is workable. 
However, the times are distorted, as they won't include the fabric transmission time of the cmd or rsp, nor any retransmission of the cmd or rsp by the fabric to protect against loss.
- at the queue: is not workable. It effectively becomes a host transport timer, as the CDL has to cover all fabric transmission times and the only entity that can time/enforce the timer is the host transport. Also, what does the host transport do when the timer expires? There are only a couple of things it can do, all of them disruptive and at best delaying the response back to the caller.
- CDL can only be meaningful (i.e. completion times close to the CDL) in the absence of transport errors. Cmd termination, perhaps tied to connection loss/failure detection as well as connection/queue termination or association termination, can have timers that are well above the CDL value. Any cmd completion guarantee within time-X can become meaningless.
-- james
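A back-of-the-envelope sketch of the distortion described above, in Python. It treats the latency seen by the host as the controller's own processing time plus the fabric transmission times and any retransmissions. All numbers in the usage note are invented, and `host_observed_latency_us` is a hypothetical helper used only to show the arithmetic.

```python
def host_observed_latency_us(controller_time_us, cmd_xmit_us, rsp_xmit_us,
                             retransmits=0, retransmit_penalty_us=0):
    """Latency as seen by the host for one command over a fabric.

    controller_time_us    : time the controller spent on the command; this is
                            the only part a controller-side limit can police
    cmd_xmit_us/rsp_xmit_us: one-way fabric transmission times
    retransmits           : number of cmd or rsp retransmissions
    retransmit_penalty_us : assumed extra delay per retransmission
    """
    return (controller_time_us + cmd_xmit_us + rsp_xmit_us
            + retransmits * retransmit_penalty_us)


def within_limit(latency_us, limit_us):
    """True if the measured latency does not exceed the duration limit."""
    return latency_us <= limit_us
```

Take a 500 us limit and a controller that finishes in 450 us: the controller considers the limit met, but with 40 us of transmission each way the host sees 530 us, and a single retransmission costing 200 us pushes that to 730 us.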
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-25 4:15 ` Damien Le Moal 2023-02-25 16:14 ` James Smart @ 2023-02-27 16:33 ` Sagi Grimberg 2023-02-27 17:28 ` Hannes Reinecke 2023-02-27 21:17 ` Damien Le Moal 1 sibling, 2 replies; 22+ messages in thread From: Sagi Grimberg @ 2023-02-27 16:33 UTC (permalink / raw) To: Damien Le Moal, Keith Busch, Chaitanya Kulkarni Cc: hch@lst.de, martin.petersen@oracle.com, dgilbert@interlog.com, Hannes Reinecke, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org >> On Fri, Feb 24, 2023 at 11:54:39PM +0000, Chaitanya Kulkarni wrote: >>> I do think that we should work on CDL for NVMe as it will solve some of >>> the timeout related problems effectively than using aborts or any other >>> mechanism. >> >> That proposal exists in NVMe TWG, but doesn't appear to have recent activity. >> The last I heard, one point of contention was where the duration limit property >> exists: within the command, or the queue. From my perspective, if it's not at >> the queue level, the limit becomes meaningless, but hey, it's not up to me. > > Limit attached to the command makes things more flexible and easier for the > host, so personally, I prefer that. But this has an impact on the controller: > the device needs to pull in *all* commands to be able to know the limits and do > scheduling/aborts appropriately. That is not something that the device designers > like, for obvious reasons (device internal resources...). > > On the other hand, limits attached to queues could lead to either a serious > increase in the number of queues (PCI space & number of IRQ vectors limits), or, > loss of performance as a particular queue with the desired limit would be > accessed from multiple CPUs on the host (lock contention). Tricky problem I > think with lots of compromises. 
I'm not up to speed on how CDL is defined, but I don't see why CDL at the queue level would cause the host to open more queues. Another question: does CDL have any relationship with NVMe "Time Limited Error Recovery", where the host can set a feature for the timeout and indicate per command whether the controller should respect it? While this is not a full-blown scheme in which every queue/command has its own timeout, it could address the original use case given by Hannes. And it's already there.
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-27 16:33 ` Sagi Grimberg @ 2023-02-27 17:28 ` Hannes Reinecke 2023-02-27 17:44 ` Keith Busch 2023-02-27 21:17 ` Damien Le Moal 1 sibling, 1 reply; 22+ messages in thread From: Hannes Reinecke @ 2023-02-27 17:28 UTC (permalink / raw) To: Sagi Grimberg, Damien Le Moal, Keith Busch, Chaitanya Kulkarni Cc: hch@lst.de, martin.petersen@oracle.com, dgilbert@interlog.com, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org On 2/27/23 17:33, Sagi Grimberg wrote: > >>> On Fri, Feb 24, 2023 at 11:54:39PM +0000, Chaitanya Kulkarni wrote: >>>> I do think that we should work on CDL for NVMe as it will solve some of >>>> the timeout related problems effectively than using aborts or any other >>>> mechanism. >>> >>> That proposal exists in NVMe TWG, but doesn't appear to have recent >>> activity. >>> The last I heard, one point of contention was where the duration >>> limit property >>> exists: within the command, or the queue. From my perspective, if >>> it's not at >>> the queue level, the limit becomes meaningless, but hey, it's not up >>> to me. >> >> Limit attached to the command makes things more flexible and easier >> for the >> host, so personally, I prefer that. But this has an impact on the >> controller: >> the device needs to pull in *all* commands to be able to know the >> limits and do >> scheduling/aborts appropriately. That is not something that the device >> designers >> like, for obvious reasons (device internal resources...). >> >> On the other hand, limits attached to queues could lead to either a >> serious >> increase in the number of queues (PCI space & number of IRQ vectors >> limits), or, >> loss of performance as a particular queue with the desired limit would be >> accessed from multiple CPUs on the host (lock contention). Tricky >> problem I >> think with lots of compromises. 
> > I'm not up to speed on how CDL is defined, but I'm unclear how CDL at > the queue level would cause the host to open more queues? > > Another question, does CDL have any relationship with NVMe "Time Limited > Error Recovery"? where the host can set a feature for timeout and > indicate if the controller should respect it per command? > > While this is not a full-blown every queue/command has its own timeout, > it could address the original use-case given by Hannes. And it's already > there. I guess that is the NVMe version of CDLs; can you give me a reference for it? Cheers, Hannes
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-27 17:28 ` Hannes Reinecke @ 2023-02-27 17:44 ` Keith Busch 2023-02-27 21:18 ` Damien Le Moal ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Keith Busch @ 2023-02-27 17:44 UTC (permalink / raw) To: Hannes Reinecke Cc: Sagi Grimberg, Damien Le Moal, Chaitanya Kulkarni, hch@lst.de, martin.petersen@oracle.com, dgilbert@interlog.com, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org On Mon, Feb 27, 2023 at 06:28:41PM +0100, Hannes Reinecke wrote: > On 2/27/23 17:33, Sagi Grimberg wrote: > > > > I'm not up to speed on how CDL is defined, but I'm unclear how CDL at > > the queue level would cause the host to open more queues? Because each CDL class would need its own submission queue in that scheme. They can all share a single completion queue, so this scheme doesn't necessarily increase the number of interrupt vectors. > > Another question, does CDL have any relationship with NVMe "Time Limited > > Error Recovery"? where the host can set a feature for timeout and > > indicate if the controller should respect it per command? > > > > While this is not a full-blown every queue/command has its own timeout, > > it could address the original use-case given by Hannes. And it's already > > there. > I guess that is the NVMe version of CDLs; can you give me a reference for > it? They're not the same. TLER starts timing after a command experiences a recoverable error, whereas CDL is an end-to-end timing for all commands. ^ permalink raw reply [flat|nested] 22+ messages in thread
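The difference between the two timers described above can be illustrated with a small sketch. This is only a toy model of when each clock starts, not an implementation of either feature; the function names `cdl_deadline` and `tler_deadline` and the use of plain numeric timestamps are assumptions made for illustration.

```python
from typing import Optional


def cdl_deadline(started_at: int, limit: int) -> int:
    """CDL-style limit: the clock runs from the start of every command."""
    return started_at + limit


def tler_deadline(error_at: Optional[int], limit: int) -> Optional[int]:
    """TLER-style limit: the clock starts only after a recoverable error.

    Returns None when the command never hit an error, because in this
    model no recovery time limit applies to it at all.
    """
    if error_at is None:
        return None
    return error_at + limit
```

With a limit of 2 time units, a command that starts at t=0 and hits a recoverable error at t=5 would have a CDL-style deadline of 2 but a TLER-style deadline of 7, and a command that never sees an error has no TLER-style deadline at all.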
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-27 17:44 ` Keith Busch @ 2023-02-27 21:18 ` Damien Le Moal 2023-02-27 21:42 ` Damien Le Moal 2023-02-28 8:05 ` Sagi Grimberg 2 siblings, 0 replies; 22+ messages in thread From: Damien Le Moal @ 2023-02-27 21:18 UTC (permalink / raw) To: Keith Busch, Hannes Reinecke Cc: Sagi Grimberg, Chaitanya Kulkarni, hch@lst.de, martin.petersen@oracle.com, dgilbert@interlog.com, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org On 2/28/23 02:44, Keith Busch wrote: > On Mon, Feb 27, 2023 at 06:28:41PM +0100, Hannes Reinecke wrote: >> On 2/27/23 17:33, Sagi Grimberg wrote: >>> >>> I'm not up to speed on how CDL is defined, but I'm unclear how CDL at >>> the queue level would cause the host to open more queues? > > Because each CDL class would need its own submission queue in that scheme. They > can all share a single completion queue, so this scheme doesn't necessarily > increase the number of interrupt vectors. Ah yes, good point. I always forget about the shared completion queue :) >>> Another question, does CDL have any relationship with NVMe "Time Limited >>> Error Recovery"? where the host can set a feature for timeout and >>> indicate if the controller should respect it per command? >>> >>> While this is not a full-blown every queue/command has its own timeout, >>> it could address the original use-case given by Hannes. And it's already >>> there. >> I guess that is the NVMe version of CDLs; can you give me a reference for >> it? > > They're not the same. TLER starts timing after a command experiences a > recoverable error, where CDL is an end-to-end timing for all commands. -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 22+ messages in thread
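The shared completion queue point can be put into numbers with a rough sketch. The function `queue_resources` is hypothetical, and it assumes one interrupt vector per completion queue, which is a simplification and not something stated in the thread.

```python
def queue_resources(num_cdl_classes: int, share_completion_queue: bool) -> dict:
    """Count the extra queues needed when each CDL class gets its own
    submission queue (illustrative model only)."""
    submission_queues = num_cdl_classes
    if share_completion_queue:
        # All per-class submission queues post to a single completion queue.
        completion_queues = min(num_cdl_classes, 1)
    else:
        completion_queues = num_cdl_classes
    # Assumption: one interrupt vector per completion queue.
    return {
        "sq": submission_queues,
        "cq": completion_queues,
        "irq_vectors": completion_queues,
    }
```

Under that assumption the number of submission queues still grows with the number of CDL classes, but the vector count stays at one when the completion queue is shared.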
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-27 17:44 ` Keith Busch 2023-02-27 21:18 ` Damien Le Moal @ 2023-02-27 21:42 ` Damien Le Moal 2023-02-28 8:05 ` Sagi Grimberg 2 siblings, 0 replies; 22+ messages in thread From: Damien Le Moal @ 2023-02-27 21:42 UTC (permalink / raw) To: Keith Busch, Hannes Reinecke Cc: Sagi Grimberg, Chaitanya Kulkarni, hch@lst.de, martin.petersen@oracle.com, dgilbert@interlog.com, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org On 2/28/23 02:44, Keith Busch wrote: > On Mon, Feb 27, 2023 at 06:28:41PM +0100, Hannes Reinecke wrote: >> On 2/27/23 17:33, Sagi Grimberg wrote: >>> >>> I'm not up to speed on how CDL is defined, but I'm unclear how CDL at >>> the queue level would cause the host to open more queues? > > Because each CDL class would need its own submission queue in that scheme. They > can all share a single completion queue, so this scheme doesn't necessarily > increase the number of interrupt vectors. > >>> Another question, does CDL have any relationship with NVMe "Time Limited >>> Error Recovery"? where the host can set a feature for timeout and >>> indicate if the controller should respect it per command? >>> >>> While this is not a full-blown every queue/command has its own timeout, >>> it could address the original use-case given by Hannes. And it's already >>> there. >> I guess that is the NVMe version of CDLs; can you give me a reference for >> it? > > They're not the same. TLER starts timing after a command experiences a > recoverable error, where CDL is an end-to-end timing for all commands. Note here that with the current T10/T13 CDL definitions, end-to-end actually means from the time the command is received by the device to the time the device signals the command completion. That does not include the transport & host adapter queueing (if there is an HBA). 
And I guess this is the issue at hand for fabrics: how to integrate the transport times. I guess the CDL descriptors could have one additional limit for that, but then the duration guideline limit definition would need to be tweaked. -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 22+ messages in thread
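The gap between the device-side limit and what the host observes can be sketched as follows. This is a toy model under stated assumptions: the names `host_observed_latency` and `limit_status` are hypothetical, times are plain integers in an arbitrary unit, and the transport is reduced to a fixed delay in each direction.

```python
def host_observed_latency(to_device: int, device_time: int, to_host: int) -> int:
    """Latency seen by the host: transport delay in both directions plus
    the time the device spends between receiving and completing the command."""
    return to_device + device_time + to_host


def limit_status(device_time: int, device_limit: int,
                 host_latency: int, host_timeout: int) -> dict:
    """Compare the device-side limit (which sees only device_time) with the
    host-side timeout (which sees the whole round trip)."""
    return {
        "device_limit_met": device_time <= device_limit,
        "host_timeout_met": host_latency <= host_timeout,
    }
```

For example, with 3 units of transport delay each way and 8 units spent in the device, a device-side limit of 10 is met while a host timeout of 10 is missed, since the host sees 14 units. That mismatch is the transport time that a device-only limit cannot account for.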
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-27 17:44 ` Keith Busch 2023-02-27 21:18 ` Damien Le Moal 2023-02-27 21:42 ` Damien Le Moal @ 2023-02-28 8:05 ` Sagi Grimberg 2 siblings, 0 replies; 22+ messages in thread From: Sagi Grimberg @ 2023-02-28 8:05 UTC (permalink / raw) To: Keith Busch, Hannes Reinecke Cc: Damien Le Moal, Chaitanya Kulkarni, hch@lst.de, martin.petersen@oracle.com, dgilbert@interlog.com, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org >>> I'm not up to speed on how CDL is defined, but I'm unclear how CDL at >>> the queue level would cause the host to open more queues? > > Because each CDL class would need its own submission queue in that scheme. They > can all share a single completion queue, so this scheme doesn't necessarily > increase the number of interrupt vectors. Ah, that is less desirable, I think. Although we can already do multiple queue maps, I think that the proliferation of queues is harmful in the long run. >>> Another question, does CDL have any relationship with NVMe "Time Limited >>> Error Recovery"? where the host can set a feature for timeout and >>> indicate if the controller should respect it per command? >>> >>> While this is not a full-blown every queue/command has its own timeout, >>> it could address the original use-case given by Hannes. And it's already >>> there. >> I guess that is the NVMe version of CDLs; can you give me a reference for >> it? > > They're not the same. TLER starts timing after a command experiences a > recoverable error, where CDL is an end-to-end timing for all commands. Ah, ok. I didn't realize that TLER starts after an error. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-27 16:33 ` Sagi Grimberg 2023-02-27 17:28 ` Hannes Reinecke @ 2023-02-27 21:17 ` Damien Le Moal 1 sibling, 0 replies; 22+ messages in thread From: Damien Le Moal @ 2023-02-27 21:17 UTC (permalink / raw) To: Sagi Grimberg, Keith Busch, Chaitanya Kulkarni Cc: hch@lst.de, martin.petersen@oracle.com, dgilbert@interlog.com, Hannes Reinecke, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org On 2/28/23 01:33, Sagi Grimberg wrote: > >>> On Fri, Feb 24, 2023 at 11:54:39PM +0000, Chaitanya Kulkarni wrote: >>>> I do think that we should work on CDL for NVMe as it will solve some of >>>> the timeout related problems effectively than using aborts or any other >>>> mechanism. >>> >>> That proposal exists in NVMe TWG, but doesn't appear to have recent activity. >>> The last I heard, one point of contention was where the duration limit property >>> exists: within the command, or the queue. From my perspective, if it's not at >>> the queue level, the limit becomes meaningless, but hey, it's not up to me. >> >> Limit attached to the command makes things more flexible and easier for the >> host, so personally, I prefer that. But this has an impact on the controller: >> the device needs to pull in *all* commands to be able to know the limits and do >> scheduling/aborts appropriately. That is not something that the device designers >> like, for obvious reasons (device internal resources...). >> >> On the other hand, limits attached to queues could lead to either a serious >> increase in the number of queues (PCI space & number of IRQ vectors limits), or, >> loss of performance as a particular queue with the desired limit would be >> accessed from multiple CPUs on the host (lock contention). Tricky problem I >> think with lots of compromises. 
> > I'm not up to speed on how CDL is defined, but I'm unclear how CDL at > the queue level would cause the host to open more queues? There would be the need for one queue pair per limit defined, in addition to the regular "no limits" queue pairs. And given that CDL allows defining up to 7 limits for read AND write commands, if kept as-is, this means potentially 14 additional queue pairs shared among all CPUs, or even more than that if one wants per CPU queues with limits. > Another question, does CDL have any relationship with NVMe "Time Limited > Error Recovery"? where the host can set a feature for timeout and > indicate if the controller should respect it per command? This NVMe feature does map to one of the possible limits that can be defined with CDL. CDL currently allows 3 different limits: - Active time limit: limit on command execution involving media access - inactive time limit: limit on device internal queueing time before processing of the command starts (aging control) - Duration guideline: overall limit on the command processing by the device > While this is not a full-blown every queue/command has its own timeout, > it could address the original use-case given by Hannes. And it's already > there. The above limits are what is currently defined in T10/T13 for SCSI/ATA devices. NVMe may need some tweaks to get a better mapping to the different use cases. -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 22+ messages in thread
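The queue-count arithmetic and the three limit types from the message above can be sketched like this. The figure of 7 limits per direction comes from the message; everything else is an assumption for illustration: the names `extra_queue_pairs` and `CdlLimits` are hypothetical, and the check treats the duration guideline as a bound on inactive plus active time, which simplifies the actual T10/T13 definitions.

```python
from dataclasses import dataclass
from typing import List

LIMITS_PER_DIRECTION = 7        # up to 7 limits each, per the message above
DIRECTIONS = ("read", "write")


def extra_queue_pairs(num_cpus: int = 1, per_cpu: bool = False) -> int:
    """Additional queue pairs if every defined limit gets its own queue pair."""
    shared = LIMITS_PER_DIRECTION * len(DIRECTIONS)
    return shared * num_cpus if per_cpu else shared


@dataclass
class CdlLimits:
    """The three limit types listed above (hypothetical field names)."""
    active_time_limit: int      # execution involving media access
    inactive_time_limit: int    # device-internal queueing before processing
    duration_guideline: int     # overall processing by the device

    def exceeded(self, inactive_time: int, active_time: int) -> List[str]:
        """Return which limits a command with the given times would exceed."""
        hits = []
        if active_time > self.active_time_limit:
            hits.append("active")
        if inactive_time > self.inactive_time_limit:
            hits.append("inactive")
        # Assumption: overall duration is inactive time plus active time.
        if inactive_time + active_time > self.duration_guideline:
            hits.append("duration")
        return hits
```

With the defaults, `extra_queue_pairs()` gives the 14 shared queue pairs mentioned above, and the per-CPU variant scales that by the CPU count, which is where the proliferation concern comes from.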
* Re: [LSF/MM/BPF BOF] Userspace command abouts 2023-02-25 1:51 ` Keith Busch 2023-02-25 4:15 ` Damien Le Moal @ 2023-02-27 8:20 ` Hannes Reinecke 1 sibling, 0 replies; 22+ messages in thread From: Hannes Reinecke @ 2023-02-27 8:20 UTC (permalink / raw) To: Keith Busch, Chaitanya Kulkarni Cc: Sagi Grimberg, hch@lst.de, martin.petersen@oracle.com, Damien Le Moal, dgilbert@interlog.com, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org On 2/25/23 02:51, Keith Busch wrote: > On Fri, Feb 24, 2023 at 11:54:39PM +0000, Chaitanya Kulkarni wrote: >> I do think that we should work on CDL for NVMe as it will solve some of >> the timeout related problems effectively than using aborts or any other >> mechanism. > > That proposal exists in NVMe TWG, but doesn't appear to have recent activity. > The last I heard, one point of contention was where the duration limit property > exists: within the command, or the queue. From my perspective, if it's not at > the queue level, the limit becomes meaningless, but hey, it's not up to me. And that is one of the issues I'd like to discuss. As it stands, CDLs are defined for the controller only; queuing effects from the transport are out of scope (for the current CDL definition). So for NVMe-oF we would need to discuss how we can specify CDLs for fabrics; especially the relationship between CDLs and transport timeouts is ... interesting, and we need to discuss how we can correlate both. Having it on the queue as you suggested would be cool as it would give a nice overall number, but discussions with the driver vendors were not encouraging; they're having a hard time giving timeout guarantees in really quirky failure cases. Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 
5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman ^ permalink raw reply [flat|nested] 22+ messages in thread