[LSF/MM/BPF BOF] Userspace command abouts

public inbox for linux-nvme@lists.infradead.org

* [LSF/MM/BPF BOF] Userspace command abouts
@ 2023-02-16 11:50 Hannes Reinecke
  2023-02-16 16:40 ` Keith Busch
  0 siblings, 1 reply; 22+ messages in thread
From: Hannes Reinecke @ 2023-02-16 11:50 UTC
  To: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, lsf-pc

Hi all,

it has come up in other threads, so it might be worthwhile to have its 
own topic:

Userspace command aborts

As it stands we cannot abort I/O commands from userspace.
This is hitting us when running in a virtual machine:
The VM sets a timeout when submitting a command, but that
information can't be transmitted to the VM host. The VM host
then issues a different command (with another timeout), and
again that timeout can't be transmitted to the attached devices.
So when the VM detects a timeout, it will try to issue an abort,
but that goes nowhere as the VM host has no way to abort commands
from userspace.
So in the end the VM has to wait for the command to complete, causing
stalls in the VM if the host had to undergo error recovery or something.

With io_uring or CDL we now have some mechanisms which look as if they
would allow us to implement command aborts.
So this BoF will be around discussions on how aborts from userspace 
could be implemented, whether any of the above methods are suitable, or 
whether there are other ideas on how that could be done.

Cheers,

Hannes
-- 
Still without a .sig on this computer


* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-16 11:50 [LSF/MM/BPF BOF] Userspace command abouts Hannes Reinecke
@ 2023-02-16 16:40 ` Keith Busch
  2023-02-17 18:53   ` Chaitanya Kulkarni
  2023-02-20 11:24   ` [LSF/MM/BPF BOF] Userspace command abouts Sagi Grimberg
  0 siblings, 2 replies; 22+ messages in thread
From: Keith Busch @ 2023-02-16 16:40 UTC
  To: Hannes Reinecke
  Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, lsf-pc

On Thu, Feb 16, 2023 at 12:50:03PM +0100, Hannes Reinecke wrote:
> Hi all,
> 
> it has come up in other threads, so it might be worthwhile to have its own
> topic:
> 
> Userspace command aborts
> 
> As it stands we cannot abort I/O commands from userspace.
> This is hitting us when running in a virtual machine:
> The VM sets a timeout when submitting a command, but that
> information can't be transmitted to the VM host. The VM host
> then issues a different command (with another timeout), and
> again that timeout can't be transmitted to the attached devices.
> So when the VM detects a timeout, it will try to issue an abort,
> but that goes nowhere as the VM host has no way to abort commands
> from userspace.
> So in the end the VM has to wait for the command to complete, causing
> stalls in the VM if the host had to undergo error recovery or something.

Aborts are racy. A lot of hardware implements these as a no-op, too.
 
> With io_uring or CDL we now have some mechanism which look as if they
> would allow us to implement command aborts.

CDL on the other hand sounds more promising.

> So this BoF will be around discussions on how aborts from userspace could be
> implemented, whether any of the above methods are suitable, or whether there
> are other ideas on how that could be done.



* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-16 16:40 ` Keith Busch
@ 2023-02-17 18:53   ` Chaitanya Kulkarni
  2023-02-18  9:50     ` [LSF/MM/BPF BOF] Userspace command aborts Hannes Reinecke
  2023-02-20 11:24   ` [LSF/MM/BPF BOF] Userspace command abouts Sagi Grimberg
  1 sibling, 1 reply; 22+ messages in thread
From: Chaitanya Kulkarni @ 2023-02-17 18:53 UTC
  To: Keith Busch, Hannes Reinecke
  Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org

On 2/16/23 08:40, Keith Busch wrote:
> On Thu, Feb 16, 2023 at 12:50:03PM +0100, Hannes Reinecke wrote:
>> Hi all,
>>
>> it has come up in other threads, so it might be worthwhile to have its own
>> topic:
>>
>> Userspace command aborts
>>
>> As it stands we cannot abort I/O commands from userspace.
>> This is hitting us when running in a virtual machine:
>> The VM sets a timeout when submitting a command, but that
>> information can't be transmitted to the VM host. The VM host
>> then issues a different command (with another timeout), and
>> again that timeout can't be transmitted to the attached devices.
>> So when the VM detects a timeout, it will try to issue an abort,
>> but that goes nowhere as the VM host has no way to abort commands
>> from userspace.
>> So in the end the VM has to wait for the command to complete, causing
>> stalls in the VM if the host had to undergo error recovery or something.
> 
> Aborts are racy. A lot of hardware implements these as a no-op, too.
>   

I'd avoid implementing userspace aborts and fix things in the spec first.

-ck



* Re: [LSF/MM/BPF BOF] Userspace command aborts
  2023-02-17 18:53   ` Chaitanya Kulkarni
@ 2023-02-18  9:50     ` Hannes Reinecke
  2023-02-21 18:15       ` Chaitanya Kulkarni
  0 siblings, 1 reply; 22+ messages in thread
From: Hannes Reinecke @ 2023-02-18  9:50 UTC
  To: Chaitanya Kulkarni, Keith Busch
  Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org

On 2/17/23 19:53, Chaitanya Kulkarni wrote:
> On 2/16/23 08:40, Keith Busch wrote:
>> On Thu, Feb 16, 2023 at 12:50:03PM +0100, Hannes Reinecke wrote:
>>> Hi all,
>>>
>>> it has come up in other threads, so it might be worthwhile to have its own
>>> topic:
>>>
>>> Userspace command aborts
>>>
>>> As it stands we cannot abort I/O commands from userspace.
>>> This is hitting us when running in a virtual machine:
>>> The VM sets a timeout when submitting a command, but that
>>> information can't be transmitted to the VM host. The VM host
>>> then issues a different command (with another timeout), and
>>> again that timeout can't be transmitted to the attached devices.
>>> So when the VM detects a timeout, it will try to issue an abort,
>>> but that goes nowhere as the VM host has no way to abort commands
>>> from userspace.
>>> So in the end the VM has to wait for the command to complete, causing
>>> stalls in the VM if the host had to undergo error recovery or something.
>>
>> Aborts are racy. A lot of hardware implements these as a no-op, too.
>>    
> 
> I'd avoid implementing userspace aborts and fix things in spec first.
> 
What's there to fix in the spec for aborts? You can't avoid the fact 
that aborts might be sent just at the time when the completion arrives ...
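The race described here cannot be designed away, only resolved. A minimal sketch, assuming a hypothetical host-side command entry (the Command class and its states are illustrative, not taken from any driver): whichever of the completion or the abort moves the command out of IN_FLIGHT first wins, and the loser becomes a no-op. The lock stands in for the atomic compare-and-swap a real driver would more likely use.

```python
import threading
from enum import Enum, auto


class CmdState(Enum):
    IN_FLIGHT = auto()
    COMPLETED = auto()
    ABORTED = auto()


class Command:
    """Hypothetical per-command tracking entry (illustration only)."""

    def __init__(self, tag):
        self.tag = tag
        self.state = CmdState.IN_FLIGHT
        self._lock = threading.Lock()

    def _finish(self, new_state):
        # Only the first caller may move the command out of IN_FLIGHT;
        # this plays the role of an atomic compare-and-swap.
        with self._lock:
            if self.state is not CmdState.IN_FLIGHT:
                return False
            self.state = new_state
            return True

    def complete(self):
        """Called when the device posts a completion for this tag."""
        return self._finish(CmdState.COMPLETED)

    def abort(self):
        """Called when someone asks to abort this tag.

        Returns False if the completion already won the race, in which
        case there is nothing left to cancel.
        """
        return self._finish(CmdState.ABORTED)
```

This only settles the host's bookkeeping. It cannot stop the device from having already finished (or from still finishing) the I/O, which is part of why aborts are called racy earlier in the thread.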

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman




* Re: [LSF/MM/BPF BOF] Userspace command aborts
  2023-02-18  9:50     ` [LSF/MM/BPF BOF] Userspace command aborts Hannes Reinecke
@ 2023-02-21 18:15       ` Chaitanya Kulkarni
  0 siblings, 0 replies; 22+ messages in thread
From: Chaitanya Kulkarni @ 2023-02-21 18:15 UTC
  To: Hannes Reinecke, Keith Busch
  Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org

On 2/18/2023 1:50 AM, Hannes Reinecke wrote:
> On 2/17/23 19:53, Chaitanya Kulkarni wrote:
>> On 2/16/23 08:40, Keith Busch wrote:
>>> On Thu, Feb 16, 2023 at 12:50:03PM +0100, Hannes Reinecke wrote:
>>>> Hi all,
>>>>
>>>> it has come up in other threads, so it might be worthwhile to have its own
>>>> topic:
>>>>
>>>> Userspace command aborts
>>>>
>>>> As it stands we cannot abort I/O commands from userspace.
>>>> This is hitting us when running in a virtual machine:
>>>> The VM sets a timeout when submitting a command, but that
>>>> information can't be transmitted to the VM host. The VM host
>>>> then issues a different command (with another timeout), and
>>>> again that timeout can't be transmitted to the attached devices.
>>>> So when the VM detects a timeout, it will try to issue an abort,
>>>> but that goes nowhere as the VM host has no way to abort commands
>>>> from userspace.
>>>> So in the end the VM has to wait for the command to complete, causing
>>>> stalls in the VM if the host had to undergo error recovery or something.
>>>
>>> Aborts are racy. A lot of hardware implements these as a no-op, too.
>>
>> I'd avoid implementing userspace aborts and fix things in spec first.
>>
> What's there to fix in the spec for aborts? You can't avoid the fact 
> that aborts might be sent just at the time when the completion arrives ...
> 

Given the racy nature, I'm not sure we can do anything in the
spec that would let us deal with the racy scenario(s) and allow
userspace aborts.

Also, we already issue an abort command from the timeout handler for
NVMe PCIe, and I think the different combinations of userspace abort,
timeout handler abort, and a completion arriving just as the userspace
abort is submitted can lead to an unclear implementation and more
confusion for userspace applications.
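To make those combinations concrete, here is a single-threaded sketch under stated assumptions: the Tracker class is hypothetical and is not the nvme-pci timeout code. At most one Abort is sent per command, a second requester (timeout handler or userspace) piggybacks on the one already in flight, and every requester gets the same answer, which may well be "the command completed normally".

```python
from enum import Enum, auto


class State(Enum):
    IN_FLIGHT = auto()
    ABORT_PENDING = auto()  # an Abort command has been sent to the device
    DONE = auto()


class Tracker:
    """Hypothetical host-side tracker for one command tag (illustration only)."""

    def __init__(self):
        self.state = State.IN_FLIGHT
        self.aborts_sent = 0
        self.requesters = []

    def request_abort(self, who):
        """who is e.g. 'user' or 'timeout'. Returns True if an Abort was sent."""
        if self.state is State.DONE:
            return False            # completion already arrived: nothing to abort
        self.requesters.append(who)
        if self.state is State.ABORT_PENDING:
            return False            # piggyback on the Abort already in flight
        self.state = State.ABORT_PENDING
        self.aborts_sent += 1       # stands in for submitting the Abort command
        return True

    def on_completion(self, aborted_by_device):
        """Completion of the original command, whether aborted or not."""
        self.state = State.DONE
        # Every requester sees the same outcome; the device may have
        # finished the I/O normally even though an abort was requested.
        return {who: aborted_by_device for who in self.requesters}
```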

-ck




* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-16 16:40 ` Keith Busch
  2023-02-17 18:53   ` Chaitanya Kulkarni
@ 2023-02-20 11:24   ` Sagi Grimberg
  2023-02-21 16:25     ` Douglas Gilbert
  1 sibling, 1 reply; 22+ messages in thread
From: Sagi Grimberg @ 2023-02-20 11:24 UTC
  To: Keith Busch, Hannes Reinecke
  Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, lsf-pc


>> Hi all,
>>
>> it has come up in other threads, so it might be worthwhile to have its own
>> topic:
>>
>> Userspace command aborts
>>
>> As it stands we cannot abort I/O commands from userspace.
>> This is hitting us when running in a virtual machine:
>> The VM sets a timeout when submitting a command, but that
>> information can't be transmitted to the VM host. The VM host
>> then issues a different command (with another timeout), and
>> again that timeout can't be transmitted to the attached devices.
>> So when the VM detects a timeout, it will try to issue an abort,
>> but that goes nowhere as the VM host has no way to abort commands
>> from userspace.
>> So in the end the VM has to wait for the command to complete, causing
>> stalls in the VM if the host had to undergo error recovery or something.
> 
> Aborts are racy. A lot of hardware implements these as a no-op, too.

Indeed.

>> With io_uring or CDL we now have some mechanism which look as if they
>> would allow us to implement command aborts.
> 
> CDL on the other hand sounds more promising.
> 
>> So this BoF will be around discussions on how aborts from userspace could be
>> implemented, whether any of the above methods are suitable, or whether there
>> are other ideas on how that could be done.

I don't understand what the relationship between aborts and CDL is.
It sounds to me like this would tie in to something like Time Limited
Error Recovery (TLER) and the LR bit set based on ioprio?

I am unclear where aborts come into play here.



* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-20 11:24   ` [LSF/MM/BPF BOF] Userspace command abouts Sagi Grimberg
@ 2023-02-21 16:25     ` Douglas Gilbert
  2023-02-22 14:37       ` Sagi Grimberg
  0 siblings, 1 reply; 22+ messages in thread
From: Douglas Gilbert @ 2023-02-21 16:25 UTC
  To: Sagi Grimberg, Keith Busch, Hannes Reinecke
  Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, lsf-pc

On 2023-02-20 06:24, Sagi Grimberg wrote:
> 
>>> Hi all,
>>>
>>> it has come up in other threads, so it might be worthwhile to have its own
>>> topic:
>>>
>>> Userspace command aborts
>>>
>>> As it stands we cannot abort I/O commands from userspace.
>>> This is hitting us when running in a virtual machine:
>>> The VM sets a timeout when submitting a command, but that
>>> information can't be transmitted to the VM host. The VM host
>>> then issues a different command (with another timeout), and
>>> again that timeout can't be transmitted to the attached devices.
>>> So when the VM detects a timeout, it will try to issue an abort,
>>> but that goes nowhere as the VM host has no way to abort commands
>>> from userspace.
>>> So in the end the VM has to wait for the command to complete, causing
>>> stalls in the VM if the host had to undergo error recovery or something.
>>
>> Aborts are racy. A lot of hardware implements these as a no-op, too.
> 
> Indeed.
> 
>>> With io_uring or CDL we now have some mechanism which look as if they
>>> would allow us to implement command aborts.
>>
>> CDL on the other hand sounds more promising.
>>
>>> So this BoF will be around discussions on how aborts from userspace could be
>>> implemented, whether any of the above methods are suitable, or whether there
>>> are other ideas on how that could be done.
> 
> I did not understand what is the relationship between aborts and CDL.
> Sounds to me that this would tie in to something like Time Limited Error
> Recovery (TLER) and LR bit set based on ioprio?
> 
> I am unclear where do aborts come into play here.

CDL: Command Duration Limits

One use case is reading from storage for audio visual output.
An application only wants to wait so long (e.g. one or two frames
on the video output) before it wants to forget about the current
read (i.e. "abort" it) and move onto the next read. An alert viewer
might notice a momentary freeze frame.
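The "one or two frames" budget is simple arithmetic, and mapping it onto a duration limit means picking the loosest configured limit that still fits. A sketch, assuming 60 fps output and a made-up descriptor table (real limits are configured per device via the CDL mode pages; these numbers and the index-0-means-no-limit convention are illustrative assumptions):

```python
# Hypothetical descriptor table: index -> duration limit in microseconds.
EXAMPLE_LIMITS_US = {1: 5_000, 2: 10_000, 3: 20_000, 4: 33_000,
                     5: 50_000, 6: 100_000, 7: 500_000}


def frame_budget_us(fps, frames):
    """How long the application is willing to wait for a read, in microseconds."""
    return int(frames * 1_000_000 / fps)


def pick_descriptor(budget_us, limits=EXAMPLE_LIMITS_US):
    """Loosest limit that still fits the budget; 0 (assumed: no limit) if none fit."""
    fitting = [(limit, idx) for idx, limit in limits.items() if limit <= budget_us]
    return max(fitting)[1] if fitting else 0
```

At 60 fps, two frames give a budget of 33333 us, which selects descriptor 4 (33 ms) from the example table; a single frame (16666 us) selects descriptor 2.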

The SCSI CDL mechanism uses the DL0, DL1 and DL2 bits in the READ(16,32)
commands. CDL also depends on the CDLP and RWCDLP fields in the
REPORT SUPPORTED OPERATION CODES command and one of the CDL
mode pages. So there may be some additional "wiring" needed in the
SCSI subsystem.
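For the CDB side only, a sketch of how a 3-bit duration limit index could be carried in READ(16). The bit placement is an assumption based on our reading of SBC-5 and of how the Linux sd driver encodes it (top bit in bit 0 of byte 1, low two bits in bits 7..6 of byte 14); verify against the standard before relying on it. The REPORT SUPPORTED OPERATION CODES and mode-page discovery mentioned above is not covered.

```python
import struct

READ_16 = 0x88


def build_read16_cdb(lba, nr_blocks, dld=0):
    """Build a READ(16) CDB carrying a 3-bit duration limit index (dld).

    Assumed layout: DL bit 2 in byte 1 bit 0; DL bits 1..0 in byte 14
    bits 7..6. Group number, protection and cache bits are left at 0.
    """
    if not 0 <= dld <= 7:
        raise ValueError("duration limit index must fit in 3 bits")
    cdb = bytearray(16)
    cdb[0] = READ_16
    cdb[1] = (dld >> 2) & 0x01                  # top duration limit bit
    struct.pack_into(">Q", cdb, 2, lba)         # LOGICAL BLOCK ADDRESS, bytes 2..9
    struct.pack_into(">I", cdb, 10, nr_blocks)  # TRANSFER LENGTH, bytes 10..13
    cdb[14] = (dld & 0x03) << 6                 # low two duration limit bits
    cdb[15] = 0                                 # CONTROL
    return bytes(cdb)
```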



* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-21 16:25     ` Douglas Gilbert
@ 2023-02-22 14:37       ` Sagi Grimberg
  2023-02-22 14:53         ` Keith Busch
  0 siblings, 1 reply; 22+ messages in thread
From: Sagi Grimberg @ 2023-02-22 14:37 UTC
  To: dgilbert, Keith Busch, Hannes Reinecke
  Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, lsf-pc


>> I did not understand what is the relationship between aborts and CDL.
>> Sounds to me that this would tie in to something like Time Limited Error
>> Recovery (TLER) and LR bit set based on ioprio?
>>
>> I am unclear where do aborts come into play here.
> 
> CDL: Command Duration Limits
> 
> One use case is reading from storage for audio visual output.
> An application only wants to wait so long (e.g. one or two frames
> on the video output) before it wants to forget about the current
> read (i.e. "abort" it) and move onto the next read. An alert viewer
> might notice a momentary freeze frame.
> 
> The SCSI CDL mechanism uses the DL0, DL1 and DL2 bits in the READ(16,32)
> commands. CDL also depends on the CDLP and RWCDLP fields in the
> REPORT SUPPORTED OPERATION CODES command and one of the CDL
> mode pages. So there may be some additional "wiring" needed in the
> SCSI subsystem.

I still don't understand where issuing aborts from userspace comes into
play here...



* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-22 14:37       ` Sagi Grimberg
@ 2023-02-22 14:53         ` Keith Busch
  2023-02-23 15:35           ` Sagi Grimberg
  0 siblings, 1 reply; 22+ messages in thread
From: Keith Busch @ 2023-02-22 14:53 UTC
  To: Sagi Grimberg
  Cc: dgilbert, Hannes Reinecke, linux-nvme@lists.infradead.org,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc

On Wed, Feb 22, 2023 at 04:37:51PM +0200, Sagi Grimberg wrote:
> 
> > > I did not understand what is the relationship between aborts and CDL.
> > > Sounds to me that this would tie in to something like Time Limited Error
> > > Recovery (TLER) and LR bit set based on ioprio?
> > > 
> > > I am unclear where do aborts come into play here.
> > 
> > CDL: Command Duration Limits
> > 
> > One use case is reading from storage for audio visual output.
> > An application only wants to wait so long (e.g. one or two frames
> > on the video output) before it wants to forget about the current
> > read (i.e. "abort" it) and move onto the next read. An alert viewer
> > might notice a momentary freeze frame.
> > 
> > The SCSI CDL mechanism uses the DL0, DL1 and DL2 bits in the READ(16,32)
> > commands. CDL also depends on the CDLP and RWCDLP fields in the
> > REPORT SUPPORTED OPERATION CODES command and one of the CDL
> > mode pages. So there may be some additional "wiring" needed in the
> > SCSI subsystem.
> 
> I still don't understand where issuing aborts from userspace come into
> play here...

The only connection is that aborts are obsolete and unnecessary if
you have a working CDL implementation.



* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-22 14:53         ` Keith Busch
@ 2023-02-23 15:35           ` Sagi Grimberg
  2023-02-24 23:54             ` Chaitanya Kulkarni
  0 siblings, 1 reply; 22+ messages in thread
From: Sagi Grimberg @ 2023-02-23 15:35 UTC
  To: Keith Busch
  Cc: dgilbert, Hannes Reinecke, linux-nvme@lists.infradead.org,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, lsf-pc


>>>> I did not understand what is the relationship between aborts and CDL.
>>>> Sounds to me that this would tie in to something like Time Limited Error
>>>> Recovery (TLER) and LR bit set based on ioprio?
>>>>
>>>> I am unclear where do aborts come into play here.
>>>
>>> CDL: Command Duration Limits
>>>
>>> One use case is reading from storage for audio visual output.
>>> An application only wants to wait so long (e.g. one or two frames
>>> on the video output) before it wants to forget about the current
>>> read (i.e. "abort" it) and move onto the next read. An alert viewer
>>> might notice a momentary freeze frame.
>>>
>>> The SCSI CDL mechanism uses the DL0, DL1 and DL2 bits in the READ(16,32)
>>> commands. CDL also depends on the CDLP and RWCDLP fields in the
>>> REPORT SUPPORTED OPERATION CODES command and one of the CDL
>>> mode pages. So there may be some additional "wiring" needed in the
>>> SCSI subsystem.
>>
>> I still don't understand where issuing aborts from userspace come into
>> play here...
> 
> The only connection is that aborts are obsolete and unnecessary if
> you have a working CDL implementation.

OK, that makes sense. Indeed I *think* that nvme can support CDL and if
this is useful for userspace then this is an interesting path to take.



* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-23 15:35           ` Sagi Grimberg
@ 2023-02-24 23:54             ` Chaitanya Kulkarni
  2023-02-25  1:51               ` Keith Busch
  0 siblings, 1 reply; 22+ messages in thread
From: Chaitanya Kulkarni @ 2023-02-24 23:54 UTC
  To: Sagi Grimberg, Keith Busch, hch@lst.de,
	martin.petersen@oracle.com, Damien Le Moal
  Cc: dgilbert@interlog.com, Hannes Reinecke,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org

(+Martin, Damien)

On 2/23/2023 7:35 AM, Sagi Grimberg wrote:
> 
>>>>> I did not understand what is the relationship between aborts and CDL.
>>>>> Sounds to me that this would tie in to something like Time Limited Error
>>>>> Recovery (TLER) and LR bit set based on ioprio?
>>>>>
>>>>> I am unclear where do aborts come into play here.
>>>>
>>>> CDL: Command Duration Limits
>>>>
>>>> One use case is reading from storage for audio visual output.
>>>> An application only wants to wait so long (e.g. one or two frames
>>>> on the video output) before it wants to forget about the current
>>>> read (i.e. "abort" it) and move onto the next read. An alert viewer
>>>> might notice a momentary freeze frame.
>>>>
>>>> The SCSI CDL mechanism uses the DL0, DL1 and DL2 bits in the READ(16,32)
>>>> commands. CDL also depends on the CDLP and RWCDLP fields in the
>>>> REPORT SUPPORTED OPERATION CODES command and one of the CDL
>>>> mode pages. So there may be some additional "wiring" needed in the
>>>> SCSI subsystem.
>>>
>>> I still don't understand where issuing aborts from userspace come into
>>> play here...
>>
>> The only connection is that aborts are obsolete and unnecessary if
>> you have a working CDL implementation.
> 
> OK, that makes sense. Indeed I *think* that nvme can support CDL and if
> this is useful for userspace then this is an interesting path to take.

I do think that we should work on CDL for NVMe, as it will solve some of
the timeout-related problems more effectively than aborts or any other
mechanism.

-ck




* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-24 23:54             ` Chaitanya Kulkarni
@ 2023-02-25  1:51               ` Keith Busch
  2023-02-25  4:15                 ` Damien Le Moal
  2023-02-27  8:20                 ` Hannes Reinecke
  0 siblings, 2 replies; 22+ messages in thread
From: Keith Busch @ 2023-02-25  1:51 UTC
  To: Chaitanya Kulkarni
  Cc: Sagi Grimberg, hch@lst.de, martin.petersen@oracle.com,
	Damien Le Moal, dgilbert@interlog.com, Hannes Reinecke,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org

On Fri, Feb 24, 2023 at 11:54:39PM +0000, Chaitanya Kulkarni wrote:
> I do think that we should work on CDL for NVMe as it will solve some of
> the timeout related problems effectively than using aborts or any other
> mechanism.

That proposal exists in NVMe TWG, but doesn't appear to have recent activity.
The last I heard, one point of contention was where the duration limit property
exists: within the command, or the queue. From my perspective, if it's not at
the queue level, the limit becomes meaningless, but hey, it's not up to me.



* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-25  1:51               ` Keith Busch
@ 2023-02-25  4:15                 ` Damien Le Moal
  2023-02-25 16:14                   ` James Smart
  2023-02-27 16:33                   ` Sagi Grimberg
  2023-02-27  8:20                 ` Hannes Reinecke
  1 sibling, 2 replies; 22+ messages in thread
From: Damien Le Moal @ 2023-02-25  4:15 UTC
  To: Keith Busch, Chaitanya Kulkarni
  Cc: Sagi Grimberg, hch@lst.de, martin.petersen@oracle.com,
	dgilbert@interlog.com, Hannes Reinecke,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org

On 2/25/23 10:51, Keith Busch wrote:
> On Fri, Feb 24, 2023 at 11:54:39PM +0000, Chaitanya Kulkarni wrote:
>> I do think that we should work on CDL for NVMe as it will solve some of
>> the timeout related problems effectively than using aborts or any other
>> mechanism.
> 
> That proposal exists in NVMe TWG, but doesn't appear to have recent activity.
> The last I heard, one point of contention was where the duration limit property
> exists: within the command, or the queue. From my perspective, if it's not at
> the queue level, the limit becomes meaningless, but hey, it's not up to me.

A limit attached to the command makes things more flexible and easier for the
host, so personally, I prefer that. But this has an impact on the controller:
the device needs to pull in *all* commands to be able to know the limits and do
scheduling/aborts appropriately. That is not something that device designers
like, for obvious reasons (device internal resources...).

On the other hand, limits attached to queues could lead to either a serious
increase in the number of queues (PCI space and IRQ vector limits), or a
loss of performance, as a particular queue with the desired limit would be
accessed from multiple CPUs on the host (lock contention). It's a tricky
problem, I think, with lots of compromises.
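The queue-count side of this trade-off is easy to put numbers on. A rough sketch, assuming one I/O queue per (CPU, limit) pair, one interrupt vector per queue (ignoring the admin queue), vectors split evenly across limits, and the 2048-vector ceiling of MSI-X; the CPU counts and the eight limit classes are illustrative only. Real controllers usually expose far fewer queues and vectors than that ceiling, which makes the sharing case arrive sooner.

```python
import math

MSIX_MAX_VECTORS = 2048  # MSI-X table size is an 11-bit, N-1 encoded field


def queues_needed(nr_cpus, nr_limits):
    """Per-queue limits with no sharing: one I/O queue per (CPU, limit) pair."""
    return nr_cpus * nr_limits


def cpus_per_queue(nr_cpus, nr_limits, max_vectors=MSIX_MAX_VECTORS):
    """How many CPUs must share each queue once vectors run out.

    Assumes one vector per I/O queue and an even split across limits.
    """
    queues_per_limit = max_vectors // nr_limits
    if queues_per_limit == 0:
        raise ValueError("fewer vectors than duration limits")
    return math.ceil(nr_cpus / queues_per_limit)
```

With 512 CPUs and eight limit classes the host would want 4096 queues, double the MSI-X ceiling, so every queue ends up shared by two CPUs; with a device offering only 256 vectors, sixteen CPUs contend for each queue.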

-- 
Damien Le Moal
Western Digital Research


* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-25  4:15                 ` Damien Le Moal
@ 2023-02-25 16:14                   ` James Smart
  2023-02-27 16:33                   ` Sagi Grimberg
  1 sibling, 0 replies; 22+ messages in thread
From: James Smart @ 2023-02-25 16:14 UTC
  To: Damien Le Moal, Keith Busch, Chaitanya Kulkarni
  Cc: Sagi Grimberg, hch@lst.de, martin.petersen@oracle.com,
	dgilbert@interlog.com, Hannes Reinecke,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org

On 2/24/2023 8:15 PM, Damien Le Moal wrote:
> On 2/25/23 10:51, Keith Busch wrote:
>> On Fri, Feb 24, 2023 at 11:54:39PM +0000, Chaitanya Kulkarni wrote:
>>> I do think that we should work on CDL for NVMe as it will solve some of
>>> the timeout related problems effectively than using aborts or any other
>>> mechanism.
>>
>> That proposal exists in NVMe TWG, but doesn't appear to have recent activity.
>> The last I heard, one point of contention was where the duration limit property
>> exists: within the command, or the queue. From my perspective, if it's not at
>> the queue level, the limit becomes meaningless, but hey, it's not up to me.
> 
> Limit attached to the command makes things more flexible and easier for the
> host, so personally, I prefer that. But this has an impact on the controller:
> the device needs to pull in *all* commands to be able to know the limits and do
> scheduling/aborts appropriately. That is not something that the device designers
> like, for obvious reasons (device internal resources...).
> 
> On the other hand, limits attached to queues could lead to either a serious
> increase in the number of queues (PCI space & number of IRQ vectors limits), or,
> loss of performance as a particular queue with the desired limit would be
> accessed from multiple CPUs on the host (lock contention). Tricky problem I
> think with lots of compromises.
> 

From a fabrics perspective:

- at the command: is workable. However, the times are distorted as they
won't include fabric transmission time of the cmd or rsp, nor any
retransmission of cmd xmt or rsp xmt under the fabric protecting against
loss.

- at the queue: is not workable. It effectively becomes a host transport
timer, as the CDL has to cover all fabric transmission times and the only
entity that can time/enforce the timer is the host transport. Also, what
does the host transport do when the timer expires? There are only a
couple of things it can do, all of them disruptive and at best delaying
the response back to the caller.

- CDL can only be meaningful (i.e. completion times close to the CDL) in the
absence of transport errors. Cmd termination, perhaps tied with
connection loss/failure detection, as well as connection/queue
termination or association termination, can have timers that are
well above the CDL value. Any cmd completion guarantee within time-X
can become meaningless.
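The distortion in the first and last points can be shown with simple addition. A sketch with made-up numbers and a deliberately simple model (one fixed delay per retransmission; real retransmit timers vary by transport): a controller-enforced limit bounds only the device's share of the latency, while the host sees that plus the fabric's share.

```python
def host_observed_latency_us(device_us, cmd_xmit_us, rsp_xmit_us,
                             retransmits=0, retransmit_delay_us=0):
    """End-to-end latency seen by the host for one command over a fabric.

    A controller-enforced duration limit can only bound device_us; the
    command/response transmission and any retransmission delays are added
    by the fabric on top of it.
    """
    return device_us + cmd_xmit_us + rsp_xmit_us + retransmits * retransmit_delay_us


def limit_met_at_host(limit_us, **latency):
    """True if the host actually sees the completion within limit_us."""
    return host_observed_latency_us(**latency) <= limit_us
```

With a 10 ms limit, a device time of 9 ms and 0.2 ms each way, the host sees 9.4 ms and the limit holds; a single retransmission costing 5 ms pushes the host-side latency to 14.4 ms even though the device met its limit.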

-- james





* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-25  4:15                 ` Damien Le Moal
  2023-02-25 16:14                   ` James Smart
@ 2023-02-27 16:33                   ` Sagi Grimberg
  2023-02-27 17:28                     ` Hannes Reinecke
  2023-02-27 21:17                     ` Damien Le Moal
  1 sibling, 2 replies; 22+ messages in thread
From: Sagi Grimberg @ 2023-02-27 16:33 UTC
  To: Damien Le Moal, Keith Busch, Chaitanya Kulkarni
  Cc: hch@lst.de, martin.petersen@oracle.com, dgilbert@interlog.com,
	Hannes Reinecke, linux-nvme@lists.infradead.org,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	lsf-pc@lists.linuxfoundation.org


>> On Fri, Feb 24, 2023 at 11:54:39PM +0000, Chaitanya Kulkarni wrote:
>>> I do think that we should work on CDL for NVMe as it will solve some of
>>> the timeout related problems effectively than using aborts or any other
>>> mechanism.
>>
>> That proposal exists in NVMe TWG, but doesn't appear to have recent activity.
>> The last I heard, one point of contention was where the duration limit property
>> exists: within the command, or the queue. From my perspective, if it's not at
>> the queue level, the limit becomes meaningless, but hey, it's not up to me.
> 
> Limit attached to the command makes things more flexible and easier for the
> host, so personally, I prefer that. But this has an impact on the controller:
> the device needs to pull in *all* commands to be able to know the limits and do
> scheduling/aborts appropriately. That is not something that the device designers
> like, for obvious reasons (device internal resources...).
> 
> On the other hand, limits attached to queues could lead to either a serious
> increase in the number of queues (PCI space & number of IRQ vectors limits), or,
> loss of performance as a particular queue with the desired limit would be
> accessed from multiple CPUs on the host (lock contention). Tricky problem I
> think with lots of compromises.

I'm not up to speed on how CDL is defined, but I'm unclear how CDL at
the queue level would cause the host to open more queues.

Another question: does CDL have any relationship with NVMe "Time Limited
Error Recovery", where the host can set a feature for the timeout and
indicate per command whether the controller should respect it?

While this is not a full-blown "every queue/command has its own timeout"
scheme, it could address the original use case given by Hannes. And it's
already there.
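For comparison with CDL, the existing mechanism mentioned here can be sketched as plain field encoding. This follows our reading of the NVMe specifications (Set Features opcode 09h; Error Recovery feature 05h, which is per namespace, with TLER in CDW11 bits 15:0 in 100 ms units and DULBE in bit 16; Limited Retry in bit 31 of Read/Write CDW12) and should be checked against the spec revision in use. The helper names are hypothetical and nothing is submitted to a device.

```python
NVME_ADMIN_SET_FEATURES = 0x09
NVME_FEAT_ERROR_RECOVERY = 0x05
NVME_RW_LR = 1 << 31  # Limited Retry bit in CDW12 of Read/Write


def error_recovery_feature(nsid, tler_ms, dulbe=False):
    """Fields of a Set Features (Error Recovery) admin command.

    TLER sits in CDW11 bits 15:0 in units of 100 ms, where 0 means no limit.
    Round down so the device gives up no later than requested, but clamp
    to one unit so a small non-zero request does not become "no limit".
    """
    units = 0 if tler_ms <= 0 else max(1, tler_ms // 100)
    if units > 0xFFFF:
        raise ValueError("TLER does not fit in 16 bits")
    return {
        "opcode": NVME_ADMIN_SET_FEATURES,
        "nsid": nsid,
        "cdw10": NVME_FEAT_ERROR_RECOVERY,
        "cdw11": units | (int(dulbe) << 16),  # bit 16: DULBE
    }


def read_cdw12(nr_blocks, limited_retry):
    """CDW12 of an NVMe Read: 0-based block count in bits 15:0, LR in bit 31."""
    if not 1 <= nr_blocks <= 0x10000:
        raise ValueError("block count out of range")
    cdw12 = nr_blocks - 1
    if limited_retry:
        cdw12 |= NVME_RW_LR
    return cdw12
```

In the VM scenario from the start of the thread, the host could set TLER just below the guest's timeout and set LR only on I/O the guest is waiting on; per the spec reading above, the limit applies only to commands that set LR.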



* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-27 16:33                   ` Sagi Grimberg
@ 2023-02-27 17:28                     ` Hannes Reinecke
  2023-02-27 17:44                       ` Keith Busch
  2023-02-27 21:17                     ` Damien Le Moal
  1 sibling, 1 reply; 22+ messages in thread
From: Hannes Reinecke @ 2023-02-27 17:28 UTC
  To: Sagi Grimberg, Damien Le Moal, Keith Busch, Chaitanya Kulkarni
  Cc: hch@lst.de, martin.petersen@oracle.com, dgilbert@interlog.com,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org

On 2/27/23 17:33, Sagi Grimberg wrote:
> 
>>> On Fri, Feb 24, 2023 at 11:54:39PM +0000, Chaitanya Kulkarni wrote:
>>>> I do think that we should work on CDL for NVMe as it will solve some of
>>>> the timeout related problems effectively than using aborts or any other
>>>> mechanism.
>>>
>>> That proposal exists in NVMe TWG, but doesn't appear to have recent activity.
>>> The last I heard, one point of contention was where the duration limit property
>>> exists: within the command, or the queue. From my perspective, if it's not at
>>> the queue level, the limit becomes meaningless, but hey, it's not up to me.
>>
>> Limit attached to the command makes things more flexible and easier for the
>> host, so personally, I prefer that. But this has an impact on the controller:
>> the device needs to pull in *all* commands to be able to know the limits and do
>> scheduling/aborts appropriately. That is not something that the device designers
>> like, for obvious reasons (device internal resources...).
>>
>> On the other hand, limits attached to queues could lead to either a serious
>> increase in the number of queues (PCI space & number of IRQ vectors limits), or,
>> loss of performance as a particular queue with the desired limit would be
>> accessed from multiple CPUs on the host (lock contention). Tricky problem I
>> think with lots of compromises.
> 
> I'm not up to speed on how CDL is defined, but I'm unclear how CDL at
> the queue level would cause the host to open more queues?
> 
> Another question, does CDL have any relationship with NVMe "Time Limited
> Error Recovery"? where the host can set a feature for timeout and
> indicate if the controller should respect it per command?
> 
> While this is not a full-blown every queue/command has its own timeout,
> it could address the original use-case given by Hannes. And it's already
> there.
I guess that is the NVMe version of CDLs; can you give me a reference 
for it?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman




* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-27 17:28                     ` Hannes Reinecke
@ 2023-02-27 17:44                       ` Keith Busch
  2023-02-27 21:18                         ` Damien Le Moal
                                           ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Keith Busch @ 2023-02-27 17:44 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Sagi Grimberg, Damien Le Moal, Chaitanya Kulkarni, hch@lst.de,
	martin.petersen@oracle.com, dgilbert@interlog.com,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org

On Mon, Feb 27, 2023 at 06:28:41PM +0100, Hannes Reinecke wrote:
> On 2/27/23 17:33, Sagi Grimberg wrote:
> > 
> > I'm not up to speed on how CDL is defined, but I'm unclear how CDL at
> > the queue level would cause the host to open more queues?

Because each CDL class would need its own submission queue in that scheme. They
can all share a single completion queue, so this scheme doesn't necessarily
increase the number of interrupt vectors.
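To make the counting concrete, here is a minimal sketch, assuming a hypothetical host with one default ("no limit") submission queue per CPU plus one extra submission queue per CDL class on each CPU, and assuming one interrupt vector per completion queue. The helper `sq_cq_counts` is made up for illustration; it only shows why per-class SQs multiply the SQ count while a shared CQ keeps the CQ (and vector) count at one per CPU.

```python
from typing import Tuple


def sq_cq_counts(ncpus: int, nclasses: int, shared_cq: bool = True) -> Tuple[int, int]:
    """Return (submission queues, completion queues) for a hypothetical
    per-CPU layout: one default SQ plus one SQ per CDL class on every CPU.

    With shared_cq=True all SQs of a CPU post to a single CQ, so the CQ
    count (and, assuming one vector per CQ, the vector count) stays at
    one per CPU. With shared_cq=False every SQ gets its own CQ.
    """
    sqs = ncpus * (1 + nclasses)
    cqs = ncpus if shared_cq else sqs
    return sqs, cqs


# Hypothetical host: 8 CPUs, 3 CDL classes.
print(sq_cq_counts(8, 3, shared_cq=True))   # (32, 8): SQs grow, CQs stay at 8
print(sq_cq_counts(8, 3, shared_cq=False))  # (32, 32): SQs and CQs both grow
```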

> > Another question: does CDL have any relationship with NVMe "Time Limited
> > Error Recovery", where the host can set a feature for the timeout and
> > indicate whether the controller should respect it per command?
> > 
> > While this is not a full-blown "every queue/command has its own timeout"
> > scheme, it could address the original use case given by Hannes. And it's
> > already there.
> I guess that is the NVMe version of CDLs; can you give me a reference for
> it?

They're not the same. TLER starts timing after a command experiences a
recoverable error, whereas CDL applies end-to-end timing to all commands.
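The difference in when the clock starts can be sketched as follows. This is a simplified model, not the NVMe or T10/T13 definition: times are plain integer milliseconds, and the helper names `cdl_deadline` and `tler_deadline` are hypothetical.

```python
from typing import Optional


def cdl_deadline(received_ms: int, limit_ms: int) -> int:
    """CDL-style limit: the clock starts when the device receives the
    command, whether or not anything goes wrong."""
    return received_ms + limit_ms


def tler_deadline(first_error_ms: Optional[int], limit_ms: int) -> Optional[int]:
    """TLER-style limit: the clock only starts once the command hits a
    recoverable error; an error-free command has no TLER deadline."""
    if first_error_ms is None:
        return None
    return first_error_ms + limit_ms


# Command received at t=0 ms, first recoverable error at t=40 ms, 100 ms limit.
print(cdl_deadline(0, 100))      # 100
print(tler_deadline(40, 100))    # 140
print(tler_deadline(None, 100))  # None
```

With the same nominal limit, a TLER-bounded command can run longer than a CDL-bounded one, and TLER puts no bound at all on a command that never errors.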



* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-27 17:44                       ` Keith Busch
@ 2023-02-27 21:18                         ` Damien Le Moal
  2023-02-27 21:42                         ` Damien Le Moal
  2023-02-28  8:05                         ` Sagi Grimberg
  2 siblings, 0 replies; 22+ messages in thread
From: Damien Le Moal @ 2023-02-27 21:18 UTC (permalink / raw)
  To: Keith Busch, Hannes Reinecke
  Cc: Sagi Grimberg, Chaitanya Kulkarni, hch@lst.de,
	martin.petersen@oracle.com, dgilbert@interlog.com,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org

On 2/28/23 02:44, Keith Busch wrote:
> On Mon, Feb 27, 2023 at 06:28:41PM +0100, Hannes Reinecke wrote:
>> On 2/27/23 17:33, Sagi Grimberg wrote:
>>>
>>> I'm not up to speed on how CDL is defined, but I'm unclear how CDL at
>>> the queue level would cause the host to open more queues?
> 
> Because each CDL class would need its own submission queue in that scheme. They
> can all share a single completion queue, so this scheme doesn't necessarily
> increase the number of interrupt vectors.

Ah yes, good point. I always forget about the shared completion queue :)

>>> Another question: does CDL have any relationship with NVMe "Time Limited
>>> Error Recovery", where the host can set a feature for the timeout and
>>> indicate whether the controller should respect it per command?
>>>
>>> While this is not a full-blown "every queue/command has its own timeout"
>>> scheme, it could address the original use case given by Hannes. And it's
>>> already there.
>> I guess that is the NVMe version of CDLs; can you give me a reference for
>> it?
> 
> They're not the same. TLER starts timing after a command experiences a
> recoverable error, whereas CDL applies end-to-end timing to all commands.

-- 
Damien Le Moal
Western Digital Research




* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-27 17:44                       ` Keith Busch
  2023-02-27 21:18                         ` Damien Le Moal
@ 2023-02-27 21:42                         ` Damien Le Moal
  2023-02-28  8:05                         ` Sagi Grimberg
  2 siblings, 0 replies; 22+ messages in thread
From: Damien Le Moal @ 2023-02-27 21:42 UTC (permalink / raw)
  To: Keith Busch, Hannes Reinecke
  Cc: Sagi Grimberg, Chaitanya Kulkarni, hch@lst.de,
	martin.petersen@oracle.com, dgilbert@interlog.com,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org

On 2/28/23 02:44, Keith Busch wrote:
> On Mon, Feb 27, 2023 at 06:28:41PM +0100, Hannes Reinecke wrote:
>> On 2/27/23 17:33, Sagi Grimberg wrote:
>>>
>>> I'm not up to speed on how CDL is defined, but I'm unclear how CDL at
>>> the queue level would cause the host to open more queues?
> 
> Because each CDL class would need its own submission queue in that scheme. They
> can all share a single completion queue, so this scheme doesn't necessarily
> increase the number of interrupt vectors.
> 
>>> Another question: does CDL have any relationship with NVMe "Time Limited
>>> Error Recovery", where the host can set a feature for the timeout and
>>> indicate whether the controller should respect it per command?
>>>
>>> While this is not a full-blown "every queue/command has its own timeout"
>>> scheme, it could address the original use case given by Hannes. And it's
>>> already there.
>> I guess that is the NVMe version of CDLs; can you give me a reference for
>> it?
> 
> They're not the same. TLER starts timing after a command experiences a
> recoverable error, whereas CDL applies end-to-end timing to all commands.

Note here that with the current T10/T13 CDL definitions, "end-to-end" actually
means from the time the command is received by the device to the time the device
signals the command completion.

That does not include the transport & host adapter queueing (if there is an
HBA). And I guess this is the issue at hand for fabrics: how to account for the
transport time. The CDL descriptors could perhaps have one additional limit for
that, but then the definition of the duration guideline limit would need to be
tweaked.
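A small worked example of why a device-side limit alone does not bound what the host observes. This is a sketch, assuming host-visible latency is simply transport/HBA queueing time plus device time; the function `host_sees_timeout` and the numbers are hypothetical.

```python
def host_sees_timeout(transport_ms: int, device_ms: int, host_timeout_ms: int) -> bool:
    """True if the host-observed latency exceeds the host's own timeout.

    Simplified model: host-observed latency = transport/HBA queueing and
    transfer time + time spent inside the device (the part CDL covers).
    """
    return transport_ms + device_ms > host_timeout_ms


device_limit_ms = 50  # the device completes right at its CDL limit
print(host_sees_timeout(transport_ms=5, device_ms=device_limit_ms, host_timeout_ms=60))   # False
print(host_sees_timeout(transport_ms=30, device_ms=device_limit_ms, host_timeout_ms=60))  # True
```

In the second case the device meets its limit, yet the host still times out; that gap is what a transport-aware limit would have to cover.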

-- 
Damien Le Moal
Western Digital Research




* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-27 17:44                       ` Keith Busch
  2023-02-27 21:18                         ` Damien Le Moal
  2023-02-27 21:42                         ` Damien Le Moal
@ 2023-02-28  8:05                         ` Sagi Grimberg
  2 siblings, 0 replies; 22+ messages in thread
From: Sagi Grimberg @ 2023-02-28  8:05 UTC (permalink / raw)
  To: Keith Busch, Hannes Reinecke
  Cc: Damien Le Moal, Chaitanya Kulkarni, hch@lst.de,
	martin.petersen@oracle.com, dgilbert@interlog.com,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org


>>> I'm not up to speed on how CDL is defined, but I'm unclear how CDL at
>>> the queue level would cause the host to open more queues?
> 
> Because each CDL class would need its own submission queue in that scheme. They
> can all share a single completion queue, so this scheme doesn't necessarily
> increase the number of interrupt vectors.

Ah, that is less desirable, I think. Although we can already do
multiple queue maps, I think that the proliferation of queues is
harmful in the long run.

>>> Another question: does CDL have any relationship with NVMe "Time Limited
>>> Error Recovery", where the host can set a feature for the timeout and
>>> indicate whether the controller should respect it per command?
>>>
>>> While this is not a full-blown "every queue/command has its own timeout"
>>> scheme, it could address the original use case given by Hannes. And it's
>>> already there.
>> I guess that is the NVMe version of CDLs; can you give me a reference for
>> it?
> 
> They're not the same. TLER starts timing after a command experiences a
> recoverable error, whereas CDL applies end-to-end timing to all commands.

Ah, ok. I didn't realize that TLER starts after an error.



* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-27 16:33                   ` Sagi Grimberg
  2023-02-27 17:28                     ` Hannes Reinecke
@ 2023-02-27 21:17                     ` Damien Le Moal
  1 sibling, 0 replies; 22+ messages in thread
From: Damien Le Moal @ 2023-02-27 21:17 UTC (permalink / raw)
  To: Sagi Grimberg, Keith Busch, Chaitanya Kulkarni
  Cc: hch@lst.de, martin.petersen@oracle.com, dgilbert@interlog.com,
	Hannes Reinecke, linux-nvme@lists.infradead.org,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	lsf-pc@lists.linuxfoundation.org

On 2/28/23 01:33, Sagi Grimberg wrote:
> 
>>> On Fri, Feb 24, 2023 at 11:54:39PM +0000, Chaitanya Kulkarni wrote:
>>>> I do think that we should work on CDL for NVMe as it will solve some of
>>>> the timeout related problems more effectively than using aborts or any
>>>> other mechanism.
>>>
>>> That proposal exists in NVMe TWG, but doesn't appear to have recent activity.
>>> The last I heard, one point of contention was where the duration limit property
>>> exists: within the command, or the queue. From my perspective, if it's not at
>>> the queue level, the limit becomes meaningless, but hey, it's not up to me.
>>
>> Limit attached to the command makes things more flexible and easier for the
>> host, so personally, I prefer that. But this has an impact on the controller:
>> the device needs to pull in *all* commands to be able to know the limits and do
>> scheduling/aborts appropriately. That is not something that the device designers
>> like, for obvious reasons (device internal resources...).
>>
>> On the other hand, limits attached to queues could lead to either a serious
>> increase in the number of queues (PCI space & number of IRQ vectors limits), or,
>> loss of performance as a particular queue with the desired limit would be
>> accessed from multiple CPUs on the host (lock contention). Tricky problem I
>> think with lots of compromises.
> 
> I'm not up to speed on how CDL is defined, but I'm unclear how CDL at
> the queue level would cause the host to open more queues?

There would be a need for one queue pair per defined limit, in addition to the
regular "no limits" queue pairs. And given that CDL allows defining up to 7
limits each for read AND write commands, if kept as-is, this means potentially 14
additional queue pairs shared among all CPUs, or even more than that if one
wants per-CPU queues with limits.
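The queue-pair arithmetic above can be written out as a short sketch. It assumes every read and write limit gets a dedicated queue pair, as described in the paragraph; `extra_queue_pairs` is a hypothetical helper, not driver code, and the 16-CPU host is an arbitrary example.

```python
def extra_queue_pairs(read_limits: int, write_limits: int, ncpus: int,
                      per_cpu: bool) -> int:
    """Extra queue pairs needed on top of the regular "no limits" ones if
    every CDL limit gets its own queue pair, either shared by all CPUs or
    replicated on every CPU."""
    per_set = read_limits + write_limits
    return per_set * ncpus if per_cpu else per_set


print(extra_queue_pairs(7, 7, ncpus=16, per_cpu=False))  # 14, shared by all CPUs
print(extra_queue_pairs(7, 7, ncpus=16, per_cpu=True))   # 224
```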

> Another question: does CDL have any relationship with NVMe "Time Limited
> Error Recovery", where the host can set a feature for the timeout and
> indicate whether the controller should respect it per command?

This NVMe feature does map to one of the possible limits that can be defined
with CDL. CDL currently allows 3 different limits:
 - Active time limit: limit on command execution involving media access
 - Inactive time limit: limit on device internal queueing time before processing
   of the command starts (aging control)
 - Duration guideline: overall limit on the command processing by the device
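To illustrate how the three limits differ, here is a minimal sketch that checks a single command's timeline against them. It is a simplified model under stated assumptions: the `CdlLimits` class and `exceeded_limits` helper are hypothetical and do not reflect the T10/T13 descriptor format, active time is approximated as the time from start of processing to completion, and the policy the device applies when a limit is exceeded is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CdlLimits:
    """Hypothetical container for the three CDL limits (None = no limit)."""
    active_ms: Optional[int] = None              # execution involving media access
    inactive_ms: Optional[int] = None            # device-internal queueing before processing
    duration_guideline_ms: Optional[int] = None  # overall processing by the device


def exceeded_limits(limits: CdlLimits, received_ms: int, started_ms: int,
                    completed_ms: int) -> List[str]:
    """Return the names of the limits a command exceeded.

    received_ms:  device receives the command
    started_ms:   device starts processing it
    completed_ms: device signals completion
    """
    inactive = started_ms - received_ms
    active = completed_ms - started_ms  # simplification, see lead-in
    total = completed_ms - received_ms
    hits = []
    if limits.inactive_ms is not None and inactive > limits.inactive_ms:
        hits.append("inactive")
    if limits.active_ms is not None and active > limits.active_ms:
        hits.append("active")
    if limits.duration_guideline_ms is not None and total > limits.duration_guideline_ms:
        hits.append("duration_guideline")
    return hits


limits = CdlLimits(active_ms=30, inactive_ms=20, duration_guideline_ms=40)
# Queued 25 ms inside the device, then 10 ms of processing.
print(exceeded_limits(limits, received_ms=0, started_ms=25, completed_ms=35))  # ['inactive']
```

The example command breaks only the inactive (aging) limit: it waited too long in the device queue, even though both its processing time and its overall time stayed within bounds.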

> While this is not a full-blown "every queue/command has its own timeout"
> scheme, it could address the original use case given by Hannes. And it's
> already there.

The above limits are what is currently defined in T10/T13 for SCSI/ATA devices.
NVMe may need some tweaks to get a better mapping to the different use cases.

-- 
Damien Le Moal
Western Digital Research




* Re: [LSF/MM/BPF BOF] Userspace command abouts
  2023-02-25  1:51               ` Keith Busch
  2023-02-25  4:15                 ` Damien Le Moal
@ 2023-02-27  8:20                 ` Hannes Reinecke
  1 sibling, 0 replies; 22+ messages in thread
From: Hannes Reinecke @ 2023-02-27  8:20 UTC (permalink / raw)
  To: Keith Busch, Chaitanya Kulkarni
  Cc: Sagi Grimberg, hch@lst.de, martin.petersen@oracle.com,
	Damien Le Moal, dgilbert@interlog.com,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, lsf-pc@lists.linuxfoundation.org

On 2/25/23 02:51, Keith Busch wrote:
> On Fri, Feb 24, 2023 at 11:54:39PM +0000, Chaitanya Kulkarni wrote:
>> I do think that we should work on CDL for NVMe as it will solve some of
>> the timeout related problems more effectively than using aborts or any
>> other mechanism.
> 
> That proposal exists in NVMe TWG, but doesn't appear to have recent activity.
> The last I heard, one point of contention was where the duration limit property
> exists: within the command, or the queue. From my perspective, if it's not at
> the queue level, the limit becomes meaningless, but hey, it's not up to me.

And that is one of the issues I'd like to discuss.
As it stands, CDLs are defined for the controller only; queueing effects
from the transport are out of scope (for the current CDL definition).
So for NVMe-oF we would need to discuss how we can specify CDLs for
fabrics; in particular, the relationship between CDLs and transport
timeouts is ... interesting, and we need to discuss how we can correlate
the two.

Having it on the queue as you suggested would be cool, as it would give a
nice overall number, but discussions with the driver vendors were not
encouraging; they're having a hard time giving timeout guarantees in
really quirky failure cases.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman




end of thread

Thread overview: 22+ messages
2023-02-16 11:50 [LSF/MM/BPF BOF] Userspace command abouts Hannes Reinecke
2023-02-16 16:40 ` Keith Busch
2023-02-17 18:53   ` Chaitanya Kulkarni
2023-02-18  9:50     ` [LSF/MM/BPF BOF] Userspace command aborts Hannes Reinecke
2023-02-21 18:15       ` Chaitanya Kulkarni
2023-02-20 11:24   ` [LSF/MM/BPF BOF] Userspace command abouts Sagi Grimberg
2023-02-21 16:25     ` Douglas Gilbert
2023-02-22 14:37       ` Sagi Grimberg
2023-02-22 14:53         ` Keith Busch
2023-02-23 15:35           ` Sagi Grimberg
2023-02-24 23:54             ` Chaitanya Kulkarni
2023-02-25  1:51               ` Keith Busch
2023-02-25  4:15                 ` Damien Le Moal
2023-02-25 16:14                   ` James Smart
2023-02-27 16:33                   ` Sagi Grimberg
2023-02-27 17:28                     ` Hannes Reinecke
2023-02-27 17:44                       ` Keith Busch
2023-02-27 21:18                         ` Damien Le Moal
2023-02-27 21:42                         ` Damien Le Moal
2023-02-28  8:05                         ` Sagi Grimberg
2023-02-27 21:17                     ` Damien Le Moal
2023-02-27  8:20                 ` Hannes Reinecke
