[PATCH] virtio_blk: always post notifications under the lock

virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] virtio_blk: always post notifications under the lock
@ 2025-01-07 18:25 Andrew Boyer
  2025-01-08  2:02 ` Jason Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Andrew Boyer @ 2025-01-07 18:25 UTC (permalink / raw)
  To: andrew.boyer
  Cc: Viktor Prutyanov, Michael S . Tsirkin, Jason Wang, Paolo Bonzini,
	Stefan Hajnoczi, Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization, linux-block

Commit af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA feature
support") added notification data support to the core virtio driver
code. When this feature is enabled, the notification includes the
updated producer index for the queue. Thus it is now critical that
notifications arrive in order.

The virtio_blk driver has historically not worried about notification
ordering. Modify it so that the prepare and kick steps are both done
under the vq lock.

Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Fixes: af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA feature support")
Cc: Viktor Prutyanov <viktor@daynix.com>
Cc: virtualization@lists.linux.dev
Cc: linux-block@vger.kernel.org
---
 drivers/block/virtio_blk.c | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 3efe378f1386..14d9e66bb844 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
 {
 	struct virtio_blk *vblk = hctx->queue->queuedata;
 	struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
-	bool kick;
 
 	spin_lock_irq(&vq->lock);
-	kick = virtqueue_kick_prepare(vq->vq);
+	virtqueue_kick(vq->vq);
 	spin_unlock_irq(&vq->lock);
-
-	if (kick)
-		virtqueue_notify(vq->vq);
 }
 
 static blk_status_t virtblk_fail_to_queue(struct request *req, int rc)
@@ -432,7 +428,6 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
 	unsigned long flags;
 	int qid = hctx->queue_num;
-	bool notify = false;
 	blk_status_t status;
 	int err;
 
@@ -454,12 +449,10 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
 		return virtblk_fail_to_queue(req, err);
 	}
 
-	if (bd->last && virtqueue_kick_prepare(vblk->vqs[qid].vq))
-		notify = true;
+	if (bd->last)
+		virtqueue_kick(vblk->vqs[qid].vq);
 	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
 
-	if (notify)
-		virtqueue_notify(vblk->vqs[qid].vq);
 	return BLK_STS_OK;
 }
 
@@ -476,7 +469,6 @@ static void virtblk_add_req_batch(struct virtio_blk_vq *vq,
 {
 	struct request *req;
 	unsigned long flags;
-	bool kick;
 
 	spin_lock_irqsave(&vq->lock, flags);
 
@@ -492,11 +484,8 @@ static void virtblk_add_req_batch(struct virtio_blk_vq *vq,
 		}
 	}
 
-	kick = virtqueue_kick_prepare(vq->vq);
+	virtqueue_kick(vq->vq);
 	spin_unlock_irqrestore(&vq->lock, flags);
-
-	if (kick)
-		virtqueue_notify(vq->vq);
 }
 
 static void virtio_queue_rqs(struct rq_list *rqlist)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-01-07 18:25 [PATCH] virtio_blk: always post notifications under the lock Andrew Boyer
@ 2025-01-08  2:02 ` Jason Wang
  2025-01-09 10:19 ` Michael S. Tsirkin
  2025-01-09 12:01 ` Christian Borntraeger
  2 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2025-01-08  2:02 UTC (permalink / raw)
  To: Andrew Boyer
  Cc: Viktor Prutyanov, Michael S . Tsirkin, Paolo Bonzini,
	Stefan Hajnoczi, Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization, linux-block

On Wed, Jan 8, 2025 at 2:27 AM Andrew Boyer <andrew.boyer@amd.com> wrote:
>
> Commit af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA feature
> support") added notification data support to the core virtio driver
> code. When this feature is enabled, the notification includes the
> updated producer index for the queue. Thus it is now critical that
> notifications arrive in order.
>
> The virtio_blk driver has historically not worried about notification
> ordering. Modify it so that the prepare and kick steps are both done
> under the vq lock.

Do we have performance numbers when VIRTIO_F_NOTIFICATION_DATA is not
negotiated?

We need to make sure it doesn't introduce any regression in the case
like virtualization setup since now there could be an vmexit when
holding the virtqueue lock.

Thanks

>
> Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
> Reviewed-by: Brett Creeley <brett.creeley@amd.com>
> Fixes: af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA feature support")
> Cc: Viktor Prutyanov <viktor@daynix.com>
> Cc: virtualization@lists.linux.dev
> Cc: linux-block@vger.kernel.org
> ---
>  drivers/block/virtio_blk.c | 19 ++++---------------
>  1 file changed, 4 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 3efe378f1386..14d9e66bb844 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>  {
>         struct virtio_blk *vblk = hctx->queue->queuedata;
>         struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
> -       bool kick;
>
>         spin_lock_irq(&vq->lock);
> -       kick = virtqueue_kick_prepare(vq->vq);
> +       virtqueue_kick(vq->vq);
>         spin_unlock_irq(&vq->lock);
> -
> -       if (kick)
> -               virtqueue_notify(vq->vq);
>  }
>
>  static blk_status_t virtblk_fail_to_queue(struct request *req, int rc)
> @@ -432,7 +428,6 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>         struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
>         unsigned long flags;
>         int qid = hctx->queue_num;
> -       bool notify = false;
>         blk_status_t status;
>         int err;
>
> @@ -454,12 +449,10 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>                 return virtblk_fail_to_queue(req, err);
>         }
>
> -       if (bd->last && virtqueue_kick_prepare(vblk->vqs[qid].vq))
> -               notify = true;
> +       if (bd->last)
> +               virtqueue_kick(vblk->vqs[qid].vq);
>         spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
>
> -       if (notify)
> -               virtqueue_notify(vblk->vqs[qid].vq);
>         return BLK_STS_OK;
>  }
>
> @@ -476,7 +469,6 @@ static void virtblk_add_req_batch(struct virtio_blk_vq *vq,
>  {
>         struct request *req;
>         unsigned long flags;
> -       bool kick;
>
>         spin_lock_irqsave(&vq->lock, flags);
>
> @@ -492,11 +484,8 @@ static void virtblk_add_req_batch(struct virtio_blk_vq *vq,
>                 }
>         }
>
> -       kick = virtqueue_kick_prepare(vq->vq);
> +       virtqueue_kick(vq->vq);
>         spin_unlock_irqrestore(&vq->lock, flags);
> -
> -       if (kick)
> -               virtqueue_notify(vq->vq);
>  }
>
>  static void virtio_queue_rqs(struct rq_list *rqlist)
> --
> 2.17.1
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-01-07 18:25 [PATCH] virtio_blk: always post notifications under the lock Andrew Boyer
  2025-01-08  2:02 ` Jason Wang
@ 2025-01-09 10:19 ` Michael S. Tsirkin
  2025-01-09 12:01 ` Christian Borntraeger
  2 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2025-01-09 10:19 UTC (permalink / raw)
  To: Andrew Boyer
  Cc: Viktor Prutyanov, Jason Wang, Paolo Bonzini, Stefan Hajnoczi,
	Eugenio Perez, Xuan Zhuo, Jens Axboe, virtualization, linux-block

On Tue, Jan 07, 2025 at 10:25:16AM -0800, Andrew Boyer wrote:
> Commit af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA feature
> support") added notification data support to the core virtio driver
> code. When this feature is enabled, the notification includes the
> updated producer index for the queue. Thus it is now critical that
> notifications arrive in order.
> 
> The virtio_blk driver has historically not worried about notification
> ordering. Modify it so that the prepare and kick steps are both done
> under the vq lock.
> 
> Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
> Reviewed-by: Brett Creeley <brett.creeley@amd.com>
> Fixes: af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA feature support")
> Cc: Viktor Prutyanov <viktor@daynix.com>
> Cc: virtualization@lists.linux.dev
> Cc: linux-block@vger.kernel.org


Hmm. Not good, notify can be very slow, holding a lock is a bad idea.
Basically, virtqueue_notify must work ouside of locks, this
means af8ececda185 is broken and we did not notice.

Let's fix it please.

Try some kind of compare and swap scheme where we detect that index
was updated since? Will allow skipping a notification, too.




> ---
>  drivers/block/virtio_blk.c | 19 ++++---------------
>  1 file changed, 4 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 3efe378f1386..14d9e66bb844 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>  {
>  	struct virtio_blk *vblk = hctx->queue->queuedata;
>  	struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
> -	bool kick;
>  
>  	spin_lock_irq(&vq->lock);
> -	kick = virtqueue_kick_prepare(vq->vq);
> +	virtqueue_kick(vq->vq);
>  	spin_unlock_irq(&vq->lock);
> -
> -	if (kick)
> -		virtqueue_notify(vq->vq);
>  }
>  
>  static blk_status_t virtblk_fail_to_queue(struct request *req, int rc)
> @@ -432,7 +428,6 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>  	struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
>  	unsigned long flags;
>  	int qid = hctx->queue_num;
> -	bool notify = false;
>  	blk_status_t status;
>  	int err;
>  
> @@ -454,12 +449,10 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>  		return virtblk_fail_to_queue(req, err);
>  	}
>  
> -	if (bd->last && virtqueue_kick_prepare(vblk->vqs[qid].vq))
> -		notify = true;
> +	if (bd->last)
> +		virtqueue_kick(vblk->vqs[qid].vq);
>  	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
>  
> -	if (notify)
> -		virtqueue_notify(vblk->vqs[qid].vq);
>  	return BLK_STS_OK;
>  }
>  
> @@ -476,7 +469,6 @@ static void virtblk_add_req_batch(struct virtio_blk_vq *vq,
>  {
>  	struct request *req;
>  	unsigned long flags;
> -	bool kick;
>  
>  	spin_lock_irqsave(&vq->lock, flags);
>  
> @@ -492,11 +484,8 @@ static void virtblk_add_req_batch(struct virtio_blk_vq *vq,
>  		}
>  	}
>  
> -	kick = virtqueue_kick_prepare(vq->vq);
> +	virtqueue_kick(vq->vq);
>  	spin_unlock_irqrestore(&vq->lock, flags);
> -
> -	if (kick)
> -		virtqueue_notify(vq->vq);
>  }
>  
>  static void virtio_queue_rqs(struct rq_list *rqlist)
> -- 
> 2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-01-07 18:25 [PATCH] virtio_blk: always post notifications under the lock Andrew Boyer
  2025-01-08  2:02 ` Jason Wang
  2025-01-09 10:19 ` Michael S. Tsirkin
@ 2025-01-09 12:01 ` Christian Borntraeger
  2025-01-09 13:42   ` Michael S. Tsirkin
  2 siblings, 1 reply; 24+ messages in thread
From: Christian Borntraeger @ 2025-01-09 12:01 UTC (permalink / raw)
  To: Andrew Boyer
  Cc: Viktor Prutyanov, Michael S . Tsirkin, Jason Wang, Paolo Bonzini,
	Stefan Hajnoczi, Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization, linux-block



Am 07.01.25 um 19:25 schrieb Andrew Boyer:
> Commit af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA feature
> support") added notification data support to the core virtio driver
> code. When this feature is enabled, the notification includes the
> updated producer index for the queue. Thus it is now critical that
> notifications arrive in order.
> 
> The virtio_blk driver has historically not worried about notification
> ordering. Modify it so that the prepare and kick steps are both done
> under the vq lock.
> 
> Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
> Reviewed-by: Brett Creeley <brett.creeley@amd.com>
> Fixes: af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA feature support")
> Cc: Viktor Prutyanov <viktor@daynix.com>
> Cc: virtualization@lists.linux.dev
> Cc: linux-block@vger.kernel.org
> ---
>   drivers/block/virtio_blk.c | 19 ++++---------------
>   1 file changed, 4 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 3efe378f1386..14d9e66bb844 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>   {
>   	struct virtio_blk *vblk = hctx->queue->queuedata;
>   	struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
> -	bool kick;
>   
>   	spin_lock_irq(&vq->lock);
> -	kick = virtqueue_kick_prepare(vq->vq);
> +	virtqueue_kick(vq->vq);
>   	spin_unlock_irq(&vq->lock);
> -
> -	if (kick)
> -		virtqueue_notify(vq->vq);
>   }

I would assume this will be a performance nightmare for normal IO.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-01-09 12:01 ` Christian Borntraeger
@ 2025-01-09 13:42   ` Michael S. Tsirkin
  2025-01-22 14:48     ` Boyer, Andrew
       [not found]     ` <FE77DD4F-AB39-4781-9D24-06F171F47FED@amd.com>
  0 siblings, 2 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2025-01-09 13:42 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Andrew Boyer, Viktor Prutyanov, Jason Wang, Paolo Bonzini,
	Stefan Hajnoczi, Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization, linux-block

On Thu, Jan 09, 2025 at 01:01:20PM +0100, Christian Borntraeger wrote:
> 
> 
> Am 07.01.25 um 19:25 schrieb Andrew Boyer:
> > Commit af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA feature
> > support") added notification data support to the core virtio driver
> > code. When this feature is enabled, the notification includes the
> > updated producer index for the queue. Thus it is now critical that
> > notifications arrive in order.
> > 
> > The virtio_blk driver has historically not worried about notification
> > ordering. Modify it so that the prepare and kick steps are both done
> > under the vq lock.
> > 
> > Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
> > Reviewed-by: Brett Creeley <brett.creeley@amd.com>
> > Fixes: af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA feature support")
> > Cc: Viktor Prutyanov <viktor@daynix.com>
> > Cc: virtualization@lists.linux.dev
> > Cc: linux-block@vger.kernel.org
> > ---
> >   drivers/block/virtio_blk.c | 19 ++++---------------
> >   1 file changed, 4 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index 3efe378f1386..14d9e66bb844 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
> >   {
> >   	struct virtio_blk *vblk = hctx->queue->queuedata;
> >   	struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
> > -	bool kick;
> >   	spin_lock_irq(&vq->lock);
> > -	kick = virtqueue_kick_prepare(vq->vq);
> > +	virtqueue_kick(vq->vq);
> >   	spin_unlock_irq(&vq->lock);
> > -
> > -	if (kick)
> > -		virtqueue_notify(vq->vq);
> >   }
> 
> I would assume this will be a performance nightmare for normal IO.

Indeed.
AMD guys, can't device survive with reordered notifications?
Basically just drop a notification if you see index
going back?


-- 
MST


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-01-09 13:42   ` Michael S. Tsirkin
@ 2025-01-22 14:48     ` Boyer, Andrew
       [not found]     ` <FE77DD4F-AB39-4781-9D24-06F171F47FED@amd.com>
  1 sibling, 0 replies; 24+ messages in thread
From: Boyer, Andrew @ 2025-01-22 14:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christian Borntraeger, Boyer, Andrew, Jason Wang, Paolo Bonzini,
	Stefan Hajnoczi, Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization@lists.linux.dev, linux-block@vger.kernel.org,
	Nelson, Shannon, Creeley, Brett, Hubbe, Allen

On Jan 9, 2025, at 8:42 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Thu, Jan 09, 2025 at 01:01:20PM +0100, Christian Borntraeger wrote:
>> 
>> 
>> Am 07.01.25 um 19:25 schrieb Andrew Boyer:
>>> Commit af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA feature
>>> support") added notification data support to the core virtio driver
>>> code. When this feature is enabled, the notification includes the
>>> updated producer index for the queue. Thus it is now critical that
>>> notifications arrive in order.
>>> 
>>> The virtio_blk driver has historically not worried about notification
>>> ordering. Modify it so that the prepare and kick steps are both done
>>> under the vq lock.
>>> 
>>> Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
>>> Reviewed-by: Brett Creeley <brett.creeley@amd.com>
>>> Fixes: af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA feature support")
>>> Cc: Viktor Prutyanov <viktor@daynix.com>
>>> Cc: virtualization@lists.linux.dev
>>> Cc: linux-block@vger.kernel.org
>>> ---
>>>  drivers/block/virtio_blk.c | 19 ++++---------------
>>>  1 file changed, 4 insertions(+), 15 deletions(-)
>>> 
>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>> index 3efe378f1386..14d9e66bb844 100644
>>> --- a/drivers/block/virtio_blk.c
>>> +++ b/drivers/block/virtio_blk.c
>>> @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>>>  {
>>>    struct virtio_blk *vblk = hctx->queue->queuedata;
>>>    struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
>>> -   bool kick;
>>>    spin_lock_irq(&vq->lock);
>>> -   kick = virtqueue_kick_prepare(vq->vq);
>>> +   virtqueue_kick(vq->vq);
>>>    spin_unlock_irq(&vq->lock);
>>> -
>>> -   if (kick)
>>> -           virtqueue_notify(vq->vq);
>>>  }
>> 
>> I would assume this will be a performance nightmare for normal IO.
> 

Hello Michael and Christian and Jason,
Thank you for taking a look.

Is the performance concern that the vmexit might lead to the underlying virtual storage stack doing the work immediately? Any other job posting to the same queue would presumably be blocked on a vmexit when it goes to attempt its own notification. That would be almost the same as having the other job block on a lock during the operation, although I guess if you are skipping notifications somehow it would look different.

I don't have any sort of setup where I can try it but I would appreciate it if someone else could.

> Hmm. Not good, notify can be very slow, holding a lock is a bad idea.
> Basically, virtqueue_notify must work ouside of locks, this
> means af8ececda185 is broken and we did not notice.
> 
> Let's fix it please.

With so many broken kernels already in the wild, I think disabling F_NOTIFICATION_DATA for virtio-blk would be a reasonable solution.

> Try some kind of compare and swap scheme where we detect that index
> was updated since? Will allow skipping a notification, too.

Do you have an idea of how this might be done? Anything I've come up with involves a lock. Would it be doable to have a lock for the vq management stuff and a second one to post notifications?

> AMD guys, can't device survive with reordered notifications?
> Basically just drop a notification if you see index
> going back?

This is the driver lying to us about the state of the queue; it's not going to be possible for us to work around it in hardware. For starters, how would we detect queue wrap around?

Thank you,
Andrew

(Apologies for sending twice, mail client issue)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
       [not found]     ` <FE77DD4F-AB39-4781-9D24-06F171F47FED@amd.com>
@ 2025-01-22 15:13       ` Michael S. Tsirkin
  2025-01-22 17:45         ` Boyer, Andrew
  2025-01-22 15:15       ` (repost) " Michael S. Tsirkin
  2025-01-22 17:33       ` Christian Borntraeger
  2 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2025-01-22 15:13 UTC (permalink / raw)
  To: Boyer, Andrew
  Cc: Christian Borntraeger, Jason Wang, Paolo Bonzini, Stefan Hajnoczi,
	Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization@lists.linux.dev, linux-block@vger.kernel.org,
	Nelson, Shannon, Creeley, Brett, Hubbe, Allen

On Wed, Jan 22, 2025 at 02:44:50PM +0000, Boyer, Andrew wrote:
> 
> 
>     On Jan 9, 2025, at 8:42 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
>     On Thu, Jan 09, 2025 at 01:01:20PM +0100, Christian Borntraeger wrote:
> 
> 
>         Am 07.01.25 um 19:25 schrieb Andrew Boyer:
> 
>             Commit af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA
>             feature
>             support") added notification data support to the core virtio driver
>             code. When this feature is enabled, the notification includes the
>             updated producer index for the queue. Thus it is now critical that
>             notifications arrive in order.
> 
>             The virtio_blk driver has historically not worried about
>             notification
>             ordering. Modify it so that the prepare and kick steps are both
>             done
>             under the vq lock.
> 
>             Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
>             Reviewed-by: Brett Creeley <brett.creeley@amd.com>
>             Fixes: af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA
>             feature support")
>             Cc: Viktor Prutyanov <viktor@daynix.com>
>             Cc: virtualization@lists.linux.dev
>             Cc: linux-block@vger.kernel.org
>             ---
>              drivers/block/virtio_blk.c | 19 ++++---------------
>              1 file changed, 4 insertions(+), 15 deletions(-)
> 
>             diff --git a/drivers/block/virtio_blk.c b/drivers/block/
>             virtio_blk.c
>             index 3efe378f1386..14d9e66bb844 100644
>             --- a/drivers/block/virtio_blk.c
>             +++ b/drivers/block/virtio_blk.c
>             @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct
>             blk_mq_hw_ctx *hctx)
>              {
>                struct virtio_blk *vblk = hctx->queue->queuedata;
>                struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
>             -   bool kick;
>                spin_lock_irq(&vq->lock);
>             -   kick = virtqueue_kick_prepare(vq->vq);
>             +   virtqueue_kick(vq->vq);
>                spin_unlock_irq(&vq->lock);
>             -
>             -   if (kick)
>             -           virtqueue_notify(vq->vq);
>              }
> 
> 
>         I would assume this will be a performance nightmare for normal IO.
> 
> 
> 
> 
> Hello Michael and Christian and Jason,
> Thank you for taking a look.
> 
> Is the performance concern that the vmexit might lead to the underlying virtual
> storage stack doing the work immediately? Any other job posting to the same
> queue would presumably be blocked on a vmexit when it goes to attempt its own
> notification. That would be almost the same as having the other job block on a
> lock during the operation, although I guess if you are skipping notifications
> somehow it would look different.
> 
> I don't have any sort of setup where I can try it but I would appreciate it if
> someone else could.
> 
> 
>     Hmm. Not good, notify can be very slow, holding a lock is a bad idea.
>     Basically, virtqueue_notify must work ouside of locks, this
>     means af8ececda185 is broken and we did not notice.
> 
>     Let's fix it please.
> 
> 
> With so many broken kernels already in the wild, I think disabling
> F_NOTIFICATION_DATA for virtio-blk would be a reasonable solution.

Some devices might fail feature negotiation then.
I am not sure they are broken, devices might simply be able to
handle out of order values.


> 
>     Try some kind of compare and swap scheme where we detect that index
>     was updated since? Will allow skipping a notification, too.
> 
> 
> Do you have an idea of how this might be done? Anything I've come up with
> involves a lock.
>
> Would it be doable to have a lock for the vq management stuff
> and a second one to post notifications?


and only for when F_NOTIFICATION_DATA is set. not terrible ok I think.

> 
>     AMD guys, can't device survive with reordered notifications?
>     Basically just drop a notification if you see index
>     going back?
> 
> 
> This is the driver lying to us about the state of the queue; it's not going to
> be possible for us to work around it in hardware. For starters, how would we
> detect queue wrap around?
> 
> Thank you,
> Andrew

The index is a running value for split, for wrap arounds, there is
a special bit for that. No?


> 
> 
>     --
>     MST
> 
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* (repost) Re: [PATCH] virtio_blk: always post notifications under the lock
       [not found]     ` <FE77DD4F-AB39-4781-9D24-06F171F47FED@amd.com>
  2025-01-22 15:13       ` Michael S. Tsirkin
@ 2025-01-22 15:15       ` Michael S. Tsirkin
  2025-01-22 17:33       ` Christian Borntraeger
  2 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2025-01-22 15:15 UTC (permalink / raw)
  To: Boyer, Andrew
  Cc: Christian Borntraeger, Jason Wang, Paolo Bonzini, Stefan Hajnoczi,
	Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization@lists.linux.dev, linux-block@vger.kernel.org,
	Nelson, Shannon, Creeley, Brett, Hubbe, Allen

On Wed, Jan 22, 2025 at 02:44:50PM +0000, Boyer, Andrew wrote:
> 
> 
>     On Jan 9, 2025, at 8:42 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
>     On Thu, Jan 09, 2025 at 01:01:20PM +0100, Christian Borntraeger wrote:
> 
> 
>         Am 07.01.25 um 19:25 schrieb Andrew Boyer:
> 
>             Commit af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA
>             feature
>             support") added notification data support to the core virtio driver
>             code. When this feature is enabled, the notification includes the
>             updated producer index for the queue. Thus it is now critical that
>             notifications arrive in order.
> 
>             The virtio_blk driver has historically not worried about
>             notification
>             ordering. Modify it so that the prepare and kick steps are both
>             done
>             under the vq lock.
> 
>             Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
>             Reviewed-by: Brett Creeley <brett.creeley@amd.com>
>             Fixes: af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA
>             feature support")
>             Cc: Viktor Prutyanov <viktor@daynix.com>
>             Cc: virtualization@lists.linux.dev
>             Cc: linux-block@vger.kernel.org
>             ---
>              drivers/block/virtio_blk.c | 19 ++++---------------
>              1 file changed, 4 insertions(+), 15 deletions(-)
> 
>             diff --git a/drivers/block/virtio_blk.c b/drivers/block/
>             virtio_blk.c
>             index 3efe378f1386..14d9e66bb844 100644
>             --- a/drivers/block/virtio_blk.c
>             +++ b/drivers/block/virtio_blk.c
>             @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct
>             blk_mq_hw_ctx *hctx)
>              {
>                struct virtio_blk *vblk = hctx->queue->queuedata;
>                struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
>             -   bool kick;
>                spin_lock_irq(&vq->lock);
>             -   kick = virtqueue_kick_prepare(vq->vq);
>             +   virtqueue_kick(vq->vq);
>                spin_unlock_irq(&vq->lock);
>             -
>             -   if (kick)
>             -           virtqueue_notify(vq->vq);
>              }
> 
> 
>         I would assume this will be a performance nightmare for normal IO.
> 
> 
> 
> 
> Hello Michael and Christian and Jason,
> Thank you for taking a look.
> 
> Is the performance concern that the vmexit might lead to the underlying virtual
> storage stack doing the work immediately? Any other job posting to the same
> queue would presumably be blocked on a vmexit when it goes to attempt its own
> notification. That would be almost the same as having the other job block on a
> lock during the operation, although I guess if you are skipping notifications
> somehow it would look different.
> 
> I don't have any sort of setup where I can try it but I would appreciate it if
> someone else could.
> 
> 
>     Hmm. Not good, notify can be very slow, holding a lock is a bad idea.
>     Basically, virtqueue_notify must work ouside of locks, this
>     means af8ececda185 is broken and we did not notice.
> 
>     Let's fix it please.
> 
> 
> With so many broken kernels already in the wild, I think disabling
> F_NOTIFICATION_DATA for virtio-blk would be a reasonable solution.

Some devices might fail feature negotiation then.
I am not sure they are broken, devices might simply be able to
handle out of order values.


> 
>     Try some kind of compare and swap scheme where we detect that index
>     was updated since? Will allow skipping a notification, too.
> 
> 
> Do you have an idea of how this might be done? Anything I've come up with
> involves a lock.
>
> Would it be doable to have a lock for the vq management stuff
> and a second one to post notifications?


and only for when F_NOTIFICATION_DATA is set. not terrible ok I think.

> 
>     AMD guys, can't device survive with reordered notifications?
>     Basically just drop a notification if you see index
>     going back?
> 
> 
> This is the driver lying to us about the state of the queue; it's not going to
> be possible for us to work around it in hardware. For starters, how would we
> detect queue wrap around?
> 
> Thank you,
> Andrew

The index is a running value for split, for wrap arounds, there is
a special bit for that. No?


> 
> 
>     --
>     MST
> 
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
       [not found]     ` <FE77DD4F-AB39-4781-9D24-06F171F47FED@amd.com>
  2025-01-22 15:13       ` Michael S. Tsirkin
  2025-01-22 15:15       ` (repost) " Michael S. Tsirkin
@ 2025-01-22 17:33       ` Christian Borntraeger
  2025-01-22 17:49         ` Boyer, Andrew
  2025-01-22 22:07         ` Michael S. Tsirkin
  2 siblings, 2 replies; 24+ messages in thread
From: Christian Borntraeger @ 2025-01-22 17:33 UTC (permalink / raw)
  To: Boyer, Andrew, Michael S. Tsirkin
  Cc: Jason Wang, Paolo Bonzini, Stefan Hajnoczi, Eugenio Perez,
	Xuan Zhuo, Jens Axboe, virtualization@lists.linux.dev,
	linux-block@vger.kernel.org, Nelson, Shannon, Creeley, Brett,
	Hubbe, Allen

Am 22.01.25 um 15:44 schrieb Boyer, Andrew:
[...]

>>>> --- a/drivers/block/virtio_blk.c
>>>> +++ b/drivers/block/virtio_blk.c
>>>> @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>>>>  {
>>>>    struct virtio_blk *vblk = hctx->queue->queuedata;
>>>>    struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
>>>> -   bool kick;
>>>>    spin_lock_irq(&vq->lock);
>>>> -   kick = virtqueue_kick_prepare(vq->vq);
>>>> +   virtqueue_kick(vq->vq);
>>>>    spin_unlock_irq(&vq->lock);
>>>> -
>>>> -   if (kick)
>>>> -           virtqueue_notify(vq->vq);
>>>>  }
>>>
>>> I would assume this will be a performance nightmare for normal IO.
>>
> 
> Hello Michael and Christian and Jason,
> Thank you for taking a look.
> 
> Is the performance concern that the vmexit might lead to the underlying virtual storage stack doing the work immediately? Any other job posting to the same queue would presumably be blocked on a vmexit when it goes to attempt its own notification. That would be almost the same as having the other job block on a lock during the operation, although I guess if you are skipping notifications somehow it would look different.

The performance concern is that you hold a lock and then exit. Exits are expensive and can schedule so you will increase the lock holding time significantly. This is begging for lock holder preemption.
Really, dont do it.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-01-22 15:13       ` Michael S. Tsirkin
@ 2025-01-22 17:45         ` Boyer, Andrew
  2025-01-23  6:53           ` Michael S. Tsirkin
  0 siblings, 1 reply; 24+ messages in thread
From: Boyer, Andrew @ 2025-01-22 17:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Boyer, Andrew, Christian Borntraeger, Jason Wang, Paolo Bonzini,
	Stefan Hajnoczi, Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization@lists.linux.dev, linux-block@vger.kernel.org,
	Nelson, Shannon, Creeley, Brett, Hubbe, Allen



> On Jan 22, 2025, at 10:13 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> On Wed, Jan 22, 2025 at 02:44:50PM +0000, Boyer, Andrew wrote:
>> 
>> 
>>    On Jan 9, 2025, at 8:42 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> 
>>    On Thu, Jan 09, 2025 at 01:01:20PM +0100, Christian Borntraeger wrote:
>> 
>> 
>>        Am 07.01.25 um 19:25 schrieb Andrew Boyer:
>> 
>>            Commit af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA
>>            feature
>>            support") added notification data support to the core virtio driver
>>            code. When this feature is enabled, the notification includes the
>>            updated producer index for the queue. Thus it is now critical that
>>            notifications arrive in order.
>> 
>>            The virtio_blk driver has historically not worried about
>>            notification
>>            ordering. Modify it so that the prepare and kick steps are both
>>            done
>>            under the vq lock.
>> 
>>            Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
>>            Reviewed-by: Brett Creeley <brett.creeley@amd.com>
>>            Fixes: af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA
>>            feature support")
>>            Cc: Viktor Prutyanov <viktor@daynix.com>
>>            Cc: virtualization@lists.linux.dev
>>            Cc: linux-block@vger.kernel.org
>>            ---
>>             drivers/block/virtio_blk.c | 19 ++++---------------
>>             1 file changed, 4 insertions(+), 15 deletions(-)
>> 
>>            diff --git a/drivers/block/virtio_blk.c b/drivers/block/
>>            virtio_blk.c
>>            index 3efe378f1386..14d9e66bb844 100644
>>            --- a/drivers/block/virtio_blk.c
>>            +++ b/drivers/block/virtio_blk.c
>>            @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct
>>            blk_mq_hw_ctx *hctx)
>>             {
>>               struct virtio_blk *vblk = hctx->queue->queuedata;
>>               struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
>>            -   bool kick;
>>               spin_lock_irq(&vq->lock);
>>            -   kick = virtqueue_kick_prepare(vq->vq);
>>            +   virtqueue_kick(vq->vq);
>>               spin_unlock_irq(&vq->lock);
>>            -
>>            -   if (kick)
>>            -           virtqueue_notify(vq->vq);
>>             }
>> 
>> 
>>        I would assume this will be a performance nightmare for normal IO.
>> 
>> 
>> 
>> 
>> Hello Michael and Christian and Jason,
>> Thank you for taking a look.
>> 
>> Is the performance concern that the vmexit might lead to the underlying virtual
>> storage stack doing the work immediately? Any other job posting to the same
>> queue would presumably be blocked on a vmexit when it goes to attempt its own
>> notification. That would be almost the same as having the other job block on a
>> lock during the operation, although I guess if you are skipping notifications
>> somehow it would look different.
>> 
>> I don't have any sort of setup where I can try it but I would appreciate it if
>> someone else could.
>> 
>> 
>>    Hmm. Not good, notify can be very slow, holding a lock is a bad idea.
>>    Basically, virtqueue_notify must work ouside of locks, this
>>    means af8ececda185 is broken and we did not notice.
>> 
>>    Let's fix it please.
>> 
>> 
>> With so many broken kernels already in the wild, I think disabling
>> F_NOTIFICATION_DATA for virtio-blk would be a reasonable solution.
> 
> Some devices might fail feature negotiation then.
> I am not sure they are broken, devices might simply be able to
> handle out of order values.
> 

A driver which does not support F_NOTIFICATION_DATA should just clear that bit. Surely devices which support it would also support not enabling it? Otherwise pre-6.4 kernels wouldn't work at all.

> 
>> 
>>    Try some kind of compare and swap scheme where we detect that index
>>    was updated since? Will allow skipping a notification, too.
>> 
>> 
>> Do you have an idea of how this might be done? Anything I've come up with
>> involves a lock.
>> 
>> Would it be doable to have a lock for the vq management stuff
>> and a second one to post notifications?
> 
> 
> and only for when F_NOTIFICATION_DATA is set. not terrible ok I think.
> 
>> 
>>    AMD guys, can't device survive with reordered notifications?
>>    Basically just drop a notification if you see index
>>    going back?
>> 
>> 
>> This is the driver lying to us about the state of the queue; it's not going to
>> be possible for us to work around it in hardware. For starters, how would we
>> detect queue wrap around?
>> 
>> Thank you,
>> Andrew
> 
> The index is a running value for split, for wrap arounds, there is
> a special bit for that. No?
> 

This is a hardware block used for many different interfaces and devices. When the notification write comes through, the doorbell block updates the queue state and schedules the queue for work. If a second notification comes in and overwrites that update before the queue is able to run (going backwards but not wrapping), we'll have no way of detecting it.

-Andrew


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-01-22 17:33       ` Christian Borntraeger
@ 2025-01-22 17:49         ` Boyer, Andrew
  2025-01-23  7:03           ` Michael S. Tsirkin
  2025-01-22 22:07         ` Michael S. Tsirkin
  1 sibling, 1 reply; 24+ messages in thread
From: Boyer, Andrew @ 2025-01-22 17:49 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Boyer, Andrew, Michael S. Tsirkin, Jason Wang, Paolo Bonzini,
	Stefan Hajnoczi, Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization@lists.linux.dev, linux-block@vger.kernel.org,
	Nelson, Shannon, Creeley, Brett, Hubbe, Allen



> On Jan 22, 2025, at 12:33 PM, Christian Borntraeger <borntraeger@linux.ibm.com> wrote:
> 
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> Am 22.01.25 um 15:44 schrieb Boyer, Andrew:
> [...]
> 
>>>>> --- a/drivers/block/virtio_blk.c
>>>>> +++ b/drivers/block/virtio_blk.c
>>>>> @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>>>>> {
>>>>>   struct virtio_blk *vblk = hctx->queue->queuedata;
>>>>>   struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
>>>>> -   bool kick;
>>>>>   spin_lock_irq(&vq->lock);
>>>>> -   kick = virtqueue_kick_prepare(vq->vq);
>>>>> +   virtqueue_kick(vq->vq);
>>>>>   spin_unlock_irq(&vq->lock);
>>>>> -
>>>>> -   if (kick)
>>>>> -           virtqueue_notify(vq->vq);
>>>>> }
>>>> 
>>>> I would assume this will be a performance nightmare for normal IO.
>>> 
>> 
>> Hello Michael and Christian and Jason,
>> Thank you for taking a look.
>> 
>> Is the performance concern that the vmexit might lead to the underlying virtual storage stack doing the work immediately? Any other job posting to the same queue would presumably be blocked on a vmexit when it goes to attempt its own notification. That would be almost the same as having the other job block on a lock during the operation, although I guess if you are skipping notifications somehow it would look different.
> 
> The performance concern is that you hold a lock and then exit. Exits are expensive and can schedule so you will increase the lock holding time significantly. This is begging for lock holder preemption.
> Really, dont do it.

If holding a lock at all is unworkable, do you have a suggestion for how we might fix the correctness issue?
Because this is definitely a correctness issue.

I don't see anything in the spec about requiring support for out-of-order notifications.

-Andrew


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-01-22 17:33       ` Christian Borntraeger
  2025-01-22 17:49         ` Boyer, Andrew
@ 2025-01-22 22:07         ` Michael S. Tsirkin
  2025-01-22 22:14           ` Boyer, Andrew
  2025-01-23  8:39           ` Christian Borntraeger
  1 sibling, 2 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2025-01-22 22:07 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Boyer, Andrew, Jason Wang, Paolo Bonzini, Stefan Hajnoczi,
	Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization@lists.linux.dev, linux-block@vger.kernel.org,
	Nelson, Shannon, Creeley, Brett, Hubbe, Allen

On Wed, Jan 22, 2025 at 06:33:04PM +0100, Christian Borntraeger wrote:
> Am 22.01.25 um 15:44 schrieb Boyer, Andrew:
> [...]
> 
> > > > > --- a/drivers/block/virtio_blk.c
> > > > > +++ b/drivers/block/virtio_blk.c
> > > > > @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
> > > > >  {
> > > > >    struct virtio_blk *vblk = hctx->queue->queuedata;
> > > > >    struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
> > > > > -   bool kick;
> > > > >    spin_lock_irq(&vq->lock);
> > > > > -   kick = virtqueue_kick_prepare(vq->vq);
> > > > > +   virtqueue_kick(vq->vq);
> > > > >    spin_unlock_irq(&vq->lock);
> > > > > -
> > > > > -   if (kick)
> > > > > -           virtqueue_notify(vq->vq);
> > > > >  }
> > > > 
> > > > I would assume this will be a performance nightmare for normal IO.
> > > 
> > 
> > Hello Michael and Christian and Jason,
> > Thank you for taking a look.
> > 
> > Is the performance concern that the vmexit might lead to the underlying virtual storage stack doing the work immediately? Any other job posting to the same queue would presumably be blocked on a vmexit when it goes to attempt its own notification. That would be almost the same as having the other job block on a lock during the operation, although I guess if you are skipping notifications somehow it would look different.
> 
> The performance concern is that you hold a lock and then exit. Exits are expensive and can schedule so you will increase the lock holding time significantly. This is begging for lock holder preemption.
> Really, dont do it.


The issue is with hardware that wants a copy of an index sent to
it in a notification. Now, you have a race:


thread 1:

	index = 1
		-> 			-> send 1 to hardware
           

thread 2:

	index = 2
		-> send 2 to hardware




the spec unfortunately does not say whether that is legal.

As far as I could tell, the device can easily use the
wrap counter inside the notification to detect this
and simply discard the "1" notification.


If not, I'd like to understand why.

Thanks!


-- 
MST


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-01-22 22:07         ` Michael S. Tsirkin
@ 2025-01-22 22:14           ` Boyer, Andrew
  2025-01-22 22:25             ` Michael S. Tsirkin
  2025-01-23  8:39           ` Christian Borntraeger
  1 sibling, 1 reply; 24+ messages in thread
From: Boyer, Andrew @ 2025-01-22 22:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christian Borntraeger, Boyer, Andrew, Jason Wang, Paolo Bonzini,
	Stefan Hajnoczi, Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization@lists.linux.dev, linux-block@vger.kernel.org,
	Nelson, Shannon, Creeley, Brett, Hubbe, Allen


> On Jan 22, 2025, at 5:07 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> On Wed, Jan 22, 2025 at 06:33:04PM +0100, Christian Borntraeger wrote:
>> Am 22.01.25 um 15:44 schrieb Boyer, Andrew:
>> [...]
>> 
>>>>>> --- a/drivers/block/virtio_blk.c
>>>>>> +++ b/drivers/block/virtio_blk.c
>>>>>> @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>>>>>> {
>>>>>>   struct virtio_blk *vblk = hctx->queue->queuedata;
>>>>>>   struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
>>>>>> -   bool kick;
>>>>>>   spin_lock_irq(&vq->lock);
>>>>>> -   kick = virtqueue_kick_prepare(vq->vq);
>>>>>> +   virtqueue_kick(vq->vq);
>>>>>>   spin_unlock_irq(&vq->lock);
>>>>>> -
>>>>>> -   if (kick)
>>>>>> -           virtqueue_notify(vq->vq);
>>>>>> }
>>>>> 
>>>>> I would assume this will be a performance nightmare for normal IO.
>>>> 
>>> 
>>> Hello Michael and Christian and Jason,
>>> Thank you for taking a look.
>>> 
>>> Is the performance concern that the vmexit might lead to the underlying virtual storage stack doing the work immediately? Any other job posting to the same queue would presumably be blocked on a vmexit when it goes to attempt its own notification. That would be almost the same as having the other job block on a lock during the operation, although I guess if you are skipping notifications somehow it would look different.
>> 
>> The performance concern is that you hold a lock and then exit. Exits are expensive and can schedule so you will increase the lock holding time significantly. This is begging for lock holder preemption.
>> Really, dont do it.
> 
> 
> The issue is with hardware that wants a copy of an index sent to
> it in a notification. Now, you have a race:
> 
> thread 1:
> 
>        index = 1
>                ->                      -> send 1 to hardware
> 
> 
> thread 2:
> 
>        index = 2
>                -> send 2 to hardware
> 
> the spec unfortunately does not say whether that is legal.
> 
> As far as I could tell, the device can easily use the
> wrap counter inside the notification to detect this
> and simply discard the "1" notification.
> 
> 
> If not, I'd like to understand why.

"Easily"?

This is a hardware doorbell block used for many different interfaces and devices. When the notification write comes through, the doorbell block updates the queue state and schedules the queue for work. If a second notification comes in and overwrites that update before the queue is able to run (going backwards but not wrapping), we'll have no way of detecting it.

-Andrew


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-01-22 22:14           ` Boyer, Andrew
@ 2025-01-22 22:25             ` Michael S. Tsirkin
  2025-01-22 22:34               ` Boyer, Andrew
  0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2025-01-22 22:25 UTC (permalink / raw)
  To: Boyer, Andrew
  Cc: Christian Borntraeger, Jason Wang, Paolo Bonzini, Stefan Hajnoczi,
	Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization@lists.linux.dev, linux-block@vger.kernel.org,
	Nelson, Shannon, Creeley, Brett, Hubbe, Allen

On Wed, Jan 22, 2025 at 10:14:52PM +0000, Boyer, Andrew wrote:
> 
> > On Jan 22, 2025, at 5:07 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> > 
> > 
> > On Wed, Jan 22, 2025 at 06:33:04PM +0100, Christian Borntraeger wrote:
> >> Am 22.01.25 um 15:44 schrieb Boyer, Andrew:
> >> [...]
> >> 
> >>>>>> --- a/drivers/block/virtio_blk.c
> >>>>>> +++ b/drivers/block/virtio_blk.c
> >>>>>> @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
> >>>>>> {
> >>>>>>   struct virtio_blk *vblk = hctx->queue->queuedata;
> >>>>>>   struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
> >>>>>> -   bool kick;
> >>>>>>   spin_lock_irq(&vq->lock);
> >>>>>> -   kick = virtqueue_kick_prepare(vq->vq);
> >>>>>> +   virtqueue_kick(vq->vq);
> >>>>>>   spin_unlock_irq(&vq->lock);
> >>>>>> -
> >>>>>> -   if (kick)
> >>>>>> -           virtqueue_notify(vq->vq);
> >>>>>> }
> >>>>> 
> >>>>> I would assume this will be a performance nightmare for normal IO.
> >>>> 
> >>> 
> >>> Hello Michael and Christian and Jason,
> >>> Thank you for taking a look.
> >>> 
> >>> Is the performance concern that the vmexit might lead to the underlying virtual storage stack doing the work immediately? Any other job posting to the same queue would presumably be blocked on a vmexit when it goes to attempt its own notification. That would be almost the same as having the other job block on a lock during the operation, although I guess if you are skipping notifications somehow it would look different.
> >> 
> >> The performance concern is that you hold a lock and then exit. Exits are expensive and can schedule so you will increase the lock holding time significantly. This is begging for lock holder preemption.
> >> Really, dont do it.
> > 
> > 
> > The issue is with hardware that wants a copy of an index sent to
> > it in a notification. Now, you have a race:
> > 
> > thread 1:
> > 
> >        index = 1
> >                ->                      -> send 1 to hardware
> > 
> > 
> > thread 2:
> > 
> >        index = 2
> >                -> send 2 to hardware
> > 
> > the spec unfortunately does not say whether that is legal.
> > 
> > As far as I could tell, the device can easily use the
> > wrap counter inside the notification to detect this
> > and simply discard the "1" notification.
> > 
> > 
> > If not, I'd like to understand why.
> 
> "Easily"?
> 
> This is a hardware doorbell block used for many different interfaces and devices. When the notification write comes through, the doorbell block updates the queue state and schedules the queue for work. If a second notification comes in and overwrites that update before the queue is able to run (going backwards but not wrapping), we'll have no way of detecting it.
> 
> -Andrew
> 


Does not this work?

notification includes two values:


1. offset
2. wrap_counter


if ((offset2 < offset1 && wrap_counter2 == wrap_counter1) ||
     offset1 > offset1 && wrap_counter2 != wrap_counter1)) {
	printf("going backwards, discard offset2");
}
    




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-01-22 22:25             ` Michael S. Tsirkin
@ 2025-01-22 22:34               ` Boyer, Andrew
  2025-01-23  1:47                 ` Jason Wang
  2025-01-23  7:23                 ` Michael S. Tsirkin
  0 siblings, 2 replies; 24+ messages in thread
From: Boyer, Andrew @ 2025-01-22 22:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Boyer, Andrew, Christian Borntraeger, Jason Wang, Paolo Bonzini,
	Stefan Hajnoczi, Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization@lists.linux.dev, linux-block@vger.kernel.org,
	Nelson, Shannon, Creeley, Brett, Hubbe, Allen


> On Jan 22, 2025, at 5:25 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> On Wed, Jan 22, 2025 at 10:14:52PM +0000, Boyer, Andrew wrote:
>> 
>>> On Jan 22, 2025, at 5:07 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> 
>>> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>>> 
>>> 
>>> On Wed, Jan 22, 2025 at 06:33:04PM +0100, Christian Borntraeger wrote:
>>>> Am 22.01.25 um 15:44 schrieb Boyer, Andrew:
>>>> [...]
>>>> 
>>>>>>>> --- a/drivers/block/virtio_blk.c
>>>>>>>> +++ b/drivers/block/virtio_blk.c
>>>>>>>> @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>>>>>>>> {
>>>>>>>>  struct virtio_blk *vblk = hctx->queue->queuedata;
>>>>>>>>  struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
>>>>>>>> -   bool kick;
>>>>>>>>  spin_lock_irq(&vq->lock);
>>>>>>>> -   kick = virtqueue_kick_prepare(vq->vq);
>>>>>>>> +   virtqueue_kick(vq->vq);
>>>>>>>>  spin_unlock_irq(&vq->lock);
>>>>>>>> -
>>>>>>>> -   if (kick)
>>>>>>>> -           virtqueue_notify(vq->vq);
>>>>>>>> }
>>>>>>> 
>>>>>>> I would assume this will be a performance nightmare for normal IO.
>>>>>> 
>>>>> 
>>>>> Hello Michael and Christian and Jason,
>>>>> Thank you for taking a look.
>>>>> 
>>>>> Is the performance concern that the vmexit might lead to the underlying virtual storage stack doing the work immediately? Any other job posting to the same queue would presumably be blocked on a vmexit when it goes to attempt its own notification. That would be almost the same as having the other job block on a lock during the operation, although I guess if you are skipping notifications somehow it would look different.
>>>> 
>>>> The performance concern is that you hold a lock and then exit. Exits are expensive and can schedule so you will increase the lock holding time significantly. This is begging for lock holder preemption.
>>>> Really, dont do it.
>>> 
>>> 
>>> The issue is with hardware that wants a copy of an index sent to
>>> it in a notification. Now, you have a race:
>>> 
>>> thread 1:
>>> 
>>>       index = 1
>>>               ->                      -> send 1 to hardware
>>> 
>>> 
>>> thread 2:
>>> 
>>>       index = 2
>>>               -> send 2 to hardware
>>> 
>>> the spec unfortunately does not say whether that is legal.
>>> 
>>> As far as I could tell, the device can easily use the
>>> wrap counter inside the notification to detect this
>>> and simply discard the "1" notification.
>>> 
>>> 
>>> If not, I'd like to understand why.
>> 
>> "Easily"?
>> 
>> This is a hardware doorbell block used for many different interfaces and devices. When the notification write comes through, the doorbell block updates the queue state and schedules the queue for work. If a second notification comes in and overwrites that update before the queue is able to run (going backwards but not wrapping), we'll have no way of detecting it.
>> 
>> -Andrew
>> 
> 
> Does not this work?
> 
> notification includes two values:
> 
> 1. offset
> 2. wrap_counter
> 
> if ((offset2 < offset1 && wrap_counter2 == wrap_counter1) ||
>     offset1 > offset1 && wrap_counter2 != wrap_counter1)) {
>        printf("going backwards, discard offset2");
> }
> 

No, Michael, this does not work in our programmable hardware device, because it is not software.

-Andrew


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-01-22 22:34               ` Boyer, Andrew
@ 2025-01-23  1:47                 ` Jason Wang
  2025-01-23  6:51                   ` Michael S. Tsirkin
  2025-01-23  7:23                 ` Michael S. Tsirkin
  1 sibling, 1 reply; 24+ messages in thread
From: Jason Wang @ 2025-01-23  1:47 UTC (permalink / raw)
  To: Boyer, Andrew
  Cc: Michael S. Tsirkin, Christian Borntraeger, Paolo Bonzini,
	Stefan Hajnoczi, Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization@lists.linux.dev, linux-block@vger.kernel.org,
	Nelson, Shannon, Creeley, Brett, Hubbe, Allen

On Thu, Jan 23, 2025 at 6:34 AM Boyer, Andrew <Andrew.Boyer@amd.com> wrote:
>
>
> > On Jan 22, 2025, at 5:25 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> >
> >
> > On Wed, Jan 22, 2025 at 10:14:52PM +0000, Boyer, Andrew wrote:
> >>
> >>> On Jan 22, 2025, at 5:07 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>
> >>> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> >>>
> >>>
> >>> On Wed, Jan 22, 2025 at 06:33:04PM +0100, Christian Borntraeger wrote:
> >>>> Am 22.01.25 um 15:44 schrieb Boyer, Andrew:
> >>>> [...]
> >>>>
> >>>>>>>> --- a/drivers/block/virtio_blk.c
> >>>>>>>> +++ b/drivers/block/virtio_blk.c
> >>>>>>>> @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
> >>>>>>>> {
> >>>>>>>>  struct virtio_blk *vblk = hctx->queue->queuedata;
> >>>>>>>>  struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
> >>>>>>>> -   bool kick;
> >>>>>>>>  spin_lock_irq(&vq->lock);
> >>>>>>>> -   kick = virtqueue_kick_prepare(vq->vq);
> >>>>>>>> +   virtqueue_kick(vq->vq);
> >>>>>>>>  spin_unlock_irq(&vq->lock);
> >>>>>>>> -
> >>>>>>>> -   if (kick)
> >>>>>>>> -           virtqueue_notify(vq->vq);
> >>>>>>>> }
> >>>>>>>
> >>>>>>> I would assume this will be a performance nightmare for normal IO.
> >>>>>>
> >>>>>
> >>>>> Hello Michael and Christian and Jason,
> >>>>> Thank you for taking a look.
> >>>>>
> >>>>> Is the performance concern that the vmexit might lead to the underlying virtual storage stack doing the work immediately? Any other job posting to the same queue would presumably be blocked on a vmexit when it goes to attempt its own notification. That would be almost the same as having the other job block on a lock during the operation, although I guess if you are skipping notifications somehow it would look different.
> >>>>
> >>>> The performance concern is that you hold a lock and then exit. Exits are expensive and can schedule so you will increase the lock holding time significantly. This is begging for lock holder preemption.
> >>>> Really, dont do it.
> >>>
> >>>
> >>> The issue is with hardware that wants a copy of an index sent to
> >>> it in a notification. Now, you have a race:
> >>>
> >>> thread 1:
> >>>
> >>>       index = 1
> >>>               ->                      -> send 1 to hardware
> >>>
> >>>
> >>> thread 2:
> >>>
> >>>       index = 2
> >>>               -> send 2 to hardware
> >>>
> >>> the spec unfortunately does not say whether that is legal.
> >>>
> >>> As far as I could tell, the device can easily use the
> >>> wrap counter inside the notification to detect this
> >>> and simply discard the "1" notification.
> >>>
> >>>
> >>> If not, I'd like to understand why.
> >>
> >> "Easily"?
> >>
> >> This is a hardware doorbell block used for many different interfaces and devices. When the notification write comes through, the doorbell block updates the queue state and schedules the queue for work. If a second notification comes in and overwrites that update before the queue is able to run (going backwards but not wrapping), we'll have no way of detecting it.
> >>
> >> -Andrew
> >>
> >
> > Does not this work?
> >
> > notification includes two values:
> >
> > 1. offset
> > 2. wrap_counter
> >
> > if ((offset2 < offset1 && wrap_counter2 == wrap_counter1) ||
> >     offset1 > offset1 && wrap_counter2 != wrap_counter1)) {
> >        printf("going backwards, discard offset2");
> > }
> >
>
> No, Michael, this does not work in our programmable hardware device, because it is not software.
>
> -Andrew
>

Should we send a patch to disable this feature and cc stable first
then consider how to fix this?

Thanks


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-01-23  1:47                 ` Jason Wang
@ 2025-01-23  6:51                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2025-01-23  6:51 UTC (permalink / raw)
  To: Jason Wang
  Cc: Boyer, Andrew, Christian Borntraeger, Paolo Bonzini,
	Stefan Hajnoczi, Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization@lists.linux.dev, linux-block@vger.kernel.org,
	Nelson, Shannon, Creeley, Brett, Hubbe, Allen

On Thu, Jan 23, 2025 at 09:47:24AM +0800, Jason Wang wrote:
> On Thu, Jan 23, 2025 at 6:34 AM Boyer, Andrew <Andrew.Boyer@amd.com> wrote:
> >
> >
> > > On Jan 22, 2025, at 5:25 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> > >
> > >
> > > On Wed, Jan 22, 2025 at 10:14:52PM +0000, Boyer, Andrew wrote:
> > >>
> > >>> On Jan 22, 2025, at 5:07 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >>>
> > >>> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> > >>>
> > >>>
> > >>> On Wed, Jan 22, 2025 at 06:33:04PM +0100, Christian Borntraeger wrote:
> > >>>> Am 22.01.25 um 15:44 schrieb Boyer, Andrew:
> > >>>> [...]
> > >>>>
> > >>>>>>>> --- a/drivers/block/virtio_blk.c
> > >>>>>>>> +++ b/drivers/block/virtio_blk.c
> > >>>>>>>> @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
> > >>>>>>>> {
> > >>>>>>>>  struct virtio_blk *vblk = hctx->queue->queuedata;
> > >>>>>>>>  struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
> > >>>>>>>> -   bool kick;
> > >>>>>>>>  spin_lock_irq(&vq->lock);
> > >>>>>>>> -   kick = virtqueue_kick_prepare(vq->vq);
> > >>>>>>>> +   virtqueue_kick(vq->vq);
> > >>>>>>>>  spin_unlock_irq(&vq->lock);
> > >>>>>>>> -
> > >>>>>>>> -   if (kick)
> > >>>>>>>> -           virtqueue_notify(vq->vq);
> > >>>>>>>> }
> > >>>>>>>
> > >>>>>>> I would assume this will be a performance nightmare for normal IO.
> > >>>>>>
> > >>>>>
> > >>>>> Hello Michael and Christian and Jason,
> > >>>>> Thank you for taking a look.
> > >>>>>
> > >>>>> Is the performance concern that the vmexit might lead to the underlying virtual storage stack doing the work immediately? Any other job posting to the same queue would presumably be blocked on a vmexit when it goes to attempt its own notification. That would be almost the same as having the other job block on a lock during the operation, although I guess if you are skipping notifications somehow it would look different.
> > >>>>
> > >>>> The performance concern is that you hold a lock and then exit. Exits are expensive and can schedule so you will increase the lock holding time significantly. This is begging for lock holder preemption.
> > >>>> Really, dont do it.
> > >>>
> > >>>
> > >>> The issue is with hardware that wants a copy of an index sent to
> > >>> it in a notification. Now, you have a race:
> > >>>
> > >>> thread 1:
> > >>>
> > >>>       index = 1
> > >>>               ->                      -> send 1 to hardware
> > >>>
> > >>>
> > >>> thread 2:
> > >>>
> > >>>       index = 2
> > >>>               -> send 2 to hardware
> > >>>
> > >>> the spec unfortunately does not say whether that is legal.
> > >>>
> > >>> As far as I could tell, the device can easily use the
> > >>> wrap counter inside the notification to detect this
> > >>> and simply discard the "1" notification.
> > >>>
> > >>>
> > >>> If not, I'd like to understand why.
> > >>
> > >> "Easily"?
> > >>
> > >> This is a hardware doorbell block used for many different interfaces and devices. When the notification write comes through, the doorbell block updates the queue state and schedules the queue for work. If a second notification comes in and overwrites that update before the queue is able to run (going backwards but not wrapping), we'll have no way of detecting it.
> > >>
> > >> -Andrew
> > >>
> > >
> > > Does not this work?
> > >
> > > notification includes two values:
> > >
> > > 1. offset
> > > 2. wrap_counter
> > >
> > > if ((offset2 < offset1 && wrap_counter2 == wrap_counter1) ||
> > >     offset1 > offset1 && wrap_counter2 != wrap_counter1)) {
> > >        printf("going backwards, discard offset2");
> > > }
> > >
> >
> > No, Michael, this does not work in our programmable hardware device, because it is not software.
> >
> > -Andrew
> >
> 
> Should we send a patch to disable this feature and cc stable first
> then consider how to fix this?
> 
> Thanks

Given this has been out there for a long time, I do not see
how this will contribute anything useful.

-- 
MST


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-01-22 17:45         ` Boyer, Andrew
@ 2025-01-23  6:53           ` Michael S. Tsirkin
  0 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2025-01-23  6:53 UTC (permalink / raw)
  To: Boyer, Andrew
  Cc: Christian Borntraeger, Jason Wang, Paolo Bonzini, Stefan Hajnoczi,
	Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization@lists.linux.dev, linux-block@vger.kernel.org,
	Nelson, Shannon, Creeley, Brett, Hubbe, Allen

On Wed, Jan 22, 2025 at 05:45:28PM +0000, Boyer, Andrew wrote:
> 
> 
> > On Jan 22, 2025, at 10:13 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> > 
> > 
> > On Wed, Jan 22, 2025 at 02:44:50PM +0000, Boyer, Andrew wrote:
> >> 
> >> 
> >>    On Jan 9, 2025, at 8:42 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> 
> >>    On Thu, Jan 09, 2025 at 01:01:20PM +0100, Christian Borntraeger wrote:
> >> 
> >> 
> >>        Am 07.01.25 um 19:25 schrieb Andrew Boyer:
> >> 
> >>            Commit af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA
> >>            feature
> >>            support") added notification data support to the core virtio driver
> >>            code. When this feature is enabled, the notification includes the
> >>            updated producer index for the queue. Thus it is now critical that
> >>            notifications arrive in order.
> >> 
> >>            The virtio_blk driver has historically not worried about
> >>            notification
> >>            ordering. Modify it so that the prepare and kick steps are both
> >>            done
> >>            under the vq lock.
> >> 
> >>            Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>
> >>            Reviewed-by: Brett Creeley <brett.creeley@amd.com>
> >>            Fixes: af8ececda185 ("virtio: add VIRTIO_F_NOTIFICATION_DATA
> >>            feature support")
> >>            Cc: Viktor Prutyanov <viktor@daynix.com>
> >>            Cc: virtualization@lists.linux.dev
> >>            Cc: linux-block@vger.kernel.org
> >>            ---
> >>             drivers/block/virtio_blk.c | 19 ++++---------------
> >>             1 file changed, 4 insertions(+), 15 deletions(-)
> >> 
> >>            diff --git a/drivers/block/virtio_blk.c b/drivers/block/
> >>            virtio_blk.c
> >>            index 3efe378f1386..14d9e66bb844 100644
> >>            --- a/drivers/block/virtio_blk.c
> >>            +++ b/drivers/block/virtio_blk.c
> >>            @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct
> >>            blk_mq_hw_ctx *hctx)
> >>             {
> >>               struct virtio_blk *vblk = hctx->queue->queuedata;
> >>               struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
> >>            -   bool kick;
> >>               spin_lock_irq(&vq->lock);
> >>            -   kick = virtqueue_kick_prepare(vq->vq);
> >>            +   virtqueue_kick(vq->vq);
> >>               spin_unlock_irq(&vq->lock);
> >>            -
> >>            -   if (kick)
> >>            -           virtqueue_notify(vq->vq);
> >>             }
> >> 
> >> 
> >>        I would assume this will be a performance nightmare for normal IO.
> >> 
> >> 
> >> 
> >> 
> >> Hello Michael and Christian and Jason,
> >> Thank you for taking a look.
> >> 
> >> Is the performance concern that the vmexit might lead to the underlying virtual
> >> storage stack doing the work immediately? Any other job posting to the same
> >> queue would presumably be blocked on a vmexit when it goes to attempt its own
> >> notification. That would be almost the same as having the other job block on a
> >> lock during the operation, although I guess if you are skipping notifications
> >> somehow it would look different.
> >> 
> >> I don't have any sort of setup where I can try it but I would appreciate it if
> >> someone else could.
> >> 
> >> 
> >>    Hmm. Not good, notify can be very slow, holding a lock is a bad idea.
> >>    Basically, virtqueue_notify must work ouside of locks, this
> >>    means af8ececda185 is broken and we did not notice.
> >> 
> >>    Let's fix it please.
> >> 
> >> 
> >> With so many broken kernels already in the wild, I think disabling
> >> F_NOTIFICATION_DATA for virtio-blk would be a reasonable solution.
> > 
> > Some devices might fail feature negotiation then.
> > I am not sure they are broken, devices might simply be able to
> > handle out of order values.
> > 
> 
> A driver which does not support F_NOTIFICATION_DATA should just clear that bit. Surely devices which support it would also support not enabling it? Otherwise pre-6.4 kernels wouldn't work at all.
> 
> > 
> >> 
> >>    Try some kind of compare and swap scheme where we detect that index
> >>    was updated since? Will allow skipping a notification, too.
> >> 
> >> 
> >> Do you have an idea of how this might be done? Anything I've come up with
> >> involves a lock.
> >> 
> >> Would it be doable to have a lock for the vq management stuff
> >> and a second one to post notifications?
> > 
> > 
> > and only for when F_NOTIFICATION_DATA is set. not terrible ok I think.
> > 
> >> 
> >>    AMD guys, can't device survive with reordered notifications?
> >>    Basically just drop a notification if you see index
> >>    going back?
> >> 
> >> 
> >> This is the driver lying to us about the state of the queue; it's not going to
> >> be possible for us to work around it in hardware. For starters, how would we
> >> detect queue wrap around?
> >> 
> >> Thank you,
> >> Andrew
> > 
> > The index is a running value for split, for wrap arounds, there is
> > a special bit for that. No?
> > 
> 
> This is a hardware block used for many different interfaces and devices. When the notification write comes through, the doorbell block updates the queue state and schedules the queue for work. If a second notification comes in and overwrites that update before the queue is able to run (going backwards but not wrapping), we'll have no way of detecting it.
> 
> -Andrew

Do you not have programmable hardware that can compare current queue state
and the doorbell *before* overwriting it?




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-01-22 17:49         ` Boyer, Andrew
@ 2025-01-23  7:03           ` Michael S. Tsirkin
  0 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2025-01-23  7:03 UTC (permalink / raw)
  To: Boyer, Andrew
  Cc: Christian Borntraeger, Jason Wang, Paolo Bonzini, Stefan Hajnoczi,
	Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization@lists.linux.dev, linux-block@vger.kernel.org,
	Nelson, Shannon, Creeley, Brett, Hubbe, Allen

On Wed, Jan 22, 2025 at 05:49:18PM +0000, Boyer, Andrew wrote:
> 
> 
> > On Jan 22, 2025, at 12:33 PM, Christian Borntraeger <borntraeger@linux.ibm.com> wrote:
> > 
> > Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> > 
> > 
> > Am 22.01.25 um 15:44 schrieb Boyer, Andrew:
> > [...]
> > 
> >>>>> --- a/drivers/block/virtio_blk.c
> >>>>> +++ b/drivers/block/virtio_blk.c
> >>>>> @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
> >>>>> {
> >>>>>   struct virtio_blk *vblk = hctx->queue->queuedata;
> >>>>>   struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
> >>>>> -   bool kick;
> >>>>>   spin_lock_irq(&vq->lock);
> >>>>> -   kick = virtqueue_kick_prepare(vq->vq);
> >>>>> +   virtqueue_kick(vq->vq);
> >>>>>   spin_unlock_irq(&vq->lock);
> >>>>> -
> >>>>> -   if (kick)
> >>>>> -           virtqueue_notify(vq->vq);
> >>>>> }
> >>>> 
> >>>> I would assume this will be a performance nightmare for normal IO.
> >>> 
> >> 
> >> Hello Michael and Christian and Jason,
> >> Thank you for taking a look.
> >> 
> >> Is the performance concern that the vmexit might lead to the underlying virtual storage stack doing the work immediately? Any other job posting to the same queue would presumably be blocked on a vmexit when it goes to attempt its own notification. That would be almost the same as having the other job block on a lock during the operation, although I guess if you are skipping notifications somehow it would look different.
> > 
> > The performance concern is that you hold a lock and then exit. Exits are expensive and can schedule so you will increase the lock holding time significantly. This is begging for lock holder preemption.
> > Really, dont do it.
> 
> If holding a lock at all is unworkable, do you have a suggestion for how we might fix the correctness issue?
> Because this is definitely a correctness issue.
> 
> I don't see anything in the spec about requiring support for out-of-order notifications.
> 
> -Andrew

Yes we didn't think it through in the spec :(
So driver made one assumption, for performance,
and the device made another one, for simplicity.

The data needed for the device to solve it, is there in the
notification. We can make the safe choice with the lock, and add a
feature flag for devices who can handle out of order,
or make the current choice and add a feature flag for
devices that need a lock.

Let's continue the discussion in another thread whether
a hardware workaround is possible.


-- 
MST


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-01-22 22:34               ` Boyer, Andrew
  2025-01-23  1:47                 ` Jason Wang
@ 2025-01-23  7:23                 ` Michael S. Tsirkin
  1 sibling, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2025-01-23  7:23 UTC (permalink / raw)
  To: Boyer, Andrew
  Cc: Christian Borntraeger, Jason Wang, Paolo Bonzini, Stefan Hajnoczi,
	Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization@lists.linux.dev, linux-block@vger.kernel.org,
	Nelson, Shannon, Creeley, Brett, Hubbe, Allen

On Wed, Jan 22, 2025 at 10:34:25PM +0000, Boyer, Andrew wrote:
> 
> > On Jan 22, 2025, at 5:25 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> > 
> > 
> > On Wed, Jan 22, 2025 at 10:14:52PM +0000, Boyer, Andrew wrote:
> >> 
> >>> On Jan 22, 2025, at 5:07 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>> 
> >>> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> >>> 
> >>> 
> >>> On Wed, Jan 22, 2025 at 06:33:04PM +0100, Christian Borntraeger wrote:
> >>>> Am 22.01.25 um 15:44 schrieb Boyer, Andrew:
> >>>> [...]
> >>>> 
> >>>>>>>> --- a/drivers/block/virtio_blk.c
> >>>>>>>> +++ b/drivers/block/virtio_blk.c
> >>>>>>>> @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
> >>>>>>>> {
> >>>>>>>>  struct virtio_blk *vblk = hctx->queue->queuedata;
> >>>>>>>>  struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
> >>>>>>>> -   bool kick;
> >>>>>>>>  spin_lock_irq(&vq->lock);
> >>>>>>>> -   kick = virtqueue_kick_prepare(vq->vq);
> >>>>>>>> +   virtqueue_kick(vq->vq);
> >>>>>>>>  spin_unlock_irq(&vq->lock);
> >>>>>>>> -
> >>>>>>>> -   if (kick)
> >>>>>>>> -           virtqueue_notify(vq->vq);
> >>>>>>>> }
> >>>>>>> 
> >>>>>>> I would assume this will be a performance nightmare for normal IO.
> >>>>>> 
> >>>>> 
> >>>>> Hello Michael and Christian and Jason,
> >>>>> Thank you for taking a look.
> >>>>> 
> >>>>> Is the performance concern that the vmexit might lead to the underlying virtual storage stack doing the work immediately? Any other job posting to the same queue would presumably be blocked on a vmexit when it goes to attempt its own notification. That would be almost the same as having the other job block on a lock during the operation, although I guess if you are skipping notifications somehow it would look different.
> >>>> 
> >>>> The performance concern is that you hold a lock and then exit. Exits are expensive and can schedule so you will increase the lock holding time significantly. This is begging for lock holder preemption.
> >>>> Really, dont do it.
> >>> 
> >>> 
> >>> The issue is with hardware that wants a copy of an index sent to
> >>> it in a notification. Now, you have a race:
> >>> 
> >>> thread 1:
> >>> 
> >>>       index = 1
> >>>               ->                      -> send 1 to hardware
> >>> 
> >>> 
> >>> thread 2:
> >>> 
> >>>       index = 2
> >>>               -> send 2 to hardware
> >>> 
> >>> the spec unfortunately does not say whether that is legal.
> >>> 
> >>> As far as I could tell, the device can easily use the
> >>> wrap counter inside the notification to detect this
> >>> and simply discard the "1" notification.
> >>> 
> >>> 
> >>> If not, I'd like to understand why.
> >> 
> >> "Easily"?
> >> 
> >> This is a hardware doorbell block used for many different interfaces and devices. When the notification write comes through, the doorbell block updates the queue state and schedules the queue for work. If a second notification comes in and overwrites that update before the queue is able to run (going backwards but not wrapping), we'll have no way of detecting it.
> >> 
> >> -Andrew
> >> 
> > 
> > Does not this work?
> > 
> > notification includes two values:
> > 
> > 1. offset
> > 2. wrap_counter
> > 
> > if ((offset2 < offset1 && wrap_counter2 == wrap_counter1) ||
> >     offset1 > offset1 && wrap_counter2 != wrap_counter1)) {
> >        printf("going backwards, discard offset2");
> > }
> > 
> 
> No, Michael, this does not work in our programmable hardware device, because it is not software.
> 
> -Andrew
> 

Of course. The hardware equivalent.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-01-22 22:07         ` Michael S. Tsirkin
  2025-01-22 22:14           ` Boyer, Andrew
@ 2025-01-23  8:39           ` Christian Borntraeger
  2025-02-24 21:03             ` Michael S. Tsirkin
  1 sibling, 1 reply; 24+ messages in thread
From: Christian Borntraeger @ 2025-01-23  8:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Boyer, Andrew, Jason Wang, Paolo Bonzini, Stefan Hajnoczi,
	Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization@lists.linux.dev, linux-block@vger.kernel.org,
	Nelson, Shannon, Creeley, Brett, Hubbe, Allen


Am 22.01.25 um 23:07 schrieb Michael S. Tsirkin:
> On Wed, Jan 22, 2025 at 06:33:04PM +0100, Christian Borntraeger wrote:
>> Am 22.01.25 um 15:44 schrieb Boyer, Andrew:
>> [...]
>>
>>>>>> --- a/drivers/block/virtio_blk.c
>>>>>> +++ b/drivers/block/virtio_blk.c
>>>>>> @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>>>>>>   {
>>>>>>     struct virtio_blk *vblk = hctx->queue->queuedata;
>>>>>>     struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
>>>>>> -   bool kick;
>>>>>>     spin_lock_irq(&vq->lock);
>>>>>> -   kick = virtqueue_kick_prepare(vq->vq);
>>>>>> +   virtqueue_kick(vq->vq);
>>>>>>     spin_unlock_irq(&vq->lock);
>>>>>> -
>>>>>> -   if (kick)
>>>>>> -           virtqueue_notify(vq->vq);
>>>>>>   }
>>>>>
>>>>> I would assume this will be a performance nightmare for normal IO.
>>>>
>>>
>>> Hello Michael and Christian and Jason,
>>> Thank you for taking a look.
>>>
>>> Is the performance concern that the vmexit might lead to the underlying virtual storage stack doing the work immediately? Any other job posting to the same queue would presumably be blocked on a vmexit when it goes to attempt its own notification. That would be almost the same as having the other job block on a lock during the operation, although I guess if you are skipping notifications somehow it would look different.
>>
>> The performance concern is that you hold a lock and then exit. Exits are expensive and can schedule so you will increase the lock holding time significantly. This is begging for lock holder preemption.
>> Really, dont do it.
> 
> 
> The issue is with hardware that wants a copy of an index sent to
> it in a notification. Now, you have a race:
> 
> 
> thread 1:
> 
> 	index = 1
> 		-> 			-> send 1 to hardware
>             
> 
> thread 2:
> 
> 	index = 2
> 		-> send 2 to hardware
> 
> 
> 
> 
> the spec unfortunately does not say whether that is legal.
> 
> As far as I could tell, the device can easily use the
> wrap counter inside the notification to detect this
> and simply discard the "1" notification.
> 
> 
> If not, I'd like to understand why.

Yes I agree with you here.
I just want to emphasize that holding a lock during notify is not a workable solution.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-01-23  8:39           ` Christian Borntraeger
@ 2025-02-24 21:03             ` Michael S. Tsirkin
  2025-02-24 21:31               ` Boyer, Andrew
  0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2025-02-24 21:03 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Boyer, Andrew, Jason Wang, Paolo Bonzini, Stefan Hajnoczi,
	Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization@lists.linux.dev, linux-block@vger.kernel.org,
	Nelson, Shannon, Creeley, Brett, Hubbe, Allen

On Thu, Jan 23, 2025 at 09:39:44AM +0100, Christian Borntraeger wrote:
> 
> Am 22.01.25 um 23:07 schrieb Michael S. Tsirkin:
> > On Wed, Jan 22, 2025 at 06:33:04PM +0100, Christian Borntraeger wrote:
> > > Am 22.01.25 um 15:44 schrieb Boyer, Andrew:
> > > [...]
> > > 
> > > > > > > --- a/drivers/block/virtio_blk.c
> > > > > > > +++ b/drivers/block/virtio_blk.c
> > > > > > > @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
> > > > > > >   {
> > > > > > >     struct virtio_blk *vblk = hctx->queue->queuedata;
> > > > > > >     struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
> > > > > > > -   bool kick;
> > > > > > >     spin_lock_irq(&vq->lock);
> > > > > > > -   kick = virtqueue_kick_prepare(vq->vq);
> > > > > > > +   virtqueue_kick(vq->vq);
> > > > > > >     spin_unlock_irq(&vq->lock);
> > > > > > > -
> > > > > > > -   if (kick)
> > > > > > > -           virtqueue_notify(vq->vq);
> > > > > > >   }
> > > > > > 
> > > > > > I would assume this will be a performance nightmare for normal IO.
> > > > > 
> > > > 
> > > > Hello Michael and Christian and Jason,
> > > > Thank you for taking a look.
> > > > 
> > > > Is the performance concern that the vmexit might lead to the underlying virtual storage stack doing the work immediately? Any other job posting to the same queue would presumably be blocked on a vmexit when it goes to attempt its own notification. That would be almost the same as having the other job block on a lock during the operation, although I guess if you are skipping notifications somehow it would look different.
> > > 
> > > The performance concern is that you hold a lock and then exit. Exits are expensive and can schedule so you will increase the lock holding time significantly. This is begging for lock holder preemption.
> > > Really, dont do it.
> > 
> > 
> > The issue is with hardware that wants a copy of an index sent to
> > it in a notification. Now, you have a race:
> > 
> > 
> > thread 1:
> > 
> > 	index = 1
> > 		-> 			-> send 1 to hardware
> > 
> > thread 2:
> > 
> > 	index = 2
> > 		-> send 2 to hardware
> > 
> > 
> > 
> > 
> > the spec unfortunately does not say whether that is legal.
> > 
> > As far as I could tell, the device can easily use the
> > wrap counter inside the notification to detect this
> > and simply discard the "1" notification.
> > 
> > 
> > If not, I'd like to understand why.
> 
> Yes I agree with you here.
> I just want to emphasize that holding a lock during notify is not a workable solution.


Andrew, what do you say?

-- 
MST


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-02-24 21:03             ` Michael S. Tsirkin
@ 2025-02-24 21:31               ` Boyer, Andrew
  2025-02-24 21:37                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 24+ messages in thread
From: Boyer, Andrew @ 2025-02-24 21:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christian Borntraeger, Boyer, Andrew, Jason Wang, Paolo Bonzini,
	Stefan Hajnoczi, Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization@lists.linux.dev, linux-block@vger.kernel.org,
	Nelson, Shannon, Creeley, Brett, Hubbe, Allen


> On Feb 24, 2025, at 4:03 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> On Thu, Jan 23, 2025 at 09:39:44AM +0100, Christian Borntraeger wrote:
>> 
>> Am 22.01.25 um 23:07 schrieb Michael S. Tsirkin:
>>> On Wed, Jan 22, 2025 at 06:33:04PM +0100, Christian Borntraeger wrote:
>>>> Am 22.01.25 um 15:44 schrieb Boyer, Andrew:
>>>> [...]
>>>> 
>>>>>>>> --- a/drivers/block/virtio_blk.c
>>>>>>>> +++ b/drivers/block/virtio_blk.c
>>>>>>>> @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>>>>>>>>  {
>>>>>>>>    struct virtio_blk *vblk = hctx->queue->queuedata;
>>>>>>>>    struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
>>>>>>>> -   bool kick;
>>>>>>>>    spin_lock_irq(&vq->lock);
>>>>>>>> -   kick = virtqueue_kick_prepare(vq->vq);
>>>>>>>> +   virtqueue_kick(vq->vq);
>>>>>>>>    spin_unlock_irq(&vq->lock);
>>>>>>>> -
>>>>>>>> -   if (kick)
>>>>>>>> -           virtqueue_notify(vq->vq);
>>>>>>>>  }
>>>>>>> 
>>>>>>> I would assume this will be a performance nightmare for normal IO.
>>>>>> 
>>>>> 
>>>>> Hello Michael and Christian and Jason,
>>>>> Thank you for taking a look.
>>>>> 
>>>>> Is the performance concern that the vmexit might lead to the underlying virtual storage stack doing the work immediately? Any other job posting to the same queue would presumably be blocked on a vmexit when it goes to attempt its own notification. That would be almost the same as having the other job block on a lock during the operation, although I guess if you are skipping notifications somehow it would look different.
>>>> 
>>>> The performance concern is that you hold a lock and then exit. Exits are expensive and can schedule so you will increase the lock holding time significantly. This is begging for lock holder preemption.
>>>> Really, dont do it.
>>> 
>>> 
>>> The issue is with hardware that wants a copy of an index sent to
>>> it in a notification. Now, you have a race:
>>> 
>>> 
>>> thread 1:
>>> 
>>>    index = 1
>>>            ->                      -> send 1 to hardware
>>> 
>>> thread 2:
>>> 
>>>    index = 2
>>>            -> send 2 to hardware
>>> 
>>> 
>>> 
>>> 
>>> the spec unfortunately does not say whether that is legal.
>>> 
>>> As far as I could tell, the device can easily use the
>>> wrap counter inside the notification to detect this
>>> and simply discard the "1" notification.
>>> 
>>> 
>>> If not, I'd like to understand why.
>> 
>> Yes I agree with you here.
>> I just want to emphasize that holding a lock during notify is not a workable solution.
> 
> 
> Andrew, what do you say?
> 
> --
> MST
> 

"can *easily* use" -> No.

Our doorbell hardware is configurable, but not infinitely programmable. None of the suggested workarounds are feasible for hardware in the face of incorrect driver behavior. We have marked the feature as broken in the linux kernel driver and moved on.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] virtio_blk: always post notifications under the lock
  2025-02-24 21:31               ` Boyer, Andrew
@ 2025-02-24 21:37                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2025-02-24 21:37 UTC (permalink / raw)
  To: Boyer, Andrew
  Cc: Christian Borntraeger, Jason Wang, Paolo Bonzini, Stefan Hajnoczi,
	Eugenio Perez, Xuan Zhuo, Jens Axboe,
	virtualization@lists.linux.dev, linux-block@vger.kernel.org,
	Nelson, Shannon, Creeley, Brett, Hubbe, Allen

On Mon, Feb 24, 2025 at 09:31:24PM +0000, Boyer, Andrew wrote:
> 
> > On Feb 24, 2025, at 4:03 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> > 
> > 
> > On Thu, Jan 23, 2025 at 09:39:44AM +0100, Christian Borntraeger wrote:
> >> 
> >> Am 22.01.25 um 23:07 schrieb Michael S. Tsirkin:
> >>> On Wed, Jan 22, 2025 at 06:33:04PM +0100, Christian Borntraeger wrote:
> >>>> Am 22.01.25 um 15:44 schrieb Boyer, Andrew:
> >>>> [...]
> >>>> 
> >>>>>>>> --- a/drivers/block/virtio_blk.c
> >>>>>>>> +++ b/drivers/block/virtio_blk.c
> >>>>>>>> @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
> >>>>>>>>  {
> >>>>>>>>    struct virtio_blk *vblk = hctx->queue->queuedata;
> >>>>>>>>    struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
> >>>>>>>> -   bool kick;
> >>>>>>>>    spin_lock_irq(&vq->lock);
> >>>>>>>> -   kick = virtqueue_kick_prepare(vq->vq);
> >>>>>>>> +   virtqueue_kick(vq->vq);
> >>>>>>>>    spin_unlock_irq(&vq->lock);
> >>>>>>>> -
> >>>>>>>> -   if (kick)
> >>>>>>>> -           virtqueue_notify(vq->vq);
> >>>>>>>>  }
> >>>>>>> 
> >>>>>>> I would assume this will be a performance nightmare for normal IO.
> >>>>>> 
> >>>>> 
> >>>>> Hello Michael and Christian and Jason,
> >>>>> Thank you for taking a look.
> >>>>> 
> >>>>> Is the performance concern that the vmexit might lead to the underlying virtual storage stack doing the work immediately? Any other job posting to the same queue would presumably be blocked on a vmexit when it goes to attempt its own notification. That would be almost the same as having the other job block on a lock during the operation, although I guess if you are skipping notifications somehow it would look different.
> >>>> 
> >>>> The performance concern is that you hold a lock and then exit. Exits are expensive and can schedule so you will increase the lock holding time significantly. This is begging for lock holder preemption.
> >>>> Really, dont do it.
> >>> 
> >>> 
> >>> The issue is with hardware that wants a copy of an index sent to
> >>> it in a notification. Now, you have a race:
> >>> 
> >>> 
> >>> thread 1:
> >>> 
> >>>    index = 1
> >>>            ->                      -> send 1 to hardware
> >>> 
> >>> thread 2:
> >>> 
> >>>    index = 2
> >>>            -> send 2 to hardware
> >>> 
> >>> 
> >>> 
> >>> 
> >>> the spec unfortunately does not say whether that is legal.
> >>> 
> >>> As far as I could tell, the device can easily use the
> >>> wrap counter inside the notification to detect this
> >>> and simply discard the "1" notification.
> >>> 
> >>> 
> >>> If not, I'd like to understand why.
> >> 
> >> Yes I agree with you here.
> >> I just want to emphasize that holding a lock during notify is not a workable solution.
> > 
> > 
> > Andrew, what do you say?
> > 
> > --
> > MST
> > 
> 
> "can *easily* use" -> No.
> 
> Our doorbell hardware is configurable, but not infinitely programmable. None of the suggested workarounds are feasible for hardware in the face of incorrect driver behavior. We have marked the feature as broken in the linux kernel driver and moved on.
> 
> Thanks,
> Andrew
> 

Thanks Andrew, I understand.
My question would be, is the feature worth it for you?

I see two way to move forward
1.- amend spec to say notifications can be reordered
2.- amend spec to say they can not be reordered and add a new feature bit
    saying they can be
    add a new lock for driver to hold in case the feature is off


1 is less work but if 2 gives you a big performance advantage, I can do it.

-- 
MST


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2025-02-24 21:37 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-07 18:25 [PATCH] virtio_blk: always post notifications under the lock Andrew Boyer
2025-01-08  2:02 ` Jason Wang
2025-01-09 10:19 ` Michael S. Tsirkin
2025-01-09 12:01 ` Christian Borntraeger
2025-01-09 13:42   ` Michael S. Tsirkin
2025-01-22 14:48     ` Boyer, Andrew
     [not found]     ` <FE77DD4F-AB39-4781-9D24-06F171F47FED@amd.com>
2025-01-22 15:13       ` Michael S. Tsirkin
2025-01-22 17:45         ` Boyer, Andrew
2025-01-23  6:53           ` Michael S. Tsirkin
2025-01-22 15:15       ` (repost) " Michael S. Tsirkin
2025-01-22 17:33       ` Christian Borntraeger
2025-01-22 17:49         ` Boyer, Andrew
2025-01-23  7:03           ` Michael S. Tsirkin
2025-01-22 22:07         ` Michael S. Tsirkin
2025-01-22 22:14           ` Boyer, Andrew
2025-01-22 22:25             ` Michael S. Tsirkin
2025-01-22 22:34               ` Boyer, Andrew
2025-01-23  1:47                 ` Jason Wang
2025-01-23  6:51                   ` Michael S. Tsirkin
2025-01-23  7:23                 ` Michael S. Tsirkin
2025-01-23  8:39           ` Christian Borntraeger
2025-02-24 21:03             ` Michael S. Tsirkin
2025-02-24 21:31               ` Boyer, Andrew
2025-02-24 21:37                 ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).