* [PATCH] virtio_pci_modern: Use GFP_ATOMIC with spin_lock_irqsave held in virtqueue_exec_admin_cmd()
@ 2026-04-13 7:22 Jinhui Guo
From: Jinhui Guo @ 2026-04-13 7:22 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Jiri Pirko
Cc: virtualization, linux-kernel, Jinhui Guo, stable
virtqueue_exec_admin_cmd() holds admin_vq->lock with spin_lock_irqsave(),
which disables interrupts. Using GFP_KERNEL inside this critical section
is unsafe because kmalloc() may sleep, leading to potential deadlocks or
scheduling violations.

Switch to GFP_ATOMIC to ensure the allocation is non-blocking.

Fixes: 4c3b54af907e ("virtio_pci_modern: use completion instead of busy loop to wait on admin cmd result")
Cc: stable@vger.kernel.org
Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com>
---
drivers/virtio/virtio_pci_modern.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index 6d8ae2a6a8ca..db8e4f88b749 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -101,7 +101,7 @@ static int virtqueue_exec_admin_cmd(struct virtio_pci_admin_vq *admin_vq,
 		return -EIO;
 
 	spin_lock_irqsave(&admin_vq->lock, flags);
-	ret = virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, GFP_KERNEL);
+	ret = virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, GFP_ATOMIC);
 	if (ret < 0) {
 		if (ret == -ENOSPC) {
 			spin_unlock_irqrestore(&admin_vq->lock, flags);
--
2.20.1
* Re: [PATCH] virtio_pci_modern: Use GFP_ATOMIC with spin_lock_irqsave held in virtqueue_exec_admin_cmd()
From: Michael S. Tsirkin @ 2026-04-13  7:45 UTC (permalink / raw)
To: Jinhui Guo
Cc: Jason Wang, Xuan Zhuo, Eugenio Pérez, Jiri Pirko,
	virtualization, linux-kernel, stable

On Mon, Apr 13, 2026 at 03:22:49PM +0800, Jinhui Guo wrote:
> virtqueue_exec_admin_cmd() holds admin_vq->lock with spin_lock_irqsave(),
> which disables interrupts. Using GFP_KERNEL inside this critical section
> is unsafe because kmalloc() may sleep, leading to potential deadlocks or
> scheduling violations.
>
> Switch to GFP_ATOMIC to ensure the allocation is non-blocking.
>
> Fixes: 4c3b54af907e ("virtio_pci_modern: use completion instead of busy loop to wait on admin cmd result")
> Cc: stable@vger.kernel.org
> Signed-off-by: Jinhui Guo <guojinhui.liam@bytedance.com>
> ---
>  drivers/virtio/virtio_pci_modern.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> index 6d8ae2a6a8ca..db8e4f88b749 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -101,7 +101,7 @@ static int virtqueue_exec_admin_cmd(struct virtio_pci_admin_vq *admin_vq,
>  		return -EIO;
>
>  	spin_lock_irqsave(&admin_vq->lock, flags);
> -	ret = virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, GFP_KERNEL);
> +	ret = virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, GFP_ATOMIC);
>  	if (ret < 0) {
>  		if (ret == -ENOSPC) {
>  			spin_unlock_irqrestore(&admin_vq->lock, flags);

GFP_ATOMIC allocations can and will fail. If using them, one must
retry, not just propagate failures.

Or just switch admin_vq->lock to a mutex?

> --
> 2.20.1
* Re: [PATCH] virtio_pci_modern: Use GFP_ATOMIC with spin_lock_irqsave held in virtqueue_exec_admin_cmd()
From: David Laight @ 2026-04-13  9:17 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Jinhui Guo, Jason Wang, Xuan Zhuo, Eugenio Pérez, Jiri Pirko,
	virtualization, linux-kernel, stable

On Mon, 13 Apr 2026 03:45:20 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Apr 13, 2026 at 03:22:49PM +0800, Jinhui Guo wrote:
> > virtqueue_exec_admin_cmd() holds admin_vq->lock with spin_lock_irqsave(),
> > which disables interrupts. Using GFP_KERNEL inside this critical section
> > is unsafe because kmalloc() may sleep, leading to potential deadlocks or
> > scheduling violations.
[...]
> >  	spin_lock_irqsave(&admin_vq->lock, flags);
> > -	ret = virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, GFP_KERNEL);
> > +	ret = virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, GFP_ATOMIC);
[...]
>
> GFP_ATOMIC allocations can and will fail. If using them, one must
> retry, not just propagate failures.
>
> Or just switch admin_vq->lock to a mutex?

Or do the allocation before acquiring the lock (and free it in the
error path if it goes unused).

	David
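David's allocate-before-lock pattern is easiest to see in a userspace analogue. The sketch below uses a pthread mutex in place of the spinlock; `exec_cmd()`, `submit_locked()`, and the 64-byte buffer size are illustrative stand-ins, not kernel APIs:

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for a queue-submit step that may need a scratch buffer but
 * must not allocate while the lock is held. */
static int submit_locked(void *scratch, int need_buf)
{
	return (need_buf && !scratch) ? -1 : 0;
}

/* Allocate (the only sleepable step) with no lock held; free on the
 * error path if the buffer went unused. */
int exec_cmd(int need_buf)
{
	void *buf = need_buf ? malloc(64) : NULL;
	int ret;

	if (need_buf && !buf)
		return -1;

	pthread_mutex_lock(&lock);
	ret = submit_locked(buf, need_buf);	/* no allocation under the lock */
	pthread_mutex_unlock(&lock);

	free(buf);	/* demo only: a real queue would free on completion */
	return ret;
}
```

The point of the pattern is that the sleepable allocation is finished before the lock is taken, so the critical section itself never blocks.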
* Re: [PATCH] virtio_pci_modern: Use GFP_ATOMIC with spin_lock_irqsave held in virtqueue_exec_admin_cmd()
From: Jinhui Guo @ 2026-04-13 12:22 UTC (permalink / raw)
To: David Laight
Cc: Michael S. Tsirkin, Eugenio Pérez, Jinhui Guo, Jason Wang,
	Jiri Pirko, Xuan Zhuo, linux-kernel, stable, virtualization

On Mon, Apr 13, 2026 at 10:17:59 +0100, David Laight wrote:
> Or do the allocate before acquiring the lock (and free it not used
> in the error path).

Hi David,

Thanks for the suggestion.

Pre-allocating the memory outside the lock is indeed a good practice,
but unfortunately it doesn't work in this specific virtqueue context.

The kmalloc() in question is not happening at the
virtqueue_exec_admin_cmd() level. Instead, it is deeply embedded inside
virtqueue_add_sgs() (specifically, in functions like
alloc_indirect_split() or virtqueue_add_indirect_packed()) to allocate
indirect descriptors when multiple SG elements are provided.

As a caller, we have no mechanism to pre-allocate this indirect
descriptor memory and pass it down to virtqueue_add_sgs(). Furthermore,
virtqueue_add_sgs() needs to atomically check the queue's num_free
status, allocate the indirect table if necessary, and update the queue
pointers. All these operations must be protected by admin_vq->lock to
prevent concurrent admin command submissions from corrupting the
virtqueue state.

Therefore, allocating before acquiring the lock isn't feasible here, and
replacing GFP_KERNEL with GFP_ATOMIC (with a proper sleepable retry upon
failure) seems to be the more viable fix.

Does this make sense?

Thanks,
Jinhui
* Re: [PATCH] virtio_pci_modern: Use GFP_ATOMIC with spin_lock_irqsave held in virtqueue_exec_admin_cmd()
From: David Laight @ 2026-04-13 13:33 UTC (permalink / raw)
To: Jinhui Guo
Cc: Michael S. Tsirkin, Eugenio Pérez, Jason Wang, Jiri Pirko,
	Xuan Zhuo, linux-kernel, stable, virtualization

On Mon, 13 Apr 2026 20:22:44 +0800
"Jinhui Guo" <guojinhui.liam@bytedance.com> wrote:

> On Mon, Apr 13, 2026 at 10:17:59 +0100, David Laight wrote:
> > Or do the allocate before acquiring the lock (and free it not used
> > in the error path).
>
> Hi David,
>
> Thanks for the suggestion.
>
> Pre-allocating the memory outside the lock is indeed a good practice,
> but unfortunately it doesn't work in this specific virtqueue context.
>
> The kmalloc() in question is not happening at the
> virtqueue_exec_admin_cmd() level. Instead, it is deeply embedded inside
> virtqueue_add_sgs() (specifically, in functions like
> alloc_indirect_split() or virtqueue_add_indirect_packed()) to allocate
> indirect descriptors when multiple SG elements are provided.
>
> As a caller, we have no mechanism to pre-allocate this indirect
> descriptor memory and pass it down to virtqueue_add_sgs(). Furthermore,
> virtqueue_add_sgs() needs to atomically check the queue's num_free
> status, allocate the indirect table if necessary, and update the queue
> pointers. All these operations must be protected by admin_vq->lock to
> prevent concurrent admin command submissions from corrupting the
> virtqueue state.

It just sounds non-trivial...

> Therefore, allocating before acquiring the lock isn't feasible here, and
> replacing GFP_KERNEL with GFP_ATOMIC (with a proper sleepable retry upon
> failure) seems to be the more viable fix.

The sleep-retry isn't really ideal - and may not make progress.

An 'interesting' solution would be to return the size of the kmalloc()
that failed, then kmalloc() and kfree() a buffer of that size and hope
it is still available for the retry. From a quick read of the code, the
size is always a constant multiplied by the number of fragments.

Although I only found kmalloc() in the 'indirect' paths.
I didn't spot what happens if the ring itself is full.

	David

> Does this make sense?
>
> Thanks,
> Jinhui
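The scheme David outlines — fail under the lock, learn the size that was needed, allocate and free a buffer of that size outside the lock, then retry — might look like the following userspace sketch. All names are hypothetical; in the real code the size would be the per-descriptor constant times the number of SG fragments:

```c
#include <stdlib.h>

/* Fake inner call standing in for virtqueue_add_sgs(): fails while
 * failures_left > 0, reporting how many bytes it wanted. */
static int fake_add_sgs(size_t *needed, int *failures_left)
{
	if (*failures_left > 0) {
		(*failures_left)--;
		*needed = 16 * 32;	/* descriptor size * fragment count */
		return -1;		/* stand-in for an atomic-allocation failure */
	}
	return 0;
}

/* Retry loop: on failure, allocate and free a buffer of the reported
 * size outside the (elided) lock, hoping it is still available on the
 * retry. Returns the number of attempts made, or -1 on OOM. */
int submit_with_probe(int failures)
{
	size_t needed = 0;
	int attempts = 0;

	for (;;) {
		attempts++;
		/* lock would be taken here */
		if (fake_add_sgs(&needed, &failures) == 0)
			return attempts;
		/* lock would be dropped here */
		void *probe = malloc(needed);	/* sleepable in the kernel case */
		if (!probe)
			return -1;
		free(probe);
	}
}
```

As David notes, whether the freed buffer is still available on the retry is only a hope — another CPU can win the race for it, so the loop is not guaranteed to make progress either.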
* Re: [PATCH] virtio_pci_modern: Use GFP_ATOMIC with spin_lock_irqsave held in virtqueue_exec_admin_cmd()
From: Eugenio Perez Martin @ 2026-04-13 14:14 UTC (permalink / raw)
To: Jinhui Guo
Cc: David Laight, Michael S. Tsirkin, Jason Wang, Jiri Pirko,
	Xuan Zhuo, linux-kernel, stable, virtualization

On Mon, Apr 13, 2026 at 2:23 PM Jinhui Guo <guojinhui.liam@bytedance.com> wrote:
>
> On Mon, Apr 13, 2026 at 10:17:59 +0100, David Laight wrote:
> > Or do the allocate before acquiring the lock (and free it not used
> > in the error path).
>
> Hi David,
>
> Thanks for the suggestion.
>
> Pre-allocating the memory outside the lock is indeed a good practice,
> but unfortunately it doesn't work in this specific virtqueue context.
>
> The kmalloc() in question is not happening at the
> virtqueue_exec_admin_cmd() level. Instead, it is deeply embedded inside
> virtqueue_add_sgs() (specifically, in functions like
> alloc_indirect_split() or virtqueue_add_indirect_packed()) to allocate
> indirect descriptors when multiple SG elements are provided.
>
> As a caller, we have no mechanism to pre-allocate this indirect
> descriptor memory and pass it down to virtqueue_add_sgs(). Furthermore,
> virtqueue_add_sgs() needs to atomically check the queue's num_free
> status, allocate the indirect table if necessary, and update the queue
> pointers. All these operations must be protected by admin_vq->lock to
> prevent concurrent admin command submissions from corrupting the
> virtqueue state.
>

Sounds like a big chunk of that is achieved with virtqueue_map_* and
virtqueue_add_{in,out}buf_premapped functions, isn't it? Or am I
missing something?

> Therefore, allocating before acquiring the lock isn't feasible here, and
> replacing GFP_KERNEL with GFP_ATOMIC (with a proper sleepable retry upon
> failure) seems to be the more viable fix.
>
> Does this make sense?
>
> Thanks,
> Jinhui
* Re: [PATCH] virtio_pci_modern: Use GFP_ATOMIC with spin_lock_irqsave held in virtqueue_exec_admin_cmd()
From: Jinhui Guo @ 2026-04-13 10:00 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Eugenio Pérez, Jinhui Guo, Jason Wang, Jiri Pirko, Xuan Zhuo,
	linux-kernel, stable, virtualization

On Mon, Apr 13, 2026 at 03:45:20 -0400, "Michael S. Tsirkin" wrote:
> GFP_ATOMIC allocations can and will fail. If using them, one must
> retry, not just propagate failures.
>
> Or just switch admin_vq->lock to a mutex?

Hi Michael,

Thank you for the review.

Regarding the suggestion to switch admin_vq->lock to a mutex:

The virtqueue callback vp_modern_avq_done() holds admin_vq->lock and
runs in an interrupt handler context, making it impractical to replace
the spinlock with a mutex directly.

I considered deferring the completion to a workqueue so we could safely
use a mutex, but since this is a bug fix destined for
stable@vger.kernel.org, doing so would introduce significant code churn
(e.g., handling INIT_WORK, cancel_work_sync during cleanup, etc.) and
increase the risk for backports. Therefore, using GFP_ATOMIC with the
existing spinlock seems to be the most minimal and safest approach for
a fix.

However, just replacing GFP_KERNEL with GFP_ATOMIC isn't entirely safe
because of how virtqueue_add_sgs() handles allocation failures. If
kmalloc() fails under memory pressure with GFP_ATOMIC, the function
falls back to using direct descriptors. If there are not enough free
direct descriptors, it ultimately returns -ENOSPC.

In the current code, -ENOSPC is handled with a busy loop:

	if (ret == -ENOSPC) {
		spin_unlock_irqrestore(&admin_vq->lock, flags);
		cpu_relax();
		goto again;
	}

If the -ENOSPC is actually caused by a GFP_ATOMIC allocation failure
under memory pressure, this cpu_relax() loop will never yield the CPU
to memory reclaim mechanisms (like kswapd), potentially leading to a
soft lockup.

To properly handle both actual queue-full conditions and GFP_ATOMIC
failures, I propose replacing cpu_relax() with a sleep (e.g.,
usleep_range(10, 100)). This allows memory reclaim to run while we
wait. I plan to send out a v2 patch with this modification:

--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -101,11 +101,11 @@ static int virtqueue_exec_admin_cmd(struct virtio_pci_admin_vq *admin_vq,
 		return -EIO;
 
 	spin_lock_irqsave(&admin_vq->lock, flags);
-	ret = virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, GFP_KERNEL);
+	ret = virtqueue_add_sgs(vq, sgs, out_num, in_num, cmd, GFP_ATOMIC);
 	if (ret < 0) {
 		if (ret == -ENOSPC) {
 			spin_unlock_irqrestore(&admin_vq->lock, flags);
-			cpu_relax();
+			usleep_range(10, 100);
 			goto again;
 		}
 		goto unlock_err;

Does this approach align with your expectations for the fix?

Thanks,
Jinhui
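The unlock-sleep-retry shape of the proposed v2 can be modelled in userspace as follows. This is only a sketch: `nanosleep()` stands in for `usleep_range()`, the injected failures stand in for -ENOSPC from a full ring or a failed atomic allocation, and the lock placement is marked by comments:

```c
#include <errno.h>
#include <time.h>

/* Retry loop: on -ENOSPC, the (elided) lock is dropped and we sleep
 * briefly instead of spinning, so other work — in the kernel case,
 * memory reclaim — gets a chance to run. `failures` injects that many
 * -ENOSPC results before the submit succeeds. */
int submit_retry(int failures)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 }; /* ~50us */
	int ret;

	for (;;) {
		/* spin_lock_irqsave() equivalent would go here */
		ret = (failures > 0) ? (failures--, -ENOSPC) : 0;
		/* spin_unlock_irqrestore() equivalent would go here */
		if (ret != -ENOSPC)
			return ret;
		nanosleep(&ts, NULL);	/* yield, rather than cpu_relax() spinning */
	}
}
```

The sleep happens only after the lock is released, which is what makes it legal in the process-context caller while keeping the critical section atomic.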