Linux virtualization list

* [PATCH] vsock/virtio: fix memory leak in virtio_transport_recv_listen
From: Divya Mankani @ 2026-06-05 19:19 UTC
  To: kuba, pabeni
  Cc: horms, virtualization, kvm, netdev, linux-kernel,
	syzbot+1b2c9c4a0f8708082678, Divya Mankani

Syzbot reported a memory leak inside virtio_transport_recv_listen
caused by a race condition when the parent listener socket shuts down
while an incoming packet is being enqueued.

Fix this by locking the parent socket and verifying its shutdown
state under the lock before executing vsock_enqueue_accept().

Fixes: a478546a782a ("vsock/virtio: add support for listen sockets")
Reported-by: syzbot+1b2c9c4a0f8708082678@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1b2c9c4a0f8708082678
Signed-off-by: Divya Mankani <divyakm@unc.edu>
---
 net/vmw_vsock/virtio_transport_common.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 3b294164b..8006a13bb 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1571,15 +1571,20 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
 	vsock_addr_init(&vchild->remote_addr, le64_to_cpu(hdr->src_cid),
 			le32_to_cpu(hdr->src_port));
 
-	ret = vsock_assign_transport(vchild, vsk);
-	/* Transport assigned (looking at remote_addr) must be the same
-	 * where we received the request.
+	/* Lock the parent listener socket to synchronize with a potential
+	 * simultaneous shutdown thread running __vsock_release().
 	 */
-	if (ret || vchild->transport != &t->transport) {
+	lock_sock(sk);
+
+	/* Check if the listener socket was shut down while we were
+	 * creating and configuring the child socket.
+	 */
+	if (sk->sk_shutdown == SHUTDOWN_MASK) {
+		release_sock(sk);
 		release_sock(child);
 		virtio_transport_reset_no_sock(t, skb, sock_net(sk));
 		sock_put(child);
-		return ret;
+		return -ESHUTDOWN;
 	}
 
 	sk_acceptq_added(sk);
@@ -1590,6 +1595,8 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
 	vsock_enqueue_accept(sk, child);
 	virtio_transport_send_response(vchild, skb);
 
+	/* Safely release both locked objects */
+	release_sock(sk);
 	release_sock(child);
 
 	sk->sk_data_ready(sk);
-- 
2.50.1 (Apple Git-155)
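
The pattern the commit message describes, checking the listener's shutdown
state under its lock before queueing a child, can be sketched in user space
as follows. This is a minimal illustration under stated assumptions, not the
kernel code: struct listener, listener_init(), listener_shutdown() and
listener_enqueue() are hypothetical names, and a pthread mutex stands in for
lock_sock()/release_sock().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a listener socket (not a kernel type). */
struct listener {
	pthread_mutex_t lock;
	bool shut_down;   /* set by the closing thread while holding 'lock' */
	size_t backlog;   /* number of children waiting to be accepted */
};

void listener_init(struct listener *l)
{
	pthread_mutex_init(&l->lock, NULL);
	l->shut_down = false;
	l->backlog = 0;
}

/* Closing path: mark the listener dead and drop everything already queued. */
void listener_shutdown(struct listener *l)
{
	pthread_mutex_lock(&l->lock);
	l->shut_down = true;
	l->backlog = 0;   /* stands in for draining and freeing queued children */
	pthread_mutex_unlock(&l->lock);
}

/* Receive path: returns 0 if the child was queued, -1 if the listener is
 * already shut down, in which case the caller must free the child itself.
 */
int listener_enqueue(struct listener *l)
{
	int ret = -1;

	pthread_mutex_lock(&l->lock);
	if (!l->shut_down) {
		l->backlog++;
		ret = 0;
	}
	pthread_mutex_unlock(&l->lock);
	return ret;
}
```

Because both paths take the same lock, an enqueue either completes before
the shutdown path drains the queue or observes shut_down and fails, so the
caller can release the child instead of leaking it.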



* Re: [PATCH] VIRTIO: Update the desc 'flag' fied last in packed ring.
From: Si-Wei Liu @ 2026-06-05 18:50 UTC
  To: Michael S. Tsirkin
  Cc: Eugenio Perez Martin, yangjiale, Jason Wang, Xuan Zhuo,
	virtualization, linux-kernel, Andrew.Boyer
In-Reply-To: <20260605134252-mutt-send-email-mst@kernel.org>



On 6/5/2026 10:43 AM, Michael S. Tsirkin wrote:
> On Fri, Jun 05, 2026 at 09:03:36AM -0700, Si-Wei Liu wrote:
>>
>> On 6/1/2026 11:04 PM, Eugenio Perez Martin wrote:
>>> On Tue, Jun 2, 2026 at 6:34 AM yangjiale <yangjiale133@163.com> wrote:
>>>> When a descriptor list spans across cache lines,
>>>> updating the flag first can lead to a scenario where the device side
>>>> perceives the flag as valid, yet the corresponding address and length
>>>> fields remain unupdated—resulting in invalid values.
>>>> Therefore, the flag field must be updated last.
>>>>
>>>> Signed-off-by: yangjiale <yangjiale133@163.com>
>>>> ---
>>>>    drivers/virtio/virtio_ring.c | 8 ++++----
>>>>    1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>>> index fbca7ce1c6bf..036b4f90d30f 100644
>>>> --- a/drivers/virtio/virtio_ring.c
>>>> +++ b/drivers/virtio/virtio_ring.c
>>>> @@ -1688,6 +1688,10 @@ static inline int virtqueue_add_packed(struct vring_virtqueue *vq,
>>>>                                                &addr, &len, premapped, attr))
>>>>                                   goto unmap_release;
>>>>
>>>> +                       desc[i].addr = cpu_to_le64(addr);
>>>> +                       desc[i].len = cpu_to_le32(len);
>>>> +                       desc[i].id = cpu_to_le16(id);
>>>> +
>>>>                           flags = cpu_to_le16(vq->packed.avail_used_flags |
>>>>                                       (++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
>>>>                                       (n < out_sgs ? 0 : VRING_DESC_F_WRITE));
>>>> @@ -1696,10 +1700,6 @@ static inline int virtqueue_add_packed(struct vring_virtqueue *vq,
>>>>                           else
>>>>                                   desc[i].flags = flags;
>>>>
>>>> -                       desc[i].addr = cpu_to_le64(addr);
>>>> -                       desc[i].len = cpu_to_le32(len);
>>>> -                       desc[i].id = cpu_to_le16(id);
>>>> -
>>>>                           if (unlikely(vq->use_map_api)) {
>>>>                                   vq->packed.desc_extra[curr].addr = premapped ?
>>>>                                           DMA_MAPPING_ERROR : addr;
>>> These flags are updated before the flags of the head descriptor at the
>>> end of the function, at "vq->packed.vring.desc[head].flags =
>>> head_flags", so the device should not see these. Because of that, the
>>> relative order between the rest of the fields of the same descriptor
>>> or other descriptors' fields, except for the head descriptor's flags,
>>> should not matter. There is a write memory barrier just before
>>> updating the head's flags.
>> The above analysis is absolutely correct. Though one hardware vendor told me
>> that this driver implementation kinda stops them from reading ahead of
>> descriptors already posted beyond the available index, ending up with
>> suboptimal performance that is hard to make up by other means. Would it be a
>> bad idea to go with this change and add write barrier in a gentle way for a
>> small flit in the batch, e.g. commit to memory after every cache line size
>> worth of descriptors are posted? Would the memory barrier have negative
>> performance overhead to other backend implementation variants than real
>> hardware PCI device?
>>
>> -Siwei
> this would need a new feature bit, wouldn't it?
Probably. This is to capture the device's expectation and behavior,
right? The driver change itself is not spec violating...

>
>>> Also, I don't get why the cache line matters here. Can you expand? Am
>>> I missing something?
> me too.
>
Just to avoid the extra delay from excessive coherency messages and
frequent cache thrashing: device reads over the PCI bus contend with host
writes/updates to descriptors in the same cache line.

-Siwei
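
The ordering argument in this thread (descriptor fields may be written in
any order as long as the head descriptor's flags are published last, after
a write barrier) can be sketched with C11 atomics. This is a minimal sketch
assuming a simplified descriptor layout: DESC_F_AVAIL, struct desc,
post_chain() and poll_head() are hypothetical, and wrap counters, ring
wrap-around and DMA mapping are left out. A release store stands in for the
driver's write barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DESC_F_AVAIL 0x80u /* hypothetical "descriptor is valid" bit */

/* Simplified descriptor; field names follow the thread, layout is illustrative. */
struct desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	_Atomic uint16_t flags;
};

/* Fill a chain of n descriptors starting at 'head' (no wrap-around handled)
 * and publish the head descriptor's flags last.
 */
void post_chain(struct desc *ring, unsigned head, unsigned n,
		const uint64_t *addrs, const uint32_t *lens, uint16_t id)
{
	for (unsigned i = 0; i < n; i++) {
		struct desc *d = &ring[head + i];

		d->addr = addrs[i];
		d->len = lens[i];
		d->id = id;
		/* Non-head flags: their order relative to addr/len is not
		 * observable, because the consumer starts from the head.
		 */
		if (i != 0)
			atomic_store_explicit(&d->flags, DESC_F_AVAIL,
					      memory_order_relaxed);
	}

	/* Release store: every write above is visible before the head flags. */
	atomic_store_explicit(&ring[head].flags, DESC_F_AVAIL,
			      memory_order_release);
}

/* Consumer side: returns 1 and copies out the head's fields once published. */
int poll_head(struct desc *ring, unsigned head, uint64_t *addr, uint32_t *len)
{
	uint16_t flags = atomic_load_explicit(&ring[head].flags,
					      memory_order_acquire);

	if (!(flags & DESC_F_AVAIL))
		return 0;
	*addr = ring[head].addr;
	*len = ring[head].len;
	return 1;
}
```

A consumer that reads ahead of the head, as the hardware vendor mentioned
above apparently wants to, gets no such guarantee from this scheme, which is
why the thread asks whether a new feature bit would be needed.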


* Re: [PATCH v4 01/47] x86/tsc: Never re-calibrate TSC frequency if its exact timing is known
From: Sean Christopherson @ 2026-06-05 18:04 UTC
  To: Thomas Gleixner
  Cc: Paolo Bonzini, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	Kiryl Shutsemau, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov, Jan Kiszka,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	John Stultz, H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
	virtualization, xen-devel, David Woodhouse, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, Michael Kelley
In-Reply-To: <87fr315fq9.ffs@fw13>

On Fri, Jun 05, 2026, Thomas Gleixner wrote:
> On Fri, May 29 2026 at 07:43, Sean Christopherson wrote:
> > Don't re-calibrate the TSC frequency if the TSC is known to run at a fixed
> > frequency.
> 
> That's misleading because fixed frequency means that the frequency does
> not change, i.e. X86_FEATURE_CONSTANT_TSC is set. But
> X86_FEATURE_CONSTANT_TSC does not imply that the frequency can be read
> from CPUID/MSRs.

Sorry, "if the TSC runs at a known, fixed frequency" would be a better way to
phrase this.

> > In practice, this is likely one big nop, as re-calibration is
> > used only for SMP=n kernels, and only for hardware that is 20+ years old,
> > i.e. is extremely unlikely to collide with TSC_KNOWN_FREQ.
> 
> recalibrate_cpu_khz() is only invoked from Intel P4 and AMD K7 CPU
> frequency drivers, which means that's absolutely not interesting and
> neither X86_FEATURE_CONSTANT_TSC nor X86_FEATURE_TSC_KNOWN_FREQ can be
> set on those systems.

It _shouldn't_ be set on those systems, but in the world of virtualization it's
not completely impossible.

> IOW, this patch is pointless voodoo ware.

Would y'all be opposed to adding a WARN?  I don't actually care about P4 or K7
CPUs, but without any reference to X86_FEATURE_TSC_KNOWN_FREQ in
recalibrate_cpu_khz(), the code _looks_ wrong, and so is very confusing for
readers that don't already know that in practice, it's limited to ancient CPUs.

In other words, the point is to document expectations and mutual exclusion, not
to "fix" anything.


* Re: [PATCH] VIRTIO: Update the desc 'flag' fied last in packed ring.
From: Michael S. Tsirkin @ 2026-06-05 17:43 UTC
  To: Si-Wei Liu
  Cc: Eugenio Perez Martin, yangjiale, Jason Wang, Xuan Zhuo,
	virtualization, linux-kernel, Andrew.Boyer
In-Reply-To: <6035a8f3-e225-45b0-9f48-55de953bff15@oracle.com>

On Fri, Jun 05, 2026 at 09:03:36AM -0700, Si-Wei Liu wrote:
> 
> 
> On 6/1/2026 11:04 PM, Eugenio Perez Martin wrote:
> > On Tue, Jun 2, 2026 at 6:34 AM yangjiale <yangjiale133@163.com> wrote:
> > > When a descriptor list spans across cache lines,
> > > updating the flag first can lead to a scenario where the device side
> > > perceives the flag as valid, yet the corresponding address and length
> > > fields remain unupdated—resulting in invalid values.
> > > Therefore, the flag field must be updated last.
> > > 
> > > Signed-off-by: yangjiale <yangjiale133@163.com>
> > > ---
> > >   drivers/virtio/virtio_ring.c | 8 ++++----
> > >   1 file changed, 4 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index fbca7ce1c6bf..036b4f90d30f 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -1688,6 +1688,10 @@ static inline int virtqueue_add_packed(struct vring_virtqueue *vq,
> > >                                               &addr, &len, premapped, attr))
> > >                                  goto unmap_release;
> > > 
> > > +                       desc[i].addr = cpu_to_le64(addr);
> > > +                       desc[i].len = cpu_to_le32(len);
> > > +                       desc[i].id = cpu_to_le16(id);
> > > +
> > >                          flags = cpu_to_le16(vq->packed.avail_used_flags |
> > >                                      (++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
> > >                                      (n < out_sgs ? 0 : VRING_DESC_F_WRITE));
> > > @@ -1696,10 +1700,6 @@ static inline int virtqueue_add_packed(struct vring_virtqueue *vq,
> > >                          else
> > >                                  desc[i].flags = flags;
> > > 
> > > -                       desc[i].addr = cpu_to_le64(addr);
> > > -                       desc[i].len = cpu_to_le32(len);
> > > -                       desc[i].id = cpu_to_le16(id);
> > > -
> > >                          if (unlikely(vq->use_map_api)) {
> > >                                  vq->packed.desc_extra[curr].addr = premapped ?
> > >                                          DMA_MAPPING_ERROR : addr;
> > These flags are updated before the flags of the head descriptor at the
> > end of the function, at "vq->packed.vring.desc[head].flags =
> > head_flags", so the device should not see these. Because of that, the
> > relative order between the rest of the fields of the same descriptor
> > or other descriptors' fields, except for the head descriptor's flags,
> > should not matter. There is a write memory barrier just before
> > updating the head's flags.
> The above analysis is absolutely correct. Though one hardware vendor told me
> that this driver implementation kinda stops them from reading ahead of
> descriptors already posted beyond the available index, ending up with
> suboptimal performance that is hard to make up by other means. Would it be a
> bad idea to go with this change and add write barrier in a gentle way for a
> small flit in the batch, e.g. commit to memory after every cache line size
> worth of descriptors are posted? Would the memory barrier have negative
> performance overhead to other backend implementation variants than real
> hardware PCI device?
> 
> -Siwei

this would need a new feature bit, wouldn't it?

> > 
> > Also, I don't get why the cache line matters here. Can you expand? Am
> > I missing something?
> 

me too.



* Re: [PATCH] VIRTIO: Update the desc 'flag' fied last in packed ring.
From: Si-Wei Liu @ 2026-06-05 16:03 UTC
  To: Eugenio Perez Martin, yangjiale
  Cc: Michael S . Tsirkin, Jason Wang, Xuan Zhuo, virtualization,
	linux-kernel, Andrew.Boyer
In-Reply-To: <CAJaqyWfugEttrQuR5_LrbQaivAqCaixsprUOjEVtdiz_sexFpA@mail.gmail.com>



On 6/1/2026 11:04 PM, Eugenio Perez Martin wrote:
> On Tue, Jun 2, 2026 at 6:34 AM yangjiale <yangjiale133@163.com> wrote:
>> When a descriptor list spans across cache lines,
>> updating the flag first can lead to a scenario where the device side
>> perceives the flag as valid, yet the corresponding address and length
>> fields remain unupdated—resulting in invalid values.
>> Therefore, the flag field must be updated last.
>>
>> Signed-off-by: yangjiale <yangjiale133@163.com>
>> ---
>>   drivers/virtio/virtio_ring.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>> index fbca7ce1c6bf..036b4f90d30f 100644
>> --- a/drivers/virtio/virtio_ring.c
>> +++ b/drivers/virtio/virtio_ring.c
>> @@ -1688,6 +1688,10 @@ static inline int virtqueue_add_packed(struct vring_virtqueue *vq,
>>                                               &addr, &len, premapped, attr))
>>                                  goto unmap_release;
>>
>> +                       desc[i].addr = cpu_to_le64(addr);
>> +                       desc[i].len = cpu_to_le32(len);
>> +                       desc[i].id = cpu_to_le16(id);
>> +
>>                          flags = cpu_to_le16(vq->packed.avail_used_flags |
>>                                      (++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
>>                                      (n < out_sgs ? 0 : VRING_DESC_F_WRITE));
>> @@ -1696,10 +1700,6 @@ static inline int virtqueue_add_packed(struct vring_virtqueue *vq,
>>                          else
>>                                  desc[i].flags = flags;
>>
>> -                       desc[i].addr = cpu_to_le64(addr);
>> -                       desc[i].len = cpu_to_le32(len);
>> -                       desc[i].id = cpu_to_le16(id);
>> -
>>                          if (unlikely(vq->use_map_api)) {
>>                                  vq->packed.desc_extra[curr].addr = premapped ?
>>                                          DMA_MAPPING_ERROR : addr;
> These flags are updated before the flags of the head descriptor at the
> end of the function, at "vq->packed.vring.desc[head].flags =
> head_flags", so the device should not see these. Because of that, the
> relative order between the rest of the fields of the same descriptor
> or other descriptors' fields, except for the head descriptor's flags,
> should not matter. There is a write memory barrier just before
> updating the head's flags.
The above analysis is absolutely correct. Though one hardware vendor 
told me that this driver implementation kinda stops them from reading 
ahead of descriptors already posted beyond the available index, ending
up with suboptimal performance that is hard to make up by other means. 
Would it be a bad idea to go with this change and add write barrier in a 
gentle way for a small flit in the batch, e.g. commit to memory after 
every cache line size worth of descriptors are posted? Would the memory 
barrier have negative performance overhead to other backend 
implementation variants than real hardware PCI device?

-Siwei
>
> Also, I don't get why the cache line matters here. Can you expand? Am
> I missing something?




* Re: [PATCH v1] vsock/virtio: rework MSG_ZEROCOPY flag handling
From: David Laight @ 2026-06-05 15:08 UTC
  To: Arseniy Krasnov
  Cc: Stefan Hajnoczi, Stefano Garzarella, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Michael S. Tsirkin,
	Jason Wang, Bobby Eshleman, Xuan Zhuo, Eugenio Pérez,
	Simon Horman, kvm, virtualization, netdev, linux-kernel, oxffffaa,
	rulkc
In-Reply-To: <20260605115314.552321-1-avkrasnov@rulkc.org>

On Fri,  5 Jun 2026 14:53:14 +0300
Arseniy Krasnov <avkrasnov@rulkc.org> wrote:

> Logically it was based on TCP implementation, so to make further
> support easier, rewrite it in the TCP way.
> 
> Signed-off-by: Arseniy Krasnov <avkrasnov@rulkc.org>
> ---
>  net/vmw_vsock/virtio_transport_common.c | 64 ++++++++++++-------------
>  1 file changed, 32 insertions(+), 32 deletions(-)
> 
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 2fd9eaaf5ca6..00caeeaa5590 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -73,10 +73,13 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
>  static int virtio_transport_fill_skb(struct sk_buff *skb,
>  				     struct virtio_vsock_pkt_info *info,
>  				     size_t len,
> -				     bool zcopy)
> +				     bool zcopy, struct ubuf_info *uarg)
>  {
>  	struct msghdr *msg = info->msg;
>  
> +	/* We have completion - attach it to 'skb'. */
> +	skb_zcopy_set(skb, uarg, NULL);
> +
>  	if (zcopy)
>  		return __zerocopy_sg_from_iter(msg, NULL, skb,
>  					       &msg->msg_iter, len, NULL);
> @@ -208,7 +211,8 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *
>  						  u32 src_cid,
>  						  u32 src_port,
>  						  u32 dst_cid,
> -						  u32 dst_port)
> +						  u32 dst_port,
> +						  struct ubuf_info *uarg)
>  {
>  	struct vsock_sock *vsk;
>  	struct sk_buff *skb;
> @@ -245,7 +249,7 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *
>  	if (info->msg && payload_len > 0) {
>  		int err;
>  
> -		err = virtio_transport_fill_skb(skb, info, payload_len, zcopy);
> +		err = virtio_transport_fill_skb(skb, info, payload_len, zcopy, uarg);
>  		if (err)
>  			goto out;
>  
> @@ -321,38 +325,36 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
>  		return pkt_len;
>  
> -	if (info->msg) {
> -		/* If zerocopy is not enabled by 'setsockopt()', we behave as
> -		 * there is no MSG_ZEROCOPY flag set.
> +	if (info->msg && (info->msg->msg_flags & MSG_ZEROCOPY)) {
> +		/* If 'info->msg' is not NULL, this is only VIRTIO_VSOCK_OP_RW.
> +		 * 'MSG_ZEROCOPY' flag handling here is based on the same flag
> +		 * handling from 'tcp_sendmsg_locked()'.
>  		 */
> -		if (!sock_flag(sk_vsock(vsk), SOCK_ZEROCOPY))
> -			info->msg->msg_flags &= ~MSG_ZEROCOPY;
> +		if (info->msg->msg_ubuf) {
> +			uarg = info->msg->msg_ubuf;
> +			can_zcopy = virtio_transport_can_zcopy(t_ops, info, pkt_len);
> +		} else if (sock_flag(sk_vsock(vsk), SOCK_ZEROCOPY)) {
> +			uarg = msg_zerocopy_realloc(sk_vsock(vsk), pkt_len,
> +						    NULL, false);
> +			if (!uarg) {
> +				virtio_transport_put_credit(vvs, pkt_len);
> +				return -ENOMEM;
> +			}
>  
> -		if (info->msg->msg_flags & MSG_ZEROCOPY)
>  			can_zcopy = virtio_transport_can_zcopy(t_ops, info, pkt_len);
>  
> +			if (!can_zcopy)
> +				uarg_to_msgzc(uarg)->zerocopy = 0;
> +
> +			have_uref = true;
> +		}
> +
> +		/* 'can_zcopy' means that this transmission will be
> +		 * in zerocopy way (e.g. using 'frags' array).
> +		 */

I've not looked at the tcp code, but the above doesn't look right.
I don't see why msg->msg_ubuf might be non-NULL without SOCK_ZEROCOPY set.
That would give the outer code a callback when the last skb is freed but
still copy the data.

I also don't see the point of calling msg_zerocopy_realloc() to get a
callback when the last skb is freed and then setting
	uarg_to_msgzc(uarg)->zerocopy = 0;
so that the callback doesn't actually do anything.
It isn't as though you 'find out' later on that you can't actually do
zerocopy.

>  		if (can_zcopy)
>  			max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
>  					    (MAX_SKB_FRAGS * PAGE_SIZE));
> -
> -		if (info->msg->msg_flags & MSG_ZEROCOPY &&
> -		    info->op == VIRTIO_VSOCK_OP_RW) {
> -			uarg = info->msg->msg_ubuf;
> -
> -			if (!uarg) {
> -				uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> -							    pkt_len, NULL, false);
> -				if (!uarg) {
> -					virtio_transport_put_credit(vvs, pkt_len);
> -					return -ENOMEM;
> -				}
> -
> -				if (!can_zcopy)
> -					uarg_to_msgzc(uarg)->zerocopy = 0;
> -
> -				have_uref = true;
> -			}
> -		}
>  	}
>  
>  	rest_len = pkt_len;
> @@ -365,14 +367,12 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  
>  		skb = virtio_transport_alloc_skb(info, skb_len, can_zcopy,
>  						 src_cid, src_port,
> -						 dst_cid, dst_port);
> +						 dst_cid, dst_port, uarg);
>  		if (!skb) {
>  			ret = -ENOMEM;
>  			break;
>  		}
>  
> -		skb_zcopy_set(skb, uarg, NULL);

Aren't you passing uarg through two function calls instead of doing it here?
That doesn't even make it clearer what is going on.

-- David

> -
>  		virtio_transport_inc_tx_pkt(vvs, skb);
>  
>  		ret = t_ops->send_pkt(skb, info->net);
> @@ -1178,7 +1178,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
>  					   le64_to_cpu(hdr->dst_cid),
>  					   le32_to_cpu(hdr->dst_port),
>  					   le64_to_cpu(hdr->src_cid),
> -					   le32_to_cpu(hdr->src_port));
> +					   le32_to_cpu(hdr->src_port), NULL);
>  	if (!reply)
>  		return -ENOMEM;
>  



* [PATCH] virtio-mmio: add support for transport version 3
From: Peter Hilber @ 2026-06-05 14:29 UTC
  To: Michael S . Tsirkin, Jason Wang
  Cc: Trilok Soni, virtio-dev, Xuan Zhuo, Eugenio Pérez,
	virtualization, linux-kernel, Peter Hilber

Virtio MMIO transport version 3 allows device reset to complete
asynchronously. Unlike version 2, where writing zero to Status must
complete the reset before the write returns, version 3 requires the
driver to poll Status until it reads back zero before considering reset
complete.

Update virtio-mmio accordingly: accept transport version 3 and, during
reset, wait for Status to become zero. Keep the polling loop unbounded,
consistent with virtio-pci, since the reset callback does not return an
error code.

Signed-off-by: Peter Hilber <peter.hilber@oss.qualcomm.com>
Link: https://github.com/oasis-tcs/virtio-spec/commit/bb1dd2e1fe89b862f38f15873d835a698b196f89
---
 drivers/virtio/virtio_mmio.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 510b7c4efdff..316f03b97356 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -55,6 +55,7 @@
 #define pr_fmt(fmt) "virtio-mmio: " fmt
 
 #include <linux/acpi.h>
+#include <linux/delay.h>
 #include <linux/dma-mapping.h>
 #include <linux/highmem.h>
 #include <linux/interrupt.h>
@@ -114,9 +115,9 @@ static int vm_finalize_features(struct virtio_device *vdev)
 	vring_transport_features(vdev);
 
 	/* Make sure there are no mixed devices */
-	if (vm_dev->version == 2 &&
+	if (vm_dev->version >= 2 &&
 			!__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) {
-		dev_err(&vdev->dev, "New virtio-mmio devices (version 2) must provide VIRTIO_F_VERSION_1 feature!\n");
+		dev_err(&vdev->dev, "New virtio-mmio devices (version >= 2) must provide VIRTIO_F_VERSION_1 feature!\n");
 		return -EINVAL;
 	}
 
@@ -254,6 +255,12 @@ static void vm_reset(struct virtio_device *vdev)
 
 	/* 0 status means a reset. */
 	writel(0, vm_dev->base + VIRTIO_MMIO_STATUS);
+
+	if (vm_dev->version >= 3) {
+		/* Wait for reset to complete. */
+		while (vm_get_status(vdev))
+			fsleep(1000);
+	}
 }
 
 
@@ -600,7 +607,7 @@ static int virtio_mmio_probe(struct platform_device *pdev)
 
 	/* Check device version */
 	vm_dev->version = readl(vm_dev->base + VIRTIO_MMIO_VERSION);
-	if (vm_dev->version < 1 || vm_dev->version > 2) {
+	if (vm_dev->version < 1 || vm_dev->version > 3) {
 		dev_err(&pdev->dev, "Version %ld not supported!\n",
 				vm_dev->version);
 		rc = -ENXIO;
-- 
2.43.0
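
The version 3 reset handshake described in the commit message (write zero to
Status, then poll until Status reads back zero) can be sketched against a
simulated device. This is a user-space illustration under stated
assumptions: struct fake_mmio_dev, fake_write_status(), fake_read_status()
and reset_device() are hypothetical, and the simulated device simply
finishes its reset after a fixed number of reads. The patch itself sleeps
with fsleep(1000) between polls; the sketch only counts them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simulated device. For version >= 3 a reset completes only
 * after a few Status reads, modelling asynchronous reset.
 */
struct fake_mmio_dev {
	uint32_t version;
	uint32_t status;
	unsigned pending_polls; /* reads left until the reset completes */
};

static void fake_write_status(struct fake_mmio_dev *dev, uint32_t val)
{
	if (val != 0) {
		dev->status |= val;
		return;
	}
	if (dev->version >= 3)
		dev->pending_polls = 3; /* reset finishes a few reads later */
	else
		dev->status = 0;        /* version 1/2: reset is synchronous */
}

static uint32_t fake_read_status(struct fake_mmio_dev *dev)
{
	if (dev->pending_polls > 0 && --dev->pending_polls == 0)
		dev->status = 0;
	return dev->status;
}

/* Reset the device; returns how many polls saw a non-zero Status. */
unsigned reset_device(struct fake_mmio_dev *dev)
{
	unsigned polls = 0;

	/* 0 status means a reset. */
	fake_write_status(dev, 0);

	if (dev->version >= 3) {
		/* Unbounded loop, as in the patch: no error can be returned. */
		while (fake_read_status(dev) != 0)
			polls++;
	}
	return polls;
}
```

With version set to 2 the function returns immediately without polling,
which matches the unchanged behaviour for older transports.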



* Re: [PATCH v4 02/47] x86/tsc: Add a standalone helpers for getting TSC info from CPUID.0x15
From: Thomas Gleixner @ 2026-06-05 12:37 UTC
  To: Sean Christopherson, Paolo Bonzini, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Kiryl Shutsemau, Sean Christopherson,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Andy Lutomirski,
	Peter Zijlstra, Juergen Gross, Daniel Lezcano, John Stultz
  Cc: H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
	virtualization, xen-devel, David Woodhouse, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, Michael Kelley
In-Reply-To: <20260529144435.704127-3-seanjc@google.com>

On Fri, May 29 2026 at 07:43, Sean Christopherson wrote:
>  		cpuid(CPUID_LEAF_FREQ, &eax_base_mhz, &ebx, &ecx, &edx);
> -		crystal_khz = eax_base_mhz * 1000 *
> -			eax_denominator / ebx_numerator;
> +		info.crystal_khz = eax_base_mhz * 1000 *
> +			info.denominator / info.numerator;

Please get rid of this ugly line break. You have 100 characters.
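
For reference, the arithmetic in the quoted hunk derives the crystal
frequency from the base frequency and the TSC/crystal ratio. A minimal
standalone sketch, assuming a hypothetical helper name
(crystal_khz_from_base) and adding a 64-bit intermediate and a
zero-numerator guard that are not part of the quoted code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: crystal_khz = base_mhz * 1000 * denominator / numerator.
 * Returns 0 when the ratio is not enumerated (numerator == 0).
 */
uint32_t crystal_khz_from_base(uint32_t base_mhz, uint32_t denominator,
			       uint32_t numerator)
{
	if (numerator == 0)
		return 0;

	/* 64-bit intermediate so the product cannot overflow 32 bits. */
	return (uint32_t)((uint64_t)base_mhz * 1000 * denominator / numerator);
}
```

For example, a 2400 MHz base frequency with a ratio of 200/2 gives
2400 * 1000 * 2 / 200 = 24000 kHz, i.e. a 24 MHz crystal.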



* Re: [PATCH v4 01/47] x86/tsc: Never re-calibrate TSC frequency if its exact timing is known
From: Thomas Gleixner @ 2026-06-05 12:33 UTC
  To: Sean Christopherson, Paolo Bonzini, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Kiryl Shutsemau, Sean Christopherson,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Andy Lutomirski,
	Peter Zijlstra, Juergen Gross, Daniel Lezcano, John Stultz
  Cc: H. Peter Anvin, Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, kvm, linux-kernel, linux-coco, linux-hyperv,
	virtualization, xen-devel, David Woodhouse, Tom Lendacky,
	Nikunj A Dadhania, David Woodhouse, Michael Kelley
In-Reply-To: <20260529144435.704127-2-seanjc@google.com>

On Fri, May 29 2026 at 07:43, Sean Christopherson wrote:
> Don't re-calibrate the TSC frequency if the TSC is known to run at a fixed
> frequency.

That's misleading because fixed frequency means that the frequency does
not change, i.e. X86_FEATURE_CONSTANT_TSC is set. But
X86_FEATURE_CONSTANT_TSC does not imply that the frequency can be read
from CPUID/MSRs.

> In practice, this is likely one big nop, as re-calibration is
> used only for SMP=n kernels, and only for hardware that is 20+ years old,
> i.e. is extremely unlikely to collide with TSC_KNOWN_FREQ.

recalibrate_cpu_khz() is only invoked from Intel P4 and AMD K7 CPU
frequency drivers, which means that's absolutely not interesting and
neither X86_FEATURE_CONSTANT_TSC nor X86_FEATURE_TSC_KNOWN_FREQ can be
set on those systems.

IOW, this patch is pointless voodoo ware.

Thanks,

        tglx


* [PATCH v1] vsock/virtio: rework MSG_ZEROCOPY flag handling
From: Arseniy Krasnov @ 2026-06-05 11:53 UTC
  To: Stefan Hajnoczi, Stefano Garzarella, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Michael S. Tsirkin,
	Jason Wang, Bobby Eshleman, Xuan Zhuo, Eugenio Pérez,
	Simon Horman
  Cc: kvm, virtualization, netdev, linux-kernel, oxffffaa, rulkc,
	Arseniy Krasnov

Logically it was based on TCP implementation, so to make further
support easier, rewrite it in the TCP way.

Signed-off-by: Arseniy Krasnov <avkrasnov@rulkc.org>
---
 net/vmw_vsock/virtio_transport_common.c | 64 ++++++++++++-------------
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 2fd9eaaf5ca6..00caeeaa5590 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -73,10 +73,13 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
 static int virtio_transport_fill_skb(struct sk_buff *skb,
 				     struct virtio_vsock_pkt_info *info,
 				     size_t len,
-				     bool zcopy)
+				     bool zcopy, struct ubuf_info *uarg)
 {
 	struct msghdr *msg = info->msg;
 
+	/* We have completion - attach it to 'skb'. */
+	skb_zcopy_set(skb, uarg, NULL);
+
 	if (zcopy)
 		return __zerocopy_sg_from_iter(msg, NULL, skb,
 					       &msg->msg_iter, len, NULL);
@@ -208,7 +211,8 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *
 						  u32 src_cid,
 						  u32 src_port,
 						  u32 dst_cid,
-						  u32 dst_port)
+						  u32 dst_port,
+						  struct ubuf_info *uarg)
 {
 	struct vsock_sock *vsk;
 	struct sk_buff *skb;
@@ -245,7 +249,7 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *
 	if (info->msg && payload_len > 0) {
 		int err;
 
-		err = virtio_transport_fill_skb(skb, info, payload_len, zcopy);
+		err = virtio_transport_fill_skb(skb, info, payload_len, zcopy, uarg);
 		if (err)
 			goto out;
 
@@ -321,38 +325,36 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
 		return pkt_len;
 
-	if (info->msg) {
-		/* If zerocopy is not enabled by 'setsockopt()', we behave as
-		 * there is no MSG_ZEROCOPY flag set.
+	if (info->msg && (info->msg->msg_flags & MSG_ZEROCOPY)) {
+		/* If 'info->msg' is not NULL, this is only VIRTIO_VSOCK_OP_RW.
+		 * 'MSG_ZEROCOPY' flag handling here is based on the same flag
+		 * handling from 'tcp_sendmsg_locked()'.
 		 */
-		if (!sock_flag(sk_vsock(vsk), SOCK_ZEROCOPY))
-			info->msg->msg_flags &= ~MSG_ZEROCOPY;
+		if (info->msg->msg_ubuf) {
+			uarg = info->msg->msg_ubuf;
+			can_zcopy = virtio_transport_can_zcopy(t_ops, info, pkt_len);
+		} else if (sock_flag(sk_vsock(vsk), SOCK_ZEROCOPY)) {
+			uarg = msg_zerocopy_realloc(sk_vsock(vsk), pkt_len,
+						    NULL, false);
+			if (!uarg) {
+				virtio_transport_put_credit(vvs, pkt_len);
+				return -ENOMEM;
+			}
 
-		if (info->msg->msg_flags & MSG_ZEROCOPY)
 			can_zcopy = virtio_transport_can_zcopy(t_ops, info, pkt_len);
 
+			if (!can_zcopy)
+				uarg_to_msgzc(uarg)->zerocopy = 0;
+
+			have_uref = true;
+		}
+
+		/* 'can_zcopy' means that this transmission will be
+		 * in zerocopy way (e.g. using 'frags' array).
+		 */
 		if (can_zcopy)
 			max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
 					    (MAX_SKB_FRAGS * PAGE_SIZE));
-
-		if (info->msg->msg_flags & MSG_ZEROCOPY &&
-		    info->op == VIRTIO_VSOCK_OP_RW) {
-			uarg = info->msg->msg_ubuf;
-
-			if (!uarg) {
-				uarg = msg_zerocopy_realloc(sk_vsock(vsk),
-							    pkt_len, NULL, false);
-				if (!uarg) {
-					virtio_transport_put_credit(vvs, pkt_len);
-					return -ENOMEM;
-				}
-
-				if (!can_zcopy)
-					uarg_to_msgzc(uarg)->zerocopy = 0;
-
-				have_uref = true;
-			}
-		}
 	}
 
 	rest_len = pkt_len;
@@ -365,14 +367,12 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 
 		skb = virtio_transport_alloc_skb(info, skb_len, can_zcopy,
 						 src_cid, src_port,
-						 dst_cid, dst_port);
+						 dst_cid, dst_port, uarg);
 		if (!skb) {
 			ret = -ENOMEM;
 			break;
 		}
 
-		skb_zcopy_set(skb, uarg, NULL);
-
 		virtio_transport_inc_tx_pkt(vvs, skb);
 
 		ret = t_ops->send_pkt(skb, info->net);
@@ -1178,7 +1178,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
 					   le64_to_cpu(hdr->dst_cid),
 					   le32_to_cpu(hdr->dst_port),
 					   le64_to_cpu(hdr->src_cid),
-					   le32_to_cpu(hdr->src_port));
+					   le32_to_cpu(hdr->src_port), NULL);
 	if (!reply)
 		return -ENOMEM;
 
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH] tools/virtio: check mmap return value in vringh_test
From: Michael S. Tsirkin @ 2026-06-05 10:06 UTC (permalink / raw)
  To: longlong yan; +Cc: jasowang, virtualization, xuanzhuo, eperezma, linux-kernel
In-Reply-To: <20260605021446.1611-1-yanlonglong@kylinos.cn>

On Fri, Jun 05, 2026 at 10:14:45AM +0800, longlong yan wrote:
> In parallel_test(), the return values of mmap() for both host_map and
> guest_map are not checked against MAP_FAILED. If mmap() fails, the
> subsequent code will dereference the invalid pointer, leading to a
> segmentation fault.
> 
> Add MAP_FAILED checks after both mmap() calls, using err() to report
> the error and exit, consistent with the existing error handling style
> in this file (e.g., the open() call on line 149).
> 
> Fixes: 1515c5ce26ae ("tools/virtio: add vring_test.")
> Signed-off-by: longlong yan <yanlonglong@kylinos.cn>
> ---
>  tools/virtio/vringh_test.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/tools/virtio/vringh_test.c b/tools/virtio/vringh_test.c
> index b9591223437a..8cc5ca6c7cca 100644
> --- a/tools/virtio/vringh_test.c
> +++ b/tools/virtio/vringh_test.c
> @@ -159,7 +159,12 @@ static int parallel_test(u64 features,
>  
>  	/* Parent and child use separate addresses, to check our mapping logic! */
>  	host_map = mmap(NULL, mapsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> +	if (host_map == MAP_FAILED)
> +		err(1, "mmap host_map");
> +	


trailing whitespace here.

>  	guest_map = mmap(NULL, mapsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> +	if (guest_map == MAP_FAILED)
> +		err(1, "mmap guest_map");
>  
>  	pipe_ret = pipe(to_guest);
>  	assert(!pipe_ret);
> -- 
> 2.43.0


^ permalink raw reply

* Re: [PATCH 2/2] iommu/virtio: Avoid using the list iterator past the loop in viommu_add_resv_mem()
From: Robin Murphy @ 2026-06-05 10:05 UTC (permalink / raw)
  To: Maoyi Xie, Jean-Philippe Brucker
  Cc: joro, will, iommu, linux-kernel, virtualization
In-Reply-To: <CAHPEe=HV+wpgg3YMQcOxN5M7c+ew6MQRgY0YgW-3KgB91gn_sw@mail.gmail.com>

On 2026-06-05 9:59 am, Maoyi Xie wrote:
> Hi Jean-Philippe,
> 
> On Fri, Jun 5, 2026 at 12:04 AM Jean-Philippe Brucker <jpb@kernel.org> wrote:
>> Thank you for removing this hack, though I don't find a contract in the
>> list_for_each_entry() doc, and the fix still accesses a cursor outside the
>> loop. Since you mentioned C11 UB in another email, do you have more info
>> on the precise operation which is undefined in the kernel (container_of
>> into an invalid object or the &next->list addition)?  Just so I can avoid
>> it in the future.
> 
>> Anyway, thanks for the patch. If this is just general cleanup that's fine
>> too.
> 
> Thanks for the review. You're right.
> 
> There is no such contract in the docs. I overstated it. And no undefined
> pointer arithmetic happens here either.
> 
> iommu_resv_region has list as its first member, so offsetof is 0. That
> makes container_of(&vdev->resv_regions, struct iommu_resv_region, list)
> just (char *)&vdev->resv_regions - 0, i.e. the head with a different
> type. &next->list is the head again. Nothing reads next as an
> iommu_resv_region, so no pointer leaves its object.
> 
> The container_of would be undefined only if list was not the first
> member. Then the subtraction lands before the real object (C11 6.5.6).
> That is not the case here.

If that were a real issue, then I think list_entry_is_head() in the 
list_for_each_entry() iteration itself would suffer from it too. It's 
probably safe to say that while the C language in general leaves this 
open, Linux relies on GCC using pointer representations where the 
arithmetic will always work out, same as we depend on two's complement 
representation of integers all over the place.

I believe the concern is more just that if the pattern of using a 
list_entry iterator variable outside the scope of the loop isn't 
discouraged, then it's all too easy for subsequent code to inadvertently 
dereference fields *other* than the list_head without checking that it 
is a valid entry, which definitely would then be the worst kind of UB.

Thanks,
Robin.

> So this is cleanup, not a UB fix. The typed cursor points at the head but
> reads like an entry. That becomes a real bug the day someone reorders the
> struct or touches another field. The new pos stays a struct list_head *,
> which is just the head, so it avoids all of that. I should have said that
> in the message.
> 
> If you or Joerg prefer, I can resend the series with the wording fixed.
> 
> Thanks,
> Maoyi


^ permalink raw reply

* Re: [PATCH 2/2] iommu/virtio: Avoid using the list iterator past the loop in viommu_add_resv_mem()
From: Jean-Philippe Brucker @ 2026-06-05 10:01 UTC (permalink / raw)
  To: Maoyi Xie; +Cc: joro, will, robin.murphy, iommu, linux-kernel, virtualization
In-Reply-To: <CAHPEe=HV+wpgg3YMQcOxN5M7c+ew6MQRgY0YgW-3KgB91gn_sw@mail.gmail.com>

On Fri, Jun 05, 2026 at 04:59:47PM +0800, Maoyi Xie wrote:
> Hi Jean-Philippe,
> 
> On Fri, Jun 5, 2026 at 12:04 AM Jean-Philippe Brucker <jpb@kernel.org> wrote:
> > Thank you for removing this hack, though I don't find a contract in the
> > list_for_each_entry() doc, and the fix still accesses a cursor outside the
> > loop. Since you mentioned C11 UB in another email, do you have more info
> > on the precise operation which is undefined in the kernel (container_of
> > into an invalid object or the &next->list addition)?  Just so I can avoid
> > it in the future.
> 
> > Anyway, thanks for the patch. If this is just general cleanup that's fine
> > too.
> 
> Thanks for the review. You're right.
> 
> There is no such contract in the docs. I overstated it. And no undefined
> pointer arithmetic happens here either.
> 
> iommu_resv_region has list as its first member, so offsetof is 0. That
> makes container_of(&vdev->resv_regions, struct iommu_resv_region, list)
> just (char *)&vdev->resv_regions - 0, i.e. the head with a different
> type. &next->list is the head again. Nothing reads next as an
> iommu_resv_region, so no pointer leaves its object.
> 
> The container_of would be undefined only if list was not the first
> member. Then the subtraction lands before the real object (C11 6.5.6).
> That is not the case here.
> 
> So this is cleanup, not a UB fix. The typed cursor points at the head but
> reads like an entry. That becomes a real bug the day someone reorders the
> struct or touches another field. The new pos stays a struct list_head *,
> which is just the head, so it avoids all of that. I should have said that
> in the message.
> 
> If you or Joerg prefer, I can resend the series with the wording fixed.

For me the patch is fine as is, I was just curious about what the kernel
expects for pointer arithmetic. Thanks for clarifying.

Thanks,
Jean

^ permalink raw reply

* [mst-vhost:balloon 8/39] Warning: mm/page_alloc.c:7270 expecting prototype for alloc_contig_frozen_pages_user_noprof(). Prototype was for alloc_contig_frozen_pages_user() instead
From: kernel test robot @ 2026-06-05  9:36 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: oe-kbuild-all, kvm, virtualization, netdev

Hi Michael,

FYI, the error/warning was bisected to this commit, please ignore it if it's irrelevant.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git balloon
head:   b145636d693114d343a0a47a4400f5b31e211fec
commit: 2edc4a8a3ed54fe5fd742e60ff3cba3b7dbd8316 [8/39] mm: add alloc_contig_frozen_pages_user for cache-friendly zeroing
config: nios2-allnoconfig (https://download.01.org/0day-ci/archive/20260605/202606051752.lZNueWoo-lkp@intel.com/config)
compiler: nios2-linux-gcc (GCC) 11.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260605/202606051752.lZNueWoo-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606051752.lZNueWoo-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> Warning: mm/page_alloc.c:7270 expecting prototype for alloc_contig_frozen_pages_user_noprof(). Prototype was for alloc_contig_frozen_pages_user() instead

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [PATCH 2/2] iommu/virtio: Avoid using the list iterator past the loop in viommu_add_resv_mem()
From: Maoyi Xie @ 2026-06-05  8:59 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: joro, will, robin.murphy, iommu, linux-kernel, virtualization
In-Reply-To: <20260604160449.GA1619938@myrica>

Hi Jean-Philippe,

On Fri, Jun 5, 2026 at 12:04 AM Jean-Philippe Brucker <jpb@kernel.org> wrote:
> Thank you for removing this hack, though I don't find a contract in the
> list_for_each_entry() doc, and the fix still accesses a cursor outside the
> loop. Since you mentioned C11 UB in another email, do you have more info
> on the precise operation which is undefined in the kernel (container_of
> into an invalid object or the &next->list addition)?  Just so I can avoid
> it in the future.

> Anyway, thanks for the patch. If this is just general cleanup that's fine
> too.

Thanks for the review. You're right.

There is no such contract in the docs. I overstated it. And no undefined
pointer arithmetic happens here either.

iommu_resv_region has list as its first member, so offsetof is 0. That
makes container_of(&vdev->resv_regions, struct iommu_resv_region, list)
just (char *)&vdev->resv_regions - 0, i.e. the head with a different
type. &next->list is the head again. Nothing reads next as an
iommu_resv_region, so no pointer leaves its object.

The container_of would be undefined only if list was not the first
member. Then the subtraction lands before the real object (C11 6.5.6).
That is not the case here.

So this is cleanup, not a UB fix. The typed cursor points at the head but
reads like an entry. That becomes a real bug the day someone reorders the
struct or touches another field. The new pos stays a struct list_head *,
which is just the head, so it avoids all of that. I should have said that
in the message.

If you or Joerg prefer, I can resend the series with the wording fixed.

Thanks,
Maoyi
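
For readers following the thread, the cursor pattern discussed above can be
sketched in userspace. This is only an illustration: the list helpers are
simplified stand-ins for the kernel's <linux/list.h>, and struct region and
region_insert_sorted() are hypothetical names, not code from the patch. The
list member is deliberately not the first field, so a typed cursor pointing
at the head would not be usable here.

```c
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's <linux/list.h> helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert 'entry' immediately before 'pos' (before the head means the tail). */
static void list_add_tail(struct list_head *entry, struct list_head *pos)
{
	entry->prev = pos->prev;
	entry->next = pos;
	pos->prev->next = entry;
	pos->prev = entry;
}

/* Hypothetical region type; 'list' is deliberately not the first member. */
struct region {
	unsigned long start;
	struct list_head list;
};

/* Keep the list sorted by 'start', using an untyped cursor. */
static void region_insert_sorted(struct list_head *head, struct region *region)
{
	struct list_head *pos;

	list_for_each(pos, head) {
		/* The typed view exists only while 'pos' is a real entry. */
		struct region *next = list_entry(pos, struct region, list);

		if (next->start > region->start)
			break;
	}
	/* 'pos' is either the first larger entry or the head itself. */
	list_add_tail(&region->list, pos);
}
```

Because the typed pointer is scoped to the loop body, later code cannot
accidentally read a field of the head as if it were an entry.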

^ permalink raw reply

* [mst-vhost:balloon 8/39] Warning: mm/page_alloc.c:7270 expecting prototype for alloc_contig_frozen_pages_user_noprof(). Prototype was for alloc_contig_frozen_pages_user() instead
From: kernel test robot @ 2026-06-05  7:15 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: llvm, oe-kbuild-all, kvm, virtualization, netdev

Hi Michael,

FYI, the error/warning was bisected to this commit, please ignore it if it's irrelevant.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git balloon
head:   b145636d693114d343a0a47a4400f5b31e211fec
commit: 2edc4a8a3ed54fe5fd742e60ff3cba3b7dbd8316 [8/39] mm: add alloc_contig_frozen_pages_user for cache-friendly zeroing
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20260605/202606050959.Rsoiib9n-lkp@intel.com/config)
compiler: clang version 22.0.0git (https://github.com/llvm/llvm-project f43d6834093b19baf79beda8c0337ab020ac5f17)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260605/202606050959.Rsoiib9n-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606050959.Rsoiib9n-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> Warning: mm/page_alloc.c:7270 expecting prototype for alloc_contig_frozen_pages_user_noprof(). Prototype was for alloc_contig_frozen_pages_user() instead

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* [mst-vhost:balloon 12/38] page_alloc.c:undefined reference to `folio_zero_user'
From: kernel test robot @ 2026-06-05  6:00 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: oe-kbuild-all, kvm, virtualization, netdev

Hi Michael,

FYI, the error/warning was bisected to this commit, please ignore it if it's irrelevant.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git balloon
head:   b145636d693114d343a0a47a4400f5b31e211fec
commit: c4af912c65bc897b31264a3403f18b0cf52fa10c [12/38] mm: use folio_zero_user for user pages in post_alloc_hook
config: m68k-allnoconfig (https://download.01.org/0day-ci/archive/20260605/202606051337.Pac8xsTQ-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 15.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260605/202606051337.Pac8xsTQ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606051337.Pac8xsTQ-lkp@intel.com/

All errors (new ones prefixed by >>):

   m68k-linux-ld: mm/page_alloc.o: in function `post_alloc_hook':
>> page_alloc.c:(.text+0x3b42): undefined reference to `folio_zero_user'
   m68k-linux-ld: mm/page_alloc.o: in function `get_page_from_freelist':
   page_alloc.c:(.text+0x546a): undefined reference to `folio_zero_user'
>> m68k-linux-ld: page_alloc.c:(.text+0x550c): undefined reference to `folio_zero_user'

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* [PATCH] tools/virtio: check mmap return value in vringh_test
From: longlong yan @ 2026-06-05  2:14 UTC (permalink / raw)
  To: mst, jasowang
  Cc: virtualization, xuanzhuo, eperezma, linux-kernel, longlong yan

In parallel_test(), the return values of mmap() for both host_map and
guest_map are not checked against MAP_FAILED. If mmap() fails, the
subsequent code will dereference the invalid pointer, leading to a
segmentation fault.

Add MAP_FAILED checks after both mmap() calls, using err() to report
the error and exit, consistent with the existing error handling style
in this file (e.g., the open() call on line 149).

Fixes: 1515c5ce26ae ("tools/virtio: add vring_test.")
Signed-off-by: longlong yan <yanlonglong@kylinos.cn>
---
 tools/virtio/vringh_test.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/virtio/vringh_test.c b/tools/virtio/vringh_test.c
index b9591223437a..8cc5ca6c7cca 100644
--- a/tools/virtio/vringh_test.c
+++ b/tools/virtio/vringh_test.c
@@ -159,7 +159,12 @@ static int parallel_test(u64 features,
 
 	/* Parent and child use separate addresses, to check our mapping logic! */
 	host_map = mmap(NULL, mapsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+	if (host_map == MAP_FAILED)
+		err(1, "mmap host_map");
+	
 	guest_map = mmap(NULL, mapsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+	if (guest_map == MAP_FAILED)
+		err(1, "mmap guest_map");
 
 	pipe_ret = pipe(to_guest);
 	assert(!pipe_ret);
-- 
2.43.0
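
As a standalone illustration of the check this patch adds, the sketch below
wraps mmap() in a helper that exits on failure. The helper name
map_shared_or_die() is hypothetical and is not part of vringh_test.c; the
point it shows is that mmap() reports failure with MAP_FAILED rather than
NULL, so a NULL test would not catch the error.

```c
#include <err.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map 'size' bytes of 'fd' shared, or exit with a message. */
static void *map_shared_or_die(int fd, size_t size, const char *what)
{
	void *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	/* mmap() signals failure with MAP_FAILED, i.e. (void *)-1, never NULL. */
	if (map == MAP_FAILED)
		err(1, "mmap %s", what);
	return map;
}
```

Calling the helper twice on the same file descriptor yields two distinct
addresses backed by the same pages, which is the setup parallel_test() relies
on.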


^ permalink raw reply related

* [mst-vhost:balloon 12/38] ld.lld: error: undefined symbol: folio_zero_user
From: kernel test robot @ 2026-06-05  1:23 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: llvm, oe-kbuild-all, kvm, virtualization, netdev

Hi Michael,

FYI, the error/warning was bisected to this commit, please ignore it if it's irrelevant.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git balloon
head:   b145636d693114d343a0a47a4400f5b31e211fec
commit: c4af912c65bc897b31264a3403f18b0cf52fa10c [12/38] mm: use folio_zero_user for user pages in post_alloc_hook
config: arm-allnoconfig (https://download.01.org/0day-ci/archive/20260605/202606050942.HHhnHpHF-lkp@intel.com/config)
compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project 7917772d7d61384696c61102c08c2ea158e610fa)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260605/202606050942.HHhnHpHF-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606050942.HHhnHpHF-lkp@intel.com/

All errors (new ones prefixed by >>):

>> ld.lld: error: undefined symbol: folio_zero_user
   >>> referenced by page_alloc.c
   >>>               mm/page_alloc.o:(post_alloc_hook) in archive vmlinux.a
   >>> referenced by page_alloc.c
   >>>               mm/page_alloc.o:(get_page_from_freelist) in archive vmlinux.a

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [PATCH net] vhost/net: complete zerocopy ubufs only once
From: Michael S. Tsirkin @ 2026-06-04 23:14 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Qing Ming, Jason Wang, Eugenio Pérez, Shirley,
	David S. Miller, kvm, virtualization, netdev, linux-kernel
In-Reply-To: <b6182472-503c-43e9-a850-87cd40f86aa8@redhat.com>

On Thu, Jun 04, 2026 at 12:55:49PM +0200, Paolo Abeni wrote:
> On 6/1/26 12:43 PM, Qing Ming wrote:
> > vhost-net initializes one ubuf_info per outstanding zerocopy TX
> > descriptor and hands it to the backend socket.  The networking stack may
> > then clone a zerocopy skb before all skb references are released.  For
> > example, batman-adv fragmentation reaches skb_split(), which calls
> > skb_zerocopy_clone() and increments the same ubuf_info refcount.
> > 
> > vhost_zerocopy_complete() currently treats every ubuf callback as a
> > completed vhost descriptor.  It dereferences ubuf->ctx, writes the
> > descriptor completion state, and drops the vhost_net_ubuf_ref even when
> > the callback only releases a cloned skb reference.  A backend reset can
> > therefore wait for and free the vhost_net_ubuf_ref while another cloned
> > skb still carries the same ubuf_info.  A later completion then
> > dereferences the freed ubufs pointer.
> > 
> > KASAN reports the stale completion as:
> > 
> >   BUG: KASAN: slab-use-after-free in vhost_zerocopy_complete+0x1d7/0x1f0
> >   BUG: KASAN: slab-use-after-free in vhost_zerocopy_complete+0x101/0x1f0
> >   vhost_zerocopy_complete
> >   skb_copy_ubufs
> >   __dev_forward_skb2
> >   veth_xmit
> > 
> > The freed object was allocated from vhost_net_ioctl() while setting the
> > backend and freed through kfree_rcu()/kvfree_rcu_bulk after backend
> > removal, while delayed skb completion still reached
> > vhost_zerocopy_complete().
> > 
> > Honor the generic ubuf_info refcount before touching vhost state, and run
> > the vhost descriptor completion only for the final ubuf reference.  This
> > matches the msg_zerocopy_complete() ownership rule for cloned zerocopy
> > skbs.
> > 
> > Fixes: bab632d69ee4 ("vhost: vhost TX zero-copy support")
> > Signed-off-by: Qing Ming <a0yami@mailbox.org>
> 
> The patch LGTM.
> 
> @Michael: do you want to take it via your tree?
> 
> /P


I wasn't copied) Alright then.
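
For reference, the ownership rule in the quoted commit message (complete the
descriptor only when the final reference is dropped) can be modelled in a few
lines. This is a single-threaded sketch with hypothetical names (struct
ubuf_model, ubuf_model_get(), ubuf_model_put()); it is not the vhost code,
and real kernel code would need an atomic reference count rather than a
plain int.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a zerocopy completion handle. */
struct ubuf_model {
	int refcnt;      /* one reference per skb (or clone) still in flight */
	int completions; /* how many times the descriptor was completed */
};

/* Taken when an skb carrying the handle is cloned or split. */
static void ubuf_model_get(struct ubuf_model *ubuf)
{
	ubuf->refcnt++;
}

/*
 * Called once per released skb reference. Only the final put may
 * touch descriptor state; earlier puts just drop their reference.
 */
static bool ubuf_model_put(struct ubuf_model *ubuf)
{
	if (--ubuf->refcnt > 0)
		return false;

	ubuf->completions++; /* descriptor-level completion, exactly once */
	return true;
}
```

With one extra reference taken for a clone, the first put returns false and
leaves the completion count at zero; only the second put completes the
descriptor.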



^ permalink raw reply

* [PATCH] can: virtio: Fix comment in UAPI header
From: Nathan Chancellor @ 2026-06-04 22:57 UTC (permalink / raw)
  To: Harald Mommer, Matias Ezequiel Vara Larsen, Michael S. Tsirkin,
	Jason Wang
  Cc: Xuan Zhuo, Eugenio Pérez, Mikhail Golubev-Ciuchea,
	Marc Kleine-Budde, Francesco Valla, virtualization, linux-can,
	linux-kernel, llvm, Nathan Chancellor

When compile testing the UAPI headers with clang, there is a warning turned
error for using a C++ style ('//') comment, which is explicitly forbidden for
UAPI headers.

  In file included from <built-in>:1:
  ./usr/include/linux/virtio_can.h:29:35: error: // comments are not allowed in this language [-Werror,-Wcomment]
     29 | #define VIRTIO_CAN_MAX_DLEN    64 // this is like CANFD_MAX_DLEN
        |                                   ^
  1 error generated.

Switch to a standard C style comment.

Fixes: 2b6b4bb7d96f ("can: virtio: Add virtio CAN driver")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
---
 include/uapi/linux/virtio_can.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_can.h b/include/uapi/linux/virtio_can.h
index 08d7e3e78776..e054d5099241 100644
--- a/include/uapi/linux/virtio_can.h
+++ b/include/uapi/linux/virtio_can.h
@@ -26,7 +26,7 @@
 #define VIRTIO_CAN_FLAGS_FD             0x4000
 #define VIRTIO_CAN_FLAGS_RTR            0x2000
 
-#define VIRTIO_CAN_MAX_DLEN    64 // this is like CANFD_MAX_DLEN
+#define VIRTIO_CAN_MAX_DLEN    64 /* this is like CANFD_MAX_DLEN */
 
 struct virtio_can_config {
 #define VIRTIO_CAN_S_CTRL_BUSOFF (1u << 0) /* Controller BusOff */

---
base-commit: 7a85231f762aa97b945878abb9a26683486836c6
change-id: 20260604-virtio_can-fix-uapi-comment-bcafa08b4b8a

Best regards,
--  
Cheers,
Nathan


^ permalink raw reply related

* [CfP] Confidential Computing Microconference (LPC 2026)
From: Jörg Rödel @ 2026-06-04 18:29 UTC (permalink / raw)
  To: linux-coco, linux-kernel, kvm, virtualization, coconut-svsm,
	linux-sgx
  Cc: Dhaval Giani

Hi everyone,

We are pleased to officially open the Call for Presentations for the 2026
Confidential Computing Microconference at LPC this year. LPC will take place
from October 5th to 7th in Prague.

Our goal is to bring open-source developers and industry experts together for
productive discussions that lead to concrete solutions. We are looking for
interactive discussions, ongoing development topics, and problem-solving
sessions rather than static status updates.

We are looking for proposals covering, but not limited to, the following
developments and challenges:

	* Enhancements to CVM memory backing via `guest_memfd`
	* KVM Support for ARM CCA
	* Privilege separation features in KVM
	* CVM live migration
	* Secure VM Service Module (SVSM) architecture and Linux support
	* Trusted I/O software architecture
	* Solutions for the full CVM (remote) attestation problem
	* Linux as a CVM operating system across hypervisors
	* CVM Performance optimization and benchmarking

If you are working on any of these areas or have another critical Confidential
Computing topic that requires community alignment, please submit your proposal!

LPC microconferences are built around discussion and collaboration. Proposals
should focus on open problems, architectural roadblocks, or design choices that
would benefit from in-person feedback from the community.

* Submit here:		https://lpc.events/event/20/abstracts/
* Submission Deadline:	August 7, 2026

Make sure to select "Confidential Computing MC" as the track! This year the LPC
organization committee will grant pre-registration vouchers to anyone who has
submitted a topic. These are at the usual price ($600) and must be used
before registration opens. If your topic is not accepted you should be eligible
for a refund if your employer doesn’t approve your travel. For more details see

	https://lpc.events/blog/current/index.php/2026/04/06/changes-to-registration-availability-for-2026/

Looking forward to seeing you there,

- Dhaval and Joerg

^ permalink raw reply

* [mst-vhost:balloon 8/38] Warning: mm/page_alloc.c:7002 function parameter 'user_addr' not described in '__alloc_contig_frozen_range'
From: kernel test robot @ 2026-06-04 16:51 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: llvm, oe-kbuild-all, kvm, virtualization, netdev

Hi Michael,

FYI, the error/warning was bisected to this commit, please ignore it if it's irrelevant.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git balloon
head:   ffd91da19b20a15bbcff7438ef1ad0031c110395
commit: c2b251ca61f1a3911a1c9ea9d1a5511a4da1aa1c [8/38] mm: add alloc_contig_frozen_pages_user for cache-friendly zeroing
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20260604/202606041824.6UhiUn3N-lkp@intel.com/config)
compiler: clang version 22.0.0git (https://github.com/llvm/llvm-project f43d6834093b19baf79beda8c0337ab020ac5f17)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260604/202606041824.6UhiUn3N-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606041824.6UhiUn3N-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> Warning: mm/page_alloc.c:7002 function parameter 'user_addr' not described in '__alloc_contig_frozen_range'
>> Warning: mm/page_alloc.c:7002 expecting prototype for alloc_contig_frozen_range(). Prototype was for __alloc_contig_frozen_range() instead
>> Warning: mm/page_alloc.c:7266 function parameter 'user_addr' not described in 'alloc_contig_frozen_pages_user'
>> Warning: mm/page_alloc.c:7266 expecting prototype for alloc_contig_frozen_pages(). Prototype was for alloc_contig_frozen_pages_user() instead
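
Both kinds of warning above come from kernel-doc comments that do not match
the function they precede. As a rough sketch of the expected shape, using a
hypothetical function (scale_by_two() is not from mm/page_alloc.c): the name
on the first line of the comment matches the function that follows, and each
parameter has its own @name line.

```c
/**
 * scale_by_two() - Double a value.
 * @value: number to double
 *
 * Return: @value multiplied by two.
 */
static int scale_by_two(int value)
{
	return value * 2;
}
```

Renaming the function without updating the first comment line, or adding a
parameter without an @ line, would be expected to produce warnings of the two
forms listed in this report.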

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* [mst-vhost:vhost 5/37] ./usr/include/linux/virtio_can.h:29:35: error: // comments are not allowed in this language
From: kernel test robot @ 2026-06-04 16:31 UTC (permalink / raw)
  To: Matias Ezequiel Vara Larsen
  Cc: llvm, oe-kbuild-all, kvm, virtualization, netdev,
	Michael S. Tsirkin, Harald Mommer, Mikhail Golubev-Ciuchea,
	Marc Kleine-Budde, Francesco Valla

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost
head:   7a85231f762aa97b945878abb9a26683486836c6
commit: 2b6b4bb7d96f407e42bfe28c4cd9ce8f7a59ab69 [5/37] can: virtio: Add virtio CAN driver
config: x86_64-rhel-9.4-rust (https://download.01.org/0day-ci/archive/20260604/202606041812.lddXUy2x-lkp@intel.com/config)
compiler: clang version 22.0.0git (https://github.com/llvm/llvm-project f43d6834093b19baf79beda8c0337ab020ac5f17)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260604/202606041812.lddXUy2x-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606041812.lddXUy2x-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from <built-in>:1:
>> ./usr/include/linux/virtio_can.h:29:35: error: // comments are not allowed in this language [-Werror,-Wcomment]
      29 | #define VIRTIO_CAN_MAX_DLEN    64 // this is like CANFD_MAX_DLEN
         |                                   ^
   1 error generated.

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [PATCH 2/2] iommu/virtio: Avoid using the list iterator past the loop in viommu_add_resv_mem()
From: Jean-Philippe Brucker @ 2026-06-04 16:04 UTC (permalink / raw)
  To: Maoyi Xie; +Cc: joro, will, robin.murphy, iommu, linux-kernel, virtualization
In-Reply-To: <20260604051816.2976221-3-maoyixie.tju@gmail.com>

On Thu, Jun 04, 2026 at 01:18:16PM +0800, Maoyi Xie wrote:
> viommu_add_resv_mem() walks vdev->resv_regions to find the insertion
> point. When every element has a smaller start address, the
> list_for_each_entry() iterator ends up one past the last entry, and
> &next->list then aliases the list head, so the following list_add_tail()
> still appends at the tail. The result is correct, but using the iterator
> after the loop is undefined per the list_for_each_entry() contract.

Thank you for removing this hack, though I don't find a contract in the
list_for_each_entry() doc, and the fix still accesses a cursor outside the
loop. Since you mentioned C11 UB in another email, do you have more info
on the precise operation which is undefined in the kernel (container_of
into an invalid object or the &next->list addition)?  Just so I can avoid
it in the future.

Anyway, thanks for the patch. If this is just general cleanup that's fine
too.

Reviewed-by: Jean-Philippe Brucker <jpb@kernel.org>

> 
> The loop only needs a list_head as the insertion point, so iterate with
> list_for_each() and keep the typed list_entry() dereference inside the loop
> body. No functional change.
> 
> Suggested-by: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>

> ---
>  drivers/iommu/virtio-iommu.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 587fc13197f1..1d58d6b626a5 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -486,7 +486,8 @@ static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
>  	size_t size;
>  	u64 start64, end64;
>  	phys_addr_t start, end;
> -	struct iommu_resv_region *region = NULL, *next;
> +	struct iommu_resv_region *region = NULL;
> +	struct list_head *pos;
>  	unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
>  
>  	start = start64 = le64_to_cpu(mem->start);
> @@ -520,11 +521,14 @@ static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
>  		return -ENOMEM;
>  
>  	/* Keep the list sorted */
> -	list_for_each_entry(next, &vdev->resv_regions, list) {
> +	list_for_each(pos, &vdev->resv_regions) {
> +		struct iommu_resv_region *next =
> +			list_entry(pos, struct iommu_resv_region, list);
> +
>  		if (next->start > region->start)
>  			break;
>  	}
> -	list_add_tail(&region->list, &next->list);
> +	list_add_tail(&region->list, pos);
>  	return 0;
>  }
>  
> -- 
> 2.34.1
> 

^ permalink raw reply
