Linux virtualization list


* [BUG] crypto: virtio - KASAN slab-use-after-free in virtio_crypto_skcipher_encrypt
From: Shuangpeng Bai @ 2026-06-15  2:10 UTC
  To: arei.gonglei, mst, jasowang, xuanzhuo, eperezma, herbert, davem,
	virtualization, linux-crypto, linux-kernel

Hi,

I hit the following KASAN report while testing the current upstream kernel.

The issue was reproduced by queuing an AF_ALG skcipher request backed by
virtio-crypto, unbinding virtio0 from the virtio_crypto driver, and then
receiving from the old AF_ALG op fd.

KASAN: slab-use-after-free in virtio_crypto_skcipher_encrypt

I reproduced this on commit: e8c2f9fdadee7cbc75134dc463c1e0d856d6e5c7 (May 25 2026)

The reproducer and .config files are here.
https://gist.github.com/shuangpengbai/f6117a0883dd574f02288ca812bb7d65

I'm happy to test debug patches or provide additional information.

Reported-by: Shuangpeng Bai <shuangpeng.kernel@gmail.com>

[   54.367992][ T8332] BUG: KASAN: slab-use-after-free in virtio_crypto_skcipher_encrypt (drivers/crypto/virtio/virtio_crypto_skcipher_algs.c:473)
[   54.369596][ T8332] Read of size 8 at addr ffff888124a47010 by task virtio_crypto_a/8332
[   54.370922][ T8332]
[   54.371171][ T8332] Tainted: [W]=WARN
[   54.371172][ T8332] Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   54.371175][ T8332] Call Trace:
[   54.371179][ T8332]  <TASK>
[   54.371181][ T8332]  dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120)
[   54.371188][ T8332]  print_report (mm/kasan/report.c:378 mm/kasan/report.c:482)
[   54.371202][ T8332]  kasan_report (mm/kasan/report.c:595)
[   54.371213][ T8332]  virtio_crypto_skcipher_encrypt (drivers/crypto/virtio/virtio_crypto_skcipher_algs.c:473)
[   54.371216][ T8332]  skcipher_recvmsg (crypto/algif_skcipher.c:203 crypto/algif_skcipher.c:226)
[   54.371249][ T8332]  sock_recvmsg (net/socket.c:1137 net/socket.c:1159)
[   54.371253][ T8332]  __sys_recvfrom (net/socket.c:2315)
[   54.371273][ T8332]  __x64_sys_recvfrom (net/socket.c:2330 net/socket.c:2326 net/socket.c:2326)
[   54.371277][ T8332]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
[   54.371281][ T8332]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
[   54.371285][ T8332] RIP: 0033:0x7f3c6caaac2c
[   54.371289][ T8332] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 19 45 31 c9 45 31 c0 b8 2d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 64 c3 0f 1f 00 55 48 83 ec 20 48 89 54 24 10
[   54.371292][ T8332] RSP: 002b:00007ffed3785308 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
[   54.371297][ T8332] RAX: ffffffffffffffda RBX: 0000000000000064 RCX: 00007f3c6caaac2c
[   54.371299][ T8332] RDX: 0000000000000040 RSI: 00007ffed37853a0 RDI: 0000000000000004
[   54.371301][ T8332] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[   54.371303][ T8332] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[   54.371305][ T8332] R13: 00007ffed37853a0 R14: 0000558cc9904118 R15: 0000000000000000
[   54.371309][ T8332]  </TASK>
[   54.371311][ T8332]
[   54.394932][ T8332] Freed by task 8332 on cpu 0 at 54.364772s:
[   54.395528][ T8332]  kasan_save_track (mm/kasan/common.c:57 mm/kasan/common.c:78)
[   54.395997][ T8332]  kasan_save_free_info (mm/kasan/generic.c:584)
[   54.396501][ T8332]  __kasan_slab_free (mm/kasan/common.c:253 mm/kasan/common.c:285)
[   54.396983][ T8332]  kfree (include/linux/kasan.h:235 mm/slub.c:2689 mm/slub.c:6251 mm/slub.c:6566)
[   54.397378][ T8332]  virtio_dev_remove (drivers/virtio/virtio.c:375)
[   54.397869][ T8332]  device_release_driver_internal (drivers/base/dd.c:619 drivers/base/dd.c:1352 drivers/base/dd.c:1375)
[   54.398475][ T8332]  unbind_store (drivers/base/bus.c:244)
[   54.398944][ T8332]  kernfs_fop_write_iter (fs/kernfs/file.c:352)
[   54.399476][ T8332]  vfs_write (fs/read_write.c:595 fs/read_write.c:688)
[   54.399915][ T8332]  ksys_write (fs/read_write.c:740)
[   54.400349][ T8332]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
[   54.400818][ T8332]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
[   54.401406][ T8332]
[   54.401650][ T8332] The buggy address belongs to the object at ffff888124a47000
[   54.401650][ T8332]  which belongs to the cache kmalloc-192 of size 192
[   54.403038][ T8332] The buggy address is located 16 bytes inside of
[   54.403038][ T8332]  freed 192-byte region [ffff888124a47000, ffff888124a470c0)
[   54.404385][ T8332]


Best,
Shuangpeng

* [BUG] KASAN: slab-use-after-free in port_fops_splice_write
From: Shuangpeng Bai @ 2026-06-14 22:29 UTC
  To: amit, arnd, gregkh, virtualization, linux-kernel

Hi Kernel Maintainers,

I hit the following report while testing the current upstream kernel:

KASAN: slab-use-after-free in port_fops_splice_write

I reproduced this on commit: e8c2f9fdadee7cbc75134dc463c1e0d856d6e5c7 (May 25 2026)

The reproducer and .config files are here.
https://gist.github.com/shuangpengbai/d21a9a20a05cd840b99ea90101888879

I'm happy to test debug patches or provide additional information.

Reported-by: Shuangpeng Bai <shuangpeng.kernel@gmail.com>

[   78.120298][ T8329] BUG: KASAN: slab-use-after-free in port_fops_splice_write (drivers/char/virtio_console.c:922)
[   78.121417][ T8329] Read of size 8 at addr ffff8881158d7020 by task virtconsole_spl/8329
[   78.122247][ T8329]
[   78.122502][ T8329] Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   78.122505][ T8329] Call Trace:
[   78.122508][ T8329]  <TASK>
[   78.122510][ T8329]  dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120)
[   78.123695][ T8329]  print_report (mm/kasan/report.c:378 mm/kasan/report.c:482)
[   78.123714][ T8329]  kasan_report (mm/kasan/report.c:595)
[   78.123722][ T8329]  port_fops_splice_write (drivers/char/virtio_console.c:922)
[   78.123754][ T8329]  do_splice (fs/splice.c:936 fs/splice.c:1349)
[   78.123773][ T8329]  __se_sys_splice (fs/splice.c:1431 fs/splice.c:1634 fs/splice.c:1616)
[   78.123782][ T8329]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
[   78.123795][ T8329]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
[   78.123799][ T8329] RIP: 0033:0x7f875ebb2f29
[   78.123805][ T8329] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 37 8f 0d 00 f7 d8 64 89 01 48
[   78.123808][ T8329] RSP: 002b:00007ffdd2abad88 EFLAGS: 00000206 ORIG_RAX: 0000000000000113
[   78.123815][ T8329] RAX: ffffffffffffffda RBX: 00007f875eabb6c0 RCX: 00007f875ebb2f29
[   78.123818][ T8329] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000004
[   78.123819][ T8329] RBP: 00007ffdd2abbdb0 R08: 0000000000001000 R09: 0000000000000000
[   78.123822][ T8329] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
[   78.123823][ T8329] R13: 00007ffdd2abbdd7 R14: 0000000000000000 R15: 0000000000000000
[   78.123828][ T8329]  </TASK>
[   78.123830][ T8329]
[   78.144834][ T8329] Freed by task 8329 on cpu 1 at 78.116533s:
[   78.145445][ T8329]  kasan_save_track (mm/kasan/common.c:57 mm/kasan/common.c:78)
[   78.145926][ T8329]  kasan_save_free_info (mm/kasan/generic.c:584)
[   78.146445][ T8329]  __kasan_slab_free (mm/kasan/common.c:253 mm/kasan/common.c:285)
[   78.146936][ T8329]  kfree (include/linux/kasan.h:235 mm/slub.c:2689 mm/slub.c:6251 mm/slub.c:6566)
[   78.147342][ T8329]  vp_del_vqs (drivers/virtio/virtio_pci_common.c:259 drivers/virtio/virtio_pci_common.c:285)
[   78.147779][ T8329]  remove_vqs (drivers/char/virtio_console.c:1895)
[   78.148226][ T8329]  virtcons_remove (drivers/char/virtio_console.c:1939)
[   78.148717][ T8329]  virtio_dev_remove (drivers/virtio/virtio.c:375)
[   78.149220][ T8329]  device_release_driver_internal (drivers/base/dd.c:619 drivers/base/dd.c:1352 drivers/base/dd.c:1375)
[   78.149821][ T8329]  unbind_store (drivers/base/bus.c:244)
[   78.150261][ T8329]  kernfs_fop_write_iter (fs/kernfs/file.c:352)
[   78.150762][ T8329]  vfs_write (fs/read_write.c:595 fs/read_write.c:688)
[   78.151167][ T8329]  ksys_write (fs/read_write.c:740)
[   78.151579][ T8329]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
[   78.152019][ T8329]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
[   78.152581][ T8329]
[   78.152811][ T8329] The buggy address belongs to the object at ffff8881158d7000
[   78.152811][ T8329]  which belongs to the cache kmalloc-256 of size 256
[   78.154115][ T8329] The buggy address is located 32 bytes inside of
[   78.154115][ T8329]  freed 256-byte region [ffff8881158d7000, ffff8881158d7100)
[   78.155397][ T8329]


Best,
Shuangpeng

* Re: [PATCH v1] s390/virtio_ccw: Also suppress -EINVAL on device detach
From: Halil Pasic @ 2026-06-14 22:23 UTC
  To: William Bezenah
  Cc: linux-s390, cohuck, farman, hca, gor, agordeev, borntraeger,
	svens, mjrosato, vneethv, oberpar, virtualization, kvm,
	linux-kernel, Halil Pasic
In-Reply-To: <20260612155407.199218-1-wbezenah@linux.ibm.com>

On Fri, 12 Jun 2026 17:54:07 +0200
William Bezenah <wbezenah@linux.ibm.com> wrote:

> Since commit 8c58a229688c ("s390/cio: Do not unregister the
> subchannel based on DNV"), subchannel behavior following a device
> detach has been updated and results in -EINVAL being propagated
> rather than -ENODEV, originating from ccw_device_start_timeout_key()
> in cio/device_ops. In the end, the virtio driver has no ability to
> react to the difference between device and subchannel states here,
> and during detach, both -ENODEV and -EINVAL indicate the device
> cannot be used and should not be treated as errors requiring
> attention. Update error handling in virtio_ccw_del_vq() and
> virtio_ccw_drop_indicator() to suppress -EINVAL in addition to
> -ENODEV.

Hi William!

Are you saying that ccw_device_start() started returning -EINVAL
since 8c58a229688c ("s390/cio: Do not unregister the subchannel based on
DNV")? Or did I somehow read the paragraph wrong?

The function ccw_device_start() is documented to return:
 * Returns:
 *  %0, if the operation was successful;
 *  -%EBUSY, if the device is busy, or status pending;
 *  -%EACCES, if no path specified in @lpm is operational;
 *  -%ENODEV, if the device is not operational.
and the commit message does not say a thing about introducing -EINVAL to
the mix.

Regards,
Halil 

* [BUG] KASAN: slab-use-after-free in mutex_lock from vduse
From: Shuangpeng Bai @ 2026-06-14 22:11 UTC
  To: mst, jasowang, xuanzhuo, eperezma, xieyongji, virtualization,
	linux-kernel

Hi Kernel Maintainers,

I hit the following report while testing the current upstream kernel:

KASAN: slab-use-after-free in mutex_lock from vduse

I reproduced this on commit: e8c2f9fdadee7cbc75134dc463c1e0d856d6e5c7 (May 25 2026)

To help trigger the bug more reliably, we applied a minimal diagnostic patch
that only adds delays and print statements.

The reproducer and .config files are here.
https://gist.github.com/shuangpengbai/947d604f1b5d86d8b7a3c7a4000455ad

I'm happy to test debug patches or provide additional information.

Reported-by: Shuangpeng Bai <shuangpeng.kernel@gmail.com>

[  111.459938][ T8370] BUG: KASAN: slab-use-after-free in mutex_lock (include/linux/instrumented.h:112 include/linux/atomic/atomic-instrumented.h:4456 kernel/locking/mutex.c:161 kernel/locking/mutex.c:318)
[  111.460790][ T8370] Write of size 8 at addr ffff88811bb7a028 by task vduse_open_dest/8370
[  111.461901][ T8370]
[  111.462229][ T8370] Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[  111.462233][ T8370] Call Trace:
[  111.462238][ T8370]  <TASK>
[  111.462240][ T8370]  dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120)
[  111.462249][ T8370]  print_report (mm/kasan/report.c:378 mm/kasan/report.c:482)
[  111.462265][ T8370]  kasan_report (mm/kasan/report.c:595)
[  111.462285][ T8370]  kasan_check_range (mm/kasan/generic.c:? mm/kasan/generic.c:200)
[  111.462290][ T8370]  mutex_lock (include/linux/instrumented.h:112 include/linux/atomic/atomic-instrumented.h:4456 kernel/locking/mutex.c:161 kernel/locking/mutex.c:318)
[  111.462311][ T8370]  vduse_dev_open (drivers/vdpa/vdpa_user/vduse_dev.c:1663)
[  111.462327][ T8370]  chrdev_open (fs/char_dev.c:411)
[  111.462347][ T8370]  do_dentry_open (fs/open.c:947)
[  111.462356][ T8370]  vfs_open (fs/open.c:1079)
[  111.462361][ T8370]  path_openat (fs/namei.c:4699 fs/namei.c:4858)
[  111.462384][ T8370]  do_file_open (fs/namei.c:4887)
[  111.462436][ T8370]  do_sys_openat2 (fs/open.c:1364)
[  111.462462][ T8370]  __x64_sys_openat (fs/open.c:1370 fs/open.c:1386 fs/open.c:1381 fs/open.c:1381)
[  111.462479][ T8370]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
[  111.462486][ T8370]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
[  111.462492][ T8370] RIP: 0033:0x7facc56097e4
[  111.462497][ T8370] Code: 84 00 00 00 00 00 44 89 54 24 0c e8 36 f5 ff ff 44 8b 54 24 0c 44 89 e2 48 89 ee 41 89 c0 bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 89 44 24 0c e8 68 f5 ff ff 8b 44
[  111.462502][ T8370] RSP: 002b:00007facc2418e70 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[  111.462509][ T8370] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007facc56097e4
[  111.462512][ T8370] RDX: 0000000000080002 RSI: 00007ffe00ff5eb0 RDI: 00000000ffffff9c
[  111.462515][ T8370] RBP: 00007ffe00ff5eb0 R08: 0000000000000000 R09: 00007facc2419700
[  111.462518][ T8370] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000080002
[  111.462520][ T8370] R13: 00007ffe00ff5ccf R14: 00007facc2418fc0 R15: 0000000000802000
[  111.462526][ T8370]  </TASK>
[  111.462528][ T8370]
[  111.465710][ T8370] Freed by task 8361 on cpu 1 at 111.442076s:
[  111.465718][ T8370]  kasan_save_track (mm/kasan/common.c:57 mm/kasan/common.c:78)
[  111.467048][ T8370]  kasan_save_free_info (mm/kasan/generic.c:584)
[  111.467993][ T8370]  __kasan_slab_free (mm/kasan/common.c:253 mm/kasan/common.c:285)
[  111.469000][ T8370]  kfree (include/linux/kasan.h:235 mm/slub.c:2689 mm/slub.c:6251 mm/slub.c:6566)
[  111.469887][ T8370]  vduse_ioctl (drivers/vdpa/vdpa_user/vduse_dev.c:1866 drivers/vdpa/vdpa_user/vduse_dev.c:1908 drivers/vdpa/vdpa_user/vduse_dev.c:2208)
[  111.470848][ T8370]  __se_sys_ioctl (fs/ioctl.c:51 fs/ioctl.c:597 fs/ioctl.c:583)
[  111.471775][ T8370]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
[  111.472940][ T8370]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
[  111.473875][ T8370]
[  111.475022][ T8370] The buggy address belongs to the object at ffff88811bb7a000
[  111.475022][ T8370]  which belongs to the cache kmalloc-512 of size 512
[  111.475948][ T8370] The buggy address is located 40 bytes inside of
[  111.475948][ T8370]  freed 512-byte region [ffff88811bb7a000, ffff88811bb7a200)
[  111.476936][ T8370]


Best,
Shuangpeng

* Re: [PATCH net v3] virtio-net: fix len check in receive_big()
From: Michael S. Tsirkin @ 2026-06-14 19:29 UTC
  To: Xiang Mei
  Cc: jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
	kuba, pabeni, netdev, virtualization, linux-kernel,
	minhquangbui99, bestswngs
In-Reply-To: <CAPpSM+Q=NM0WeBmZvyOEkyx73VU10AOHQjPbWDSet675B7AnCA@mail.gmail.com>

On Sat, Jun 13, 2026 at 01:15:02PM -0700, Xiang Mei wrote:
> On Wed, Jun 10, 2026 at 10:56 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jun 10, 2026 at 07:46:16PM -0700, Xiang Mei wrote:
> > > receive_big() bounds the device-announced length by
> > > (big_packets_num_skbfrags + 1) * PAGE_SIZE.  That is still too loose:
> > > add_recvbuf_big() sets sg[1] to start at offset
> > > sizeof(struct padded_vnet_hdr) into the first page, so the chain
> > > actually carries hdr_len + (PAGE_SIZE - sizeof(padded_vnet_hdr)) +
> > > big_packets_num_skbfrags * PAGE_SIZE bytes -- 20 bytes less than the
> > > check allows for the common hdr_len == 12 case.
> > >
> > > A malicious virtio backend can announce a len in that gap.  page_to_skb()
> > > then walks one frag past the page chain, storing a NULL page->private
> > > into skb_shinfo()->frags[MAX_SKB_FRAGS], which is both an out-of-bounds
> > > write past the static frag array and a NULL frag handed up the rx path.
> > >
> > > Bound len by the size add_recvbuf_big() actually advertised.
> > >
> > > Fixes: 0c716703965f ("virtio-net: fix received length check in big packets")
> > > Reported-by: Weiming Shi <bestswngs@gmail.com>
> > > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> > > Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> >
> > Thanks for the patch! Something small to improve:
> >
> > > ---
> > > v3: revoke 2/2 and add Xuan Zhuo's Reviewed-by tag
> > >
> > >  drivers/net/virtio_net.c | 8 +++++---
> > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index f4adcfee7a80..afe73eda1491 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -1999,15 +1999,17 @@ static struct sk_buff *receive_big(struct net_device *dev,
> > >                                  struct virtnet_rq_stats *stats)
> > >  {
> > >       struct page *page = buf;
> > > +     unsigned long max_len;
> >
> > Assignment can happen here?
> >
> > >       struct sk_buff *skb;
> > >
> > >       /* Make sure that len does not exceed the size allocated in
> > >        * add_recvbuf_big.
> > >        */
> > > -     if (unlikely(len > (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE)) {
> > > +     max_len = vi->hdr_len + (PAGE_SIZE - sizeof(struct padded_vnet_hdr)) +
> > > +               vi->big_packets_num_skbfrags * PAGE_SIZE;
> >
> > Took me a while to figure out what is going on, but I finally
> > understand:
> >
> >
> > Reducing
> > (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE
> >
> > (what we allocated)
> >
> > by sizeof(struct padded_vnet_hdr) - vi->hdr_len
> >
> >
> > right?
> >
> > So clearer as:
> >
> >
> >         unsigned long max_len = (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE -
> >         sizeof(struct padded_vnet_hdr) + vi->hdr_len;
> >
> Right, that's the same value. Yours reads better!
> 
> I'll fold this into the next respin. One thing I'd like to settle
> first: David suggested storing this in a vi field computed once at the
> probe (it's a per-device constant) and just comparing len against it
> on the datapath, instead of re-deriving it in receive_big() each time.
> I'll wait for his take on that and send a single v4 that covers both.
> 
> Xiang

I don't mind.

> >
> >
> >
> > > +     if (unlikely(len > max_len)) {
> > >               pr_debug("%s: rx error: len %u exceeds allocated size %lu\n",
> > > -                      dev->name, len,
> > > -                      (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE);
> > > +                      dev->name, len, max_len);
> > >               goto err;
> > >       }
> > >
> > > --
> > > 2.43.0
> >
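
The equivalence discussed above can be checked with a small user-space
sketch. This is not the driver code: the page size is fixed at 4096, and
the size of struct padded_vnet_hdr is passed in as a parameter. All
function and macro names below are hypothetical.

```c
/* Assumption for illustration only; the kernel's PAGE_SIZE is per-arch. */
#define SKETCH_PAGE_SIZE 4096UL

/* Bound used before the patch: every page is counted in full. */
static unsigned long old_bound(unsigned long num_skbfrags)
{
	return (num_skbfrags + 1) * SKETCH_PAGE_SIZE;
}

/* Form used in v3: header + rest of first page + remaining pages. */
static unsigned long max_len_v3(unsigned long num_skbfrags,
				unsigned long hdr_len,
				unsigned long padded_hdr_size)
{
	return hdr_len + (SKETCH_PAGE_SIZE - padded_hdr_size) +
	       num_skbfrags * SKETCH_PAGE_SIZE;
}

/* Form suggested in review: the old bound minus the unused padding. */
static unsigned long max_len_review(unsigned long num_skbfrags,
				    unsigned long hdr_len,
				    unsigned long padded_hdr_size)
{
	return (num_skbfrags + 1) * SKETCH_PAGE_SIZE -
	       padded_hdr_size + hdr_len;
}
```

Both forms return the same value for any inputs where padded_hdr_size
does not exceed the page size. With hdr_len == 12 and an assumed padded
header size of 32 (chosen only because it reproduces the 20-byte gap
quoted in the commit message), old_bound() exceeds the new bound by 20.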


* [PATCH v2] vsock/virtio: rework MSG_ZEROCOPY flag handling
From: Arseniy Krasnov @ 2026-06-14 17:47 UTC
  To: Stefan Hajnoczi, Stefano Garzarella, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Michael S. Tsirkin,
	Jason Wang, Bobby Eshleman, Xuan Zhuo, Eugenio Pérez,
	Simon Horman
  Cc: kvm, virtualization, netdev, linux-kernel, oxffffaa, rulkc,
	Arseniy Krasnov

The MSG_ZEROCOPY handling was logically based on the TCP implementation,
so to make further support easier, rewrite it in the TCP way.

Signed-off-by: Arseniy Krasnov <avkrasnov@rulkc.org>
---
 Changelog v1->v2:
 * Rebase on last 'net-next'. Don't need 'skb_zcopy_set()' now - it was
   already added.

 net/vmw_vsock/virtio_transport_common.c | 48 ++++++++++++-------------
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 09475007165b..787524b8cb44 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -328,38 +328,36 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
 		return pkt_len;
 
-	if (info->msg) {
-		/* If zerocopy is not enabled by 'setsockopt()', we behave as
-		 * there is no MSG_ZEROCOPY flag set.
+	if (info->msg && (info->msg->msg_flags & MSG_ZEROCOPY)) {
+		/* If 'info->msg' is not NULL, this is only VIRTIO_VSOCK_OP_RW.
+		 * 'MSG_ZEROCOPY' flag handling here is based on the same flag
+		 * handling from 'tcp_sendmsg_locked()'.
 		 */
-		if (!sock_flag(sk_vsock(vsk), SOCK_ZEROCOPY))
-			info->msg->msg_flags &= ~MSG_ZEROCOPY;
+		if (info->msg->msg_ubuf) {
+			uarg = info->msg->msg_ubuf;
+			can_zcopy = virtio_transport_can_zcopy(t_ops, info, pkt_len);
+		} else if (sock_flag(sk_vsock(vsk), SOCK_ZEROCOPY)) {
+			uarg = msg_zerocopy_realloc(sk_vsock(vsk), pkt_len,
+						    NULL, false);
+			if (!uarg) {
+				virtio_transport_put_credit(vvs, pkt_len);
+				return -ENOMEM;
+			}
 
-		if (info->msg->msg_flags & MSG_ZEROCOPY)
 			can_zcopy = virtio_transport_can_zcopy(t_ops, info, pkt_len);
 
+			if (!can_zcopy)
+				uarg_to_msgzc(uarg)->zerocopy = 0;
+
+			have_uref = true;
+		}
+
+		/* 'can_zcopy' means that this transmission will be
+		 * in zerocopy way (e.g. using 'frags' array).
+		 */
 		if (can_zcopy)
 			max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
 					    (MAX_SKB_FRAGS * PAGE_SIZE));
-
-		if (info->msg->msg_flags & MSG_ZEROCOPY &&
-		    info->op == VIRTIO_VSOCK_OP_RW) {
-			uarg = info->msg->msg_ubuf;
-
-			if (!uarg) {
-				uarg = msg_zerocopy_realloc(sk_vsock(vsk),
-							    pkt_len, NULL, false);
-				if (!uarg) {
-					virtio_transport_put_credit(vvs, pkt_len);
-					return -ENOMEM;
-				}
-
-				if (!can_zcopy)
-					uarg_to_msgzc(uarg)->zerocopy = 0;
-
-				have_uref = true;
-			}
-		}
 	}
 
 	rest_len = pkt_len;
-- 
2.25.1
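
The branch structure of the reworked hunk can be summarized as a small
decision function. This is only a user-space model under stated
assumptions: zc_decision and zc_decide() are hypothetical names, the
transport_ok argument stands in for the result of
virtio_transport_can_zcopy(), a NULL info->msg is folded into
msg_zerocopy == false, and the -ENOMEM path of msg_zerocopy_realloc()
is left out.

```c
#include <stdbool.h>

/* Outcome of the MSG_ZEROCOPY handling for one send call (model only). */
struct zc_decision {
	bool has_uarg;	/* a ubuf_info is attached to the transmission */
	bool new_uarg;	/* it was allocated here rather than supplied */
	bool can_zcopy;	/* payload is sent via the frags array */
};

static struct zc_decision zc_decide(bool msg_zerocopy, bool has_msg_ubuf,
				    bool sock_zerocopy, bool transport_ok)
{
	struct zc_decision d = { false, false, false };

	if (!msg_zerocopy)
		return d;	/* no flag: plain copy path */

	if (has_msg_ubuf) {
		/* Caller supplied msg_ubuf: reuse it. */
		d.has_uarg = true;
		d.can_zcopy = transport_ok;
	} else if (sock_zerocopy) {
		/* SOCK_ZEROCOPY is set: a new ubuf_info is allocated. */
		d.has_uarg = true;
		d.new_uarg = true;
		d.can_zcopy = transport_ok;
	}
	/* Flag set, but neither msg_ubuf nor SOCK_ZEROCOPY: plain copy. */
	return d;
}
```

In this model, new_uarg corresponds to the case where the patch sets
have_uref, and a newly allocated ubuf_info with can_zcopy == false is
the case where the patch clears the zerocopy bit on the uarg.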


* [PATCH v4] hwrng: virtio: clamp device-reported used.len at copy_data()
From: Michael Bommarito @ 2026-06-14 16:40 UTC
  To: Olivia Mackall, Herbert Xu, linux-crypto
  Cc: Michael S . Tsirkin, Jason Wang, Kees Cook, Christian Borntraeger,
	virtualization, linux-kernel

copy_data() trusts the device-reported used.len stored in vi->data_avail
and memcpy()s that many bytes out of the inline vi->data buffer without
bounding it against sizeof(vi->data) (SMP_CACHE_BYTES, typically 32 or
64).  A malicious or buggy virtio-rng backend can report a used.len past
the buffer and steer the memcpy() into adjacent slab memory;
hwrng_fillfn() then mixes those bytes into the guest RNG and guest root
can read them back via /dev/hwrng.  No guest userspace action is required
to first trigger the read.

Clamp data_avail to sizeof(vi->data) at point of use and bail if the
running index has already reached the clamped bound.  Same class as
commit c04db81cd028 ("net/9p: Fix buffer overflow in USB transport
layer").

Fixes: f7f510ec1957 ("virtio: An entropy device, as suggested by hpa.")
Cc: stable@vger.kernel.org
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-8
---
KASAN on a v7.1-rc4 guest whose backend reports used.len = 0x10000:

  BUG: KASAN: slab-out-of-bounds in virtio_read+0x394/0x5d0
  Read of size 64 at addr ffff88800ae0ba20 by task hwrng/52
   __asan_memcpy+0x23/0x60
   virtio_read+0x394/0x5d0
   hwrng_fillfn+0xb2/0x470
  located 0 bytes to the right of the 544-byte kmalloc-1k region.

With the clamp the same harness boots clean: copy_data() returns 0 for
the bogus report and the driver reissues the request.

Confidential-compute angle: a malicious hypervisor plus compromised guest
root could use /dev/hwrng as a guest-kernel heap leak channel, though
SEV-SNP/TDX guests usually disable virtio-rng.  The memory-safety fix is
worth carrying regardless.

Changes in v4:
- Drop array_index_nospec() on vi->data_idx (and linux/nospec.h) per
  Herbert Xu and Michael S. Tsirkin: data_idx is driver-maintained and
  already bounded by the check above, with no demonstrated speculation
  gadget.  Clamp unchanged; KASAN repro re-run (stock splats, patched
  clean).

Changes in v3: repost of v2 after the thread went quiet, rebased onto
v7.1-rc4.

Changes in v2 (Michael S. Tsirkin): move the check into copy_data() next
to the memcpy(); clamp to sizeof(vi->data) instead of forcing len = 0 so
an occasionally-over-reporting device does not start returning
zero-length reads.

v1: https://lore.kernel.org/all/20260418000020.1847122-1-michael.bommarito@gmail.com/
v2: https://lore.kernel.org/all/20260418150613.3522589-1-michael.bommarito@gmail.com/
v3: https://lore.kernel.org/all/20260531142251.2792061-1-michael.bommarito@gmail.com/
---
 drivers/char/hw_random/virtio-rng.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c
index 0ce02d7e5048e..7413d24a67a9d 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -69,7 +69,22 @@ static void request_entropy(struct virtrng_info *vi)
 static unsigned int copy_data(struct virtrng_info *vi, void *buf,
 			      unsigned int size)
 {
-	size = min_t(unsigned int, size, vi->data_avail);
+	unsigned int avail;
+
+	/*
+	 * vi->data_avail was set from the device-reported used.len and
+	 * vi->data_idx was advanced by previous copy_data() calls.  A
+	 * malicious or buggy virtio-rng backend can drive data_avail past
+	 * sizeof(vi->data), so clamp it at point of use before the memcpy()
+	 * below can be steered into adjacent slab memory.
+	 */
+	avail = min_t(unsigned int, vi->data_avail, sizeof(vi->data));
+	if (vi->data_idx >= avail) {
+		vi->data_avail = 0;
+		request_entropy(vi);
+		return 0;
+	}
+	size = min_t(unsigned int, size, avail - vi->data_idx);
 	memcpy(buf, vi->data + vi->data_idx, size);
 	vi->data_idx += size;
 	vi->data_avail -= size;

base-commit: a1f173eb51db0dc78536334729ef832c62d6c65a
-- 
2.53.0
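
The clamp in the hunk above can be exercised outside the kernel with a
minimal model. The sketch below mirrors the patched logic but is not the
driver: rng_model, min_uint() and copy_data_clamped() are hypothetical
names, the buffer size of 64 merely stands in for SMP_CACHE_BYTES, and
the request_entropy() call is reduced to a comment.

```c
#include <string.h>

#define SKETCH_DATA_SIZE 64	/* assumption: stands in for SMP_CACHE_BYTES */

struct rng_model {
	unsigned char data[SKETCH_DATA_SIZE];
	unsigned int data_avail;	/* device-reported, untrusted */
	unsigned int data_idx;		/* driver-maintained read offset */
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

static unsigned int copy_data_clamped(struct rng_model *vi, void *buf,
				      unsigned int size)
{
	/* Never trust data_avail beyond the real buffer size. */
	unsigned int avail = min_uint(vi->data_avail,
				      (unsigned int)sizeof(vi->data));

	if (vi->data_idx >= avail) {
		/* Nothing valid left; the patch also re-requests entropy here. */
		vi->data_avail = 0;
		return 0;
	}
	size = min_uint(size, avail - vi->data_idx);
	memcpy(buf, vi->data + vi->data_idx, size);
	vi->data_idx += size;
	vi->data_avail -= size;
	return size;
}
```

With data_avail set to 0x10000 and a 256-byte destination, the first
call copies only the 64 bytes that exist, and the second call returns 0
and resets data_avail instead of reading past the buffer.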

* Re: [PATCH v3] hwrng: virtio: clamp device-reported used.len at copy_data()
From: Michael Bommarito @ 2026-06-14 16:14 UTC
  To: Michael S. Tsirkin
  Cc: Herbert Xu, Olivia Mackall, linux-crypto, Jason Wang, Kees Cook,
	Christian Borntraeger, virtualization, linux-kernel, Dan Williams,
	Ingo Molnar, H. Peter Anvin, torvalds, alan, tglx
In-Reply-To: <20260611064040-mutt-send-email-mst@kernel.org>

On Thu, Jun 11, 2026 at 6:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> AKA defence is depth programming)
> Alright we can drop this. No biggie.

Sorry for the delay.  I'll ship a v4 without the nospec.

Thanks,
Mike

* Re: [PATCH net-next 1/3] net: busy-poll: introduce sk_tx_busy_loop()
From: Menglong Dong @ 2026-06-14 10:12 UTC
  To: menglong8.dong, Jakub Kicinski
  Cc: jasowang, mst, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
	pabeni, magnus.karlsson, maciej.fijalkowski, sdf, horms, ast,
	daniel, hawk, john.fastabend, bjorn, kerneljasonxing, netdev,
	virtualization, linux-kernel, bpf
In-Reply-To: <20260613112113.55d9313f@kernel.org>

On 2026/6/14 02:21, Jakub Kicinski wrote:
> On Thu, 11 Jun 2026 15:12:40 +0800 menglong8.dong@gmail.com wrote:
> > For now, we use sk_busy_loop() for both rx and tx path. The sk_busy_loop()
> > will call napi_busy_loop() for the specified napi_id. However, some
> > nic drivers have tx napi, such as virtio-net. In this case, sk_busy_loop()
> > doesn't work, as it can only schedule the NAPI for the rx queue.
> > 
> > Therefore, introduce sk_tx_busy_loop() for the nic drivers that support tx
> > napi, which will schedule the tx napi if available.
> 
> First, I thought the only difference with Tx NAPI is that it can't be
> busy polled. So if you want to poll an instance don't register it as 
> a Tx one instead of adding all this "tx polling" stuff in the core?

I see. Registering the tx NAPI with netif_napi_add_config() allows us to
busy poll it. But we still have two NAPI instances: the rx NAPI and the
tx NAPI. sk_busy_loop() can only busy poll one of them.

Before AF_XDP, there was no need to send packets via the tx NAPI, so
there was no need to busy poll it.

I analyzed how some NIC drivers implement AF_XDP. Some of them, such as
mlx5, check the xsk tx ring of the current queue and send its data from
the rx NAPI. Others allocate an extra "rxtx" NAPI for the AF_XDP
zero-copy queue, which polls both receiving and sending.

In the cases above, sending and receiving for AF_XDP are done in a
single NAPI instance.

However, some drivers receive data in the rx NAPI and send data in the
tx NAPI for AF_XDP. In this case, we can't use sk_busy_loop() for both
the rx path and the tx path, as we need to wake different NAPI instances.

> 
> Second, can this problem happen for any other NIC or is it purely 
> an artifact of virtio's delayed Tx completion handling?

According to my analysis, only the virtio-net and ICSSG drivers have
split NAPIs for AF_XDP. I don't have an ICSSG NIC, but codex tells me
that it has the same problem.

I'm not sure whether introducing sk_tx_busy_loop() is a good idea.
Maybe we can modify the driver instead to use the same NAPI for both
sending and receiving, just like the others do. The advantage of
introducing sk_tx_busy_loop() is that we can keep sending and receiving
split, which may be more efficient.

> 
> Third, this series does not apply.

Ah, I'll rebase this series if a V2 is acceptable.

Thanks!
Menglong Dong

> 
> 

* Re: [PATCH net-next] virtio-net: support xsk wake up
From: Menglong Dong @ 2026-06-14  2:21 UTC
  To: Menglong Dong, Jakub Kicinski
  Cc: xuanzhuo, mst, jasowang, eperezma, andrew+netdev, davem, edumazet,
	pabeni, netdev, virtualization, linux-kernel
In-Reply-To: <20260613144438.767f8069@kernel.org>

On 2026/6/14 05:44, Jakub Kicinski wrote:
> On Wed, 10 Jun 2026 16:16:48 +0800 Menglong Dong wrote:
> > +	/* If both rq->vq and fill ring are empty, and then the user submit
> > +	 * all the chunks to the fill ring and check the wake up flag
> > +	 * after xsk_buff_alloc_batch() and before xsk_set_rx_need_wakeup(),
> > +	 * we will lose the chance to wake up the rx napi, so we have to
> > +	 * set the need_wakeup flag here.
> > +	 */
> 
> TBH all the comments you're adding are harder to understand than the
> code itself ;( Please try to phrase them better or just remove them.

Ah, sorry about that. The race condition here is a little hard for me
to describe. After the discussion on V2:
https://lore.kernel.org/netdev/rHZz5_ylT4WggoZ-Ic2Q4w@linux.dev/
the race condition seems unlikely to happen. So I'll remove this
part in V3.

Thanks!
Menglong Dong


* [PATCH] drm/vblank: Don't arm vblank timer with invalid frame duration
From: Roman Ilin @ 2026-06-13 22:44 UTC
  To: Thomas Zimmermann, Maarten Lankhorst, Maxime Ripard, David Airlie,
	Simona Vetter
  Cc: Louis Chauvet, Javier Martinez Canillas, Dmitry Osipenko,
	dri-devel, virtualization, linux-kernel, Roman Ilin

When a CRTC's display mode carries a pixel clock that is too small,
drm_calc_timestamping_constants() computes a frame duration that
exceeds INT_MAX, so drm_vblank_crtc.framedur_ns becomes negative.
drm_crtc_vblank_start_timer() then arms the vblank hrtimer with this
interval, after which vblank events are no longer delivered. Pending
page flips never complete and the display appears frozen.

This could be triggered on virtio-gpu guests that have dynamic resolution
enabled: when the SPICE agent or the X server resizes the output, it
submits a mode whose pixel clock is off by a factor of 1000, e.g.:

    clock = 406 kHz, htotal = 3152, vtotal = 2148

    framedur_ns = 3152 * 2148 * 1000000 / 406 = 16676098522 ns (~16.7 s)

16676098522 does not fit into an int and wraps to roughly -504000000.
ns_to_ktime() then yields a negative interval and the timer stops working.
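
The truncation can be reproduced outside the kernel. What follows is a
minimal user-space sketch, assuming the frame duration is computed in 64
bits as htotal * vtotal * 1000000 / clock and then stored in a 32-bit int,
as the description above implies. The helper names are hypothetical and
this is not the kernel code.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: frame duration in ns for a pixel clock given in kHz. */
static uint64_t framedur_ns_u64(uint32_t htotal, uint32_t vtotal,
				uint32_t clock_khz)
{
	return (uint64_t)htotal * vtotal * 1000000ULL / clock_khz;
}

/*
 * Mimic storing the 64-bit result in a 32-bit int field: keep the low
 * 32 bits and interpret them as a two's complement value.
 */
static int32_t framedur_ns_truncated(uint64_t ns)
{
	uint32_t low = (uint32_t)ns;

	if (low > (uint32_t)INT32_MAX)
		return (int32_t)(low - 0x80000000u) - INT32_MAX - 1;
	return (int32_t)low;
}

/* The kind of check the patch adds, applied to the truncated value. */
static int framedur_is_valid(int32_t framedur_ns)
{
	return framedur_ns > 0;
}
```

With the mode from the example, the 64-bit result exceeds INT_MAX and the
truncated value is negative, so the check rejects it. A clock that is 1000
times larger gives a duration that fits and passes.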

Found by bisection, which pointed at commit a036f5fceedb ("drm/virtgpu:
Use vblank timer"). That commit merely made virtio-gpu use the vblank
timer and thereby exposed the pre-existing problem in the timer setup
added by commit 74afeb812850 ("drm/vblank: Add vblank timer").

Reject a non-positive frame duration in drm_crtc_vblank_start_timer() and
return an error. enable_vblank then fails and the driver falls back to
sending the vblank event immediately, as it did before the vblank timer
was introduced. Valid modes are unaffected, and the timer self-heals on
the next mode that has a sane clock.

Fixes: 74afeb812850 ("drm/vblank: Add vblank timer")
Cc: stable@vger.kernel.org
Signed-off-by: Roman Ilin <me@romanilin.is>
---
Notes:

Based on v7.1-rc7. Tested on 6.19 and 7.1-rc7.

Open questions:

This relies on the int overflow producing a negative value. The deeper
issue is that drm_calc_timestamping_constants() truncates framedur_ns to
int. Would you prefer to widen framedur_ns to s64, or to bound the
interval here (e.g. reject framedur_ns above one second) so that any
bogus interval is rejected regardless of sign?

Should virtio-gpu additionally sanitize the user-supplied clock in its
atomic_check (similar to vmwgfx for the clock==0 case) so the
vblank-timer throttling is preserved for these resizes, instead of
falling back to immediate events?

 drivers/gpu/drm/drm_vblank.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
index f78bf37f1..557cd0bc8 100644
--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -2235,7 +2235,13 @@ int drm_crtc_vblank_start_timer(struct drm_crtc *crtc)

 	drm_calc_timestamping_constants(crtc, &crtc->mode);

+	/*
+	 * Return an error so the driver falls back to sending vblank events
+	 * when a small mode clock yields a frame duration exceeding INT_MAX.
+	 */
+	if (vblank->framedur_ns <= 0)
+		return -EINVAL;
+
 	spin_lock_irqsave(&vtimer->interval_lock, flags);
 	vtimer->interval = ns_to_ktime(vblank->framedur_ns);
 	spin_unlock_irqrestore(&vtimer->interval_lock, flags);
-- 
2.54.0

^ permalink raw reply related

* Re: [PATCH net-next] virtio-net: support xsk wake up
From: Jakub Kicinski @ 2026-06-13 21:46 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Menglong Dong, xuanzhuo, mst, jasowang, andrew+netdev, davem,
	edumazet, pabeni, netdev, virtualization, linux-kernel
In-Reply-To: <CAJaqyWeYiruNosJsMTh2jQ=XCEcPg7956aqeRRpDSyynfpjNZA@mail.gmail.com>

On Wed, 10 Jun 2026 10:27:28 +0200 Eugenio Perez Martin wrote:
> And the From and Signed-off-by emails don't match, which I'm not sure is valid.

It's clearly the same person. Please focus on the code, not trivial
process issues.

Quoting documentation:

  Reviewer guidance
  -----------------

  [...]

  Reviewers are highly encouraged to do more in-depth review of submissions
  and not focus exclusively on process issues, trivial or subjective
  matters like code formatting, tags etc.

See: https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#reviewer-guidance

^ permalink raw reply

* Re: [PATCH net-next] virtio-net: support xsk wake up
From: Jakub Kicinski @ 2026-06-13 21:44 UTC (permalink / raw)
  To: Menglong Dong
  Cc: xuanzhuo, mst, jasowang, eperezma, andrew+netdev, davem, edumazet,
	pabeni, netdev, virtualization, linux-kernel
In-Reply-To: <20260610081648.2205711-1-dongml2@chinatelecom.cn>

On Wed, 10 Jun 2026 16:16:48 +0800 Menglong Dong wrote:
> +	/* If both rq->vq and fill ring are empty, and then the user submit
> +	 * all the chunks to the fill ring and check the wake up flag
> +	 * after xsk_buff_alloc_batch() and before xsk_set_rx_need_wakeup(),
> +	 * we will lose the chance to wake up the rx napi, so we have to
> +	 * set the need_wakeup flag here.
> +	 */

TBH all the comments you're adding are harder to understand than the
code itself ;( Please try to phrase them better or just remove them.

^ permalink raw reply

* Re: [PATCH net v3] virtio-net: fix len check in receive_big()
From: Xiang Mei @ 2026-06-13 20:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
	kuba, pabeni, netdev, virtualization, linux-kernel,
	minhquangbui99, bestswngs
In-Reply-To: <20260611014519-mutt-send-email-mst@kernel.org>

On Wed, Jun 10, 2026 at 10:56 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jun 10, 2026 at 07:46:16PM -0700, Xiang Mei wrote:
> > receive_big() bounds the device-announced length by
> > (big_packets_num_skbfrags + 1) * PAGE_SIZE.  That is still too loose:
> > add_recvbuf_big() sets sg[1] to start at offset
> > sizeof(struct padded_vnet_hdr) into the first page, so the chain
> > actually carries hdr_len + (PAGE_SIZE - sizeof(padded_vnet_hdr)) +
> > big_packets_num_skbfrags * PAGE_SIZE bytes -- 20 bytes less than the
> > check allows for the common hdr_len == 12 case.
> >
> > A malicious virtio backend can announce a len in that gap.  page_to_skb()
> > then walks one frag past the page chain, storing a NULL page->private
> > into skb_shinfo()->frags[MAX_SKB_FRAGS], which is both an out-of-bounds
> > write past the static frag array and a NULL frag handed up the rx path.
> >
> > Bound len by the size add_recvbuf_big() actually advertised.
> >
> > Fixes: 0c716703965f ("virtio-net: fix received length check in big packets")
> > Reported-by: Weiming Shi <bestswngs@gmail.com>
> > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> > Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
> Thanks for the patch! Something small to improve:
>
> > ---
> > v3: revoke 2/2 and add Xuan Zhuo's Reviewed-by tag
> >
> >  drivers/net/virtio_net.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index f4adcfee7a80..afe73eda1491 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1999,15 +1999,17 @@ static struct sk_buff *receive_big(struct net_device *dev,
> >                                  struct virtnet_rq_stats *stats)
> >  {
> >       struct page *page = buf;
> > +     unsigned long max_len;
>
> Assignment can happen here?
>
> >       struct sk_buff *skb;
> >
> >       /* Make sure that len does not exceed the size allocated in
> >        * add_recvbuf_big.
> >        */
> > -     if (unlikely(len > (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE)) {
> > +     max_len = vi->hdr_len + (PAGE_SIZE - sizeof(struct padded_vnet_hdr)) +
> > +               vi->big_packets_num_skbfrags * PAGE_SIZE;
>
> Took me a while to figure out what is going on, but I finally
> understand:
>
>
> Reducing
> (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE
>
> (what we allocated)
>
> by sizeof(struct padded_vnet_hdr) - vi->hdr_len
>
>
> right?
>
> So clearer as:
>
>
>         unsigned long max_len = (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE -
>         sizeof(struct padded_vnet_hdr) + vi->hdr_len;
>
Right, that's the same value. Yours reads better!

I'll fold this into the next respin. One thing I'd like to settle
first: David suggested storing this in a vi field computed once at
probe time (it's a per-device constant) and just comparing len against it
on the datapath, instead of re-deriving it in receive_big() each time.
I'll wait for his take on that and send a single v4 that covers both.
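
To double-check that the two forms agree and to see the size of the gap,
here is a small user-space sketch. The constants are assumptions for
illustration only: a PAGE_SIZE of 4096 and a sizeof(struct padded_vnet_hdr)
of 32, which together with hdr_len == 12 gives the 20-byte gap from the
commit message. The function names are hypothetical.

```c
#include <stddef.h>

#define PAGE_SIZE_ASSUMED	4096UL	/* assumption: 4 KiB pages */
#define PADDED_HDR_ASSUMED	32UL	/* assumption: sizeof(struct padded_vnet_hdr) */

/* Old bound: every page in the chain is counted in full. */
static unsigned long old_max_len(unsigned long num_skbfrags)
{
	return (num_skbfrags + 1) * PAGE_SIZE_ASSUMED;
}

/* Bound in the form written in v3. */
static unsigned long v3_max_len(unsigned long num_skbfrags,
				unsigned long hdr_len)
{
	return hdr_len + (PAGE_SIZE_ASSUMED - PADDED_HDR_ASSUMED) +
	       num_skbfrags * PAGE_SIZE_ASSUMED;
}

/* Same bound in the form suggested in review. */
static unsigned long suggested_max_len(unsigned long num_skbfrags,
				       unsigned long hdr_len)
{
	return (num_skbfrags + 1) * PAGE_SIZE_ASSUMED -
	       PADDED_HDR_ASSUMED + hdr_len;
}
```

Under these assumptions both forms give the same value for any frag count,
and the old bound is larger by PADDED_HDR_ASSUMED - hdr_len.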

Xiang

>
>
>
> > +     if (unlikely(len > max_len)) {
> >               pr_debug("%s: rx error: len %u exceeds allocated size %lu\n",
> > -                      dev->name, len,
> > -                      (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE);
> > +                      dev->name, len, max_len);
> >               goto err;
> >       }
> >
> > --
> > 2.43.0
>

^ permalink raw reply

* Re: [PATCH net] virtio-net: fix len check in receive_big()
From: Xiang Mei @ 2026-06-13 19:58 UTC (permalink / raw)
  To: David Laight
  Cc: mst, jasowang, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
	kuba, pabeni, netdev, virtualization, linux-kernel,
	minhquangbui99, bestswngs
In-Reply-To: <20260611104836.469610b0@pumpkin>

On Thu, Jun 11, 2026 at 2:48 AM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Wed, 10 Jun 2026 15:16:06 -0700
> Xiang Mei <xmei5@asu.edu> wrote:
>
> > receive_big() bounds the device-announced length by
> > (big_packets_num_skbfrags + 1) * PAGE_SIZE.  That is still too loose:
> > add_recvbuf_big() sets sg[1] to start at offset
> > sizeof(struct padded_vnet_hdr) into the first page, so the chain
> > actually carries hdr_len + (PAGE_SIZE - sizeof(padded_vnet_hdr)) +
> > big_packets_num_skbfrags * PAGE_SIZE bytes -- 20 bytes less than the
> > check allows for the common hdr_len == 12 case.
> >
> > A malicious virtio backend can announce a len in that gap.  page_to_skb()
> > then walks one frag past the page chain, storing a NULL page->private
> > into skb_shinfo()->frags[MAX_SKB_FRAGS], which is both an out-of-bounds
> > write past the static frag array and a NULL frag handed up the rx path.
> >
> > Bound len by the size add_recvbuf_big() actually advertised.
> >
> > Fixes: 0c716703965f ("virtio-net: fix received length check in big packets")
> > Reported-by: Weiming Shi <bestswngs@gmail.com>
> > Signed-off-by: Xiang Mei <xmei5@asu.edu>
> > ---
> >  drivers/net/virtio_net.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index f4adcfee7a80..afe73eda1491 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1999,15 +1999,17 @@ static struct sk_buff *receive_big(struct net_device *dev,
> >                                  struct virtnet_rq_stats *stats)
> >  {
> >       struct page *page = buf;
> > +     unsigned long max_len;
> >       struct sk_buff *skb;
> >
> >       /* Make sure that len does not exceed the size allocated in
> >        * add_recvbuf_big.
> >        */
> > -     if (unlikely(len > (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE)) {
> > +     max_len = vi->hdr_len + (PAGE_SIZE - sizeof(struct padded_vnet_hdr)) +
> > +               vi->big_packets_num_skbfrags * PAGE_SIZE;
>
> That looks like a constant (for the vi).
Thanks, agreed it's a per-device constant, so the expression never
changes after setup.

> Probably worth saving rather than recalculating all the time.
>

I thought about caching it in a vi->big_packets_max_len field computed
once after virtnet_set_big_packets(). My only concern is that the
saving is marginal here: receive_big() is only reached when
mergeable_rx_bufs is off (the "else if (vi->big_packets)" branch). Any
backend that negotiates VIRTIO_NET_F_MRG_RXBUF takes the
receive_mergeable() path for all RX, so receive_big() is never hit
there at all. And the bound has exactly one consumer.

So it's trading a 4-byte field (probably free in existing padding)
against one cache-hit load plus a shift and two adds, on a path
that's effectively cold on modern setups. I mainly work on security
issues, so I'd like to hear your thoughts on this.


Xiang
> -- David
>
> > +     if (unlikely(len > max_len)) {
> >               pr_debug("%s: rx error: len %u exceeds allocated size %lu\n",
> > -                      dev->name, len,
> > -                      (vi->big_packets_num_skbfrags + 1) * PAGE_SIZE);
> > +                      dev->name, len, max_len);
> >               goto err;
> >       }
> >
>

^ permalink raw reply

* Re: [PATCH net-next 1/3] net: busy-poll: introduce sk_tx_busy_loop()
From: Jakub Kicinski @ 2026-06-13 18:21 UTC (permalink / raw)
  To: menglong8.dong
  Cc: jasowang, mst, xuanzhuo, eperezma, andrew+netdev, davem, edumazet,
	pabeni, magnus.karlsson, maciej.fijalkowski, sdf, horms, ast,
	daniel, hawk, john.fastabend, bjorn, kerneljasonxing, netdev,
	virtualization, linux-kernel, bpf
In-Reply-To: <20260611071242.2485058-2-dongml2@chinatelecom.cn>

On Thu, 11 Jun 2026 15:12:40 +0800 menglong8.dong@gmail.com wrote:
> For now, we use sk_busy_loop() for both rx and tx path. The sk_busy_loop()
> will call napi_busy_loop() for the specified napi_id. However, some
> nic drivers have tx napi, such as virtio-net. In this case, sk_busy_loop()
> doesn't work, as it can only schedule the NAPI for the rx queue.
> 
> Therefore, introduce sk_tx_busy_loop() for the nic drivers that support tx
> napi, which will schedule the tx napi if available.

First, I thought the only difference with Tx NAPI is that it can't be
busy polled. So if you want to poll an instance don't register it as 
a Tx one instead of adding all this "tx polling" stuff in the core?

Second, can this problem happen for any other NIC or is it purely 
an artifact of virtio's delayed Tx completion handling?

Third, this series does not apply.

^ permalink raw reply

* Re: [PATCH net-next v3 0/4] vsock: consolidate acceptq accounting into core helpers
From: patchwork-bot+netdevbpf @ 2026-06-13 17:50 UTC (permalink / raw)
  To: Raf Dickson
  Cc: netdev, virtualization, pabeni, sgarzare, stefanha, bryan-bt.tan,
	vishnu.dasa, bcm-kernel-feedback-list, bobbyeshleman, leonardi,
	horms, edumazet, kuba
In-Reply-To: <20260612045216.105796-1-rafdog35@gmail.com>

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 12 Jun 2026 04:52:12 +0000 you wrote:
> These patches follow up on commit c05fa14db43e
> ("vsock/vmci: fix sk_ack_backlog leak on failed handshake")
> by consolidating sk_acceptq_added() and sk_acceptq_removed() into
> the core vsock helpers so transports cannot forget them.
> 
> Changes since v2:
>   - Add vsock_pending_to_accept() helper for the vmci pending->accept
>     transition, avoiding a double sk_acceptq_added() (Stefano Garzarella)
>   - Split into 4 patches for bisectability (Stefano Garzarella)
>   - Fold sk_acceptq_added() into vsock_add_pending() as a separate patch
> 
> [...]

Here is the summary with links:
  - [net-next,v3,1/4] vsock: introduce vsock_pending_to_accept() helper
    https://git.kernel.org/netdev/net-next/c/77eee189397d
  - [net-next,v3,2/4] vsock: fold sk_acceptq_added() into vsock_add_pending()
    https://git.kernel.org/netdev/net-next/c/a6fd2cfdcdf5
  - [net-next,v3,3/4] vsock: fold sk_acceptq_added() into vsock_enqueue_accept()
    https://git.kernel.org/netdev/net-next/c/6f6f9b65a991
  - [net-next,v3,4/4] vsock: fold sk_acceptq_removed() into vsock_remove_pending()
    https://git.kernel.org/netdev/net-next/c/27fc25bb82e6

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next v3 4/4] vsock: fold sk_acceptq_removed() into vsock_remove_pending()
From: Jakub Kicinski @ 2026-06-13 17:40 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Raf Dickson, netdev, virtualization, pabeni, stefanha,
	bryan-bt.tan, vishnu.dasa, bcm-kernel-feedback-list,
	bobbyeshleman, leonardi, horms, edumazet
In-Reply-To: <aivjf4TZU4Q_s20y@sgarzare-redhat>

On Fri, 12 Jun 2026 12:48:14 +0200 Stefano Garzarella wrote:
> >@@ -773,7 +774,6 @@ static void vsock_pending_work(struct work_struct *work)
> > 	if (vsock_is_pending(sk)) {
> > 		vsock_remove_pending(listener, sk);
> >  
>      ^^
> There is an extra blank line that we can now remove here.
> 
> BTW, the code LGTM:

Since the merge window is upon us - also updated when applying.

^ permalink raw reply

* Re: [PATCH net-next v2] vsock/vmci: use sk_acceptq_is_full() helper
From: patchwork-bot+netdevbpf @ 2026-06-13 17:40 UTC (permalink / raw)
  To: Raf Dickson
  Cc: netdev, virtualization, pabeni, sgarzare, stefanha, bryan-bt.tan,
	vishnu.dasa, bcm-kernel-feedback-list, leonardi, horms, edumazet,
	kuba
In-Reply-To: <20260612045842.122207-1-rafdog35@gmail.com>

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 12 Jun 2026 04:58:42 +0000 you wrote:
> Replace the open-coded backlog check with sk_acceptq_is_full().
> The helper uses > instead of >=, which is the correct comparison
> per commit 64a146513f8f ("[NET]: Revert incorrect accept queue
> backlog changes."), and adds READ_ONCE() for proper memory ordering.
> 
> Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
> Signed-off-by: Raf Dickson <rafdog35@gmail.com>
> 
> [...]

Here is the summary with links:
  - [net-next,v2] vsock/vmci: use sk_acceptq_is_full() helper
    https://git.kernel.org/netdev/net-next/c/4ff2e84ff1b3

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next v2] vsock/vmci: use sk_acceptq_is_full() helper
From: Jakub Kicinski @ 2026-06-13 17:37 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Raf Dickson, netdev, virtualization, pabeni, stefanha,
	bryan-bt.tan, vishnu.dasa, bcm-kernel-feedback-list, leonardi,
	horms, edumazet
In-Reply-To: <aivKma8mRjTXV0BM@sgarzare-redhat>

On Fri, 12 Jun 2026 11:03:24 +0200 Stefano Garzarella wrote:
> nit: title should be updated since now this is not just vmci
> (e.g. vsock: use sk_acceptq_is_full() helper in all transports)
> 
> Not sure if it can be fixed while applying by netdev maintainers.

Updated and applied, thanks!

^ permalink raw reply

* Re: [PATCH] vduse: Fix error around jumping over a __cleanup() variable
From: Nathan Chancellor @ 2026-06-13 16:26 UTC (permalink / raw)
  To: David Laight
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
	virtualization, linux-kernel, llvm
In-Reply-To: <20260611110346.2b9388a1@pumpkin>

On Thu, Jun 11, 2026 at 11:03:46AM +0100, David Laight wrote:
> On Wed, 10 Jun 2026 12:16:49 -0700
> Nathan Chancellor <nathan@kernel.org> wrote:
> 
> > When building with clang, there is an error in vduse_vq_kick() from
> > attempting to jump over a variable declared with the cleanup attribute
> > using goto:
> .
> > Jumping over a variable declared with the cleanup attribute does not
> > prevent the cleanup function from running, it would just result in the
> > variable being passed uninitialized to the cleanup function. clang
> > errors instead of generating the invalid code, unlike GCC.
> 
> Does the same apply to variables allocated inside switch statements?
> I'm sure I've seen one that wasn't inside an extra block.

Yes:

  https://lore.kernel.org/20251002233627.GA3978676@ax162/
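
For anyone unfamiliar with the issue, here is a minimal user-space sketch
of the usual way to avoid it: declare and initialize the cleanup variable
before the first goto that could otherwise skip it. It assumes a compiler
that supports __attribute__((cleanup)) (GCC or clang). All names are
hypothetical and unrelated to the vduse code.

```c
#include <stdlib.h>

static int frees_seen;	/* counts cleanup calls, for demonstration only */

static void free_buf(char **p)
{
	frees_seen++;
	free(*p);	/* free(NULL) is a no-op */
}

/*
 * The cleanup variable is declared and initialized before any goto,
 * so no jump bypasses its initialization and the cleanup function
 * always sees a well-defined pointer.
 */
static int process(int fail_early)
{
	__attribute__((cleanup(free_buf))) char *buf = NULL;
	int ret = 0;

	if (fail_early) {
		ret = -1;
		goto out;
	}

	buf = malloc(16);
	if (!buf)
		ret = -2;
out:
	return ret;
}
```

The cleanup function runs on both paths. On the early-exit path it receives
the NULL the variable was initialized with, not an indeterminate value.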

-- 
Cheers,
Nathan

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] virtio_net: xsk: fix race in rx wake up
From: Menglong Dong @ 2026-06-13 12:26 UTC (permalink / raw)
  To: menglong8.dong, xuanzhuo, eperezma, Bui Quang Minh
  Cc: mst, jasowang, andrew+netdev, davem, edumazet, kuba, pabeni,
	kerneljasonxing, netdev, virtualization, linux-kernel
In-Reply-To: <41eefa1d-99bf-450d-988e-7dec67c6b61e@gmail.com>

On 2026/6/12 00:24, Bui Quang Minh wrote:
> On 6/11/26 09:56, menglong8.dong@gmail.com wrote:
> > From: Menglong Dong <dongml2@chinatelecom.cn>
> >
> > During packet receiving in virtio-net, the rq can be empty, which means
> > "rq->vq->num_free == virtqueue_get_vring_size(rq->vq)", in
> > virtnet_add_recvbuf_xsk(), if we are using xsk. Meanwhile, the fill ring
> > can be empty too, which means we can't allocate anything from
> > xsk_buff_alloc_batch(). Then, we will set the XDP_RING_NEED_WAKEUP flag.
> >
> > However, if the user clean all the data in rx ring and fill the
> > "fill ring" and check the XDP_RING_NEED_WAKEUP flag after
> > xsk_buff_alloc_batch() and before xsk_set_rx_need_wakeup(), then the rx
> > napi will never be scheduled: the rx ring is empty, which means we will
> > never receive a packet to trigger the further recv fill. The rx ring is
> > empty now, so the user will not check the flag too.
> >
> > Fix this by set the XDP_RING_NEED_WAKEUP flag before
> > xsk_buff_alloc_batch() if both rq->vq and fill ring are empty.
> >
> > Meanwhile, set the XDP_RING_NEED_WAKEUP flag if we have any free entry in
> > rq->vq.
> >
> > Fixes: e3f8800aa243 ("virtio-net: xsk: Support wakeup on RX side")
> > Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> > ---
> >   drivers/net/virtio_net.c | 25 ++++++++++++++++++++++---
> >   1 file changed, 22 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index f4adcfee7a80..4b5b3fa62008 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1323,16 +1323,27 @@ static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue
> >   				   struct xsk_buff_pool *pool, gfp_t gfp)
> >   {
> >   	struct xdp_buff **xsk_buffs;
> > +	bool need_wakeup;
> >   	dma_addr_t addr;
> >   	int err = 0;
> >   	u32 len, i;
> >   	int num;
> >   
> > +	need_wakeup = xsk_uses_need_wakeup(pool);
> >   	xsk_buffs = rq->xsk_buffs;
> >   
> > +	/* If both rq->vq and fill ring are empty, and then the user submit
> > +	 * all the chunks to the fill ring and check the wake up flag
> > +	 * after xsk_buff_alloc_batch() and before xsk_set_rx_need_wakeup(),
> > +	 * we will lose the chance to wake up the rx napi, so we have to
> > +	 * set the need_wakeup flag here.
> > +	 */
> > +	if (need_wakeup && virtqueue_get_vring_size(rq->vq) == rq->vq->num_free)
> > +		xsk_set_rx_need_wakeup(pool);
> 

Hi, Bui Quang. Thanks for your reply. I spent some time studying
what you said.

> I think when polling the receive queue, the userspace program needs to 
> check the XDP_RING_NEED_WAKEUP flag if it does not see any packets. The 
> flag check is quite lightweight in my opinion. Here are some examples I find
> 
> - 
> https://github.com/xdp-project/xdp-tools/blob/e9469501622aa22a7e452a671000bec8685edcde/lib/util/xdpsock.c#L1206

You are right, I was overly concerned about this point. My original
concern was that we can't wake up from the poll syscall in this case:

The umem has 2000 chunks. In the beginning, the xsk->fill_ring
is filled with 2000 chunks, and then the user falls asleep and doesn't
do anything.

Kernel: the 2000th packet is received
Kernel: xsk_buff_alloc_batch() returns 0 (xsk->fill_ring is empty and xsk->rx_ring is full)

        User: handle the xsk->rx_ring
        User: fill the xsk->fill_ring with 2000 chunks
        User: check the wake up flag
        User: no need_wakeup flag, fall asleep with poll() syscall

Kernel: call xsk_set_rx_need_wakeup()
Kernel: virtio-net rx ringbuf is empty, we can't receive any further packet
Kernel: to call virtnet_add_recvbuf_xsk(), we are dead

But then I found that we can still be woken up from the poll syscall by
the 2000th packet, which means that the case where neither the NAPI nor
the user can be woken up doesn't exist.

> - 
> https://github.com/xdp-project/bpf-examples/blob/43e565901c4287efa863edca7f0e6cd6e35ed896/AF_XDP-forwarding/xsk_fwd.c#L540
> 
> Furthermore, the XDP_RING_NEED_WAKEUP flag related functions does not 
> provide any memory orderings. So even with your patch, I'm worried that 
> this case is possible
> 
> kernel                                      userspace
> 
> xsk_buff_alloc_batch -> failed
>                                             submit fill ring
>                                             flag != XDP_RING_NEED_WAKEUP
> // reordering due to lack of memory orderings
> xsk_set_rx_need_wakeup
> 
> I'm not expert here, so correct me if I'm wrong. I think the wake up 
> flag is designed with no orderings so we cannot rely on it to reason and 
> skip further checks.
> 
> > +
> >   	num = xsk_buff_alloc_batch(pool, xsk_buffs, rq->vq->num_free);
[....]
> > +
> 
> Why do we need to set XDP_RING_NEED_WAKEUP even when 
> xsk_buff_alloc_batch succeeds?

Ah, never mind this part. I just thought that if xsk_buff_alloc_batch()
didn't allocate as many chunks as we need, we could wake up
the NAPI as soon as possible, in case the virtio-net ringbuf
is full and causes packet drops :)

Anyway, I'll remove the first patch and send only the second patch
in V3.

Thanks!
Menglong Dong

> 
> >   	return num;
> >   
> >   err:
> 
> Thanks,
> Quang Minh.
> 
> 
> 
> 





^ permalink raw reply

* [PATCH net v2 2/2] vsock/virtio: restore msg_iter on transmission failure
From: Octavian Purdila @ 2026-06-13  0:09 UTC (permalink / raw)
  To: netdev
  Cc: Alexander Viro, Andrew Morton, Arseniy Krasnov, David S. Miller,
	Eric Dumazet, Eugenio Pérez, Jakub Kicinski, Jason Wang, kvm,
	linux-block, linux-fsdevel, linux-kernel, Michael S. Tsirkin,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Stefano Garzarella,
	virtualization, Xuan Zhuo, Octavian Purdila,
	syzbot+28e5f3d207b14bae122a
In-Reply-To: <20260613000953.467473-1-tavip@google.com>

When transmission fails in virtio_transport_send_pkt_info, the msg_iter
might have been partially advanced. If we don't restore it, the next
attempt to send data will use an incorrect iterator state, leading to
desync and warnings like "send_pkt() returns 0, but X expected".

Specifically, this can happen in the following scenario, triggered by
the syzkaller repro:

1. A write-only VMA (PROT_WRITE only) is partially populated by a
   prior TUN write that failed with -EIO but still faulted in some
   pages.
2. A vsock sendmmsg call with MSG_ZEROCOPY requests transmission of a
   buffer from this VMA.
3. The first packet (64KB) is sent successfully because the pages are
   populated.
4. The second packet allocation fails because GUP fast pins the first page
   but GUP slow fails on the next unpopulated page due to PROT_WRITE-only
   permissions.
5. The iterator is advanced by the partially successful GUP (68KB total
   advanced: 64KB from first packet + 4KB from second), but the send loop
   breaks and only reports 64KB sent. This creates a 4KB desync.
6. The next retry starts with a non-zero iov_offset, disabling zerocopy
   and falling back to copy mode.
7. In copy mode, the transmission succeeds for the next packets but
   exhausts the iterator early because of the desync.
8. The final retry sees an empty iterator but zerocopy is re-enabled
   (offset resets). It attempts to send the remaining bytes with zerocopy
   but pins 0 pages, creating an empty packet.
9. The transport sends the empty packet, triggering the warning because
   the returned bytes (header only) do not match the expected payload size.
10. The loop continues to spin, allocating ubuf_info each time, eventually
    exhausting sysctl_optmem_max and returning -ENOMEM to userspace.

Restore msg_iter to its original state before the packet allocation
and transmission attempt if they fail.
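
The save/restore idea can be illustrated with a self-contained user-space
sketch. Everything here is hypothetical: toy_iter stands in for msg_iter
and toy_build_packet() for a packet allocation that may advance the
iterator before failing. For brevity the sketch restores inside the loop,
whereas the patch below uses a single restore point after the loop.

```c
#include <stddef.h>

/* Hypothetical stand-in for an iterator: an offset and a remaining count. */
struct toy_iter {
	size_t offset;
	size_t count;
};

struct toy_iter_state {
	size_t offset;
	size_t count;
};

static void toy_iter_save(const struct toy_iter *it, struct toy_iter_state *st)
{
	st->offset = it->offset;
	st->count = it->count;
}

static void toy_iter_restore(struct toy_iter *it, const struct toy_iter_state *st)
{
	it->offset = st->offset;
	it->count = st->count;
}

/*
 * Hypothetical packet builder: consumes len bytes, but if budget < len it
 * consumes only budget bytes and fails, leaving the iterator partially
 * advanced (like a partially successful GUP).
 */
static int toy_build_packet(struct toy_iter *it, size_t len, size_t budget)
{
	size_t n = len < budget ? len : budget;

	it->offset += n;
	it->count -= n;
	return n == len ? 0 : -1;
}

/* Send in chunks; on failure, undo the partial advance before giving up. */
static size_t toy_send(struct toy_iter *it, size_t chunk, size_t fail_at_offset)
{
	size_t sent = 0;

	while (it->count) {
		struct toy_iter_state st;
		size_t len = it->count < chunk ? it->count : chunk;
		size_t budget = fail_at_offset > it->offset ?
				fail_at_offset - it->offset : 0;

		toy_iter_save(it, &st);
		if (toy_build_packet(it, len, budget) < 0) {
			toy_iter_restore(it, &st);
			break;
		}
		sent += len;
	}
	return sent;
}
```

With a 128KB buffer, 64KB chunks and a failure 4KB into the second chunk,
the sketch reports 64KB sent and leaves the iterator at offset 64KB, so the
reported byte count and the iterator position stay in sync.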

Fixes: e0718bd82e27 ("vsock: enable setting SO_ZEROCOPY")
Reported-by: syzbot+28e5f3d207b14bae122a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=28e5f3d207b14bae122a
Assisted-by: gemini:gemini-3.1-pro
Signed-off-by: Octavian Purdila <tavip@google.com>
---
 net/vmw_vsock/virtio_transport_common.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index b10666937c490..2baa5a6ebd750 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -295,6 +295,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	u32 max_skb_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
 	u32 src_cid, src_port, dst_cid, dst_port;
 	const struct virtio_transport *t_ops;
+	struct iov_iter_state msg_iter_state;
 	struct virtio_vsock_sock *vvs;
 	struct ubuf_info *uarg = NULL;
 	u32 pkt_len = info->pkt_len;
@@ -368,8 +369,17 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 		struct sk_buff *skb;
 		size_t skb_len;
 
+		/* Save iterator state in case allocation or transmission fails
+		 * so we can restore it and retry.
+		 */
+		if (info->msg)
+			iov_iter_save_state(&info->msg->msg_iter, &msg_iter_state);
+
 		skb_len = min(max_skb_len, rest_len);
 
+		/* Note: virtio_transport_alloc_skb() can advance info->msg->msg_iter
+		 * even if it fails (e.g. partial GUP success).
+		 */
 		skb = virtio_transport_alloc_skb(info, skb_len, can_zcopy,
 						 uarg,
 						 src_cid, src_port,
@@ -399,6 +409,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 			break;
 	} while (rest_len);
 
+	if (info->msg && ret < 0)
+		iov_iter_restore(&info->msg->msg_iter, &msg_iter_state);
+
 	virtio_transport_put_credit(vvs, rest_len);
 
 	/* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
-- 
2.54.0.1136.gdb2ca164c4-goog


^ permalink raw reply related

* [PATCH net v2 1/2] iov_iter: export iov_iter_restore
From: Octavian Purdila @ 2026-06-13  0:09 UTC (permalink / raw)
  To: netdev
  Cc: Alexander Viro, Andrew Morton, Arseniy Krasnov, David S. Miller,
	Eric Dumazet, Eugenio Pérez, Jakub Kicinski, Jason Wang, kvm,
	linux-block, linux-fsdevel, linux-kernel, Michael S. Tsirkin,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Stefano Garzarella,
	virtualization, Xuan Zhuo, Octavian Purdila
In-Reply-To: <20260613000953.467473-1-tavip@google.com>

Export iov_iter_restore so that it can be used by modules.

This is needed by the virtio vsock transport (which can be built as a
module) to restore the msg_iter state when transmission fails.

Signed-off-by: Octavian Purdila <tavip@google.com>
---
 lib/iov_iter.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 243662af1af73..067e745f9ef53 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1491,6 +1491,7 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state)
 		i->__iov -= state->nr_segs - i->nr_segs;
 	i->nr_segs = state->nr_segs;
 }
+EXPORT_SYMBOL(iov_iter_restore);
 
 /*
  * Extract a list of contiguous pages from an ITER_FOLIOQ iterator.  This does
-- 
2.54.0.1136.gdb2ca164c4-goog


^ permalink raw reply related

* [PATCH net v2 0/2] vsock/virtio: fix msg_iter desync on transmission failure
From: Octavian Purdila @ 2026-06-13  0:09 UTC (permalink / raw)
  To: netdev
  Cc: Alexander Viro, Andrew Morton, Arseniy Krasnov, David S. Miller,
	Eric Dumazet, Eugenio Pérez, Jakub Kicinski, Jason Wang, kvm,
	linux-block, linux-fsdevel, linux-kernel, Michael S. Tsirkin,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, Stefano Garzarella,
	virtualization, Xuan Zhuo, Octavian Purdila

This series fixes a msg_iter desync issue in the virtio vsock transport
that can lead to warnings and eventual -ENOMEM under specific failure
scenarios (e.g. partial GUP failure during MSG_ZEROCOPY transmission).

To fix this, we need to restore the msg_iter state on transmission failure.
However, since virtio vsock transport can be built as a module, we first
need to export iov_iter_restore.

Patch 1 exports iov_iter_restore.
Patch 2 implements the msg_iter restoration in virtio vsock.

Changes in v2:
- Use iov_iter_save_state()/iov_iter_restore() (Stefano)
- Use a single restore point (Stefano)
- Reverse xmas tree (Stefano)
- Added comments in the code (Stefano)

v1: https://lore.kernel.org/all/20260609004809.1285028-1-tavip@google.com/

Octavian Purdila (2):
  iov_iter: export iov_iter_restore
  vsock/virtio: restore msg_iter on transmission failure

 lib/iov_iter.c                          |  1 +
 net/vmw_vsock/virtio_transport_common.c | 13 +++++++++++++
 2 files changed, 14 insertions(+)

-- 
2.54.0.1136.gdb2ca164c4-goog

^ permalink raw reply
