* [syzbot] [wireless?] KASAN: slab-use-after-free Write in rsi_91x_deinit
From: syzbot @ 2026-04-20 7:05 UTC
To: linux-kernel, linux-usb, linux-wireless, netdev, syzkaller-bugs
Hello,
syzbot found the following issue on:
HEAD commit: 87117347a0e7 usb: dwc3: starfive: Add JHB100 USB 2.0 DRD c..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing
console output: https://syzkaller.appspot.com/x/log.txt?x=171b04ce580000
kernel config: https://syzkaller.appspot.com/x/.config?x=2056c1e3f6d3b0bc
dashboard link: https://syzkaller.appspot.com/bug?extid=5de83f57cd8531f55596
compiler: gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
Unfortunately, I don't have any reproducer for this issue yet.
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/6168cf3e4727/disk-87117347.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/36e554f1750a/vmlinux-87117347.xz
kernel image: https://storage.googleapis.com/syzbot-assets/7aa92e741f66/bzImage-87117347.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+5de83f57cd8531f55596@syzkaller.appspotmail.com
rsi_91x: rsi_probe: Failed to init usb interface
==================================================================
BUG: KASAN: slab-use-after-free in instrument_atomic_read_write include/linux/instrumented.h:112 [inline]
BUG: KASAN: slab-use-after-free in atomic_fetch_add_relaxed include/linux/atomic/atomic-instrumented.h:252 [inline]
BUG: KASAN: slab-use-after-free in __refcount_add include/linux/refcount.h:283 [inline]
BUG: KASAN: slab-use-after-free in __refcount_inc include/linux/refcount.h:366 [inline]
BUG: KASAN: slab-use-after-free in refcount_inc include/linux/refcount.h:383 [inline]
BUG: KASAN: slab-use-after-free in get_task_struct include/linux/sched/task.h:116 [inline]
BUG: KASAN: slab-use-after-free in kthread_stop+0x8f/0x680 kernel/kthread.c:754
Write of size 4 at addr ffff88813d339da8 by task kworker/0:3/10583
CPU: 0 UID: 0 PID: 10583 Comm: kworker/0:3 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
Workqueue: usb_hub_wq hub_event
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0x156/0x4c9 mm/kasan/report.c:482
kasan_report+0xdf/0x1e0 mm/kasan/report.c:595
check_region_inline mm/kasan/generic.c:186 [inline]
kasan_check_range+0x10f/0x1e0 mm/kasan/generic.c:200
instrument_atomic_read_write include/linux/instrumented.h:112 [inline]
atomic_fetch_add_relaxed include/linux/atomic/atomic-instrumented.h:252 [inline]
__refcount_add include/linux/refcount.h:283 [inline]
__refcount_inc include/linux/refcount.h:366 [inline]
refcount_inc include/linux/refcount.h:383 [inline]
get_task_struct include/linux/sched/task.h:116 [inline]
kthread_stop+0x8f/0x680 kernel/kthread.c:754
rsi_kill_thread drivers/net/wireless/rsi/rsi_common.h:78 [inline]
rsi_91x_deinit+0x102/0x1f0 drivers/net/wireless/rsi/rsi_91x_main.c:405
rsi_probe+0xd27/0x1aa0 drivers/net/wireless/rsi/rsi_91x_usb.c:861
usb_probe_interface+0x303/0x8f0 drivers/usb/core/driver.c:396
call_driver_probe drivers/base/dd.c:643 [inline]
really_probe+0x241/0xa60 drivers/base/dd.c:721
__driver_probe_device+0x1de/0x400 drivers/base/dd.c:863
driver_probe_device+0x4c/0x1b0 drivers/base/dd.c:893
__device_attach_driver+0x1df/0x340 drivers/base/dd.c:1021
bus_for_each_drv+0x159/0x1e0 drivers/base/bus.c:500
__device_attach+0x1e4/0x4d0 drivers/base/dd.c:1093
device_initial_probe+0xaf/0xd0 drivers/base/dd.c:1148
bus_probe_device+0x64/0x160 drivers/base/bus.c:613
device_add+0x11d9/0x1950 drivers/base/core.c:3691
usb_set_configuration+0xd97/0x1c60 drivers/usb/core/message.c:2268
usb_generic_driver_probe+0xa1/0xe0 drivers/usb/core/generic.c:250
usb_probe_device+0xef/0x400 drivers/usb/core/driver.c:291
call_driver_probe drivers/base/dd.c:643 [inline]
really_probe+0x241/0xa60 drivers/base/dd.c:721
__driver_probe_device+0x1de/0x400 drivers/base/dd.c:863
driver_probe_device+0x4c/0x1b0 drivers/base/dd.c:893
__device_attach_driver+0x1df/0x340 drivers/base/dd.c:1021
bus_for_each_drv+0x159/0x1e0 drivers/base/bus.c:500
__device_attach+0x1e4/0x4d0 drivers/base/dd.c:1093
device_initial_probe+0xaf/0xd0 drivers/base/dd.c:1148
bus_probe_device+0x64/0x160 drivers/base/bus.c:613
device_add+0x11d9/0x1950 drivers/base/core.c:3691
usb_new_device.cold+0x685/0x115c drivers/usb/core/hub.c:2695
hub_port_connect drivers/usb/core/hub.c:5567 [inline]
hub_port_connect_change drivers/usb/core/hub.c:5707 [inline]
port_event drivers/usb/core/hub.c:5871 [inline]
hub_event+0x314d/0x4af0 drivers/usb/core/hub.c:5953
process_one_work+0xa23/0x19a0 kernel/workqueue.c:3276
process_scheduled_works kernel/workqueue.c:3359 [inline]
worker_thread+0x5ef/0xe50 kernel/workqueue.c:3440
kthread+0x370/0x450 kernel/kthread.c:436
ret_from_fork+0x6c3/0xcb0 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
Allocated by task 2:
kasan_save_stack+0x30/0x50 mm/kasan/common.c:57
kasan_save_track+0x14/0x30 mm/kasan/common.c:78
unpoison_slab_object mm/kasan/common.c:340 [inline]
__kasan_slab_alloc+0x6e/0x70 mm/kasan/common.c:366
kasan_slab_alloc include/linux/kasan.h:253 [inline]
slab_post_alloc_hook mm/slub.c:4538 [inline]
slab_alloc_node mm/slub.c:4866 [inline]
kmem_cache_alloc_node_noprof+0x26b/0x6b0 mm/slub.c:4918
alloc_task_struct_node kernel/fork.c:185 [inline]
dup_task_struct kernel/fork.c:916 [inline]
copy_process+0x48b/0x7820 kernel/fork.c:2050
kernel_clone+0xfc/0x9a0 kernel/fork.c:2653
kernel_thread+0xdb/0x120 kernel/fork.c:2714
create_kthread kernel/kthread.c:459 [inline]
kthreadd+0x498/0x7a0 kernel/kthread.c:817
ret_from_fork+0x6c3/0xcb0 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
Freed by task 14:
kasan_save_stack+0x30/0x50 mm/kasan/common.c:57
kasan_save_track+0x14/0x30 mm/kasan/common.c:78
kasan_save_free_info+0x3b/0x70 mm/kasan/generic.c:584
poison_slab_object mm/kasan/common.c:253 [inline]
__kasan_slab_free+0x43/0x70 mm/kasan/common.c:285
kasan_slab_free include/linux/kasan.h:235 [inline]
slab_free_hook mm/slub.c:2685 [inline]
slab_free mm/slub.c:6165 [inline]
kmem_cache_free+0x105/0x640 mm/slub.c:6295
rcu_do_batch kernel/rcu/tree.c:2617 [inline]
rcu_core+0x5a2/0x10d0 kernel/rcu/tree.c:2869
handle_softirqs+0x1de/0x9d0 kernel/softirq.c:622
run_ksoftirqd kernel/softirq.c:1063 [inline]
run_ksoftirqd+0x38/0x60 kernel/softirq.c:1055
smpboot_thread_fn+0x3d3/0xaa0 kernel/smpboot.c:160
kthread+0x370/0x450 kernel/kthread.c:436
ret_from_fork+0x6c3/0xcb0 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
Last potentially related work creation:
kasan_save_stack+0x30/0x50 mm/kasan/common.c:57
kasan_record_aux_stack+0x8c/0xa0 mm/kasan/generic.c:556
__call_rcu_common.constprop.0+0xa5/0x9b0 kernel/rcu/tree.c:3131
put_task_struct include/linux/sched/task.h:159 [inline]
put_task_struct include/linux/sched/task.h:128 [inline]
delayed_put_task_struct+0xe4/0x2e0 kernel/exit.c:231
rcu_do_batch kernel/rcu/tree.c:2617 [inline]
rcu_core+0x5a2/0x10d0 kernel/rcu/tree.c:2869
handle_softirqs+0x1de/0x9d0 kernel/softirq.c:622
__do_softirq kernel/softirq.c:656 [inline]
invoke_softirq kernel/softirq.c:496 [inline]
__irq_exit_rcu+0xed/0x150 kernel/softirq.c:723
irq_exit_rcu+0x9/0x30 kernel/softirq.c:739
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1056 [inline]
sysvec_apic_timer_interrupt+0x8f/0xb0 arch/x86/kernel/apic/apic.c:1056
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
Second to last potentially related work creation:
kasan_save_stack+0x30/0x50 mm/kasan/common.c:57
kasan_record_aux_stack+0x8c/0xa0 mm/kasan/generic.c:556
__call_rcu_common.constprop.0+0xa5/0x9b0 kernel/rcu/tree.c:3131
put_task_struct_rcu_user kernel/exit.c:237 [inline]
put_task_struct_rcu_user+0x6c/0xc0 kernel/exit.c:234
context_switch kernel/sched/core.c:5301 [inline]
__schedule+0xeb9/0x4220 kernel/sched/core.c:6911
preempt_schedule_common+0x42/0xc0 kernel/sched/core.c:7095
preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
__raw_spin_unlock include/linux/spinlock_api_smp.h:169 [inline]
_raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186
spin_unlock include/linux/spinlock.h:389 [inline]
filemap_map_pages+0x130f/0x1e50 mm/filemap.c:3936
do_fault_around mm/memory.c:5757 [inline]
do_read_fault mm/memory.c:5790 [inline]
do_fault mm/memory.c:5933 [inline]
do_pte_missing mm/memory.c:4477 [inline]
handle_pte_fault mm/memory.c:6317 [inline]
__handle_mm_fault+0x1e2e/0x2d60 mm/memory.c:6455
handle_mm_fault+0x36d/0xa20 mm/memory.c:6624
do_user_addr_fault+0x5ae/0x11d0 arch/x86/mm/fault.c:1334
handle_page_fault arch/x86/mm/fault.c:1474 [inline]
exc_page_fault+0x66/0xc0 arch/x86/mm/fault.c:1527
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618
The buggy address belongs to the object at ffff88813d339d80
which belongs to the cache task_struct of size 7296
The buggy address is located 40 bytes inside of
freed 7296-byte region [ffff88813d339d80, ffff88813d33ba00)
The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13d338
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
memcg:ffff88813d33f601
flags: 0x200000000000040(head|node=0|zone=2)
page_type: f5(slab)
raw: 0200000000000040 ffff8881012d9500 dead000000000100 dead000000000122
raw: 0000000000000000 0000000800040004 00000000f5000000 ffff88813d33f601
head: 0200000000000040 ffff8881012d9500 dead000000000100 dead000000000122
head: 0000000000000000 0000000800040004 00000000f5000000 ffff88813d33f601
head: 0200000000000003 ffffea0004f4ce01 00000000ffffffff 00000000ffffffff
head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000008
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5914, tgid 5914 (dhcpcd-run-hook), ts 193584350604, free_ts 193494512770
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x153/0x170 mm/page_alloc.c:1889
prep_new_page mm/page_alloc.c:1897 [inline]
get_page_from_freelist+0xf10/0x39f0 mm/page_alloc.c:3962
__alloc_frozen_pages_noprof+0x273/0x2860 mm/page_alloc.c:5250
alloc_slab_page mm/slub.c:3292 [inline]
allocate_slab mm/slub.c:3481 [inline]
new_slab+0xa6/0x6c0 mm/slub.c:3539
refill_objects+0x26b/0x400 mm/slub.c:7175
refill_sheaf mm/slub.c:2812 [inline]
__pcs_replace_empty_main+0x1ab/0x660 mm/slub.c:4615
alloc_from_pcs mm/slub.c:4717 [inline]
slab_alloc_node mm/slub.c:4851 [inline]
kmem_cache_alloc_node_noprof+0x4e9/0x6b0 mm/slub.c:4918
alloc_task_struct_node kernel/fork.c:185 [inline]
dup_task_struct kernel/fork.c:916 [inline]
copy_process+0x48b/0x7820 kernel/fork.c:2050
kernel_clone+0xfc/0x9a0 kernel/fork.c:2653
__do_sys_clone+0xd9/0x120 kernel/fork.c:2794
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x106/0x7b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 2861 tgid 2861 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
__free_pages_prepare mm/page_alloc.c:1433 [inline]
__free_frozen_pages+0x7b1/0xfb0 mm/page_alloc.c:2978
qlink_free mm/kasan/quarantine.c:163 [inline]
qlist_free_all+0x47/0xe0 mm/kasan/quarantine.c:179
kasan_quarantine_reduce+0x1a0/0x1f0 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x4e/0x70 mm/kasan/common.c:350
kasan_slab_alloc include/linux/kasan.h:253 [inline]
slab_post_alloc_hook mm/slub.c:4538 [inline]
slab_alloc_node mm/slub.c:4866 [inline]
kmem_cache_alloc_noprof+0x2e7/0x6a0 mm/slub.c:4873
alloc_filename fs/namei.c:142 [inline]
do_getname+0x35/0x390 fs/namei.c:182
getname include/linux/fs.h:2512 [inline]
class_filename_constructor include/linux/fs.h:2539 [inline]
do_sys_openat2+0xc5/0x1e0 fs/open.c:1365
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x12d/0x210 fs/open.c:1383
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x106/0x7b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Memory state around the buggy address:
 ffff88813d339c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88813d339d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88813d339d80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                  ^
 ffff88813d339e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88813d339e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
* Re: [PATCH v2 iwl-net] i40e: keep q_vectors array in sync with channel count changes
From: Simon Horman @ 2026-04-20 7:20 UTC
To: Maciej Fijalkowski
Cc: intel-wired-lan, netdev, magnus.karlsson, kuba, pabeni,
przemyslaw.kitszel, jacob.e.keller
In-Reply-To: <20260416114046.642171-1-maciej.fijalkowski@intel.com>
On Thu, Apr 16, 2026 at 01:40:46PM +0200, Maciej Fijalkowski wrote:
> For the main VSI, i40e_set_num_rings_in_vsi() always derives
> num_q_vectors from pf->num_lan_msix. At the same time, ethtool -L stores
> the user requested channel count in vsi->req_queue_pairs and the queue
> setup path uses that value for the effective number of queue pairs.
>
> This leaves queue and vector counts out of sync after shrinking channel
> count via ethtool -L. The active queue configuration is reduced, but the
> VSI still keeps the full PF-sized q_vector topology.
>
> That mismatch breaks reconfiguration flows which rely on vector/NAPI
> state matching the effective channel configuration. In particular,
> toggling /sys/class/net/<dev>/threaded after reducing the channel count
> can hang, and later channel-count changes can fail because VSI reinit
> does not rebuild q_vectors to match the new vector count.
>
> Fix this by making the main VSI num_q_vectors follow the effective
> requested channel count, capped by the available MSI-X vectors. Update
> i40e_vsi_reinit_setup() to rebuild q_vectors during VSI reinit so the
> vector topology is refreshed together with the ring arrays when channel
> count changes.
>
> Keep alloc_queue_pairs unchanged and based on pf->num_lan_qps so the VSI
> retains its full queue capacity.
>
> The napi_threaded.py selftest was originally used when Jakub reported a hang
> when toggling /sys/class/net/<dev>/threaded. In order to make it pass on
> i40e, use persistent NAPI configuration for q_vector NAPIs so NAPI
> identity and threaded settings survive q_vector reallocation across
> channel-count changes. This is achieved by using netif_napi_add_config()
> when configuring q_vectors.
>
> $ export NETIF=ens259f1np1
> $ sudo -E env PATH="$PATH" ./tools/testing/selftests/drivers/net/napi_threaded.py
> TAP version 13
> 1..3
> ok 1 napi_threaded.napi_init
> ok 2 napi_threaded.change_num_queues
> ok 3 napi_threaded.enable_dev_threaded_disable_napi_threaded
> Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
>
> Reported-by: Jakub Kicinski <kuba@kernel.org>
> Closes: https://lore.kernel.org/intel-wired-lan/20260316133100.6054a11f@kernel.org/
> Fixes: d2a69fefd756 ("i40e: Fix changing previously set num_queue_pairs for PFs")
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
> v2:
> - NULL vsi->tx_rings in i40e_vsi_alloc_arrays() (Sashiko)
Reviewed-by: Simon Horman <horms@kernel.org>
> ---
> drivers/net/ethernet/intel/i40e/i40e_main.c | 35 +++++++++++++++++----
> 1 file changed, 29 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 926d001b2150..1d2a4181966f 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -11403,10 +11403,14 @@ static void i40e_service_timer(struct timer_list *t)
> static int i40e_set_num_rings_in_vsi(struct i40e_vsi *vsi)
> {
> struct i40e_pf *pf = vsi->back;
> + u16 qps;
>
> switch (vsi->type) {
> case I40E_VSI_MAIN:
> vsi->alloc_queue_pairs = pf->num_lan_qps;
> + qps = vsi->req_queue_pairs ?
> + min_t(u16, vsi->req_queue_pairs, pf->num_lan_qps) :
nit: It looks like all the variables involved here are u16.
So min() can be used instead of min_t().
> + pf->num_lan_qps;
> if (!vsi->num_tx_desc)
> vsi->num_tx_desc = ALIGN(I40E_DEFAULT_NUM_DESCRIPTORS,
> I40E_REQ_DESCRIPTOR_MULTIPLE);
> @@ -11414,7 +11418,8 @@ static int i40e_set_num_rings_in_vsi(struct i40e_vsi *vsi)
> vsi->num_rx_desc = ALIGN(I40E_DEFAULT_NUM_DESCRIPTORS,
> I40E_REQ_DESCRIPTOR_MULTIPLE);
> if (test_bit(I40E_FLAG_MSIX_ENA, pf->flags))
> - vsi->num_q_vectors = pf->num_lan_msix;
> + vsi->num_q_vectors = max_t(int, 1,
> + min_t(int, qps, pf->num_lan_msix));
nit: On the left side, all values seem to be either constants or u16.
So I think you can use clamp() here, and simply assign the resulting
value to num_q_vectors, which is an int.
> else
> vsi->num_q_vectors = 1;
>
...
> @@ -14265,12 +14272,27 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
>
> pf = vsi->back;
>
> + if (test_bit(I40E_FLAG_MSIX_ENA, pf->flags)) {
> + i40e_put_lump(pf->irq_pile, vsi->base_vector, vsi->idx);
> + vsi->base_vector = 0;
> + }
> +
> i40e_put_lump(pf->qp_pile, vsi->base_queue, vsi->idx);
> i40e_vsi_clear_rings(vsi);
>
> - i40e_vsi_free_arrays(vsi, false);
> + i40e_vsi_free_q_vectors(vsi);
> + i40e_vsi_free_arrays(vsi, true);
nit: with this patch applied the free_vectors argument (the 2nd parameter)
of i40e_vsi_free_arrays is always passed as true by callers.
So I think that, as a follow-up, it can be removed.
Similarly for i40e_vsi_alloc_arrays.
> i40e_set_num_rings_in_vsi(vsi);
> - ret = i40e_vsi_alloc_arrays(vsi, false);
> +
> + ret = i40e_vsi_alloc_arrays(vsi, true);
> + if (ret)
> + goto err_vsi;
> +
> + /* Rebuild q_vectors during VSI reinit because the effective channel
> + * count may change num_q_vectors. Keep vector topology aligned with the
> + * queue configuration after ethtool's .set_channels() callback.
> + */
> + ret = i40e_vsi_setup_vectors(vsi);
> if (ret)
> goto err_vsi;
>
...
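The clamp() suggestion for num_q_vectors above can be illustrated with a
small userspace sketch. This is not the driver code: clamp_int() and
sketch_num_q_vectors() are hypothetical stand-ins, and the sketch assumes
num_lan_msix is at least 1 whenever MSI-X is enabled, so that the lower
bound never exceeds the upper bound.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's clamp(); requires lo <= hi. */
static int clamp_int(int val, int lo, int hi)
{
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

/*
 * Sketch of the main-VSI vector count computation discussed above.
 * req_queue_pairs == 0 means "no user request", so fall back to
 * num_lan_qps; otherwise cap the request at num_lan_qps.
 */
static int sketch_num_q_vectors(uint16_t req_queue_pairs, uint16_t num_lan_qps,
				uint16_t num_lan_msix)
{
	uint16_t qps = num_lan_qps;

	if (req_queue_pairs && req_queue_pairs < num_lan_qps)
		qps = req_queue_pairs;

	/* Equivalent to max(1, min(qps, num_lan_msix)) when num_lan_msix >= 1. */
	return clamp_int(qps, 1, num_lan_msix);
}
```

With a single clamp the lower and upper bounds are visible in one place,
which is the readability gain the nit is after.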
* Re: [PATCH net 1/2] tcp: send a challenge ACK on SEG.ACK > SND.NXT
From: Eric Dumazet @ 2026-04-20 7:21 UTC
To: Jiayuan Chen
Cc: netdev, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
David Ahern, Jakub Kicinski, Paolo Abeni, Simon Horman,
Shuah Khan, linux-kernel, linux-kselftest
In-Reply-To: <20260420025428.101192-2-jiayuan.chen@linux.dev>
On Sun, Apr 19, 2026 at 7:55 PM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> RFC 5961 Section 5.2 validates an incoming segment's ACK value
> against the range [SND.UNA - MAX.SND.WND, SND.NXT] and states:
>
> "All incoming segments whose ACK value doesn't satisfy the above
> condition MUST be discarded and an ACK sent back."
>
> Commit 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack
> Mitigation") opted Linux into this mitigation and implements the
> challenge ACK on the lower side (SEG.ACK < SND.UNA - MAX.SND.WND),
> but the symmetric upper side (SEG.ACK > SND.NXT) still takes the
> pre-RFC-5961 path and silently returns
> SKB_DROP_REASON_TCP_ACK_UNSENT_DATA, even though RFC 793 Section 3.9
> (now RFC 9293 Section 3.10.7.4) has always required:
>
> "If the ACK acknowledges something not yet sent (SEG.ACK > SND.NXT)
> then send an ACK, drop the segment, and return."
>
> Complete the mitigation by sending a challenge ACK on that branch,
> reusing the existing tcp_send_challenge_ack() path which already
> enforces the per-socket RFC 5961 Section 7 rate limit via
> __tcp_oow_rate_limited(). FLAG_NO_CHALLENGE_ACK is honoured for
> symmetry with the lower-edge case.
>
> Fixes: 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation")
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
>
> ---
> I'm not sure if 'blamed commit' is appropriate, because I think
> it's due to missing parts of the implementation, or it might be
> directly targeted to net-next.
The Fixes: tag seems appropriate, and net tree LGTM.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Thanks!
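For readers following along, the acceptable ACK range quoted from RFC 5961
Section 5.2 can be sketched in plain C as below. This is only an
illustration of the range check, not the kernel's tcp_ack() code: the
names classify_ack(), seq_before() and the enum are hypothetical, and the
rate limiting and FLAG_NO_CHALLENGE_ACK handling mentioned in the commit
message are left out.

```c
#include <stdbool.h>
#include <stdint.h>

enum ack_verdict {
	ACK_ACCEPT,             /* SEG.ACK is inside the acceptable range */
	ACK_CHALLENGE_TOO_OLD,  /* SEG.ACK < SND.UNA - MAX.SND.WND */
	ACK_CHALLENGE_UNSENT,   /* SEG.ACK > SND.NXT */
};

/* Wraparound-safe "a is before b" in 32-bit sequence space. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Classify SEG.ACK against [SND.UNA - MAX.SND.WND, SND.NXT]. */
static enum ack_verdict classify_ack(uint32_t seg_ack, uint32_t snd_una,
				     uint32_t snd_nxt, uint32_t max_snd_wnd)
{
	if (seq_before(seg_ack, snd_una - max_snd_wnd))
		return ACK_CHALLENGE_TOO_OLD;
	if (seq_before(snd_nxt, seg_ack))
		return ACK_CHALLENGE_UNSENT;
	return ACK_ACCEPT;
}
```

Both out-of-range verdicts call for the same response under the RFC text
quoted above: discard the segment and send an ACK back. The patch under
review adds that response for the upper edge.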
* Re: [PATCH net 2/2] selftests/net: packetdrill: cover challenge ACK on SEG.ACK > SND.NXT
From: Eric Dumazet @ 2026-04-20 7:22 UTC
To: Jiayuan Chen
Cc: netdev, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
David Ahern, Jakub Kicinski, Paolo Abeni, Simon Horman,
Shuah Khan, linux-kernel, linux-kselftest
In-Reply-To: <20260420025428.101192-3-jiayuan.chen@linux.dev>
On Sun, Apr 19, 2026 at 7:55 PM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> Exercise the RFC 5961 Section 5.2 / RFC 793 Section 3.9 requirement
> on the upper edge of the acceptable ACK range, mirroring the existing
> coverage of the SEG.ACK < SND.UNA - MAX.SND.WND case.
>
> After the peer ACKs data the receiver has never sent, the receiver
> must respond with <SEQ = SND.NXT, ACK = RCV.NXT, CTL = ACK> and drop
> the offending segment. The script validates this exact response.
>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Thanks!
* RE: [PATCH 7/9] wifi: rtw89: switch to using FIELD_GET_SIGNED()
From: Ping-Ke Shih @ 2026-04-20 7:49 UTC
To: Yury Norov, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86@kernel.org, H. Peter Anvin, Andy Lutomirski,
Peter Zijlstra, Jonathan Cameron, David Lechner, Nuno Sá,
Andy Shevchenko, Richard Cochran, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexandre Belloni,
Yury Norov, Rasmus Villemoes, Hans de Goede, Linus Walleij,
Sakari Ailus, Salah Triki, Achim Gratz, Ben Collins,
linux-kernel@vger.kernel.org, linux-iio@vger.kernel.org,
linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
linux-rtc@vger.kernel.org
In-Reply-To: <20260417173621.368914-8-ynorov@nvidia.com>
Yury Norov <ynorov@nvidia.com> wrote:
> --- a/drivers/net/wireless/realtek/rtw89/rtw8852b_common.c
> +++ b/drivers/net/wireless/realtek/rtw89/rtw8852b_common.c
> @@ -206,9 +206,9 @@ static void rtw8852bx_efuse_parsing_tssi(struct rtw89_dev *rtwdev,
> static bool _decode_efuse_gain(u8 data, s8 *high, s8 *low)
> {
> if (high)
> - *high = sign_extend32(FIELD_GET(GENMASK(7, 4), data), 3);
> + *high = FIELD_GET_SIGNED(GENMASK(7, 4), data);
> if (low)
> - *low = sign_extend32(FIELD_GET(GENMASK(3, 0), data), 3);
> + *low = FIELD_GET(GENMASK(3, 0), data);
FIELD_GET_SIGNED()?
>
> return data != 0xff;
> }
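Reading the question above as pointing out that the low nibble loses its
sign extension in the new code, the difference can be shown with a small
userspace sketch. The helpers nibble_unsigned() and nibble_signed() are
hypothetical; the sketch assumes FIELD_GET_SIGNED() sign-extends from the
top bit of the field, as the replaced sign_extend32(..., 3) call did.

```c
#include <stdint.h>

/* Extract an unsigned 4-bit field starting at bit 'shift'. */
static uint8_t nibble_unsigned(uint8_t data, unsigned int shift)
{
	return (data >> shift) & 0xf;
}

/* Extract the same field and sign-extend it from bit 3 of the field. */
static int8_t nibble_signed(uint8_t data, unsigned int shift)
{
	uint8_t v = nibble_unsigned(data, shift);

	/* Flip the sign bit, then subtract its weight: 0x8..0xf map to -8..-1. */
	return (int8_t)((v ^ 0x8) - 0x8);
}
```

For data = 0x0f, the unsigned extraction of the low nibble yields 15 while
the signed one yields -1, so storing the plain FIELD_GET() result in an s8
would change the decoded gain for negative values.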
* [PATCH net v2 0/2] net: airoha: Fix NULL pointer dereferences in airoha_qdma_cleanup()
From: Lorenzo Bianconi @ 2026-04-20 8:07 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Lorenzo Bianconi
Cc: Simon Horman, linux-arm-kernel, linux-mediatek, netdev
Fix two possible NULL pointer dereferences in the airoha_qdma_cleanup()
routine if airoha_qdma_init() fails.
---
Changes in v2:
- Move page_pool allocation after desc list allocation in
airoha_qdma_init_rx_queue()
- Move netif_napi_add_tx() after irq desc queue allocation in
airoha_qdma_tx_irq_init()
- Link to v1: https://lore.kernel.org/r/20260417-airoha_qdma_init_rx_queue-fix-v1-0-db9fa5e468e5@kernel.org
---
Lorenzo Bianconi (2):
net: airoha: Move ndesc initialization at end of airoha_qdma_init_rx_queue()
net: airoha: Add size check for TX NAPIs in airoha_qdma_cleanup()
drivers/net/ethernet/airoha/airoha_eth.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
---
base-commit: 0cf004ffb61cd32d140531c3a84afe975f9fc7ea
change-id: 20260417-airoha_qdma_init_rx_queue-fix-b9bfada51671
Best regards,
--
Lorenzo Bianconi <lorenzo@kernel.org>
* [PATCH net v2 1/2] net: airoha: Move ndesc initialization at end of airoha_qdma_init_rx_queue()
From: Lorenzo Bianconi @ 2026-04-20 8:07 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Lorenzo Bianconi
Cc: Simon Horman, linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260420-airoha_qdma_init_rx_queue-fix-v2-0-d99347e5c18d@kernel.org>
If queue entry or DMA descriptor list allocation fails in
airoha_qdma_init_rx_queue(), airoha_qdma_cleanup() will trigger a
NULL pointer dereference by running netif_napi_del() on RX queue NAPIs
for which netif_napi_add() has never been executed.
The issue is due to the early ndesc initialization in
airoha_qdma_init_rx_queue(), since airoha_qdma_cleanup() relies on the
ndesc value to check whether the queue is properly initialized. Fix the
issue by moving the ndesc initialization to the end of
airoha_qdma_init_rx_queue().
Move page_pool allocation after descriptor list allocation in order to
avoid memory leaks if desc allocation fails.
Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
drivers/net/ethernet/airoha/airoha_eth.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index e1ab15f1ee7d..fc79c456743c 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -745,14 +745,18 @@ static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
dma_addr_t dma_addr;
q->buf_size = PAGE_SIZE / 2;
- q->ndesc = ndesc;
q->qdma = qdma;
- q->entry = devm_kzalloc(eth->dev, q->ndesc * sizeof(*q->entry),
+ q->entry = devm_kzalloc(eth->dev, ndesc * sizeof(*q->entry),
GFP_KERNEL);
if (!q->entry)
return -ENOMEM;
+ q->desc = dmam_alloc_coherent(eth->dev, ndesc * sizeof(*q->desc),
+ &dma_addr, GFP_KERNEL);
+ if (!q->desc)
+ return -ENOMEM;
+
q->page_pool = page_pool_create(&pp_params);
if (IS_ERR(q->page_pool)) {
int err = PTR_ERR(q->page_pool);
@@ -761,11 +765,7 @@ static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
return err;
}
- q->desc = dmam_alloc_coherent(eth->dev, q->ndesc * sizeof(*q->desc),
- &dma_addr, GFP_KERNEL);
- if (!q->desc)
- return -ENOMEM;
-
+ q->ndesc = ndesc;
netif_napi_add(eth->napi_dev, &q->napi, airoha_qdma_rx_napi_poll);
airoha_qdma_wr(qdma, REG_RX_RING_BASE(qid), dma_addr);
--
2.53.0
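The ordering rule behind this patch (publish the "initialized" marker only
after every step that cleanup will later undo has succeeded) can be
sketched in userspace C. Everything here is hypothetical: struct
sketch_queue, the fail_desc switch and the napi_added flag merely model
q->ndesc and netif_napi_add()/netif_napi_del(); this is not the driver
code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_queue {
	void *entry;
	void *desc;
	int ndesc;        /* non-zero only once the queue is fully set up */
	bool napi_added;  /* models netif_napi_add() having run */
};

/* fail_desc simulates a descriptor allocation failure. */
static int sketch_queue_init(struct sketch_queue *q, int ndesc, bool fail_desc)
{
	q->entry = calloc((size_t)ndesc, sizeof(int));
	if (!q->entry)
		return -1;

	q->desc = fail_desc ? NULL : calloc((size_t)ndesc, sizeof(int));
	if (!q->desc)
		return -1;

	q->ndesc = ndesc;     /* publish the marker last */
	q->napi_added = true;
	return 0;
}

/* Returns how many queues had their (modelled) NAPI removed. */
static int sketch_cleanup(struct sketch_queue *qs, size_t n)
{
	int removed = 0;

	for (size_t i = 0; i < n; i++) {
		if (qs[i].ndesc) {        /* skip partially initialized queues */
			qs[i].napi_added = false;
			removed++;
		}
		free(qs[i].entry);
		free(qs[i].desc);
		qs[i] = (struct sketch_queue){0};
	}
	return removed;
}
```

A queue whose init bailed out early keeps ndesc at zero, so cleanup never
touches a NAPI that was never added.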
* [PATCH net v2 2/2] net: airoha: Add size check for TX NAPIs in airoha_qdma_cleanup()
From: Lorenzo Bianconi @ 2026-04-20 8:07 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Lorenzo Bianconi
Cc: Simon Horman, linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260420-airoha_qdma_init_rx_queue-fix-v2-0-d99347e5c18d@kernel.org>
If airoha_qdma_init() fails before airoha_qdma_tx_irq_init() has run
successfully for all TX NAPIs, airoha_qdma_cleanup() will
unconditionally run netif_napi_del() on the TX NAPIs, triggering a NULL
pointer dereference. Fix the issue by relying on the q_tx_irq size value
to check whether each TX NAPI is properly initialized in
airoha_qdma_cleanup().
Moreover, run netif_napi_add_tx() only if the irq_q queue is properly
allocated.
Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
drivers/net/ethernet/airoha/airoha_eth.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index fc79c456743c..fd8c4f817d85 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -996,8 +996,6 @@ static int airoha_qdma_tx_irq_init(struct airoha_tx_irq_queue *irq_q,
struct airoha_eth *eth = qdma->eth;
dma_addr_t dma_addr;
- netif_napi_add_tx(eth->napi_dev, &irq_q->napi,
- airoha_qdma_tx_napi_poll);
irq_q->q = dmam_alloc_coherent(eth->dev, size * sizeof(u32),
&dma_addr, GFP_KERNEL);
if (!irq_q->q)
@@ -1007,6 +1005,9 @@ static int airoha_qdma_tx_irq_init(struct airoha_tx_irq_queue *irq_q,
irq_q->size = size;
irq_q->qdma = qdma;
+ netif_napi_add_tx(eth->napi_dev, &irq_q->napi,
+ airoha_qdma_tx_napi_poll);
+
airoha_qdma_wr(qdma, REG_TX_IRQ_BASE(id), dma_addr);
airoha_qdma_rmw(qdma, REG_TX_IRQ_CFG(id), TX_IRQ_DEPTH_MASK,
FIELD_PREP(TX_IRQ_DEPTH_MASK, size));
@@ -1398,8 +1399,12 @@ static void airoha_qdma_cleanup(struct airoha_qdma *qdma)
}
}
- for (i = 0; i < ARRAY_SIZE(qdma->q_tx_irq); i++)
+ for (i = 0; i < ARRAY_SIZE(qdma->q_tx_irq); i++) {
+ if (!qdma->q_tx_irq[i].size)
+ continue;
+
netif_napi_del(&qdma->q_tx_irq[i].napi);
+ }
for (i = 0; i < ARRAY_SIZE(qdma->q_tx); i++) {
if (!qdma->q_tx[i].ndesc)
--
2.53.0
* Re: [PATCH v5] net: caif: fix stack out-of-bounds write in cfctrl_link_setup()
From: Kangzheng Gu @ 2026-04-20 8:09 UTC
To: Simon Horman
Cc: Paolo Abeni, davem, edumazet, kuba, kees, thorsten.blum, arnd,
sjur.brandeland, netdev, linux-kernel, stable
In-Reply-To: <20260414112951.GD469338@kernel.org>
Thanks for all of your advice. I am preparing a new version of the patch now.
On Tue, Apr 14, 2026 at 19:29, Simon Horman <horms@kernel.org> wrote:
>
> On Mon, Apr 13, 2026 at 11:30:53AM +0200, Paolo Abeni wrote:
> > On 4/12/26 3:57 PM, Simon Horman wrote:
> > > I am wondering if it would be best to follow the pattern for
> > > writing linkparam.u.utility.name elsewhere in this function.
> > > That:
> > > 1. Uses a somewhat more succinct loop control structure
> > > 2. Silently truncates input without updating cmdrsp if overrun would occur
> > >
> > > Something like this (compile tested only!):
> > >
> > > diff --git a/net/caif/cfctrl.c b/net/caif/cfctrl.c
> > > index c6cc2bfed65d..ba184c11386e 100644
> > > --- a/net/caif/cfctrl.c
> > > +++ b/net/caif/cfctrl.c
> > > @@ -15,6 +15,7 @@
> > > #include <net/caif/cfctrl.h>
> > >
> > > #define container_obj(layr) container_of(layr, struct cfctrl, serv.layer)
> > > +#define RFM_VOLUME_LEN 20
> > > #define UTILITY_NAME_LENGTH 16
> > > #define CFPKT_CTRL_PKT_LEN 20
> > >
> > > @@ -414,10 +415,11 @@ static int cfctrl_link_setup(struct cfctrl *cfctrl, struct cfpkt *pkt, u8 cmdrsp
> > > */
> > > linkparam.u.rfm.connid = cfpkt_extr_head_u32(pkt);
> > > cp = (u8 *) linkparam.u.rfm.volume;
> > > - for (tmp = cfpkt_extr_head_u8(pkt);
> > > - cfpkt_more(pkt) && tmp != '\0';
> > > - tmp = cfpkt_extr_head_u8(pkt))
> > > + caif_assert(sizeof(linkparam.u.rfm.volume) >= RFM_VOLUME_LEN);
> > > + for(i = 0; i < RFM_VOLUME_LEN - 1 && cfpkt_more(pkt); i++) {
> > > + tmp = cfpkt_extr_head_u8(pkt);
> > > *cp++ = tmp;
> > > + }
> > > *cp = '\0';
> > >
> > > if (CFCTRL_ERR_BIT & cmdrsp)
> >
> > I agree that the code suggested by Simon is clearer. Note that AFAICS it
> > lacks an additional `tmp!= '\0'` check to break the loop, but even with
> > that added it should be preferable.
>
> Sorry, I left out the `tmp!= '\0' check.
> That was unintentional and I agree it should be there.
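Putting the two review points together (a bounded loop plus the
`tmp != '\0'` check), the copy pattern can be sketched in userspace C.
This is not the proposed patch: struct byte_src is a hypothetical stand-in
for the packet accessors cfpkt_more() and cfpkt_extr_head_u8(), and
copy_bounded_name() is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte source standing in for the cfpkt accessors. */
struct byte_src {
	const uint8_t *data;
	size_t len;
	size_t pos;
};

/*
 * Copy a NUL-terminated name from src into dst, writing at most
 * dst_len bytes including the terminator. Over-long input is silently
 * truncated. Returns the number of name bytes stored.
 */
static size_t copy_bounded_name(struct byte_src *src, char *dst, size_t dst_len)
{
	size_t i = 0;

	if (dst_len == 0)
		return 0;

	while (i < dst_len - 1 && src->pos < src->len) {
		uint8_t tmp = src->data[src->pos++];

		if (tmp == '\0')
			break;
		dst[i++] = (char)tmp;
	}
	dst[i] = '\0';
	return i;
}
```

The loop bound guarantees room for the terminator, so the final store can
never land outside the destination buffer regardless of packet contents.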
* [syzbot ci] Re: net: nsh: handle nested NSH headers during GSO
From: syzbot ci @ 2026-04-20 8:12 UTC
To: bird, caoruide123, davem, edumazet, horms, jbenc, kuba, lx24,
n05ec, netdev, pabeni, tomapufckgml, yifanwucs, yuantan098
Cc: syzbot, syzkaller-bugs
In-Reply-To: <6112cce99b4e3571444a616d0fb19e91e2fcca72.1776597598.git.caoruide123@gmail.com>
syzbot ci has tested the following series
[v1] net: nsh: handle nested NSH headers during GSO
https://lore.kernel.org/all/6112cce99b4e3571444a616d0fb19e91e2fcca72.1776597598.git.caoruide123@gmail.com
* [PATCH net 1/1] net: nsh: handle nested NSH headers during GSO
and found the following issue:
WARNING in nsh_gso_segment
Full report is available here:
https://ci.syzbot.org/series/13f77bac-014d-4059-9187-fbdbcb6a6540
***
WARNING in nsh_gso_segment
tree: net
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net.git
base: 0cf004ffb61cd32d140531c3a84afe975f9fc7ea
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/193058cd-cd88-4bb0-a6e6-b43b6179fcd9/config
syz repro: https://ci.syzbot.org/findings/446db72d-a630-471c-b369-2ef1f7b3b6ed/syz_repro
------------[ cut here ]------------
offset != (typeof(skb->mac_header))offset
WARNING: ./include/linux/skbuff.h:3173 at skb_reset_mac_header include/linux/skbuff.h:3173 [inline], CPU#1: syz.1.20/5978
WARNING: ./include/linux/skbuff.h:3173 at nsh_gso_segment+0x833/0x12d0 net/nsh/nsh.c:107, CPU#1: syz.1.20/5978
Modules linked in:
CPU: 1 UID: 0 PID: 5978 Comm: syz.1.20 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:skb_reset_mac_header include/linux/skbuff.h:3173 [inline]
RIP: 0010:nsh_gso_segment+0x833/0x12d0 net/nsh/nsh.c:107
Code: 2f 08 00 00 48 8b 7c 24 60 44 89 f6 e8 46 83 e4 fd 48 85 c0 0f 84 20 08 00 00 e8 48 af 31 f6 e9 ab fb ff ff e8 3e af 31 f6 90 <0f> 0b 90 e9 43 fd ff ff e8 30 af 31 f6 90 0f 0b 90 e9 39 fe ff ff
RSP: 0018:ffffc90004847220 EFLAGS: 00010293
RAX: ffffffff8b93aeb2 RBX: ffff88817362a6d8 RCX: ffff88810f088000
RDX: 0000000000000000 RSI: 0000000000010040 RDI: 0000000000010000
RBP: ffffc90004847390 R08: ffff88810f088000 R09: 0000000000000005
R10: 0000000000000005 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000010040 R14: dffffc0000000000 R15: 0000000000000074
FS: 00007f28e9ed36c0(0000) GS:ffff8882a9245000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000200000010000 CR3: 0000000170f06000 CR4: 00000000000006f0
Call Trace:
<TASK>
skb_mac_gso_segment+0x31c/0x690 net/core/gso.c:53
__skb_gso_segment+0x376/0x540 net/core/gso.c:124
skb_gso_segment include/net/gso.h:83 [inline]
validate_xmit_skb+0xa21/0x14a0 net/core/dev.c:4039
validate_xmit_skb_list+0x84/0x120 net/core/dev.c:4089
sch_direct_xmit+0xdf/0x4c0 net/sched/sch_generic.c:357
__dev_xmit_skb net/core/dev.c:4209 [inline]
__dev_queue_xmit+0x180f/0x3950 net/core/dev.c:4831
packet_snd net/packet/af_packet.c:3077 [inline]
packet_sendmsg+0x3ebc/0x50f0 net/packet/af_packet.c:3109
sock_sendmsg_nosec net/socket.c:787 [inline]
__sock_sendmsg net/socket.c:802 [inline]
__sys_sendto+0x672/0x710 net/socket.c:2265
__do_sys_sendto net/socket.c:2272 [inline]
__se_sys_sendto net/socket.c:2268 [inline]
__x64_sys_sendto+0xde/0x100 net/socket.c:2268
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f28e8f9c819
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f28e9ed3028 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f28e9216090 RCX: 00007f28e8f9c819
RDX: 00000000000100a6 RSI: 0000200000000180 RDI: 0000000000000005
RBP: 00007f28e9032c91 R08: 0000200000000140 R09: 0000000000000014
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f28e9216128 R14: 00007f28e9216090 R15: 00007ffd50249d78
</TASK>
***
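The warning condition quoted in the trace, `offset != (typeof(skb->mac_header))offset`, is a round-trip check: it fires when an offset no longer fits in the narrower field it is stored into. A minimal userspace sketch follows, assuming a 16-bit offset field; `hdr_off_t`, `offset_fits()` and `store_offset()` are hypothetical names. The value 0x10040 is used only because it appears in RSI and R13 in the register dump; the report itself does not say those registers hold the offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: the header-offset field is 16 bits wide. */
typedef uint16_t hdr_off_t;

/* True when offset survives a round trip through the narrower field type. */
static bool offset_fits(long offset)
{
	return offset == (long)(hdr_off_t)offset;
}

/* Store an offset into the narrow field; the caller must have checked it. */
static hdr_off_t store_offset(long offset)
{
	assert(offset_fits(offset));
	return (hdr_off_t)offset;
}
```

Under that assumption, `offset_fits(0x10040)` is false because the cast keeps only the low 16 bits (0x40), while any value up to 0xffff passes.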
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).
The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.
^ permalink raw reply
* Re: [PATCH v5] net: caif: fix stack out-of-bounds write in cfctrl_link_setup()
From: Arnd Bergmann @ 2026-04-20 8:14 UTC (permalink / raw)
To: Kangzheng Gu, Simon Horman
Cc: Paolo Abeni, David S . Miller, Eric Dumazet, Jakub Kicinski,
Kees Cook, Thorsten Blum, sjur.brandeland, Netdev, linux-kernel,
stable
In-Reply-To: <CAKvcANPEa91paujTQjpW2hZhpXEhwfOjjy6CsN=OJ32iXYXdTA@mail.gmail.com>
On Mon, Apr 20, 2026, at 10:09, Kangzheng Gu wrote:
> Thanks for all of your advice, I am preparing a new version of patch now.
If you are actively using CAIF, please chime in at
https://lore.kernel.org/all/20260416182829.1440262-1-kuba@kernel.org/
If you are not actually using CAIF, maybe wait a little bit before
spending more time on it because the patches may no longer
apply if it gets removed due to lack of users.
Arnd
^ permalink raw reply
* RE: [Intel-wired-lan] [PATCH] idpf: do not perform flow ops when netdev is detached
From: Kwapulinski, Piotr @ 2026-04-20 8:22 UTC (permalink / raw)
To: Li Li, Nguyen, Anthony L, Kitszel, Przemyslaw, David S. Miller,
Jakub Kicinski, Eric Dumazet, intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
David Decotigny, Singhai, Anjali, Samudrala, Sridhar,
Brian Vazquez, Tantilov, Emil S
In-Reply-To: <20260419192555.3631327-1-boolli@google.com>
>-----Original Message-----
>From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Li Li via Intel-wired-lan
>Sent: Sunday, April 19, 2026 9:26 PM
>To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; David S. Miller <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; Eric Dumazet <edumazet@google.com>; intel-wired-lan@lists.osuosl.org
>Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; David Decotigny <decot@google.com>; Singhai, Anjali <anjali.singhai@intel.com>; Samudrala, Sridhar <sridhar.samudrala@intel.com>; Brian Vazquez <brianvv@google.com>; Li Li <boolli@google.com>; Tantilov, Emil S <emil.s.tantilov@intel.com>
>Subject: [Intel-wired-lan] [PATCH] idpf: do not perform flow ops when netdev is detached
>
>Even though commit 2e281e1155fc ("idpf: detach and close netdevs while handling a reset") prevents ethtool -N/-n operations from operating on detached netdevs, we found that out-of-tree workflows like OpenOnload can bypass ethtool core locks and call idpf_set_rxnfc directly during an idpf HW reset. When this happens, we could get kernel crashes like the following:
>
>[ 4045.787439] BUG: kernel NULL pointer dereference, address: 0000000000000070
>[ 4045.794420] #PF: supervisor read access in kernel mode
>[ 4045.799580] #PF: error_code(0x0000) - not-present page
>[ 4045.804739] PGD 0
>[ 4045.806772] Oops: Oops: 0000 [#1] SMP NOPTI
>...
>[ 4045.836425] Workqueue: onload-wqueue oof_do_deferred_work_fn [onload]
>[ 4045.842926] RIP: 0010:idpf_del_flow_steer+0x24/0x170 [idpf]
>...
>[ 4045.946323] Call Trace:
>[ 4045.948796] <TASK>
>[ 4045.950915] ? show_trace_log_lvl+0x1b0/0x2f0
>[ 4045.955293] ? show_trace_log_lvl+0x1b0/0x2f0
>[ 4045.959672] ? idpf_set_rxnfc+0x6f/0x80 [idpf]
>[ 4045.964142] ? __die_body.cold+0x8/0x12
>[ 4045.968000] ? page_fault_oops+0x148/0x160
>[ 4045.972117] ? exc_page_fault+0x6f/0x160
>[ 4045.976060] ? asm_exc_page_fault+0x22/0x30
>[ 4045.980262] ? idpf_del_flow_steer+0x24/0x170 [idpf]
>[ 4045.985245] idpf_set_rxnfc+0x6f/0x80 [idpf]
>[ 4045.989535] af_xdp_filter_remove+0x7c/0xb0 [sfc_resource]
>[ 4045.995069] oo_hw_filter_clear_hwports+0x6f/0xa0 [onload]
>[ 4046.000589] oo_hw_filter_update+0x65/0x210 [onload]
>[ 4046.005587] oof_hw_filter_update.constprop.0+0xe7/0x140 [onload]
>[ 4046.011716] oof_manager_update_all_filters+0xad/0x270 [onload]
>[ 4046.017671] __oof_do_deferred_work+0x15e/0x190 [onload]
>[ 4046.023014] oof_do_deferred_work+0x2c/0x40 [onload]
>[ 4046.028018] oof_do_deferred_work_fn+0x12/0x30 [onload]
>[ 4046.033277] process_one_work+0x174/0x330
>[ 4046.037304] worker_thread+0x246/0x390
>[ 4046.041074] ? __pfx_worker_thread+0x10/0x10
>[ 4046.045364] kthread+0xf6/0x240
>[ 4046.048530] ? __pfx_kthread+0x10/0x10
>[ 4046.052297] ret_from_fork+0x2d/0x50
>[ 4046.055896] ? __pfx_kthread+0x10/0x10
>[ 4046.059664] ret_from_fork_asm+0x1a/0x30
>[ 4046.063613] </TASK>
>
>To prevent this, we need to add checks in idpf_set_rxnfc and idpf_get_rxnfc to error out if the netdev is already detached.
>
>Tested: implemented the following patch to synthetically force idpf into a HW reset:
>
>diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
>index 4fc0bb14c5b1..27476d57bcf0 100644
>--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
>+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
>@@ -10,6 +10,9 @@
> #define idpf_tx_buf_next(buf) (*(u32 *)&(buf)->priv)
> LIBETH_SQE_CHECK_PRIV(u32);
>
>+static bool SIMULATE_TX_TIMEOUT;
>+module_param(SIMULATE_TX_TIMEOUT, bool, 0644);
>+
> /**
> * idpf_chk_linearize - Check if skb exceeds max descriptors per packet
> * @skb: send buffer
>@@ -46,6 +49,8 @@ void idpf_tx_timeout(struct net_device *netdev, unsigned int txqueue)
>
> adapter->tx_timeout_count++;
>
>+ SIMULATE_TX_TIMEOUT = false;
>+
> netdev_err(netdev, "Detected Tx timeout: Count %d, Queue %d\n",
> adapter->tx_timeout_count, txqueue);
> if (!idpf_is_reset_in_prog(adapter)) {
>@@ -2225,6 +2230,8 @@ static bool idpf_tx_clean_complq(struct idpf_compl_queue *complq, int budget,
> goto fetch_next_desc;
> }
> tx_q = complq->txq_grp->txqs[rel_tx_qid];
>+ if (unlikely(SIMULATE_TX_TIMEOUT && (tx_q->idx % 2 == 1)))
>+ goto fetch_next_desc;
>
> /* Determine completion type */
> ctype = le16_get_bits(tx_desc->common.qid_comptype_gen,
>diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
>index be66f9b2e101..ba5da2a86c15 100644
>--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
>+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
>@@ -8,6 +8,9 @@
> #include "idpf_virtchnl.h"
> #include "idpf_ptp.h"
>
>+static bool VIRTCHNL_FAILED;
>+module_param(VIRTCHNL_FAILED, bool, 0644);
>+
> /**
> * struct idpf_vc_xn_manager - Manager for tracking transactions
> * @ring: backing and lookup for transactions
>@@ -3496,6 +3499,11 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
> switch (adapter->state) {
> case __IDPF_VER_CHECK:
> err = idpf_send_ver_msg(adapter);
>+
>+ if (unlikely(VIRTCHNL_FAILED)) {
>+ err = -EIO;
>+ }
Please remove the redundant braces.
Piotr
>+
> switch (err) {
> case 0:
> /* success, move state machine forward */
>
>And tested by writing 1 to /sys/module/idpf/parameters/VIRTCHNL_FAILED
>and /sys/module/idpf/parameters/SIMULATE_TX_TIMEOUT, and running
>idpf_get_rxnfc() right after the HW reset.
>
>Without the patch: encountered NULL pointer and kernel crash.
>
>With the patch: no crashes.
>
>Fixes: 2e281e1155fc ("idpf: detach and close netdevs while handling a reset")
>Signed-off-by: Li Li <boolli@google.com>
>---
> drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
>diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
>index bb99d9e7c65d..8368a7e6a754 100644
>--- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
>+++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
>@@ -43,6 +43,9 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
> unsigned int cnt = 0;
> int err = 0;
>
>+ if (!netdev || !netif_device_present(netdev))
>+ return -ENODEV;
>+
> idpf_vport_ctrl_lock(netdev);
> vport = idpf_netdev_to_vport(netdev);
> vport_config = np->adapter->vport_config[np->vport_idx];
>@@ -349,6 +352,9 @@ static int idpf_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd)
> {
> int ret = -EOPNOTSUPP;
>
>+ if (!netdev || !netif_device_present(netdev))
>+ return -ENODEV;
>+
> idpf_vport_ctrl_lock(netdev);
> switch (cmd->cmd) {
> case ETHTOOL_SRXCLSRLINS:
>--
>2.54.0.rc1.513.gad8abe7a5a-goog
^ permalink raw reply
* Re: [PATCH net 1/1] mptcp: hold subflow request owners when cloning reqsk
From: Matthieu Baerts @ 2026-04-20 8:26 UTC (permalink / raw)
To: Yuan Tan, Kuniyuki Iwashima
Cc: Ren Wei, netdev, mptcp, davem, edumazet, kuba, pabeni, horms,
ncardwell, dsahern, martineau, geliang, daniel, kafai, yifanwucs,
tomapufckgml, bird, caoruide123, enjou1224z
In-Reply-To: <05a19f8d-360e-41d1-bc8a-0b4caab3d354@gmail.com>
Hi Yuan Tan,
On 19/04/2026 11:51, Yuan Tan wrote:
>
> On 4/16/2026 11:48 AM, Kuniyuki Iwashima wrote:
>> On Thu, Apr 16, 2026 at 10:45 AM Matthieu Baerts <matttbe@kernel.org> wrote:
>>> Hi Ren,
>>>
>>> On 15/04/2026 11:31, Ren Wei wrote:
>>>> From: Ruide Cao <caoruide123@gmail.com>
>>>>
>>>> TCP request migration clones pending request sockets with
>>>> inet_reqsk_clone(). For MPTCP MP_JOIN requests this raw-copies
>>>> subflow_req->msk, but the cloned request does not take a new reference.
>>>>
>>>> Both the original and the cloned request can later drop the same msk in
>>>> subflow_req_destructor(), and a migrated request may keep a dangling msk
>>>> pointer after the original owner has already been released.
>>>>
>>>> Add a request_sock clone callback and let MPTCP grab a reference for cloned
>>>> subflow requests that carry an msk. This keeps ownership balanced across
>>>> both successful migrations and failed clone/insert paths without changing
>>>> other protocols.
(...)
>>>> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
>>>> index e961936b6be7..140a9e96ad58 100644
>>>> --- a/net/ipv4/inet_connection_sock.c
>>>> +++ b/net/ipv4/inet_connection_sock.c
>>>> @@ -954,6 +954,9 @@ static struct request_sock *inet_reqsk_clone(struct request_sock *req,
>>>> if (sk->sk_protocol == IPPROTO_TCP && tcp_rsk(nreq)->tfo_listener)
>>>> rcu_assign_pointer(tcp_sk(nreq->sk)->fastopen_rsk, nreq);
>>> (Maybe TCP with fastopen could be this other user to call
>>> rcu_assign_pointer()? (net-next material))
>>>
>>>> + if (req->rsk_ops->init_clone)
>>>> + req->rsk_ops->init_clone(req, nreq);
>> I think a simple direct call is better.
>>
>> #ifdef CONFIG_MPTCP
>> if (tcp_rsk(req)->is_mptcp)
>> mptcp_reqsk_clone(nreq);
>> #endif
>>
> Thank you very much for your suggestion. We will use this approach in
> the next version of the patch. Would you like us to add your
> Suggested-by tag?
No need to add a Suggested-by tag: this tag is used when the whole patch
idea has been suggested by someone. That's not the case here: we only
proposed a small modification in the code, without changing the idea.
https://docs.kernel.org/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply
* [PATCH net v2 0/8] xsk: fix bugs around xsk skb allocation
From: Jason Xing @ 2026-04-20 8:27 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
There are rare issues around xsk_build_skb(). Some of them
were found by Sashiko[1][2].
[1]: https://lore.kernel.org/all/20260415082654.21026-1-kerneljasonxing@gmail.com/
[2]: https://lore.kernel.org/all/20260418045644.28612-1-kerneljasonxing@gmail.com/
---
v2
Link: https://lore.kernel.org/all/20260418045644.28612-1-kerneljasonxing@gmail.com/#t
1. add four patches that fix buggy pre-existing behavior spotted by
Sashiko
2. adjust the order of the 8 patches.
Jason Xing (8):
xsk: reject sw-csum UMEM binding to IFF_TX_SKB_NO_LINEAR devices
xsk: handle NULL dereference of the skb without frags issue
xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path
xsk: prevent CQ desync when freeing half-built skbs in xsk_build_skb()
xsk: avoid skb leak in XDP_TX_METADATA case
xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
xsk: fix xsk_addrs slab leak on multi-buffer error path
xsk: fix u64 descriptor address truncation on 32-bit architectures
net/xdp/xsk.c | 88 +++++++++++++++++++++++++++++++++--------
net/xdp/xsk_buff_pool.c | 3 ++
2 files changed, 75 insertions(+), 16 deletions(-)
--
2.41.3
^ permalink raw reply
* [PATCH net v2 1/8] xsk: reject sw-csum UMEM binding to IFF_TX_SKB_NO_LINEAR devices
From: Jason Xing @ 2026-04-20 8:27 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260420082805.14844-1-kerneljasonxing@gmail.com>
From: Jason Xing <kernelxing@tencent.com>
skb_checksum_help() is a common helper that writes the folded
16-bit checksum back via skb->data + csum_start + csum_offset,
i.e. it relies on the skb's linear head and fails (with WARN_ONCE
and -EINVAL) when skb_headlen() is 0.
AF_XDP generic xmit takes two very different paths depending on the
netdev. Drivers that advertise IFF_TX_SKB_NO_LINEAR (e.g. virtio_net)
skip the "copy payload into a linear head" step on purpose as a
performance optimisation: xsk_build_skb_zerocopy() only attaches UMEM
pages as frags and never calls skb_put(), so skb_headlen() stays 0
for the whole skb. For these skbs there is simply no linear area for
skb_checksum_help() to write the csum into - the sw-csum fallback is
structurally inapplicable.
Catch this combination and reject it with an error at setup time.
Rejecting at bind() converts the per-packet failure into a synchronous,
actionable -EOPNOTSUPP. HW csum and launch_time metadata on
IFF_TX_SKB_NO_LINEAR drivers are unaffected because they do not call
skb_checksum_help().
Without the patch, every descriptor carrying 'XDP_TX_METADATA |
XDP_TXMD_FLAGS_CHECKSUM' produces:
1) a WARN_ONCE "offset (N) >= skb_headlen() (0)" from skb_checksum_help(),
2) sendmsg() returning -EINVAL without consuming the descriptor
(invalid_descs is not incremented),
3) a wedged TX ring: __xsk_generic_xmit() does not advance the
consumer on non-EOVERFLOW errors, so the next sendmsg() re-reads
the same descriptor and re-hits the same WARN until the socket
is closed.
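The wedged-ring behaviour in point 3 can be illustrated with a small userspace model. This is a sketch under simplifying assumptions, not the kernel code: `struct ring`, `build_one()` and `xmit()` are hypothetical names, and a negative descriptor value stands in for a descriptor whose checksum fallback fails.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical TX ring: an array of descriptors plus a consumer index. */
struct ring {
	const int *descs;
	size_t len;
	size_t cons;
};

/* Hypothetical build step: negative descriptors always fail with -EINVAL. */
static int build_one(int desc)
{
	return desc < 0 ? -EINVAL : 0;
}

/*
 * Process descriptors until the ring is empty or one of them fails.
 * On failure the consumer index is left pointing at the failing
 * descriptor, so the next call re-reads it and fails the same way.
 */
static int xmit(struct ring *r)
{
	assert(r->cons <= r->len);
	while (r->cons < r->len) {
		int err = build_one(r->descs[r->cons]);

		if (err)
			return err; /* consumer not advanced */
		r->cons++;
	}
	return 0;
}
```

With descriptors `{1, -1, 2}`, every call to `xmit()` returns -EINVAL with the consumer stuck at index 1, and the third descriptor is never reached. Rejecting the configuration up front avoids entering this state at all.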
Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/#t
Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk_buff_pool.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 37b7a68b89b3..c2521b6547e3 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -169,6 +169,9 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
if (force_zc && force_copy)
return -EINVAL;
+ if (pool->tx_sw_csum && (netdev->priv_flags & IFF_TX_SKB_NO_LINEAR))
+ return -EOPNOTSUPP;
+
if (xsk_get_pool_from_qid(netdev, queue_id))
return -EBUSY;
--
2.41.3
^ permalink raw reply related
* [PATCH net v2 2/8] xsk: handle NULL dereference of the skb without frags issue
From: Jason Xing @ 2026-04-20 8:27 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260420082805.14844-1-kerneljasonxing@gmail.com>
From: Jason Xing <kernelxing@tencent.com>
When a first descriptor (xs->skb == NULL) triggers -EOVERFLOW in
xsk_build_skb_zerocopy (e.g., MAX_SKB_FRAGS exceeded), the free_err
EOVERFLOW handler unconditionally dereferences xs->skb via
xsk_inc_num_desc(xs->skb) and xsk_drop_skb(xs->skb), causing a NULL
pointer dereference.
In this series, the skb is already freed by kfree_skb() inside
xsk_build_skb_zerocopy for the first-descriptor case, so we only need
to do the bookkeeping: cancel the one reserved CQ slot and account for
the single invalid descriptor.
Guard the existing xsk_inc_num_desc/xsk_drop_skb calls with an
xs->skb check (for the continuation case), and add an else branch
for the first-descriptor case that manually cancels the CQ slot and
increments invalid_descs by one.
Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 6149f6a79897..6521604f8d42 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -893,9 +893,14 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
kfree_skb(skb);
if (err == -EOVERFLOW) {
- /* Drop the packet */
- xsk_inc_num_desc(xs->skb);
- xsk_drop_skb(xs->skb);
+ if (xs->skb) {
+ /* Drop the packet */
+ xsk_inc_num_desc(xs->skb);
+ xsk_drop_skb(xs->skb);
+ } else {
+ xsk_cq_cancel_locked(xs->pool, 1);
+ xs->tx->invalid_descs++;
+ }
xskq_cons_release(xs->tx);
} else {
/* Let application retry */
--
2.41.3
^ permalink raw reply related
* [PATCH net v2 3/8] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path
From: Jason Xing @ 2026-04-20 8:28 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260420082805.14844-1-kerneljasonxing@gmail.com>
From: Jason Xing <kernelxing@tencent.com>
When xsk_build_skb() processes multi-buffer packets in copy mode, the
first descriptor stores data into the skb linear area without adding
any frags, so nr_frags stays at 0. The caller then sets xs->skb = skb
to accumulate subsequent descriptors.
If a continuation descriptor fails (e.g. alloc_page returns NULL with
-EAGAIN), we jump to free_err where the condition:
if (skb && !skb_shinfo(skb)->nr_frags)
kfree_skb(skb);
evaluates to true because nr_frags is still 0 (the first descriptor
used the linear area, not frags). This frees the skb while xs->skb
still points to it, creating a dangling pointer. On the next transmit
attempt or socket close, xs->skb is dereferenced, causing a
use-after-free or double-free.
Fix by adding a !xs->skb check to the condition, ensuring we only free
skbs that were freshly allocated in this call (xs->skb is NULL) and
never free an in-progress multi-buffer skb that the caller still
references.
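The difference between the old and new free_err conditions can be checked with a small truth-table model in userspace C. The types and function names here (`fake_skb`, `fake_sock`, `free_before_fix()`, `free_after_fix()`) are hypothetical simplifications; only the two boolean expressions are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for sk_buff and xdp_sock. */
struct fake_skb {
	unsigned int nr_frags;
};

struct fake_sock {
	struct fake_skb *skb; /* in-progress multi-buffer skb, or NULL */
};

/* Old condition: frees any skb that has no frags. */
static bool free_before_fix(const struct fake_sock *xs,
			    const struct fake_skb *skb)
{
	(void)xs;
	return skb && !skb->nr_frags;
}

/* New condition: additionally requires that no skb is in progress. */
static bool free_after_fix(const struct fake_sock *xs,
			   const struct fake_skb *skb)
{
	assert(xs != NULL);
	return skb && !xs->skb && !skb->nr_frags;
}
```

For a failing first descriptor (xs->skb is NULL) both conditions agree and the skb is freed. For a failing continuation descriptor whose skb still has nr_frags == 0, only the old condition frees the skb that xs->skb continues to reference.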
Closes: https://lore.kernel.org/all/20260415082654.21026-4-kerneljasonxing@gmail.com/
Fixes: 6b9c129c2f93 ("xsk: remove @first_frag from xsk_build_skb()")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 6521604f8d42..4fdd1a45a9bd 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -889,7 +889,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
return skb;
free_err:
- if (skb && !skb_shinfo(skb)->nr_frags)
+ if (skb && !xs->skb && !skb_shinfo(skb)->nr_frags)
kfree_skb(skb);
if (err == -EOVERFLOW) {
--
2.41.3
^ permalink raw reply related
* [PATCH net v2 4/8] xsk: prevent CQ desync when freeing half-built skbs in xsk_build_skb()
From: Jason Xing @ 2026-04-20 8:28 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260420082805.14844-1-kerneljasonxing@gmail.com>
From: Jason Xing <kernelxing@tencent.com>
Once xsk_skb_init_misc() has been called on an skb, its destructor is
set to xsk_destruct_skb(), which submits the descriptor address(es) to
the completion queue and advances the CQ producer. If such an skb is
subsequently freed via kfree_skb() along an error path - before the
skb has ever been handed to the driver - the destructor still runs and
submits a bogus, half-initialized address to the CQ.
Introduce a new common helper to fix the issue. That function will be
used by the subsequent patches soon.
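The idea behind the helper, replacing the destructor before freeing so that the completion side effect does not run, can be modelled in userspace C. This is a sketch only: `fake_skb`, `cq_entries` and the function names are hypothetical, and `plain_destructor()` merely stands in for the role sock_wfree plays in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb that carries a destructor callback. */
struct fake_skb {
	void (*destructor)(struct fake_skb *skb);
};

/* Counts how many entries have been pushed to the model completion queue. */
static int cq_entries;

/* Models a destructor that reports completion to the queue. */
static void completing_destructor(struct fake_skb *skb)
{
	(void)skb;
	cq_entries++;
}

/* Models a destructor with no completion-queue side effect. */
static void plain_destructor(struct fake_skb *skb)
{
	(void)skb;
}

/* Models freeing: the destructor, if any, always runs. */
static void free_skb(struct fake_skb *skb)
{
	assert(skb != NULL);
	if (skb->destructor)
		skb->destructor(skb);
}

/* Drop an skb that was never transmitted: swap the destructor first. */
static void drop_untransmitted(struct fake_skb *skb)
{
	skb->destructor = plain_destructor;
	free_skb(skb);
}
```

Freeing through `free_skb()` bumps `cq_entries`, while `drop_untransmitted()` leaves it unchanged, which is the property the error paths need.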
Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/
Fixes: c30d084960cf ("xsk: avoid overwriting skb fields for multi-buffer traffic")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 4fdd1a45a9bd..614e7bd1252b 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -717,6 +717,12 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
return 0;
}
+static void xsk_drop_untrans_skb(struct sk_buff *skb)
+{
+ skb->destructor = sock_wfree;
+ kfree_skb(skb);
+}
+
static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
struct xdp_desc *desc)
{
@@ -890,7 +896,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
free_err:
if (skb && !xs->skb && !skb_shinfo(skb)->nr_frags)
- kfree_skb(skb);
+ xsk_drop_untrans_skb(skb);
if (err == -EOVERFLOW) {
if (xs->skb) {
--
2.41.3
^ permalink raw reply related
* [PATCH net v2 5/8] xsk: avoid skb leak in XDP_TX_METADATA case
From: Jason Xing @ 2026-04-20 8:28 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260420082805.14844-1-kerneljasonxing@gmail.com>
From: Jason Xing <kernelxing@tencent.com>
Fix the skb leak by explicitly freeing the skb before returning to the
caller.
How to reproduce it in virtio_net:
1. the current skb is the first one (which means no frag and xs->skb is
NULL) and the user enables the metadata feature.
2. xsk_skb_metadata() returns an error code.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
4. there is no chance to free this skb anymore.
Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 614e7bd1252b..51f76e9d6ffd 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -749,8 +749,10 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
xsk_skb_init_misc(skb, xs, desc->addr);
if (desc->options & XDP_TX_METADATA) {
err = xsk_skb_metadata(skb, buffer, desc, pool, hr);
- if (unlikely(err))
+ if (unlikely(err)) {
+ xsk_drop_untrans_skb(skb);
return ERR_PTR(err);
+ }
}
} else {
struct xsk_addrs *xsk_addr;
--
2.41.3
^ permalink raw reply related
* [PATCH net v2 6/8] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
From: Jason Xing @ 2026-04-20 8:28 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260420082805.14844-1-kerneljasonxing@gmail.com>
From: Jason Xing <kernelxing@tencent.com>
Fix the skb leak by explicitly freeing the skb before returning to the
caller.
How to reproduce it in virtio_net:
1. the current skb is the first one (which means xs->skb is NULL) and
hits the MAX_SKB_FRAGS limit.
2. xsk_build_skb_zerocopy() returns -EOVERFLOW.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'. This
is why the bug can be triggered.
4. there is no chance to free this skb anymore.
Note that if in this case the xs->skb is not NULL, xsk_build_skb() will
call xsk_drop_skb(xs->skb) to do the right thing.
Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 51f76e9d6ffd..9236ec32b54a 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -784,8 +784,11 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
addr = buffer - pool->addrs;
for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) {
- if (unlikely(i >= MAX_SKB_FRAGS))
+ if (unlikely(i >= MAX_SKB_FRAGS)) {
+ if (!xs->skb)
+ xsk_drop_untrans_skb(skb);
return ERR_PTR(-EOVERFLOW);
+ }
page = pool->umem->pgs[addr >> PAGE_SHIFT];
get_page(page);
--
2.41.3
^ permalink raw reply related
* [PATCH net v2 7/8] xsk: fix xsk_addrs slab leak on multi-buffer error path
From: Jason Xing @ 2026-04-20 8:28 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260420082805.14844-1-kerneljasonxing@gmail.com>
From: Jason Xing <kernelxing@tencent.com>
When xsk_build_skb() / xsk_build_skb_zerocopy() sees the first
continuation descriptor, it promotes destructor_arg from an inlined
address to a freshly allocated xsk_addrs (num_descs = 1). The counter
is bumped to >= 2 only at the very end of a successful build (by calling
xsk_inc_num_desc()).
If the build fails in between (e.g. alloc_page() returns NULL with
-EAGAIN, or the MAX_SKB_FRAGS overflow hits), we jump to free_err, skip
calling xsk_inc_num_desc() to increment num_descs and leave the half-built
skb attached to xs->skb for the app to retry. The skb now has
1) destructor_arg = a real xsk_addrs pointer,
2) num_descs = 1
If the app never retries and just close()s the socket, xsk_release()
calls xsk_drop_skb() -> xsk_consume_skb(), which decides whether to
free xsk_addrs by testing num_descs > 1:
if (unlikely(num_descs > 1))
kmem_cache_free(xsk_tx_generic_cache, destructor_arg);
Because num_descs is exactly 1 the branch is skipped and the
xsk_addrs object is leaked to the xsk_tx_generic_cache slab.
Fix it by directly testing whether destructor_arg still holds an inline
address. If it does not, it points to memory allocated from
xsk_tx_generic_cache, which must be handled regardless of whether
num_descs has been incremented.
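The two ways of deciding whether an allocation must be freed can be compared in a userspace sketch. It assumes the tagging scheme the series describes (an inline address has bit 0 set, an allocated struct is at least 2-byte aligned so bit 0 is clear); `struct addrs`, `make_inline()`, `frees_by_count()` and `frees_by_tag()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the allocated per-skb address array. */
struct addrs {
	uint32_t num_descs;
};

/* Tag an inline address with bit 0 so it can be told apart from a pointer. */
static void *make_inline(uintptr_t addr)
{
	return (void *)(addr | 0x1UL);
}

/* True when arg carries the inline-address tag. */
static bool is_inline(const void *arg)
{
	return (uintptr_t)arg & 0x1UL;
}

/* Old decision: free only when more than one descriptor was counted. */
static bool frees_by_count(uint32_t num_descs)
{
	return num_descs > 1;
}

/* New decision: free whenever arg is a real pointer, whatever the count. */
static bool frees_by_tag(const void *arg)
{
	assert(arg != NULL);
	return !is_inline(arg);
}
```

For a half-built skb whose allocation still has num_descs == 1, `frees_by_count()` says "do not free" while `frees_by_tag()` says "free", which is exactly the leak window described above.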
Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 9236ec32b54a..6b17974ca825 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -605,7 +605,7 @@ static void xsk_cq_submit_addr_locked(struct xsk_buff_pool *pool,
spin_lock_irqsave(&pool->cq_prod_lock, flags);
idx = xskq_get_prod(pool->cq);
- if (unlikely(num_descs > 1)) {
+ if (unlikely(!xsk_skb_destructor_is_addr(skb))) {
xsk_addr = (struct xsk_addrs *)skb_shinfo(skb)->destructor_arg;
for (i = 0; i < num_descs; i++) {
@@ -660,7 +660,7 @@ static void xsk_consume_skb(struct sk_buff *skb)
u32 num_descs = xsk_get_num_desc(skb);
struct xsk_addrs *xsk_addr;
- if (unlikely(num_descs > 1)) {
+ if (unlikely(!xsk_skb_destructor_is_addr(skb))) {
xsk_addr = (struct xsk_addrs *)skb_shinfo(skb)->destructor_arg;
kmem_cache_free(xsk_tx_generic_cache, xsk_addr);
}
--
2.41.3
^ permalink raw reply related
* [PATCH net v2 8/8] xsk: fix u64 descriptor address truncation on 32-bit architectures
From: Jason Xing @ 2026-04-20 8:28 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
In-Reply-To: <20260420082805.14844-1-kerneljasonxing@gmail.com>
From: Jason Xing <kernelxing@tencent.com>
In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit
descriptor address into skb_shinfo(skb)->destructor_arg (void *) via a
uintptr_t cast:
skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of
the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned
mode the chunk offset is encoded in bits 48-63 of the descriptor
address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is
lost entirely. The completion queue then returns a truncated address to
userspace, making buffer recycling impossible.
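The truncation can be reproduced outside the kernel with a minimal
sketch. The helper names below are hypothetical and uint32_t merely
stands in for a 32-bit uintptr_t; this is an illustration of the cast,
not the kernel code itself:

```c
#include <stdint.h>

/* Hypothetical helper: tagged store/load as seen with a 32-bit uintptr_t. */
static uint64_t roundtrip_32bit_sketch(uint64_t addr)
{
	/* The cast keeps only the low 32 bits; bits 32-63 are dropped here. */
	uint32_t stored = (uint32_t)addr | 0x1u;

	/* Clearing the tag bit cannot bring the upper bits back. */
	return (uint64_t)(stored & ~0x1u);
}

/* Hypothetical helper: the same store/load with a 64-bit uintptr_t. */
static uint64_t roundtrip_64bit_sketch(uint64_t addr)
{
	uint64_t stored = addr | 0x1ull;

	return stored & ~0x1ull;
}
```

For an address such as (0x100ULL << 48) | 0x2000, the 32-bit variant
returns 0x2000, so the offset kept in bits 48-63 is gone, while the
64-bit variant returns the original value.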
Fix this by handling the 32-bit case directly in
xsk_skb_destructor_set_addr(): when !CONFIG_64BIT, allocate an xsk_addrs
struct (the same path already used for multi-descriptor SKBs) to store
the full u64 address.
Extend xsk_drop_untrans_skb() to free the xsk_addrs allocation on 32-bit
when the skb is dropped before transmission. Note that the 0x1UL tag bit
is not used for this check on 32-bit; a non-NULL destructor_arg means an
xsk_addrs struct was allocated.
Also extend xsk_skb_destructor_is_addr() to cover the 32-bit case in the
same way.
The overhead is one extra kmem_cache_zalloc per first descriptor on
32-bit only; 64-bit builds are completely unchanged.
Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 54 ++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 47 insertions(+), 7 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 6b17974ca825..bd49dbd9875b 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -556,9 +556,23 @@ static int xsk_cq_reserve_locked(struct xsk_buff_pool *pool)
return ret;
}
+/*
+ * On 64-bit, destructor_arg can store an inline address directly
+ * (tagged with bit 0 set). On 32-bit, all addresses go through an
+ * allocated xsk_addrs struct instead. In that case this function
+ * returns true only when destructor_arg is NULL (set_addr has not
+ * yet been called or has failed).
+ *
+ * For all callers:
+ * return true: no xsk_addrs struct to handle
+ * return false: destructor_arg points to an xsk_addrs struct
+ */
static bool xsk_skb_destructor_is_addr(struct sk_buff *skb)
{
- return (uintptr_t)skb_shinfo(skb)->destructor_arg & 0x1UL;
+ if (IS_ENABLED(CONFIG_64BIT))
+ return (uintptr_t)skb_shinfo(skb)->destructor_arg & 0x1UL;
+ else
+ return !skb_shinfo(skb)->destructor_arg;
}
static u64 xsk_skb_destructor_get_addr(struct sk_buff *skb)
@@ -566,9 +580,21 @@ static u64 xsk_skb_destructor_get_addr(struct sk_buff *skb)
return (u64)((uintptr_t)skb_shinfo(skb)->destructor_arg & ~0x1UL);
}
-static void xsk_skb_destructor_set_addr(struct sk_buff *skb, u64 addr)
+static int xsk_skb_destructor_set_addr(struct sk_buff *skb, u64 addr)
{
+ if (!IS_ENABLED(CONFIG_64BIT)) {
+ struct xsk_addrs *xsk_addr;
+
+ xsk_addr = kmem_cache_zalloc(xsk_tx_generic_cache, GFP_KERNEL);
+ if (!xsk_addr)
+ return -ENOMEM;
+ xsk_addr->addrs[0] = addr;
+ skb_shinfo(skb)->destructor_arg = (void *)xsk_addr;
+ return 0;
+ }
+
skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
+ return 0;
}
static void xsk_inc_num_desc(struct sk_buff *skb)
@@ -644,14 +670,14 @@ void xsk_destruct_skb(struct sk_buff *skb)
sock_wfree(skb);
}
-static void xsk_skb_init_misc(struct sk_buff *skb, struct xdp_sock *xs,
- u64 addr)
+static int xsk_skb_init_misc(struct sk_buff *skb, struct xdp_sock *xs,
+ u64 addr)
{
skb->dev = xs->dev;
skb->priority = READ_ONCE(xs->sk.sk_priority);
skb->mark = READ_ONCE(xs->sk.sk_mark);
skb->destructor = xsk_destruct_skb;
- xsk_skb_destructor_set_addr(skb, addr);
+ return xsk_skb_destructor_set_addr(skb, addr);
}
static void xsk_consume_skb(struct sk_buff *skb)
@@ -719,6 +745,12 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
static void xsk_drop_untrans_skb(struct sk_buff *skb)
{
+ if (!IS_ENABLED(CONFIG_64BIT) && !xsk_skb_destructor_is_addr(skb)) {
+ struct xsk_addrs *xsk_addr;
+
+ xsk_addr = (struct xsk_addrs *)skb_shinfo(skb)->destructor_arg;
+ kmem_cache_free(xsk_tx_generic_cache, xsk_addr);
+ }
skb->destructor = sock_wfree;
kfree_skb(skb);
}
@@ -746,7 +778,12 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
skb_reserve(skb, hr);
- xsk_skb_init_misc(skb, xs, desc->addr);
+ err = xsk_skb_init_misc(skb, xs, desc->addr);
+ if (unlikely(err)) {
+ xsk_drop_untrans_skb(skb);
+ return ERR_PTR(err);
+ }
+
if (desc->options & XDP_TX_METADATA) {
err = xsk_skb_metadata(skb, buffer, desc, pool, hr);
if (unlikely(err)) {
@@ -845,7 +882,10 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
if (unlikely(err))
goto free_err;
- xsk_skb_init_misc(skb, xs, desc->addr);
+ err = xsk_skb_init_misc(skb, xs, desc->addr);
+ if (unlikely(err))
+ goto free_err;
+
if (desc->options & XDP_TX_METADATA) {
err = xsk_skb_metadata(skb, buffer, desc,
xs->pool, hr);
--
2.41.3
* Re: [PATCH net-deletions] caif: remove CAIF NETWORK LAYER
From: Linus Walleij @ 2026-04-20 8:30 UTC (permalink / raw)
To: Jakub Kicinski, phone-devel
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, corbet,
skhan, alexs, si.yanteng, dzm91, linux, mst, jasowang, xuanzhuo,
eperezma, xu.xin16, wang.yaxin, jiang.kun2, jihed.chaibi.dev,
arnd, tytso, jiayuan.chen, gregkh
In-Reply-To: <20260416182829.1440262-1-kuba@kernel.org>
On Thu, Apr 16, 2026 at 8:28 PM Jakub Kicinski <kuba@kernel.org> wrote:
> Remove CAIF (Communication CPU to Application CPU Interface), the
> ST-Ericsson modem protocol. The subsystem has been orphaned since 2013.
> The last meaningful changes from the maintainers were in March 2013:
> a8c7687bf216 ("caif_virtio: Check that vringh_config is not null")
> b2273be8d2df ("caif_virtio: Use vringh_notify_enable correctly")
> 0d2e1a2926b1 ("caif_virtio: Introduce caif over virtio")
>
> Not-so-coincidentally, according to "the Internet" ST-Ericsson officially
> shut down its modem joint venture in Aug 2013.
Reviewed-by: Linus Walleij <linusw@kernel.org>
This specific code was used out-of-tree for modems M5730,
M5740 etc used for minor Android phone brands such as Sharp,
Motorola, Philips.
However it is also used in the Samsung Galaxy S 4G, so
let's page phone-devel@vger.kernel.org so the possible
audience knows. PostmarketOS is not actively supporting
it.
I remember that I advised Sjur to use virtio for this project and it lives
on through the generic contributions to virtio that resulted.
Yours,
Linus Walleij
* Re: [PATCH 1/9] bitfield: add FIELD_GET_SIGNED()
From: Johannes Berg @ 2026-04-20 8:43 UTC (permalink / raw)
To: Yury Norov, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
Jonathan Cameron, David Lechner, Nuno Sá, Andy Shevchenko,
Ping-Ke Shih, Richard Cochran, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexandre Belloni,
Yury Norov, Rasmus Villemoes, Hans de Goede, Linus Walleij,
Sakari Ailus, Salah Triki, Achim Gratz, Ben Collins, linux-kernel,
linux-iio, linux-wireless, netdev, linux-rtc
In-Reply-To: <20260417173621.368914-2-ynorov@nvidia.com>
On Fri, 2026-04-17 at 13:36 -0400, Yury Norov wrote:
> The bitfields are designed in assumption that fields contain unsigned
> integer values, thus extracting the values from the field implies
> zero-extending.
>
> Some drivers need to sign-extend their fields, and currently do it like:
>
> dc_re += sign_extend32(FIELD_GET(0xfff000, tmp), 11);
> dc_im += sign_extend32(FIELD_GET(0xfff, tmp), 11);
That's indeed pretty awful...
> +#define FIELD_GET_SIGNED(mask, reg) \
>
[...]
I (personally) tend to prefer the "__MAKE_OP" versions (*_get_bits()
etc.), in particular because WiFi and firmware interfaces deal a lot
with fixed endian fields.
Any chance it'd be simple to generate u32_get_bits_signed() etc.? Could
be especially useful for le32_get_bits_signed() for example, to have the
endian conversion built-in unlike FIELD_GET_SIGNED().
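For illustration only, here is a userspace sketch of what a signed field
getter with a built-in little-endian conversion could look like. Both
function names are hypothetical (neither is the proposed macro nor an
existing kernel helper), and the mask is assumed to be non-zero and
contiguous:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: extract the field selected by mask and sign-extend it. */
static int32_t field_get_signed_sketch(uint32_t mask, uint32_t reg)
{
	uint32_t shift = 0, width = 0, val, sign;

	assert(mask != 0);

	/* Position of the lowest set bit in the mask. */
	while (!((mask >> shift) & 1u))
		shift++;
	/* Number of contiguous set bits starting at that position. */
	while (width < 32 - shift && ((mask >> (shift + width)) & 1u))
		width++;

	val = (reg & mask) >> shift;
	sign = 1u << (width - 1);

	/* Flip the field's sign bit, then subtract it to extend the sign. */
	return (int32_t)((val ^ sign) - sign);
}

/* Hypothetical: the same, reading the register from little-endian bytes. */
static int32_t le32_get_bits_signed_sketch(const uint8_t le[4], uint32_t mask)
{
	uint32_t reg = (uint32_t)le[0] | ((uint32_t)le[1] << 8) |
		       ((uint32_t)le[2] << 16) | ((uint32_t)le[3] << 24);

	return field_get_signed_sketch(mask, reg);
}
```

With mask 0xfff000, a register value of 0x00fff000 yields -1, which is
the result the sign_extend32(FIELD_GET(...), 11) pattern quoted above
produces by hand.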
johannes
* Re: [RFC PATCH net] mptcp: pm: fix ADD_ADDR timer infinite retry on option space insufficient
From: Matthieu Baerts @ 2026-04-20 9:20 UTC (permalink / raw)
To: Li Xiasong
Cc: netdev, mptcp, linux-kernel, yuehaibing, zhangchangzhong,
weiyongjun1, Mat Martineau, Geliang Tang, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
In-Reply-To: <20260418100018.2219500-1-lixiasong1@huawei.com>
Hi Li,
On 18/04/2026 12:00, Li Xiasong wrote:
> When TCP option space is insufficient (e.g., IPv6 with tcp_timestamps
> enabled), the original code jumped to out_unlock without clearing the
> addr_signal flag. This caused mptcp_pm_add_timer to keep rescheduling
> indefinitely without sending ADD_ADDR,
Funny, I was looking at this issue on Friday evening :)
> preventing the endpoint list from being traversed.
It might help to add a bit of context: I guess here you meant that it
prevents advertising other ADD_ADDRs, not the use of other subflows when
sending data, right?
> In a pure ACK scenario (indicated by drop_other_suboptions=true), if
> the option space is insufficient to carry the ADD_ADDR suboption, it
> is appropriate to drop this address signal to allow the timer handler
> to move on to other addresses.
>
> Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
> Signed-off-by: Li Xiasong <lixiasong1@huawei.com>
> ---
>
> Seeking feedback on:
>
> When announcing addresses to the peer, MPTCP sends a pure ACK packet
> to carry MPTCP options (ADD_ADDR). In this scenario, if the option space
> is insufficient for ADD_ADDR, clearing addr_signal would:
>
> - Prevent the timer from retrying infinitely
> - Allow the timer to continue traversing and processing other addresses
> - Not block other subflow creation or address announcement operations
>
> Is there any scenario where we should retry later instead of clearing
> the address signal/echo flag? However, if a pure ACK doesn't have
> enough space for the flag, subsequent packets won't either.
That's correct: for the moment, if it is a pure ACK and there is not
enough space, no need to retry later because it is not possible to have
more space. It should only happen with an ADD_ADDR containing an IPv6
address and a port number. It might be good to specify this in the
commit message.
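A rough sketch of the arithmetic behind that statement follows. The
sizes are my reading of RFC 8684 (ADD_ADDR layout) and RFC 7323
(timestamps), the function names are hypothetical, and the kernel's
mptcp_add_addr_len() remains the authoritative version:

```c
#include <stdbool.h>

/* Assumed sizes, in bytes; see the note above about where they come from. */
enum {
	TCP_OPT_SPACE_SKETCH      = 40, /* maximum TCP option space */
	TCP_TSTAMP_ALIGNED_SKETCH = 12, /* 10-byte timestamp option, padded */
};

/* Hypothetical: length of an ADD_ADDR suboption, padded when a port is set. */
static unsigned int add_addr_len_sketch(bool ipv6, bool echo, bool port)
{
	/* kind, length, subtype/flags and address ID, then the address. */
	unsigned int len = 4 + (ipv6 ? 16 : 4);

	if (!echo)
		len += 8;	/* truncated HMAC, not present in an echo */
	if (port)
		len += 2 + 2;	/* port plus two bytes of padding */

	return len;
}

/* Hypothetical: does the suboption fit in a pure ACK? */
static bool add_addr_fits_sketch(bool ipv6, bool echo, bool port,
				 bool timestamps)
{
	unsigned int remaining = TCP_OPT_SPACE_SKETCH;

	if (timestamps)
		remaining -= TCP_TSTAMP_ALIGNED_SKETCH;

	return add_addr_len_sketch(ipv6, echo, port) <= remaining;
}
```

Under these assumptions, an IPv6 ADD_ADDR with a port needs 32 bytes but
only 28 remain once timestamps are enabled, whereas the same address
without a port needs exactly 28 and still fits.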
> ---
> net/mptcp/pm.c | 17 ++++++++---------
> 1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
> index 57a456690406..1d49779c6a1f 100644
> --- a/net/mptcp/pm.c
> +++ b/net/mptcp/pm.c
> @@ -881,19 +881,18 @@ bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, const struct sk_buff *skb,
> }
>
> *echo = mptcp_pm_should_add_signal_echo(msk);
> + add_addr = msk->pm.addr_signal &
> + ~(*echo ? BIT(MPTCP_ADD_ADDR_ECHO) : BIT(MPTCP_ADD_ADDR_SIGNAL));
> port = !!(*echo ? msk->pm.remote.port : msk->pm.local.port);
> -
> family = *echo ? msk->pm.remote.family : msk->pm.local.family;
nit: while at it, maybe it would be clearer to have a dedicated
'if (*echo)' instead of 3 lines with '*echo ? ... : ...', no?
if (*echo) {
add_addr = ...
port = ...
family = ...
} else {
add_addr = ...
port = ...
family = ...
}
> - if (remaining < mptcp_add_addr_len(family, *echo, port))
> - goto out_unlock;
>
> - if (*echo) {
> - *addr = msk->pm.remote;
> - add_addr = msk->pm.addr_signal & ~BIT(MPTCP_ADD_ADDR_ECHO);
> - } else {
> - *addr = msk->pm.local;
> - add_addr = msk->pm.addr_signal & ~BIT(MPTCP_ADD_ADDR_SIGNAL);
> + if (remaining < mptcp_add_addr_len(family, *echo, port)) {
> + if (*drop_other_suboptions)
> + WRITE_ONCE(msk->pm.addr_signal, add_addr);
If it is dropped, it would be helpful to increment the ADDADDRTXDROP MIB
counter, and ideally check that in the MPTCP selftests (e.g. adding a
new subtest in mptcp_join.sh, in add_addr_ports_tests()?).
Also, I wonder if it would not be clearer to jump to a new label here...
> + goto out_unlock;
> }
> +
> + *addr = *echo ? msk->pm.remote : msk->pm.local;
> WRITE_ONCE(msk->pm.addr_signal, add_addr);
> ret = true;
... inverting the two lines above, and adding "drop_signal_mark" label?
Apart from the comments above, I think your patch is doing the right thing.
Also, one last request: do you mind sending the v2 only to the mptcp ML,
please? I have a bunch of related fixes [1] plus this one is not urgent.
In fact, except for (urgent) fixes, it might be better to send MPTCP
patches only to the MPTCP ML, i.e. to a restricted number of people for
the first versions: there is enough traffic on Netdev.
[1]
https://lore.kernel.org/20260415-mptcp-inc-limits-v5-0-e54c3bf80e4e@kernel.org
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.