All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Lai, Yi" <yi1.lai@linux.intel.com>
To: Felix Moessbauer <felix.moessbauer@siemens.com>
Cc: axboe@kernel.dk, asml.silence@gmail.com,
	linux-kernel@vger.kernel.org, io-uring@vger.kernel.org,
	cgroups@vger.kernel.org, dqminh@cloudflare.com,
	longman@redhat.com, adriaan.schmidt@siemens.com,
	florian.bezdeka@siemens.com, stable@vger.kernel.org,
	syzkaller-bugs@googlegroups.com, pengfei.xu@intel.com,
	yi1.lai@intel.com
Subject: Re: [PATCH 1/1] io_uring/sqpoll: do not allow pinning outside of cpuset
Date: Wed, 18 Sep 2024 11:56:31 +0800	[thread overview]
Message-ID: <ZupPb3OH3tnM2ARj@ly-workstation> (raw)
In-Reply-To: <20240909150036.55921-1-felix.moessbauer@siemens.com>

Hi Felix Moessbauer,

Greetings!

I used Syzkaller and found that there is KASAN: use-after-free Read in io_sq_offload_create in Linux-next tree - next-20240916.

After bisection and the first bad commit is:
"
f011c9cf04c0 io_uring/sqpoll: do not allow pinning outside of cpuset
"

All detailed into can be found at:
https://github.com/laifryiee/syzkaller_logs/tree/main/240917_135250_io_sq_offload_create
Syzkaller repro code:
https://github.com/laifryiee/syzkaller_logs/blob/main/240917_135250_io_sq_offload_create/repro.c
Syzkaller repro syscall steps:
https://github.com/laifryiee/syzkaller_logs/blob/main/240917_135250_io_sq_offload_create/repro.prog
Syzkaller report:
https://github.com/laifryiee/syzkaller_logs/blob/main/240917_135250_io_sq_offload_create/repro.report
Kconfig(make olddefconfig):
https://github.com/laifryiee/syzkaller_logs/blob/main/240917_135250_io_sq_offload_create/kconfig_origin
Bisect info:
https://github.com/laifryiee/syzkaller_logs/blob/main/240917_135250_io_sq_offload_create/bisect_info.log
bzImage:
https://github.com/laifryiee/syzkaller_logs/raw/main/240917_135250_io_sq_offload_create/bzImage_7083504315d64199a329de322fce989e1e10f4f7
Issue dmesg:
https://github.com/laifryiee/syzkaller_logs/blob/main/240917_135250_io_sq_offload_create/7083504315d64199a329de322fce989e1e10f4f7_dmesg.log

"
[   23.564898] ==================================================================
[   23.565444] BUG: KASAN: use-after-free in io_sq_offload_create+0xcaa/0x11d0
[   23.565971] Read of size 8 at addr ffff888036377898 by task repro/729
[   23.566459] 
[   23.566593] CPU: 0 UID: 0 PID: 729 Comm: repro Not tainted 6.11.0-next-20240916-7083504315d6 #1
[   23.567271] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[   23.568066] Call Trace:
[   23.568252]  <TASK>
[   23.568417]  dump_stack_lvl+0xea/0x150
[   23.568718]  print_report+0xce/0x610
[   23.569001]  ? io_sq_offload_create+0xcaa/0x11d0
[   23.569340]  ? kasan_addr_to_slab+0x11/0xb0
[   23.569651]  ? io_sq_offload_create+0xcaa/0x11d0
[   23.569992]  kasan_report+0xcc/0x110
[   23.570277]  ? io_sq_offload_create+0xcaa/0x11d0
[   23.570621]  kasan_check_range+0x3e/0x1c0
[   23.570917]  __kasan_check_read+0x15/0x20
[   23.571212]  io_sq_offload_create+0xcaa/0x11d0
[   23.571540]  ? __pfx_io_sq_offload_create+0x10/0x10
[   23.571893]  ? __pfx___lock_acquire+0x10/0x10
[   23.572228]  ? __this_cpu_preempt_check+0x21/0x30
[   23.572580]  ? lock_acquire.part.0+0x152/0x390
[   23.572910]  ? __this_cpu_preempt_check+0x21/0x30
[   23.573254]  ? lock_release+0x441/0x870
[   23.573541]  ? __pfx_lock_release+0x10/0x10
[   23.573846]  ? trace_lock_acquire+0x139/0x1b0
[   23.574180]  ? debug_smp_processor_id+0x20/0x30
[   23.574524]  ? rcu_is_watching+0x19/0xc0
[   23.574826]  ? __alloc_pages_noprof+0x517/0x710
[   23.575171]  ? __pfx___alloc_pages_noprof+0x10/0x10
[   23.575526]  ? mod_objcg_state+0x42c/0x9c0
[   23.575838]  ? lockdep_hardirqs_on+0x89/0x110
[   23.576159]  ? __sanitizer_cov_trace_switch+0x58/0xa0
[   23.576534]  ? policy_nodemask+0xf9/0x450
[   23.576835]  ? __sanitizer_cov_trace_const_cmp2+0x1c/0x30
[   23.577220]  ? alloc_pages_mpol_noprof+0x35d/0x580
[   23.577575]  ? __pfx_alloc_pages_mpol_noprof+0x10/0x10
[   23.577950]  ? __kmalloc_node_noprof+0x3a3/0x4e0
[   23.578302]  ? __kvmalloc_node_noprof+0x7f/0x240
[   23.578645]  ? alloc_pages_noprof+0xa9/0x180
[   23.578963]  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[   23.579347]  ? io_pages_map+0x244/0x5c0
[   23.579631]  io_uring_setup+0x18df/0x3950
[   23.579936]  ? __pfx_io_uring_setup+0x10/0x10
[   23.580263]  ? __audit_syscall_entry+0x39c/0x500
[   23.580602]  __x64_sys_io_uring_setup+0xa4/0x160
[   23.580939]  x64_sys_call+0x17f5/0x20d0
[   23.581224]  do_syscall_64+0x6d/0x140
[   23.581498]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   23.581872] RIP: 0033:0x7efd9fa3ee5d
[   23.582140] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48
[   23.583404] RSP: 002b:00007ffdd4400858 EFLAGS: 00000202 ORIG_RAX: 00000000000001a9
[   23.583938] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007efd9fa3ee5d
[   23.584430] RDX: 00007efd9fb3f247 RSI: 0000000020000080 RDI: 0000000000005230
[   23.584927] RBP: 00007ffdd4400860 R08: 00007ffdd44002d0 R09: 00007ffdd4400890
[   23.585419] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffdd44009b8
[   23.585914] R13: 0000000000401730 R14: 0000000000403e08 R15: 00007efd9fcb5000
[   23.586424]  </TASK>
[   23.586589] 
[   23.586709] The buggy address belongs to the physical page:
[   23.587094] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x36377
[   23.587644] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
[   23.588103] raw: 000fffffc0000000 ffffea0000d8ddc8 ffffea0000d8ddc8 0000000000000000
[   23.588636] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[   23.589167] page dumped because: kasan: bad access detected
[   23.589551] 
[   23.589670] Memory state around the buggy address:
[   23.590007]  ffff888036377780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   23.590514]  ffff888036377800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   23.591011] >ffff888036377880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   23.591508]                             ^
[   23.591794]  ffff888036377900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   23.592292]  ffff888036377980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   23.592789] ==================================================================
[   23.593344] Disabling lock debugging due to kernel taint
"

I hope you find it useful.

Regards,
Yi Lai

---

If you don't need the following environment to reproduce the problem or if you
already have one reproduced environment, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64 and I used v7.1.0
  // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
  // You could change the bzImage_xxx as you want
  // Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
You could use below command to log in, there is no password for root.
ssh -p 10023 root@localhost

After login vm(virtual machine) successfully, you could transfer reproduced
binary to the vm by below way, and reproduce the problem in vm:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/

Get the bzImage for target kernel:
Please use target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage           //x should equal or less than cpu num your pc has

Fill the bzImage file into above start3.sh to load the target kernel in vm.

Tips:
If you already have qemu-system-x86_64, please ignore below info.
If you want to install qemu v7.1.0 version:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install 

On Mon, Sep 09, 2024 at 05:00:36PM +0200, Felix Moessbauer wrote:
> The submit queue polling threads are userland threads that just never
> exit to the userland. When creating the thread with IORING_SETUP_SQ_AFF,
> the affinity of the poller thread is set to the cpu specified in
> sq_thread_cpu. However, this CPU can be outside of the cpuset defined
> by the cgroup cpuset controller. This violates the rules defined by the
> cpuset controller and is a potential issue for realtime applications.
> 
> In b7ed6d8ffd6 we fixed the default affinity of the poller thread, in
> case no explicit pinning is required by inheriting the one of the
> creating task. In case of explicit pinning, the check is more
> complicated, as also a cpu outside of the parent cpumask is allowed.
> We implemented this by using cpuset_cpus_allowed (that has support for
> cgroup cpusets) and testing if the requested cpu is in the set.
> 
> Fixes: 37d1e2e3642e ("io_uring: move SQPOLL thread io-wq forked worker")
> Cc: stable@vger.kernel.org # 6.1+
> Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
> ---
> Hi,
> 
> that's hopefully the last fix of cpu pinnings of the sq poller threads.
> However, there is more to come on the io-wq side. E.g the syscalls for
> IORING_REGISTER_IOWQ_AFF that can be used to change the affinites are
> not yet protected. I'm currently just lacking good reproducers for that.
> I also have to admit that I don't feel too comfortable making changes to
> the wq part, given that I don't have good tests.
> 
> While fixing this, I'm wondering if it makes sense to add tests for the
> combination of pinning and cpuset. If yes, where should these tests be
> added?
> 
> Best regards,
> Felix Moessbauer
> Siemens AG
> 
>  io_uring/sqpoll.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
> index 713be7c29388..b8ec8fec99b8 100644
> --- a/io_uring/sqpoll.c
> +++ b/io_uring/sqpoll.c
> @@ -10,6 +10,7 @@
>  #include <linux/slab.h>
>  #include <linux/audit.h>
>  #include <linux/security.h>
> +#include <linux/cpuset.h>
>  #include <linux/io_uring.h>
>  
>  #include <uapi/linux/io_uring.h>
> @@ -459,10 +460,12 @@ __cold int io_sq_offload_create(struct io_ring_ctx *ctx,
>  			return 0;
>  
>  		if (p->flags & IORING_SETUP_SQ_AFF) {
> +			struct cpumask allowed_mask;
>  			int cpu = p->sq_thread_cpu;
>  
>  			ret = -EINVAL;
> -			if (cpu >= nr_cpu_ids || !cpu_online(cpu))
> +			cpuset_cpus_allowed(current, &allowed_mask);
> +			if (!cpumask_test_cpu(cpu, &allowed_mask))
>  				goto err_sqpoll;
>  			sqd->sq_cpu = cpu;
>  		} else {
> -- 
> 2.39.2
> 

  parent reply	other threads:[~2024-09-18  3:57 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-09 15:00 [PATCH 1/1] io_uring/sqpoll: do not allow pinning outside of cpuset Felix Moessbauer
2024-09-09 15:08 ` Jens Axboe
2024-09-09 15:09 ` Jens Axboe
2024-09-18  3:56 ` Lai, Yi [this message]
2024-09-18  6:16   ` Jens Axboe
2024-09-18  6:21     ` Lai, Yi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZupPb3OH3tnM2ARj@ly-workstation \
    --to=yi1.lai@linux.intel.com \
    --cc=adriaan.schmidt@siemens.com \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=cgroups@vger.kernel.org \
    --cc=dqminh@cloudflare.com \
    --cc=felix.moessbauer@siemens.com \
    --cc=florian.bezdeka@siemens.com \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=pengfei.xu@intel.com \
    --cc=stable@vger.kernel.org \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=yi1.lai@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.