public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Li Nan <linan666@huaweicloud.com>
Cc: Changhui Zhong <czhong@redhat.com>,
	axboe@kernel.dk, ZiyangZhang@linux.alibaba.com,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	yukuai3@huawei.com, yi.zhang@huawei.com, houtao1@huawei.com,
	yangerkun@huawei.com, ming.lei@redhat.com
Subject: Re: [PATCH] ublk_drv: fix NULL pointer dereference in ublk_ctrl_start_recovery()
Date: Sat, 8 Jun 2024 22:16:10 +0800	[thread overview]
Message-ID: <ZmRnqmiNGW5Y+452@fedora> (raw)
In-Reply-To: <dd486a95-b7e8-5760-9ff1-7e9c1cfc9873@huaweicloud.com>

On Sat, Jun 08, 2024 at 02:34:47PM +0800, Li Nan wrote:
> 
> 
> 在 2024/6/6 17:52, Ming Lei 写道:
> > On Thu, Jun 06, 2024 at 04:05:33PM +0800, Li Nan wrote:
> > > 
> > > 
> > > 在 2024/6/6 12:48, Changhui Zhong 写道:
> > > 
> > > [...]
> > > 
> > > > > 
> > > > > Hi Changhui,
> > > > > 
> > > > > The hang is actually expected because recovery fails.
> > > > > 
> > > > > Please pull the latest ublksrv and check if the issue can still be
> > > > > reproduced:
> > > > > 
> > > > > https://github.com/ublk-org/ublksrv
> > > > > 
> > > > > BTW, one ublksrv segfault and two test cleanup issues are fixed.
> > > > > 
> > > > > Thanks,
> > > > > Ming
> > > > > 
> > > > 
> > > > Hi,Ming and Nan
> > > > 
> > > > after applying the new patch and pulling the latest ublksrv,
> > > > I ran the test for 4 hours and did not observe any task hang.
> > > > the test results looks good!
> > > > 
> > > > Thanks,
> > > > Changhui
> > > > 
> > > > 
> > > > .
> > > 
> > > Thanks for you test!
> > > 
> > > However, I got a NULL pointer dereference bug with ublksrv. It is not
> > 
> > BTW, your patch isn't related with generic/004 which won't touch
> > recovery code path.
> > 
> > > introduced by this patch. It seems io was issued after deleting disk. And
> > > it can be reproduced by:
> > > 
> > >    while true; do make test T=generic/004; done
> > 
> > We didn't see that when running such test with linus tree, and usually
> > Changhui run generic test for hours.
> > 
> > > 
> > > [ 1524.286485] running generic/004
> > > [ 1529.110875] blk_print_req_error: 109 callbacks suppressed
> > ...
> > > [ 1541.171010] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > > [ 1541.171734] #PF: supervisor write access in kernel mode
> > > [ 1541.172271] #PF: error_code(0x0002) - not-present page
> > > [ 1541.172798] PGD 0 P4D 0
> > > [ 1541.173065] Oops: Oops: 0002 [#1] PREEMPT SMP
> > > [ 1541.173515] CPU: 0 PID: 43707 Comm: ublk Not tainted
> > > 6.9.0-next-20240523-00004-g9bc7e95c7323 #454
> > > [ 1541.174417] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > > 1.16.1-2.fc37 04/01/2014
> > > [ 1541.175311] RIP: 0010:io_fallback_tw+0x252/0x300
> > 
> > This one looks one io_uring issue.
> > 
> > Care to provide which line of source code points to by 'io_fallback_tw+0x252'?
> > 
> > gdb> l *(io_fallback_tw+0x252)
> > 
> (gdb) list * io_fallback_tw+0x252
> 0xffffffff81d79dc2 is in io_fallback_tw
> (./arch/x86/include/asm/atomic64_64.h:25).
> 20              __WRITE_ONCE(v->counter, i);
> 21      }
> 22
> 23      static __always_inline void arch_atomic64_add(s64 i, atomic64_t *v)
> 24      {
> 25              asm volatile(LOCK_PREFIX "addq %1,%0"
> 26                           : "=m" (v->counter)
> 27                           : "er" (i), "m" (v->counter) : "memory");
> 28      }
> 
> The corresponding code is:
> io_fallback_tw
>   percpu_ref_get(&last_ctx->refs);

[ 1541.171010] BUG: kernel NULL pointer dereference, address: 0000000000000000
...
[ 1541.175311] RIP: 0010:io_fallback_tw+0x252/0x300

So looks the 'struct io_ring_ctx' instance is freed and ctx->refs.data
becomes NULL when calling percpu_ref_get(&last_ctx->refs), not clear how
it can happen since request grabs one ctx reference.

Maybe you can try kasan and see if more useful info can be dumped.

> 
> I have the vmcore of this issue. If you have any other needs, please let me
> know.

Not try remote vmcore debug yet, will think further if any useful hint
is needed.

> The space of the root path has been filled up by
> ublksrv(tests/tmpublk_loop_data_xxx), which may the issue be related to this?

It isn't supposed to be related, since the backing loop image is created by dd
and it isn't sparse image, and the test runs fio over ublk block
device only.

Thanks,
Ming


  reply	other threads:[~2024-06-08 14:16 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-29  9:53 [PATCH] ublk_drv: fix NULL pointer dereference in ublk_ctrl_start_recovery() linan666
2024-06-03  0:39 ` Ming Lei
2024-06-03  2:19   ` Li Nan
2024-06-03  2:43     ` Ming Lei
2024-06-04  1:32     ` Changhui Zhong
2024-06-04 13:14       ` Li Nan
2024-06-05  1:40       ` Li Nan
2024-06-05  7:20         ` Changhui Zhong
2024-06-05  9:47           ` Ming Lei
2024-06-06  4:48             ` Changhui Zhong
2024-06-06  8:05               ` Li Nan
2024-06-06  9:52                 ` Ming Lei
2024-06-08  6:34                   ` Li Nan
2024-06-08 14:16                     ` Ming Lei [this message]
2024-06-06 13:43                 ` Changhui Zhong
2024-06-08  6:44                   ` Li Nan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZmRnqmiNGW5Y+452@fedora \
    --to=ming.lei@redhat.com \
    --cc=ZiyangZhang@linux.alibaba.com \
    --cc=axboe@kernel.dk \
    --cc=czhong@redhat.com \
    --cc=houtao1@huawei.com \
    --cc=linan666@huaweicloud.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=yangerkun@huawei.com \
    --cc=yi.zhang@huawei.com \
    --cc=yukuai3@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox