* [PATCH V1] block: Fix null pointer dereference issue on struct io_cq
@ 2023-05-17 8:44 Pradeep P V K
2023-05-17 8:55 ` Chaitanya Kulkarni
2023-05-17 8:58 ` Yu Kuai
0 siblings, 2 replies; 12+ messages in thread
From: Pradeep P V K @ 2023-05-17 8:44 UTC (permalink / raw)
To: axboe; +Cc: linux-block, linux-kernel, Pradeep P V K
There is a potential race between ioc_clear_fn() and
exit_io_context() as shown below, due to which below
crash is observed. It can also result into use-after-free
issue.
context#1: context#2:
ioc_release_fn() do_exit();
->spin_lock(&ioc->lock); ->exit_io_context();
->ioc_destroy_icq(icq); ->ioc_exit_icqs();
->list_del_init(&icq->q_node); ->spin_lock_irq(&ioc->lock);
->call_rcu(&icq->__rcu_head,
icq_free_icq_rcu);
->spin_unlock(&ioc->lock);
->ioc_exit_icq(); gets the same icq
->bfq_exit_icq();
This results into below crash as bic
is NULL as it is derived from icq.
There is a chance that icq could be
free'd as well.
[33.245722][ T8666] Unable to handle kernel NULL pointer dereference
at virtual address 0000000000000018.
...
Call trace:
[33.325782][ T8666] bfq_exit_icq+0x28/0xa8
[33.325785][ T8666] exit_io_context+0xcc/0x100
[33.325786][ T8666] do_exit+0x764/0xa58
[33.325791][ T8666] do_group_exit+0x0/0xa0
[33.325793][ T8666] invoke_syscall+0x48/0x114
[33.325802][ T8666] el0_svc_common+0xcc/0x118
[33.325805][ T8666] do_el0_svc+0x34/0xd0
[33.325807][ T8666] el0_svc+0x38/0xd0
[33.325812][ T8666] el0t_64_sync_handler+0x8c/0xfc
[33.325813][ T8666] el0t_64_sync+0x1a0/0x1a4
Fix this by checking with ICQ_DESTROYED flags in ioc_exit_icqs().
Also, ensure ioc_exit_icq() is accessing icq within rcu_read_lock/unlock
so that icq doesn't get free'd up while it is still using it.
Signed-off-by: Pradeep P V K <quic_pragalla@quicinc.com>
---
block/blk-ioc.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 63fc02042408..1aa34fd46ac8 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -60,10 +60,14 @@ static void ioc_exit_icqs(struct io_context *ioc)
{
struct io_cq *icq;
+ rcu_read_lock();
spin_lock_irq(&ioc->lock);
- hlist_for_each_entry(icq, &ioc->icq_list, ioc_node)
- ioc_exit_icq(icq);
+ hlist_for_each_entry(icq, &ioc->icq_list, ioc_node) {
+ if (!(icq->flags & ICQ_DESTROYED))
+ ioc_exit_icq(icq);
+ }
spin_unlock_irq(&ioc->lock);
+ rcu_read_unlock();
}
/*
--
2.17.1
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH V1] block: Fix null pointer dereference issue on struct io_cq 2023-05-17 8:44 [PATCH V1] block: Fix null pointer dereference issue on struct io_cq Pradeep P V K @ 2023-05-17 8:55 ` Chaitanya Kulkarni 2023-05-17 8:58 ` Yu Kuai 1 sibling, 0 replies; 12+ messages in thread From: Chaitanya Kulkarni @ 2023-05-17 8:55 UTC (permalink / raw) To: Pradeep P V K Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, axboe@kernel.dk On 5/17/23 01:44, Pradeep P V K wrote: > There is a potential race between ioc_clear_fn() and > exit_io_context() as shown below, due to which below > crash is observed. It can also result into use-after-free > issue. > > context#1: context#2: > ioc_release_fn() do_exit(); > ->spin_lock(&ioc->lock); ->exit_io_context(); > ->ioc_destroy_icq(icq); ->ioc_exit_icqs(); > ->list_del_init(&icq->q_node); ->spin_lock_irq(&ioc->lock); > ->call_rcu(&icq->__rcu_head, > icq_free_icq_rcu); > ->spin_unlock(&ioc->lock); > ->ioc_exit_icq(); gets the same icq > ->bfq_exit_icq(); > This results into below crash as bic > is NULL as it is derived from icq. > There is a chance that icq could be > free'd as well. > > [33.245722][ T8666] Unable to handle kernel NULL pointer dereference > at virtual address 0000000000000018. > ... > Call trace: > [33.325782][ T8666] bfq_exit_icq+0x28/0xa8 > [33.325785][ T8666] exit_io_context+0xcc/0x100 > [33.325786][ T8666] do_exit+0x764/0xa58 > [33.325791][ T8666] do_group_exit+0x0/0xa0 > [33.325793][ T8666] invoke_syscall+0x48/0x114 > [33.325802][ T8666] el0_svc_common+0xcc/0x118 > [33.325805][ T8666] do_el0_svc+0x34/0xd0 > [33.325807][ T8666] el0_svc+0x38/0xd0 > [33.325812][ T8666] el0t_64_sync_handler+0x8c/0xfc > [33.325813][ T8666] el0t_64_sync+0x1a0/0x1a4 > > Fix this by checking with ICQ_DESTROYED flags in ioc_exit_icqs(). > Also, ensure ioc_exit_icq() is accessing icq within rcu_read_lock/unlock > so that icq doesn't get free'd up while it is still using it. > > Signed-off-by: Pradeep P V K <quic_pragalla@quicinc.com> > --- Is it possible to add a null_blk based blktests for this ? -ck ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V1] block: Fix null pointer dereference issue on struct io_cq 2023-05-17 8:44 [PATCH V1] block: Fix null pointer dereference issue on struct io_cq Pradeep P V K 2023-05-17 8:55 ` Chaitanya Kulkarni @ 2023-05-17 8:58 ` Yu Kuai 2023-05-17 9:20 ` Damien Le Moal 2023-05-18 12:16 ` Yu Kuai 1 sibling, 2 replies; 12+ messages in thread From: Yu Kuai @ 2023-05-17 8:58 UTC (permalink / raw) To: Pradeep P V K, axboe; +Cc: linux-block, linux-kernel, yukuai (C) Hi, 在 2023/05/17 16:44, Pradeep P V K 写道: > There is a potential race between ioc_clear_fn() and > exit_io_context() as shown below, due to which below > crash is observed. It can also result into use-after-free > issue. > > context#1: context#2: > ioc_release_fn() do_exit(); > ->spin_lock(&ioc->lock); ->exit_io_context(); > ->ioc_destroy_icq(icq); ->ioc_exit_icqs(); > ->list_del_init(&icq->q_node); ->spin_lock_irq(&ioc->lock); > ->call_rcu(&icq->__rcu_head, > icq_free_icq_rcu); > ->spin_unlock(&ioc->lock); > ->ioc_exit_icq(); gets the same icq I don't understand how is this possible, the list is protected by 'ioc->lock', both hlist_del_init and hlist_for_each_entry are called inside the lock. Thanks, Kuai > ->bfq_exit_icq(); > This results into below crash as bic > is NULL as it is derived from icq. > There is a chance that icq could be > free'd as well. > > [33.245722][ T8666] Unable to handle kernel NULL pointer dereference > at virtual address 0000000000000018. > ... > Call trace: > [33.325782][ T8666] bfq_exit_icq+0x28/0xa8 > [33.325785][ T8666] exit_io_context+0xcc/0x100 > [33.325786][ T8666] do_exit+0x764/0xa58 > [33.325791][ T8666] do_group_exit+0x0/0xa0 > [33.325793][ T8666] invoke_syscall+0x48/0x114 > [33.325802][ T8666] el0_svc_common+0xcc/0x118 > [33.325805][ T8666] do_el0_svc+0x34/0xd0 > [33.325807][ T8666] el0_svc+0x38/0xd0 > [33.325812][ T8666] el0t_64_sync_handler+0x8c/0xfc > [33.325813][ T8666] el0t_64_sync+0x1a0/0x1a4 > > Fix this by checking with ICQ_DESTROYED flags in ioc_exit_icqs(). > Also, ensure ioc_exit_icq() is accessing icq within rcu_read_lock/unlock > so that icq doesn't get free'd up while it is still using it. > > Signed-off-by: Pradeep P V K <quic_pragalla@quicinc.com> > --- > block/blk-ioc.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/block/blk-ioc.c b/block/blk-ioc.c > index 63fc02042408..1aa34fd46ac8 100644 > --- a/block/blk-ioc.c > +++ b/block/blk-ioc.c > @@ -60,10 +60,14 @@ static void ioc_exit_icqs(struct io_context *ioc) > { > struct io_cq *icq; > > + rcu_read_lock(); > spin_lock_irq(&ioc->lock); > - hlist_for_each_entry(icq, &ioc->icq_list, ioc_node) > - ioc_exit_icq(icq); > + hlist_for_each_entry(icq, &ioc->icq_list, ioc_node) { > + if (!(icq->flags & ICQ_DESTROYED)) > + ioc_exit_icq(icq); > + } > spin_unlock_irq(&ioc->lock); > + rcu_read_unlock(); > } > > /* > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V1] block: Fix null pointer dereference issue on struct io_cq 2023-05-17 8:58 ` Yu Kuai @ 2023-05-17 9:20 ` Damien Le Moal 2023-05-17 9:21 ` Christoph Hellwig 2023-05-18 12:16 ` Yu Kuai 1 sibling, 1 reply; 12+ messages in thread From: Damien Le Moal @ 2023-05-17 9:20 UTC (permalink / raw) To: Yu Kuai, Pradeep P V K, axboe; +Cc: linux-block, linux-kernel, yukuai (C) On 5/17/23 17:58, Yu Kuai wrote: > Hi, > > 在 2023/05/17 16:44, Pradeep P V K 写道: >> There is a potential race between ioc_clear_fn() and >> exit_io_context() as shown below, due to which below >> crash is observed. It can also result into use-after-free >> issue. >> >> context#1: context#2: >> ioc_release_fn() do_exit(); >> ->spin_lock(&ioc->lock); ->exit_io_context(); >> ->ioc_destroy_icq(icq); ->ioc_exit_icqs(); >> ->list_del_init(&icq->q_node); ->spin_lock_irq(&ioc->lock); >> ->call_rcu(&icq->__rcu_head, >> icq_free_icq_rcu); >> ->spin_unlock(&ioc->lock); >> ->ioc_exit_icq(); gets the same icq > I don't understand how is this possible, the list is protected by > 'ioc->lock', both hlist_del_init and hlist_for_each_entry are called > inside the lock. Given that ioc_destroy_icq() calls ioc_exit_icq(), ioc_exit_icqs() should ignore all icqs that have been destroyed already, otherwise, ioc_exit_icq() gets called twice for the same icq. The missing rcu lock in ioc_exit_icqs() already was in itself a bug, and the missing flag check is another. > > Thanks, > Kuai >> ->bfq_exit_icq(); >> This results into below crash as bic >> is NULL as it is derived from icq. >> There is a chance that icq could be >> free'd as well. >> >> [33.245722][ T8666] Unable to handle kernel NULL pointer dereference >> at virtual address 0000000000000018. >> ... >> Call trace: >> [33.325782][ T8666] bfq_exit_icq+0x28/0xa8 >> [33.325785][ T8666] exit_io_context+0xcc/0x100 >> [33.325786][ T8666] do_exit+0x764/0xa58 >> [33.325791][ T8666] do_group_exit+0x0/0xa0 >> [33.325793][ T8666] invoke_syscall+0x48/0x114 >> [33.325802][ T8666] el0_svc_common+0xcc/0x118 >> [33.325805][ T8666] do_el0_svc+0x34/0xd0 >> [33.325807][ T8666] el0_svc+0x38/0xd0 >> [33.325812][ T8666] el0t_64_sync_handler+0x8c/0xfc >> [33.325813][ T8666] el0t_64_sync+0x1a0/0x1a4 >> >> Fix this by checking with ICQ_DESTROYED flags in ioc_exit_icqs(). >> Also, ensure ioc_exit_icq() is accessing icq within rcu_read_lock/unlock >> so that icq doesn't get free'd up while it is still using it. >> >> Signed-off-by: Pradeep P V K <quic_pragalla@quicinc.com> Pradeep, this needs a Fixes tag and cc-stable I think. >> --- >> block/blk-ioc.c | 8 ++++++-- >> 1 file changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/block/blk-ioc.c b/block/blk-ioc.c >> index 63fc02042408..1aa34fd46ac8 100644 >> --- a/block/blk-ioc.c >> +++ b/block/blk-ioc.c >> @@ -60,10 +60,14 @@ static void ioc_exit_icqs(struct io_context *ioc) >> { >> struct io_cq *icq; >> >> + rcu_read_lock(); >> spin_lock_irq(&ioc->lock); >> - hlist_for_each_entry(icq, &ioc->icq_list, ioc_node) >> - ioc_exit_icq(icq); >> + hlist_for_each_entry(icq, &ioc->icq_list, ioc_node) { >> + if (!(icq->flags & ICQ_DESTROYED)) >> + ioc_exit_icq(icq); >> + } >> spin_unlock_irq(&ioc->lock); >> + rcu_read_unlock(); >> } >> >> /* >> > -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V1] block: Fix null pointer dereference issue on struct io_cq 2023-05-17 9:20 ` Damien Le Moal @ 2023-05-17 9:21 ` Christoph Hellwig 2023-05-17 9:31 ` Damien Le Moal 0 siblings, 1 reply; 12+ messages in thread From: Christoph Hellwig @ 2023-05-17 9:21 UTC (permalink / raw) To: Damien Le Moal Cc: Yu Kuai, Pradeep P V K, axboe, linux-block, linux-kernel, yukuai (C) On Wed, May 17, 2023 at 06:20:19PM +0900, Damien Le Moal wrote: > twice for the same icq. The missing rcu lock in ioc_exit_icqs() already was in > itself a bug, and the missing flag check is another. spinlocks imply a rcu critical section, no need to duplicate it. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V1] block: Fix null pointer dereference issue on struct io_cq 2023-05-17 9:21 ` Christoph Hellwig @ 2023-05-17 9:31 ` Damien Le Moal 2023-05-18 9:25 ` Pradeep Pragallapati (QUIC) 0 siblings, 1 reply; 12+ messages in thread From: Damien Le Moal @ 2023-05-17 9:31 UTC (permalink / raw) To: Christoph Hellwig Cc: Yu Kuai, Pradeep P V K, axboe, linux-block, linux-kernel, yukuai (C) On 5/17/23 18:21, Christoph Hellwig wrote: > On Wed, May 17, 2023 at 06:20:19PM +0900, Damien Le Moal wrote: >> twice for the same icq. The missing rcu lock in ioc_exit_icqs() already was in >> itself a bug, and the missing flag check is another. > > spinlocks imply a rcu critical section, no need to duplicate it. Right. And I misread the code. As Yu said, given that ioc_exit_icqs() iterates the list of icqs under ioc->lock and the ioc is removed from that list under the same lock, ioc_exit_icqs() should never see an icq that went through ioc_destroy_icq()... Very weird. -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [PATCH V1] block: Fix null pointer dereference issue on struct io_cq 2023-05-17 9:31 ` Damien Le Moal @ 2023-05-18 9:25 ` Pradeep Pragallapati (QUIC) 0 siblings, 0 replies; 12+ messages in thread From: Pradeep Pragallapati (QUIC) @ 2023-05-18 9:25 UTC (permalink / raw) To: Damien Le Moal Cc: Yu Kuai, axboe@kernel.dk, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, yukuai (C), Christoph Hellwig Hi, -----Original Message----- From: Damien Le Moal <dlemoal@kernel.org> Sent: Wednesday, May 17, 2023 3:02 PM To: Christoph Hellwig <hch@infradead.org> Cc: Yu Kuai <yukuai1@huaweicloud.com>; Pradeep Pragallapati (QUIC) <quic_pragalla@quicinc.com>; axboe@kernel.dk; linux-block@vger.kernel.org; linux-kernel@vger.kernel.org; yukuai (C) <yukuai3@huawei.com> Subject: Re: [PATCH V1] block: Fix null pointer dereference issue on struct io_cq On 5/17/23 18:21, Christoph Hellwig wrote: > On Wed, May 17, 2023 at 06:20:19PM +0900, Damien Le Moal wrote: >> twice for the same icq. The missing rcu lock in ioc_exit_icqs() >> already was in itself a bug, and the missing flag check is another. > > spinlocks imply a rcu critical section, no need to duplicate it. Right. And I misread the code. As Yu said, given that ioc_exit_icqs() iterates the list of icqs under ioc->lock and the ioc is removed from that list under the same lock, ioc_exit_icqs() should never see an icq that went through ioc_destroy_icq()... Very weird. This weird can be possible 1. updating icq_hint which is annotated as __rcu type without RCU-protected context in ioc_destroy_icq(). Moreover, this was taken care in else part of ioc_release_fn() by rcu_read_lock/unlock() but missed in if statement which can lead to this weird. 2. extracting icq from hlist/list elements are done using rcu locks protected in ioc_clear_queue() but same was not at ioc_exit_icqs(). So, far we have seen 10+ instances of this crash on 6.1 kernel during stability testing (Involves IO, reboots, device suspend/resume, and few more). With the V1 patch, we didn't observe the issue for at least 48hrs+ of stability testing. -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V1] block: Fix null pointer dereference issue on struct io_cq 2023-05-17 8:58 ` Yu Kuai 2023-05-17 9:20 ` Damien Le Moal @ 2023-05-18 12:16 ` Yu Kuai 2023-05-18 12:44 ` Yu Kuai 1 sibling, 1 reply; 12+ messages in thread From: Yu Kuai @ 2023-05-18 12:16 UTC (permalink / raw) To: Yu Kuai, Pradeep P V K, axboe; +Cc: linux-block, linux-kernel, yukuai (C) Hi, 在 2023/05/17 16:58, Yu Kuai 写道: > Hi, > > 在 2023/05/17 16:44, Pradeep P V K 写道: >> There is a potential race between ioc_clear_fn() and >> exit_io_context() as shown below, due to which below >> crash is observed. It can also result into use-after-free >> issue. >> >> context#1: context#2: >> ioc_release_fn() do_exit(); >> ->spin_lock(&ioc->lock); ->exit_io_context(); >> ->ioc_destroy_icq(icq); ->ioc_exit_icqs(); >> ->list_del_init(&icq->q_node); ->spin_lock_irq(&ioc->lock); >> ->call_rcu(&icq->__rcu_head, >> icq_free_icq_rcu); >> ->spin_unlock(&ioc->lock); I think above concurrent scenario is not possible as well. exit_io_context() doesn't release ioc refcount before ioc_exit_icqs() is done, so that ioc_release_fn() can never concurrent with ioc_exit_icqs(). do_exit exit_io_context ioc_exit_icqs put_io_context -> ioc_release_fn won't be called before this >> ->ioc_exit_icq(); gets the same >> icq > I don't understand how is this possible, the list is protected by > 'ioc->lock', both hlist_del_init and hlist_for_each_entry are called > inside the lock. > > Thanks, > Kuai >> ->bfq_exit_icq(); >> This results into below crash as bic >> is NULL as it is derived from icq. >> There is a chance that icq could be >> free'd as well. >> >> [33.245722][ T8666] Unable to handle kernel NULL pointer dereference >> at virtual address 0000000000000018. >> ... >> Call trace: >> [33.325782][ T8666] bfq_exit_icq+0x28/0xa8 >> [33.325785][ T8666] exit_io_context+0xcc/0x100 >> [33.325786][ T8666] do_exit+0x764/0xa58 >> [33.325791][ T8666] do_group_exit+0x0/0xa0 >> [33.325793][ T8666] invoke_syscall+0x48/0x114 >> [33.325802][ T8666] el0_svc_common+0xcc/0x118 >> [33.325805][ T8666] do_el0_svc+0x34/0xd0 >> [33.325807][ T8666] el0_svc+0x38/0xd0 >> [33.325812][ T8666] el0t_64_sync_handler+0x8c/0xfc >> [33.325813][ T8666] el0t_64_sync+0x1a0/0x1a4 >> >> Fix this by checking with ICQ_DESTROYED flags in ioc_exit_icqs(). >> Also, ensure ioc_exit_icq() is accessing icq within rcu_read_lock/unlock >> so that icq doesn't get free'd up while it is still using it. >> >> Signed-off-by: Pradeep P V K <quic_pragalla@quicinc.com> >> --- >> block/blk-ioc.c | 8 ++++++-- >> 1 file changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/block/blk-ioc.c b/block/blk-ioc.c >> index 63fc02042408..1aa34fd46ac8 100644 >> --- a/block/blk-ioc.c >> +++ b/block/blk-ioc.c >> @@ -60,10 +60,14 @@ static void ioc_exit_icqs(struct io_context *ioc) >> { >> struct io_cq *icq; >> + rcu_read_lock(); >> spin_lock_irq(&ioc->lock); >> - hlist_for_each_entry(icq, &ioc->icq_list, ioc_node) >> - ioc_exit_icq(icq); >> + hlist_for_each_entry(icq, &ioc->icq_list, ioc_node) { >> + if (!(icq->flags & ICQ_DESTROYED)) By the way, above change doesn't make sense to me as well. ioc_exit_icq() is called before setting ICQ_DESTROYED, hence if ICQ_DESTROYED is set, then ICQ_EXITED is set as well, in this case ioc_exit_icq() won't do anything. >> + ioc_exit_icq(icq); >> + } >> spin_unlock_irq(&ioc->lock); >> + rcu_read_unlock(); >> } >> /* I think I do found a problem, but I'm not sure it's the same in your case, can you try the following patch? diff --git a/block/blk-ioc.c b/block/blk-ioc.c index 63fc02042408..37a56f2bb040 100644 --- a/block/blk-ioc.c +++ b/block/blk-ioc.c @@ -78,6 +78,9 @@ static void ioc_destroy_icq(struct io_cq *icq) lockdep_assert_held(&ioc->lock); + if (icq->flags & ICQ_DESTROYED) + return; + radix_tree_delete(&ioc->icq_tree, icq->q->id); hlist_del_init(&icq->ioc_node); list_del_init(&icq->q_node); @@ -128,12 +131,7 @@ static void ioc_release_fn(struct work_struct *work) spin_lock(&q->queue_lock); spin_lock(&ioc->lock); - /* - * The icq may have been destroyed when the ioc lock - * was released. - */ - if (!(icq->flags & ICQ_DESTROYED)) - ioc_destroy_icq(icq); + ioc_destroy_icq(icq); spin_unlock(&q->queue_lock); rcu_read_unlock(); @@ -173,18 +171,17 @@ void ioc_clear_queue(struct request_queue *q) { LIST_HEAD(icq_list); + rcu_read_lock(); spin_lock_irq(&q->queue_lock); list_splice_init(&q->icq_list, &icq_list); spin_unlock_irq(&q->queue_lock); - rcu_read_lock(); while (!list_empty(&icq_list)) { struct io_cq *icq = list_entry(icq_list.next, struct io_cq, q_node); spin_lock_irq(&icq->ioc->lock); - if (!(icq->flags & ICQ_DESTROYED)) - ioc_destroy_icq(icq); + ioc_destroy_icq(icq); spin_unlock_irq(&icq->ioc->lock); } rcu_read_unlock(); ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH V1] block: Fix null pointer dereference issue on struct io_cq 2023-05-18 12:16 ` Yu Kuai @ 2023-05-18 12:44 ` Yu Kuai 2023-05-22 6:19 ` Pradeep Pragallapati 0 siblings, 1 reply; 12+ messages in thread From: Yu Kuai @ 2023-05-18 12:44 UTC (permalink / raw) To: Yu Kuai, Pradeep P V K, axboe; +Cc: linux-block, linux-kernel, yukuai (C) Hi, 在 2023/05/18 20:16, Yu Kuai 写道: > @@ -173,18 +171,17 @@ void ioc_clear_queue(struct request_queue *q) > { > LIST_HEAD(icq_list); > > + rcu_read_lock(); Sorry that I realized this is still not enough, following list_empty() and list_entry() can still concurrent with list_del(). Please try the following patch: > spin_lock_irq(&q->queue_lock); > list_splice_init(&q->icq_list, &icq_list); > spin_unlock_irq(&q->queue_lock); > > - rcu_read_lock(); > while (!list_empty(&icq_list)) { > struct io_cq *icq = > list_entry(icq_list.next, struct io_cq, q_node); > > spin_lock_irq(&icq->ioc->lock); > - if (!(icq->flags & ICQ_DESTROYED)) > - ioc_destroy_icq(icq); > + ioc_destroy_icq(icq); > spin_unlock_irq(&icq->ioc->lock); > } > rcu_read_unlock(); > > . > diff --git a/block/blk-ioc.c b/block/blk-ioc.c index 63fc02042408..47684d1e9006 100644 --- a/block/blk-ioc.c +++ b/block/blk-ioc.c @@ -78,6 +78,9 @@ static void ioc_destroy_icq(struct io_cq *icq) lockdep_assert_held(&ioc->lock); + if (icq->flags & ICQ_DESTROYED) + return; + radix_tree_delete(&ioc->icq_tree, icq->q->id); hlist_del_init(&icq->ioc_node); list_del_init(&icq->q_node); @@ -128,12 +131,7 @@ static void ioc_release_fn(struct work_struct *work) spin_lock(&q->queue_lock); spin_lock(&ioc->lock); - /* - * The icq may have been destroyed when the ioc lock - * was released. - */ - if (!(icq->flags & ICQ_DESTROYED)) - ioc_destroy_icq(icq); + ioc_destroy_icq(icq); spin_unlock(&q->queue_lock); rcu_read_unlock(); @@ -175,19 +173,16 @@ void ioc_clear_queue(struct request_queue *q) spin_lock_irq(&q->queue_lock); list_splice_init(&q->icq_list, &icq_list); - spin_unlock_irq(&q->queue_lock); - rcu_read_lock(); while (!list_empty(&icq_list)) { struct io_cq *icq = list_entry(icq_list.next, struct io_cq, q_node); spin_lock_irq(&icq->ioc->lock); - if (!(icq->flags & ICQ_DESTROYED)) - ioc_destroy_icq(icq); + ioc_destroy_icq(icq); spin_unlock_irq(&icq->ioc->lock); } - rcu_read_unlock(); + spin_unlock_irq(&q->queue_lock); } #else /* CONFIG_BLK_ICQ */ static inline void ioc_exit_icqs(struct io_context *ioc) ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH V1] block: Fix null pointer dereference issue on struct io_cq 2023-05-18 12:44 ` Yu Kuai @ 2023-05-22 6:19 ` Pradeep Pragallapati 2023-05-30 13:15 ` Pradeep Pragallapati 0 siblings, 1 reply; 12+ messages in thread From: Pradeep Pragallapati @ 2023-05-22 6:19 UTC (permalink / raw) To: Yu Kuai, axboe; +Cc: linux-block, linux-kernel, yukuai (C) On 5/18/2023 6:14 PM, Yu Kuai wrote: > Hi, > > 在 2023/05/18 20:16, Yu Kuai 写道: > >> @@ -173,18 +171,17 @@ void ioc_clear_queue(struct request_queue *q) >> { >> LIST_HEAD(icq_list); >> >> + rcu_read_lock(); > > Sorry that I realized this is still not enough, following list_empty() > and list_entry() can still concurrent with list_del(). Please try the > following patch: sure will try and update the results. >> spin_lock_irq(&q->queue_lock); >> list_splice_init(&q->icq_list, &icq_list); >> spin_unlock_irq(&q->queue_lock); >> >> - rcu_read_lock(); >> while (!list_empty(&icq_list)) { >> struct io_cq *icq = >> list_entry(icq_list.next, struct io_cq, >> q_node); >> >> spin_lock_irq(&icq->ioc->lock); >> - if (!(icq->flags & ICQ_DESTROYED)) >> - ioc_destroy_icq(icq); >> + ioc_destroy_icq(icq); >> spin_unlock_irq(&icq->ioc->lock); >> } >> rcu_read_unlock(); >> >> . >> > > diff --git a/block/blk-ioc.c b/block/blk-ioc.c > index 63fc02042408..47684d1e9006 100644 > --- a/block/blk-ioc.c > +++ b/block/blk-ioc.c > @@ -78,6 +78,9 @@ static void ioc_destroy_icq(struct io_cq *icq) > > lockdep_assert_held(&ioc->lock); > > + if (icq->flags & ICQ_DESTROYED) > + return; > + > radix_tree_delete(&ioc->icq_tree, icq->q->id); > hlist_del_init(&icq->ioc_node); > list_del_init(&icq->q_node); > @@ -128,12 +131,7 @@ static void ioc_release_fn(struct work_struct *work) > spin_lock(&q->queue_lock); > spin_lock(&ioc->lock); > > - /* > - * The icq may have been destroyed when the > ioc lock > - * was released. > - */ > - if (!(icq->flags & ICQ_DESTROYED)) > - ioc_destroy_icq(icq); > + ioc_destroy_icq(icq); > > spin_unlock(&q->queue_lock); > rcu_read_unlock(); > @@ -175,19 +173,16 @@ void ioc_clear_queue(struct request_queue *q) > > spin_lock_irq(&q->queue_lock); > list_splice_init(&q->icq_list, &icq_list); > - spin_unlock_irq(&q->queue_lock); > > - rcu_read_lock(); > while (!list_empty(&icq_list)) { > struct io_cq *icq = > list_entry(icq_list.next, struct io_cq, q_node); > > spin_lock_irq(&icq->ioc->lock); > - if (!(icq->flags & ICQ_DESTROYED)) > - ioc_destroy_icq(icq); > + ioc_destroy_icq(icq); > spin_unlock_irq(&icq->ioc->lock); > } > - rcu_read_unlock(); > + spin_unlock_irq(&q->queue_lock); > } > #else /* CONFIG_BLK_ICQ */ > static inline void ioc_exit_icqs(struct io_context *ioc) > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V1] block: Fix null pointer dereference issue on struct io_cq 2023-05-22 6:19 ` Pradeep Pragallapati @ 2023-05-30 13:15 ` Pradeep Pragallapati 2023-05-30 13:24 ` Yu Kuai 0 siblings, 1 reply; 12+ messages in thread From: Pradeep Pragallapati @ 2023-05-30 13:15 UTC (permalink / raw) To: Yu Kuai, axboe; +Cc: linux-block, linux-kernel, yukuai (C) Hi, On 5/22/2023 11:49 AM, Pradeep Pragallapati wrote: > > On 5/18/2023 6:14 PM, Yu Kuai wrote: >> Hi, >> >> 在 2023/05/18 20:16, Yu Kuai 写道: >> >>> @@ -173,18 +171,17 @@ void ioc_clear_queue(struct request_queue *q) >>> { >>> LIST_HEAD(icq_list); >>> >>> + rcu_read_lock(); >> >> Sorry that I realized this is still not enough, following list_empty() >> and list_entry() can still concurrent with list_del(). Please try the >> following patch: > sure will try and update the results. At least for 80+hrs of testing, i didn't see the issue reproduced. seems like it is helping my case. >>> spin_lock_irq(&q->queue_lock); >>> list_splice_init(&q->icq_list, &icq_list); >>> spin_unlock_irq(&q->queue_lock); >>> >>> - rcu_read_lock(); >>> while (!list_empty(&icq_list)) { >>> struct io_cq *icq = >>> list_entry(icq_list.next, struct io_cq, >>> q_node); >>> >>> spin_lock_irq(&icq->ioc->lock); >>> - if (!(icq->flags & ICQ_DESTROYED)) >>> - ioc_destroy_icq(icq); >>> + ioc_destroy_icq(icq); >>> spin_unlock_irq(&icq->ioc->lock); >>> } >>> rcu_read_unlock(); >>> >>> . >>> >> >> diff --git a/block/blk-ioc.c b/block/blk-ioc.c >> index 63fc02042408..47684d1e9006 100644 >> --- a/block/blk-ioc.c >> +++ b/block/blk-ioc.c >> @@ -78,6 +78,9 @@ static void ioc_destroy_icq(struct io_cq *icq) >> >> lockdep_assert_held(&ioc->lock); >> >> + if (icq->flags & ICQ_DESTROYED) >> + return; >> + >> radix_tree_delete(&ioc->icq_tree, icq->q->id); >> hlist_del_init(&icq->ioc_node); >> list_del_init(&icq->q_node); >> @@ -128,12 +131,7 @@ static void ioc_release_fn(struct work_struct >> *work) >> spin_lock(&q->queue_lock); >> spin_lock(&ioc->lock); >> >> - /* >> - * The icq may have been destroyed when the >> ioc lock >> - * was released. >> - */ >> - if (!(icq->flags & ICQ_DESTROYED)) >> - ioc_destroy_icq(icq); >> + ioc_destroy_icq(icq); >> >> spin_unlock(&q->queue_lock); >> rcu_read_unlock(); >> @@ -175,19 +173,16 @@ void ioc_clear_queue(struct request_queue *q) >> >> spin_lock_irq(&q->queue_lock); >> list_splice_init(&q->icq_list, &icq_list); >> - spin_unlock_irq(&q->queue_lock); >> >> - rcu_read_lock(); >> while (!list_empty(&icq_list)) { >> struct io_cq *icq = >> list_entry(icq_list.next, struct io_cq, q_node); >> >> spin_lock_irq(&icq->ioc->lock); >> - if (!(icq->flags & ICQ_DESTROYED)) >> - ioc_destroy_icq(icq); >> + ioc_destroy_icq(icq); >> spin_unlock_irq(&icq->ioc->lock); >> } >> - rcu_read_unlock(); >> + spin_unlock_irq(&q->queue_lock); >> } >> #else /* CONFIG_BLK_ICQ */ >> static inline void ioc_exit_icqs(struct io_context *ioc) >> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V1] block: Fix null pointer dereference issue on struct io_cq 2023-05-30 13:15 ` Pradeep Pragallapati @ 2023-05-30 13:24 ` Yu Kuai 0 siblings, 0 replies; 12+ messages in thread From: Yu Kuai @ 2023-05-30 13:24 UTC (permalink / raw) To: Pradeep Pragallapati, Yu Kuai, axboe Cc: linux-block, linux-kernel, yukuai (C) Hi, 在 2023/05/30 21:15, Pradeep Pragallapati 写道: > Hi, > > On 5/22/2023 11:49 AM, Pradeep Pragallapati wrote: >> >> On 5/18/2023 6:14 PM, Yu Kuai wrote: >>> Hi, >>> >>> 在 2023/05/18 20:16, Yu Kuai 写道: >>> >>>> @@ -173,18 +171,17 @@ void ioc_clear_queue(struct request_queue *q) >>>> { >>>> LIST_HEAD(icq_list); >>>> >>>> + rcu_read_lock(); >>> >>> Sorry that I realized this is still not enough, following list_empty() >>> and list_entry() can still concurrent with list_del(). Please try the >>> following patch: >> sure will try and update the results. > > > At least for 80+hrs of testing, i didn't see the issue reproduced. seems > like it is helping my case. Thanks for the test, I'll send a patch soon. Kuai ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2023-05-30 13:24 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-05-17 8:44 [PATCH V1] block: Fix null pointer dereference issue on struct io_cq Pradeep P V K 2023-05-17 8:55 ` Chaitanya Kulkarni 2023-05-17 8:58 ` Yu Kuai 2023-05-17 9:20 ` Damien Le Moal 2023-05-17 9:21 ` Christoph Hellwig 2023-05-17 9:31 ` Damien Le Moal 2023-05-18 9:25 ` Pradeep Pragallapati (QUIC) 2023-05-18 12:16 ` Yu Kuai 2023-05-18 12:44 ` Yu Kuai 2023-05-22 6:19 ` Pradeep Pragallapati 2023-05-30 13:15 ` Pradeep Pragallapati 2023-05-30 13:24 ` Yu Kuai
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox