* [PATCH] Remove blkg node after destroying blkg @ 2023-04-25 7:59 Tao Su 2023-04-25 8:09 ` Yu Kuai 0 siblings, 1 reply; 6+ messages in thread From: Tao Su @ 2023-04-25 7:59 UTC (permalink / raw) To: linux-block, linux-kernel; +Cc: tj, josef, axboe, tao1.su Kernel hang when poweroff or reboot, due to infinite restart in function blkg_destroy_all. It will goto restart label when a batch of blkgs are destroyed, but not remove blkg node in blkg_list. So the blkg_list is same in every 'restart' and result in kernel hang. By adding list_del to remove blkg node after destroying, can solve this kernel hang issue and satisfy the previous will to 'restart'. Reported-by: Xiangfei Ma <xiangfeix.ma@intel.com> Tested-by: Xiangfei Ma <xiangfeix.ma@intel.com> Tested-by: Farrah Chen <farrah.chen@intel.com> Signed-off-by: Tao Su <tao1.su@linux.intel.com> --- block/blk-cgroup.c | 1 + 1 file changed, 1 insertion(+) diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index bd50b55bdb61..960eb538a704 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -530,6 +530,7 @@ static void blkg_destroy_all(struct gendisk *disk) spin_lock(&blkcg->lock); blkg_destroy(blkg); + list_del(&blkg->q_node); spin_unlock(&blkcg->lock); /* -- 2.34.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] Remove blkg node after destroying blkg 2023-04-25 7:59 [PATCH] Remove blkg node after destroying blkg Tao Su @ 2023-04-25 8:09 ` Yu Kuai 2023-04-25 9:41 ` Tao Su 0 siblings, 1 reply; 6+ messages in thread From: Yu Kuai @ 2023-04-25 8:09 UTC (permalink / raw) To: Tao Su, linux-block, linux-kernel; +Cc: tj, josef, axboe, yukuai (C) Hi, 在 2023/04/25 15:59, Tao Su 写道: > Kernel hang when poweroff or reboot, due to infinite restart in function > blkg_destroy_all. It will goto restart label when a batch of blkgs are > destroyed, but not remove blkg node in blkg_list. So the blkg_list is > same in every 'restart' and result in kernel hang. > > By adding list_del to remove blkg node after destroying, can solve this > kernel hang issue and satisfy the previous will to 'restart'. > > Reported-by: Xiangfei Ma <xiangfeix.ma@intel.com> > Tested-by: Xiangfei Ma <xiangfeix.ma@intel.com> > Tested-by: Farrah Chen <farrah.chen@intel.com> > Signed-off-by: Tao Su <tao1.su@linux.intel.com> > --- > block/blk-cgroup.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c > index bd50b55bdb61..960eb538a704 100644 > --- a/block/blk-cgroup.c > +++ b/block/blk-cgroup.c > @@ -530,6 +530,7 @@ static void blkg_destroy_all(struct gendisk *disk) > > spin_lock(&blkcg->lock); > blkg_destroy(blkg); > + list_del(&blkg->q_node); blkg should stay on the queue list until blkg_free_workfn(), otherwise parent blkg can be freed before child, which will cause some known issue. I think this hung happens when total blkg is greater than BLKG_DESTROY_BATCH_SIZE, right? Can you try if following patch fix your problem? index 1c1ebeb51003..0ecb4cce8af2 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -527,6 +527,9 @@ static void blkg_destroy_all(struct gendisk *disk) list_for_each_entry_safe(blkg, n, &q->blkg_list, q_node) { struct blkcg *blkcg = blkg->blkcg; + if (hlist_unhashed(&blkg->blkcg_node)) + continue; + spin_lock(&blkcg->lock); blkg_destroy(blkg); spin_unlock(&blkcg->lock); > spin_unlock(&blkcg->lock); > > /* > ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] Remove blkg node after destroying blkg 2023-04-25 8:09 ` Yu Kuai @ 2023-04-25 9:41 ` Tao Su 2023-04-25 11:09 ` Yu Kuai 0 siblings, 1 reply; 6+ messages in thread From: Tao Su @ 2023-04-25 9:41 UTC (permalink / raw) To: Yu Kuai; +Cc: linux-block, linux-kernel, tj, josef, axboe On Tue, Apr 25, 2023 at 04:09:34PM +0800, Yu Kuai wrote: > Hi, > > 在 2023/04/25 15:59, Tao Su 写道: > > Kernel hang when poweroff or reboot, due to infinite restart in function > > blkg_destroy_all. It will goto restart label when a batch of blkgs are > > destroyed, but not remove blkg node in blkg_list. So the blkg_list is > > same in every 'restart' and result in kernel hang. > > > > By adding list_del to remove blkg node after destroying, can solve this > > kernel hang issue and satisfy the previous will to 'restart'. > > > > Reported-by: Xiangfei Ma <xiangfeix.ma@intel.com> > > Tested-by: Xiangfei Ma <xiangfeix.ma@intel.com> > > Tested-by: Farrah Chen <farrah.chen@intel.com> > > Signed-off-by: Tao Su <tao1.su@linux.intel.com> > > --- > > block/blk-cgroup.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c > > index bd50b55bdb61..960eb538a704 100644 > > --- a/block/blk-cgroup.c > > +++ b/block/blk-cgroup.c > > @@ -530,6 +530,7 @@ static void blkg_destroy_all(struct gendisk *disk) > > spin_lock(&blkcg->lock); > > blkg_destroy(blkg); > > + list_del(&blkg->q_node); > > blkg should stay on the queue list until blkg_free_workfn(), otherwise > parent blkg can be freed before child, which will cause some known > issue. Yes, directly removing blkg node is not appropriate, which I noticed some comments in blkg_destroy(), thanks for pointing out this issue. > > I think this hung happens when total blkg is greater than > BLKG_DESTROY_BATCH_SIZE, right? Yes, you are right. > > Can you try if following patch fix your problem? This patch can also fix my problem, and indeed is a more secure way. Thanks, Tao > > index 1c1ebeb51003..0ecb4cce8af2 100644 > --- a/block/blk-cgroup.c > +++ b/block/blk-cgroup.c > @@ -527,6 +527,9 @@ static void blkg_destroy_all(struct gendisk *disk) > list_for_each_entry_safe(blkg, n, &q->blkg_list, q_node) { > struct blkcg *blkcg = blkg->blkcg; > > + if (hlist_unhashed(&blkg->blkcg_node)) > + continue; > + > spin_lock(&blkcg->lock); > blkg_destroy(blkg); > spin_unlock(&blkcg->lock); > > > spin_unlock(&blkcg->lock); > > /* > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Remove blkg node after destroying blkg 2023-04-25 9:41 ` Tao Su @ 2023-04-25 11:09 ` Yu Kuai 2023-04-26 1:13 ` Yu Kuai 0 siblings, 1 reply; 6+ messages in thread From: Yu Kuai @ 2023-04-25 11:09 UTC (permalink / raw) To: Tao Su; +Cc: linux-block, linux-kernel, tj, josef, axboe, yukuai (C) Hi, 在 2023/04/25 17:41, Tao Su 写道: > On Tue, Apr 25, 2023 at 04:09:34PM +0800, Yu Kuai wrote: >> Hi, >> >> 在 2023/04/25 15:59, Tao Su 写道: >>> Kernel hang when poweroff or reboot, due to infinite restart in function >>> blkg_destroy_all. It will goto restart label when a batch of blkgs are >>> destroyed, but not remove blkg node in blkg_list. So the blkg_list is >>> same in every 'restart' and result in kernel hang. >>> >>> By adding list_del to remove blkg node after destroying, can solve this >>> kernel hang issue and satisfy the previous will to 'restart'. >>> >>> Reported-by: Xiangfei Ma <xiangfeix.ma@intel.com> >>> Tested-by: Xiangfei Ma <xiangfeix.ma@intel.com> >>> Tested-by: Farrah Chen <farrah.chen@intel.com> >>> Signed-off-by: Tao Su <tao1.su@linux.intel.com> >>> --- >>> block/blk-cgroup.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c >>> index bd50b55bdb61..960eb538a704 100644 >>> --- a/block/blk-cgroup.c >>> +++ b/block/blk-cgroup.c >>> @@ -530,6 +530,7 @@ static void blkg_destroy_all(struct gendisk *disk) >>> spin_lock(&blkcg->lock); >>> blkg_destroy(blkg); >>> + list_del(&blkg->q_node); >> >> blkg should stay on the queue list until blkg_free_workfn(), otherwise >> parent blkg can be freed before child, which will cause some known >> issue. > > Yes, directly removing blkg node is not appropriate, which I noticed some > comments in blkg_destroy(), thanks for pointing out this issue. > >> >> I think this hung happens when total blkg is greater than >> BLKG_DESTROY_BATCH_SIZE, right? > > Yes, you are right. > >> >> Can you try if following patch fix your problem? > > This patch can also fix my problem, and indeed is a more secure way. Thanks for the test, for a better solution, I think 'blkcg_mutex' can be used to protect 'blkg->q_node' list instead of 'queue_lock', so that the 'restart' can be removed because softlockup can be avoided. Thanks, Kuai > > Thanks, > Tao > >> >> index 1c1ebeb51003..0ecb4cce8af2 100644 >> --- a/block/blk-cgroup.c >> +++ b/block/blk-cgroup.c >> @@ -527,6 +527,9 @@ static void blkg_destroy_all(struct gendisk *disk) >> list_for_each_entry_safe(blkg, n, &q->blkg_list, q_node) { >> struct blkcg *blkcg = blkg->blkcg; >> >> + if (hlist_unhashed(&blkg->blkcg_node)) >> + continue; >> + >> spin_lock(&blkcg->lock); >> blkg_destroy(blkg); >> spin_unlock(&blkcg->lock); >> >>> spin_unlock(&blkcg->lock); >>> /* >>> > > . > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Remove blkg node after destroying blkg 2023-04-25 11:09 ` Yu Kuai @ 2023-04-26 1:13 ` Yu Kuai 2023-04-26 2:20 ` Tao Su 0 siblings, 1 reply; 6+ messages in thread From: Yu Kuai @ 2023-04-26 1:13 UTC (permalink / raw) To: Yu Kuai, Tao Su; +Cc: linux-block, linux-kernel, tj, josef, axboe, yukuai (C) Hi, 在 2023/04/25 19:09, Yu Kuai 写道: > Hi, > > 在 2023/04/25 17:41, Tao Su 写道: >> On Tue, Apr 25, 2023 at 04:09:34PM +0800, Yu Kuai wrote: >>> Hi, >>> >>> 在 2023/04/25 15:59, Tao Su 写道: >>>> Kernel hang when poweroff or reboot, due to infinite restart in >>>> function >>>> blkg_destroy_all. It will goto restart label when a batch of blkgs are >>>> destroyed, but not remove blkg node in blkg_list. So the blkg_list is >>>> same in every 'restart' and result in kernel hang. >>>> >>>> By adding list_del to remove blkg node after destroying, can solve this >>>> kernel hang issue and satisfy the previous will to 'restart'. >>>> >>>> Reported-by: Xiangfei Ma <xiangfeix.ma@intel.com> >>>> Tested-by: Xiangfei Ma <xiangfeix.ma@intel.com> >>>> Tested-by: Farrah Chen <farrah.chen@intel.com> >>>> Signed-off-by: Tao Su <tao1.su@linux.intel.com> >>>> --- >>>> block/blk-cgroup.c | 1 + >>>> 1 file changed, 1 insertion(+) >>>> >>>> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c >>>> index bd50b55bdb61..960eb538a704 100644 >>>> --- a/block/blk-cgroup.c >>>> +++ b/block/blk-cgroup.c >>>> @@ -530,6 +530,7 @@ static void blkg_destroy_all(struct gendisk *disk) >>>> spin_lock(&blkcg->lock); >>>> blkg_destroy(blkg); >>>> + list_del(&blkg->q_node); >>> >>> blkg should stay on the queue list until blkg_free_workfn(), otherwise >>> parent blkg can be freed before child, which will cause some known >>> issue. >> >> Yes, directly removing blkg node is not appropriate, which I noticed some >> comments in blkg_destroy(), thanks for pointing out this issue. >> >>> >>> I think this hung happens when total blkg is greater than >>> BLKG_DESTROY_BATCH_SIZE, right? >> >> Yes, you are right. >> >>> >>> Can you try if following patch fix your problem? >> >> This patch can also fix my problem, and indeed is a more secure way. > > Thanks for the test, for a better solution, I think 'blkcg_mutex' can > be used to protect 'blkg->q_node' list instead of 'queue_lock', so that > the 'restart' can be removed because softlockup can be avoided. > I looked into this, and I found that this is not a easy thing to do. Anyway, feel free to submit a new patch based on my orignial suggestion. Thanks, Kuai ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Remove blkg node after destroying blkg 2023-04-26 1:13 ` Yu Kuai @ 2023-04-26 2:20 ` Tao Su 0 siblings, 0 replies; 6+ messages in thread From: Tao Su @ 2023-04-26 2:20 UTC (permalink / raw) To: Yu Kuai; +Cc: linux-block, linux-kernel, tj, josef, axboe, yukuai (C) On Wed, Apr 26, 2023 at 09:13:08AM +0800, Yu Kuai wrote: > Hi, > > 在 2023/04/25 19:09, Yu Kuai 写道: > > Hi, > > > > 在 2023/04/25 17:41, Tao Su 写道: > > > On Tue, Apr 25, 2023 at 04:09:34PM +0800, Yu Kuai wrote: > > > > Hi, > > > > > > > > 在 2023/04/25 15:59, Tao Su 写道: > > > > > Kernel hang when poweroff or reboot, due to infinite restart > > > > > in function > > > > > blkg_destroy_all. It will goto restart label when a batch of blkgs are > > > > > destroyed, but not remove blkg node in blkg_list. So the blkg_list is > > > > > same in every 'restart' and result in kernel hang. > > > > > > > > > > By adding list_del to remove blkg node after destroying, can solve this > > > > > kernel hang issue and satisfy the previous will to 'restart'. > > > > > > > > > > Reported-by: Xiangfei Ma <xiangfeix.ma@intel.com> > > > > > Tested-by: Xiangfei Ma <xiangfeix.ma@intel.com> > > > > > Tested-by: Farrah Chen <farrah.chen@intel.com> > > > > > Signed-off-by: Tao Su <tao1.su@linux.intel.com> > > > > > --- > > > > > block/blk-cgroup.c | 1 + > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c > > > > > index bd50b55bdb61..960eb538a704 100644 > > > > > --- a/block/blk-cgroup.c > > > > > +++ b/block/blk-cgroup.c > > > > > @@ -530,6 +530,7 @@ static void blkg_destroy_all(struct gendisk *disk) > > > > > spin_lock(&blkcg->lock); > > > > > blkg_destroy(blkg); > > > > > + list_del(&blkg->q_node); > > > > > > > > blkg should stay on the queue list until blkg_free_workfn(), otherwise > > > > parent blkg can be freed before child, which will cause some known > > > > issue. > > > > > > Yes, directly removing blkg node is not appropriate, which I noticed some > > > comments in blkg_destroy(), thanks for pointing out this issue. > > > > > > > > > > > I think this hung happens when total blkg is greater than > > > > BLKG_DESTROY_BATCH_SIZE, right? > > > > > > Yes, you are right. > > > > > > > > > > > Can you try if following patch fix your problem? > > > > > > This patch can also fix my problem, and indeed is a more secure way. > > > > Thanks for the test, for a better solution, I think 'blkcg_mutex' can > > be used to protect 'blkg->q_node' list instead of 'queue_lock', so that > > the 'restart' can be removed because softlockup can be avoided. > > > > I looked into this, and I found that this is not a easy thing to do. > > Anyway, feel free to submit a new patch based on my orignial suggestion. Thanks for your contribution and careful review, I will submit the new patch if no other comments. Thanks, Tao > > Thanks, > Kuai > ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-04-26 2:22 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-04-25 7:59 [PATCH] Remove blkg node after destroying blkg Tao Su 2023-04-25 8:09 ` Yu Kuai 2023-04-25 9:41 ` Tao Su 2023-04-25 11:09 ` Yu Kuai 2023-04-26 1:13 ` Yu Kuai 2023-04-26 2:20 ` Tao Su
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).