* Sleeping while atomic regression around hd_struct_free() in 5.9-rc
@ 2020-08-28 10:32 Ilya Dryomov
2020-08-28 13:05 ` Ming Lei
0 siblings, 1 reply; 3+ messages in thread
From: Ilya Dryomov @ 2020-08-28 10:32 UTC (permalink / raw)
To: Ming Lei; +Cc: Yufen Yu, Hou Tao, Christoph Hellwig, Jens Axboe, linux-block
Hi Ming,
There seems to be a sleeping while atomic bug in hd_struct_free():
288 static void hd_struct_free(struct percpu_ref *ref)
289 {
290 struct hd_struct *part = container_of(ref, struct hd_struct, ref);
291 struct gendisk *disk = part_to_disk(part);
292 struct disk_part_tbl *ptbl =
293 rcu_dereference_protected(disk->part_tbl, 1);
294
295 rcu_assign_pointer(ptbl->last_lookup, NULL);
296 put_device(disk_to_dev(disk));
297
298 INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
299 queue_rcu_work(system_wq, &part->rcu_work);
300 }
hd_struct_free() is a percpu_ref release callback and must not sleep.
But put_device() can end up in disk_release(), resulting in anything
from "sleeping function called from invalid context" splats to actual
lockups if the queue ends up being released:
BUG: scheduling while atomic: ksoftirqd/3/26/0x00000102
INFO: lockdep is turned off.
CPU: 3 PID: 26 Comm: ksoftirqd/3 Tainted: G W
5.9.0-rc2-ceph-g2de49bea2ebc #1
Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
Call Trace:
dump_stack+0x96/0xd0
__schedule_bug.cold+0x64/0x71
__schedule+0x8ea/0xac0
? wait_for_completion+0x86/0x110
schedule+0x5f/0xd0
schedule_timeout+0x212/0x2a0
? wait_for_completion+0x86/0x110
wait_for_completion+0xb0/0x110
__wait_rcu_gp+0x139/0x150
synchronize_rcu+0x79/0xf0
? invoke_rcu_core+0xb0/0xb0
? rcu_read_lock_any_held+0xb0/0xb0
blk_free_flush_queue+0x17/0x30
blk_mq_hw_sysfs_release+0x32/0x70
kobject_put+0x7d/0x1d0
blk_mq_release+0xbe/0xf0
blk_release_queue+0xb7/0x140
kobject_put+0x7d/0x1d0
disk_release+0xb0/0xc0
device_release+0x25/0x80
kobject_put+0x7d/0x1d0
hd_struct_free+0x4c/0xc0
percpu_ref_switch_to_atomic_rcu+0x1df/0x1f0
rcu_core+0x3fd/0x660
? rcu_core+0x3cc/0x660
__do_softirq+0xd5/0x45e
? smpboot_thread_fn+0x26/0x1d0
run_ksoftirqd+0x30/0x60
smpboot_thread_fn+0xfe/0x1d0
? sort_range+0x20/0x20
kthread+0x11a/0x150
? kthread_delayed_work_timer_fn+0xa0/0xa0
ret_from_fork+0x1f/0x30
"git blame" points at your commit tb7d6c3033323 ("block: fix
use-after-free on cached last_lookup partition"), but there is
likely more to it because it went into 5.8 and I haven't seen
these lockups until we rebased to 5.9-rc.
Could you please take a look?
Thanks,
Ilya
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: Sleeping while atomic regression around hd_struct_free() in 5.9-rc 2020-08-28 10:32 Sleeping while atomic regression around hd_struct_free() in 5.9-rc Ilya Dryomov @ 2020-08-28 13:05 ` Ming Lei 2020-08-30 8:46 ` Ilya Dryomov 0 siblings, 1 reply; 3+ messages in thread From: Ming Lei @ 2020-08-28 13:05 UTC (permalink / raw) To: Ilya Dryomov Cc: Yufen Yu, Hou Tao, Christoph Hellwig, Jens Axboe, linux-block Hi Ilya, Thanks for your report! On Fri, Aug 28, 2020 at 12:32:48PM +0200, Ilya Dryomov wrote: > Hi Ming, > > There seems to be a sleeping while atomic bug in hd_struct_free(): > > 288 static void hd_struct_free(struct percpu_ref *ref) > 289 { > 290 struct hd_struct *part = container_of(ref, struct hd_struct, ref); > 291 struct gendisk *disk = part_to_disk(part); > 292 struct disk_part_tbl *ptbl = > 293 rcu_dereference_protected(disk->part_tbl, 1); > 294 > 295 rcu_assign_pointer(ptbl->last_lookup, NULL); > 296 put_device(disk_to_dev(disk)); > 297 > 298 INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work); > 299 queue_rcu_work(system_wq, &part->rcu_work); > 300 } > > hd_struct_free() is a percpu_ref release callback and must not sleep. > But put_device() can end up in disk_release(), resulting in anything > from "sleeping function called from invalid context" splats to actual > lockups if the queue ends up being released: > > BUG: scheduling while atomic: ksoftirqd/3/26/0x00000102 > INFO: lockdep is turned off. > CPU: 3 PID: 26 Comm: ksoftirqd/3 Tainted: G W > 5.9.0-rc2-ceph-g2de49bea2ebc #1 > Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015 > Call Trace: > dump_stack+0x96/0xd0 > __schedule_bug.cold+0x64/0x71 > __schedule+0x8ea/0xac0 > ? wait_for_completion+0x86/0x110 > schedule+0x5f/0xd0 > schedule_timeout+0x212/0x2a0 > ? wait_for_completion+0x86/0x110 > wait_for_completion+0xb0/0x110 > __wait_rcu_gp+0x139/0x150 > synchronize_rcu+0x79/0xf0 > ? invoke_rcu_core+0xb0/0xb0 > ? rcu_read_lock_any_held+0xb0/0xb0 > blk_free_flush_queue+0x17/0x30 > blk_mq_hw_sysfs_release+0x32/0x70 > kobject_put+0x7d/0x1d0 > blk_mq_release+0xbe/0xf0 > blk_release_queue+0xb7/0x140 > kobject_put+0x7d/0x1d0 > disk_release+0xb0/0xc0 > device_release+0x25/0x80 > kobject_put+0x7d/0x1d0 > hd_struct_free+0x4c/0xc0 > percpu_ref_switch_to_atomic_rcu+0x1df/0x1f0 > rcu_core+0x3fd/0x660 > ? rcu_core+0x3cc/0x660 > __do_softirq+0xd5/0x45e > ? smpboot_thread_fn+0x26/0x1d0 > run_ksoftirqd+0x30/0x60 > smpboot_thread_fn+0xfe/0x1d0 > ? sort_range+0x20/0x20 > kthread+0x11a/0x150 > ? kthread_delayed_work_timer_fn+0xa0/0xa0 > ret_from_fork+0x1f/0x30 > > "git blame" points at your commit tb7d6c3033323 ("block: fix > use-after-free on cached last_lookup partition"), but there is > likely more to it because it went into 5.8 and I haven't seen > these lockups until we rebased to 5.9-rc. The pull-the-trigger commit is actually e8c7d14ac6c3 ("block: revert back to synchronous request_queue removal"). > > Could you please take a look? Can you try the following patch? diff --git a/block/partitions/core.c b/block/partitions/core.c index e62a98a8eeb7..b06fc3425802 100644 --- a/block/partitions/core.c +++ b/block/partitions/core.c @@ -278,6 +278,15 @@ static void hd_struct_free_work(struct work_struct *work) { struct hd_struct *part = container_of(to_rcu_work(work), struct hd_struct, rcu_work); + struct gendisk *disk = part_to_disk(part); + + /* + * Release the reference grabbed in delete_partition, and it should + * have been done in hd_struct_free(), however device's release + * handler can't be done in percpu_ref's ->release() callback + * because it is run via call_rcu(). + */ + put_device(disk_to_dev(disk)); part->start_sect = 0; part->nr_sects = 0; @@ -293,7 +302,6 @@ static void hd_struct_free(struct percpu_ref *ref) rcu_dereference_protected(disk->part_tbl, 1); rcu_assign_pointer(ptbl->last_lookup, NULL); - put_device(disk_to_dev(disk)); INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work); queue_rcu_work(system_wq, &part->rcu_work); Thanks, Ming ^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: Sleeping while atomic regression around hd_struct_free() in 5.9-rc 2020-08-28 13:05 ` Ming Lei @ 2020-08-30 8:46 ` Ilya Dryomov 0 siblings, 0 replies; 3+ messages in thread From: Ilya Dryomov @ 2020-08-30 8:46 UTC (permalink / raw) To: Ming Lei; +Cc: Yufen Yu, Hou Tao, Christoph Hellwig, Jens Axboe, linux-block On Fri, Aug 28, 2020 at 3:05 PM Ming Lei <ming.lei@redhat.com> wrote: > > Hi Ilya, > > Thanks for your report! > > On Fri, Aug 28, 2020 at 12:32:48PM +0200, Ilya Dryomov wrote: > > Hi Ming, > > > > There seems to be a sleeping while atomic bug in hd_struct_free(): > > > > 288 static void hd_struct_free(struct percpu_ref *ref) > > 289 { > > 290 struct hd_struct *part = container_of(ref, struct hd_struct, ref); > > 291 struct gendisk *disk = part_to_disk(part); > > 292 struct disk_part_tbl *ptbl = > > 293 rcu_dereference_protected(disk->part_tbl, 1); > > 294 > > 295 rcu_assign_pointer(ptbl->last_lookup, NULL); > > 296 put_device(disk_to_dev(disk)); > > 297 > > 298 INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work); > > 299 queue_rcu_work(system_wq, &part->rcu_work); > > 300 } > > > > hd_struct_free() is a percpu_ref release callback and must not sleep. > > But put_device() can end up in disk_release(), resulting in anything > > from "sleeping function called from invalid context" splats to actual > > lockups if the queue ends up being released: > > > > BUG: scheduling while atomic: ksoftirqd/3/26/0x00000102 > > INFO: lockdep is turned off. > > CPU: 3 PID: 26 Comm: ksoftirqd/3 Tainted: G W > > 5.9.0-rc2-ceph-g2de49bea2ebc #1 > > Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015 > > Call Trace: > > dump_stack+0x96/0xd0 > > __schedule_bug.cold+0x64/0x71 > > __schedule+0x8ea/0xac0 > > ? wait_for_completion+0x86/0x110 > > schedule+0x5f/0xd0 > > schedule_timeout+0x212/0x2a0 > > ? wait_for_completion+0x86/0x110 > > wait_for_completion+0xb0/0x110 > > __wait_rcu_gp+0x139/0x150 > > synchronize_rcu+0x79/0xf0 > > ? invoke_rcu_core+0xb0/0xb0 > > ? rcu_read_lock_any_held+0xb0/0xb0 > > blk_free_flush_queue+0x17/0x30 > > blk_mq_hw_sysfs_release+0x32/0x70 > > kobject_put+0x7d/0x1d0 > > blk_mq_release+0xbe/0xf0 > > blk_release_queue+0xb7/0x140 > > kobject_put+0x7d/0x1d0 > > disk_release+0xb0/0xc0 > > device_release+0x25/0x80 > > kobject_put+0x7d/0x1d0 > > hd_struct_free+0x4c/0xc0 > > percpu_ref_switch_to_atomic_rcu+0x1df/0x1f0 > > rcu_core+0x3fd/0x660 > > ? rcu_core+0x3cc/0x660 > > __do_softirq+0xd5/0x45e > > ? smpboot_thread_fn+0x26/0x1d0 > > run_ksoftirqd+0x30/0x60 > > smpboot_thread_fn+0xfe/0x1d0 > > ? sort_range+0x20/0x20 > > kthread+0x11a/0x150 > > ? kthread_delayed_work_timer_fn+0xa0/0xa0 > > ret_from_fork+0x1f/0x30 > > > > "git blame" points at your commit tb7d6c3033323 ("block: fix > > use-after-free on cached last_lookup partition"), but there is > > likely more to it because it went into 5.8 and I haven't seen > > these lockups until we rebased to 5.9-rc. > > The pull-the-trigger commit is actually e8c7d14ac6c3 ("block: revert back to > synchronous request_queue removal"). > > > > > Could you please take a look? > > Can you try the following patch? > > diff --git a/block/partitions/core.c b/block/partitions/core.c > index e62a98a8eeb7..b06fc3425802 100644 > --- a/block/partitions/core.c > +++ b/block/partitions/core.c > @@ -278,6 +278,15 @@ static void hd_struct_free_work(struct work_struct *work) > { > struct hd_struct *part = > container_of(to_rcu_work(work), struct hd_struct, rcu_work); > + struct gendisk *disk = part_to_disk(part); > + > + /* > + * Release the reference grabbed in delete_partition, and it should > + * have been done in hd_struct_free(), however device's release > + * handler can't be done in percpu_ref's ->release() callback > + * because it is run via call_rcu(). > + */ > + put_device(disk_to_dev(disk)); > > part->start_sect = 0; > part->nr_sects = 0; > @@ -293,7 +302,6 @@ static void hd_struct_free(struct percpu_ref *ref) > rcu_dereference_protected(disk->part_tbl, 1); > > rcu_assign_pointer(ptbl->last_lookup, NULL); > - put_device(disk_to_dev(disk)); > > INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work); > queue_rcu_work(system_wq, &part->rcu_work); This patch fixes it for me. Thanks, Ilya ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2020-08-30 8:46 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-08-28 10:32 Sleeping while atomic regression around hd_struct_free() in 5.9-rc Ilya Dryomov 2020-08-28 13:05 ` Ming Lei 2020-08-30 8:46 ` Ilya Dryomov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox