All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Ilya Dryomov <idryomov@gmail.com>
Cc: Yufen Yu <yuyufen@huawei.com>, Hou Tao <houtao1@huawei.com>,
	Christoph Hellwig <hch@infradead.org>,
	Jens Axboe <axboe@kernel.dk>,
	linux-block <linux-block@vger.kernel.org>
Subject: Re: Sleeping while atomic regression around hd_struct_free() in 5.9-rc
Date: Fri, 28 Aug 2020 21:05:14 +0800	[thread overview]
Message-ID: <20200828130514.GA236174@T590> (raw)
In-Reply-To: <CAOi1vP9t1VL-JXj9ETdU_B1kMLjKGcW6wZss6bmxoH5UCAcK7Q@mail.gmail.com>

Hi Ilya,

Thanks for your report!

On Fri, Aug 28, 2020 at 12:32:48PM +0200, Ilya Dryomov wrote:
> Hi Ming,
> 
> There seems to be a sleeping while atomic bug in hd_struct_free():
> 
> 288 static void hd_struct_free(struct percpu_ref *ref)
> 289 {
> 290         struct hd_struct *part = container_of(ref, struct hd_struct, ref);
> 291         struct gendisk *disk = part_to_disk(part);
> 292         struct disk_part_tbl *ptbl =
> 293                 rcu_dereference_protected(disk->part_tbl, 1);
> 294
> 295         rcu_assign_pointer(ptbl->last_lookup, NULL);
> 296         put_device(disk_to_dev(disk));
> 297
> 298         INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
> 299         queue_rcu_work(system_wq, &part->rcu_work);
> 300 }
> 
> hd_struct_free() is a percpu_ref release callback and must not sleep.
> But put_device() can end up in disk_release(), resulting in anything
> from "sleeping function called from invalid context" splats to actual
> lockups if the queue ends up being released:
> 
>   BUG: scheduling while atomic: ksoftirqd/3/26/0x00000102
>   INFO: lockdep is turned off.
>   CPU: 3 PID: 26 Comm: ksoftirqd/3 Tainted: G        W
> 5.9.0-rc2-ceph-g2de49bea2ebc #1
>   Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
>   Call Trace:
>    dump_stack+0x96/0xd0
>    __schedule_bug.cold+0x64/0x71
>    __schedule+0x8ea/0xac0
>    ? wait_for_completion+0x86/0x110
>    schedule+0x5f/0xd0
>    schedule_timeout+0x212/0x2a0
>    ? wait_for_completion+0x86/0x110
>    wait_for_completion+0xb0/0x110
>    __wait_rcu_gp+0x139/0x150
>    synchronize_rcu+0x79/0xf0
>    ? invoke_rcu_core+0xb0/0xb0
>    ? rcu_read_lock_any_held+0xb0/0xb0
>    blk_free_flush_queue+0x17/0x30
>    blk_mq_hw_sysfs_release+0x32/0x70
>    kobject_put+0x7d/0x1d0
>    blk_mq_release+0xbe/0xf0
>    blk_release_queue+0xb7/0x140
>    kobject_put+0x7d/0x1d0
>    disk_release+0xb0/0xc0
>    device_release+0x25/0x80
>    kobject_put+0x7d/0x1d0
>    hd_struct_free+0x4c/0xc0
>    percpu_ref_switch_to_atomic_rcu+0x1df/0x1f0
>    rcu_core+0x3fd/0x660
>    ? rcu_core+0x3cc/0x660
>    __do_softirq+0xd5/0x45e
>    ? smpboot_thread_fn+0x26/0x1d0
>    run_ksoftirqd+0x30/0x60
>    smpboot_thread_fn+0xfe/0x1d0
>    ? sort_range+0x20/0x20
>    kthread+0x11a/0x150
>    ? kthread_delayed_work_timer_fn+0xa0/0xa0
>    ret_from_fork+0x1f/0x30
> 
> "git blame" points at your commit tb7d6c3033323 ("block: fix
> use-after-free on cached last_lookup partition"), but there is
> likely more to it because it went into 5.8 and I haven't seen
> these lockups until we rebased to 5.9-rc.

The pull-the-trigger commit is actually e8c7d14ac6c3 ("block: revert back to
synchronous request_queue removal").

> 
> Could you please take a look?

Can you try the following patch?

diff --git a/block/partitions/core.c b/block/partitions/core.c
index e62a98a8eeb7..b06fc3425802 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -278,6 +278,15 @@ static void hd_struct_free_work(struct work_struct *work)
 {
 	struct hd_struct *part =
 		container_of(to_rcu_work(work), struct hd_struct, rcu_work);
+	struct gendisk *disk = part_to_disk(part);
+
+	/*
+	 * Release the reference grabbed in delete_partition, and it should
+	 * have been done in hd_struct_free(), however device's release
+	 * handler can't be done in percpu_ref's ->release() callback
+	 * because it is run via call_rcu().
+	 */
+	put_device(disk_to_dev(disk));
 
 	part->start_sect = 0;
 	part->nr_sects = 0;
@@ -293,7 +302,6 @@ static void hd_struct_free(struct percpu_ref *ref)
 		rcu_dereference_protected(disk->part_tbl, 1);
 
 	rcu_assign_pointer(ptbl->last_lookup, NULL);
-	put_device(disk_to_dev(disk));
 
 	INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
 	queue_rcu_work(system_wq, &part->rcu_work);



Thanks,
Ming


  reply	other threads:[~2020-08-28 13:06 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-28 10:32 Sleeping while atomic regression around hd_struct_free() in 5.9-rc Ilya Dryomov
2020-08-28 13:05 ` Ming Lei [this message]
2020-08-30  8:46   ` Ilya Dryomov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200828130514.GA236174@T590 \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=hch@infradead.org \
    --cc=houtao1@huawei.com \
    --cc=idryomov@gmail.com \
    --cc=linux-block@vger.kernel.org \
    --cc=yuyufen@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.