linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Ilya Dryomov <idryomov@gmail.com>
Cc: Yufen Yu <yuyufen@huawei.com>, Hou Tao <houtao1@huawei.com>,
	Christoph Hellwig <hch@infradead.org>,
	Jens Axboe <axboe@kernel.dk>,
	linux-block <linux-block@vger.kernel.org>
Subject: Re: Sleeping while atomic regression around hd_struct_free() in 5.9-rc
Date: Fri, 28 Aug 2020 21:05:14 +0800	[thread overview]
Message-ID: <20200828130514.GA236174@T590> (raw)
In-Reply-To: <CAOi1vP9t1VL-JXj9ETdU_B1kMLjKGcW6wZss6bmxoH5UCAcK7Q@mail.gmail.com>

Hi Ilya,

Thanks for your report!

On Fri, Aug 28, 2020 at 12:32:48PM +0200, Ilya Dryomov wrote:
> Hi Ming,
> 
> There seems to be a sleeping while atomic bug in hd_struct_free():
> 
> 288 static void hd_struct_free(struct percpu_ref *ref)
> 289 {
> 290         struct hd_struct *part = container_of(ref, struct hd_struct, ref);
> 291         struct gendisk *disk = part_to_disk(part);
> 292         struct disk_part_tbl *ptbl =
> 293                 rcu_dereference_protected(disk->part_tbl, 1);
> 294
> 295         rcu_assign_pointer(ptbl->last_lookup, NULL);
> 296         put_device(disk_to_dev(disk));
> 297
> 298         INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
> 299         queue_rcu_work(system_wq, &part->rcu_work);
> 300 }
> 
> hd_struct_free() is a percpu_ref release callback and must not sleep.
> But put_device() can end up in disk_release(), resulting in anything
> from "sleeping function called from invalid context" splats to actual
> lockups if the queue ends up being released:
> 
>   BUG: scheduling while atomic: ksoftirqd/3/26/0x00000102
>   INFO: lockdep is turned off.
>   CPU: 3 PID: 26 Comm: ksoftirqd/3 Tainted: G        W
> 5.9.0-rc2-ceph-g2de49bea2ebc #1
>   Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
>   Call Trace:
>    dump_stack+0x96/0xd0
>    __schedule_bug.cold+0x64/0x71
>    __schedule+0x8ea/0xac0
>    ? wait_for_completion+0x86/0x110
>    schedule+0x5f/0xd0
>    schedule_timeout+0x212/0x2a0
>    ? wait_for_completion+0x86/0x110
>    wait_for_completion+0xb0/0x110
>    __wait_rcu_gp+0x139/0x150
>    synchronize_rcu+0x79/0xf0
>    ? invoke_rcu_core+0xb0/0xb0
>    ? rcu_read_lock_any_held+0xb0/0xb0
>    blk_free_flush_queue+0x17/0x30
>    blk_mq_hw_sysfs_release+0x32/0x70
>    kobject_put+0x7d/0x1d0
>    blk_mq_release+0xbe/0xf0
>    blk_release_queue+0xb7/0x140
>    kobject_put+0x7d/0x1d0
>    disk_release+0xb0/0xc0
>    device_release+0x25/0x80
>    kobject_put+0x7d/0x1d0
>    hd_struct_free+0x4c/0xc0
>    percpu_ref_switch_to_atomic_rcu+0x1df/0x1f0
>    rcu_core+0x3fd/0x660
>    ? rcu_core+0x3cc/0x660
>    __do_softirq+0xd5/0x45e
>    ? smpboot_thread_fn+0x26/0x1d0
>    run_ksoftirqd+0x30/0x60
>    smpboot_thread_fn+0xfe/0x1d0
>    ? sort_range+0x20/0x20
>    kthread+0x11a/0x150
>    ? kthread_delayed_work_timer_fn+0xa0/0xa0
>    ret_from_fork+0x1f/0x30
> 
> "git blame" points at your commit tb7d6c3033323 ("block: fix
> use-after-free on cached last_lookup partition"), but there is
> likely more to it because it went into 5.8 and I haven't seen
> these lockups until we rebased to 5.9-rc.

The pull-the-trigger commit is actually e8c7d14ac6c3 ("block: revert back to
synchronous request_queue removal").

> 
> Could you please take a look?

Can you try the following patch?

diff --git a/block/partitions/core.c b/block/partitions/core.c
index e62a98a8eeb7..b06fc3425802 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -278,6 +278,15 @@ static void hd_struct_free_work(struct work_struct *work)
 {
 	struct hd_struct *part =
 		container_of(to_rcu_work(work), struct hd_struct, rcu_work);
+	struct gendisk *disk = part_to_disk(part);
+
+	/*
+	 * Release the reference grabbed in delete_partition, and it should
+	 * have been done in hd_struct_free(), however device's release
+	 * handler can't be done in percpu_ref's ->release() callback
+	 * because it is run via call_rcu().
+	 */
+	put_device(disk_to_dev(disk));
 
 	part->start_sect = 0;
 	part->nr_sects = 0;
@@ -293,7 +302,6 @@ static void hd_struct_free(struct percpu_ref *ref)
 		rcu_dereference_protected(disk->part_tbl, 1);
 
 	rcu_assign_pointer(ptbl->last_lookup, NULL);
-	put_device(disk_to_dev(disk));
 
 	INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
 	queue_rcu_work(system_wq, &part->rcu_work);



Thanks,
Ming


  reply	other threads:[~2020-08-28 13:06 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-28 10:32 Sleeping while atomic regression around hd_struct_free() in 5.9-rc Ilya Dryomov
2020-08-28 13:05 ` Ming Lei [this message]
2020-08-30  8:46   ` Ilya Dryomov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200828130514.GA236174@T590 \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=hch@infradead.org \
    --cc=houtao1@huawei.com \
    --cc=idryomov@gmail.com \
    --cc=linux-block@vger.kernel.org \
    --cc=yuyufen@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).