* Re: [PATCH] block: fix "Directory XXXXX with parent 'block' already present!"
2022-04-21 8:34 [PATCH] block: fix "Directory XXXXX with parent 'block' already present!" Ming Lei
@ 2022-04-21 12:02 ` Shinichiro Kawasaki
2022-04-21 16:09 ` Christoph Hellwig
` (4 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Shinichiro Kawasaki @ 2022-04-21 12:02 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, linux-block@vger.kernel.org, Dan Williams, yukuai
On Apr 21, 2022 / 16:34, Ming Lei wrote:
> q->debugfs_dir is used by blk-mq debugfs and blktrace. The dentry is
> created when adding disk, and removed when releasing request queue.
>
> There is small window between releasing disk and releasing request
> queue, and during the period, one disk with same name may be created
> and added, so debugfs_create_dir() may complain with "Directory XXXXX
> with parent 'block' already present!"
>
> Fixes the issue by moving debugfs_create_dir() into blk_alloc_queue(),
> and the dir name is named with q->id from beginning, and switched to
> disk name when adding disk, and finally changed to q->id in disk_release().
>
> Reported-by: Dan Williams <dan.j.williams@intel.com>
> Cc: yukuai (C) <yukuai3@huawei.com>
> Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
Thank you Ming, I confirmed that blktests block group all passes with this
patch, including the block/002 test case.
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
--
Best Regards,
Shin'ichiro Kawasaki
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] block: fix "Directory XXXXX with parent 'block' already present!"
2022-04-21 8:34 [PATCH] block: fix "Directory XXXXX with parent 'block' already present!" Ming Lei
2022-04-21 12:02 ` Shinichiro Kawasaki
@ 2022-04-21 16:09 ` Christoph Hellwig
2022-04-22 3:01 ` Ming Lei
2022-04-21 16:37 ` Dan Williams
` (3 subsequent siblings)
5 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2022-04-21 16:09 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, Dan Williams, yukuai,
Shin'ichiro Kawasaki
On Thu, Apr 21, 2022 at 04:34:31PM +0800, Ming Lei wrote:
> q->debugfs_dir is used by blk-mq debugfs and blktrace. The dentry is
> created when adding disk, and removed when releasing request queue.
>
> There is small window between releasing disk and releasing request
> queue, and during the period, one disk with same name may be created
> and added, so debugfs_create_dir() may complain with "Directory XXXXX
> with parent 'block' already present!"
>
> Fixes the issue by moving debugfs_create_dir() into blk_alloc_queue(),
> and the dir name is named with q->id from beginning, and switched to
> disk name when adding disk, and finally changed to q->id in disk_release().
Is there any good reason to not just debugfs_remove_recursive in
blk_unregister_queue and do away with all the renaming?
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] block: fix "Directory XXXXX with parent 'block' already present!"
2022-04-21 16:09 ` Christoph Hellwig
@ 2022-04-22 3:01 ` Ming Lei
2022-04-22 6:05 ` Christoph Hellwig
0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2022-04-22 3:01 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, linux-block, Dan Williams, yukuai,
Shin'ichiro Kawasaki
On Thu, Apr 21, 2022 at 09:09:01AM -0700, Christoph Hellwig wrote:
> On Thu, Apr 21, 2022 at 04:34:31PM +0800, Ming Lei wrote:
> > q->debugfs_dir is used by blk-mq debugfs and blktrace. The dentry is
> > created when adding disk, and removed when releasing request queue.
> >
> > There is small window between releasing disk and releasing request
> > queue, and during the period, one disk with same name may be created
> > and added, so debugfs_create_dir() may complain with "Directory XXXXX
> > with parent 'block' already present!"
> >
> > Fixes the issue by moving debugfs_create_dir() into blk_alloc_queue(),
> > and the dir name is named with q->id from beginning, and switched to
> > disk name when adding disk, and finally changed to q->id in disk_release().
>
> Is there any good reason to not just debugfs_remove_recursive in
> blk_unregister_queue and do away with all the renaming?
Please see the following reasons:
1) disk_release_mq() calls elevator_exit()/rq_qos_exit(), and the two
may trigger UAF if q->debugfs_dir is removed in blk_unregister_queue().
2) after deleting disk, blktrace still should/can work for tracing
passthrough request.
3) "debugfs directory deleted with blktrace active" in block/002 could
be triggered
thanks,
Ming
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] block: fix "Directory XXXXX with parent 'block' already present!"
2022-04-22 3:01 ` Ming Lei
@ 2022-04-22 6:05 ` Christoph Hellwig
2022-04-22 6:43 ` Ming Lei
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2022-04-22 6:05 UTC (permalink / raw)
To: Ming Lei
Cc: Christoph Hellwig, Jens Axboe, linux-block, Dan Williams, yukuai,
Shin'ichiro Kawasaki
On Fri, Apr 22, 2022 at 11:01:43AM +0800, Ming Lei wrote:
> Please see the following reasons:
>
> 1) disk_release_mq() calls elevator_exit()/rq_qos_exit(), and the two
> may trigger UAF if q->debugfs_dir is removed in blk_unregister_queue().
Well. The debugfs_remove_recursive already removes all underlying
entries, so the extra debugfs_remove_recursive calls there can just
go away.
>
> 2) after deleting disk, blktrace still should/can work for tracing
> passthrough request.
>
> 3) "debugfs directory deleted with blktrace active" in block/002 could
> be triggered
Well, 3 just tests 2, so these are really one.
But how is blktrace supposed to work after the disk is torn down
anyway? Pretty much all actual block trace traces reference the
gendisk and/or block device which are getting freed at that point.
So doing any blktrace action after the gendisk is released will
lead to memory corruption. For everyting but SCSI the race windows
are probably small enough to not be seen by accident, but if you
unbind the SSI ULP this should be fairly reproducible.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] block: fix "Directory XXXXX with parent 'block' already present!"
2022-04-22 6:05 ` Christoph Hellwig
@ 2022-04-22 6:43 ` Ming Lei
0 siblings, 0 replies; 12+ messages in thread
From: Ming Lei @ 2022-04-22 6:43 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, linux-block, Dan Williams, yukuai,
Shin'ichiro Kawasaki
On Thu, Apr 21, 2022 at 11:05:13PM -0700, Christoph Hellwig wrote:
> On Fri, Apr 22, 2022 at 11:01:43AM +0800, Ming Lei wrote:
> > Please see the following reasons:
> >
> > 1) disk_release_mq() calls elevator_exit()/rq_qos_exit(), and the two
> > may trigger UAF if q->debugfs_dir is removed in blk_unregister_queue().
>
> Well. The debugfs_remove_recursive already removes all underlying
> entries, so the extra debugfs_remove_recursive calls there can just
> go away.
> >
> > 2) after deleting disk, blktrace still should/can work for tracing
> > passthrough request.
> >
> > 3) "debugfs directory deleted with blktrace active" in block/002 could
> > be triggered
>
> Well, 3 just tests 2, so these are really one.
>
> But how is blktrace supposed to work after the disk is torn down
> anyway? Pretty much all actual block trace traces reference the
> gendisk and/or block device which are getting freed at that point.
> So doing any blktrace action after the gendisk is released will
> lead to memory corruption. For everyting but SCSI the race windows
> are probably small enough to not be seen by accident, but if you
> unbind the SSI ULP this should be fairly reproducible.
blktrace opens the bdev, so it is safe to trace until blktrace closes
the bdev. And del_gendisk() does happen before releasing disk, that
is why I don't think it is good to remove q->debugfs_dir inside
del_gendisk().
thanks,
Ming
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] block: fix "Directory XXXXX with parent 'block' already present!"
2022-04-21 8:34 [PATCH] block: fix "Directory XXXXX with parent 'block' already present!" Ming Lei
2022-04-21 12:02 ` Shinichiro Kawasaki
2022-04-21 16:09 ` Christoph Hellwig
@ 2022-04-21 16:37 ` Dan Williams
2022-04-21 17:28 ` Hannes Reinecke
` (2 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2022-04-21 16:37 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, linux-block, yukuai, Shin'ichiro Kawasaki
On Thu, Apr 21, 2022 at 1:35 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> q->debugfs_dir is used by blk-mq debugfs and blktrace. The dentry is
> created when adding disk, and removed when releasing request queue.
>
> There is small window between releasing disk and releasing request
> queue, and during the period, one disk with same name may be created
> and added, so debugfs_create_dir() may complain with "Directory XXXXX
> with parent 'block' already present!"
>
> Fixes the issue by moving debugfs_create_dir() into blk_alloc_queue(),
> and the dir name is named with q->id from beginning, and switched to
> disk name when adding disk, and finally changed to q->id in disk_release().
Oh, I thought this was already addressed here?:
https://lore.kernel.org/all/20220415035607.1836495-1-yukuai3@huawei.com/
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] block: fix "Directory XXXXX with parent 'block' already present!"
2022-04-21 8:34 [PATCH] block: fix "Directory XXXXX with parent 'block' already present!" Ming Lei
` (2 preceding siblings ...)
2022-04-21 16:37 ` Dan Williams
@ 2022-04-21 17:28 ` Hannes Reinecke
2022-04-22 1:23 ` yukuai (C)
2022-04-21 20:54 ` kernel test robot
2022-04-21 23:06 ` kernel test robot
5 siblings, 1 reply; 12+ messages in thread
From: Hannes Reinecke @ 2022-04-21 17:28 UTC (permalink / raw)
To: Ming Lei, Jens Axboe
Cc: linux-block, Dan Williams, yukuai, Shin'ichiro Kawasaki
On 4/21/22 10:34, Ming Lei wrote:
> q->debugfs_dir is used by blk-mq debugfs and blktrace. The dentry is
> created when adding disk, and removed when releasing request queue.
>
> There is small window between releasing disk and releasing request
> queue, and during the period, one disk with same name may be created
> and added, so debugfs_create_dir() may complain with "Directory XXXXX
> with parent 'block' already present!"
>
> Fixes the issue by moving debugfs_create_dir() into blk_alloc_queue(),
> and the dir name is named with q->id from beginning, and switched to
> disk name when adding disk, and finally changed to q->id in disk_release().
>
> Reported-by: Dan Williams <dan.j.williams@intel.com>
> Cc: yukuai (C) <yukuai3@huawei.com>
> Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> block/blk-core.c | 4 ++++
> block/blk-sysfs.c | 4 ++--
> block/genhd.c | 8 ++++++++
> 3 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index f305cb66c72a..245ec664753d 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -438,6 +438,7 @@ struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu)
> {
> struct request_queue *q;
> int ret;
> + char q_name[16];
>
> q = kmem_cache_alloc_node(blk_get_queue_kmem_cache(alloc_srcu),
> GFP_KERNEL | __GFP_ZERO, node_id);
> @@ -495,6 +496,9 @@ struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu)
> blk_set_default_limits(&q->limits);
> q->nr_requests = BLKDEV_DEFAULT_RQ;
>
> + sprintf(q_name, "%d", q->id);
> + q->debugfs_dir = debugfs_create_dir(q_name, blk_debugfs_root);
> +
> return q;
>
> fail_stats:
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 88bd41d4cb59..1f986c20a07b 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -837,8 +837,8 @@ int blk_register_queue(struct gendisk *disk)
> }
>
> mutex_lock(&q->debugfs_mutex);
> - q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
> - blk_debugfs_root);
> + q->debugfs_dir = debugfs_rename(blk_debugfs_root, q->debugfs_dir,
> + blk_debugfs_root, kobject_name(q->kobj.parent));
> mutex_unlock(&q->debugfs_mutex);
>
> if (queue_is_mq(q)) {
> diff --git a/block/genhd.c b/block/genhd.c
> index 36532b931841..08895f9f7087 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -25,6 +25,7 @@
> #include <linux/pm_runtime.h>
> #include <linux/badblocks.h>
> #include <linux/part_stat.h>
> +#include <linux/debugfs.h>
> #include "blk-throttle.h"
>
> #include "blk.h"
> @@ -1160,6 +1161,7 @@ static void disk_release_mq(struct request_queue *q)
> static void disk_release(struct device *dev)
> {
> struct gendisk *disk = dev_to_disk(dev);
> + char q_name[16];
>
> might_sleep();
> WARN_ON_ONCE(disk_live(disk));
> @@ -1173,6 +1175,12 @@ static void disk_release(struct device *dev)
> kfree(disk->random);
> xa_destroy(&disk->part_tbl);
>
> + mutex_lock(&disk->queue->debugfs_mutex);
> + sprintf(q_name, "%d", disk->queue->id);
> + disk->queue->debugfs_dir = debugfs_rename(blk_debugfs_root,
> + disk->queue->debugfs_dir, blk_debugfs_root, q_name);
> + mutex_unlock(&disk->queue->debugfs_mutex);
> +
> disk->queue->disk = NULL;
> blk_put_queue(disk->queue);
>
I don't think this is the right approach.
From my POV the underlying reason is an imbalance between
debugfs_create_dir() (which happens in blk_register_queue()) and
debugfs_remove_dir() (which happens in blk_release_queue())
So there is a small race window between blk_unregister_queue() and
blk_release_queue(), during which the queue might be re-registered and
then traipses over the (still-existant) queue.
So we should rather move the call to debugfs_remove_dir() into
blk_unregister_queue() to have them both symmetric.
Basically the patch '[PATCH RESEND] blk-mq: fix possible creation
failure for 'debugfs_dir'' from yukuai ...
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] block: fix "Directory XXXXX with parent 'block' already present!"
2022-04-21 17:28 ` Hannes Reinecke
@ 2022-04-22 1:23 ` yukuai (C)
2022-04-22 2:52 ` Ming Lei
0 siblings, 1 reply; 12+ messages in thread
From: yukuai (C) @ 2022-04-22 1:23 UTC (permalink / raw)
To: Hannes Reinecke, Ming Lei, Jens Axboe
Cc: linux-block, Dan Williams, Shin'ichiro Kawasaki
在 2022/04/22 1:28, Hannes Reinecke 写道:
> On 4/21/22 10:34, Ming Lei wrote:
>> q->debugfs_dir is used by blk-mq debugfs and blktrace. The dentry is
>> created when adding disk, and removed when releasing request queue.
>>
>> There is small window between releasing disk and releasing request
>> queue, and during the period, one disk with same name may be created
>> and added, so debugfs_create_dir() may complain with "Directory XXXXX
>> with parent 'block' already present!"
>>
>> Fixes the issue by moving debugfs_create_dir() into blk_alloc_queue(),
>> and the dir name is named with q->id from beginning, and switched to
>> disk name when adding disk, and finally changed to q->id in
>> disk_release().
>>
>> Reported-by: Dan Williams <dan.j.williams@intel.com>
>> Cc: yukuai (C) <yukuai3@huawei.com>
>> Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>> ---
>> block/blk-core.c | 4 ++++
>> block/blk-sysfs.c | 4 ++--
>> block/genhd.c | 8 ++++++++
>> 3 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index f305cb66c72a..245ec664753d 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -438,6 +438,7 @@ struct request_queue *blk_alloc_queue(int node_id,
>> bool alloc_srcu)
>> {
>> struct request_queue *q;
>> int ret;
>> + char q_name[16];
>> q = kmem_cache_alloc_node(blk_get_queue_kmem_cache(alloc_srcu),
>> GFP_KERNEL | __GFP_ZERO, node_id);
>> @@ -495,6 +496,9 @@ struct request_queue *blk_alloc_queue(int node_id,
>> bool alloc_srcu)
>> blk_set_default_limits(&q->limits);
>> q->nr_requests = BLKDEV_DEFAULT_RQ;
>> + sprintf(q_name, "%d", q->id);
>> + q->debugfs_dir = debugfs_create_dir(q_name, blk_debugfs_root);
>> +
>> return q;
>> fail_stats:
>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>> index 88bd41d4cb59..1f986c20a07b 100644
>> --- a/block/blk-sysfs.c
>> +++ b/block/blk-sysfs.c
>> @@ -837,8 +837,8 @@ int blk_register_queue(struct gendisk *disk)
>> }
>> mutex_lock(&q->debugfs_mutex);
>> - q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
>> - blk_debugfs_root);
>> + q->debugfs_dir = debugfs_rename(blk_debugfs_root, q->debugfs_dir,
>> + blk_debugfs_root, kobject_name(q->kobj.parent));
>> mutex_unlock(&q->debugfs_mutex);
>> if (queue_is_mq(q)) {
>> diff --git a/block/genhd.c b/block/genhd.c
>> index 36532b931841..08895f9f7087 100644
>> --- a/block/genhd.c
>> +++ b/block/genhd.c
>> @@ -25,6 +25,7 @@
>> #include <linux/pm_runtime.h>
>> #include <linux/badblocks.h>
>> #include <linux/part_stat.h>
>> +#include <linux/debugfs.h>
>> #include "blk-throttle.h"
>> #include "blk.h"
>> @@ -1160,6 +1161,7 @@ static void disk_release_mq(struct request_queue
>> *q)
>> static void disk_release(struct device *dev)
>> {
>> struct gendisk *disk = dev_to_disk(dev);
>> + char q_name[16];
>> might_sleep();
>> WARN_ON_ONCE(disk_live(disk));
>> @@ -1173,6 +1175,12 @@ static void disk_release(struct device *dev)
>> kfree(disk->random);
>> xa_destroy(&disk->part_tbl);
>> + mutex_lock(&disk->queue->debugfs_mutex);
>> + sprintf(q_name, "%d", disk->queue->id);
>> + disk->queue->debugfs_dir = debugfs_rename(blk_debugfs_root,
>> + disk->queue->debugfs_dir, blk_debugfs_root, q_name);
>> + mutex_unlock(&disk->queue->debugfs_mutex);
>> +
>> disk->queue->disk = NULL;
>> blk_put_queue(disk->queue);
>
> I don't think this is the right approach.
> From my POV the underlying reason is an imbalance between
> debugfs_create_dir() (which happens in blk_register_queue()) and
> debugfs_remove_dir() (which happens in blk_release_queue())
>
> So there is a small race window between blk_unregister_queue() and
> blk_release_queue(), during which the queue might be re-registered and
> then traipses over the (still-existant) queue.
>
> So we should rather move the call to debugfs_remove_dir() into
> blk_unregister_queue() to have them both symmetric.
>
> Basically the patch '[PATCH RESEND] blk-mq: fix possible creation
> failure for 'debugfs_dir'' from yukuai ...
Hi,
I forgot to move 'q->rqos_debugfs_dir' which causes a UAF in
block/002, and Ming was worried that:
blktrace still may work for passthrough req trace after disk is
deleted.
I can shutdown blktrace in blk_unregister_queue(), however I was
worried that concurrent blk_trace_setup() might reenable it.
I can add some delay and check if this is possible, if so, we might
need some other guarantee that blktrace will not running after
blk_unregister_queue().
Thanks,
Kuai
>
> Cheers,
>
> Hannes
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] block: fix "Directory XXXXX with parent 'block' already present!"
2022-04-22 1:23 ` yukuai (C)
@ 2022-04-22 2:52 ` Ming Lei
0 siblings, 0 replies; 12+ messages in thread
From: Ming Lei @ 2022-04-22 2:52 UTC (permalink / raw)
To: yukuai (C)
Cc: Hannes Reinecke, Jens Axboe, linux-block, Dan Williams,
Shin'ichiro Kawasaki
On Fri, Apr 22, 2022 at 09:23:40AM +0800, yukuai (C) wrote:
> 在 2022/04/22 1:28, Hannes Reinecke 写道:
> > On 4/21/22 10:34, Ming Lei wrote:
> > > q->debugfs_dir is used by blk-mq debugfs and blktrace. The dentry is
> > > created when adding disk, and removed when releasing request queue.
> > >
> > > There is small window between releasing disk and releasing request
> > > queue, and during the period, one disk with same name may be created
> > > and added, so debugfs_create_dir() may complain with "Directory XXXXX
> > > with parent 'block' already present!"
> > >
> > > Fixes the issue by moving debugfs_create_dir() into blk_alloc_queue(),
> > > and the dir name is named with q->id from beginning, and switched to
> > > disk name when adding disk, and finally changed to q->id in
> > > disk_release().
> > >
> > > Reported-by: Dan Williams <dan.j.williams@intel.com>
> > > Cc: yukuai (C) <yukuai3@huawei.com>
> > > Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > ---
> > > block/blk-core.c | 4 ++++
> > > block/blk-sysfs.c | 4 ++--
> > > block/genhd.c | 8 ++++++++
> > > 3 files changed, 14 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index f305cb66c72a..245ec664753d 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -438,6 +438,7 @@ struct request_queue *blk_alloc_queue(int
> > > node_id, bool alloc_srcu)
> > > {
> > > struct request_queue *q;
> > > int ret;
> > > + char q_name[16];
> > > q = kmem_cache_alloc_node(blk_get_queue_kmem_cache(alloc_srcu),
> > > GFP_KERNEL | __GFP_ZERO, node_id);
> > > @@ -495,6 +496,9 @@ struct request_queue *blk_alloc_queue(int
> > > node_id, bool alloc_srcu)
> > > blk_set_default_limits(&q->limits);
> > > q->nr_requests = BLKDEV_DEFAULT_RQ;
> > > + sprintf(q_name, "%d", q->id);
> > > + q->debugfs_dir = debugfs_create_dir(q_name, blk_debugfs_root);
> > > +
> > > return q;
> > > fail_stats:
> > > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> > > index 88bd41d4cb59..1f986c20a07b 100644
> > > --- a/block/blk-sysfs.c
> > > +++ b/block/blk-sysfs.c
> > > @@ -837,8 +837,8 @@ int blk_register_queue(struct gendisk *disk)
> > > }
> > > mutex_lock(&q->debugfs_mutex);
> > > - q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
> > > - blk_debugfs_root);
> > > + q->debugfs_dir = debugfs_rename(blk_debugfs_root, q->debugfs_dir,
> > > + blk_debugfs_root, kobject_name(q->kobj.parent));
> > > mutex_unlock(&q->debugfs_mutex);
> > > if (queue_is_mq(q)) {
> > > diff --git a/block/genhd.c b/block/genhd.c
> > > index 36532b931841..08895f9f7087 100644
> > > --- a/block/genhd.c
> > > +++ b/block/genhd.c
> > > @@ -25,6 +25,7 @@
> > > #include <linux/pm_runtime.h>
> > > #include <linux/badblocks.h>
> > > #include <linux/part_stat.h>
> > > +#include <linux/debugfs.h>
> > > #include "blk-throttle.h"
> > > #include "blk.h"
> > > @@ -1160,6 +1161,7 @@ static void disk_release_mq(struct
> > > request_queue *q)
> > > static void disk_release(struct device *dev)
> > > {
> > > struct gendisk *disk = dev_to_disk(dev);
> > > + char q_name[16];
> > > might_sleep();
> > > WARN_ON_ONCE(disk_live(disk));
> > > @@ -1173,6 +1175,12 @@ static void disk_release(struct device *dev)
> > > kfree(disk->random);
> > > xa_destroy(&disk->part_tbl);
> > > + mutex_lock(&disk->queue->debugfs_mutex);
> > > + sprintf(q_name, "%d", disk->queue->id);
> > > + disk->queue->debugfs_dir = debugfs_rename(blk_debugfs_root,
> > > + disk->queue->debugfs_dir, blk_debugfs_root, q_name);
> > > + mutex_unlock(&disk->queue->debugfs_mutex);
> > > +
> > > disk->queue->disk = NULL;
> > > blk_put_queue(disk->queue);
> >
> > I don't think this is the right approach.
> > From my POV the underlying reason is an imbalance between
> > debugfs_create_dir() (which happens in blk_register_queue()) and
> > debugfs_remove_dir() (which happens in blk_release_queue())
> >
> > So there is a small race window between blk_unregister_queue() and
> > blk_release_queue(), during which the queue might be re-registered and
> > then traipses over the (still-existant) queue.
> >
> > So we should rather move the call to debugfs_remove_dir() into
> > blk_unregister_queue() to have them both symmetric.
> >
> > Basically the patch '[PATCH RESEND] blk-mq: fix possible creation
> > failure for 'debugfs_dir'' from yukuai ...
> Hi,
>
> I forgot to move 'q->rqos_debugfs_dir' which causes a UAF in
> block/002, and Ming was worried that:
>
> blktrace still may work for passthrough req trace after disk is
> deleted.
There are other issues in your patch:
- "debugfs directory deleted with blktrace active" in block/002 could
be triggered.
- disk_release_mq() calls elevator_exit()/rq_qos_exit(), and the two
may trigger UAF if q->debugfs_dir is removed in blk_unregister_queue().
>
> I can shutdown blktrace in blk_unregister_queue(), however I was
> worried that concurrent blk_trace_setup() might reenable it.
blktrace does work for tracing passthrough request after
disk is removed, and your patch makes it not possible.
blk_trace_shutdown() should have been done after releasing disk.
Thanks,
Ming
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] block: fix "Directory XXXXX with parent 'block' already present!"
2022-04-21 8:34 [PATCH] block: fix "Directory XXXXX with parent 'block' already present!" Ming Lei
` (3 preceding siblings ...)
2022-04-21 17:28 ` Hannes Reinecke
@ 2022-04-21 20:54 ` kernel test robot
2022-04-21 23:06 ` kernel test robot
5 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2022-04-21 20:54 UTC (permalink / raw)
To: Ming Lei, Jens Axboe
Cc: kbuild-all, linux-block, Ming Lei, Dan Williams, yukuai,
Shin'ichiro Kawasaki
Hi Ming,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on v5.18-rc3 next-20220421]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/intel-lab-lkp/linux/commits/Ming-Lei/block-fix-Directory-XXXXX-with-parent-block-already-present/20220421-163556
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: csky-randconfig-r036-20220421 (https://download.01.org/0day-ci/archive/20220422/202204220422.9SM0r5ue-lkp@intel.com/config)
compiler: csky-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/746684a4668ee70b284126159a10f98ff7ebb319
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Ming-Lei/block-fix-Directory-XXXXX-with-parent-block-already-present/20220421-163556
git checkout 746684a4668ee70b284126159a10f98ff7ebb319
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross W=1 O=build_dir ARCH=csky SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
block/blk-sysfs.c: In function 'blk_register_queue':
>> block/blk-sysfs.c:841:43: warning: passing argument 4 of 'debugfs_rename' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
841 | blk_debugfs_root, kobject_name(q->kobj.parent));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from block/blk-sysfs.c:13:
include/linux/debugfs.h:252:47: note: expected 'char *' but argument is of type 'const char *'
252 | struct dentry *new_dir, char *new_name)
| ~~~~~~^~~~~~~~
vim +841 block/blk-sysfs.c
808
809 /**
810 * blk_register_queue - register a block layer queue with sysfs
811 * @disk: Disk of which the request queue should be registered with sysfs.
812 */
813 int blk_register_queue(struct gendisk *disk)
814 {
815 int ret;
816 struct device *dev = disk_to_dev(disk);
817 struct request_queue *q = disk->queue;
818
819 ret = blk_trace_init_sysfs(dev);
820 if (ret)
821 return ret;
822
823 mutex_lock(&q->sysfs_dir_lock);
824
825 ret = kobject_add(&q->kobj, kobject_get(&dev->kobj), "%s", "queue");
826 if (ret < 0) {
827 blk_trace_remove_sysfs(dev);
828 goto unlock;
829 }
830
831 ret = sysfs_create_group(&q->kobj, &queue_attr_group);
832 if (ret) {
833 blk_trace_remove_sysfs(dev);
834 kobject_del(&q->kobj);
835 kobject_put(&dev->kobj);
836 goto unlock;
837 }
838
839 mutex_lock(&q->debugfs_mutex);
840 q->debugfs_dir = debugfs_rename(blk_debugfs_root, q->debugfs_dir,
> 841 blk_debugfs_root, kobject_name(q->kobj.parent));
842 mutex_unlock(&q->debugfs_mutex);
843
844 if (queue_is_mq(q)) {
845 __blk_mq_register_dev(dev, q);
846 blk_mq_debugfs_register(q);
847 }
848
849 mutex_lock(&q->sysfs_lock);
850
851 ret = disk_register_independent_access_ranges(disk, NULL);
852 if (ret)
853 goto put_dev;
854
855 if (q->elevator) {
856 ret = elv_register_queue(q, false);
857 if (ret)
858 goto put_dev;
859 }
860
861 ret = blk_crypto_sysfs_register(q);
862 if (ret)
863 goto put_dev;
864
865 blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
866 wbt_enable_default(q);
867 blk_throtl_register_queue(q);
868
869 /* Now everything is ready and send out KOBJ_ADD uevent */
870 kobject_uevent(&q->kobj, KOBJ_ADD);
871 if (q->elevator)
872 kobject_uevent(&q->elevator->kobj, KOBJ_ADD);
873 mutex_unlock(&q->sysfs_lock);
874
875 unlock:
876 mutex_unlock(&q->sysfs_dir_lock);
877
878 /*
879 * SCSI probing may synchronously create and destroy a lot of
880 * request_queues for non-existent devices. Shutting down a fully
881 * functional queue takes measureable wallclock time as RCU grace
882 * periods are involved. To avoid excessive latency in these
883 * cases, a request_queue starts out in a degraded mode which is
884 * faster to shut down and is made fully functional here as
885 * request_queues for non-existent devices never get registered.
886 */
887 if (!blk_queue_init_done(q)) {
888 blk_queue_flag_set(QUEUE_FLAG_INIT_DONE, q);
889 percpu_ref_switch_to_percpu(&q->q_usage_counter);
890 }
891
892 return ret;
893
894 put_dev:
895 elv_unregister_queue(q);
896 disk_unregister_independent_access_ranges(disk);
897 mutex_unlock(&q->sysfs_lock);
898 mutex_unlock(&q->sysfs_dir_lock);
899 kobject_del(&q->kobj);
900 blk_trace_remove_sysfs(dev);
901 kobject_put(&dev->kobj);
902
903 return ret;
904 }
905
--
0-DAY CI Kernel Test Service
https://01.org/lkp
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] block: fix "Directory XXXXX with parent 'block' already present!"
2022-04-21 8:34 [PATCH] block: fix "Directory XXXXX with parent 'block' already present!" Ming Lei
` (4 preceding siblings ...)
2022-04-21 20:54 ` kernel test robot
@ 2022-04-21 23:06 ` kernel test robot
5 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2022-04-21 23:06 UTC (permalink / raw)
To: Ming Lei, Jens Axboe
Cc: llvm, kbuild-all, linux-block, Ming Lei, Dan Williams, yukuai,
Shin'ichiro Kawasaki
Hi Ming,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on v5.18-rc3 next-20220421]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/intel-lab-lkp/linux/commits/Ming-Lei/block-fix-Directory-XXXXX-with-parent-block-already-present/20220421-163556
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: arm64-randconfig-r034-20220421 (https://download.01.org/0day-ci/archive/20220422/202204220614.mF9U5ks7-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 5bd87350a5ae429baf8f373cb226a57b62f87280)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm64 cross compiling tool for clang build
# apt-get install binutils-aarch64-linux-gnu
# https://github.com/intel-lab-lkp/linux/commit/746684a4668ee70b284126159a10f98ff7ebb319
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Ming-Lei/block-fix-Directory-XXXXX-with-parent-block-already-present/20220421-163556
git checkout 746684a4668ee70b284126159a10f98ff7ebb319
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
>> block/blk-sysfs.c:841:22: error: passing 'const char *' to parameter of type 'char *' discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
blk_debugfs_root, kobject_name(q->kobj.parent));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/debugfs.h:252:47: note: passing argument to parameter 'new_name' here
struct dentry *new_dir, char *new_name)
^
1 error generated.
vim +841 block/blk-sysfs.c
808
809 /**
810 * blk_register_queue - register a block layer queue with sysfs
811 * @disk: Disk of which the request queue should be registered with sysfs.
812 */
813 int blk_register_queue(struct gendisk *disk)
814 {
815 int ret;
816 struct device *dev = disk_to_dev(disk);
817 struct request_queue *q = disk->queue;
818
819 ret = blk_trace_init_sysfs(dev);
820 if (ret)
821 return ret;
822
823 mutex_lock(&q->sysfs_dir_lock);
824
825 ret = kobject_add(&q->kobj, kobject_get(&dev->kobj), "%s", "queue");
826 if (ret < 0) {
827 blk_trace_remove_sysfs(dev);
828 goto unlock;
829 }
830
831 ret = sysfs_create_group(&q->kobj, &queue_attr_group);
832 if (ret) {
833 blk_trace_remove_sysfs(dev);
834 kobject_del(&q->kobj);
835 kobject_put(&dev->kobj);
836 goto unlock;
837 }
838
839 mutex_lock(&q->debugfs_mutex);
840 q->debugfs_dir = debugfs_rename(blk_debugfs_root, q->debugfs_dir,
> 841 blk_debugfs_root, kobject_name(q->kobj.parent));
842 mutex_unlock(&q->debugfs_mutex);
843
844 if (queue_is_mq(q)) {
845 __blk_mq_register_dev(dev, q);
846 blk_mq_debugfs_register(q);
847 }
848
849 mutex_lock(&q->sysfs_lock);
850
851 ret = disk_register_independent_access_ranges(disk, NULL);
852 if (ret)
853 goto put_dev;
854
855 if (q->elevator) {
856 ret = elv_register_queue(q, false);
857 if (ret)
858 goto put_dev;
859 }
860
861 ret = blk_crypto_sysfs_register(q);
862 if (ret)
863 goto put_dev;
864
865 blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
866 wbt_enable_default(q);
867 blk_throtl_register_queue(q);
868
869 /* Now everything is ready and send out KOBJ_ADD uevent */
870 kobject_uevent(&q->kobj, KOBJ_ADD);
871 if (q->elevator)
872 kobject_uevent(&q->elevator->kobj, KOBJ_ADD);
873 mutex_unlock(&q->sysfs_lock);
874
875 unlock:
876 mutex_unlock(&q->sysfs_dir_lock);
877
878 /*
879 * SCSI probing may synchronously create and destroy a lot of
880 * request_queues for non-existent devices. Shutting down a fully
881 * functional queue takes measureable wallclock time as RCU grace
882 * periods are involved. To avoid excessive latency in these
883 * cases, a request_queue starts out in a degraded mode which is
884 * faster to shut down and is made fully functional here as
885 * request_queues for non-existent devices never get registered.
886 */
887 if (!blk_queue_init_done(q)) {
888 blk_queue_flag_set(QUEUE_FLAG_INIT_DONE, q);
889 percpu_ref_switch_to_percpu(&q->q_usage_counter);
890 }
891
892 return ret;
893
894 put_dev:
895 elv_unregister_queue(q);
896 disk_unregister_independent_access_ranges(disk);
897 mutex_unlock(&q->sysfs_lock);
898 mutex_unlock(&q->sysfs_dir_lock);
899 kobject_del(&q->kobj);
900 blk_trace_remove_sysfs(dev);
901 kobject_put(&dev->kobj);
902
903 return ret;
904 }
905
--
0-DAY CI Kernel Test Service
https://01.org/lkp
^ permalink raw reply [flat|nested] 12+ messages in thread