[PATCH v2 0/2] btrfs: zoned: skip reporting zone for new block group

public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 0/2] btrfs: zoned: skip reporting zone for new block group
@ 2025-03-18  0:59 Naohiro Aota
  2025-03-18  0:59 ` [PATCH v2 1/2] block: introduce zone capacity helper Naohiro Aota
  2025-03-18  0:59 ` [PATCH v2 2/2] btrfs: zoned: skip reporting zone for new block group Naohiro Aota
  0 siblings, 2 replies; 9+ messages in thread
From: Naohiro Aota @ 2025-03-18  0:59 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dlemoal, axboe, linux-block, hch, Naohiro Aota

Newly created block group should reside on empty zones, whose write pointer
should always be 0. Also, we can load the zone capacity from the block
layer. So, we don't need to REPORT_ZONE to load the info.

The reporting zone on a new block group is not only unnecessary, but also
can cause a deadlock. When one process do a report zones and another
process freezes the block device, the report zones side cannot allocate
a tag because the freeze is already started. This can thus result in new
block group creation to hang forever, blocking the write path.

v1: https://patch.msgid.link/cover.1741596325.git.naohiro.aota@wdc.com

Naohiro Aota (2):
  block: introduce zone capacity helper
  btrfs: zoned: skip reporting zone for new block group

 fs/btrfs/zoned.c       | 18 ++++++++++--
 include/linux/blkdev.h | 67 ++++++++++++++++++++++++++++--------------
 2 files changed, 61 insertions(+), 24 deletions(-)

-- 
2.49.0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 1/2] block: introduce zone capacity helper
  2025-03-18  0:59 [PATCH v2 0/2] btrfs: zoned: skip reporting zone for new block group Naohiro Aota
@ 2025-03-18  0:59 ` Naohiro Aota
  2025-03-18  5:14   ` Damien Le Moal
  2025-03-18  7:27   ` Johannes Thumshirn
  2025-03-18  0:59 ` [PATCH v2 2/2] btrfs: zoned: skip reporting zone for new block group Naohiro Aota
  1 sibling, 2 replies; 9+ messages in thread
From: Naohiro Aota @ 2025-03-18  0:59 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dlemoal, axboe, linux-block, hch, Naohiro Aota

{bdev,disk}_zone_capacity() takes block_device or gendisk and sector position
and returns the zone capacity of the corresponding zone.

With that, move disk_nr_zones() and blk_zone_plug_bio() to consolidate them in
the same #ifdef block.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 include/linux/blkdev.h | 67 ++++++++++++++++++++++++++++--------------
 1 file changed, 45 insertions(+), 22 deletions(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d37751789bf5..6f1bdec27ac7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -691,23 +691,6 @@ static inline bool blk_queue_is_zoned(struct request_queue *q)
 		(q->limits.features & BLK_FEAT_ZONED);
 }
 
-#ifdef CONFIG_BLK_DEV_ZONED
-static inline unsigned int disk_nr_zones(struct gendisk *disk)
-{
-	return disk->nr_zones;
-}
-bool blk_zone_plug_bio(struct bio *bio, unsigned int nr_segs);
-#else /* CONFIG_BLK_DEV_ZONED */
-static inline unsigned int disk_nr_zones(struct gendisk *disk)
-{
-	return 0;
-}
-static inline bool blk_zone_plug_bio(struct bio *bio, unsigned int nr_segs)
-{
-	return false;
-}
-#endif /* CONFIG_BLK_DEV_ZONED */
-
 static inline unsigned int disk_zone_no(struct gendisk *disk, sector_t sector)
 {
 	if (!blk_queue_is_zoned(disk->queue))
@@ -715,11 +698,6 @@ static inline unsigned int disk_zone_no(struct gendisk *disk, sector_t sector)
 	return sector >> ilog2(disk->queue->limits.chunk_sectors);
 }
 
-static inline unsigned int bdev_nr_zones(struct block_device *bdev)
-{
-	return disk_nr_zones(bdev->bd_disk);
-}
-
 static inline unsigned int bdev_max_open_zones(struct block_device *bdev)
 {
 	return bdev->bd_disk->queue->limits.max_open_zones;
@@ -826,6 +804,51 @@ static inline u64 sb_bdev_nr_blocks(struct super_block *sb)
 		(sb->s_blocksize_bits - SECTOR_SHIFT);
 }
 
+#ifdef CONFIG_BLK_DEV_ZONED
+static inline unsigned int disk_nr_zones(struct gendisk *disk)
+{
+	return disk->nr_zones;
+}
+bool blk_zone_plug_bio(struct bio *bio, unsigned int nr_segs);
+
+/**
+ * disk_zone_capacity - returns the zone capacity of zone at @pos
+ * @disk:	disk to work with
+ * @pos:	sector number within the querying zone
+ *
+ * Returns the zone capacity of a zone at @pos. @pos can be any sector contained
+ * in the zone.
+ */
+static inline unsigned int disk_zone_capacity(struct gendisk *disk,
+					      sector_t pos)
+{
+	sector_t zone_sectors = disk->queue->limits.chunk_sectors;
+
+	if (pos + zone_sectors >= get_capacity(disk))
+		return disk->last_zone_capacity;
+	return disk->zone_capacity;
+}
+static inline unsigned int bdev_zone_capacity(struct block_device *bdev,
+					      sector_t pos)
+{
+	return disk_zone_capacity(bdev->bd_disk, pos);
+}
+#else /* CONFIG_BLK_DEV_ZONED */
+static inline unsigned int disk_nr_zones(struct gendisk *disk)
+{
+	return 0;
+}
+static inline bool blk_zone_plug_bio(struct bio *bio, unsigned int nr_segs)
+{
+	return false;
+}
+#endif /* CONFIG_BLK_DEV_ZONED */
+
+static inline unsigned int bdev_nr_zones(struct block_device *bdev)
+{
+	return disk_nr_zones(bdev->bd_disk);
+}
+
 int bdev_disk_changed(struct gendisk *disk, bool invalidate);
 
 void put_disk(struct gendisk *disk);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] block: introduce zone capacity helper
  2025-03-18  0:59 ` [PATCH v2 1/2] block: introduce zone capacity helper Naohiro Aota
@ 2025-03-18  5:14   ` Damien Le Moal
  2025-03-18  5:54     ` Christoph Hellwig
  2025-03-18  7:27   ` Johannes Thumshirn
  1 sibling, 1 reply; 9+ messages in thread
From: Damien Le Moal @ 2025-03-18  5:14 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs; +Cc: axboe, linux-block, hch

On 3/18/25 9:59 AM, Naohiro Aota wrote:
> {bdev,disk}_zone_capacity() takes block_device or gendisk and sector position
> and returns the zone capacity of the corresponding zone.
> 
> With that, move disk_nr_zones() and blk_zone_plug_bio() to consolidate them in
> the same #ifdef block.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  include/linux/blkdev.h | 67 ++++++++++++++++++++++++++++--------------
>  1 file changed, 45 insertions(+), 22 deletions(-)
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index d37751789bf5..6f1bdec27ac7 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -691,23 +691,6 @@ static inline bool blk_queue_is_zoned(struct request_queue *q)
>  		(q->limits.features & BLK_FEAT_ZONED);
>  }
>  
> -#ifdef CONFIG_BLK_DEV_ZONED
> -static inline unsigned int disk_nr_zones(struct gendisk *disk)
> -{
> -	return disk->nr_zones;
> -}
> -bool blk_zone_plug_bio(struct bio *bio, unsigned int nr_segs);
> -#else /* CONFIG_BLK_DEV_ZONED */
> -static inline unsigned int disk_nr_zones(struct gendisk *disk)
> -{
> -	return 0;
> -}
> -static inline bool blk_zone_plug_bio(struct bio *bio, unsigned int nr_segs)
> -{
> -	return false;
> -}
> -#endif /* CONFIG_BLK_DEV_ZONED */
> -
>  static inline unsigned int disk_zone_no(struct gendisk *disk, sector_t sector)
>  {
>  	if (!blk_queue_is_zoned(disk->queue))
> @@ -715,11 +698,6 @@ static inline unsigned int disk_zone_no(struct gendisk *disk, sector_t sector)
>  	return sector >> ilog2(disk->queue->limits.chunk_sectors);
>  }
>  
> -static inline unsigned int bdev_nr_zones(struct block_device *bdev)
> -{
> -	return disk_nr_zones(bdev->bd_disk);
> -}
> -
>  static inline unsigned int bdev_max_open_zones(struct block_device *bdev)
>  {
>  	return bdev->bd_disk->queue->limits.max_open_zones;
> @@ -826,6 +804,51 @@ static inline u64 sb_bdev_nr_blocks(struct super_block *sb)
>  		(sb->s_blocksize_bits - SECTOR_SHIFT);
>  }
>  
> +#ifdef CONFIG_BLK_DEV_ZONED
> +static inline unsigned int disk_nr_zones(struct gendisk *disk)
> +{
> +	return disk->nr_zones;
> +}
> +bool blk_zone_plug_bio(struct bio *bio, unsigned int nr_segs);
> +
> +/**
> + * disk_zone_capacity - returns the zone capacity of zone at @pos

@pos is not the start of the zone, so this should be:

 * disk_zone_capacity - returns the capacity of the zone containing @sect

And rename pos to sect.

With that, this looks good to me, so feel free to add:

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] block: introduce zone capacity helper
  2025-03-18  5:14   ` Damien Le Moal
@ 2025-03-18  5:54     ` Christoph Hellwig
  2025-03-18  7:57       ` Damien Le Moal
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2025-03-18  5:54 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: Naohiro Aota, linux-btrfs, axboe, linux-block, hch

On Tue, Mar 18, 2025 at 02:14:29PM +0900, Damien Le Moal wrote:
> @pos is not the start of the zone, so this should be:
> 
>  * disk_zone_capacity - returns the capacity of the zone containing @sect
> 
> And rename pos to sect.

While we're nitpicking:  sect is a bit weird, the block layer usually
spells it out as sectors (sect sounds too close to the german word for
sparkling wine anyway :))


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] block: introduce zone capacity helper
  2025-03-18  5:54     ` Christoph Hellwig
@ 2025-03-18  7:57       ` Damien Le Moal
  2025-03-18  8:04         ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Damien Le Moal @ 2025-03-18  7:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Naohiro Aota, linux-btrfs, axboe, linux-block

On 3/18/25 2:54 PM, Christoph Hellwig wrote:
> On Tue, Mar 18, 2025 at 02:14:29PM +0900, Damien Le Moal wrote:
>> @pos is not the start of the zone, so this should be:
>>
>>  * disk_zone_capacity - returns the capacity of the zone containing @sect
>>
>> And rename pos to sect.
> 
> While we're nitpicking:  sect is a bit weird, the block layer usually
> spells it out as sectors (sect sounds too close to the german word for
> sparkling wine anyway :))

Given that this is not a number of sectors, "sector" would be the right name
then :)


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] block: introduce zone capacity helper
  2025-03-18  7:57       ` Damien Le Moal
@ 2025-03-18  8:04         ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2025-03-18  8:04 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Christoph Hellwig, Naohiro Aota, linux-btrfs, axboe, linux-block

On Tue, Mar 18, 2025 at 04:57:45PM +0900, Damien Le Moal wrote:
> Given that this is not a number of sectors, "sector" would be the right name
> then :)

Yes.  That's what I meant, but I obviously can't type before the
first three coffees of the day.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] block: introduce zone capacity helper
  2025-03-18  0:59 ` [PATCH v2 1/2] block: introduce zone capacity helper Naohiro Aota
  2025-03-18  5:14   ` Damien Le Moal
@ 2025-03-18  7:27   ` Johannes Thumshirn
  1 sibling, 0 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2025-03-18  7:27 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs@vger.kernel.org
  Cc: dlemoal@kernel.org, axboe@kernel.dk, linux-block@vger.kernel.org,
	hch@infradead.org

Minus the nitpicks from Damien and Christoph,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 2/2] btrfs: zoned: skip reporting zone for new block group
  2025-03-18  0:59 [PATCH v2 0/2] btrfs: zoned: skip reporting zone for new block group Naohiro Aota
  2025-03-18  0:59 ` [PATCH v2 1/2] block: introduce zone capacity helper Naohiro Aota
@ 2025-03-18  0:59 ` Naohiro Aota
  2025-03-18  7:25   ` Johannes Thumshirn
  1 sibling, 1 reply; 9+ messages in thread
From: Naohiro Aota @ 2025-03-18  0:59 UTC (permalink / raw)
  To: linux-btrfs
  Cc: dlemoal, axboe, linux-block, hch, Naohiro Aota,
	Shin'ichiro Kawasaki

There is a potential deadlock if we do report zones in an IO context, detailed
in below lockdep report. When one process do a report zones and another process
freezes the block device, the report zones side cannot allocate a tag because
the freeze is already started. This can thus result in new block group creation
to hang forever, blocking the write path.

Thankfully, a new block group should be created on empty zones. So, reporting
the zones is not necessary and we can set the write pointer = 0 and load the
zone capacity from the block layer using bdev_zone_capacity() helper.

 ======================================================
 WARNING: possible circular locking dependency detected
 6.14.0-rc1 #252 Not tainted
 ------------------------------------------------------
 modprobe/1110 is trying to acquire lock:
 ffff888100ac83e0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x38f/0xb60

 but task is already holding lock:
 ffff8881205b6f20 (&q->q_usage_counter(queue)#16){++++}-{0:0}, at: sd_remove+0x85/0x130

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #3 (&q->q_usage_counter(queue)#16){++++}-{0:0}:
        blk_queue_enter+0x3d9/0x500
        blk_mq_alloc_request+0x47d/0x8e0
        scsi_execute_cmd+0x14f/0xb80
        sd_zbc_do_report_zones+0x1c1/0x470
        sd_zbc_report_zones+0x362/0xd60
        blkdev_report_zones+0x1b1/0x2e0
        btrfs_get_dev_zones+0x215/0x7e0 [btrfs]
        btrfs_load_block_group_zone_info+0x6d2/0x2c10 [btrfs]
        btrfs_make_block_group+0x36b/0x870 [btrfs]
        btrfs_create_chunk+0x147d/0x2320 [btrfs]
        btrfs_chunk_alloc+0x2ce/0xcf0 [btrfs]
        start_transaction+0xce6/0x1620 [btrfs]
        btrfs_uuid_scan_kthread+0x4ee/0x5b0 [btrfs]
        kthread+0x39d/0x750
        ret_from_fork+0x30/0x70
        ret_from_fork_asm+0x1a/0x30

 -> #2 (&fs_info->dev_replace.rwsem){++++}-{4:4}:
        down_read+0x9b/0x470
        btrfs_map_block+0x2ce/0x2ce0 [btrfs]
        btrfs_submit_chunk+0x2d4/0x16c0 [btrfs]
        btrfs_submit_bbio+0x16/0x30 [btrfs]
        btree_write_cache_pages+0xb5a/0xf90 [btrfs]
        do_writepages+0x17f/0x7b0
        __writeback_single_inode+0x114/0xb00
        writeback_sb_inodes+0x52b/0xe00
        wb_writeback+0x1a7/0x800
        wb_workfn+0x12a/0xbd0
        process_one_work+0x85a/0x1460
        worker_thread+0x5e2/0xfc0
        kthread+0x39d/0x750
        ret_from_fork+0x30/0x70
        ret_from_fork_asm+0x1a/0x30

 -> #1 (&fs_info->zoned_meta_io_lock){+.+.}-{4:4}:
        __mutex_lock+0x1aa/0x1360
        btree_write_cache_pages+0x252/0xf90 [btrfs]
        do_writepages+0x17f/0x7b0
        __writeback_single_inode+0x114/0xb00
        writeback_sb_inodes+0x52b/0xe00
        wb_writeback+0x1a7/0x800
        wb_workfn+0x12a/0xbd0
        process_one_work+0x85a/0x1460
        worker_thread+0x5e2/0xfc0
        kthread+0x39d/0x750
        ret_from_fork+0x30/0x70
        ret_from_fork_asm+0x1a/0x30

 -> #0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}:
        __lock_acquire+0x2f52/0x5ea0
        lock_acquire+0x1b1/0x540
        __flush_work+0x3ac/0xb60
        wb_shutdown+0x15b/0x1f0
        bdi_unregister+0x172/0x5b0
        del_gendisk+0x841/0xa20
        sd_remove+0x85/0x130
        device_release_driver_internal+0x368/0x520
        bus_remove_device+0x1f1/0x3f0
        device_del+0x3bd/0x9c0
        __scsi_remove_device+0x272/0x340
        scsi_forget_host+0xf7/0x170
        scsi_remove_host+0xd2/0x2a0
        sdebug_driver_remove+0x52/0x2f0 [scsi_debug]
        device_release_driver_internal+0x368/0x520
        bus_remove_device+0x1f1/0x3f0
        device_del+0x3bd/0x9c0
        device_unregister+0x13/0xa0
        sdebug_do_remove_host+0x1fb/0x290 [scsi_debug]
        scsi_debug_exit+0x17/0x70 [scsi_debug]
        __do_sys_delete_module.isra.0+0x321/0x520
        do_syscall_64+0x93/0x180
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

 other info that might help us debug this:

 Chain exists of:
   (work_completion)(&(&wb->dwork)->work) --> &fs_info->dev_replace.rwsem --> &q->q_usage_counter(queue)#16

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&q->q_usage_counter(queue)#16);
                                lock(&fs_info->dev_replace.rwsem);
                                lock(&q->q_usage_counter(queue)#16);
   lock((work_completion)(&(&wb->dwork)->work));

  *** DEADLOCK ***

 5 locks held by modprobe/1110:
  #0: ffff88811f7bc108 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x8f/0x520
  #1: ffff8881022ee0e0 (&shost->scan_mutex){+.+.}-{4:4}, at: scsi_remove_host+0x20/0x2a0
  #2: ffff88811b4c4378 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x8f/0x520
  #3: ffff8881205b6f20 (&q->q_usage_counter(queue)#16){++++}-{0:0}, at: sd_remove+0x85/0x130
  #4: ffffffffa3284360 (rcu_read_lock){....}-{1:3}, at: __flush_work+0xda/0xb60

 stack backtrace:
 CPU: 0 UID: 0 PID: 1110 Comm: modprobe Not tainted 6.14.0-rc1 #252
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x6a/0x90
  print_circular_bug.cold+0x1e0/0x274
  check_noncircular+0x306/0x3f0
  ? __pfx_check_noncircular+0x10/0x10
  ? mark_lock+0xf5/0x1650
  ? __pfx_check_irq_usage+0x10/0x10
  ? lockdep_lock+0xca/0x1c0
  ? __pfx_lockdep_lock+0x10/0x10
  __lock_acquire+0x2f52/0x5ea0
  ? __pfx___lock_acquire+0x10/0x10
  ? __pfx_mark_lock+0x10/0x10
  lock_acquire+0x1b1/0x540
  ? __flush_work+0x38f/0xb60
  ? __pfx_lock_acquire+0x10/0x10
  ? __pfx_lock_release+0x10/0x10
  ? mark_held_locks+0x94/0xe0
  ? __flush_work+0x38f/0xb60
  __flush_work+0x3ac/0xb60
  ? __flush_work+0x38f/0xb60
  ? __pfx_mark_lock+0x10/0x10
  ? __pfx___flush_work+0x10/0x10
  ? __pfx_wq_barrier_func+0x10/0x10
  ? __pfx___might_resched+0x10/0x10
  ? mark_held_locks+0x94/0xe0
  wb_shutdown+0x15b/0x1f0
  bdi_unregister+0x172/0x5b0
  ? __pfx_bdi_unregister+0x10/0x10
  ? up_write+0x1ba/0x510
  del_gendisk+0x841/0xa20
  ? __pfx_del_gendisk+0x10/0x10
  ? _raw_spin_unlock_irqrestore+0x35/0x60
  ? __pm_runtime_resume+0x79/0x110
  sd_remove+0x85/0x130
  device_release_driver_internal+0x368/0x520
  ? kobject_put+0x5d/0x4a0
  bus_remove_device+0x1f1/0x3f0
  device_del+0x3bd/0x9c0
  ? __pfx_device_del+0x10/0x10
  __scsi_remove_device+0x272/0x340
  scsi_forget_host+0xf7/0x170
  scsi_remove_host+0xd2/0x2a0
  sdebug_driver_remove+0x52/0x2f0 [scsi_debug]
  ? kernfs_remove_by_name_ns+0xc0/0xf0
  device_release_driver_internal+0x368/0x520
  ? kobject_put+0x5d/0x4a0
  bus_remove_device+0x1f1/0x3f0
  device_del+0x3bd/0x9c0
  ? __pfx_device_del+0x10/0x10
  ? __pfx___mutex_unlock_slowpath+0x10/0x10
  device_unregister+0x13/0xa0
  sdebug_do_remove_host+0x1fb/0x290 [scsi_debug]
  scsi_debug_exit+0x17/0x70 [scsi_debug]
  __do_sys_delete_module.isra.0+0x321/0x520
  ? __pfx___do_sys_delete_module.isra.0+0x10/0x10
  ? __pfx_slab_free_after_rcu_debug+0x10/0x10
  ? kasan_save_stack+0x2c/0x50
  ? kasan_record_aux_stack+0xa3/0xb0
  ? __call_rcu_common.constprop.0+0xc4/0xfb0
  ? kmem_cache_free+0x3a0/0x590
  ? __x64_sys_close+0x78/0xd0
  do_syscall_64+0x93/0x180
  ? lock_is_held_type+0xd5/0x130
  ? __call_rcu_common.constprop.0+0x3c0/0xfb0
  ? lockdep_hardirqs_on+0x78/0x100
  ? __call_rcu_common.constprop.0+0x3c0/0xfb0
  ? __pfx___call_rcu_common.constprop.0+0x10/0x10
  ? kmem_cache_free+0x3a0/0x590
  ? lockdep_hardirqs_on_prepare+0x16d/0x400
  ? do_syscall_64+0x9f/0x180
  ? lockdep_hardirqs_on+0x78/0x100
  ? do_syscall_64+0x9f/0x180
  ? __pfx___x64_sys_openat+0x10/0x10
  ? lockdep_hardirqs_on_prepare+0x16d/0x400
  ? do_syscall_64+0x9f/0x180
  ? lockdep_hardirqs_on+0x78/0x100
  ? do_syscall_64+0x9f/0x180
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7f436712b68b
 Code: 73 01 c3 48 8b 0d 8d a7 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5d a7 0c 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffe9f1a8658 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
 RAX: ffffffffffffffda RBX: 00005559b367fd80 RCX: 00007f436712b68b
 RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005559b367fde8
 RBP: 00007ffe9f1a8680 R08: 1999999999999999 R09: 0000000000000000
 R10: 00007f43671a5fe0 R11: 0000000000000206 R12: 0000000000000000
 R13: 00007ffe9f1a86b0 R14: 0000000000000000 R15: 0000000000000000
  </TASK>

Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
CC: stable@vger.kernel.org # 6.13+
---
 fs/btrfs/zoned.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 4956baf8220a..6c730f6bce10 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1277,7 +1277,7 @@ struct zone_info {
 
 static int btrfs_load_zone_info(struct btrfs_fs_info *fs_info, int zone_idx,
 				struct zone_info *info, unsigned long *active,
-				struct btrfs_chunk_map *map)
+				struct btrfs_chunk_map *map, bool new)
 {
 	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
 	struct btrfs_device *device;
@@ -1307,6 +1307,8 @@ static int btrfs_load_zone_info(struct btrfs_fs_info *fs_info, int zone_idx,
 		return 0;
 	}
 
+	ASSERT(!new || btrfs_dev_is_empty_zone(device, info->physical));
+
 	/* This zone will be used for allocation, so mark this zone non-empty. */
 	btrfs_dev_clear_zone_empty(device, info->physical);
 
@@ -1319,6 +1321,18 @@ static int btrfs_load_zone_info(struct btrfs_fs_info *fs_info, int zone_idx,
 	 * to determine the allocation offset within the zone.
 	 */
 	WARN_ON(!IS_ALIGNED(info->physical, fs_info->zone_size));
+
+	if (new) {
+		sector_t capacity;
+
+		capacity = bdev_zone_capacity(device->bdev, info->physical >> SECTOR_SHIFT);
+		up_read(&dev_replace->rwsem);
+		info->alloc_offset = 0;
+		info->capacity = capacity << SECTOR_SHIFT;
+
+		return 0;
+	}
+
 	nofs_flag = memalloc_nofs_save();
 	ret = btrfs_get_dev_zone(device, info->physical, &zone);
 	memalloc_nofs_restore(nofs_flag);
@@ -1588,7 +1602,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 	}
 
 	for (i = 0; i < map->num_stripes; i++) {
-		ret = btrfs_load_zone_info(fs_info, i, &zone_info[i], active, map);
+		ret = btrfs_load_zone_info(fs_info, i, &zone_info[i], active, map, new);
 		if (ret)
 			goto out;
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] btrfs: zoned: skip reporting zone for new block group
  2025-03-18  0:59 ` [PATCH v2 2/2] btrfs: zoned: skip reporting zone for new block group Naohiro Aota
@ 2025-03-18  7:25   ` Johannes Thumshirn
  0 siblings, 0 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2025-03-18  7:25 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs@vger.kernel.org
  Cc: dlemoal@kernel.org, axboe@kernel.dk, linux-block@vger.kernel.org,
	hch@infradead.org, Shinichiro Kawasaki

On 18.03.25 02:00, Naohiro Aota wrote:
> There is a potential deadlock if we do report zones in an IO context, detailed
> in below lockdep report. When one process do a report zones and another process
                                         does ^
> freezes the block device, the report zones side cannot allocate a tag because
> the freeze is already started. This can thus result in new block group creation
> to hang forever, blocking the write path.
> 



>    entry_SYSCALL_64_after_hwframe+0x76/0x7e
>   RIP: 0033:0x7f436712b68b
>   Code: 73 01 c3 48 8b 0d 8d a7 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5d a7 0c 00 f7 d8 64 89 01 48

The code line isn't really necessary for the bug report and overly long, 
so I think you can just delete it.
Other than that,

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-03-18  8:04 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-18  0:59 [PATCH v2 0/2] btrfs: zoned: skip reporting zone for new block group Naohiro Aota
2025-03-18  0:59 ` [PATCH v2 1/2] block: introduce zone capacity helper Naohiro Aota
2025-03-18  5:14   ` Damien Le Moal
2025-03-18  5:54     ` Christoph Hellwig
2025-03-18  7:57       ` Damien Le Moal
2025-03-18  8:04         ` Christoph Hellwig
2025-03-18  7:27   ` Johannes Thumshirn
2025-03-18  0:59 ` [PATCH v2 2/2] btrfs: zoned: skip reporting zone for new block group Naohiro Aota
2025-03-18  7:25   ` Johannes Thumshirn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox