* [PATCH 0/2] btrfs: zoned: skip reporting zone for new block group @ 2025-03-12 1:31 Naohiro Aota 2025-03-12 1:31 ` [PATCH 1/2] block: introduce zone capacity helper Naohiro Aota 2025-03-12 1:31 ` [PATCH 2/2] btrfs: zoned: skip reporting zone for new block group Naohiro Aota 0 siblings, 2 replies; 11+ messages in thread From: Naohiro Aota @ 2025-03-12 1:31 UTC (permalink / raw) To: linux-btrfs; +Cc: dlemoal, axboe, linux-block, Naohiro Aota Newly created block group should reside on empty zones, whose write pointer should always be 0. Also, we can load the zone capacity from the block layer. So, we don't need to REPORT_ZONE to load the info. The reporting zone on a new block group is not only unnecessary, but also can cause a deadlock. When one process do a report zones and another process freezes the block device, the report zones side cannot allocate a tag because the freeze is already started. This can thus result in new block group creation to hang forever, blocking the write path. Naohiro Aota (2): block: introduce zone capacity helper btrfs: zoned: skip reporting zone for new block group fs/btrfs/zoned.c | 18 ++++++++++++++++-- include/linux/blkdev.h | 21 +++++++++++++++++++++ 2 files changed, 37 insertions(+), 2 deletions(-) -- 2.48.1 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] block: introduce zone capacity helper 2025-03-12 1:31 [PATCH 0/2] btrfs: zoned: skip reporting zone for new block group Naohiro Aota @ 2025-03-12 1:31 ` Naohiro Aota 2025-03-12 1:39 ` Damien Le Moal 2025-03-12 5:27 ` Christoph Hellwig 2025-03-12 1:31 ` [PATCH 2/2] btrfs: zoned: skip reporting zone for new block group Naohiro Aota 1 sibling, 2 replies; 11+ messages in thread From: Naohiro Aota @ 2025-03-12 1:31 UTC (permalink / raw) To: linux-btrfs; +Cc: dlemoal, axboe, linux-block, Naohiro Aota {bdev,disk}_zone_capacity() takes block_device or gendisk and sector position and returns the zone capacity of the corresponding zone. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> --- include/linux/blkdev.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index d37751789bf5..3c860a0cf339 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -826,6 +826,27 @@ static inline u64 sb_bdev_nr_blocks(struct super_block *sb) (sb->s_blocksize_bits - SECTOR_SHIFT); } +#ifdef CONFIG_BLK_DEV_ZONED +static inline unsigned int disk_zone_capacity(struct gendisk *disk, sector_t pos) +{ + sector_t zone_sectors = disk->queue->limits.chunk_sectors; + + if (pos + zone_sectors >= get_capacity(disk)) + return disk->last_zone_capacity; + return disk->zone_capacity; +} +#else /* CONFIG_BLK_DEV_ZONED */ +static inline unsigned int disk_zone_capacity(struct gendisk *disk, sector_t pos) +{ + return 0; +} +#endif /* CONFIG_BLK_DEV_ZONED */ + +static inline unsigned int bdev_zone_capacity(struct block_device *bdev, sector_t pos) +{ + return disk_zone_capacity(bdev->bd_disk, pos); +} + int bdev_disk_changed(struct gendisk *disk, bool invalidate); void put_disk(struct gendisk *disk); -- 2.48.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] block: introduce zone capacity helper 2025-03-12 1:31 ` [PATCH 1/2] block: introduce zone capacity helper Naohiro Aota @ 2025-03-12 1:39 ` Damien Le Moal 2025-03-12 2:07 ` Naohiro Aota 2025-03-12 5:27 ` Christoph Hellwig 1 sibling, 1 reply; 11+ messages in thread From: Damien Le Moal @ 2025-03-12 1:39 UTC (permalink / raw) To: Naohiro Aota, linux-btrfs; +Cc: axboe, linux-block On 3/12/25 10:31, Naohiro Aota wrote: > {bdev,disk}_zone_capacity() takes block_device or gendisk and sector position > and returns the zone capacity of the corresponding zone. > > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> > --- > include/linux/blkdev.h | 21 +++++++++++++++++++++ > 1 file changed, 21 insertions(+) > > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index d37751789bf5..3c860a0cf339 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -826,6 +826,27 @@ static inline u64 sb_bdev_nr_blocks(struct super_block *sb) > (sb->s_blocksize_bits - SECTOR_SHIFT); > } > > +#ifdef CONFIG_BLK_DEV_ZONED There is already an "#ifdef CONFIG_BLK_DEV_ZONED" in blkdev.h, see disk_nr_zones(). Please add this new helper under that ifdef to avoid adding a new one. > +static inline unsigned int disk_zone_capacity(struct gendisk *disk, sector_t pos) > +{ > + sector_t zone_sectors = disk->queue->limits.chunk_sectors; > + > + if (pos + zone_sectors >= get_capacity(disk)) > + return disk->last_zone_capacity; > + return disk->zone_capacity; > +} > +#else /* CONFIG_BLK_DEV_ZONED */ > +static inline unsigned int disk_zone_capacity(struct gendisk *disk, sector_t pos) Is this ever called for a non zoned drive ? It should not be... So do we really need this stub ? > +{ > + return 0; > +} > +#endif /* CONFIG_BLK_DEV_ZONED */ > + > +static inline unsigned int bdev_zone_capacity(struct block_device *bdev, sector_t pos) > +{ > + return disk_zone_capacity(bdev->bd_disk, pos); > +} > + > int bdev_disk_changed(struct gendisk *disk, bool invalidate); > > void put_disk(struct gendisk *disk); -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] block: introduce zone capacity helper 2025-03-12 1:39 ` Damien Le Moal @ 2025-03-12 2:07 ` Naohiro Aota 0 siblings, 0 replies; 11+ messages in thread From: Naohiro Aota @ 2025-03-12 2:07 UTC (permalink / raw) To: Damien Le Moal, Naohiro Aota, linux-btrfs@vger.kernel.org Cc: axboe@kernel.dk, linux-block@vger.kernel.org On Wed Mar 12, 2025 at 10:39 AM JST, Damien Le Moal wrote: > On 3/12/25 10:31, Naohiro Aota wrote: >> {bdev,disk}_zone_capacity() takes block_device or gendisk and sector position >> and returns the zone capacity of the corresponding zone. >> >> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> >> --- >> include/linux/blkdev.h | 21 +++++++++++++++++++++ >> 1 file changed, 21 insertions(+) >> >> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h >> index d37751789bf5..3c860a0cf339 100644 >> --- a/include/linux/blkdev.h >> +++ b/include/linux/blkdev.h >> @@ -826,6 +826,27 @@ static inline u64 sb_bdev_nr_blocks(struct super_block *sb) >> (sb->s_blocksize_bits - SECTOR_SHIFT); >> } >> >> +#ifdef CONFIG_BLK_DEV_ZONED > > There is already an "#ifdef CONFIG_BLK_DEV_ZONED" in blkdev.h, see > disk_nr_zones(). Please add this new helper under that ifdef to avoid adding a > new one. Sure. But, since this patch uses get_capacity(), we need to move the existing ones here instead. > >> +static inline unsigned int disk_zone_capacity(struct gendisk *disk, sector_t pos) >> +{ >> + sector_t zone_sectors = disk->queue->limits.chunk_sectors; >> + >> + if (pos + zone_sectors >= get_capacity(disk)) >> + return disk->last_zone_capacity; >> + return disk->zone_capacity; >> +} >> +#else /* CONFIG_BLK_DEV_ZONED */ >> +static inline unsigned int disk_zone_capacity(struct gendisk *disk, sector_t pos) > > Is this ever called for a non zoned drive ? It should not be... So do we really > need this stub ? I added that just let it compile without CONFIG_BLK_DEV_ZONED. Well, the code querying the capacity should be under CONFIG_BLK_DEV_ZONED, so maybe it is not necessary. I'll drop that. > >> +{ >> + return 0; >> +} >> +#endif /* CONFIG_BLK_DEV_ZONED */ >> + >> +static inline unsigned int bdev_zone_capacity(struct block_device *bdev, sector_t pos) >> +{ >> + return disk_zone_capacity(bdev->bd_disk, pos); >> +} >> + >> int bdev_disk_changed(struct gendisk *disk, bool invalidate); >> >> void put_disk(struct gendisk *disk); ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] block: introduce zone capacity helper 2025-03-12 1:31 ` [PATCH 1/2] block: introduce zone capacity helper Naohiro Aota 2025-03-12 1:39 ` Damien Le Moal @ 2025-03-12 5:27 ` Christoph Hellwig 2025-03-12 6:55 ` Damien Le Moal 1 sibling, 1 reply; 11+ messages in thread From: Christoph Hellwig @ 2025-03-12 5:27 UTC (permalink / raw) To: Naohiro Aota; +Cc: linux-btrfs, dlemoal, axboe, linux-block On Wed, Mar 12, 2025 at 10:31:03AM +0900, Naohiro Aota wrote: > +#ifdef CONFIG_BLK_DEV_ZONED > +static inline unsigned int disk_zone_capacity(struct gendisk *disk, sector_t pos) Overly long line. > +{ > + sector_t zone_sectors = disk->queue->limits.chunk_sectors; > + > + if (pos + zone_sectors >= get_capacity(disk)) > + return disk->last_zone_capacity; > + return disk->zone_capacity; But I also don't understand how pos plays in here. Maybe add a kerneldoc comment describing what the function is supposed to do? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] block: introduce zone capacity helper 2025-03-12 5:27 ` Christoph Hellwig @ 2025-03-12 6:55 ` Damien Le Moal 2025-03-12 7:00 ` Christoph Hellwig 0 siblings, 1 reply; 11+ messages in thread From: Damien Le Moal @ 2025-03-12 6:55 UTC (permalink / raw) To: Christoph Hellwig, Naohiro Aota; +Cc: linux-btrfs, axboe, linux-block On 3/12/25 14:27, Christoph Hellwig wrote: > On Wed, Mar 12, 2025 at 10:31:03AM +0900, Naohiro Aota wrote: >> +#ifdef CONFIG_BLK_DEV_ZONED >> +static inline unsigned int disk_zone_capacity(struct gendisk *disk, sector_t pos) > > Overly long line. > >> +{ >> + sector_t zone_sectors = disk->queue->limits.chunk_sectors; >> + >> + if (pos + zone_sectors >= get_capacity(disk)) >> + return disk->last_zone_capacity; >> + return disk->zone_capacity; > > But I also don't understand how pos plays in here. Maybe add a > kerneldoc comment describing what the function is supposed to do? The last zone can be smaller than all other zones, hence we have disk->zone_capacity and disk->last_zone_capacity. Pos is a sector used to indicate the zone for which you want the capacity. But yes, agreed, a kernel doc would be nice to clarify that. -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] block: introduce zone capacity helper 2025-03-12 6:55 ` Damien Le Moal @ 2025-03-12 7:00 ` Christoph Hellwig 2025-03-12 7:11 ` Damien Le Moal 0 siblings, 1 reply; 11+ messages in thread From: Christoph Hellwig @ 2025-03-12 7:00 UTC (permalink / raw) To: Damien Le Moal Cc: Christoph Hellwig, Naohiro Aota, linux-btrfs, axboe, linux-block On Wed, Mar 12, 2025 at 03:55:31PM +0900, Damien Le Moal wrote: > On 3/12/25 14:27, Christoph Hellwig wrote: > > On Wed, Mar 12, 2025 at 10:31:03AM +0900, Naohiro Aota wrote: > >> +#ifdef CONFIG_BLK_DEV_ZONED > >> +static inline unsigned int disk_zone_capacity(struct gendisk *disk, sector_t pos) > > > > Overly long line. > > > >> +{ > >> + sector_t zone_sectors = disk->queue->limits.chunk_sectors; > >> + > >> + if (pos + zone_sectors >= get_capacity(disk)) > >> + return disk->last_zone_capacity; > >> + return disk->zone_capacity; > > > > But I also don't understand how pos plays in here. Maybe add a > > kerneldoc comment describing what the function is supposed to do? > > The last zone can be smaller than all other zones, hence we have > disk->zone_capacity and disk->last_zone_capacity. Pos is a sector used to > indicate the zone for which you want the capacity. > > But yes, agreed, a kernel doc would be nice to clarify that. Should it be zoned_start then to make that obvious? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] block: introduce zone capacity helper 2025-03-12 7:00 ` Christoph Hellwig @ 2025-03-12 7:11 ` Damien Le Moal 0 siblings, 0 replies; 11+ messages in thread From: Damien Le Moal @ 2025-03-12 7:11 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Naohiro Aota, linux-btrfs, axboe, linux-block On 3/12/25 16:00, Christoph Hellwig wrote: > On Wed, Mar 12, 2025 at 03:55:31PM +0900, Damien Le Moal wrote: >> On 3/12/25 14:27, Christoph Hellwig wrote: >>> On Wed, Mar 12, 2025 at 10:31:03AM +0900, Naohiro Aota wrote: >>>> +#ifdef CONFIG_BLK_DEV_ZONED >>>> +static inline unsigned int disk_zone_capacity(struct gendisk *disk, sector_t pos) >>> >>> Overly long line. >>> >>>> +{ >>>> + sector_t zone_sectors = disk->queue->limits.chunk_sectors; >>>> + >>>> + if (pos + zone_sectors >= get_capacity(disk)) >>>> + return disk->last_zone_capacity; >>>> + return disk->zone_capacity; >>> >>> But I also don't understand how pos plays in here. Maybe add a >>> kerneldoc comment describing what the function is supposed to do? >> >> The last zone can be smaller than all other zones, hence we have >> disk->zone_capacity and disk->last_zone_capacity. Pos is a sector used to >> indicate the zone for which you want the capacity. >> >> But yes, agreed, a kernel doc would be nice to clarify that. > > Should it be zoned_start then to make that obvious? Well, it does not have to be the zone start. It can be any sector contained in the zone you are interested in. With a comment, that should be clear, and also the same as helpers such as disk_zone_no() which do not require the zone start sector. -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/2] btrfs: zoned: skip reporting zone for new block group 2025-03-12 1:31 [PATCH 0/2] btrfs: zoned: skip reporting zone for new block group Naohiro Aota 2025-03-12 1:31 ` [PATCH 1/2] block: introduce zone capacity helper Naohiro Aota @ 2025-03-12 1:31 ` Naohiro Aota 2025-03-12 1:42 ` Damien Le Moal 1 sibling, 1 reply; 11+ messages in thread From: Naohiro Aota @ 2025-03-12 1:31 UTC (permalink / raw) To: linux-btrfs; +Cc: dlemoal, axboe, linux-block, Naohiro Aota There is a potential deadlock if we do report zones in an IO context. When one process do a report zones and another process freezes the block device, the report zones side cannot allocate a tag because the freeze is already started. This can thus result in new block group creation to hang forever, blocking the write path. Thankfully, a new block group should be created on empty zones. So, reporting the zones is not necessary and we can set the write pointer = 0 and load the zone capacity from the block layer using bdev_zone_capacity() helper. CC: stable@vger.kernel.org # 6.13+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> --- fs/btrfs/zoned.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 4956baf8220a..6c730f6bce10 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1277,7 +1277,7 @@ struct zone_info { static int btrfs_load_zone_info(struct btrfs_fs_info *fs_info, int zone_idx, struct zone_info *info, unsigned long *active, - struct btrfs_chunk_map *map) + struct btrfs_chunk_map *map, bool new) { struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace; struct btrfs_device *device; @@ -1307,6 +1307,8 @@ static int btrfs_load_zone_info(struct btrfs_fs_info *fs_info, int zone_idx, return 0; } + ASSERT(!new || btrfs_dev_is_empty_zone(device, info->physical)); + /* This zone will be used for allocation, so mark this zone non-empty. */ btrfs_dev_clear_zone_empty(device, info->physical); @@ -1319,6 +1321,18 @@ static int btrfs_load_zone_info(struct btrfs_fs_info *fs_info, int zone_idx, * to determine the allocation offset within the zone. */ WARN_ON(!IS_ALIGNED(info->physical, fs_info->zone_size)); + + if (new) { + sector_t capacity; + + capacity = bdev_zone_capacity(device->bdev, info->physical >> SECTOR_SHIFT); + up_read(&dev_replace->rwsem); + info->alloc_offset = 0; + info->capacity = capacity << SECTOR_SHIFT; + + return 0; + } + nofs_flag = memalloc_nofs_save(); ret = btrfs_get_dev_zone(device, info->physical, &zone); memalloc_nofs_restore(nofs_flag); @@ -1588,7 +1602,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new) } for (i = 0; i < map->num_stripes; i++) { - ret = btrfs_load_zone_info(fs_info, i, &zone_info[i], active, map); + ret = btrfs_load_zone_info(fs_info, i, &zone_info[i], active, map, new); if (ret) goto out; -- 2.48.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] btrfs: zoned: skip reporting zone for new block group 2025-03-12 1:31 ` [PATCH 2/2] btrfs: zoned: skip reporting zone for new block group Naohiro Aota @ 2025-03-12 1:42 ` Damien Le Moal 2025-03-13 2:24 ` Shinichiro Kawasaki 0 siblings, 1 reply; 11+ messages in thread From: Damien Le Moal @ 2025-03-12 1:42 UTC (permalink / raw) To: Naohiro Aota, linux-btrfs; +Cc: axboe, linux-block, Shin'ichiro Kawasaki On 3/12/25 10:31, Naohiro Aota wrote: > There is a potential deadlock if we do report zones in an IO context. When one > process do a report zones and another process freezes the block device, the > report zones side cannot allocate a tag because the freeze is already started. > This can thus result in new block group creation to hang forever, blocking the > write path. +Shin'ichiro blktest has a failing test case due to a lockdep splat triggered by this. Would be good to add that information (with the splat) here. > > Thankfully, a new block group should be created on empty zones. So, reporting > the zones is not necessary and we can set the write pointer = 0 and load the > zone capacity from the block layer using bdev_zone_capacity() helper. > > CC: stable@vger.kernel.org # 6.13+ > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> With that fixed, looks good to me. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] btrfs: zoned: skip reporting zone for new block group 2025-03-12 1:42 ` Damien Le Moal @ 2025-03-13 2:24 ` Shinichiro Kawasaki 0 siblings, 0 replies; 11+ messages in thread From: Shinichiro Kawasaki @ 2025-03-13 2:24 UTC (permalink / raw) To: Damien Le Moal Cc: Naohiro Aota, linux-btrfs@vger.kernel.org, axboe@kernel.dk, linux-block@vger.kernel.org On Mar 12, 2025 / 10:42, Damien Le Moal wrote: > On 3/12/25 10:31, Naohiro Aota wrote: > > There is a potential deadlock if we do report zones in an IO context. When one > > process do a report zones and another process freezes the block device, the > > report zones side cannot allocate a tag because the freeze is already started. > > This can thus result in new block group creation to hang forever, blocking the > > write path. > > +Shin'ichiro > > blktest has a failing test case due to a lockdep splat triggered by this. Would > be good to add that information (with the splat) here. I confirmed that this fix avoids the blktests zbd/009 failure I reported [1]. Thanks for the fix! Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> [1] https://lore.kernel.org/linux-block/uyijd3ufbrfbiyyaajvhyhdyytssubekvymzgyiqjqmkh33cmi@ksqjpewsqlvw/ ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-03-13 2:24 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-03-12 1:31 [PATCH 0/2] btrfs: zoned: skip reporting zone for new block group Naohiro Aota 2025-03-12 1:31 ` [PATCH 1/2] block: introduce zone capacity helper Naohiro Aota 2025-03-12 1:39 ` Damien Le Moal 2025-03-12 2:07 ` Naohiro Aota 2025-03-12 5:27 ` Christoph Hellwig 2025-03-12 6:55 ` Damien Le Moal 2025-03-12 7:00 ` Christoph Hellwig 2025-03-12 7:11 ` Damien Le Moal 2025-03-12 1:31 ` [PATCH 2/2] btrfs: zoned: skip reporting zone for new block group Naohiro Aota 2025-03-12 1:42 ` Damien Le Moal 2025-03-13 2:24 ` Shinichiro Kawasaki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox