public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Hannes Reinecke <hare@suse.de>
To: Damien Le Moal <dlemoal@kernel.org>,
	linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	dm-devel@lists.linux.dev, Mike Snitzer <snitzer@redhat.com>
Subject: Re: [PATCH v3 07/14] block: Do not remove zone write plugs still in use
Date: Thu, 2 May 2024 08:09:36 +0200	[thread overview]
Message-ID: <751efc92-2081-41c0-8007-c73f51fcf87e@suse.de> (raw)
In-Reply-To: <20240501110907.96950-8-dlemoal@kernel.org>

On 5/1/24 13:09, Damien Le Moal wrote:
> Large write BIOs that span a zone boundary are split in
> blk_mq_submit_bio() before being passed to blk_zone_plug_bio() for zone
> write plugging. Such split BIO will be chained with one fragment
> targeting one zone and the remainder of the BIO targeting the next
> zone. The two BIOs can be executed in parallel, without a predetermine
> order relative to eachother and their completion may be reversed: the

each other

> remainder first completing and the first fragment then completing. In
> such case, bio_endio() will not immediately execute
> blk_zone_write_plug_bio_endio() for the parent BIO (the remainder of the
> split BIO) as the BIOs are chained. blk_zone_write_plug_bio_endio() for
> the parent BIO will be executed only once the first fragment completes.
> 
> In the case of a device with small zones and very large BIOs, uch

such

> completion pattern can lead to disk_should_remove_zone_wplug() to return
> true for the zone of the parent BIO when the parent BIO request
> completes and blk_zone_write_plug_complete_request() is executed. This
> triggers the removal of the zone write plug from the hash table using
> disk_remove_zone_wplug(). With the zone write plug of the parent BIO
> missing, the call to disk_get_zone_wplug() in
> blk_zone_write_plug_bio_endio() returns NULL and triggers a warning.
> 
> This patterns can be recreated fairly easily using a scsi_debug device
> with small zone and btrfs. E.g.
> 
> modprobe scsi_debug delay=0 dev_size_mb=1024 sector_size=4096 \
> 	zbc=host-managed zone_cap_mb=3 zone_nr_conv=0 zone_size_mb=4
> mkfs.btrfs -f -O zoned /dev/sda
> mount -t btrfs /dev/sda /mnt
> fio --name=wrtest --rw=randwrite --direct=1 --ioengine=libaio \
> 	--bs=4k --iodepth=16 --size=1M --directory=/mnt --time_based \
> 	--runtime=10
> umount /dev/sda
> 
> Will result in the warning:
> 
> [   29.035538] WARNING: CPU: 3 PID: 37 at block/blk-zoned.c:1207 blk_zone_write_plug_bio_endio+0xee/0x1e0
> ...
> [   29.058682] Call Trace:
> [   29.059095]  <TASK>
> [   29.059473]  ? __warn+0x80/0x120
> [   29.059983]  ? blk_zone_write_plug_bio_endio+0xee/0x1e0
> [   29.060728]  ? report_bug+0x160/0x190
> [   29.061283]  ? handle_bug+0x36/0x70
> [   29.061830]  ? exc_invalid_op+0x17/0x60
> [   29.062399]  ? asm_exc_invalid_op+0x1a/0x20
> [   29.063025]  ? blk_zone_write_plug_bio_endio+0xee/0x1e0
> [   29.063760]  bio_endio+0xb7/0x150
> [   29.064280]  btrfs_clone_write_end_io+0x2b/0x60 [btrfs]
> [   29.065049]  blk_update_request+0x17c/0x500
> [   29.065666]  scsi_end_request+0x27/0x1a0 [scsi_mod]
> [   29.066356]  scsi_io_completion+0x5b/0x690 [scsi_mod]
> [   29.067077]  blk_complete_reqs+0x3a/0x50
> [   29.067692]  __do_softirq+0xcf/0x2b3
> [   29.068248]  ? sort_range+0x20/0x20
> [   29.068791]  run_ksoftirqd+0x1c/0x30
> [   29.069339]  smpboot_thread_fn+0xcc/0x1b0
> [   29.069936]  kthread+0xcf/0x100
> [   29.070438]  ? kthread_complete_and_exit+0x20/0x20
> [   29.071314]  ret_from_fork+0x31/0x50
> [   29.071873]  ? kthread_complete_and_exit+0x20/0x20
> [   29.072563]  ret_from_fork_asm+0x11/0x20
> [   29.073146]  </TASK>
> 
> either when fio executes or when unmount is executed.
> 
> Fix this by modifying disk_should_remove_zone_wplug() to check that the
> reference count to a zone write plug is not larger than 2, that is, that
> the only references left on the zone are the caller held reference
> (blk_zone_write_plug_complete_request()) and the initial extra reference
> for the zone write plug taken when it was initialized (and that is
> dropped when the zone write plug is removed from the hash table).
> 
> To be consistent with this change, make sure to drop the request or BIO
> held reference to the zone write plug before calling
> disk_zone_wplug_unplug_bio(). All references are also dropped using
> disk_put_zone_wplug() instead of atomic_dec() to ensure that the zone
> write plug is freed if it needs to be.
> 
> Comments are also improved to clarify zone write plugs reference
> handling.
> 
> Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   block/blk-zoned.c | 39 +++++++++++++++++++++++++++++++--------
>   1 file changed, 31 insertions(+), 8 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


  parent reply	other threads:[~2024-05-02  6:09 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-01 11:08 [PATCH v3 00/14] Zone write plugging fixes and cleanup Damien Le Moal
2024-05-01 11:08 ` [PATCH v3 01/14] dm: Check that a zoned table leads to a valid mapped device Damien Le Moal
2024-05-01 11:08 ` [PATCH v3 02/14] block: Exclude conventional zones when faking max open limit Damien Le Moal
2024-05-02  5:52   ` Hannes Reinecke
2024-05-01 11:08 ` [PATCH v3 03/14] block: Fix zone write plug initialization from blk_revalidate_zone_cb() Damien Le Moal
2024-05-02  5:53   ` Hannes Reinecke
2024-05-01 11:08 ` [PATCH v3 04/14] block: Fix reference counting for zone write plugs in error state Damien Le Moal
2024-05-02  6:01   ` Hannes Reinecke
2024-05-02  9:37     ` Damien Le Moal
2024-05-01 11:08 ` [PATCH v3 05/14] block: Hold a reference on zone write plugs to schedule submission Damien Le Moal
2024-05-02  6:04   ` Hannes Reinecke
2024-05-01 11:08 ` [PATCH v3 06/14] block: Unhash a zone write plug only if needed Damien Le Moal
2024-05-02  6:05   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 07/14] block: Do not remove zone write plugs still in use Damien Le Moal
2024-05-02  5:38   ` Christoph Hellwig
2024-05-02  6:09   ` Hannes Reinecke [this message]
2024-05-01 11:09 ` [PATCH v3 08/14] block: Fix flush request sector restore Damien Le Moal
2024-05-02  6:10   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 09/14] block: Fix handling of non-empty flush write requests to zones Damien Le Moal
2024-05-02  6:11   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 10/14] block: Improve blk_zone_write_plug_bio_merged() Damien Le Moal
2024-05-02  6:13   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 11/14] block: Improve zone write request completion handling Damien Le Moal
2024-05-02  6:14   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 12/14] block: Simplify blk_zone_write_plug_bio_endio() Damien Le Moal
2024-05-02  6:15   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 13/14] block: Simplify zone write plug BIO abort Damien Le Moal
2024-05-02  6:15   ` Hannes Reinecke
2024-05-01 11:09 ` [PATCH v3 14/14] block: Cleanup blk_revalidate_zone_cb() Damien Le Moal
2024-05-02  6:17   ` Hannes Reinecke
2024-05-01 14:08 ` [PATCH v3 00/14] Zone write plugging fixes and cleanup Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=751efc92-2081-41c0-8007-c73f51fcf87e@suse.de \
    --to=hare@suse.de \
    --cc=axboe@kernel.dk \
    --cc=dlemoal@kernel.org \
    --cc=dm-devel@lists.linux.dev \
    --cc=linux-block@vger.kernel.org \
    --cc=snitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox