From: Damien Le Moal <dlemoal@kernel.org>
To: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
fio@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
Vincent Fu <vincentfu@gmail.com>
Subject: Re: [PATCH v2 5/8] zbd: add the recover_zbd_write_error option
Date: Wed, 7 May 2025 16:48:38 +0900
Message-ID: <cf114fe6-a699-4b5d-ab09-48a45083fbfc@kernel.org>
In-Reply-To: <20250425052148.126788-6-shinichiro.kawasaki@wdc.com>

On 4/25/25 2:21 PM, Shin'ichiro Kawasaki wrote:
> When the continue_on_error option is specified, the workload is
> expected to continue running when non-critical errors happen. However,
> write workloads with the zonemode=zbd option cannot continue after
> errors if the failed writes cause a partial data write on the target
> device. Such a partial write creates a write pointer gap between the
> device and fio, and the next write requests from fio then fail with
> unaligned write command errors. This restriction results in undesirable
> test stops during long runs on SMR drives that can recover defective
> sectors.
>
> To allow write workloads with zonemode=zbd to continue after write
> failures with partial data writes, introduce the new option
> recover_zbd_write_error. When this option is specified together with the
> continue_on_error option, fio checks the write pointer positions of the
> write target zones in the error handling step. It then fixes the write
> pointer by moving it to the position that the failed writes would have
> moved it to. Bump up FIO_SERVER_VER to note that the new option is
> added.
>
> For that purpose, add a new function zbd_recover_write_error(). Call it
> from zbd_queue_io() for sync IO engines, and from io_completed() for
> async IO engines. Modify zbd_queue_io() to pass the pointer to the
> status so that zbd_recover_write_error() can modify the status to ignore
> the errors. Add three fields to struct fio_zone_info. The two new fields
> writes_in_flight and max_write_error_offset track the status of
> in-flight writes at the time of the write error, so that the write
> pointer positions can be fixed after the in-flight writes complete. The
> third field fixing_zone_wp records that a write pointer fix is ongoing,
> which prevents new writes from being issued to the zone.
>
> When the failed write is synchronous, the write pointer fix is done by
> writing the remaining data of the failed write. This keeps the verify
> patterns written to the device, so verify works together with the
> recover_zbd_write_error option. When the failed write is asynchronous,
> other in-flight writes fail together. In this case, fio waits for all
> in-flight writes to complete and then fixes the write pointer. The
> verify data of the failed writes is then lost and verify does not work.
> Check that the recover_zbd_write_error option is not specified together
> with a verify workload and an asynchronous IO engine.
>
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Looks OK to me.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
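
For readers following the thread, the asynchronous recovery flow described
in the quoted commit message can be sketched roughly as below. This is a
simplified simulation under stated assumptions, not the code in the patch:
struct zone_sketch, issue_write(), complete_write() and
demo_async_recovery() are hypothetical names, the device is modeled as a
plain counter, and the final write pointer move is modeled as an
assignment. Only the three field names come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-zone state. */
struct zone_sketch {
	uint64_t dev_wp;                 /* write pointer on the device */
	uint64_t fio_wp;                 /* write pointer tracked by the workload */
	unsigned int writes_in_flight;   /* writes issued but not yet completed */
	uint64_t max_write_error_offset; /* highest end offset of a failed write */
	int fixing_zone_wp;              /* non-zero while a fix is pending */
};

/* Queue a write of len bytes. Returns -1 while a fix is pending. */
int issue_write(struct zone_sketch *z, uint64_t len)
{
	if (z->fixing_zone_wp)
		return -1;
	z->fio_wp += len;
	z->writes_in_flight++;
	return 0;
}

/*
 * Complete the write that started at offset. written < len models a
 * failure; a write that does not start at the device write pointer is
 * treated as failed as well (unaligned write).
 */
void complete_write(struct zone_sketch *z, uint64_t offset,
		    uint64_t len, uint64_t written)
{
	z->writes_in_flight--;

	if (written == len && offset == z->dev_wp) {
		z->dev_wp += len;
	} else {
		if (offset == z->dev_wp)
			z->dev_wp += written; /* partial data reached the media */
		z->fixing_zone_wp = 1;
		if (offset + len > z->max_write_error_offset)
			z->max_write_error_offset = offset + len;
	}

	/* Once the zone is drained, move the device write pointer to where
	 * the failed writes would have left it (assumption: modeled here as
	 * a plain assignment). */
	if (z->fixing_zone_wp && z->writes_in_flight == 0) {
		z->dev_wp = z->max_write_error_offset;
		z->max_write_error_offset = 0;
		z->fixing_zone_wp = 0;
	}
}

/* Three 4 KiB writes in flight; the first one is only half written. */
uint64_t demo_async_recovery(void)
{
	struct zone_sketch z = {0};
	int i;

	for (i = 0; i < 3; i++)
		issue_write(&z, 4096);

	complete_write(&z, 0, 4096, 2048);   /* partial write */
	assert(issue_write(&z, 4096) == -1); /* held off during the fix */
	complete_write(&z, 4096, 4096, 0);   /* fails: unaligned */
	complete_write(&z, 8192, 4096, 0);   /* fails: unaligned, zone drained */

	assert(z.dev_wp == z.fio_wp);
	return z.dev_wp;
}
```

Calling demo_async_recovery() from a main() walks through one partial
write followed by two unaligned failures. The synchronous case in the
commit message corresponds to a single write in flight, where the fix is
done by writing the remaining data instead.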
Thread overview (19+ messages):
2025-04-25 5:21 [PATCH v2 0/8] zbd: support continue_on_error for zonemode=zbd Shin'ichiro Kawasaki
2025-04-25 5:21 ` [PATCH v2 1/8] oslib: blkzoned: add blkzoned_move_zone_wp() helper function Shin'ichiro Kawasaki
2025-05-07 7:35 ` Damien Le Moal
2025-04-25 5:21 ` [PATCH v2 2/8] ioengine: add move_zone_wp() callback Shin'ichiro Kawasaki
2025-05-07 7:36 ` Damien Le Moal
2025-04-25 5:21 ` [PATCH v2 3/8] engines/libzbc: implement move_zone_wp callback Shin'ichiro Kawasaki
2025-05-07 7:41 ` Damien Le Moal
2025-04-25 5:21 ` [PATCH v2 4/8] zbd: introduce zbd_move_zone_wp() Shin'ichiro Kawasaki
2025-05-07 7:43 ` Damien Le Moal
2025-04-25 5:21 ` [PATCH v2 5/8] zbd: add the recover_zbd_write_error option Shin'ichiro Kawasaki
2025-05-07 7:48 ` Damien Le Moal [this message]
2025-04-25 5:21 ` [PATCH v2 6/8] t/zbd: set badblocks related parameters in run-tests-against-nullb Shin'ichiro Kawasaki
2025-04-25 5:21 ` [PATCH v2 7/8] t/zbd: add the test cases to confirm continue_on_error option Shin'ichiro Kawasaki
2025-04-25 5:21 ` [PATCH v2 8/8] t/zbd: add run-tests-against-scsi_debug Shin'ichiro Kawasaki
2025-05-07 11:29 ` [PATCH v2 0/8] zbd: support continue_on_error for zonemode=zbd Jens Axboe
2025-05-07 17:19 ` Vincent Fu
2025-05-07 17:22 ` Jens Axboe
2025-05-08 1:28 ` Shinichiro Kawasaki
2025-05-08 17:18 ` Vincent Fu