Linux RAID subsystem development
 help / color / mirror / Atom feed
From: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
To: John Garry <john.g.garry@oracle.com>,
	song@kernel.org, yukuai@fygo.io, magiclinan@didiglobal.com,
	xiao@kernel.org, axboe@kernel.dk, vverma@digitalocean.com,
	martin.petersen@oracle.com, linux-kernel@vger.kernel.org
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH v2 2/7] md/raid1: advertise atomic write limits and handle runtime constraints
Date: Tue, 30 Jun 2026 10:39:34 +0200	[thread overview]
Message-ID: <m2se64zak9.fsf@gmail.com> (raw)
In-Reply-To: <7cbaaca3-4eb5-434d-a13f-f9574c9f977b@oracle.com>


Hi John,

On Mon, Jun 29, 2026 at 15:48 +0100, John Garry wrote:
> On 28/06/2026 15:24, Abd-Alrhman Masalkhi wrote:
>> Atomic writes in RAID1 must fit within a single barrier unit. Advertise
>> this restriction through the queue limits by setting
>> atomic_write_hw_unit_max to BARRIER_UNIT_SECTOR_SIZE so that bios which
>> would cross a barrier-unit boundary are rejected by the block layer
>> before reaching MD.
>> 
>> A bio that passes block-layer validation may still become unserviceable
>> within RAID1 due to bad blocks or write-behind constraints. In the former
>> case, complete the bio with EIO. In the latter case, disable
>> write-behind rather than failing the bio with EIO.
>> 
>> Fixes: f2a38abf5f1c ("md/raid1: Atomic write support")
>> Fixes: a4c55c902670 ("md/raid1: simplify raid1_write_request() error handling")
>> Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
>> ---
>> Changes in v2:
>>   - Drop the early atomic write split check from raid1_write_request().
>>   - Advertise the atomic write size limit via queue limits.
>>   - Disable write-behind instead of failing atomic writes when the
>>     BIO_MAX_VECS limit is encountered.
>>   - Link to v1: https://urldefense.com/v3/__https://lore.kernel.org/linux-raid/20260623072456.333437-3-abd.masalkhi@gmail.com/__;!!ACWV5N9M2RV99hQ!LbMSGSClRi0PNBqQti5ZNWGDVjDd34-7saYEAwNyBNjpNTjEA7veqM5RHG8KB1QiscarW4UaIefjm19ywSImtIgh$
>> ---
>>   drivers/md/raid1.c | 36 +++++++++++++++++++-----------------
>>   1 file changed, 19 insertions(+), 17 deletions(-)
>> 
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index afe2ca96ad8c..f322048ab3c2 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -1522,6 +1522,7 @@ static bool raid1_write_request(struct mddev *mddev, struct bio *bio,
>>   	int first_clone;
>>   	bool write_behind = false;
>>   	bool nowait = bio->bi_opf & REQ_NOWAIT;
>> +	bool atomic = bio->bi_opf & REQ_ATOMIC;
>>   	bool is_discard = op_is_discard(bio->bi_opf);
>>   	sector_t sector = bio->bi_iter.bi_sector;
>>   
>> @@ -1603,20 +1604,6 @@ static bool raid1_write_request(struct mddev *mddev, struct bio *bio,
>>   			}
>>   			if (is_bad) {
>>   				int good_sectors;
>> -
>> -				/*
>> -				 * We cannot atomically write this, so just
>> -				 * error in that case. It could be possible to
>> -				 * atomically write other mirrors, but the
>> -				 * complexity of supporting that is not worth
>> -				 * the benefit.
>> -				 */
>> -				if (bio->bi_opf & REQ_ATOMIC) {
>> -					bio->bi_status = BLK_STS_NOTSUPP;
>> -					bio_endio(bio);
>> -					goto err_dec_pending;
>> -				}
>> -
>>   				good_sectors = first_bad - sector;
>>   				if (good_sectors < max_sectors)
>>   					max_sectors = good_sectors;
>> @@ -1633,10 +1620,24 @@ static bool raid1_write_request(struct mddev *mddev, struct bio *bio,
>>   	 * at a time and thus needs a new bio that can fit the whole payload
>>   	 * this bio in page sized chunks.
>>   	 */
>> -	if (write_behind && mddev->bitmap)
>> -		max_sectors = min_t(int, max_sectors,
>> -				    BIO_MAX_VECS * (PAGE_SIZE >> 9));
>> +	if (write_behind && mddev->bitmap) {
>> +		if (atomic && max_sectors > BIO_MAX_VECS * (PAGE_SIZE >> 9))
>
> where does BIO_MAX_VECS * (PAGE_SIZE >> 9) even come from?
>

BIO_MAX_VECS * (PAGE_SIZE >> 9) defines the maximum size supported by
write-behind. The write-behind copy (alloc_behind_master_bio) uses a
single bio, which can hold at most BIO_MAX_VECS pages, making this the
largest payload it can carry. With a 4 KiB PAGE_SIZE, that corresponds
to 256 pages, or 1 MiB (2048 sectors).
This patch changes the behavior for atomic writes that exceed this
limit. Instead of failing the write with -EIO when the number of sectors
must be reduced, it disables write-behind and proceeds with the atomic
write.

>> +			/*
>> +			 * Atomic writes cannot be split, so disable
>> +			 * write-behind.
>> +			 */
>> +			write_behind = false;
>> +		else
>> +			max_sectors = min_t(int, max_sectors,
>> +					    BIO_MAX_VECS * (PAGE_SIZE >> 9));
>> +	}
>> +
>>   	if (max_sectors < bio_sectors(bio)) {
>> +		if (atomic) {
>> +			bio_io_error(bio);
>> +			goto err_dec_pending;
>> +		}
>> +
>>   		bio = bio_submit_split_bioset(bio, max_sectors,
>>   					      &conf->bio_split);
>>   		if (!bio)
>> @@ -3229,6 +3230,7 @@ static int raid1_set_limits(struct mddev *mddev)
>>   	lim.max_write_zeroes_sectors = 0;
>>   	lim.max_hw_wzeroes_unmap_sectors = 0;
>>   	lim.logical_block_size = mddev->logical_block_size;
>> +	lim.atomic_write_hw_unit_max = BARRIER_UNIT_SECTOR_SIZE;
>
> This BARRIER_UNIT_SECTOR_SIZE is a bit like chunk sectors, no? I am just 
> wondering if we just should set it to chunk sectors = 
> BARRIER_UNIT_SECTOR_SIZE
>
> I assume that it affects more than Reads and writes, e.g. discard also.
>

BARRIER_UNIT_SECTOR_SIZE is the resync barrier-bucket, not the layout
chunk size. Unless I'm missing something, using
atomic_write_hw_unit_max seems more appropriate than using the chunk
size. That way, the limit only applies to atomic writes instead of
affecting other operations such.

>>   	lim.features |= BLK_FEAT_ATOMIC_WRITES;
>>   	lim.features |= BLK_FEAT_PCI_P2PDMA;
>>   	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
>

-- 
Best Regards,
Abd-Alrhman

  reply	other threads:[~2026-06-30  8:39 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-28 14:24 [PATCH v2 0/7] md/raid10: fixes, atomic write handling, and error-path cleanup Abd-Alrhman Masalkhi
2026-06-28 14:24 ` [PATCH v2 1/7] md/raid10: fix r10bio leak in raid10_write_request() error paths Abd-Alrhman Masalkhi
2026-06-28 14:39   ` sashiko-bot
2026-06-28 14:24 ` [PATCH v2 2/7] md/raid1: advertise atomic write limits and handle runtime constraints Abd-Alrhman Masalkhi
2026-06-28 14:38   ` sashiko-bot
2026-06-29 14:48   ` John Garry
2026-06-30  8:39     ` Abd-Alrhman Masalkhi [this message]
2026-06-28 14:24 ` [PATCH v2 3/7] md/raid10: consistently fail atomic writes that require splitting Abd-Alrhman Masalkhi
2026-06-28 14:36   ` sashiko-bot
2026-06-28 21:35     ` Abd-Alrhman Masalkhi
2026-06-28 14:24 ` [PATCH v2 4/7] md/raid10: remove unnecessary barrier around bio_submit_split_bioset() Abd-Alrhman Masalkhi
2026-06-28 14:24 ` [PATCH v2 5/7] md/raid10: replace wait loop with wait_event_idle() Abd-Alrhman Masalkhi
2026-06-28 14:24 ` [PATCH v2 6/7] md/raid10: simplify write request error handling Abd-Alrhman Masalkhi
2026-06-28 14:24 ` [PATCH v2 7/7] md/raid10: simplify read " Abd-Alrhman Masalkhi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m2se64zak9.fsf@gmail.com \
    --to=abd.masalkhi@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=john.g.garry@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=magiclinan@didiglobal.com \
    --cc=martin.petersen@oracle.com \
    --cc=song@kernel.org \
    --cc=vverma@digitalocean.com \
    --cc=xiao@kernel.org \
    --cc=yukuai@fygo.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox