From: John Garry <john.g.garry@oracle.com>
To: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>,
song@kernel.org, yukuai@fygo.io, magiclinan@didiglobal.com,
xiao@kernel.org, axboe@kernel.dk, martin.petersen@oracle.com
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/7] md/raid1: handle atomic writes that require splitting
Date: Tue, 23 Jun 2026 12:38:23 +0100 [thread overview]
Message-ID: <007416b5-d099-406d-b1d1-ff0db2edc48a@oracle.com> (raw)
In-Reply-To: <m2mrwl1sha.fsf@gmail.com>
On 23/06/2026 11:06, Abd-Alrhman Masalkhi wrote:
> On Tue, Jun 23, 2026 at 10:20 +0100, John Garry wrote:
>> On 23/06/2026 09:58, Abd-Alrhman Masalkhi wrote:
>>> On Tue, Jun 23, 2026 at 09:11 +0100, John Garry wrote:
>>>> On 23/06/2026 08:24, Abd-Alrhman Masalkhi wrote:
>>>>> If a request already requires splitting when entering
>>>>> raid1_write_request(), the current code allows it to proceed until it
>>>>> eventually reaches the split path.
>>>>
>>>> The block layer should catch invalid atomic writes in
>>>> submit_bio_noacct() -> blk_validate_atomic_write_op_size() before we
>>>> even get as far as the md atomic write handling. Having the check in
>>>> bio_submit_split_bioset() is really just a fail-safe for the block layer
>>>> not catching invalid atomic writes or the atomic writes queue limits not
>>>> being properly calculated.
>>> The request size itself satisfies the currently advertised atomic write
>>> limits, so blk_validate_atomic_write_op_size() allows it. The problem
>>> is that RAID1 may further restrict atomic writes to a single barrier
>>> unit via align_to_barrier_unit_end(). Therefore a request that crosses
>>> a barrier-unit boundary can still reach raid1_write_request() with
>>> max_sectors < bio_sectors(bio).
>>>
>>> If the barrier-unit restriction should instead be advertised through the
>>> atomic write queue limits,
>>
>> It should. Any restrictions should be advertised up front. For the user
>> to issue an atomic write which is valid according to limits, then it
>> should succeed.
>>
>
> I'll take a look at how best to expose that through the queue limits and
> rework this part accordingly. If there is already an existing mechanism
> you had in mind, I'd appreciate any pointers.
Any write must fit within BARRIER_UNIT_SECTOR_SIZE, right?
Since an atomic write must be naturally aligned, then I would expect
that the atomic write max unit is limited by BARRIER_UNIT_SECTOR_SIZE.
>
>>> then I agree the block layer could reject
>>> such requests earlier and the RAID1 entry check would become
>>> unnecessary.
>>>
>>> However, there are also cases where max_sectors is reduced later within
>>> raid1_write_request(), for example when bad blocks are present on some
>>> mirrors (or due to other RAID1-specific constraints such as write-behind
>>> limits). Those reductions depend on RAID1 runtime state and mirror
>>> health, so they are not readily visible to the block layer during atomic
>>> write validation. In those cases RAID1 still needs to detect that the
>>> atomic write can no longer be serviced as requested and fail it
>>> appropriately.
>>
>> Sure, and we do this. As I remember, we should return -EIO in this case.
>>
>
> Right, and that's the main motivation for this patch. The original
> atomic write support already returned -EIO for one bad-block path, but
> there are other cases where max_sectors can be reduced (e.g. the
> first_bad <= sector path and write-behind limits)
>
> After a4c55c902670, those cases can end up completing with EINVAL or
> NOTSUPP instead. This patch is intended to restore consistent -EIO.
>
ok, but I could not check this as I did not recognize the baseline code.
>>>
>>>>
>>>>> Along the way, the bio may instead
>>>>> fail due to other conditions and return a different status, even though
>>>>> the request was invalid as an atomic write from the beginning.
>>>>>
>>
>
next prev parent reply other threads:[~2026-06-23 11:38 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-23 7:24 [PATCH 0/7] md/raid10: fixes, atomic write handling, and error-path cleanup Abd-Alrhman Masalkhi
2026-06-23 7:24 ` [PATCH 1/7] md/raid10: fix r10bio leak in raid10_write_request() error paths Abd-Alrhman Masalkhi
2026-06-23 7:24 ` [PATCH 2/7] md/raid1: handle atomic writes that require splitting Abd-Alrhman Masalkhi
2026-06-23 8:11 ` John Garry
2026-06-23 8:58 ` Abd-Alrhman Masalkhi
2026-06-23 9:20 ` John Garry
2026-06-23 10:06 ` Abd-Alrhman Masalkhi
2026-06-23 11:38 ` John Garry [this message]
2026-06-23 7:24 ` [PATCH 3/7] md/raid10: " Abd-Alrhman Masalkhi
2026-06-23 7:24 ` [PATCH 4/7] md/raid10: raid10_write_request() drops the barrier before calling Abd-Alrhman Masalkhi
2026-06-23 7:24 ` [PATCH 5/7] md/raid10: replace wait loop with wait_event_idle() Abd-Alrhman Masalkhi
2026-06-23 7:24 ` [PATCH 6/7] md/raid10: simplify write request error handling Abd-Alrhman Masalkhi
2026-06-23 7:24 ` [PATCH 7/7] md/raid10: simplify read " Abd-Alrhman Masalkhi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=007416b5-d099-406d-b1d1-ff0db2edc48a@oracle.com \
--to=john.g.garry@oracle.com \
--cc=abd.masalkhi@gmail.com \
--cc=axboe@kernel.dk \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-raid@vger.kernel.org \
--cc=magiclinan@didiglobal.com \
--cc=martin.petersen@oracle.com \
--cc=song@kernel.org \
--cc=xiao@kernel.org \
--cc=yukuai@fygo.io \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox