From: Damien Le Moal <dlemoal@kernel.org>
To: Christoph Hellwig <hch@lst.de>
Cc: John Garry <john.g.garry@oracle.com>,
	agk@redhat.com, snitzer@kernel.org, mpatocka@redhat.com,
	song@kernel.org, yukuai3@huawei.com, nilay@linux.ibm.com,
	axboe@kernel.dk, cem@kernel.org, dm-devel@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	linux-block@vger.kernel.org, ojaswin@linux.ibm.com,
	martin.petersen@oracle.com, akpm@linux-foundation.org,
	linux-xfs@vger.kernel.org, djwong@kernel.org
Subject: Re: [PATCH v6 0/6] block/md/dm: set chunk_sectors from stacked dev stripe size
Date: Mon, 14 Jul 2025 15:00:57 +0900
Message-ID: <c71ce330-d7b5-45ea-ba46-97598516e9fc@kernel.org>
In-Reply-To: <20250714055338.GA13470@lst.de>

On 2025/07/14 14:53, Christoph Hellwig wrote:
> On Fri, Jul 11, 2025 at 05:44:26PM +0900, Damien Le Moal wrote:
>> On 7/11/25 5:09 PM, John Garry wrote:
>>> This value in io_min is used to configure any atomic write limit for the
>>> stacked device. The idea is that the atomic write unit max is a
>>> power-of-2 factor of the stripe size, and the stripe size is available
>>> in io_min.
>>>
>>> Using io_min causes issues, as:
>>> a. it may be mutated
>>> b. the check for io_min being set for determining if we are dealing with
>>> a striped device is hard to get right, as reported in [0].
>>>
>>> This series now sets chunk_sectors limit to share stripe size.
>>
>> Hmm... chunk_sectors for a zoned device is the zone size. So is this all safe
>> if we are dealing with a zoned block device that also supports atomic writes ?
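
As a side note on the "power-of-2 factor of the stripe size" mentioned in the
cover letter quoted above, here is a minimal userspace sketch of the
arithmetic. It is only an illustration and not the kernel helper added by the
series; the stripe sizes are hypothetical values in 512-byte sectors.

```python
def max_pow_of_two_factor(n: int) -> int:
    """Return the largest power of two that divides n (n must be > 0)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # n & -n isolates the lowest set bit, which is the largest
    # power of two dividing n.
    return n & -n


if __name__ == "__main__":
    # Hypothetical stripe sizes in 512-byte sectors, for illustration only.
    for stripe_sectors in (128, 384, 1000):
        print(stripe_sectors, max_pow_of_two_factor(stripe_sectors))
```

A stripe size that is already a power of two is returned unchanged, while a
stripe size such as 384 sectors (3 * 128) would cap the unit at 128 sectors.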
> 
> Btw, I wonder if it's time to decouple the zone size from the chunk
> size eventually.  It seems like a nice little hack, but with things
> like parity raid for zoned devices now showing up at least in academia,
> and nvme devices reporting chunk sizes the overload might not be that
> good any more.

Agreed, it would be nice to clean that up. But the chunk_sectors sysfs
attribute file reports the zone size today, and changing that may break
applications. So I am not sure we can actually do that, unless the sysfs
interface is considered "unstable"?
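
To illustrate the kind of userspace dependency I mean, here is a rough sketch
of how a tool might interpret the attribute today. It assumes the
queue/chunk_sectors and queue/zoned sysfs attributes; the function names and
the sysfs_root parameter are made up for this example, and real applications
may well do this differently.

```python
from pathlib import Path
from typing import Optional, Tuple


def read_queue_attr(dev: str, attr: str, sysfs_root: str = "/sys/block") -> Optional[str]:
    """Read a queue attribute for a block device, or None if unavailable."""
    path = Path(sysfs_root) / dev / "queue" / attr
    try:
        return path.read_text().strip()
    except OSError:
        return None


def describe_chunk_sectors(dev: str, sysfs_root: str = "/sys/block") -> Tuple[Optional[int], str]:
    """Return (chunk_sectors, meaning) as a hypothetical tool might interpret it."""
    chunk = read_queue_attr(dev, "chunk_sectors", sysfs_root)
    zoned = read_queue_attr(dev, "zoned", sysfs_root)
    if chunk is None:
        return None, "not available"
    sectors = int(chunk)
    # For a zoned device, chunk_sectors carries the zone size (512-byte sectors).
    if zoned in ("host-managed", "host-aware"):
        return sectors, "zone size"
    # Otherwise it is a chunk/stripe size, or 0 when no limit is set.
    return sectors, "chunk size" if sectors else "unset"


if __name__ == "__main__":
    # "sda" is only a placeholder device name.
    print(describe_chunk_sectors("sda"))
```

A tool written this way would silently get a different value if chunk_sectors
stopped reporting the zone size for zoned devices.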

> 
>> Not that I know of any such device, but better be safe, so maybe for now
>> do not enable atomic write support on zoned devices ?
> 
> How would atomic writes make sense for zoned devices?  Because all writes
> up to the reported write pointer must be valid, the usual checks for
> partial updates are lacking, so the only use would be to figure out if a
> write got truncated.  At least for file systems we detect this using the
> fs metadata that must be written on I/O completion anyway, so the only
> user would be an application with some sort of speculative writes that
> can't detect partial writes. Which sounds rather fringe and dangerous.

The only thing I can think of which would make sense is to avoid torn writes
with SAS drives. But in itself, that is extremely niche.

> 
> Now we should be able to implement the software atomic writes pretty
> easily for zoned XFS, and funnily they might actually be slightly faster
> than normal writes due to the transaction batching.  Now that we're
> getting reasonable test coverage we should be able to give it a spin, but
> I have a few too many things on my plate at the moment.


-- 
Damien Le Moal
Western Digital Research


Thread overview: 17+ messages
2025-07-11  8:09 [PATCH v6 0/6] block/md/dm: set chunk_sectors from stacked dev stripe size John Garry
2025-07-11  8:09 ` [PATCH v6 1/6] ilog2: add max_pow_of_two_factor() John Garry
2025-07-11  8:09 ` [PATCH v6 2/6] block: sanitize chunk_sectors for atomic write limits John Garry
2025-07-11  8:42   ` Damien Le Moal
2025-07-11  9:22     ` John Garry
2025-07-11  8:09 ` [PATCH v6 3/6] md/raid0: set chunk_sectors limit John Garry
2025-07-11  8:09 ` [PATCH v6 4/6] md/raid10: " John Garry
2025-07-11  8:09 ` [PATCH v6 5/6] dm-stripe: limit chunk_sectors to the stripe size John Garry
2025-07-11  8:09 ` [PATCH v6 6/6] block: use chunk_sectors when evaluating stacked atomic write limits John Garry
2025-07-11  8:44 ` [PATCH v6 0/6] block/md/dm: set chunk_sectors from stacked dev stripe size Damien Le Moal
2025-07-11  9:16   ` John Garry
2025-07-14  5:53   ` Christoph Hellwig
2025-07-14  6:00     ` Damien Le Moal [this message]
2025-07-14  6:13       ` Christoph Hellwig
2025-07-15 15:45         ` Hannes Reinecke
2025-07-14  7:52     ` John Garry
2025-07-14 10:46       ` Christoph Hellwig
