From: Ming Lei <ming.lei@redhat.com>
To: John Garry <john.g.garry@oracle.com>
Cc: axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
	jejb@linux.ibm.com, martin.petersen@oracle.com,
	djwong@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org,
	chandan.babu@oracle.com, dchinner@redhat.com,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, tytso@mit.edu, jbongio@google.com,
	linux-api@vger.kernel.org
Subject: Re: [PATCH 10/21] block: Add fops atomic write support
Date: Tue, 5 Dec 2023 09:45:53 +0800
Message-ID: <ZW6A0R04Gk/04EHj@fedora>
In-Reply-To: <bd639010-2ad7-4379-ba0a-64b5f6ebec41@oracle.com>

On Mon, Dec 04, 2023 at 01:13:55PM +0000, John Garry wrote:
> 
> > > 
> > > I added this here (as opposed to the caller), as I was not really worried
> > > about speeding up the failure path. Are you saying to call even earlier in
> > > submission path?
> > atomic_write_unit_min is one hardware property, and it should be checked
> > in blk_queue_atomic_write_unit_min_sectors() from beginning, then you
> > can avoid this check every other where.
> 
> ok, but we still need to ensure in the submission path that the block device
> actually supports atomic writes - this was the initial check.

Then you could add a helper for that, e.g. bdev_support_atomic_write().
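
For illustration only, such a helper might look roughly like the
user-space sketch below. This is not code from the patch series:
struct atomic_limits and supports_atomic_write() are hypothetical
stand-ins, and treating a zero atomic_write_unit_min_bytes as "not
supported" is an assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the atomic write fields of the queue limits. */
struct atomic_limits {
	unsigned int atomic_write_unit_min_bytes;
	unsigned int atomic_write_unit_max_bytes;
};

/*
 * Assumption: a device that cannot do atomic writes reports a zero
 * minimum unit, so a single non-zero test is enough in the submission
 * path once the limits have been validated at setup time.
 */
bool supports_atomic_write(const struct atomic_limits *lim)
{
	return lim->atomic_write_unit_min_bytes != 0;
}
```

With the limits validated once when they are set, the submission path
would only need this one cheap test.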

> 
> > 
> > > > > +	if (pos % atomic_write_unit_min_bytes)
> > > > > +		return false;
> > > > > +	if (iov_iter_count(iter) % atomic_write_unit_min_bytes)
> > > > > +		return false;
> > > > > +	if (!is_power_of_2(iov_iter_count(iter)))
> > > > > +		return false;
> > > > > +	if (iov_iter_count(iter) > atomic_write_unit_max_bytes)
> > > > > +		return false;
> > > > > +	if (pos % iov_iter_count(iter))
> > > > > +		return false;
> > > > I am a bit confused about relation between atomic_write_unit_max_bytes and
> > > > atomic_write_max_bytes.
> > > I think that naming could be improved. Or even just drop merging (and
> > > atomic_write_max_bytes concept) until we show it to improve performance.
> > > 
> > > So generally atomic_write_unit_max_bytes will be same as
> > > atomic_write_max_bytes, however it could be different if:
> > > a. request queue nr hw segments or other request queue limits needs to
> > > restrict atomic_write_unit_max_bytes
> > > b. atomic_write_unit_max_bytes does not need to be a power-of-2 and
> > > atomic_write_max_bytes does. So essentially:
> > > atomic_write_unit_max_bytes = rounddown_pow_of_2(atomic_write_max_bytes)
> > > 
> > plug merge often improves sequential IO perf, so if the hardware supports
> > this way, I think 'atomic_write_max_bytes' should be supported from the
> > beginning, such as:
> > 
> > - user space submits sequential N * (4k, 8k, 16k, ...) atomic writes, all can
> > be merged to single IO request, which is issued to driver.
> > 
> > Or
> > 
> > - user space submits sequential 4k, 4k, 8k, 16K, 32k, 64k atomic writes, all can
> > be merged to single IO request, which is issued to driver.
> 
> Right, we do expect userspace to use a fixed block size, but we give scope
> in the API to use variable size.

Maybe it is enough to take only atomic_write_unit_min_bytes, and allow
the length to be N * atomic_write_unit_min_bytes.

But could that violate the atomic write boundary?
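
To make the difference concrete, here is a user-space sketch of the two
acceptance rules being compared. The first mirrors the checks quoted
from the patch above; the second is the relaxed N * unit_min idea. The
function names and the max_bytes parameter are hypothetical, not taken
from the series.

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2(uint64_t n)
{
	return n && !(n & (n - 1));
}

/* Same conditions as the quoted patch checks: power-of-2 length,
 * naturally aligned, within [unit_min, unit_max]. */
bool strict_atomic_write_valid(uint64_t pos, uint64_t len,
			       uint64_t unit_min, uint64_t unit_max)
{
	if (!unit_min)
		return false;
	if (pos % unit_min)
		return false;
	if (len % unit_min)
		return false;
	if (!is_pow2(len))
		return false;
	if (len > unit_max)
		return false;
	if (pos % len)
		return false;
	return true;
}

/* Relaxed rule: start and length only need to be multiples of
 * unit_min, and length is capped by max_bytes (assumed parameter). */
bool relaxed_atomic_write_valid(uint64_t pos, uint64_t len,
				uint64_t unit_min, uint64_t max_bytes)
{
	if (!unit_min || !len)
		return false;
	if (pos % unit_min || len % unit_min)
		return false;
	return len <= max_bytes;
}
```

For example, with unit_min = 4k, a write at pos = 60k with len = 8k
passes the relaxed rule but not the strict one, and it covers the byte
range 60k..68k, which would span a 64k boundary.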

> 
> > 
> > The hardware should recognize unit size by start LBA, and check if length is
> > valid, so probably the interface might be relaxed to:
> > 
> > 1) start lba is unit aligned, and this unit is in the supported unit
> > range(power_2 in [unit_min, unit_max])
> > 
> > 2) length needs to be:
> > 
> > - N * this_unit_size
> > - <= atomic_write_max_bytes
> 
> Please note that we also need to consider:
> - any atomic write boundary (from NVMe)

Can you provide an actual NVMe boundary value?

Firstly, a naturally aligned write won't cross the boundary, so the
boundary should be >= write_unit_max; see the code below from patch 10/21:

+static bool bio_straddles_atomic_write_boundary(loff_t bi_sector,
+				unsigned int bi_size,
+				unsigned int boundary)
+{
+	loff_t start = bi_sector << SECTOR_SHIFT;
+	loff_t end = start + bi_size;
+	loff_t start_mod = start % boundary;
+	loff_t end_mod = end % boundary;
+
+	if (end - start > boundary)
+		return true;
+	if ((start_mod > end_mod) && (start_mod && end_mod))
+		return true;
+
+	return false;
+}
+

Then if the WRITE size is <= boundary and the write is naturally aligned,
the above function should return false, right? It looks like the boundary
is effectively a power-of-2, aligned atomic_write_max_bytes?
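
One way to try this out is a user-space copy of the function above, as
sketched below. The logic is unchanged from the quoted patch; only the
name and the fixed-width types are mine, and the 64k boundary in the
example is an assumed value, not one reported by any device.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

/* User-space copy of the check quoted from patch 10/21. */
bool straddles_atomic_write_boundary(uint64_t sector, uint32_t size,
				     uint32_t boundary)
{
	uint64_t start = sector << SECTOR_SHIFT;
	uint64_t end = start + size;
	uint64_t start_mod = start % boundary;
	uint64_t end_mod = end % boundary;

	/* Longer than one boundary interval: must cross. */
	if (end - start > boundary)
		return true;
	/* Offset within the interval wrapped around: crossed a boundary. */
	if ((start_mod > end_mod) && (start_mod && end_mod))
		return true;

	return false;
}
```

With boundary = 64k, naturally aligned power-of-2 writes such as 16k at
byte offset 48k, or 64k at offset 0, are reported as not straddling. A
write that is not naturally aligned, such as 8k at offset 60k, is
reported as straddling even though its size is below the boundary.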

> - virt boundary (from NVMe)

The virt boundary is applied to bv_offset and bv_len, and NVMe's virt
boundary is (4k - 1), so it shouldn't be an issue in practice.
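
As a rough user-space illustration of what the virt boundary constrains:
two adjacent segments have a "gap" if the previous one does not end on
the boundary or the next one does not start on it. The function name and
the scalar parameters are hypothetical stand-ins for bv_offset and
bv_len, not the kernel's own helper.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * mask is the virt boundary mask, e.g. 4096 - 1 for NVMe.
 * Returns true if joining the two segments would leave a gap, i.e.
 * the previous segment ends, or the next one starts, off the boundary.
 */
bool seg_gap_to_prev(uint32_t prev_offset, uint32_t prev_len,
		     uint32_t next_offset, uint32_t mask)
{
	return (next_offset & mask) || ((prev_offset + prev_len) & mask);
}
```

Buffers made of whole, 4k-aligned pages never trip this check; only a
segment that ends or starts in the middle of a 4k page would.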

> 
> And, as I mentioned elsewhere, I am still not 100% comfortable that we don't
> pay attention to regular max_sectors_kb...

max_sectors_kb should actually be bigger than atomic_write_max_bytes,
so what is your concern?



Thanks,
Ming

