public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: John Garry <john.g.garry@oracle.com>
Cc: axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
	martin.petersen@oracle.com, djwong@kernel.org,
	viro@zeniv.linux.org.uk, brauner@kernel.org, dchinner@redhat.com,
	jejb@linux.ibm.com, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-scsi@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	linux-security-module@vger.kernel.org, paul@paul-moore.com,
	jmorris@namei.org, serge@hallyn.com,
	Himanshu Madhani <himanshu.madhani@oracle.com>
Subject: Re: [PATCH RFC 01/16] block: Add atomic write operations to request_queue limits
Date: Thu, 4 May 2023 07:39:25 +1000	[thread overview]
Message-ID: <20230503213925.GD3223426@dread.disaster.area> (raw)
In-Reply-To: <20230503183821.1473305-2-john.g.garry@oracle.com>

On Wed, May 03, 2023 at 06:38:06PM +0000, John Garry wrote:
> From: Himanshu Madhani <himanshu.madhani@oracle.com>
> 
> Add the following limits:
> - atomic_write_boundary
> - atomic_write_max_bytes
> - atomic_write_unit_max
> - atomic_write_unit_min
> 
> Signed-off-by: Himanshu Madhani <himanshu.madhani@oracle.com>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  Documentation/ABI/stable/sysfs-block | 42 +++++++++++++++++++++
>  block/blk-settings.c                 | 56 ++++++++++++++++++++++++++++
>  block/blk-sysfs.c                    | 33 ++++++++++++++++
>  include/linux/blkdev.h               | 23 ++++++++++++
>  4 files changed, 154 insertions(+)
> 
> diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
> index 282de3680367..f3ed9890e03b 100644
> --- a/Documentation/ABI/stable/sysfs-block
> +++ b/Documentation/ABI/stable/sysfs-block
> @@ -21,6 +21,48 @@ Description:
>  		device is offset from the internal allocation unit's
>  		natural alignment.
>  
> +What:		/sys/block/<disk>/atomic_write_max_bytes
> +Date:		May 2023
> +Contact:	Himanshu Madhani <himanshu.madhani@oracle.com>
> +Description:
> +		[RO] This parameter specifies the maximum atomic write
> +		size reported by the device. An atomic write operation
> +		must not exceed this number of bytes.
> +
> +
> +What:		/sys/block/<disk>/atomic_write_unit_min
> +Date:		May 2023
> +Contact:	Himanshu Madhani <himanshu.madhani@oracle.com>
> +Description:
> +		[RO] This parameter specifies the smallest block which can
> +		be written atomically with an atomic write operation. All
> +		atomic write operations must begin at a
> +		atomic_write_unit_min boundary and must be multiples of
> +		atomic_write_unit_min. This value must be a power-of-two.

What units is this defined to use? Bytes?

> +
> +
> +What:		/sys/block/<disk>/atomic_write_unit_max
> +Date:		January 2023
> +Contact:	Himanshu Madhani <himanshu.madhani@oracle.com>
> +Description:
> +		[RO] This parameter defines the largest block which can be
> +		written atomically with an atomic write operation. This
> +		value must be a multiple of atomic_write_unit_min and must
> +		be a power-of-two.

Same question. Also, how is this different to
atomic_write_max_bytes?

> +
> +
> +What:		/sys/block/<disk>/atomic_write_boundary
> +Date:		May 2023
> +Contact:	Himanshu Madhani <himanshu.madhani@oracle.com>
> +Description:
> +		[RO] A device may need to internally split I/Os which
> +		straddle a given logical block address boundary. In that
> +		case a single atomic write operation will be processed as
> +		one of more sub-operations which each complete atomically.
> +		This parameter specifies the size in bytes of the atomic
> +		boundary if one is reported by the device. This value must
> +		be a power-of-two.

How are users/filesystems supposed to use this?

> +
>  
>  What:		/sys/block/<disk>/diskseq
>  Date:		February 2021
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 896b4654ab00..e21731715a12 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -59,6 +59,9 @@ void blk_set_default_limits(struct queue_limits *lim)
>  	lim->zoned = BLK_ZONED_NONE;
>  	lim->zone_write_granularity = 0;
>  	lim->dma_alignment = 511;
> +	lim->atomic_write_unit_min = lim->atomic_write_unit_max = 1;

A value of "1" isn't obviously a power of 2, nor does it tell me
what units these values use.

> +	lim->atomic_write_max_bytes = 512;
> +	lim->atomic_write_boundary = 0;

The behaviour when the value is zero is not defined by the syfs
description above.

>  }
>  
>  /**
> @@ -183,6 +186,59 @@ void blk_queue_max_discard_sectors(struct request_queue *q,
>  }
>  EXPORT_SYMBOL(blk_queue_max_discard_sectors);
>  
> +/**
> + * blk_queue_atomic_write_max_bytes - set max bytes supported by
> + * the device for atomic write operations.
> + * @q:  the request queue for the device
> + * @size: maximum bytes supported
> + */
> +void blk_queue_atomic_write_max_bytes(struct request_queue *q,
> +				      unsigned int size)
> +{
> +	q->limits.atomic_write_max_bytes = size;
> +}
> +EXPORT_SYMBOL(blk_queue_atomic_write_max_bytes);
> +
> +/**
> + * blk_queue_atomic_write_boundary - Device's logical block address space
> + * which an atomic write should not cross.

I have no idea what "logical block address space which an atomic
write should not cross" means, especially as the unit is in bytes
and not in sectors (which are the units LBAs are expressed in).

> + * @q:  the request queue for the device
> + * @size: size in bytes. Must be a power-of-two.
> + */
> +void blk_queue_atomic_write_boundary(struct request_queue *q,
> +				     unsigned int size)
> +{
> +	q->limits.atomic_write_boundary = size;
> +}
> +EXPORT_SYMBOL(blk_queue_atomic_write_boundary);
> +
> +/**
> + * blk_queue_atomic_write_unit_min - smallest unit that can be written
> + *				     atomically to the device.
> + * @q:  the request queue for the device
> + * @sectors: must be a power-of-two.
> + */
> +void blk_queue_atomic_write_unit_min(struct request_queue *q,
> +				     unsigned int sectors)
> +{
> +	q->limits.atomic_write_unit_min = sectors;
> +}
> +EXPORT_SYMBOL(blk_queue_atomic_write_unit_min);

Oh, these are sectors?

What size sector? Are we talking about fixed size 512 byte basic
block units, or are we talking about physical device sector sizes
(e.g. 4kB, maybe larger in future?)

These really should be in bytes, as they are directly exposed to
userspace applications via statx and applications will have no idea
what the sector size actually is without having to query the block
device directly...

> +
> +/*
> + * blk_queue_atomic_write_unit_max - largest unit that can be written
> + * atomically to the device.
> + * @q: the reqeust queue for the device
> + * @sectors: must be a power-of-two.
> + */
> +void blk_queue_atomic_write_unit_max(struct request_queue *q,
> +				     unsigned int sectors)
> +{
> +	struct queue_limits *limits = &q->limits;
> +	limits->atomic_write_unit_max = sectors;
> +}
> +EXPORT_SYMBOL(blk_queue_atomic_write_unit_max);
> +
>  /**
>   * blk_queue_max_secure_erase_sectors - set max sectors for a secure erase
>   * @q:  the request queue for the device
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index f1fce1c7fa44..1025beff2281 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -132,6 +132,30 @@ static ssize_t queue_max_discard_segments_show(struct request_queue *q,
>  	return queue_var_show(queue_max_discard_segments(q), page);
>  }
>  
> +static ssize_t queue_atomic_write_max_bytes_show(struct request_queue *q,
> +						char *page)
> +{
> +	return queue_var_show(q->limits.atomic_write_max_bytes, page);
> +}
> +
> +static ssize_t queue_atomic_write_boundary_show(struct request_queue *q,
> +						char *page)
> +{
> +	return queue_var_show(q->limits.atomic_write_boundary, page);
> +}
> +
> +static ssize_t queue_atomic_write_unit_min_show(struct request_queue *q,
> +						char *page)
> +{
> +	return queue_var_show(queue_atomic_write_unit_min(q), page);
> +}
> +
> +static ssize_t queue_atomic_write_unit_max_show(struct request_queue *q,
> +						char *page)
> +{
> +	return queue_var_show(queue_atomic_write_unit_max(q), page);
> +}
> +
>  static ssize_t queue_max_integrity_segments_show(struct request_queue *q, char *page)
>  {
>  	return queue_var_show(q->limits.max_integrity_segments, page);
> @@ -604,6 +628,11 @@ QUEUE_RO_ENTRY(queue_discard_max_hw, "discard_max_hw_bytes");
>  QUEUE_RW_ENTRY(queue_discard_max, "discard_max_bytes");
>  QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");
>  
> +QUEUE_RO_ENTRY(queue_atomic_write_max_bytes, "atomic_write_max_bytes");
> +QUEUE_RO_ENTRY(queue_atomic_write_boundary, "atomic_write_boundary");
> +QUEUE_RO_ENTRY(queue_atomic_write_unit_max, "atomic_write_unit_max");
> +QUEUE_RO_ENTRY(queue_atomic_write_unit_min, "atomic_write_unit_min");
> +
>  QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
>  QUEUE_RO_ENTRY(queue_write_zeroes_max, "write_zeroes_max_bytes");
>  QUEUE_RO_ENTRY(queue_zone_append_max, "zone_append_max_bytes");
> @@ -661,6 +690,10 @@ static struct attribute *queue_attrs[] = {
>  	&queue_discard_max_entry.attr,
>  	&queue_discard_max_hw_entry.attr,
>  	&queue_discard_zeroes_data_entry.attr,
> +	&queue_atomic_write_max_bytes_entry.attr,
> +	&queue_atomic_write_boundary_entry.attr,
> +	&queue_atomic_write_unit_min_entry.attr,
> +	&queue_atomic_write_unit_max_entry.attr,
>  	&queue_write_same_max_entry.attr,
>  	&queue_write_zeroes_max_entry.attr,
>  	&queue_zone_append_max_entry.attr,
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 941304f17492..6b6f2992338c 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -304,6 +304,11 @@ struct queue_limits {
>  	unsigned int		discard_alignment;
>  	unsigned int		zone_write_granularity;
>  
> +	unsigned int		atomic_write_boundary;
> +	unsigned int		atomic_write_max_bytes;
> +	unsigned int		atomic_write_unit_min;
> +	unsigned int		atomic_write_unit_max;
> +
>  	unsigned short		max_segments;
>  	unsigned short		max_integrity_segments;
>  	unsigned short		max_discard_segments;
> @@ -929,6 +934,14 @@ void blk_queue_zone_write_granularity(struct request_queue *q,
>  				      unsigned int size);
>  extern void blk_queue_alignment_offset(struct request_queue *q,
>  				       unsigned int alignment);
> +extern void blk_queue_atomic_write_max_bytes(struct request_queue *q,
> +					     unsigned int size);
> +extern void blk_queue_atomic_write_unit_max(struct request_queue *q,
> +					    unsigned int sectors);
> +extern void blk_queue_atomic_write_unit_min(struct request_queue *q,
> +					    unsigned int sectors);
> +extern void blk_queue_atomic_write_boundary(struct request_queue *q,
> +					    unsigned int size);
>  void disk_update_readahead(struct gendisk *disk);
>  extern void blk_limits_io_min(struct queue_limits *limits, unsigned int min);
>  extern void blk_queue_io_min(struct request_queue *q, unsigned int min);
> @@ -1331,6 +1344,16 @@ static inline int queue_dma_alignment(const struct request_queue *q)
>  	return q ? q->limits.dma_alignment : 511;
>  }
>  
> +static inline unsigned int queue_atomic_write_unit_max(const struct request_queue *q)
> +{
> +	return q->limits.atomic_write_unit_max << SECTOR_SHIFT;
> +}
> +
> +static inline unsigned int queue_atomic_write_unit_min(const struct request_queue *q)
> +{
> +	return q->limits.atomic_write_unit_min << SECTOR_SHIFT;
> +}

Ah, what? This undocumented interface reports "unit limits" in
bytes, but it's not using the physical device sector size to convert
between sector units and bytes. This really needs some more
documentation and work to make it present all units consistently and
not result in confusion when devices have 4kB sector sizes and not
512 byte sectors...

Also, I think all the byte ranges should support full 64 bit values,
otherwise there will be silent overflows in converting 32 bit sector
counts to byte ranges. And, eventually, something will want to do
larger than 4GB atomic IOs

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2023-05-03 21:39 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-03 18:38 [PATCH RFC 00/16] block atomic writes John Garry
2023-05-03 18:38 ` [PATCH RFC 01/16] block: Add atomic write operations to request_queue limits John Garry
2023-05-03 21:39   ` Dave Chinner [this message]
2023-05-04 18:14     ` John Garry
2023-05-04 22:26       ` Dave Chinner
2023-05-05  7:54         ` John Garry
2023-05-05 22:00           ` Darrick J. Wong
2023-05-07  1:59             ` Martin K. Petersen
2023-05-05 23:18           ` Dave Chinner
2023-05-06  9:38             ` John Garry
2023-05-07  2:35             ` Martin K. Petersen
2023-05-05 22:47         ` Eric Biggers
2023-05-05 23:31           ` Dave Chinner
2023-05-06  0:08             ` Eric Biggers
2023-05-09  0:19   ` Mike Snitzer
2023-05-17 17:02     ` John Garry
2023-05-03 18:38 ` [PATCH RFC 02/16] fs/bdev: Add atomic write support info to statx John Garry
2023-05-03 21:58   ` Dave Chinner
2023-05-04  8:45     ` John Garry
2023-05-04 22:40       ` Dave Chinner
2023-05-05  8:01         ` John Garry
2023-05-05 22:04           ` Darrick J. Wong
2023-05-03 18:38 ` [PATCH RFC 03/16] xfs: Support atomic write for statx John Garry
2023-05-03 22:17   ` Dave Chinner
2023-05-05 22:10     ` Darrick J. Wong
2023-05-03 18:38 ` [PATCH RFC 04/16] fs: Add RWF_ATOMIC and IOCB_ATOMIC flags for atomic write support John Garry
2023-05-03 18:38 ` [PATCH RFC 05/16] block: Add REQ_ATOMIC flag John Garry
2023-05-03 18:38 ` [PATCH RFC 06/16] block: Limit atomic writes according to bio and queue limits John Garry
2023-05-03 18:53   ` Keith Busch
2023-05-04  8:24     ` John Garry
2023-05-03 18:38 ` [PATCH RFC 07/16] block: Add bdev_find_max_atomic_write_alignment() John Garry
2023-05-03 18:38 ` [PATCH RFC 08/16] block: Add support for atomic_write_unit John Garry
2023-05-03 18:38 ` [PATCH RFC 09/16] block: Add blk_validate_atomic_write_op() John Garry
2023-05-03 18:38 ` [PATCH RFC 10/16] block: Add fops atomic write support John Garry
2023-05-03 18:38 ` [PATCH RFC 11/16] fs: iomap: Atomic " John Garry
2023-05-04  5:00   ` Dave Chinner
2023-05-05 21:19     ` Darrick J. Wong
2023-05-05 23:56       ` Dave Chinner
2023-05-03 18:38 ` [PATCH RFC 12/16] xfs: Add support for fallocate2 John Garry
2023-05-03 23:26   ` Dave Chinner
2023-05-05 22:23     ` Darrick J. Wong
2023-05-05 23:42       ` Dave Chinner
2023-05-03 18:38 ` [PATCH RFC 13/16] scsi: sd: Support reading atomic properties from block limits VPD John Garry
2023-05-03 18:38 ` [PATCH RFC 14/16] scsi: sd: Add WRITE_ATOMIC_16 support John Garry
2023-05-03 18:48   ` Bart Van Assche
2023-05-04  8:17     ` John Garry
2023-05-03 18:38 ` [PATCH RFC 15/16] scsi: scsi_debug: Atomic write support John Garry
2023-05-03 18:38 ` [PATCH RFC 16/16] nvme: Support atomic writes John Garry
2023-05-03 18:49   ` Bart Van Assche
2023-05-04  8:19     ` John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230503213925.GD3223426@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=himanshu.madhani@oracle.com \
    --cc=jejb@linux.ibm.com \
    --cc=jmorris@namei.org \
    --cc=john.g.garry@oracle.com \
    --cc=kbusch@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=paul@paul-moore.com \
    --cc=sagi@grimberg.me \
    --cc=serge@hallyn.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox