linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: John Garry <john.g.garry@oracle.com>
Cc: axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
	jejb@linux.ibm.com, martin.petersen@oracle.com,
	djwong@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org,
	dchinner@redhat.com, jack@suse.cz, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	tytso@mit.edu, jbongio@google.com, linux-scsi@vger.kernel.org,
	jaswin@linux.ibm.com, bvanassche@acm.org,
	Himanshu Madhani <himanshu.madhani@oracle.com>
Subject: Re: [PATCH v2 01/16] block: Add atomic write operations to request_queue limits
Date: Wed, 13 Dec 2023 20:28:38 +0800	[thread overview]
Message-ID: <ZXmjdnIqGHILTfQN@fedora> (raw)
In-Reply-To: <36ee54b4-b8d5-4b3c-81a0-cc824b6ef68e@oracle.com>

On Wed, Dec 13, 2023 at 09:13:48AM +0000, John Garry wrote:
> > > +
> > >   What:		/sys/block/<disk>/diskseq
> > >   Date:		February 2021
> > > diff --git a/block/blk-settings.c b/block/blk-settings.c
> > > index 0046b447268f..d151be394c98 100644
> > > --- a/block/blk-settings.c
> > > +++ b/block/blk-settings.c
> > > @@ -59,6 +59,10 @@ void blk_set_default_limits(struct queue_limits *lim)
> > >   	lim->zoned = BLK_ZONED_NONE;
> > >   	lim->zone_write_granularity = 0;
> > >   	lim->dma_alignment = 511;
> > > +	lim->atomic_write_unit_min_sectors = 0;
> > > +	lim->atomic_write_unit_max_sectors = 0;
> > > +	lim->atomic_write_max_sectors = 0;
> > > +	lim->atomic_write_boundary_sectors = 0;
> > 
> > Can we move the four into single structure
> 
> There is no precedent for a similar structure in struct queue_limits. So
> would only passing a structure to the blk-settings.c API be ok?

Yes, this structure is part of the new API.

> 
> > and setup them in single
> > API? Then cross-validation can be done in this API.
> 
> I suppose so, if you think that it is better.
> 
> We rely on the driver to provide sound values. I suppose that we can
> sanitize them also (in a single API).

Please make the interface correct from beginning, and one good API is
helpful for both sides, such as isolating problems, easy to locate
bug, abstracting common logic, ...

And relying on API users is absolutely not good design.

> 
> > 
> > >   }
> > >   /**
> > > @@ -183,6 +187,62 @@ void blk_queue_max_discard_sectors(struct request_queue *q,
> > >   }
> > >   EXPORT_SYMBOL(blk_queue_max_discard_sectors);
> > > +/**
> > > + * blk_queue_atomic_write_max_bytes - set max bytes supported by
> > > + * the device for atomic write operations.
> > > + * @q:  the request queue for the device
> > > + * @size: maximum bytes supported
> > > + */
> > > +void blk_queue_atomic_write_max_bytes(struct request_queue *q,
> > > +				      unsigned int bytes)
> > > +{
> > > +	q->limits.atomic_write_max_sectors = bytes >> SECTOR_SHIFT;
> > > +}
> > > +EXPORT_SYMBOL(blk_queue_atomic_write_max_bytes);
> > 
> > What if driver doesn't call it but driver supports atomic write?
> 
> We rely on the driver to do this. Any basic level of testing will show an
> issue if they don't.

Software quality depends on good requirement analysis, design and
implementation, instead of test.

Simply you can not cover all possibilities in test.

> 
> > 
> > I guess the default max sectors should be atomic_write_unit_max_sectors
> > if the feature is enabled.
> 
> Sure. If we have a single API to set all values, then we don't need to worry
> about this (assuming the values are filled in properly).
> 
> > 
> > > +
> > > +/**
> > > + * blk_queue_atomic_write_boundary_bytes - Device's logical block address space
> > > + * which an atomic write should not cross.
> > > + * @q:  the request queue for the device
> > > + * @bytes: must be a power-of-two.
> > > + */
> > > +void blk_queue_atomic_write_boundary_bytes(struct request_queue *q,
> > > +					   unsigned int bytes)
> > > +{
> > > +	q->limits.atomic_write_boundary_sectors = bytes >> SECTOR_SHIFT;
> > > +}
> > > +EXPORT_SYMBOL(blk_queue_atomic_write_boundary_bytes);
> > 
> > Default atomic_write_boundary_sectors should be
> > atomic_write_unit_max_sectors in case of atomic write?
> 
> Having atomic_write_boundary_sectors default to
> atomic_write_unit_max_sectors is effectively same as a default of 0.
> 
> > 
> > > +
> > > +/**
> > > + * blk_queue_atomic_write_unit_min_sectors - smallest unit that can be written
> > > + * atomically to the device.
> > > + * @q:  the request queue for the device
> > > + * @sectors: must be a power-of-two.
> > > + */
> > > +void blk_queue_atomic_write_unit_min_sectors(struct request_queue *q,
> > > +					     unsigned int sectors)
> > > +{
> > > +	struct queue_limits *limits = &q->limits;
> > > +
> > > +	limits->atomic_write_unit_min_sectors = sectors;
> > > +}
> > > +EXPORT_SYMBOL(blk_queue_atomic_write_unit_min_sectors);
> > 
> > atomic_write_unit_min_sectors should be >= (physical block size >> 9)
> > given the minimized atomic write unit is physical sector for all disk.
> 
> For SCSI, we have a granularity VPD value, and when set we pay attention to
> that. If not, we use the phys block size.
> 
> For NVMe, we use the logical block size. For physical block size, that can
> be greater than the logical block size for npwg set, and I don't think it's
> suitable use that as minimum atomic write unit.

I highly suspect it is wrong to use logical block size as minimum
support atomic write unit, given physical block size is supposed to
be the minimum atomic write unit.

> 
> Anyway, I am not too keen on sanitizing this value in this way.
> 
> > 
> > > +
> > > +/*
> > > + * blk_queue_atomic_write_unit_max_sectors - largest unit that can be written
> > > + * atomically to the device.
> > > + * @q: the request queue for the device
> > > + * @sectors: must be a power-of-two.
> > > + */
> > > +void blk_queue_atomic_write_unit_max_sectors(struct request_queue *q,
> > > +					     unsigned int sectors)
> > > +{
> > > +	struct queue_limits *limits = &q->limits;
> > > +
> > > +	limits->atomic_write_unit_max_sectors = sectors;
> > > +}
> > > +EXPORT_SYMBOL(blk_queue_atomic_write_unit_max_sectors);
> > 
> > atomic_write_unit_max_sectors should be >= atomic_write_unit_min_sectors.
> > 
> 
> Again, we rely on the driver to provide sound values. However, as mentioned,
> we can sanitize.

Relying on driver to provide sound value is absolutely bad design from API
viewpoint.

Thanks,
Ming


  reply	other threads:[~2023-12-13 12:28 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-12 11:08 [PATCH v2 00/16] block atomic writes John Garry
2023-12-12 11:08 ` [PATCH v2 01/16] block: Add atomic write operations to request_queue limits John Garry
2023-12-13  1:25   ` Ming Lei
2023-12-13  9:13     ` John Garry
2023-12-13 12:28       ` Ming Lei [this message]
2023-12-13 19:01         ` John Garry
2023-12-14  4:38         ` Martin K. Petersen
2023-12-14 13:46           ` Ming Lei
2023-12-14  4:34     ` Martin K. Petersen
2023-12-14 16:12     ` Christoph Hellwig
2023-12-12 11:08 ` [PATCH v2 02/16] block: Limit atomic writes according to bio and queue limits John Garry
2023-12-12 11:08 ` [PATCH v2 03/16] fs/bdev: Add atomic write support info to statx John Garry
2023-12-13 10:24   ` Jan Kara
2023-12-13 11:02     ` John Garry
2023-12-12 11:08 ` [PATCH v2 04/16] fs: Increase fmode_t size John Garry
2023-12-13 11:20   ` Jan Kara
2023-12-13 13:03     ` John Garry
2023-12-13 13:02   ` Christian Brauner
2023-12-13 13:15     ` John Garry
2023-12-13 16:03     ` Christoph Hellwig
2023-12-14  8:56       ` John Garry
2023-12-12 11:08 ` [PATCH v2 05/16] fs: Add RWF_ATOMIC and IOCB_ATOMIC flags for atomic write support John Garry
2023-12-13 13:31   ` Al Viro
2023-12-13 16:02     ` John Garry
2024-01-22  8:29   ` John Garry
2023-12-12 11:08 ` [PATCH v2 06/16] block: Add REQ_ATOMIC flag John Garry
2023-12-12 11:08 ` [PATCH v2 07/16] block: Pass blk_queue_get_max_sectors() a request pointer John Garry
2023-12-12 11:08 ` [PATCH v2 08/16] block: Limit atomic write IO size according to atomic_write_max_sectors John Garry
2023-12-15  2:27   ` Ming Lei
2023-12-15 13:55     ` John Garry
2023-12-12 11:08 ` [PATCH v2 09/16] block: Error an attempt to split an atomic write bio John Garry
2023-12-12 11:08 ` [PATCH v2 10/16] block: Add checks to merging of atomic writes John Garry
2023-12-12 11:08 ` [PATCH v2 11/16] block: Add fops atomic write support John Garry
2023-12-12 11:08 ` [PATCH v2 12/16] scsi: sd: Support reading atomic write properties from block limits VPD John Garry
2023-12-12 11:08 ` [PATCH v2 13/16] scsi: sd: Add WRITE_ATOMIC_16 support John Garry
2023-12-12 11:08 ` [PATCH v2 14/16] scsi: scsi_debug: Atomic write support John Garry
2023-12-12 11:08 ` [PATCH v2 15/16] nvme: Support atomic writes John Garry
2023-12-12 11:08 ` [PATCH v2 16/16] nvme: Ensure atomic writes will be executed atomically John Garry
2023-12-12 16:32 ` [PATCH v2 00/16] block atomic writes Christoph Hellwig
2023-12-13  9:32   ` John Garry
2023-12-13 15:44     ` Christoph Hellwig
2023-12-13 16:27       ` John Garry
2023-12-14 14:37         ` Christoph Hellwig
2023-12-14 15:46           ` John Garry
2023-12-18 22:50           ` Keith Busch
2023-12-19  5:14         ` Darrick J. Wong
2023-12-19  5:21           ` Christoph Hellwig
2023-12-19 12:41             ` John Garry
2023-12-19 15:17               ` Christoph Hellwig
2023-12-19 16:53                 ` John Garry
2023-12-21  6:50                   ` Christoph Hellwig
2023-12-21  9:49                     ` John Garry
2023-12-21 12:19                       ` Christoph Hellwig
2023-12-21 12:48                         ` John Garry
2023-12-21 12:57                           ` Christoph Hellwig
2023-12-21 13:18                             ` John Garry
2023-12-21 13:22                               ` Christoph Hellwig
2023-12-21 13:56                                 ` John Garry
2024-01-16 11:35                                 ` John Garry
2024-01-17 15:02                                   ` Christoph Hellwig
2024-01-17 16:16                                     ` John Garry
2024-01-09  9:55                     ` John Garry
2024-01-09 16:02                       ` Christoph Hellwig
2024-01-09 16:52                         ` John Garry
2024-01-09 23:04                       ` Dave Chinner
2024-01-10  8:55                         ` John Garry
2024-01-10  9:19                         ` Christoph Hellwig
2024-01-11  1:40                           ` Darrick J. Wong
2024-01-11  5:02                             ` Christoph Hellwig
2024-01-11  9:55                               ` John Garry
2024-01-11 14:45                                 ` Christoph Hellwig
2024-01-11 16:11                                   ` John Garry
2024-01-11 16:15                                     ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZXmjdnIqGHILTfQN@fedora \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=bvanassche@acm.org \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=himanshu.madhani@oracle.com \
    --cc=jack@suse.cz \
    --cc=jaswin@linux.ibm.com \
    --cc=jbongio@google.com \
    --cc=jejb@linux.ibm.com \
    --cc=john.g.garry@oracle.com \
    --cc=kbusch@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=sagi@grimberg.me \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).