Date: Wed, 13 Dec 2023 09:25:38 +0800
From: Ming Lei
To: John Garry
Cc: axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
	jejb@linux.ibm.com, martin.petersen@oracle.com, djwong@kernel.org,
	viro@zeniv.linux.org.uk, brauner@kernel.org, dchinner@redhat.com,
	jack@suse.cz, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	tytso@mit.edu, jbongio@google.com, linux-scsi@vger.kernel.org,
	jaswin@linux.ibm.com, bvanassche@acm.org, Himanshu Madhani
Subject: Re: [PATCH v2 01/16] block: Add atomic write operations to request_queue limits
References: <20231212110844.19698-1-john.g.garry@oracle.com>
	<20231212110844.19698-2-john.g.garry@oracle.com>
In-Reply-To: <20231212110844.19698-2-john.g.garry@oracle.com>

On Tue, Dec 12, 2023 at 11:08:29AM +0000, John Garry wrote:
> From: Himanshu Madhani
>
> Add the following limits:
> - atomic_write_boundary_bytes
> - atomic_write_max_bytes
> - atomic_write_unit_max_bytes
> - atomic_write_unit_min_bytes
>
> All atomic writes limits are initialised to 0 to indicate no atomic write
> support. Stacked devices are just not supported either for now.
>
> Signed-off-by: Himanshu Madhani
> #jpg: Heavy rewrite
> Signed-off-by: John Garry
> ---
>  Documentation/ABI/stable/sysfs-block | 47 ++++++++++++++++++++++
>  block/blk-settings.c                 | 60 ++++++++++++++++++++++++++++
>  block/blk-sysfs.c                    | 33 +++++++++++++++
>  include/linux/blkdev.h               | 37 +++++++++++++++++
>  4 files changed, 177 insertions(+)
>
> diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
> index 1fe9a553c37b..ba81a081522f 100644
> --- a/Documentation/ABI/stable/sysfs-block
> +++ b/Documentation/ABI/stable/sysfs-block
> @@ -21,6 +21,53 @@ Description:
>  		device is offset from the internal allocation unit's
>  		natural alignment.
>
> +What:		/sys/block//atomic_write_max_bytes
> +Date:		May 2023
> +Contact:	Himanshu Madhani
> +Description:
> +		[RO] This parameter specifies the maximum atomic write
> +		size reported by the device. This parameter is relevant
> +		for merging of writes, where a merged atomic write
> +		operation must not exceed this number of bytes.
> +		The atomic_write_max_bytes may exceed the value in
> +		atomic_write_unit_max_bytes if atomic_write_max_bytes
> +		is not a power-of-two or atomic_write_unit_max_bytes is
> +		limited by some queue limits, such as max_segments.
> +
> +
> +What:		/sys/block//atomic_write_unit_min_bytes
> +Date:		May 2023
> +Contact:	Himanshu Madhani
> +Description:
> +		[RO] This parameter specifies the smallest block which can
> +		be written atomically with an atomic write operation. All
> +		atomic write operations must begin at a
> +		atomic_write_unit_min boundary and must be multiples of
> +		atomic_write_unit_min. This value must be a power-of-two.
> +
> +
> +What:		/sys/block//atomic_write_unit_max_bytes
> +Date:		January 2023
> +Contact:	Himanshu Madhani
> +Description:
> +		[RO] This parameter defines the largest block which can be
> +		written atomically with an atomic write operation. This
> +		value must be a multiple of atomic_write_unit_min and must
> +		be a power-of-two.
> +
> +
> +What:		/sys/block//atomic_write_boundary_bytes
> +Date:		May 2023
> +Contact:	Himanshu Madhani
> +Description:
> +		[RO] A device may need to internally split I/Os which
> +		straddle a given logical block address boundary. In that
> +		case a single atomic write operation will be processed as
> +		one of more sub-operations which each complete atomically.
> +		This parameter specifies the size in bytes of the atomic
> +		boundary if one is reported by the device. This value must
> +		be a power-of-two.
> +
>
>  What:		/sys/block//diskseq
>  Date:		February 2021
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 0046b447268f..d151be394c98 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -59,6 +59,10 @@ void blk_set_default_limits(struct queue_limits *lim)
>  	lim->zoned = BLK_ZONED_NONE;
>  	lim->zone_write_granularity = 0;
>  	lim->dma_alignment = 511;
> +	lim->atomic_write_unit_min_sectors = 0;
> +	lim->atomic_write_unit_max_sectors = 0;
> +	lim->atomic_write_max_sectors = 0;
> +	lim->atomic_write_boundary_sectors = 0;

Can we move the four into a single structure and set them up in a single
API? Then cross-validation can be done in this API.

>  }
>
>  /**
> @@ -183,6 +187,62 @@ void blk_queue_max_discard_sectors(struct request_queue *q,
>  }
>  EXPORT_SYMBOL(blk_queue_max_discard_sectors);
>
> +/**
> + * blk_queue_atomic_write_max_bytes - set max bytes supported by
> + * the device for atomic write operations.
> + * @q: the request queue for the device
> + * @size: maximum bytes supported
> + */
> +void blk_queue_atomic_write_max_bytes(struct request_queue *q,
> +				      unsigned int bytes)
> +{
> +	q->limits.atomic_write_max_sectors = bytes >> SECTOR_SHIFT;
> +}
> +EXPORT_SYMBOL(blk_queue_atomic_write_max_bytes);

What if the driver doesn't call it but does support atomic writes?

I guess the default max sectors should be atomic_write_unit_max_sectors if
the feature is enabled.

> +
> +/**
> + * blk_queue_atomic_write_boundary_bytes - Device's logical block address space
> + * which an atomic write should not cross.
> + * @q: the request queue for the device
> + * @bytes: must be a power-of-two.
> + */
> +void blk_queue_atomic_write_boundary_bytes(struct request_queue *q,
> +					   unsigned int bytes)
> +{
> +	q->limits.atomic_write_boundary_sectors = bytes >> SECTOR_SHIFT;
> +}
> +EXPORT_SYMBOL(blk_queue_atomic_write_boundary_bytes);

Should the default atomic_write_boundary_sectors be
atomic_write_unit_max_sectors when atomic writes are supported?

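To make the single-structure idea concrete, here is a minimal userspace
sketch, not kernel code. The struct name atomic_write_limits, the setter
set_atomic_write_limits() and its -1 error return are all hypothetical.
The power-of-two and multiple-of rules come from the sysfs text quoted
above, the two defaults follow the questions raised above, and the
"max >= unit_max" check is an assumption read from that same sysfs text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the four limits; not part of the posted patch. */
struct atomic_write_limits {
	unsigned int unit_min_sectors;
	unsigned int unit_max_sectors;
	unsigned int max_sectors;	/* 0: default to unit_max_sectors */
	unsigned int boundary_sectors;	/* 0: default to unit_max_sectors */
};

static bool is_pow2(unsigned int v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Hypothetical single setter: cross-validates the four values and only
 * commits them to *dst if they are consistent.
 * Returns 0 on success, -1 if the combination is rejected.
 */
int set_atomic_write_limits(struct atomic_write_limits *dst,
			    const struct atomic_write_limits *src)
{
	struct atomic_write_limits lim;

	assert(dst != NULL && src != NULL);
	lim = *src;

	/* All zero means "no atomic write support", as in the patch. */
	if (!lim.unit_min_sectors && !lim.unit_max_sectors &&
	    !lim.max_sectors && !lim.boundary_sectors) {
		*dst = lim;
		return 0;
	}

	/* Both units must be powers of two (sysfs documentation). */
	if (!is_pow2(lim.unit_min_sectors) || !is_pow2(lim.unit_max_sectors))
		return -1;

	/* unit_max must be a multiple of unit_min (sysfs documentation). */
	if (lim.unit_max_sectors % lim.unit_min_sectors)
		return -1;

	/* Defaults proposed in the review when the driver sets neither. */
	if (!lim.max_sectors)
		lim.max_sectors = lim.unit_max_sectors;
	if (!lim.boundary_sectors)
		lim.boundary_sectors = lim.unit_max_sectors;

	/* Assumption: max may exceed unit_max but must not be below it. */
	if (lim.max_sectors < lim.unit_max_sectors)
		return -1;

	/* The boundary must be a power of two (kernel-doc in the patch). */
	if (!is_pow2(lim.boundary_sectors))
		return -1;

	*dst = lim;
	return 0;
}
```

With something of this shape a driver fills in one structure and makes one
call, and an inconsistent combination is rejected before any field in the
queue limits changes.
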
> +
> +/**
> + * blk_queue_atomic_write_unit_min_sectors - smallest unit that can be written
> + * atomically to the device.
> + * @q: the request queue for the device
> + * @sectors: must be a power-of-two.
> + */
> +void blk_queue_atomic_write_unit_min_sectors(struct request_queue *q,
> +					     unsigned int sectors)
> +{
> +	struct queue_limits *limits = &q->limits;
> +
> +	limits->atomic_write_unit_min_sectors = sectors;
> +}
> +EXPORT_SYMBOL(blk_queue_atomic_write_unit_min_sectors);

atomic_write_unit_min_sectors should be >= (physical block size >> 9), given
that the minimum atomic write unit is the physical sector for every disk.

> +
> +/*
> + * blk_queue_atomic_write_unit_max_sectors - largest unit that can be written
> + * atomically to the device.
> + * @q: the request queue for the device
> + * @sectors: must be a power-of-two.
> + */
> +void blk_queue_atomic_write_unit_max_sectors(struct request_queue *q,
> +					     unsigned int sectors)
> +{
> +	struct queue_limits *limits = &q->limits;
> +
> +	limits->atomic_write_unit_max_sectors = sectors;
> +}
> +EXPORT_SYMBOL(blk_queue_atomic_write_unit_max_sectors);

atomic_write_unit_max_sectors should be >= atomic_write_unit_min_sectors.

Thanks,
Ming
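
The two ordering constraints on the unit limits could be checked roughly as
below. This is a userspace sketch under stated assumptions: the helper name
atomic_write_units_valid() and the physical_block_size parameter (in bytes)
are hypothetical and not taken from the patch, and treating 0/0 as "no
atomic write support" follows the patch description.

```c
#include <stdbool.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, as in the kernel */

/*
 * Hypothetical check, not part of the posted patch: returns true when the
 * unit limits satisfy both constraints from the review comments.
 */
bool atomic_write_units_valid(unsigned int unit_min_sectors,
			      unsigned int unit_max_sectors,
			      unsigned int physical_block_size)
{
	unsigned int phys_sectors = physical_block_size >> SECTOR_SHIFT;

	/* Both zero: atomic writes are not supported, nothing to check. */
	if (!unit_min_sectors && !unit_max_sectors)
		return true;

	/* unit_min must cover at least one physical block. */
	if (unit_min_sectors < phys_sectors)
		return false;

	/* unit_max must not be smaller than unit_min. */
	return unit_max_sectors >= unit_min_sectors;
}
```

For example, with a 4096-byte physical block (8 sectors), unit_min = 8 and
unit_max = 64 pass, while unit_min = 1 is rejected.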