Date: Thu, 10 Jul 2025 12:12:29 -0400
From: Mike Snitzer
To: Keith Busch, Ming Lei, Jens Axboe
Cc: Jeff Layton, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust, Anna Schumaker, linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, hch@infradead.org, linux-block@vger.kernel.org
Subject: Re: [RFC PATCH v2 4/8] lib/iov_iter: remove piecewise bvec length checking in iov_iter_aligned_bvec

On Thu, Jul 10, 2025 at 08:48:04AM -0600, Keith Busch wrote:
> On Thu, Jul 10, 2025 at 09:52:53AM -0400, Jeff Layton wrote:
> > On Tue, 2025-07-08 at 12:06 -0400, Mike Snitzer wrote:
> > > iov_iter_aligned_bvec() is strictly checking alignment of each element
> > > of the bvec to arrive at whether the bvec is aligned relative to
> > > dma_alignment and on-disk alignment. Checking each element
> > > individually results in disallowing a bvec that in aggregate is
> > > perfectly aligned relative to the provided @len_mask.
> > >
> > > Relax the on-disk alignment checking such that it is done on the full
> > > extent described by the bvec but still do piecewise checking of the
> > > dma_alignment for each bvec's bv_offset.
> > >
> > > This allows for NFS's WRITE payload to be issued using O_DIRECT as
> > > long as the bvec created with xdr_buf_to_bvec() is composed of pages
> > > that respect the underlying device's dma_alignment (@addr_mask) and
> > > the overall contiguous on-disk extent is aligned relative to the
> > > logical_block_size (@len_mask).
> > >
> > > Signed-off-by: Mike Snitzer
> > > ---
> > >  lib/iov_iter.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> > > index bdb37d572e97..b2ae482b8a1d 100644
> > > --- a/lib/iov_iter.c
> > > +++ b/lib/iov_iter.c
> > > @@ -819,13 +819,14 @@ static bool iov_iter_aligned_bvec(const struct iov_iter *i, unsigned addr_mask,
> > >  	unsigned skip = i->iov_offset;
> > >  	size_t size = i->count;
> > >  
> > > +	if (size & len_mask)
> > > +		return false;
> > > +
> > >  	do {
> > >  		size_t len = bvec->bv_len;
> > >  
> > >  		if (len > size)
> > >  			len = size;
> > > -		if (len & len_mask)
> > > -			return false;
> > >  		if ((unsigned long)(bvec->bv_offset + skip) & addr_mask)
> > >  			return false;
> > >  
> >
> > cc'ing Keith too since he wrote this helper originally.
>
> Thanks.
>
> There's a comment in __bio_iov_iter_get_pages that says it expects each
> vector to be a multiple of the block size. That makes it easier to
> split when needed, and this patch would allow vectors that break the
> current assumption when calculating the "trim" value.

Thanks for the pointer; that high-level bio code is being too
restrictive.

But I'm not seeing any issues with the trim calculation itself: 'trim'
is the number of bytes past the last logical_block_size-aligned
boundary. iov_iter_revert() then rolls back the iov so that it no
longer includes those bytes, and size is reduced by trim bytes.

Just restating my challenge: assuming that each vector is itself a
multiple of logical_block_size disallows valid use cases (like mine,
where NFSD needs to use O_DIRECT for its WRITE payload, which can have
head and/or tail vectors that are _not_ on logical_block_size-aligned
boundaries).

> But for nvme, you couldn't split such a bvec into a usable command
> anyway. I think you'd have to introduce a different queue limit to check
> against when validating iter alignment if you don't want to use the
> logical block size.

There isn't a new queue limit in play, though I take your point that
we did in fact assume logical_block_size applies to each vector. With
my patch, we still validate logical_block_size (against the overall
bvec length rather than each vector), and by verifying that pages are
aligned to dma_alignment we implicitly validate that each vector has
dma_alignment on-disk.

All that said, in practice I haven't had any issues with this patch.
But it could just be that I don't have the stars aligned to hit the
case that might have problems. If you know of such a case, I'd welcome
suggestions.

Thanks,
Mike