Date: Wed, 6 Aug 2025 09:04:40 -0600
From: Keith Busch
To: Christoph Hellwig
Cc: Keith Busch, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org, axboe@kernel.dk
Subject: Re: [PATCH 2/2] nvme: remove virtual boundary for sgl capable devices
References: <20250805195608.2379107-1-kbusch@meta.com> <20250805195608.2379107-2-kbusch@meta.com> <20250806145514.GB20102@lst.de>
In-Reply-To: <20250806145514.GB20102@lst.de>
X-Mailing-List: linux-block@vger.kernel.org

On Wed, Aug 06, 2025 at 04:55:14PM +0200, Christoph Hellwig wrote:
> On Tue, Aug 05, 2025 at 12:56:08PM -0700, Keith Busch wrote:
> > From: Keith Busch
> >
> > The nvme virtual boundary is only for the PRP format. Devices that can
> > use the SGL format don't need it for IO queues. Drop reporting it for
> > such PCIe devices; fabrics target will continue to use the limit.
>
> That's not quite true any more as of 6.17. We now also rely it for
> efficiently mapping multiple segments into a single IOMMU mapping.
> So by not enforcing it for IOMMU mode. In many cases we're better
> off splitting I/O rather forcing a non-optimized IOMMU mapping.

Patch 1 removes the reliance on the virt boundary for the IOMMU. This
makes it possible for NVMe to use this optimization on ARM64 SMMU, which
we saw earlier can come in a larger granularity than NVMe's. Without
patch 1, NVMe could never use that optimization on such an architecture,
but now it can for applications that choose to subscribe to that
alignment.

This patch, though, is more about being able to directly use user space
buffers that cannot be split into any valid IOs.
This is possible now with patch 1 not relying on the virt boundary for
IOMMU optimizations. In truth, for my use case, the IOMMU is either set
to off or passthrough, so that optimization isn't reachable.

The use case I'm going for is taking zero-copy receive buffers from a
network device and directly using them for storage IO. The user data
doesn't arrive in nicely aligned segments from there.
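
For readers following along, here is a rough userspace sketch of the constraint being discussed. It is not the kernel code; `struct seg` and `fits_virt_boundary()` are hypothetical names made up for illustration. It assumes the usual reading of a virtual boundary: every segment after the first must start on the boundary, and every segment before the last must end on it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one contiguous piece of a data buffer. */
struct seg {
	uint64_t addr;	/* start address of the segment */
	uint32_t len;	/* length in bytes */
};

/*
 * Return true if the segments can be described without a gap relative to
 * the given boundary mask (e.g. 4096 - 1 for a 4 KiB boundary).
 *
 * For each pair of neighbours, the earlier segment must end on the
 * boundary and the later segment must start on it. Only the start of the
 * first segment and the end of the last one are unconstrained.
 */
static bool fits_virt_boundary(const struct seg *segs, size_t n,
			       uint64_t boundary_mask)
{
	for (size_t i = 1; i < n; i++) {
		uint64_t prev_end = segs[i - 1].addr + segs[i - 1].len;

		if ((prev_end & boundary_mask) ||
		    (segs[i].addr & boundary_mask))
			return false;
	}
	return true;
}
```

Segments at 0x1200+0xe00, 0x5000+0x1000 and 0x9000+0x200 pass with a 4 KiB mask but fail with a 64 KiB mask, which is the larger-granularity situation mentioned above. A pair such as 0x1000+0x600 followed by 0x5600+0x400 fails even at 4 KiB, and that is the kind of layout unaligned receive buffers tend to produce.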