Date: Mon, 11 Nov 2024 09:27:33 -0700
From: Keith Busch
To: Christoph Hellwig
Cc: Keith Busch, linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-scsi@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	io-uring@vger.kernel.org, axboe@kernel.dk, martin.petersen@oracle.com,
	asml.silence@gmail.com, javier.gonz@samsung.com, joshi.k@samsung.com
Subject: Re: [PATCHv11 0/9] write hints with nvme fdp and scsi streams
References: <20241108193629.3817619-1-kbusch@meta.com> <20241111102914.GA27870@lst.de>
In-Reply-To: <20241111102914.GA27870@lst.de>

On Mon, Nov 11, 2024 at 11:29:14AM +0100, Christoph Hellwig wrote:
> On Fri, Nov 08, 2024 at 11:36:20AM -0800, Keith Busch wrote:
> >   Default partition split so partition one gets all the write hints
> >   exclusively
>
> I still don't think this actually works as expected, as the user
> interface says the write streams are contiguous, and with the bitmap
> they aren't.
>
> As I seem to have a really hard time to get my point across, I instead
> spent this morning doing a POC of what I mean, and pushed it here:
>
> http://git.infradead.org/?p=users/hch/misc.git;a=shortlog;h=refs/heads/block-write-streams

Just purely for backward compatibility, I don't think you can have the
nvme driver error out if a stream is too large. The fcntl lifetime hint,
which gets set unconditionally from the file_inode without considering
the block device's max write stream, never errored out before.

> The big differences are:
>
>  - there is a separate write_stream value now instead of overloading
>    the write hint.  For now it is an 8-bit field for the internal
>    data structures so that we don't have to grow the bio, but all the
>    user interfaces are kept at 16 bits (or in case of statx reduced to
>    it).  If this becomes not enough because we need to support devices
>    with multiple reclaim groups we'll have to find some space by using
>    unions or growing structures

As far as I know, 255 possible streams exceeds any existing use case.

>  - block/fops.c is the place to map the existing write hints into
>    the write streams instead of the driver

I might be missing something here, but that part sure looks the same as
what's in this series.

>  - the stream granularity is added, because adding it to statx at a
>    later time would be nasty.  Getting it in nvme is actually amazingly
>    cumbersome so I gave up on that and just fed a dummy value for
>    testing, though

Just regarding the documentation on the write_stream_granularity, you
don't need to discard the entire RU in a single command. You can
invalidate the RU simply by overwriting the LBAs without ever issuing
any discard commands.

If you really want to treat it this way, you need to ensure the first
LBA written to an RU is always aligned to NPDA/NPDAL.

If this is really what you require to move this forward, though, that's
fine with me.
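
To illustrate the backward-compatibility point above, here is a minimal
userspace sketch (not kernel code) of a hint-to-stream mapping that
silently falls back to "no stream" instead of failing the I/O when the
hint exceeds what the device supports. The function name, the
NO_WRITE_STREAM constant and the convention that stream 0 means "no
stream" are assumptions made for illustration; they are not taken from
this series or from the POC branch.

```c
#include <stdint.h>

/* Assumption: stream 0 means "no write stream"; valid streams are 1..max. */
#define NO_WRITE_STREAM 0

/*
 * Hypothetical helper: map a per-inode lifetime hint to a write stream.
 * A hint the device cannot honor is dropped (stream 0) rather than
 * reported as an error, so callers that set hints unconditionally keep
 * working on devices with few or no write streams.
 */
static inline uint8_t hint_to_write_stream(unsigned int hint,
					   uint8_t max_streams)
{
	if (hint == 0 || hint > max_streams)
		return NO_WRITE_STREAM;
	return (uint8_t)hint;
}
```

With max_streams set to 4, a hint of 3 maps to stream 3, while a hint of
5 maps to stream 0 and the write proceeds without a stream.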
>  - the partition remapping is now done using an offset into the global
>    write stream space so that there is a contiguous number space.
>    The interface for this is rather hacky, so only treat it as a start
>    for interface and use case discussions.
>  - the generic stack limits code stopped stacking the max write
>    streams.  While it does the right thing for simple things like
>    multipath and mirroring/striping it is wrong for anything non-trivial
>    like parity raid.  I've left this as a separate fold patch for the
>    discussion.
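
The offset-based partition remapping described in the quoted list can be
sketched as follows. This is a userspace illustration under stated
assumptions, not the interface from the POC branch: struct part_streams,
its fields, and part_stream_to_global() are hypothetical names, and
stream 0 is assumed to mean "no stream".

```c
#include <stdint.h>

/* Hypothetical per-partition window into the device's global stream space. */
struct part_streams {
	uint16_t offset;	/* global streams that precede this window */
	uint16_t count;		/* number of streams owned by this partition */
};

/*
 * Translate a partition-local stream (1..count) into a global stream.
 * Each partition sees a contiguous range starting at 1; local stream 0
 * or an out-of-range value maps to 0 ("no stream").
 */
static inline uint16_t part_stream_to_global(const struct part_streams *p,
					     uint16_t local)
{
	if (local == 0 || local > p->count)
		return 0;
	return (uint16_t)(p->offset + local);
}
```

For example, if a first partition owns 4 streams at offset 0 and a
second owns 3 streams at offset 4, local stream 1 of the second
partition maps to global stream 5, and both partitions see a contiguous
range starting at 1.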