Message-ID: <08dd01e3-c45e-47d9-bcde-55f7d1edc480@linux.dev>
Date: Fri, 29 Mar 2024 18:06:09 -0700
Subject: Re: [PATCH 19/26] netfs: New writeback implementation
From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
To: Naveen Mamindlapalli, David Howells, Christian Brauner, Jeff Layton,
 Gao Xiang, Dominique Martinet
Cc: Matthew Wilcox, Steve French, Marc Dionne, Paulo Alcantara,
 Shyam Prasad N, Tom Talpey, Eric Van Hensbergen, Ilya Dryomov,
 netfs@lists.linux.dev, linux-cachefs@redhat.com,
 linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org,
 linux-nfs@vger.kernel.org, ceph-devel@vger.kernel.org,
 v9fs@lists.linux.dev, linux-erofs@lists.ozlabs.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
 Latchesar Ionkov, Christian Schoenebeck
References: <20240328163424.2781320-1-dhowells@redhat.com>
 <20240328163424.2781320-20-dhowells@redhat.com>

On 29/03/2024 10:34, Naveen Mamindlapalli wrote:
>> -----Original Message-----
>> From: David Howells
>> Sent: Thursday, March 28, 2024 10:04 PM
>> To: Christian Brauner; Jeff Layton; Gao Xiang; Dominique Martinet
>> Cc: David Howells; Matthew Wilcox; Steve French; Marc Dionne;
>> Paulo Alcantara; Shyam Prasad N; Tom Talpey; Eric Van Hensbergen;
>> Ilya Dryomov;
>> netfs@lists.linux.dev; linux-cachefs@redhat.com;
>> linux-afs@lists.infradead.org; linux-cifs@vger.kernel.org;
>> linux-nfs@vger.kernel.org; ceph-devel@vger.kernel.org;
>> v9fs@lists.linux.dev; linux-erofs@lists.ozlabs.org;
>> linux-fsdevel@vger.kernel.org; linux-mm@kvack.org;
>> netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
>> Latchesar Ionkov; Christian Schoenebeck
>> Subject: [PATCH 19/26] netfs: New writeback implementation
>>
>> The current netfslib writeback implementation creates writeback requests of
>> contiguous folio data and then separately tiles subrequests over the space
>> twice, once for the server and once for the cache.  This creates a few
>> issues:
>>
>>  (1) Every time there's a discontiguity or a change between writing to only
>>      one destination or writing to both, it must create a new request.
>>      This makes it harder to do vectored writes.
>>
>>  (2) The folios don't have the writeback mark removed until the end of the
>>      request - and a request could be hundreds of megabytes.
>>
>>  (3) In future, I want to support a larger cache granularity, which will
>>      require aggregation of some folios that contain unmodified data (which
>>      only need to go to the cache) and some which contain modifications
>>      (which need to be uploaded and stored to the cache) - but, currently,
>>      these are treated as discontiguous.
>>
>> There's also a move to get everyone to use writeback_iter() to extract
>> writable folios from the pagecache.  That said, currently writeback_iter()
>> has some issues that make it less than ideal:
>>
>>  (1) there's no way to cancel the iteration, even if you find a "temporary"
>>      error that means the current folio and all subsequent folios are going
>>      to fail;
>>
>>  (2) there's no way to filter the folios being written back - something
>>      that will impact Ceph with its ordered snap system;
>>
>>  (3) and if you get a folio you can't immediately deal with (say you need
>>      to flush the preceding writes), you are left with a folio hanging in
>>      the locked state for the duration, when really we should unlock it and
>>      relock it later.
>>
>> In this new implementation, I use writeback_iter() to pump folios,
>> progressively creating two parallel, but separate streams and cleaning up
>> the finished folios as the subrequests complete.  Either or both streams
>> can contain gaps, and the subrequests in each stream can be of variable
>> size, don't need to align with each other and don't need to align with the
>> folios.
>>
>> Indeed, subrequests can cross folio boundaries, may cover several folios or
>> a folio may be spanned by multiple subrequests, e.g.:
>>
>>           +---+---+-----+-----+---+----------+
>> Folios:   |   |   |     |     |   |          |
>>           +---+---+-----+-----+---+----------+
>>
>>             +------+------+     +----+----+
>> Upload:     |      |      |.....|    |    |
>>             +------+------+     +----+----+
>>
>>           +------+------+------+------+------+
>> Cache:    |      |      |      |      |      |
>>           +------+------+------+------+------+
>>
>> The progressive subrequest construction permits the algorithm to be
>> preparing both the next upload to the server and the next write to the
>> cache whilst the previous ones are already in progress.  Throttling can be
>> applied to control the rate of production of subrequests - and, in any
>> case, we probably want to write them to the server in ascending order,
>> particularly if the file will be extended.
>>
>> Content crypto can also be prepared at the same time as the subrequests and
>> run asynchronously, with the prepped requests being stalled until the
>> crypto catches up with them.  This might also be useful for transport
>> crypto, but that happens at a lower layer, so probably would be harder to
>> pull off.
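
As a rough illustration of the issuing side described above, the sketch below
pumps folios out of the pagecache with writeback_iter() and gathers each one
into an upload span and a cache span.  The toy_stream type, the toy_* helpers
and the needs_upload predicate are made-up stand-ins, not the netfslib API;
only writeback_iter() and the folio helpers are real kernel interfaces.

#include <linux/pagemap.h>
#include <linux/writeback.h>

/* Made-up stand-in for one destination stream (upload or cache). */
struct toy_stream {
	loff_t	start;		/* file position of the pending span */
	size_t	len;		/* bytes gathered so far */
	bool	active;
};

/* Gather [pos, pos + len) into the stream's pending span. */
static void toy_stream_add(struct toy_stream *s, loff_t pos, size_t len)
{
	if (!s->active || pos != s->start + s->len) {
		/* Discontiguity: a real issuer would fire off the pending
		 * subrequest here and start a fresh one. */
		s->start = pos;
		s->len = 0;
		s->active = true;
	}
	s->len += len;
}

/*
 * Walk the dirty folios and feed each one into the streams.  Whether a
 * folio only needs caching or also needs uploading is decided here by a
 * caller-supplied predicate; netfslib tracks that itself.
 */
static int toy_issue_writeback(struct address_space *mapping,
			       struct writeback_control *wbc,
			       struct toy_stream *upload,
			       struct toy_stream *cache,
			       bool (*needs_upload)(struct folio *))
{
	struct folio *folio = NULL;
	int error = 0;

	while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
		loff_t pos = folio_pos(folio);
		size_t len = folio_size(folio);

		folio_start_writeback(folio);

		if (needs_upload(folio))
			toy_stream_add(upload, pos, len);	/* to the server */
		toy_stream_add(cache, pos, len);		/* to the cache */

		/* The folio can be unlocked now; folio_end_writeback() is
		 * left to the completion side, once the covering subrequests
		 * in both streams have finished. */
		folio_unlock(folio);
	}
	return error;
}

A real issuer would also issue the pending subrequest whenever a span grows
past the wanted size, which is what lets the two streams stay independent of
folio boundaries and of each other.
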
>>
>> The algorithm is split into three parts:
>>
>>  (1) The issuer.  This walks through the data, packaging it up, encrypting
>>      it and creating subrequests.  The part of this that generates
>>      subrequests only deals with file positions and spans and so is usable
>>      for DIO/unbuffered writes as well as buffered writes.
>>
>>  (2) The collector.  This asynchronously collects completed subrequests,
>>      unlocks folios, frees crypto buffers and performs any retries.  This
>>      runs in a work queue so that the issuer can return to the caller for
>>      writeback (so that the VM can have its kswapd thread back) or async
>>      writes.
>>
>>  (3) The retryer.  This pauses the issuer, waits for all outstanding
>>      subrequests to complete and then goes through the failed subrequests
>>      to reissue them.  This may involve reprepping them (with cifs, the
>>      credits must be renegotiated, and a subrequest may need splitting),
>>      and doing RMW for content crypto if there's a conflicting change on
>>      the server.
>>
>> [!] Note that some of the functions are prefixed with "new_" to avoid
>> clashes with existing functions.  These will be renamed in a later patch
>> that cuts over to the new algorithm.
>>
>> Signed-off-by: David Howells
>> cc: Jeff Layton
>> cc: Eric Van Hensbergen
>> cc: Latchesar Ionkov
>> cc: Dominique Martinet
>> cc: Christian Schoenebeck
>> cc: Marc Dionne
>> cc: v9fs@lists.linux.dev
>> cc: linux-afs@lists.infradead.org
>> cc: netfs@lists.linux.dev
>> cc: linux-fsdevel@vger.kernel.org

[..snip..]

>> +/*
>> + * Begin a write operation for writing through the pagecache.
>> + */
>> +struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len)
>> +{
>> +	struct netfs_io_request *wreq = NULL;
>> +	struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));
>> +
>> +	mutex_lock(&ictx->wb_lock);
>> +
>> +	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
>> +				      iocb->ki_pos, NETFS_WRITETHROUGH);
>> +	if (IS_ERR(wreq))
>> +		mutex_unlock(&ictx->wb_lock);
>> +
>> +	wreq->io_streams[0].avail = true;
>> +	trace_netfs_write(wreq, netfs_write_trace_writethrough);
>
> Missing mutex_unlock() before return.
>

mutex_unlock() happens in new_netfs_end_writethrough()

> Thanks,
> Naveen
>
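
To make the locking lifetime in that exchange easier to follow, here is a
minimal sketch using made-up toy_* names rather than the actual netfs
structures: the lock taken at begin-writethrough is normally dropped only at
end-writethrough, and on the creation-failure path it is dropped straight
away.  Handing the error pointer back untouched on that path is shown as one
natural way to finish it; this is an illustration, not the patch's code.

#include <linux/err.h>
#include <linux/mutex.h>
#include <linux/slab.h>

/* Made-up stand-ins for the structures under discussion. */
struct toy_inode_ctx {
	struct mutex	wb_lock;	/* serialises writethrough/writeback */
};

struct toy_wreq {
	struct toy_inode_ctx	*ictx;
	/* ... */
};

static struct toy_wreq *toy_create_wreq(struct toy_inode_ctx *ictx)
{
	struct toy_wreq *wreq = kzalloc(sizeof(*wreq), GFP_KERNEL);

	if (!wreq)
		return ERR_PTR(-ENOMEM);
	wreq->ictx = ictx;
	return wreq;
}

static struct toy_wreq *toy_begin_writethrough(struct toy_inode_ctx *ictx)
{
	struct toy_wreq *wreq;

	mutex_lock(&ictx->wb_lock);

	wreq = toy_create_wreq(ictx);
	if (IS_ERR(wreq)) {
		mutex_unlock(&ictx->wb_lock);
		return wreq;	/* error pointer: don't dereference it */
	}

	/* ...set up streams, tracing, etc.; wb_lock stays held... */
	return wreq;
}

static void toy_end_writethrough(struct toy_wreq *wreq)
{
	struct toy_inode_ctx *ictx = wreq->ictx;

	/* ...wait for the request to complete... */
	mutex_unlock(&ictx->wb_lock);	/* pairs with toy_begin_writethrough() */
	kfree(wreq);
}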