From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
To: Naveen Mamindlapalli <naveenm@marvell.com>,
David Howells <dhowells@redhat.com>,
Christian Brauner <christian@brauner.io>,
Jeff Layton <jlayton@kernel.org>,
Gao Xiang <hsiangkao@linux.alibaba.com>,
Dominique Martinet <asmadeus@codewreck.org>
Cc: Matthew Wilcox <willy@infradead.org>,
Steve French <smfrench@gmail.com>,
Marc Dionne <marc.dionne@auristor.com>,
Paulo Alcantara <pc@manguebit.com>,
Shyam Prasad N <sprasad@microsoft.com>,
Tom Talpey <tom@talpey.com>,
Eric Van Hensbergen <ericvh@kernel.org>,
Ilya Dryomov <idryomov@gmail.com>,
"netfs@lists.linux.dev" <netfs@lists.linux.dev>,
"linux-cachefs@redhat.com" <linux-cachefs@redhat.com>,
"linux-afs@lists.infradead.org" <linux-afs@lists.infradead.org>,
"linux-cifs@vger.kernel.org" <linux-cifs@vger.kernel.org>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
"v9fs@lists.linux.dev" <v9fs@lists.linux.dev>,
"linux-erofs@lists.ozlabs.org" <linux-erofs@lists.ozlabs.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Latchesar Ionkov <lucho@ionkov.net>,
Christian Schoenebeck <linux_oss@crudebyte.com>
Subject: Re: [PATCH 19/26] netfs: New writeback implementation
Date: Fri, 29 Mar 2024 18:06:09 -0700
Message-ID: <08dd01e3-c45e-47d9-bcde-55f7d1edc480@linux.dev>
In-Reply-To: <SJ2PR18MB5635A86C024316BC5E57B79EA23A2@SJ2PR18MB5635.namprd18.prod.outlook.com>
On 29/03/2024 10:34, Naveen Mamindlapalli wrote:
>> -----Original Message-----
>> From: David Howells <dhowells@redhat.com>
>> Sent: Thursday, March 28, 2024 10:04 PM
>> To: Christian Brauner <christian@brauner.io>; Jeff Layton <jlayton@kernel.org>;
>> Gao Xiang <hsiangkao@linux.alibaba.com>; Dominique Martinet
>> <asmadeus@codewreck.org>
>> Cc: David Howells <dhowells@redhat.com>; Matthew Wilcox
>> <willy@infradead.org>; Steve French <smfrench@gmail.com>; Marc Dionne
>> <marc.dionne@auristor.com>; Paulo Alcantara <pc@manguebit.com>; Shyam
>> Prasad N <sprasad@microsoft.com>; Tom Talpey <tom@talpey.com>; Eric Van
>> Hensbergen <ericvh@kernel.org>; Ilya Dryomov <idryomov@gmail.com>;
>> netfs@lists.linux.dev; linux-cachefs@redhat.com; linux-afs@lists.infradead.org;
>> linux-cifs@vger.kernel.org; linux-nfs@vger.kernel.org;
>> ceph-devel@vger.kernel.org; v9fs@lists.linux.dev; linux-erofs@lists.ozlabs.org;
>> linux-fsdevel@vger.kernel.org; linux-mm@kvack.org; netdev@vger.kernel.org;
>> linux-kernel@vger.kernel.org; Latchesar Ionkov <lucho@ionkov.net>;
>> Christian Schoenebeck <linux_oss@crudebyte.com>
>> Subject: [PATCH 19/26] netfs: New writeback implementation
>>
>> The current netfslib writeback implementation creates writeback requests of
>> contiguous folio data and then separately tiles subrequests over the space
>> twice, once for the server and once for the cache. This creates a few
>> issues:
>>
>> (1) Every time there's a discontiguity or a change between writing to only
>> one destination or writing to both, it must create a new request.
>> This makes it harder to do vectored writes.
>>
>> (2) The folios don't have the writeback mark removed until the end of the
>> request - and a request could be hundreds of megabytes.
>>
>> (3) In future, I want to support a larger cache granularity, which will
>> require aggregation of some folios that contain unmodified data (which
>> only need to go to the cache) and some which contain modifications
>> (which need to be uploaded and stored to the cache) - but, currently,
>> these are treated as discontiguous.
>>
>> There's also a move to get everyone to use writeback_iter() to extract
>> writable folios from the pagecache. That said, currently writeback_iter()
>> has some issues that make it less than ideal:
>>
>> (1) there's no way to cancel the iteration, even if you find a "temporary"
>> error that means the current folio and all subsequent folios are going
>> to fail;
>>
>> (2) there's no way to filter the folios being written back - something
>> that will impact Ceph with its ordered snap system;
>>
>> (3) and if you get a folio you can't immediately deal with (say you need
>> to flush the preceding writes), you are left with a folio hanging in
>> the locked state for the duration, when really we should unlock it and
>> relock it later.
>>
>> In this new implementation, I use writeback_iter() to pump folios,
>> progressively creating two parallel, but separate streams and cleaning up
>> the finished folios as the subrequests complete. Either or both streams
>> can contain gaps, and the subrequests in each stream can be of variable
>> size, don't need to align with each other and don't need to align with the
>> folios.
>>
>> Indeed, subrequests can cross folio boundaries and may cover several folios,
>> or a folio may be spanned by multiple subrequests, e.g.:
>>
>>          +---+---+-----+-----+---+----------+
>> Folios:  |   |   |     |     |   |          |
>>          +---+---+-----+-----+---+----------+
>>
>>            +------+------+     +----+----+
>> Upload:    |      |      |.....|    |    |
>>            +------+------+     +----+----+
>>
>>          +------+------+------+------+------+
>> Cache:   |      |      |      |      |      |
>>          +------+------+------+------+------+
>>
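The tiling in the diagram above can be sketched in a few lines of C. This is only an illustration, not code from the patch: struct span, spans_overlap() and count_overlaps() are hypothetical names, and byte ranges are assumed to be half-open [start, start + len).

```c
#include <stddef.h>

/* Hypothetical byte range, half-open: [start, start + len). */
struct span {
	unsigned long long start;
	unsigned long long len;
};

/* Two half-open ranges overlap if each starts before the other ends. */
static int spans_overlap(const struct span *a, const struct span *b)
{
	return a->start < b->start + b->len &&
	       b->start < a->start + a->len;
}

/*
 * Count how many subrequests in one stream touch a given folio.
 * 0 means the folio falls in a gap of that stream; more than 1 means
 * the folio is spanned by several subrequests.
 */
static size_t count_overlaps(const struct span *folio,
			     const struct span *subreqs, size_t nr)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++)
		if (spans_overlap(folio, &subreqs[i]))
			n++;
	return n;
}
```

With three 3000-byte subrequests starting at offset 0 and 4096-byte folios, the first folio is touched by two subrequests, which is why a folio can only be cleaned up once every subrequest overlapping it, in every stream, has completed.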
>> The progressive subrequest construction permits the algorithm to be
>> preparing both the next upload to the server and the next write to the
>> cache whilst the previous ones are already in progress. Throttling can be
>> applied to control the rate of production of subrequests - and, in any
>> case, we probably want to write them to the server in ascending order,
>> particularly if the file will be extended.
>>
>> Content crypto can also be prepared at the same time as the subrequests and
>> run asynchronously, with the prepped requests being stalled until the
>> crypto catches up with them. This might also be useful for transport
>> crypto, but that happens at a lower layer, so probably would be harder to
>> pull off.
>>
>> The algorithm is split into three parts:
>>
>> (1) The issuer. This walks through the data, packaging it up, encrypting
>> it and creating subrequests. The part of this that generates
>> subrequests only deals with file positions and spans and so is usable
>> for DIO/unbuffered writes as well as buffered writes.
>>
>> (2) The collector. This asynchronously collects completed subrequests,
>> unlocks folios, frees crypto buffers and performs any retries. This
>> runs in a work queue so that the issuer can return to the caller for
>> writeback (so that the VM can have its kswapd thread back) or async
>> writes.
>>
>> (3) The retryer. This pauses the issuer, waits for all outstanding
>> subrequests to complete and then goes through the failed subrequests
>> to reissue them. This may involve reprepping them (with cifs, the
>> credits must be renegotiated, and a subrequest may need splitting),
>> and doing RMW for content crypto if there's a conflicting change on
>> the server.
>>
>> [!] Note that some of the functions are prefixed with "new_" to avoid
>> clashes with existing functions. These will be renamed in a later patch
>> that cuts over to the new algorithm.
>>
>> Signed-off-by: David Howells <dhowells@redhat.com>
>> cc: Jeff Layton <jlayton@kernel.org>
>> cc: Eric Van Hensbergen <ericvh@kernel.org>
>> cc: Latchesar Ionkov <lucho@ionkov.net>
>> cc: Dominique Martinet <asmadeus@codewreck.org>
>> cc: Christian Schoenebeck <linux_oss@crudebyte.com>
>> cc: Marc Dionne <marc.dionne@auristor.com>
>> cc: v9fs@lists.linux.dev
>> cc: linux-afs@lists.infradead.org
>> cc: netfs@lists.linux.dev
>> cc: linux-fsdevel@vger.kernel.org
[..snip..]
>> +/*
>> + * Begin a write operation for writing through the pagecache.
>> + */
>> +struct netfs_io_request *new_netfs_begin_writethrough(struct kiocb *iocb, size_t len)
>> +{
>> +	struct netfs_io_request *wreq = NULL;
>> +	struct netfs_inode *ictx = netfs_inode(file_inode(iocb->ki_filp));
>> +
>> +	mutex_lock(&ictx->wb_lock);
>> +
>> +	wreq = netfs_create_write_req(iocb->ki_filp->f_mapping, iocb->ki_filp,
>> +				      iocb->ki_pos, NETFS_WRITETHROUGH);
>> +	if (IS_ERR(wreq))
>> +		mutex_unlock(&ictx->wb_lock);
>> +
>> +	wreq->io_streams[0].avail = true;
>> +	trace_netfs_write(wreq, netfs_write_trace_writethrough);
>
> Missing mutex_unlock() before return.
>
mutex_unlock() happens in new_netfs_end_writethrough()
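
To make the pairing concrete, here is a minimal userspace sketch using pthreads rather than kernel mutexes. It is not the netfs code: struct demo_inode, struct demo_request and the demo_* functions are hypothetical stand-ins, and the assumption is that the begin function returns with the lock held on success and drops it itself only when it fails.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for netfs_inode and netfs_io_request. */
struct demo_inode {
	pthread_mutex_t wb_lock;
};

struct demo_request {
	struct demo_inode *inode;
	size_t len;
};

/* Takes wb_lock and, on success, returns with it still held. */
static struct demo_request *demo_begin_writethrough(struct demo_inode *ictx,
						    size_t len)
{
	struct demo_request *wreq;

	pthread_mutex_lock(&ictx->wb_lock);

	wreq = malloc(sizeof(*wreq));
	if (!wreq) {
		/* Failure: the end function will never run, so unlock here. */
		pthread_mutex_unlock(&ictx->wb_lock);
		return NULL;
	}
	wreq->inode = ictx;
	wreq->len = len;
	return wreq;	/* lock intentionally still held */
}

/* Releases the lock taken by demo_begin_writethrough(). */
static void demo_end_writethrough(struct demo_request *wreq)
{
	struct demo_inode *ictx = wreq->inode;

	free(wreq);
	pthread_mutex_unlock(&ictx->wb_lock);
}

/* Returns 0 if the lock is held between begin and end and free afterwards. */
static int demo_check_pairing(void)
{
	struct demo_inode ictx;
	struct demo_request *wreq;
	int during, after;

	pthread_mutex_init(&ictx.wb_lock, NULL);

	wreq = demo_begin_writethrough(&ictx, 4096);
	if (!wreq) {
		pthread_mutex_destroy(&ictx.wb_lock);
		return -1;
	}
	/* trylock reports EBUSY while the begin/end window is open. */
	during = pthread_mutex_trylock(&ictx.wb_lock);

	demo_end_writethrough(wreq);

	/* After end, the lock can be taken again. */
	after = pthread_mutex_trylock(&ictx.wb_lock);
	if (after == 0)
		pthread_mutex_unlock(&ictx.wb_lock);

	pthread_mutex_destroy(&ictx.wb_lock);
	return (during == EBUSY && after == 0) ? 0 : -1;
}
```

In this pattern the only unlock inside the begin function is on its failure path, where it also has to return straight away instead of falling through to use the request.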
> Thanks,
> Naveen
>