From: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>
To: Ira Weiny <ira.weiny@intel.com>, Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] memcpy_from_folio()
Date: Mon, 23 Jan 2023 18:50:04 +0100 [thread overview]
Message-ID: <3660839.MHq7AAxBmi@suse> (raw)
In-Reply-To: <Y80t49tQtWO5N22J@casper.infradead.org>
On Sunday, 22 January 2023 13:36:51 CET Matthew Wilcox wrote:
> On Fri, Jan 20, 2023 at 11:15:17PM -0800, Ira Weiny wrote:
> > Matthew Wilcox wrote:
> > > I think I have a good folio replacement for memcpy_from_page(). One of
> > > the annoying things about dealing with multi-page folios is that you
> > > can't kmap the entire folio, yet on systems without highmem, you don't
> > > need to. It's also somewhat annoying in the caller to keep track
> > > of n/len/offset/pos/...
> > >
> > > I think this is probably the best option. We could have a loop that
> > > kmaps each page in the folio, but that seems like excessive complexity.
> >
> > Why? IMO it's better to contain the complexity of highmem systems in the
> > memcpy_[to,from]_folio() calls than to spread it around the kernel.
>
> Sure, but look at the conversion that I posted. It's actually simpler
> than using the memcpy_from_page() API.
>
> > > I'm happy to have highmem systems be less efficient, since they are
> > > anyway. Another potential area of concern is that folios can be quite
> > > large and maybe having preemption disabled while we copy 2MB of data
> > > might be a bad thing. If so, the API is fine with limiting the amount
> > > of data we copy, we just need to find out that it is a problem and
> > > decide what the correct limit is, if it's not folio_size().
> >
> > Why not map the pages only when needed? I agree that keeping preemption
> > disabled for a long time is a bad thing. But kmap_local_page does not
> > disable preemption, only migration.
>
> Some of the scheduler people aren't terribly happy about even disabling
> migration for a long time. Is "copying 2MB of data" a long time? If I've
> done my sums correctly, my current laptop has 2x 16 bit LP-DDR4-4267
> DIMMs installed. That works out to 17GB/s and so copying 2MB of data
> will take 118us. Probably OK for even the most demanding workload.
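For the record, the arithmetic works out as stated:

    2 DIMMs x 16 bits = 32 bits = 4 bytes per transfer
    4 bytes x 4267 MT/s ≈ 17 GB/s
    2 MB / 17 GB/s ≈ 118 us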
>
> > Regardless, any looping over the maps is only ever going to happen on
> > highmem systems, and we can map the pages only if/when needed.
> > Synchronization of the folio should be handled by the caller. So it is
> > fine to allow migration during memcpy_from_folio().
> >
> > So why not loop through the pages only when needed?
>
> So you're proposing re-enabling migration after calling
> kmap_local_folio()? I don't really understand.
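If I read Ira's suggestion correctly, the per-page looping variant being
discussed would be something along these lines. This is only a sketch of the
idea as I understand it, not code from the posted patch, and the name and
shape are my own guess:

/*
 * Sketch of a per-page looping memcpy_from_folio(): map one page at a time
 * with kmap_local_folio() so a highmem folio is never mapped as a whole.
 * Hypothetical, not taken from the posted patch.
 */
static inline void memcpy_from_folio(char *to, struct folio *folio,
		size_t offset, size_t len)
{
	VM_BUG_ON(offset + len > folio_size(folio));

	do {
		const char *from = kmap_local_folio(folio, offset);
		size_t chunk = len;

		/* On highmem only one page is mapped, so cap the copy there. */
		if (folio_test_highmem(folio) &&
		    chunk > PAGE_SIZE - offset_in_page(offset))
			chunk = PAGE_SIZE - offset_in_page(offset);
		memcpy(to, from, chunk);
		kunmap_local(from);

		to += chunk;
		offset += chunk;
		len -= chunk;
	} while (len > 0);
}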
>
> > > fs/ext4/verity.c | 16 +++++++---------
> > > include/linux/highmem.h | 29 +++++++++++++++++++++++++++++
> > > include/linux/page-flags.h | 1 +
> > > 3 files changed, 37 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
> > > index e4da1704438e..afe847c967a4 100644
> > > --- a/fs/ext4/verity.c
> > > +++ b/fs/ext4/verity.c
> > > @@ -42,18 +42,16 @@ static int pagecache_read(struct inode *inode, void *buf, size_t count,
> > > 			  loff_t pos)
> > >
> > > {
> > >
> > > while (count) {
> > >
> > > - size_t n = min_t(size_t, count,
> > > - PAGE_SIZE - offset_in_page(pos));
> > > - struct page *page;
> > > + struct folio *folio;
> > > + size_t n;
> > >
> > > - page = read_mapping_page(inode->i_mapping, pos >> PAGE_SHIFT,
> > > + folio = read_mapping_folio(inode->i_mapping, pos >> PAGE_SHIFT,
> > >                                   NULL);
> >
> > Is this an issue with how many pages get read into the page
> > cache? I went off on a tangent thinking this read the entire folio into
> > the cache. But I see now I was wrong. If this is operating page by page
> > why change this function at all?
>
> The folio may (indeed _should_) be already present in the cache, otherwise
> the cache isn't doing a very good job. If read_mapping_folio() ends up
> having to allocate the folio, today it only allocates a single page folio.
> But if somebody else allocated it through the readahead code, and the
> filesystem supports multi-page folios, then it will be larger than a
> single page. All callers must be prepared to handle a multi-page folio.
>
> > > - if (IS_ERR(page))
> > > - return PTR_ERR(page);
> > > -
> > > - memcpy_from_page(buf, page, offset_in_page(pos), n);
> > > + if (IS_ERR(folio))
> > > + return PTR_ERR(folio);
> > >
> > > - put_page(page);
> > > + n = memcpy_from_file_folio(buf, folio, pos, count);
> > > + folio_put(folio);
> > >
> > > buf += n;
> > > pos += n;
> > >
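Read as plain code rather than a diff, the converted pagecache_read() ends up
roughly as below; the unchanged tail of the loop (count -= n; and the final
return 0;) is assumed from the surrounding context and is not part of the
posted hunk:

static int pagecache_read(struct inode *inode, void *buf, size_t count,
			  loff_t pos)
{
	while (count) {
		struct folio *folio;
		size_t n;

		folio = read_mapping_folio(inode->i_mapping, pos >> PAGE_SHIFT,
					   NULL);
		if (IS_ERR(folio))
			return PTR_ERR(folio);

		/* Copies at most one page from a highmem folio, up to
		 * folio_size() - offset otherwise. */
		n = memcpy_from_file_folio(buf, folio, pos, count);
		folio_put(folio);

		/* The return value drives the loop; no separate n/len/offset
		 * bookkeeping is needed in the caller. */
		buf += n;
		pos += n;
		count -= n;
	}
	return 0;
}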
> > > diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> > > index 9fa462561e05..9917357b9e8f 100644
> > > --- a/include/linux/highmem.h
> > > +++ b/include/linux/highmem.h
> > > @@ -414,6 +414,35 @@ static inline void memzero_page(struct page *page, size_t offset, size_t len)
> > > kunmap_local(addr);
> > >
> > > }
> > >
> > > +/**
> > > + * memcpy_from_file_folio - Copy some bytes from a file folio.
> > > + * @to: The destination buffer.
> > > + * @folio: The folio to copy from.
> > > + * @pos: The position in the file.
> > > + * @len: The maximum number of bytes to copy.
> > > + *
> > > + * Copy up to @len bytes from this folio. This may be limited by PAGE_SIZE
> >
> > I have a problem with 'may be limited'. How is the caller to know this?
>
> ... from the return value?
>
> > Won't this propagate a lot of checks into the callers? Effectively replacing
> > one complexity in the callers with another?
>
> Look at the caller I converted! It _reduces_ the amount of checks in
> the caller.
>
> > > + * if the folio comes from HIGHMEM, and by the size of the folio.
> > > + *
> > > + * Return: The number of bytes copied from the folio.
> > > + */
> > > +static inline size_t memcpy_from_file_folio(char *to, struct folio *folio,
> > > +		loff_t pos, size_t len)
> > > +{
> > > + size_t offset = offset_in_folio(folio, pos);
> > > + char *from = kmap_local_folio(folio, offset);
> > > +
> > > + if (folio_test_highmem(folio))
> > > + len = min(len, PAGE_SIZE - offset);
> > > + else
> > > + len = min(len, folio_size(folio) - offset);
> > > +
> > > + memcpy(to, from, len);
> >
> > Do we need flush_dcache_page() for the pages?
>
> Why? memcpy_from_page() doesn't have one.
>
> > I gave this an attempt today before I realized read_mapping_folio() only
> > reads a single page. :-(
> >
> > How does memcpy_from_file_folio() work beyond a single page? And in that
> > case what is the point? The more I think about this the more confused I
> > get.
>
> In the highmem case, we map a single page and so cannot go beyond a
> single page. If the folio wasn't allocated from highmem, we're just
> using its address directly, so we can access the whole folio.
>
> Hope I've cleared up your confusions.
As you know, I'm not a memory management expert, just a user of these APIs.
However, since I was Cc'ed, I have read Matthew's RFC and the discussion that
followed.

FWIW, I think I understand this proposal and I like it, so I hope to see it
become a "real" patch.
Fabio