All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>
To: Ira Weiny <ira.weiny@intel.com>, Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] memcpy_from_folio()
Date: Mon, 23 Jan 2023 18:50:04 +0100	[thread overview]
Message-ID: <3660839.MHq7AAxBmi@suse> (raw)
In-Reply-To: <Y80t49tQtWO5N22J@casper.infradead.org>

On domenica 22 gennaio 2023 13:36:51 CET Matthew Wilcox wrote:
> On Fri, Jan 20, 2023 at 11:15:17PM -0800, Ira Weiny wrote:
> > Matthew Wilcox wrote:
> > > I think I have a good folio replacement for memcpy_from_page().  One of
> > > the annoying things about dealing with multi-page folios is that you
> > > can't kmap the entire folio, yet on systems without highmem, you don't
> > > need to.  It's also somewhat annoying in the caller to keep track
> > > of n/len/offset/pos/...
> > > 
> > > I think this is probably the best option.  We could have a loop that
> > > kmaps each page in the folio, but that seems like excessive complexity.
> > 
> > Why?  IMO better to contain the complexity of highmem systems into any
> > memcpy_[to,from]_folio() calls then spread them around the kernel.
> 
> Sure, but look at the conversion that I posted.  It's actually simpler
> than using the memcpy_from_page() API.
> 
> > > I'm happy to have highmem systems be less efficient, since they are
> > > anyway.  Another potential area of concern is that folios can be quite
> > > large and maybe having preemption disabled while we copy 2MB of data
> > > might be a bad thing.  If so, the API is fine with limiting the amount
> > > of data we copy, we just need to find out that it is a problem and
> > > decide what the correct limit is, if it's not folio_size().
> > 
> > Why not map the pages only when needed?  I agree that keeping preemption
> > disabled for a long time is a bad thing.  But kmap_local_page does not
> > disable preemption, only migration.
> 
> Some of the scheduler people aren't terribly happy about even disabling
> migration for a long time.  Is "copying 2MB of data" a long time?  If I've
> done my sums correctly, my current laptop has 2x 16 bit LP-DDR4-4267
> DIMMs installed.  That works out to 17GB/s and so copying 2MB of data
> will take 118us.  Probably OK for even the most demanding workload.
> 
> > Regardless any looping on the maps is going to only be on highmem systems
> > and we can map the pages only if/when needed.  Synchronization of the 
folio
> > should be handled by the caller.  So it is fine to all allow migration
> > during memcpy_from_folio().
> > 
> > So why not loop through the pages only when needed?
> 
> So you're proposing re-enabling migration after calling
> kmap_local_folio()?  I don't really understand.
> 
> > >  fs/ext4/verity.c           |   16 +++++++---------
> > >  include/linux/highmem.h    |   29 +++++++++++++++++++++++++++++
> > >  include/linux/page-flags.h |    1 +
> > >  3 files changed, 37 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
> > > index e4da1704438e..afe847c967a4 100644
> > > --- a/fs/ext4/verity.c
> > > +++ b/fs/ext4/verity.c
> > > @@ -42,18 +42,16 @@ static int pagecache_read(struct inode *inode, void
> > > *buf, size_t count,> > 
> > >  			  loff_t pos)
> > >  
> > >  {
> > >  
> > >  	while (count) {
> > > 
> > > -		size_t n = min_t(size_t, count,
> > > -				 PAGE_SIZE - offset_in_page(pos));
> > > -		struct page *page;
> > > +		struct folio *folio;
> > > +		size_t n;
> > > 
> > > -		page = read_mapping_page(inode->i_mapping, pos >> 
PAGE_SHIFT,
> > > +		folio = read_mapping_folio(inode->i_mapping, pos >> 
PAGE_SHIFT,
> > > 
> > >  					 NULL);
> > 
> > Is this an issue with how many pages get read into the page
> > cache?  I went off on a tangent thinking this read the entire folio into
> > the cache.  But I see now I was wrong.  If this is operating page by page
> > why change this function at all?
> 
> The folio may (indeed _should_) be already present in the cache, otherwise
> the cache isn't doing a very good job.  If read_mapping_folio() ends up
> having to allocate the folio, today it only allocates a single page folio.
> But if somebody else allocated it through the readahead code, and the
> filesystem supports multi-page folios, then it will be larger than a
> single page.  All callers must be prepared to handle a multi-page folio.
> 
> > > -		if (IS_ERR(page))
> > > -			return PTR_ERR(page);
> > > -
> > > -		memcpy_from_page(buf, page, offset_in_page(pos), n);
> > > +		if (IS_ERR(folio))
> > > +			return PTR_ERR(folio);
> > > 
> > > -		put_page(page);
> > > +		n = memcpy_from_file_folio(buf, folio, pos, count);
> > > +		folio_put(folio);
> > > 
> > >  		buf += n;
> > >  		pos += n;
> > > 
> > > diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> > > index 9fa462561e05..9917357b9e8f 100644
> > > --- a/include/linux/highmem.h
> > > +++ b/include/linux/highmem.h
> > > @@ -414,6 +414,35 @@ static inline void memzero_page(struct page *page,
> > > size_t offset, size_t len)> > 
> > >  	kunmap_local(addr);
> > >  
> > >  }
> > > 
> > > +/**
> > > + * memcpy_from_file_folio - Copy some bytes from a file folio.
> > > + * @to: The destination buffer.
> > > + * @folio: The folio to copy from.
> > > + * @pos: The position in the file.
> > > + * @len: The maximum number of bytes to copy.
> > > + *
> > > + * Copy up to @len bytes from this folio.  This may be limited by
> > > PAGE_SIZE
> > 
> > I have a problem with 'may be limited'.  How is the caller to know this?
> 
> ... from the return value?
> 
> > Won't this propagate a lot of checks in the caller?  Effectively replacing
> > one complexity in the callers for another?
> 
> Look at the caller I converted!  It _reduces_ the amount of checks in
> the caller.
> 
> > > + * if the folio comes from HIGHMEM, and by the size of the folio.
> > > + *
> > > + * Return: The number of bytes copied from the folio.
> > > + */
> > > +static inline size_t memcpy_from_file_folio(char *to, struct folio
> > > *folio,
> > > +		loff_t pos, size_t len)
> > > +{
> > > +	size_t offset = offset_in_folio(folio, pos);
> > > +	char *from = kmap_local_folio(folio, offset);
> > > +
> > > +	if (folio_test_highmem(folio))
> > > +		len = min(len, PAGE_SIZE - offset);
> > > +	else
> > > +		len = min(len, folio_size(folio) - offset);
> > > +
> > > +	memcpy(to, from, len);
> > 
> > Do we need flush_dcache_page() for the pages?
> 
> Why?  memcpy_from_page() doesn't have one.
> 
> > I gave this an attempt today before I realized read_mapping_folio() only
> > reads a single page.  :-(
> > 
> > How does memcpy_from_file_folio() work beyond a single page?  And in that
> > case what is the point?  The more I think about this the more confused I
> > get.
> 
> In the highmem case, we map a single page and so cannot go beyond a
> single page.  If the folio wasn't allocated from highmem, we're just
> using its address directly, so we can access the whole folio.
> 
> Hope I've cleared up your confusions.

As you know I'm not a memory management expert, instead I'm just a user of 
these APIs. However, since I have been Cc'ed, I have read Matthew's RFC and 
the dialogue that follows. 

FWIW, I think I understand and I like this proposal. Therefore, I also hope to 
see it become a "real" patch.

Fabio 




      reply	other threads:[~2023-01-23 17:50 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-20 14:57 [RFC] memcpy_from_folio() Matthew Wilcox
2023-01-21  7:15 ` Ira Weiny
2023-01-22 12:36   ` Matthew Wilcox
2023-01-23 17:50     ` Fabio M. De Francesco [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3660839.MHq7AAxBmi@suse \
    --to=fmdefrancesco@gmail.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.