Re: [Lsf-pc] [LSF/MM ATTEND] Memory management -- THP, hugetlb, scalability

From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Matthew Wilcox <willy@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [Lsf-pc] [LSF/MM ATTEND] Memory management -- THP, hugetlb, scalability
Date: Sat, 11 Jan 2014 00:59:34 +0200
Message-ID: <20140110225934.GA8951@node.dhcp.inet.fi>
In-Reply-To: <20140110225116.GA5722@linux.intel.com>

On Fri, Jan 10, 2014 at 05:51:16PM -0500, Matthew Wilcox wrote:
> On Fri, Jan 10, 2014 at 07:42:04PM +0200, Kirill A. Shutemov wrote:
> > On Wed, Jan 08, 2014 at 03:13:21PM +0000, Mel Gorman wrote:
> > > I think transparent huge pagecache is likely to crop up for more than one
> > > reason. There is the TLB issue and the motivation that i-TLB pressure is
> > > a problem in some specialised cases. Whatever the merits of that case,
> > > transparent hugepage cache has been raised as a potential solution for
> > > some VM scalability problems. I recognise that dealing with large numbers
> > > of struct pages is now a problem on larger machines (although I have not
> > > seen quantified data on the problem nor do I have access to a machine large
> > > enough to measure it myself) but I'm wary of transparent hugepage cache
> > > being treated as a primary solution for VM scalability problems. Lacking
> > > performance data I have no suggestions on what these alternative solutions
> > > might look like.
> 
> Something I'd like to see discussed (but don't have the MM chops to
> lead a discussion on myself) is the PAGE_CACHE_SIZE vs PAGE_SIZE split.
> This needs to be either fixed or removed, IMO.  It's been in the tree
> since before git history began (ie before 2005), it imposes a reasonably
> large cognitive burden on programmers ("what kind of page size do I want
> here?"), it's not intuitively obvious (to a non-mm person) which page
> size is which, and it's never actually bought us anything because it's
> always been the same!
> 
> Also, it bitrots.  Look at this:
> 
>         pgoff_t pgoff = (((address & PAGE_MASK)
>                         - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
>         vmf.pgoff = pgoff;
>         pgoff_t offset = vmf->pgoff;
>         size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
>         if (offset >= size)
>                 return VM_FAULT_SIGBUS;
> 
> That's spread over three functions, but that goes to illustrate my point;
> getting this stuff right is Hard; core mm developers get it wrong, we
> don't have the right types to document whether a variable is in PAGE_SIZE
> or PAGE_CACHE_SIZE units, and we're not getting any benefit from it today.

I also want to drop PAGE_CACHE_*. It's been on my todo list for almost a
year now ;)
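
To make the unit mismatch in the quoted snippet concrete, here is a minimal
userspace sketch. It is not kernel code: the DEMO_* constants and all
function names are hypothetical, and the two shifts are deliberately set to
different values (in the tree they have always been equal) so that the
effect of comparing a PAGE_SIZE-unit offset against a PAGE_CACHE_SIZE-unit
size becomes visible.

```c
#include <stdint.h>

/* Hypothetical values for illustration only. In the real tree the two
 * shifts have always been identical, which is why the bug stays hidden. */
#define DEMO_PAGE_SHIFT        12  /* 4 KiB base page */
#define DEMO_PAGE_CACHE_SHIFT  14  /* pretend a page cache page is 16 KiB */

/* Fault side: offset into the file, in PAGE_SIZE units. */
static inline uint64_t fault_pgoff(uint64_t address, uint64_t vm_start,
                                   uint64_t vm_pgoff)
{
        uint64_t page_mask = ~((UINT64_C(1) << DEMO_PAGE_SHIFT) - 1);

        return (((address & page_mask) - vm_start) >> DEMO_PAGE_SHIFT)
                + vm_pgoff;
}

/* Filesystem side: file size rounded up, in PAGE_CACHE_SIZE units. */
static inline uint64_t size_in_cache_pages(uint64_t i_size)
{
        uint64_t cache_size = UINT64_C(1) << DEMO_PAGE_CACHE_SHIFT;

        return (i_size + cache_size - 1) >> DEMO_PAGE_CACHE_SHIFT;
}

/* The check as quoted: the two operands are in different units.
 * Returns 1 where the quoted code would return VM_FAULT_SIGBUS. */
static inline int check_mixed_units(uint64_t pgoff, uint64_t i_size)
{
        return pgoff >= size_in_cache_pages(i_size);
}

/* Same check with both operands in PAGE_SIZE units. */
static inline int check_same_units(uint64_t pgoff, uint64_t i_size)
{
        uint64_t page_size = UINT64_C(1) << DEMO_PAGE_SHIFT;
        uint64_t size = (i_size + page_size - 1) >> DEMO_PAGE_SHIFT;

        return pgoff >= size;
}
```

With these made-up shifts, a fault on the sixth 4 KiB page (pgoff 5) of a
64 KiB file is rejected by check_mixed_units(), because the file is only
four 16 KiB cache pages long, while check_same_units() accepts it.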

> > Sibling topic is THP for XIP (see Matthew's patchset). Guys want to manage
> > persistent memory in 2M chunks where it's possible. And THP (but without
> > struct page in this case) is the obvious solution.
> 
> Not just 2MB, we also want 1GB pages for some special cases.  It looks
> doable (XFS can allocate aligned 1GB blocks).  I've written some
> supporting code that will at least get us to the point where we can
> insert a 1GB page.  I haven't been able to test anything yet.

It's probably doable from the fs point of view, but adding PUD-level THP
is not trivial at all. I think it's more productive to concentrate on 2M
for now.
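
For reference, the alignment precondition behind "2M" versus "1GB" can be
sketched as below. This assumes x86-64 with 4 KiB base pages, where a PMD
entry covers 2 MiB and a PUD entry covers 1 GiB; the DEMO_* names and
can_map_at_shift() are hypothetical and not an existing kernel helper. It
only shows the alignment check and says nothing about the mm-side work
that makes PUD-level THP hard.

```c
#include <stdint.h>

/* Assumed x86-64 values with 4 KiB base pages; other architectures and
 * page sizes differ. */
#define DEMO_PMD_SHIFT 21  /* one PMD entry maps 2 MiB */
#define DEMO_PUD_SHIFT 30  /* one PUD entry maps 1 GiB */

/* Returns 1 if a mapping starting at vaddr/paddr and spanning len bytes
 * could begin with a single huge entry of size (1 << shift): both
 * addresses must be aligned to that size and the range must be at least
 * that long. */
static inline int can_map_at_shift(uint64_t vaddr, uint64_t paddr,
                                   uint64_t len, unsigned int shift)
{
        uint64_t mask = (UINT64_C(1) << shift) - 1;

        return ((vaddr | paddr) & mask) == 0 && len >= mask + 1;
}
```

A range that is only 2 MiB aligned passes the check for DEMO_PMD_SHIFT but
fails it for DEMO_PUD_SHIFT, which is why the filesystem being able to
allocate aligned 1GB blocks matters for the 1GB case.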

-- 
 Kirill A. Shutemov
