All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-doc@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Hugh Dickins <hughd@google.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Yin Fengwei <fengwei.yin@intel.com>,
	Yang Shi <shy828301@gmail.com>, Zi Yan <ziy@nvidia.com>
Subject: Re: [PATCH mm-unstable v1] mm: add a total mapcount for large folios
Date: Thu, 10 Aug 2023 16:57:11 -0400	[thread overview]
Message-ID: <ZNVPJ9xxd2oarR3I@x1n> (raw)
In-Reply-To: <db3c4d94-a0a9-6703-6fe0-e1b8851e531f@redhat.com>

On Thu, Aug 10, 2023 at 07:47:35PM +0200, David Hildenbrand wrote:
> On 10.08.23 19:15, Peter Xu wrote:
> > On Thu, Aug 10, 2023 at 11:48:27AM +0100, Ryan Roberts wrote:
> > > > For PTE-mapped THP, it might be a bit bigger noise, although I doubt it is
> > > > really significant (judging from my experience on managing PageAnonExclusive
> > > > using set_bit/test_bit/clear_bit when (un)mapping anon pages).
> > > > 
> > > > As folio_add_file_rmap_range() indicates, for PTE-mapped THPs we should be
> > > > batching where possible (and Ryan is working on some more rmap batching).
> > > 
> > > Yes, I've just posted [1] which batches the rmap removal. That would allow you
> > > to convert the per-page atomic_dec() into a (usually) single per-large-folio
> > > atomic_sub().
> > > 
> > > [1] https://lore.kernel.org/linux-mm/20230810103332.3062143-1-ryan.roberts@arm.com/
> > 
> > Right, that'll definitely make more sense, thanks for the link; I'd be very
> > happy to read more later (finally I got some free time recently..).  But
> > then does it mean David's patch can be attached at the end instead of
> > proposed separately and early?
> 
> Not in my opinion. Batching rmap makes sense even without this change, and
> this change makes sense even without batching.
> 
> > 
> > I was asking mostly because I read it as a standalone patch first, and
> > honestly I don't know the effect.  It's based on not only the added atomic
> > ops itself, but also the field changes.
> > 
> > For example, this patch moves Hugh's _nr_pages_mapped into the 2nd tail
> > page, I think it means for any rmap change of any small page of a huge one
> > we'll need to start touching one more 64B cacheline on x86.  I really have
> > no idea what does it mean for especially a large SMP: see 292648ac5cf1 on
> > why I had an impression of that.  But I've no enough experience or clue to
> > prove it a problem either, maybe would be interesting to measure the time
> > needed for some pte-mapped loops?  E.g., something like faulting in a thp,
> 
> Okay, so your speculation right now is:
> 
> 1) The change in cacheline might be problematic.
> 
> 2) The additional atomic operation might be problematic.
> 
> > then measure the split (by e.g. mprotect() at offset 1M on a 4K?) time it
> > takes before/after this patch.
> 
> I can certainly try getting some numbers on that. If you're aware of other
> micro-benchmarks that would likely notice slower pte-mapping of THPs, please
> let me know.

Thanks.

> 
> > 
> > When looking at this, I actually found one thing that is slightly
> > confusing, not directly relevant to your patch, but regarding the reuse of
> > tail page 1 on offset 24 bytes.  Current it's Hugh's _nr_pages_mapped,
> > and you're proposing to replace it with the total mapcount:
> > 
> >          atomic_t   _nr_pages_mapped;     /*    88     4 */
> > 
> > Now my question is.. isn't byte 24 of tail page 1 used for keeping a
> > poisoned mapping?  See prep_compound_tail() where it has:
> > 
> > 	p->mapping = TAIL_MAPPING;
> > 
> > While here mapping is, afaict, also using offset 24 of the tail page 1:
> > 
> >          struct address_space * mapping;  /*    24     8 */
> > 
> > I hope I did a wrong math somewhere, though.
> > 
> 
> I think your math is correct.
> 
> prep_compound_head() is called after prep_compound_tail(), so
> prep_compound_head() wins.
> 
> In __split_huge_page_tail() there is a VM_BUG_ON_PAGE() that explains the
> situation:
> 
> /* ->mapping in first and second tail page is replaced by other uses */
> VM_BUG_ON_PAGE(tail > 2 && page_tail->mapping != TAIL_MAPPING,
> 	       page_tail);
> 
> Thanks for raising that, I had to look into that myself.

It's so confusing so I did try to document them a bit myself, then I found
maybe I should just post a patch for it and I just did:

https://lore.kernel.org/r/20230810204944.53471-1-peterx@redhat.com

It'll conflict with yours, but I marked RFC so it's never anything urgent,
but maybe that'll be helpful already for you to introduce any new fields
like total_mapcounts.

AFAICS if that patch was all correct (while I'm not yet sure..), you can
actually fit your new total mapcount field into page 1 so even avoid the
extra cacheline access.  You can have a look: the trick is refcount for
tail page 1 is still seems to be free on 32 bits (if that was your worry
before).  Then it'll be very nice if to keep Hugh's counter all in tail 1.

-- 
Peter Xu


  parent reply	other threads:[~2023-08-10 20:58 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-09  8:32 [PATCH mm-unstable v1] mm: add a total mapcount for large folios David Hildenbrand
2023-08-09 15:45 ` Zi Yan
2023-08-09 19:07 ` Ryan Roberts
2023-08-09 19:17   ` David Hildenbrand
2023-08-10 10:40     ` Ryan Roberts
2023-08-10 11:14     ` David Hildenbrand
2023-08-10 11:27       ` David Hildenbrand
2023-08-10 11:32         ` David Hildenbrand
2023-08-10 11:35           ` Ryan Roberts
2023-08-09 19:21   ` Matthew Wilcox
2023-08-09 19:26     ` David Hildenbrand
2023-08-10  3:14       ` Yin Fengwei
2023-08-09 21:23 ` Peter Xu
2023-08-10  3:25   ` Matthew Wilcox
2023-08-10  8:37     ` David Hildenbrand
2023-08-10 21:48       ` Peter Xu
2023-08-10 21:54         ` Matthew Wilcox
2023-08-10 21:59           ` David Hildenbrand
2023-08-11 15:03             ` Peter Xu
2023-08-11 15:14               ` Zi Yan
2023-08-11 15:17               ` David Hildenbrand
2023-08-10  8:59   ` David Hildenbrand
2023-08-10 10:48     ` Ryan Roberts
2023-08-10 17:15       ` Peter Xu
2023-08-10 17:47         ` David Hildenbrand
2023-08-10 19:02           ` Ryan Roberts
2023-08-10 20:57           ` Peter Xu [this message]
2023-08-10 21:48             ` Matthew Wilcox
2023-08-10 22:27               ` David Hildenbrand
2023-08-11 15:18                 ` Peter Xu
2023-08-11 15:32                   ` David Hildenbrand
2023-08-11 15:58                     ` Peter Xu
2023-08-11 16:08                       ` David Hildenbrand
2023-08-11 16:11                         ` Zi Yan
2023-08-11 22:18                           ` Peter Xu
2023-08-10 22:16             ` David Hildenbrand
2023-08-10  3:24 ` Yin Fengwei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZNVPJ9xxd2oarR3I@x1n \
    --to=peterx@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=david@redhat.com \
    --cc=fengwei.yin@intel.com \
    --cc=hughd@google.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=ryan.roberts@arm.com \
    --cc=shy828301@gmail.com \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.