From: Yu Zhao <yuzhao@google.com>
To: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
"npiggin@gmail.com" <npiggin@gmail.com>
Subject: Re: [PATCH 3/3] mm/lru_gen: Don't build multi-gen LRU page table walk code on architecture not supported
Date: Tue, 27 Jun 2023 13:10:55 -0600
Message-ID: <CAOUHufZZFNXx9Zi1QRSQ+JrWvcHYo_D9-cEM_gEV7KSdgB73_A@mail.gmail.com>
In-Reply-To: <e814b7e7-541b-518b-a63d-4fc2e7b87ab5@linux.ibm.com>
On Tue, Jun 27, 2023 at 5:48 AM Aneesh Kumar K V
<aneesh.kumar@linux.ibm.com> wrote:
>
> On 6/26/23 10:34 PM, Yu Zhao wrote:
> > On Mon, Jun 26, 2023 at 4:52 AM Aneesh Kumar K V
> > <aneesh.kumar@linux.ibm.com> wrote:
> >>
> >> On 6/26/23 1:04 AM, Yu Zhao wrote:
> >>> On Sat, Jun 24, 2023 at 8:54 AM Aneesh Kumar K.V
> >>> <aneesh.kumar@linux.ibm.com> wrote:
> >>>>
> >>>> Hi Yu Zhao,
> >>>>
> >>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> >>>>
> >>>>> Not all architectures support hardware atomic updates of access bits. On
> >>>>> such an arch, we don't use page table walk to classify pages into
> >>>>> generations. Add a kernel config option and avoid building the page
> >>>>> table walk code on such architectures.
> >>>>>
> >>>>> No performance change observed with mongodb ycsb test:
> >>>>>
> >>>>> Patch details      Throughput(Ops/sec)
> >>>>> without patch      93278
> >>>>> With patch         93400
> >>>>>
> >>>>> Without patch:
> >>>>> $ size mm/vmscan.o
> >>>>>    text    data     bss     dec     hex filename
> >>>>>  112102   42721      40  154863   25cef mm/vmscan.o
> >>>>>
> >>>>> With patch
> >>>>>
> >>>>> $ size mm/vmscan.o
> >>>>>    text    data     bss     dec     hex filename
> >>>>>  105430   41333      24  146787   23d63 mm/vmscan.o
> >>>>>
> >>>>
> >>>> Any feedback on this patch? Can we look at merging this change?
> >>>
> >>> Just want to make sure I fully understand the motivation: are there
> >>> any other end goals besides reducing the footprint mentioned above?
> >>> E.g., preparing for HCA, etc. (My current understanding is that HCA
> >>> shouldn't care about it, since it's already runtime disabled if HCA
> >>> doesn't want to use it.)
> >>>
> >>
> >> My goal with this change was to keep all that dead code from being compiled
> >> in for ppc64.
> >
> > I see. But the first thing (lru_gen_add_folio()) you moved has nothing
> > to do with this goal, because it's still compiled after the entire
> > series.
> >
>
> Sure. will drop that change.
>
> >>> Also as explained offline, solely relying on folio_activate() in
> >>> lru_gen_look_around() can cause a measurable regression on powerpc,
> >>> because
> >>> 1. PAGEVEC_SIZE is 15 whereas pglist_data->mm_walk.batched is
> >>> virtually unlimited.
> >>> 2. Once folio_activate() reaches that limit, it takes the LRU lock on
> >>> top of the PTL, which can be shared by multiple page tables on
> >>> powerpc.
> >>>
> >>> In fact, I think we should try the opposite direction first, before arriving
> >>> at any conclusions, i.e.,
> >>> #define arch_has_hw_pte_young() radix_enabled()
> >>
> >> The reason it is disabled on powerpc was that a reference bit update takes a pagefault
> >> on powerpc irrespective of the translation mode.
> >
> > This is not true.
> >
> > From "IBM POWER9 Processor User Manual":
> > https://openpowerfoundation.org/resources/ibmpower9usermanual/
> >
> > 4.10.14 Reference and Change Bits
> > ...
> > When performing HPT translation, the hardware performs the R and C
> > bit updates nonatomically.
> > ...
> >
> > The radix case is more complex, and I'll leave it to you to interpret
> > what it means:
> >
> > From "Power ISA Version 3.0 B":
> > https://openpowerfoundation.org/specifications/isa/
> >
> > 5.7.12 Reference and Change Recording
> > ...
> > For Radix Tree translation, the Reference and Change bits are set atomically.
> > ...
> >
>
> It is atomic in that software uses ldarx/stdcx to update these bits. Hardware/core won't
> update this directly even though Nest can update this directly without taking a fault. So
> for all purposes we can assume that on radix the R/C bit is updated by the page fault handler.
Thanks. To me, it sounds like stating a function provided by h/w, not
a requirement for s/w. (IMO, the latter would be something like
"software must/should set the bits atomically.") But I'll take your
word for it.
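
For illustration, the software-side update described in the quoted
text (a load-reserve/store-conditional pair such as ldarx/stdcx.) can
be approximated in portable C11 with a compare-and-swap retry loop.
This is only a user-space sketch, not kernel code:
demo_pte_set_bits() and the DEMO_PTE_* bit positions are hypothetical
names chosen for the example and do not reflect the real powerpc PTE
layout.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define DEMO_PTE_REFERENCED (UINT64_C(1) << 8)
#define DEMO_PTE_CHANGED    (UINT64_C(1) << 7)

/*
 * Set flag bits in a PTE-like word using a compare-and-swap retry
 * loop, the portable analogue of a load-reserve/store-conditional
 * sequence. Returns true if this call modified the word, false if
 * all requested bits were already set.
 */
static bool demo_pte_set_bits(_Atomic uint64_t *pte, uint64_t bits)
{
	uint64_t old = atomic_load(pte);

	do {
		if ((old & bits) == bits)
			return false;	/* nothing to do */
		/* On failure, 'old' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(pte, &old, old | bits));

	return true;
}
```

The point of the loop is that a concurrent update to any other bit in
the same word is never lost: the store only succeeds if the word still
holds the value that was read.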