From: Mel Gorman <mgorman@suse.de>
To: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, frederic@kernel.org, tglx@linutronix.de,
mtosatti@redhat.com, linux-rt-users@vger.kernel.org,
vbabka@suse.cz, cl@linux.com, paulmck@kernel.org,
willy@infradead.org
Subject: Re: [PATCH 0/2] mm/page_alloc: Remote per-cpu lists drain support
Date: Tue, 29 Mar 2022 10:45:38 +0100
Message-ID: <20220329094538.GJ4363@suse.de>
In-Reply-To: <d21d742154cbd6d2b7546533655810e0bf7dd82f.camel@redhat.com>
On Mon, Mar 28, 2022 at 03:51:43PM +0200, Nicolas Saenz Julienne wrote:
> > Now we don't explicitly have this pattern because there isn't an
> > obvious this_cpu_read() for example but it can accidentally happen for
> > counting. __count_zid_vm_events -> __count_vm_events -> raw_cpu_add is
> > an example although a harmless one.
> >
> > Any of the mod_page_state ones are more problematic though because we
> > lock one PCP but potentially update the per-cpu pcp stats of another CPU
> > of a different PCP that we have not locked and those counters must be
> > accurate.
>
> But IIUC vmstats don't track pcplist usage (i.e. adding a page into the local
> pcplist doesn't affect the count at all). It is only when interacting with the
> buddy allocator that they get updated. It makes sense for the CPU that
> adds/removes pages from the allocator to do the stat update, regardless of the
> page's journey.
>
It probably doesn't, I didn't audit it. As I said, it's subtle, which is
why I'm wary of relying on the accidental safety of getting a per-cpu
pointer that may not be stable. Even if it was ok *now*, I would worry
that it would break in the future. There have already been cases where
patches accidentally tried to move vmstats outside the appropriate locking.
> > It *might* still be safe but it's subtle, it could be easily accidentally
> > broken in the future and it would be hard to detect because it would be
> > very slow corruption of VM counters like NR_FREE_PAGES that must be
> > accurate.
>
> What does accurate mean here? vmstat consumers don't get accurate data, only
> snapshots.
They are accurate in that they have "Eventual Consistency".
zone_page_state_snapshot exists to get a more accurate count. There is
always some drift, but the counters are still accurate eventually. There
is a clear distinction between VM counters which can be inaccurate because
they are just to assist debugging and vmstats like NR_FREE_PAGES that the
kernel uses to make decisions. It potentially gets very problematic if a
per-cpu pointer acquired from one zone gets migrated to another zone and
the wrong vmstat is updated. It *might* still be ok, I haven't audited it,
but if there is a possibility that two CPUs can be doing a RMW on one
per-cpu vmstat structure, it will corrupt and it'll be difficult to detect.
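As a rough illustration of the "Eventual Consistency" point above, the
following is a userspace toy model, not kernel code. It assumes a single
counter with per-CPU deltas that are folded into a shared value once they
exceed a threshold, which is loosely how the vmstat counters behave. All
names (zone_model, model_mod_state, model_page_state,
model_page_state_snapshot, model_fold_all) and the constants NR_CPUS and
THRESHOLD are hypothetical and chosen only for this sketch.

```c
#include <assert.h>

#define NR_CPUS   4
#define THRESHOLD 8	/* hypothetical fold threshold for this sketch */

/* Toy model of one zone counter such as NR_FREE_PAGES. */
struct zone_model {
	long global;		/* shared value, cheap to read */
	int cpu_diff[NR_CPUS];	/* per-CPU deltas not yet folded in */
};

/*
 * Apply a delta on behalf of 'cpu'. The delta stays local until its
 * magnitude exceeds THRESHOLD, then it is folded into the shared value.
 */
static void model_mod_state(struct zone_model *z, int cpu, int delta)
{
	int d = z->cpu_diff[cpu] + delta;

	if (d > THRESHOLD || d < -THRESHOLD) {
		z->global += d;
		d = 0;
	}
	z->cpu_diff[cpu] = d;
}

/* Fast read: shared value only, may drift by up to NR_CPUS * THRESHOLD. */
static long model_page_state(const struct zone_model *z)
{
	return z->global;
}

/* Slower read: shared value plus every pending per-CPU delta. */
static long model_page_state_snapshot(const struct zone_model *z)
{
	long x = z->global;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		x += z->cpu_diff[cpu];
	return x;
}

/* Fold all pending deltas, after which the fast read has caught up. */
static void model_fold_all(struct zone_model *z)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		z->global += z->cpu_diff[cpu];
		z->cpu_diff[cpu] = 0;
	}
}
```

In this model the fast read lags behind the snapshot until the deltas are
folded, but no update is ever lost. That only holds as long as each
cpu_diff slot has a single writer at a time, which is the property the
paragraph above is worried about losing.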
> And as I comment above you can't infer information about pcplist
> usage from these stats. So, I see no real need for CPU locality when updating
> them (which we're still retaining nonetheless, as per my comment above), the
> only thing that is really needed is atomicity, achieved by disabling IRQs (and
> preemption on RT). And this, even with your solution, is achieved through the
> struct zone's spin_lock (plus a preempt_disable() in RT).
>
Yes, but under the series I had, I was using local_lock to stabilise which
CPU is being used before acquiring the per-cpu pointer. Strictly speaking,
it doesn't need a local_lock, but the local_lock is clearer in terms of
what is being protected and it works with PROVE_LOCKING, which already
caught a problematic softirq interaction for me when developing the series.
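The ordering concern can be sketched with a deterministic userspace toy,
again not kernel code. It assumes a simulated "current CPU" that can change
between taking a per-CPU pointer and taking a lock. Every name here
(pcp_model, current_cpu, model_reset, stat_add, update_unstable,
update_stable) is hypothetical. The sketch only counts how often a
structure is updated while its own lock is not held.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Hypothetical per-CPU structure: a lock flag plus a stat delta. */
struct pcp_model {
	bool locked;
	int stat_delta;
	int unlocked_updates;	/* updates made without this lock held */
};

static struct pcp_model pcp[NR_CPUS];
static int current_cpu;		/* CPU the simulated task is running on */

static void model_reset(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		pcp[cpu].locked = false;
		pcp[cpu].stat_delta = 0;
		pcp[cpu].unlocked_updates = 0;
	}
	current_cpu = 0;
}

static void stat_add(struct pcp_model *p, int delta)
{
	if (!p->locked)
		p->unlocked_updates++;
	p->stat_delta += delta;
}

/* Pointer taken first, task migrates, lock taken on the new CPU. */
static int update_unstable(int migrate_to)
{
	struct pcp_model *p = &pcp[current_cpu];	/* CPU not stable yet */

	current_cpu = migrate_to;		/* simulated migration */
	pcp[current_cpu].locked = true;		/* locks the new CPU's structure */
	stat_add(p, 1);				/* updates the old CPU's stats */
	pcp[current_cpu].locked = false;
	return p->unlocked_updates;
}

/* Lock taken first (stabilising the CPU), pointer taken afterwards. */
static int update_stable(int migrate_to)
{
	struct pcp_model *p;

	current_cpu = migrate_to;	/* migration can only happen before */
	pcp[current_cpu].locked = true;	/* from here the CPU is stable */
	p = &pcp[current_cpu];
	stat_add(p, 1);			/* lock and stats belong together */
	p->locked = false;
	return p->unlocked_updates;
}
```

With a simulated migration from CPU 0 to CPU 1, update_unstable() ends up
modifying CPU 0's delta while holding CPU 1's lock, whereas update_stable()
always updates the structure it has locked.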
> All in all, my point is that none of the stats are affected by the change, nor
> have a dependency with the pcplists handling. And if we ever have the need to
> pin vmstat updates to pcplist usage they should share the same pcp structure.
> That said, I'm happy with either solution as long as we get remote pcplist
> draining. So if still unconvinced, let me know how can I help. I have access to
> all sorts of machines to validate perf results, time to review, or even to move
> the series forward.
>
I also want the remote draining for PREEMPT_RT to avoid interference
with isolated CPUs due to workqueue activity, but whatever the solution, I
would be happier if the per-cpu lock is acquired with the CPU stabilised
and covers the scope of any vmstat delta updates stored in the per-cpu
structure. The earliest I will be rebasing my series is 5.18-rc1 as I
see limited value in basing it on 5.17 aiming for a 5.19 merge window.
--
Mel Gorman
SUSE Labs
Thread overview: 25+ messages
2022-02-08 10:07 [PATCH 0/2] mm/page_alloc: Remote per-cpu lists drain support Nicolas Saenz Julienne
2022-02-08 10:07 ` [PATCH 1/2] mm/page_alloc: Access lists in 'struct per_cpu_pages' indirectly Nicolas Saenz Julienne
2022-03-03 14:33 ` Marcelo Tosatti
2022-02-08 10:07 ` [PATCH 2/2] mm/page_alloc: Add remote draining support to per-cpu lists Nicolas Saenz Julienne
2022-02-08 15:47 ` Marcelo Tosatti
2022-02-15 8:47 ` Nicolas Saenz Julienne
2022-02-15 17:32 ` Paul E. McKenney
2022-02-09 8:55 ` [PATCH 0/2] mm/page_alloc: Remote per-cpu lists drain support Xiongfeng Wang
2022-02-09 9:45 ` Nicolas Saenz Julienne
2022-02-09 11:26 ` Xiongfeng Wang
2022-02-09 11:36 ` Nicolas Saenz Julienne
2022-02-10 10:59 ` Xiongfeng Wang
2022-02-10 11:04 ` Nicolas Saenz Julienne
2022-03-03 11:45 ` Mel Gorman
2022-03-07 13:57 ` Nicolas Saenz Julienne
2022-03-10 16:31 ` Mel Gorman
2022-03-07 20:47 ` Marcelo Tosatti
2022-03-24 18:59 ` Nicolas Saenz Julienne
2022-03-25 10:48 ` Mel Gorman
2022-03-28 13:51 ` Nicolas Saenz Julienne
2022-03-29 9:45 ` Mel Gorman [this message]
2022-03-30 11:29 ` Nicolas Saenz Julienne
2022-03-31 15:24 ` Mel Gorman
2022-03-03 13:27 ` Vlastimil Babka
2022-03-03 14:10 ` Nicolas Saenz Julienne