From: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
To: Hugh Dickins <hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
Linus Torvalds
<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH] mm: remove lock_page_memcg() from rmap
Date: Wed, 30 Nov 2022 17:30:04 -0500 [thread overview]
Message-ID: <Y4fZbFNVckh1g4WO@cmpxchg.org> (raw)
In-Reply-To: <e0918c92-90cd-e3ed-f4e6-92d02062c252-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
On Wed, Nov 30, 2022 at 09:36:15AM -0800, Hugh Dickins wrote:
> On Wed, 30 Nov 2022, Shakeel Butt wrote:
> >
> > 2. For 6.2 (or 6.3), remove the non-present pte migration with some
> > additional text in the warning and do the rmap cleanup.
>
> I just had an idea for softening the impact of that change: a moment's
> more thought may prove it's a terrible idea, but right now I like it.
>
> What if we keep the non-present pte migration throughout the deprecation
> period, but with a change to the where the folio_trylock() is done, and
> a refusal to move the charge on the page of a non-present pte, if that
> page/folio is currently mapped anywhere else - the folio lock preventing
> it from then becoming mapped while in mem_cgroup_move_account().
I would like that better too. Charge moving has always been lossy
(because of trylocking the page, and having to isolate it), but
categorically leaving private swap pages behind seems like a bit much
to sneak in quietly.
> There's an argument that that's a better implementation anyway: that
> we should not interfere with others' pages; but perhaps it would turn
> out to be unimplementable, or would make for less predictable behaviour.
Hm, I think the below should work for swap pages. Do you see anything
obviously wrong with it, or scenarios I haven't considered?
@@ -5637,6 +5645,46 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
* we call find_get_page() with swapper_space directly.
*/
page = find_get_page(swap_address_space(ent), swp_offset(ent));
+
+ /*
+ * Don't move shared charges. This isn't just for saner move
+ * semantics, it also ensures that page_mapped() is stable for
+ * the accounting in mem_cgroup_mapcount().
+ *
+ * We have to serialize against the following paths: fork
+ * (which may copy a page map or a swap pte), fault (which may
+ * change a swap pte into a page map), unmap (which may cause
+ * a page map or a swap pte to disappear), and reclaim (which
+ * may change a page map into a swap pte).
+ *
+ * - Without swapcache, we only want to move the charge if
+ * there are no other swap ptes. With the pte lock, the
+ * swapcount is stable against all of the above scenarios
+ * when it's 1 (our pte), which is the case we care about.
+ *
+ * - When there is a page in swapcache, we only want to move
+ * charges when neither the page nor the swap entry are
+ * mapped elsewhere. The pte lock prevents our pte from
+ * being forked or unmapped. The page lock will stop faults
+ * against, and reclaim of, the swapcache page. So if the
+ * page isn't mapped, and the swap count is 1 (our pte), the
+ * test results are stable and the charge is exclusive.
+ */
+ if (!page && __swap_count(ent) != 1)
+ return NULL;
+
+ if (page) {
+ if (!trylock_page(page)) {
+ put_page(page);
+ return NULL;
+ }
+ if (page_mapped(page) || __swap_count(ent) != 1) {
+ unlock_page(page);
+ put_page(page);
+ return NULL;
+ }
+ }
+
entry->val = ent.val;
return page;
WARNING: multiple messages have this Message-ID (diff)
From: Johannes Weiner <hannes@cmpxchg.org>
To: Hugh Dickins <hughd@google.com>
Cc: Shakeel Butt <shakeelb@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Michal Hocko <mhocko@suse.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: remove lock_page_memcg() from rmap
Date: Wed, 30 Nov 2022 17:30:04 -0500 [thread overview]
Message-ID: <Y4fZbFNVckh1g4WO@cmpxchg.org> (raw)
In-Reply-To: <e0918c92-90cd-e3ed-f4e6-92d02062c252@google.com>
On Wed, Nov 30, 2022 at 09:36:15AM -0800, Hugh Dickins wrote:
> On Wed, 30 Nov 2022, Shakeel Butt wrote:
> >
> > 2. For 6.2 (or 6.3), remove the non-present pte migration with some
> > additional text in the warning and do the rmap cleanup.
>
> I just had an idea for softening the impact of that change: a moment's
> more thought may prove it's a terrible idea, but right now I like it.
>
> What if we keep the non-present pte migration throughout the deprecation
> period, but with a change to the where the folio_trylock() is done, and
> a refusal to move the charge on the page of a non-present pte, if that
> page/folio is currently mapped anywhere else - the folio lock preventing
> it from then becoming mapped while in mem_cgroup_move_account().
I would like that better too. Charge moving has always been lossy
(because of trylocking the page, and having to isolate it), but
categorically leaving private swap pages behind seems like a bit much
to sneak in quietly.
> There's an argument that that's a better implementation anyway: that
> we should not interfere with others' pages; but perhaps it would turn
> out to be unimplementable, or would make for less predictable behaviour.
Hm, I think the below should work for swap pages. Do you see anything
obviously wrong with it, or scenarios I haven't considered?
@@ -5637,6 +5645,46 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
* we call find_get_page() with swapper_space directly.
*/
page = find_get_page(swap_address_space(ent), swp_offset(ent));
+
+ /*
+ * Don't move shared charges. This isn't just for saner move
+ * semantics, it also ensures that page_mapped() is stable for
+ * the accounting in mem_cgroup_mapcount().
+ *
+ * We have to serialize against the following paths: fork
+ * (which may copy a page map or a swap pte), fault (which may
+ * change a swap pte into a page map), unmap (which may cause
+ * a page map or a swap pte to disappear), and reclaim (which
+ * may change a page map into a swap pte).
+ *
+ * - Without swapcache, we only want to move the charge if
+ * there are no other swap ptes. With the pte lock, the
+ * swapcount is stable against all of the above scenarios
+ * when it's 1 (our pte), which is the case we care about.
+ *
+ * - When there is a page in swapcache, we only want to move
+ * charges when neither the page nor the swap entry are
+ * mapped elsewhere. The pte lock prevents our pte from
+ * being forked or unmapped. The page lock will stop faults
+ * against, and reclaim of, the swapcache page. So if the
+ * page isn't mapped, and the swap count is 1 (our pte), the
+ * test results are stable and the charge is exclusive.
+ */
+ if (!page && __swap_count(ent) != 1)
+ return NULL;
+
+ if (page) {
+ if (!trylock_page(page)) {
+ put_page(page);
+ return NULL;
+ }
+ if (page_mapped(page) || __swap_count(ent) != 1) {
+ unlock_page(page);
+ put_page(page);
+ return NULL;
+ }
+ }
+
entry->val = ent.val;
return page;
next prev parent reply other threads:[~2022-11-30 22:30 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-23 18:18 [PATCH] mm: remove lock_page_memcg() from rmap Johannes Weiner
2022-11-23 18:18 ` Johannes Weiner
[not found] ` <20221123181838.1373440-1-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2022-11-23 18:34 ` Shakeel Butt
2022-11-23 18:34 ` Shakeel Butt
2022-11-24 6:03 ` Hugh Dickins
2022-11-24 6:03 ` Hugh Dickins
2022-11-28 16:59 ` Johannes Weiner
[not found] ` <Y4TpCJ+5uCvWE6co-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2022-11-29 19:08 ` Johannes Weiner
2022-11-29 19:08 ` Johannes Weiner
[not found] ` <Y4ZYsrXLBFDIxuoO-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2022-11-29 19:23 ` Linus Torvalds
2022-11-29 19:23 ` Linus Torvalds
2022-11-29 19:42 ` Shakeel Butt
2022-11-29 19:42 ` Shakeel Butt
2022-11-30 7:33 ` Hugh Dickins
2022-11-30 7:33 ` Hugh Dickins
[not found] ` <3659bbe0-ccf2-7feb-5465-b287593aa421-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2022-11-30 16:42 ` Shakeel Butt
2022-11-30 16:42 ` Shakeel Butt
[not found] ` <CALvZod7_FjO-CjzHUpQTsCTm4-68a1eKi_qY=4XdF+g7yMLd4Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2022-11-30 17:36 ` Hugh Dickins
2022-11-30 17:36 ` Hugh Dickins
[not found] ` <e0918c92-90cd-e3ed-f4e6-92d02062c252-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2022-11-30 22:30 ` Johannes Weiner [this message]
2022-11-30 22:30 ` Johannes Weiner
[not found] ` <Y4fZbFNVckh1g4WO-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2022-12-01 0:13 ` Hugh Dickins
2022-12-01 0:13 ` Hugh Dickins
[not found] ` <33f2f836-98a0-b593-1d43-b289d645db5-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2022-12-01 15:52 ` Johannes Weiner
2022-12-01 15:52 ` Johannes Weiner
2022-12-01 19:28 ` Hugh Dickins
2022-11-30 12:50 ` Michal Hocko
2022-11-30 12:50 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y4fZbFNVckh1g4WO@cmpxchg.org \
--to=hannes-druugvl0lcnafugrpc6u6w@public.gmane.org \
--cc=akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
--cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
--cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org \
--cc=mhocko-IBi9RG/b67k@public.gmane.org \
--cc=shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
--cc=torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.