From: Hugh Dickins <hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
To: Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
Alex Shi
<alex.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
mgorman-3eNAlZScCAx27rWaFMvyedHuzzzSOjJt@public.gmane.org,
tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
khlebnikov-XoJtRXgx1JseBXzfvpsJ4g@public.gmane.org,
daniel.m.jordan-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
yang.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org,
willy-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org,
lkp-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
Fengguang Wu
<fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
Rong Chen <rong.a.chen-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Subject: Re: [PATCH v9 00/21] per lruvec lru_lock for memcg
Date: Thu, 5 Mar 2020 18:15:40 -0800 (PST)
Message-ID: <alpine.LSU.2.11.2003051642580.1190@eggly.anvils>
In-Reply-To: <59634b5f-b1b2-3b1d-d845-fd15565fcad4-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
On Tue, 3 Mar 2020, Alex Shi wrote:
> On 2020/3/3 6:12 AM, Andrew Morton wrote:
> >> Thanks for Testing support from Intel 0day and Rong Chen, Fengguang Wu,
> >> and Yun Wang.
> > I'm not seeing a lot of evidence of review and test activity yet. But
> > I think I'll grab patches 01-06 as they look like fairly
> > straightforward improvements.
>
> cc Fengguang and Rong Chen
>
> I did some local functional testing and kselftest, they all look fine.
> 0day only warn me if some case failed. Is it no news is good news? :)
And now the bad news.

Andrew, please revert those six (or seven as they ended up in mmotm).
5.6-rc4-mm1 without them runs my tmpfs+loop+swapping+memcg+ksm kernel
build loads fine (did four hours just now), but 5.6-rc4-mm1 itself
crashed just after starting - seconds or minutes I didn't see,
but it did not complete an iteration.

I thought maybe those six would be harmless (though I've not looked
at them at all); but knew already that the full series is not good yet:
I gave it a try over 5.6-rc4 on Monday, and crashed very soon on simpler
testing, in different ways from what hits mmotm.

The first thing wrong with the full set was when I tried tmpfs+loop+
swapping kernel builds in "mem=700M cgroup_disable=memory", of course
with CONFIG_DEBUG_LIST=y. That soon collapsed in a splurge of OOM kills
and list_del corruption messages: __list_del_entry_valid < list_del <
__page_cache_release < __put_page < put_page < __try_to_reclaim_swap <
free_swap_and_cache < shmem_free_swap < shmem_undo_range.

When I next tried with "mem=1G" and memcg enabled (but not being used),
that managed some iterations, no OOM kills, no list_del warnings (was
it swapping? perhaps, perhaps not, I was trying to go easy on it just
to see if "cgroup_disable=memory" had been the problem); but when
rebooting after that, again list_del corruption messages and crash
(I didn't note them down).

So I didn't take much notice of what the mmotm crash backtrace showed
(but IIRC shmem and swap were in it).

Alex, I'm afraid you're focusing too much on performance results,
without doing the basic testing needed - I thought we had given you
some hints on the challenging areas (swapping, move_charge_at_immigrate,
page migration) when we attached a *correctly working* 5.3 version back
on 23rd August:

https://lore.kernel.org/linux-mm/alpine.LSU.2.11.1908231736001.16920@eggly.anvils/

(Correctly working, except missing two patches I'd mistakenly dropped
as unnecessary in earlier rebases: but our discussions with Johannes
later showed to be very necessary, though their races rarely seen.)

I have not had the time (and do not expect to have the time) to review
your series: maybe it's one or two small fixes away from being complete,
or maybe it's still fundamentally flawed, I do not know. I had naively
hoped that you would help with a patchset that worked, rather than
cutting it down into something which does not.

Submitting your series to routine testing is much easier for me than
reviewing it: but then, yes, it's a pity that I don't find the time
to report the results on intervening versions, which also crashed.

What I have to do now, is set aside time today and tomorrow, to package
up the old scripts I use, describe them and their environment, and send
them to you (cc akpm in case I fall under a bus): so that you can
reproduce the crashes for yourself, and get to work on them.

Hugh