From: Andrea Arcangeli <aarcange@redhat.com>
To: Gavin Guo <gavin.guo@canonical.com>
Cc: Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
Davidlohr Bueso <dave@stgolabs.net>,
linux-mm@kvack.org, Petr Holasek <pholasek@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Arjan van de Ven <arjan@linux.intel.com>,
Jay Vosburgh <jay.vosburgh@canonical.com>
Subject: Re: [PATCH 1/1] ksm: introduce ksm_max_page_sharing per page deduplication limit
Date: Fri, 28 Oct 2016 20:31:13 +0200
Message-ID: <20161028183113.GB4611@redhat.com>
In-Reply-To: <CA+eFSM2WuMYZ8XXo2fJH1SxwTUMRNxAAEgBjrqdhcS4ZMCHMEw@mail.gmail.com>
On Fri, Oct 28, 2016 at 02:26:03PM +0800, Gavin Guo wrote:
> I have tried verifying these patches. However, the default
> max_page_sharing of 256 still suffers the hung task issue. Then, the
> following sequence has been tried to mitigate the symptom. When the
> value is decreased, it took more time to reproduce the symptom.
> Finally, the value 8 has been tried and I didn't continue with lower
> value.
>
> 128 -> 64 -> 32 -> 16 -> 8
>
> The crashdump has also been investigated.

You should also try to capture multiple sysrq+l dumps during the hang.

> stable_node: 0xffff880d36413040 stable_node->hlist->first = 0xffff880e4c9f4cf0
> crash> list hlist_node.next 0xffff880e4c9f4cf0 > rmap_item.lst
>
> $ wc -l rmap_item.lst
> 8 rmap_item.lst
>
> This shows that the list is actually reduced to 8 items. I wondered if the
> loop is still consuming a lot of time and holding the mmap_sem too long.

Even the default of 256 should be enough (certainly with KVM, which
doesn't have a deep anon_vma interval tree).

Perhaps this is an app with a massively large anon_vma interval tree
that uses MADV_MERGEABLE, and not qemu/kvm? However, in that case you'd
run into similar issues with regular anonymous-page rmap walks, so KSM
wouldn't be to blame. The depth of the rmap_item list multiplies the
cost of the rbtree walk by 512, but it still shouldn't freeze for
seconds.
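
A toy cost model of that multiplication follows. It is not kernel code. The function name and the fixed per-VMA cost are hypothetical, and the model assumes each rmap_item triggers one anon_vma interval-tree walk that may visit every VMA in the tree:

```python
def rmap_walk_time_us(rmap_items, vmas_per_anon_vma, us_per_vma=1.0):
    """Toy estimate of one rmap walk over a KSM page, in microseconds.

    Assumption: every rmap_item on the stable node costs one anon_vma
    interval-tree walk, and in the worst case that walk visits every VMA
    in the tree at a fixed (hypothetical) cost of `us_per_vma`.
    """
    if rmap_items < 0 or vmas_per_anon_vma < 0:
        raise ValueError("counts must be non-negative")
    # The list depth multiplies the per-tree cost.
    return rmap_items * vmas_per_anon_vma * us_per_vma
```

For example, `rmap_walk_time_us(512, 4)` gives 2048.0, about 2 ms. So with a shallow, KVM-like tree, even a pessimistic per-VMA cost stays far from a multi-second freeze.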

The important thing here is that the app controls the maximum depth of
the anon_vma interval tree, but it does not control the maximum depth
of the rmap_item list. This is why it's fundamental that the KSM
rmap_item list is bounded to a maximum value. The depth of the interval
tree is a secondary issue, because userland has a chance to optimize
for it: if the app forks deeply and uses MADV_MERGEABLE, that can be
optimized in userland. But I guess the app using MADV_MERGEABLE is
qemu/kvm for you too, so the interval tree can't be too deep.
Furthermore, suppose the symptom triggers and you still get a long hang
even with an rmap_item depth of 8, and it just takes longer to reach
the hanging point. In that case the cause may be something else.
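
The effect of bounding the rmap_item list can be shown with a toy model. Identical pages beyond the cap are assumed to go to another copy of the KSM page, so no single rmap walk sees more than the cap. The function names are hypothetical, and the model ignores how the kernel actually tracks the extra copies:

```python
def split_into_dups(sharers, max_page_sharing):
    """Distribute `sharers` identical pages over KSM page copies so that
    no copy is mapped by more than `max_page_sharing` rmap_items."""
    if max_page_sharing < 2:
        # A cap of 1 would mean no sharing at all, i.e. no deduplication.
        raise ValueError("max_page_sharing must be at least 2")
    full, rest = divmod(sharers, max_page_sharing)
    return [max_page_sharing] * full + ([rest] if rest else [])


def worst_rmap_walk(sharers, max_page_sharing=None):
    """Longest rmap_item list any single rmap walk has to traverse."""
    if max_page_sharing is None:
        # Uncapped: one stable node holds every rmap_item.
        return sharers
    return max(split_into_dups(sharers, max_page_sharing), default=0)
```

With 100000 identical pages, the uncapped case walks all 100000 rmap_items. With a cap of 256, the walk stops at 256 no matter how many more pages merge. Under these assumptions the price is a few more copies of the page, in exchange for a bounded walk.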

I assume this is not an upstream kernel; can you reproduce it on the
upstream kernel? Sorry, but I can't help you any further unless this
is first verified on the upstream kernel.

Also, if you test on the upstream kernel, you can leave the default
value of 256 and then use sysrq+l to get multiple dumps of what's
running on the CPUs. The crash dump is useful as well, but it's also
interesting to see what's running most frequently during the hang. The
single point in time at which the crash dump is taken isn't guaranteed
to show that. If this is a computational complexity issue inside the
kernel, perf top -g may also help to see where most of the CPU time is
being burnt.
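
One way to find what runs most frequently is to tally the topmost frame of every backtrace across several sysrq+l rounds. Below is a rough sketch that assumes the backtraces were already split per CPU and that frames look like `[<addr>] ? func+0x6/0x10`. That format varies between kernel versions, so the regex is an assumption. Frames the unwinder marked with `?` are skipped as unreliable:

```python
import re
from collections import Counter

# Assumed frame format, e.g. " [<ffffffff8106b0c5>] ? native_safe_halt+0x6/0x10"
FRAME_RE = re.compile(r"(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def top_frame(backtrace_lines):
    """Return the first reliable function name of one CPU backtrace."""
    for line in backtrace_lines:
        m = FRAME_RE.search(line)
        if m and not m.group(1):  # skip frames marked with '?'
            return m.group(2)
    return None


def hottest_functions(backtraces, n=3):
    """Count top frames over many samples; most frequent first."""
    counts = Counter(f for f in map(top_frame, backtraces) if f)
    return counts.most_common(n)
```

A single crash dump is just one sample of this distribution. Several rounds of sysrq+l give the counter enough samples to show where the CPUs actually spend their time.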

Note that the problem was reproduced and verified as fixed. It's quite
easy to reproduce: I used the migrate_pages syscall. After deep KSM
merging, it takes several seconds in strace -tt, while with the fix it
stays on the order of milliseconds. The point is that with deeper
merging, migrate_pages (or swapping) could take minutes in unkillable
R state, while with the KSMscale fix it gets capped to milliseconds no
matter what.
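
To read such timings out of a trace, a small helper can pull the duration of each migrate_pages call from `strace -tt` output. This is a sketch with a hypothetical function name. It uses the `<secs>` suffix that `strace -T` appends when present. Otherwise it approximates the duration with the gap to the next timestamped line, which also counts any userspace time in between. It does not handle `-f` output (lines prefixed with `[pid N]`) or traces that wrap past midnight:

```python
import re

# "HH:MM:SS.micro name(" at the start of a strace -tt line
LINE_RE = re.compile(r"^(\d\d):(\d\d):(\d\d)\.(\d+)\s+(\w+)\(")
# "<secs>" suffix added by strace -T
T_RE = re.compile(r"<(\d+\.\d+)>\s*$")


def _seconds(h, m, s, frac):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(frac) / 10 ** len(frac)


def syscall_durations(lines, name="migrate_pages"):
    """Estimate how long each `name` call took in `strace -tt` output."""
    parsed = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            parsed.append((_seconds(*m.group(1, 2, 3, 4)), m.group(5), line))
    out = []
    for i, (ts, call, line) in enumerate(parsed):
        if call != name:
            continue
        t = T_RE.search(line)
        if t:
            out.append(float(t.group(1)))  # exact time from strace -T
        elif i + 1 < len(parsed):
            out.append(parsed[i + 1][0] - ts)  # gap to next line (upper bound)
    return out
```

Running strace with `-T` as well as `-tt` avoids the approximation and gives the per-call time directly.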
Thanks,
Andrea