From: "Ruslan Ruslichenko -X (rruslich - GLOBALLOGIC INC at Cisco)" <rruslich@cisco.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Taras Kondratiuk <takondra@cisco.com>,
Michal Hocko <mhocko@kernel.org>,
linux-mm@kvack.org, xe-linux-external@cisco.com,
linux-kernel@vger.kernel.org
Subject: Re: Detecting page cache trashing state
Date: Fri, 27 Oct 2017 23:19:02 +0300
Message-ID: <d7bc14d7-5ae4-f16d-da38-2bc36d9deae8@cisco.com>
In-Reply-To: <20171025175424.GA14039@cmpxchg.org>

Hi Johannes,

On 10/25/2017 08:54 PM, Johannes Weiner wrote:
> Hi Ruslan,
>
> sorry about the delayed response, I missed the new activity in this
> older thread.
>
> On Thu, Sep 28, 2017 at 06:49:07PM +0300, Ruslan Ruslichenko -X (rruslich - GLOBALLOGIC INC at Cisco) wrote:
>> Hi Johannes,
>>
>> I was able to rebase the patch on top of v4.9.26 (the latest version we
>> support right now) and test it a bit.
>> The overall idea definitely looks promising, although I have one question
>> about usage.
>> Will it be able to account for the time that processes spend handling
>> major page faults (including fs and iowait time) on refaulting pages?
> That's the main thing it should measure! :)
>
> The lock_page() and wait_on_page_locked() calls are where iowaits
> happen on a cache miss. If those are refaults, they'll be counted.
>
>> We have one big application whose code occupies a large share of the
>> page cache. When the system is under heavy memory pressure and reclaims
>> some of it, the application starts thrashing constantly. Since its code
>> is placed on squashfs, it spends all of its CPU time decompressing pages,
>> and it seems the memdelay counters are not detecting this situation.
>> Here are some counters to indicate this:
>>
>> 19:02:44  CPU  %user  %nice  %system  %iowait  %steal  %idle
>> 19:02:45  all   0.00   0.00   100.00     0.00    0.00   0.00
>>
>> 19:02:44  pgpgin/s  pgpgout/s  fault/s  majflt/s  pgfree/s  pgscank/s  pgscand/s  pgsteal/s  %vmeff
>> 19:02:45  15284.00       0.00   428.00    352.00  19990.00       0.00       0.00   15802.00    0.00
>>
>> And since nobody is actively allocating memory anymore, it looks like the
>> memdelay counters are not being actively incremented:
>>
>> [:~]$ cat /proc/memdelay
>> 268035776
>> 6.13 5.43 3.58
>> 1.90 1.89 1.26
> How does it correlate with /proc/vmstat::workingset_activate during
> that time? It only counts thrashing time of refaults it can actively
> detect.

The workingset counters are growing quite actively too. Here are
some numbers per second:

workingset_refault      8201
workingset_activate      389
workingset_restore       187
workingset_nodereclaim   313
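
For anyone who wants to reproduce per-second rates like the ones above,
here is a minimal sketch, assuming Python 3 is available on the target.
The helper names (parse_vmstat, per_second, sample) are made up for
illustration and are not part of any kernel tooling. It takes two
snapshots of /proc/vmstat and divides the counter deltas by the elapsed
time. Some of these counters may be absent depending on the kernel and
the patches applied, so missing ones are simply skipped.

```python
import time

# Counter names taken from the numbers quoted in this message.
WORKINGSET_KEYS = (
    "workingset_refault",
    "workingset_activate",
    "workingset_restore",
    "workingset_nodereclaim",
)


def parse_vmstat(text):
    """Parse '/proc/vmstat'-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def per_second(before, after, interval, keys=WORKINGSET_KEYS):
    """Return per-second growth for each key present in both snapshots."""
    return {
        key: (after[key] - before[key]) / interval
        for key in keys
        if key in before and key in after
    }


def sample(path="/proc/vmstat"):
    """Read one snapshot of the counters from the given file."""
    with open(path) as f:
        return parse_vmstat(f.read())


if __name__ == "__main__":
    start = time.monotonic()
    first = sample()
    time.sleep(1.0)
    second = sample()
    elapsed = time.monotonic() - start
    for name, rate in per_second(first, second, elapsed).items():
        print(f"{name} {rate:.0f}")
```

Running it in a loop next to "cat /proc/memdelay" would make it easier
to compare the refault rate with the memdelay numbers over the same
interval.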
> Btw, how many CPUs does this system have? There is a bug in this
> version on how idle time is aggregated across multiple CPUs. The error
> compounds with the number of CPUs in the system.

The system has 2 CPU cores.
> I'm attaching 3 bugfixes that go on top of what you have. There might
> be some conflicts, but they should be minor variable naming issues.
>

I will test with your patches and get back to you.

Thanks,
Ruslan