From: Johannes Weiner <hannes@cmpxchg.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
kernel-team@fb.com
Subject: Re: [PATCH 3/3] mm/sched: memdelay: memory health interface for systems and workloads
Date: Tue, 1 Aug 2017 08:26:34 -0400
Message-ID: <20170801122634.GA7237@cmpxchg.org>
In-Reply-To: <20170801075728.GE6524@worktop.programming.kicks-ass.net>
On Tue, Aug 01, 2017 at 09:57:28AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 31, 2017 at 02:41:42PM -0400, Johannes Weiner wrote:
> > On Mon, Jul 31, 2017 at 10:31:11AM +0200, Peter Zijlstra wrote:
>
> > > So could you start by describing what actual statistics we need? Because
> > > as is the scheduler already does a gazillion stats and why can't we
> > > repurpose some of those?
> >
> > If that's possible, that would be great of course.
> >
> > We want to be able to tell how many tasks in a domain (the system or a
> > memory cgroup) are inside a memdelay section as opposed to how many
>
> And you haven't even defined wth a memdelay section is yet..
It's what a task is in after it calls memdelay_enter() and before it
calls memdelay_leave().
Tasks mark themselves as being in a memdelay section when they know
they are performing work that is only necessary due to a lack of
memory, such as waiting for a refault or running direct reclaim.
From the patch:
+/**
+ * memdelay_enter - mark the beginning of a memory delay section
+ * @flags: flags to handle nested memdelay sections
+ *
+ * Marks the calling task as being delayed due to a lack of memory,
+ * such as waiting for a workingset refault or performing reclaim.
+ */
+/**
+ * memdelay_leave - mark the end of a memory delay section
+ * @flags: flags to handle nested memdelay sections
+ *
+ * Marks the calling task as no longer delayed due to memory.
+ */
where a reclaim callsite looks like this (decluttered):
	memdelay_enter()
	nr_reclaimed = do_try_to_free_pages()
	memdelay_leave()
That's what defines the "unproductive due to lack of memory" state of
a task. Time spent in that state weighed against time spent while the
task is productive - runnable or in iowait while not in a memdelay
section - gives the memory health of the task. And the system and
cgroup states/health can be derived from task states as described:
> > are in a "productive" state such as runnable or iowait. Then derive
> > from that whether the domain as a whole is unproductive (all non-idle
> > tasks memdelayed), or partially unproductive (some delayed, but CPUs
> > are productive or there are iowait tasks). Then derive the percentages
> > of walltime the domain spends partially or fully unproductive.
> >
> > For that we need per-domain counters for
> >
> > 1) nr of tasks in memdelay sections
> > 2) nr of iowait or runnable/queued tasks that are NOT inside
> > memdelay sections
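
To make the derivation concrete, here is a rough userspace sketch of how
the two per-domain counters could map to a domain state and then to
walltime percentages. This is not code from the patch: every name below
(domain_counters, classify, account, pct) is made up for illustration,
and it assumes the state is re-evaluated on each counter change with the
elapsed time charged to the state that was just left.

```c
#include <stdint.h>

/* Hypothetical names throughout; none of this is taken from the patch. */
enum domain_state {
	DOMAIN_NONE,	/* no task is inside a memdelay section */
	DOMAIN_SOME,	/* some tasks delayed, others still productive */
	DOMAIN_FULL,	/* all non-idle tasks are delayed */
	NR_DOMAIN_STATES,
};

struct domain_counters {
	unsigned int nr_delayed;	/* 1) tasks inside memdelay sections */
	unsigned int nr_productive;	/* 2) iowait or runnable tasks outside them */
};

/* Derive the domain state from the two counters. */
static enum domain_state classify(const struct domain_counters *c)
{
	if (!c->nr_delayed)
		return DOMAIN_NONE;
	return c->nr_productive ? DOMAIN_SOME : DOMAIN_FULL;
}

struct domain_times {
	uint64_t ns[NR_DOMAIN_STATES];	/* walltime accumulated per state */
};

/* Charge the time since the last counter change to the state just left. */
static void account(struct domain_times *t, enum domain_state prev,
		    uint64_t delta_ns)
{
	t->ns[prev] += delta_ns;
}

/* Share of total walltime spent in state s, as an integer percentage. */
static unsigned int pct(const struct domain_times *t, enum domain_state s)
{
	uint64_t total = 0;
	int i;

	for (i = 0; i < NR_DOMAIN_STATES; i++)
		total += t->ns[i];
	return total ? (unsigned int)(t->ns[s] * 100 / total) : 0;
}
```

With counters like these, pct(&times, DOMAIN_SOME) and
pct(&times, DOMAIN_FULL) would correspond to the "partially" and "fully
unproductive" percentages described above.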