From: Peter Zijlstra <peterz@infradead.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ingo Molnar <mingo@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
kernel-team@fb.com
Subject: Re: [PATCH 3/3] mm/sched: memdelay: memory health interface for systems and workloads
Date: Sat, 29 Jul 2017 11:10:55 +0200
Message-ID: <20170729091055.GA6524@worktop.programming.kicks-ass.net>
In-Reply-To: <20170727153010.23347-4-hannes@cmpxchg.org>
So no, this doesn't have a chance in hell of making it.
On Thu, Jul 27, 2017 at 11:30:10AM -0400, Johannes Weiner wrote:
> +static void domain_cpu_update(struct memdelay_domain *md, int cpu,
> + int old, int new)
> +{
> + enum memdelay_domain_state state;
> + struct memdelay_domain_cpu *mdc;
> + unsigned long now, delta;
> + unsigned long flags;
> +
> + mdc = per_cpu_ptr(md->mdcs, cpu);
> + spin_lock_irqsave(&mdc->lock, flags);
Afaict this is inside scheduler locks, this cannot be a spinlock. Also,
do we really want to add more atomics there?
> + if (old) {
> + WARN_ONCE(!mdc->tasks[old], "cpu=%d old=%d new=%d counter=%d\n",
> + cpu, old, new, mdc->tasks[old]);
> + mdc->tasks[old] -= 1;
> + }
> + if (new)
> + mdc->tasks[new] += 1;
> +
> + /*
> + * The domain is somewhat delayed when a number of tasks are
> + * delayed but there are still others running the workload.
> + *
> + * The domain is fully delayed when all non-idle tasks on the
> + * CPU are delayed, or when a delayed task is actively running
> + * and preventing productive tasks from making headway.
> + *
> + * The state times then add up over all CPUs in the domain: if
> + * the domain is fully blocked on one CPU and there is another
> + * one running the workload, the domain is considered fully
> + * blocked 50% of the time.
> + */
> + if (!mdc->tasks[MTS_DELAYED_ACTIVE] && !mdc->tasks[MTS_DELAYED])
> + state = MDS_NONE;
> + else if (mdc->tasks[MTS_WORKING])
> + state = MDS_SOME;
> + else
> + state = MDS_FULL;
> +
> + if (mdc->state == state)
> + goto unlock;
> +
> + now = ktime_to_ns(ktime_get());
ktime_get_ns(), also no ktime in scheduler code.
> + delta = now - mdc->state_start;
> +
> + domain_move_clock(md);
> + md->times[mdc->state] += delta;
> +
> + mdc->state = state;
> + mdc->state_start = now;
> +unlock:
> + spin_unlock_irqrestore(&mdc->lock, flags);
> +}
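For illustration only (not part of the patch): the state selection in the
function above is a pure function of the per-CPU task counters. Below is a
minimal userspace sketch of that rule. The enum names are taken from the
quoted code; their numeric order, the NR_MTS bound, and the helper name
classify() are assumptions made for this example.

```c
/* Task states; names from the patch, ordering and NR_MTS assumed. */
enum memdelay_task_state {
	MTS_NONE,
	MTS_WORKING,
	MTS_DELAYED,
	MTS_DELAYED_ACTIVE,
	NR_MTS
};

/* Domain states; names from the patch, ordering assumed. */
enum memdelay_domain_state {
	MDS_NONE,
	MDS_SOME,
	MDS_FULL
};

/*
 * classify() is a hypothetical helper. It mirrors the if/else chain in
 * domain_cpu_update(): no delayed tasks means MDS_NONE, delayed tasks
 * alongside at least one working task means MDS_SOME, and delayed tasks
 * with nothing working means MDS_FULL.
 */
static enum memdelay_domain_state classify(const int tasks[NR_MTS])
{
	if (!tasks[MTS_DELAYED_ACTIVE] && !tasks[MTS_DELAYED])
		return MDS_NONE;
	if (tasks[MTS_WORKING])
		return MDS_SOME;
	return MDS_FULL;
}
```

With one working and one delayed task on a CPU this yields MDS_SOME; with
only a delayed task it yields MDS_FULL.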
> +
> +static struct memdelay_domain *memcg_domain(struct mem_cgroup *memcg)
> +{
> +#ifdef CONFIG_MEMCG
> + if (!mem_cgroup_disabled())
> + return memcg->memdelay_domain;
> +#endif
> + return &memdelay_global_domain;
> +}
> +
> +/**
> + * memdelay_task_change - note a task changing its delay/work state
> + * @task: the task changing state
> + * @delayed: 1 when task enters delayed state, -1 when it leaves
> + * @working: 1 when task enters working state, -1 when it leaves
> + * @active_delay: 1 when task enters active delay, -1 when it leaves
> + *
> + * Updates the task's domain counters to reflect a change in the
> + * task's delayed/working state.
> + */
> +void memdelay_task_change(struct task_struct *task, int old, int new)
> +{
> + int cpu = task_cpu(task);
> + struct mem_cgroup *memcg;
> + unsigned long delay = 0;
> +
> +#ifdef CONFIG_DEBUG_VM
> + WARN_ONCE(task->memdelay_state != old,
> + "cpu=%d task=%p state=%d (in_iowait=%d PF_MEMDELAYED=%d) old=%d new=%d\n",
> + cpu, task, task->memdelay_state, task->in_iowait,
> + !!(task->flags & PF_MEMDELAY), old, new);
> + task->memdelay_state = new;
> +#endif
> +
> + /* Account when tasks are entering and leaving delays */
> + if (old < MTS_DELAYED && new >= MTS_DELAYED) {
> + task->memdelay_start = ktime_to_ms(ktime_get());
> + } else if (old >= MTS_DELAYED && new < MTS_DELAYED) {
> + delay = ktime_to_ms(ktime_get()) - task->memdelay_start;
> + task->memdelay_total += delay;
> + }
Scheduler stuff will _NOT_ use ktime_get() and will _NOT_ do pointless
divisions into ms.
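A rough sketch of what that could look like, written as plain userspace C so
it stands alone: keep the start timestamp and the running total in
nanoseconds, and divide only when a reader wants milliseconds. The struct and
function names here are hypothetical, and the caller passes in the timestamp;
in the kernel it would presumably come from a clock the scheduler already
maintains rather than from ktime_get().

```c
#include <stdint.h>

/* Hypothetical per-task accounting, kept entirely in nanoseconds. */
struct delay_acct {
	uint64_t start_ns;	/* when the current delay began */
	uint64_t total_ns;	/* accumulated delay time */
};

/* Task enters a delayed state: just record the timestamp. */
static void delay_enter(struct delay_acct *a, uint64_t now_ns)
{
	a->start_ns = now_ns;
}

/* Task leaves a delayed state: one subtraction and one addition. */
static uint64_t delay_leave(struct delay_acct *a, uint64_t now_ns)
{
	uint64_t delta = now_ns - a->start_ns;

	a->total_ns += delta;
	return delta;
}

/* Slow path only (e.g. a reporting interface): convert on read. */
static uint64_t delay_total_ms(const struct delay_acct *a)
{
	return a->total_ns / 1000000ULL;
}
```

The state-change path then does no division at all; the conversion cost moves
to whoever reads the value.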
> +
> + /* Account domain state changes */
> + rcu_read_lock();
> + memcg = mem_cgroup_from_task(task);
> + do {
> + struct memdelay_domain *md;
> +
> + md = memcg_domain(memcg);
> + md->aggregate += delay;
> + domain_cpu_update(md, cpu, old, new);
> + } while (memcg && (memcg = parent_mem_cgroup(memcg)));
> + rcu_read_unlock();
We are _NOT_ going to do a 3rd cgroup iteration for every task action.
> +};