From: viresh kumar <viresh.kumar@linaro.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Christoph Lameter <cl@linux.com>,
Linaro Kernel Mailman List <linaro-kernel@lists.linaro.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
vinmenon@codeaurora.org, shashim@codeaurora.org,
Michal Hocko <mhocko@suse.cz>, Mel Gorman <mgorman@suse.de>,
dave@stgolabs.net, Konstantin Khlebnikov <koct9i@gmail.com>,
Linux Memory Management List <linux-mm@kvack.org>,
Suresh Siddha <suresh.b.siddha@intel.com>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
Date: Sat, 28 Mar 2015 17:27:23 +0530 [thread overview]
Message-ID: <55169723.3070006@linaro.org> (raw)
In-Reply-To: <20150328095322.GH27490@worktop.programming.kicks-ass.net>
On 28 March 2015 at 15:23, Peter Zijlstra <peterz@infradead.org> wrote:
> Well, for one your patch is indeed disgusting.
Yeah, I agree :)
> But yes I'm aware Thomas
> wants to rewrite the timer thing. But Thomas is away for a little while
> and if this really needs to happen then it does.
Sometime back I was trying to use another bit from base pointer for
marking a timer as PINNED:
diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197e1587..e7184f57449c 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -67,8 +67,9 @@ extern struct tvec_base boot_tvec_bases;
*/
#define TIMER_DEFERRABLE 0x1LU
#define TIMER_IRQSAFE 0x2LU
+#define TIMER_PINNED 0x4LU
-#define TIMER_FLAG_MASK 0x3LU
+#define TIMER_FLAG_MASK 0x7LU
And Fenguang's build-bot showed the problem (only) on blackfin [1].
config: make ARCH=blackfin allyesconfig
All error/warnings:
kernel/timer.c: In function 'init_timers':
>> kernel/timer.c:1683:2: error: call to '__compiletime_assert_1683'
>> declared with attribute error: BUILD_BUG_ON failed:
>> __alignof__(struct tvec_base) & TIMER_FLAG_MASK
So probably we need to make 'base' aligned to 8 bytes ?
So, what you are suggesting is something like this (untested):
diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197e1587..68bf09d69352 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -67,8 +67,9 @@ extern struct tvec_base boot_tvec_bases;
*/
#define TIMER_DEFERRABLE 0x1LU
#define TIMER_IRQSAFE 0x2LU
+#define TIMER_RUNNING 0x4LU
-#define TIMER_FLAG_MASK 0x3LU
+#define TIMER_FLAG_MASK 0x7LU
#define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \
.entry = { .prev = TIMER_ENTRY_STATIC }, \
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c504939..8f9efa64bd34 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -105,6 +105,21 @@ static inline unsigned int tbase_get_irqsafe(struct tvec_base *base)
return ((unsigned int)(unsigned long)base & TIMER_IRQSAFE);
}
+static inline unsigned int tbase_get_running(struct tvec_base *base)
+{
+ return ((unsigned int)(unsigned long)base & TIMER_RUNNING);
+}
+
+static inline unsigned int tbase_set_running(struct tvec_base *base)
+{
+ return ((unsigned int)(unsigned long)base | TIMER_RUNNING);
+}
+
+static inline unsigned int tbase_clear_running(struct tvec_base *base)
+{
+ return ((unsigned int)(unsigned long)base & ~TIMER_RUNNING);
+}
+
static inline struct tvec_base *tbase_get_base(struct tvec_base *base)
{
return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK));
@@ -781,21 +796,12 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
new_base = per_cpu(tvec_bases, cpu);
if (base != new_base) {
- /*
- * We are trying to schedule the timer on the local CPU.
- * However we can't change timer's base while it is running,
- * otherwise del_timer_sync() can't detect that the timer's
- * handler yet has not finished. This also guarantees that
- * the timer is serialized wrt itself.
- */
- if (likely(base->running_timer != timer)) {
- /* See the comment in lock_timer_base() */
- timer_set_base(timer, NULL);
- spin_unlock(&base->lock);
- base = new_base;
- spin_lock(&base->lock);
- timer_set_base(timer, base);
- }
+ /* See the comment in lock_timer_base() */
+ timer_set_base(timer, NULL);
+ spin_unlock(&base->lock);
+ base = new_base;
+ spin_lock(&base->lock);
+ timer_set_base(timer, base);
}
timer->expires = expires;
@@ -1016,7 +1022,7 @@ int try_to_del_timer_sync(struct timer_list *timer)
base = lock_timer_base(timer, &flags);
- if (base->running_timer != timer) {
+ if (tbase_get_running(timer->base)) {
timer_stats_timer_clear_start_info(timer);
ret = detach_if_pending(timer, base, true);
}
@@ -1202,6 +1208,7 @@ static inline void __run_timers(struct tvec_base *base)
timer_stats_account_timer(timer);
base->running_timer = timer;
+ tbase_set_running(timer->base);
detach_expired_timer(timer, base);
if (irqsafe) {
@@ -1216,6 +1223,7 @@ static inline void __run_timers(struct tvec_base *base)
}
}
base->running_timer = NULL;
+ tbase_clear_running(timer->base);
spin_unlock_irq(&base->lock);
}
------------x--------------------x----------------------
Right?
Now there are few issues I see here (Sorry if they are all imaginary):
- In case a timer re-arms itself from its handler and is migrated from CPU A to B, what
happens if the re-armed timer fires before the first handler finishes ? i.e. timer->fn()
hasn't finished running on CPU A and it has fired again on CPU B. Wouldn't this expose
us to a lot of other problems? It wouldn't be serialized to itself anymore ?
- Because the timer has migrated to another CPU, the locking in __run_timers()
needs to be fixed. And that will make it complicated ..
- __run_timer() doesn't lock bases of other CPUs, and it has to do it now..
- We probably need to take locks of both local CPU and the one to which timer migrated.
- Its possible now that there can be more than one running timer for a base, which wasn't
true earlier. Not sure if it will break something.
Thanks for your continuous support to reply to my (sometimes stupid) queries.
--
viresh
[1] https://lists.01.org/pipermail/kbuild-all/2014-April/003982.html
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: viresh kumar <viresh.kumar@linaro.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Christoph Lameter <cl@linux.com>,
Linaro Kernel Mailman List <linaro-kernel@lists.linaro.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
vinmenon@codeaurora.org, shashim@codeaurora.org,
Michal Hocko <mhocko@suse.cz>, Mel Gorman <mgorman@suse.de>,
dave@stgolabs.net, Konstantin Khlebnikov <koct9i@gmail.com>,
Linux Memory Management List <linux-mm@kvack.org>,
Suresh Siddha <suresh.b.siddha@intel.com>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work
Date: Sat, 28 Mar 2015 17:27:23 +0530 [thread overview]
Message-ID: <55169723.3070006@linaro.org> (raw)
In-Reply-To: <20150328095322.GH27490@worktop.programming.kicks-ass.net>
On 28 March 2015 at 15:23, Peter Zijlstra <peterz@infradead.org> wrote:
> Well, for one your patch is indeed disgusting.
Yeah, I agree :)
> But yes I'm aware Thomas
> wants to rewrite the timer thing. But Thomas is away for a little while
> and if this really needs to happen then it does.
Sometime back I was trying to use another bit from base pointer for
marking a timer as PINNED:
diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197e1587..e7184f57449c 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -67,8 +67,9 @@ extern struct tvec_base boot_tvec_bases;
*/
#define TIMER_DEFERRABLE 0x1LU
#define TIMER_IRQSAFE 0x2LU
+#define TIMER_PINNED 0x4LU
-#define TIMER_FLAG_MASK 0x3LU
+#define TIMER_FLAG_MASK 0x7LU
And Fenguang's build-bot showed the problem (only) on blackfin [1].
config: make ARCH=blackfin allyesconfig
All error/warnings:
kernel/timer.c: In function 'init_timers':
>> kernel/timer.c:1683:2: error: call to '__compiletime_assert_1683'
>> declared with attribute error: BUILD_BUG_ON failed:
>> __alignof__(struct tvec_base) & TIMER_FLAG_MASK
So probably we need to make 'base' aligned to 8 bytes ?
So, what you are suggesting is something like this (untested):
diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197e1587..68bf09d69352 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -67,8 +67,9 @@ extern struct tvec_base boot_tvec_bases;
*/
#define TIMER_DEFERRABLE 0x1LU
#define TIMER_IRQSAFE 0x2LU
+#define TIMER_RUNNING 0x4LU
-#define TIMER_FLAG_MASK 0x3LU
+#define TIMER_FLAG_MASK 0x7LU
#define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \
.entry = { .prev = TIMER_ENTRY_STATIC }, \
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c504939..8f9efa64bd34 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -105,6 +105,21 @@ static inline unsigned int tbase_get_irqsafe(struct tvec_base *base)
return ((unsigned int)(unsigned long)base & TIMER_IRQSAFE);
}
+static inline unsigned int tbase_get_running(struct tvec_base *base)
+{
+ return ((unsigned int)(unsigned long)base & TIMER_RUNNING);
+}
+
+static inline unsigned int tbase_set_running(struct tvec_base *base)
+{
+ return ((unsigned int)(unsigned long)base | TIMER_RUNNING);
+}
+
+static inline unsigned int tbase_clear_running(struct tvec_base *base)
+{
+ return ((unsigned int)(unsigned long)base & ~TIMER_RUNNING);
+}
+
static inline struct tvec_base *tbase_get_base(struct tvec_base *base)
{
return ((struct tvec_base *)((unsigned long)base & ~TIMER_FLAG_MASK));
@@ -781,21 +796,12 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
new_base = per_cpu(tvec_bases, cpu);
if (base != new_base) {
- /*
- * We are trying to schedule the timer on the local CPU.
- * However we can't change timer's base while it is running,
- * otherwise del_timer_sync() can't detect that the timer's
- * handler yet has not finished. This also guarantees that
- * the timer is serialized wrt itself.
- */
- if (likely(base->running_timer != timer)) {
- /* See the comment in lock_timer_base() */
- timer_set_base(timer, NULL);
- spin_unlock(&base->lock);
- base = new_base;
- spin_lock(&base->lock);
- timer_set_base(timer, base);
- }
+ /* See the comment in lock_timer_base() */
+ timer_set_base(timer, NULL);
+ spin_unlock(&base->lock);
+ base = new_base;
+ spin_lock(&base->lock);
+ timer_set_base(timer, base);
}
timer->expires = expires;
@@ -1016,7 +1022,7 @@ int try_to_del_timer_sync(struct timer_list *timer)
base = lock_timer_base(timer, &flags);
- if (base->running_timer != timer) {
+ if (tbase_get_running(timer->base)) {
timer_stats_timer_clear_start_info(timer);
ret = detach_if_pending(timer, base, true);
}
@@ -1202,6 +1208,7 @@ static inline void __run_timers(struct tvec_base *base)
timer_stats_account_timer(timer);
base->running_timer = timer;
+ tbase_set_running(timer->base);
detach_expired_timer(timer, base);
if (irqsafe) {
@@ -1216,6 +1223,7 @@ static inline void __run_timers(struct tvec_base *base)
}
}
base->running_timer = NULL;
+ tbase_clear_running(timer->base);
spin_unlock_irq(&base->lock);
}
------------x--------------------x----------------------
Right?
Now there are few issues I see here (Sorry if they are all imaginary):
- In case a timer re-arms itself from its handler and is migrated from CPU A to B, what
happens if the re-armed timer fires before the first handler finishes ? i.e. timer->fn()
hasn't finished running on CPU A and it has fired again on CPU B. Wouldn't this expose
us to a lot of other problems? It wouldn't be serialized to itself anymore ?
- Because the timer has migrated to another CPU, the locking in __run_timers()
needs to be fixed. And that will make it complicated ..
- __run_timer() doesn't lock bases of other CPUs, and it has to do it now..
- We probably need to take locks of both local CPU and the one to which timer migrated.
- Its possible now that there can be more than one running timer for a base, which wasn't
true earlier. Not sure if it will break something.
Thanks for your continuous support to reply to my (sometimes stupid) queries.
--
viresh
[1] https://lists.01.org/pipermail/kbuild-all/2014-April/003982.html
next prev parent reply other threads:[~2015-03-28 11:57 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-03-26 5:39 [RFC] vmstat: Avoid waking up idle-cpu to service shepherd work Viresh Kumar
2015-03-26 5:39 ` Viresh Kumar
2015-03-26 20:18 ` Andrew Morton
2015-03-26 20:18 ` Andrew Morton
2015-03-27 4:49 ` Viresh Kumar
2015-03-27 4:49 ` Viresh Kumar
2015-03-27 9:16 ` Peter Zijlstra
2015-03-27 9:16 ` Peter Zijlstra
2015-03-27 9:30 ` Peter Zijlstra
2015-03-27 9:30 ` Peter Zijlstra
2015-03-27 11:11 ` Christoph Lameter
2015-03-27 11:11 ` Christoph Lameter
2015-03-27 12:02 ` Peter Zijlstra
2015-03-27 12:02 ` Peter Zijlstra
2015-03-27 19:45 ` Christoph Lameter
2015-03-27 19:45 ` Christoph Lameter
2015-03-28 4:28 ` Viresh Kumar
2015-03-28 4:28 ` Viresh Kumar
2015-03-28 11:41 ` Peter Zijlstra
2015-03-28 11:41 ` Peter Zijlstra
2015-03-28 4:18 ` Viresh Kumar
2015-03-28 4:18 ` Viresh Kumar
2015-03-28 9:53 ` Peter Zijlstra
2015-03-28 9:53 ` Peter Zijlstra
2015-03-28 11:57 ` viresh kumar [this message]
2015-03-28 11:57 ` viresh kumar
2015-03-28 12:04 ` Viresh Kumar
2015-03-28 12:04 ` Viresh Kumar
2015-03-28 13:44 ` Peter Zijlstra
2015-03-28 13:44 ` Peter Zijlstra
2015-03-29 10:24 ` Peter Zijlstra
2015-03-29 10:24 ` Peter Zijlstra
2015-03-30 12:02 ` Viresh Kumar
2015-03-30 12:02 ` Viresh Kumar
2015-03-30 12:47 ` Peter Zijlstra
2015-03-30 12:47 ` Peter Zijlstra
2015-03-30 13:14 ` Viresh Kumar
2015-03-30 13:14 ` Viresh Kumar
2015-03-30 13:59 ` Peter Zijlstra
2015-03-30 13:59 ` Peter Zijlstra
2015-03-30 16:17 ` Viresh Kumar
2015-03-30 16:17 ` Viresh Kumar
2015-03-30 16:25 ` Peter Zijlstra
2015-03-30 16:25 ` Peter Zijlstra
2015-03-29 12:01 ` Viresh Kumar
2015-03-29 12:01 ` Viresh Kumar
2015-03-29 17:24 ` Peter Zijlstra
2015-03-29 17:24 ` Peter Zijlstra
2015-03-30 15:08 ` Michal Hocko
2015-03-30 15:08 ` Michal Hocko
2015-03-30 15:14 ` Peter Zijlstra
2015-03-30 15:14 ` Peter Zijlstra
2015-03-30 15:42 ` Christoph Lameter
2015-03-30 15:42 ` Christoph Lameter
2015-03-27 14:19 ` Michal Hocko
2015-03-27 14:19 ` Michal Hocko
2015-03-28 4:34 ` Viresh Kumar
2015-03-28 4:34 ` Viresh Kumar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55169723.3070006@linaro.org \
--to=viresh.kumar@linaro.org \
--cc=akpm@linux-foundation.org \
--cc=cl@linux.com \
--cc=dave@stgolabs.net \
--cc=hannes@cmpxchg.org \
--cc=koct9i@gmail.com \
--cc=linaro-kernel@lists.linaro.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mhocko@suse.cz \
--cc=peterz@infradead.org \
--cc=shashim@codeaurora.org \
--cc=suresh.b.siddha@intel.com \
--cc=tglx@linutronix.de \
--cc=vinmenon@codeaurora.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.