From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754340AbbFEJI2 (ORCPT );
        Fri, 5 Jun 2015 05:08:28 -0400
Received: from forward11h.cmail.yandex.net ([87.250.230.153]:49933 "EHLO
        forward11h.cmail.yandex.net" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1751748AbbFEJIZ (ORCPT );
        Fri, 5 Jun 2015 05:08:25 -0400
X-Greylist: delayed 367 seconds by postgrey-1.27 at vger.kernel.org; Fri, 05 Jun 2015 05:08:24 EDT
From: Kirill Tkhai
To: Peter Zijlstra
Cc: "umgwanakikbuti@gmail.com", "mingo@elte.hu", "ktkhai@parallels.com",
        "rostedt@goodmis.org", "tglx@linutronix.de", "juri.lelli@gmail.com",
        "pang.xunlei@linaro.org", "oleg@redhat.com",
        "wanpeng.li@linux.intel.com", "linux-kernel@vger.kernel.org"
In-Reply-To: <5883311433494921@web27h.yandex.ru>
References: <20150603132903.203333087@infradead.org>
        <20150603134023.156059118@infradead.org>
        <214021433348760@web25g.yandex.ru>
        <20150603211324.GC3644@twins.programming.kicks-ass.net>
        <2134411433408823@web8j.yandex.ru>
        <20150604104902.GH3644@twins.programming.kicks-ass.net>
        <5883311433494921@web27h.yandex.ru>
Subject: Re: [PATCH 8/9] hrtimer: Allow hrtimer::function() to free the timer
MIME-Version: 1.0
Message-Id: <5894071433495014@web27h.yandex.ru>
X-Mailer: Yamail [ http://yandex.ru ] 5.0
Date: Fri, 05 Jun 2015 12:03:34 +0300
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=koi8-r
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

This message is too late, /me going to see new series :)

05.06.2015, 12:02, "Kirill Tkhai" :
> On Thu, 04/06/2015 at 12:49 +0200, Peter Zijlstra wrote:
> On Thu, Jun 04, 2015 at 12:07:03PM +0300, Kirill Tkhai wrote:
>>>> --- a/include/linux/hrtimer.h
>>>> +++ b/include/linux/hrtimer.h
>>>> @@ -391,11 +391,25 @@ extern u64 hrtimer_get_next_event(void);
>>>>   * A timer is active, when it is enqueued into the rbtree or the
>>>>   * callback
function is running or it's in the state of being migrated
>>>>   * to another cpu.
>>>> + *
>>>> + * See __run_hrtimer().
>>>>   */
>>>> -static inline int hrtimer_active(const struct hrtimer *timer)
>>>> +static inline bool hrtimer_active(const struct hrtimer *timer)
>>>>  {
>>>> -        return timer->state != HRTIMER_STATE_INACTIVE ||
>>>> -                timer->base->running == timer;
>>>> +        if (timer->state != HRTIMER_STATE_INACTIVE)
>>>> +                return true;
>>>> +
>>>> +        smp_rmb(); /* C matches A */
>>>> +
>>>> +        if (timer->base->running == timer)
>>>> +                return true;
>>>> +
>>>> +        smp_rmb(); /* D matches B */
>>>> +
>>>> +        if (timer->state != HRTIMER_STATE_INACTIVE)
>>>> +                return true;
>>>> +
>>>> +        return false;
>>> This races with two sequential timer handlers. hrtimer_active()
>>> is preemptible everywhere, and no guarantees that all three "if"
>>> conditions check the same timer tick.
>> Indeed.
>>> How about transformation of hrtimer_bases.lock: raw_spinlock_t --> seqlock_t?
>> Ingo will like that because it means we already need to touch cpu_base.
>>
>> But I think there's a problem there on timer migration, the timer can
>> migrate between bases while we do the seq read loop and then you can get
>> false positives on the different seqcount numbers.
>>
>> We could of course do something like the below, but hrtimer_is_active()
>> is turning into quite the monster.
>>
>> Needs more comments at the very least, its fully of trickery.
>
> Yeah, it's safe for now, but it may be difficult to maintain in the
> future, because the barrier logic is not easy to review. But it seems
> we can simplify it a little. Please see the comments below.
>> ---
>> --- a/include/linux/hrtimer.h
>> +++ b/include/linux/hrtimer.h
>> @@ -59,7 +59,9 @@ enum hrtimer_restart {
>>   * mean touching the timer after the callback, this makes it impossible to free
>>   * the timer from the callback function.
>>   *
>> - * Therefore we track the callback state in timer->base->running == timer.
>> + * Therefore we track the callback state in:
>> + *
>> + *        timer->base->cpu_base->running == timer
>>   *
>>   * On SMP it is possible to have a "callback function running and enqueued"
>>   * status. It happens for example when a posix timer expired and the callback
>> @@ -144,7 +146,6 @@ struct hrtimer_clock_base {
>>          struct timerqueue_head active;
>>          ktime_t (*get_time)(void);
>>          ktime_t offset;
>> -        struct hrtimer *running;
>>  } __attribute__((__aligned__(HRTIMER_CLOCK_BASE_ALIGN)));
>>
>>  enum  hrtimer_base_type {
>> @@ -159,6 +160,8 @@ enum  hrtimer_base_type {
>>   * struct hrtimer_cpu_base - the per cpu clock bases
>>   * @lock: lock protecting the base and associated clock bases
>>   * and timers
>> + * @seq: seqcount around __run_hrtimer
>> + * @running: pointer to the currently running hrtimer
>>   * @cpu: cpu number
>>   * @active_bases: Bitfield to mark bases with active timers
>>   * @clock_was_set_seq: Sequence counter of clock was set events
>> @@ -180,6 +183,8 @@ enum  hrtimer_base_type {
>>   */
>>  struct hrtimer_cpu_base {
>>          raw_spinlock_t lock;
>> +        seqcount_t seq;
>> +        struct hrtimer *running;
>>          unsigned int cpu;
>>          unsigned int active_bases;
>>          unsigned int clock_was_set_seq;
>> @@ -394,8 +399,24 @@ extern u64 hrtimer_get_next_event(void);
>>   */
>>  static inline int hrtimer_active(const struct hrtimer *timer)
>>  {
>> -        return timer->state != HRTIMER_STATE_INACTIVE ||
>> -                timer->base->running == timer;
>> +        struct hrtimer_cpu_base *cpu_base;
>> +        unsigned int seq;
>> +        bool active;
>> +
>> +        do {
>> +                active = false;
>> +                cpu_base = READ_ONCE(timer->base->cpu_base);
>> +                seqcount_lockdep_reader_access(&cpu_base->seq);
>> +                seq = raw_read_seqcount(&cpu_base->seq);
>> +
>> +                if (timer->state != HRTIMER_STATE_INACTIVE ||
>> +
                    cpu_base->running == timer)
>> +                        active = true;
>> +
>> +        } while (read_seqcount_retry(&cpu_base->seq, seq) ||
>> +                 cpu_base != READ_ONCE(timer->base->cpu_base));
>> +
>> +        return active;
>>  }
>
> This may race with migrate_hrtimer_list(), so it needs a write-side
> seqcount section too.
>>  /*
>> @@ -412,7 +433,7 @@ static inline int hrtimer_is_queued(stru
>>   */
>>  static inline int hrtimer_callback_running(struct hrtimer *timer)
>>  {
>> -        return timer->base->running == timer;
>> +        return timer->base->cpu_base->running == timer;
>>  }
>>
>>  /* Forward a hrtimer so it expires after now: */
>> --- a/kernel/time/hrtimer.c
>> +++ b/kernel/time/hrtimer.c
>> @@ -67,6 +67,7 @@
>>  DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) =
>>  {
>>          .lock = __RAW_SPIN_LOCK_UNLOCKED(hrtimer_bases.lock),
>> +        .seq = SEQCNT_ZERO(hrtimer_bases.seq),
>>          .clock_base =
>>          {
>>                  {
>> @@ -113,9 +114,15 @@ static inline int hrtimer_clockid_to_bas
>>  /*
>>   * We require the migration_base for lock_hrtimer_base()/switch_hrtimer_base()
>>   * such that hrtimer_callback_running() can unconditionally dereference
>> - * timer->base.
>> + * timer->base->cpu_base
>>   */
>> -static struct hrtimer_clock_base migration_base;
>> +static struct hrtimer_cpu_base migration_cpu_base = {
>> +        .seq = SEQCNT_ZERO(migration_cpu_base),
>> +};
>> +
>> +static struct hrtimer_clock_base migration_base {
>> +        .cpu_base = &migration_cpu_base,
>> +};
>>
>>  /*
>>   * We are using hashed locking: holding per_cpu(hrtimer_bases)[n].lock
>> @@ -1118,10 +1125,16 @@ static void __run_hrtimer(struct hrtimer
>>          enum hrtimer_restart (*fn)(struct hrtimer *);
>>          int restart;
>>
>> -        WARN_ON(!irqs_disabled());
>> +        lockdep_assert_held(&cpu_base->lock);
>>
>>          debug_deactivate(timer);
>> -        base->running = timer;
>> +        cpu_base->running = timer;
>
> My suggestion is not to use seqcounts around long sections of code, and
> instead to implement short primitives for changing the timer state and the
> cpu_base running timer. Something like this:
>
> static inline void hrtimer_set_state(struct hrtimer *timer, unsigned long state)
> {
>         struct hrtimer_cpu_base *cpu_base = timer->base->cpu_base;
>
>         lockdep_assert_held(&cpu_base->lock);
>
>         write_seqcount_begin(&cpu_base->seq);
>         timer->state = state;
>         write_seqcount_end(&cpu_base->seq);
> }
>
> static inline void cpu_base_set_running(struct hrtimer_cpu_base *cpu_base,
>                                         struct hrtimer *timer)
> {
>         lockdep_assert_held(&cpu_base->lock);
>
>         write_seqcount_begin(&cpu_base->seq);
>         cpu_base->running = timer;
>         write_seqcount_end(&cpu_base->seq);
> }
>
> With this implemented, we need to think less about the correct barrier
> order, because all changes are made under the seqcount.
>
> static inline int hrtimer_active(const struct hrtimer *timer)
> {
>         struct hrtimer_cpu_base *cpu_base;
>         struct hrtimer_clock_base *base;
>         unsigned int seq;
>         bool active = false;
>
>         do {
>                 base = READ_ONCE(timer->base);
>                 if (base == &migration_base) {
>                         active = true;
>                         break;
>                 }
>
>                 cpu_base = base->cpu_base;
>                 seqcount_lockdep_reader_access(&cpu_base->seq);
>                 seq = raw_read_seqcount(&cpu_base->seq);
>
>                 if (timer->state != HRTIMER_STATE_INACTIVE ||
>                     cpu_base->running == timer) {
>                         active = true;
>                         break;
>                 }
>         } while (read_seqcount_retry(&cpu_base->seq, seq) ||
>                  READ_ONCE(timer->base) != base);
>
>         return active;
> }
>> +
>> +        /*
>> +         * separate the ->running assignment from the ->state assignment
>> +         */
>> +        write_seqcount_begin(&cpu_base->seq);
>> +
>>          __remove_hrtimer(timer, base, HRTIMER_STATE_INACTIVE, 0);
>>          timer_stats_account_hrtimer(timer);
>>          fn = timer->function;
>> @@ -1150,8 +1163,13 @@ static void __run_hrtimer(struct hrtimer
>>              !(timer->state & HRTIMER_STATE_ENQUEUED))
>>                  enqueue_hrtimer(timer, base);
>>
>> -        WARN_ON_ONCE(base->running != timer);
>> -        base->running = NULL;
>> +        /*
>> +         * separate the ->running assignment from the ->state assignment
>> +         */
>> +        write_seqcount_end(&cpu_base->seq);
>> +
>> +        WARN_ON_ONCE(cpu_base->running != timer);
>> +        cpu_base->running = NULL;
>>  }
>>
>>  static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t now)
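
For anyone following the thread without the seqcount pattern in their head, the
short write-side primitives suggested above can be modelled in userspace. This
is only a rough sketch and not kernel code: it uses C11 atomics with the
default sequentially consistent ordering instead of the kernel's seqcount_t and
barriers, every name in it (struct seqcount, timer_set_state(), timer_active()
and so on) is a hypothetical stand-in, writers are assumed to be serialized by
an external lock, and timer migration between bases is not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel types. */
enum { STATE_INACTIVE = 0, STATE_ENQUEUED = 1 };

struct seqcount { atomic_uint sequence; };

struct timer;

struct cpu_base {
        struct seqcount seq;
        _Atomic(struct timer *) running;   /* timer whose callback runs */
};

struct timer {
        atomic_ulong state;
        struct cpu_base *cpu_base;
};

/* Writer side: the counter is odd while an update is in progress. */
static inline void write_begin(struct seqcount *s)
{
        atomic_fetch_add(&s->sequence, 1);
}

static inline void write_end(struct seqcount *s)
{
        atomic_fetch_add(&s->sequence, 1);
}

/* Reader side: wait for an even counter, then remember it. */
static inline unsigned int read_begin(struct seqcount *s)
{
        unsigned int seq;

        while ((seq = atomic_load(&s->sequence)) & 1)
                ;
        return seq;
}

/* True if a writer ran since read_begin(), so the reads must be redone. */
static inline bool read_retry(struct seqcount *s, unsigned int start)
{
        return atomic_load(&s->sequence) != start;
}

/* Short primitive: change the timer state inside a write section. */
static inline void timer_set_state(struct timer *t, unsigned long state)
{
        write_begin(&t->cpu_base->seq);
        atomic_store(&t->state, state);
        write_end(&t->cpu_base->seq);
}

/* Short primitive: change the running timer inside a write section. */
static inline void cpu_base_set_running(struct cpu_base *b, struct timer *t)
{
        write_begin(&b->seq);
        atomic_store(&b->running, t);
        write_end(&b->seq);
}

/* Reader: retry until both fields were sampled with no writer in between. */
static inline bool timer_active(struct timer *t)
{
        struct cpu_base *b = t->cpu_base;
        unsigned int seq;
        bool active;

        do {
                seq = read_begin(&b->seq);
                active = atomic_load(&t->state) != STATE_INACTIVE ||
                         atomic_load(&b->running) == t;
        } while (read_retry(&b->seq, seq));

        return active;
}
```

Each call to one of the two setters bumps the counter twice, so a reader that
overlaps any single update sees a changed counter and samples both fields
again. Unlike the loop quoted above, this reader spins on an odd counter before
sampling and never leaves the loop early.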