From: Kirill Tkhai <ktkhai@odin.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>, <umgwanakikbuti@gmail.com>,
<mingo@elte.hu>, <ktkhai@parallels.com>, <rostedt@goodmis.org>,
<tglx@linutronix.de>, <juri.lelli@gmail.com>,
<pang.xunlei@linaro.org>, <wanpeng.li@linux.intel.com>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 08/14] hrtimer: Allow hrtimer::function() to free the timer
Date: Wed, 10 Jun 2015 10:46:51 +0300
Message-ID: <1433922411.23588.132.camel@odin.com>
In-Reply-To: <20150609213318.GA12436@redhat.com>

Hi, Oleg,

On Tue, 09/06/2015 at 23:33 +0200, Oleg Nesterov wrote:
> On 06/08, Peter Zijlstra wrote:
> >
> > On Mon, Jun 08, 2015 at 11:14:17AM +0200, Peter Zijlstra wrote:
> > > > Finally. Suppose that timer->function() returns HRTIMER_RESTART
> > > > and hrtimer_active() is called right after __run_hrtimer() sets
> > > > cpu_base->running = NULL. I can't understand why hrtimer_active()
> > > > can't miss ENQUEUED in this case. We have wmb() in between, yes,
> > > > but then hrtimer_active() should do something like
> > > >
> > > > 	active = cpu_base->running == timer;
> > > > 	if (!active) {
> > > > 		rmb();
> > > > 		active = state != HRTIMER_STATE_INACTIVE;
> > > > 	}
> > > >
> > > > No?
> > >
> > > Hmm, good point. Let me think about that. It would be nice to be able to
> > > avoid more memory barriers.
> >
> > So your scenario is:
> >
> >                                 [R] seq
> >                                   RMB
> > [S] ->state = ACTIVE
> >   WMB
> > [S] ->running = NULL
> >                                 [R] ->running (== NULL)
> >                                 [R] ->state (== INACTIVE; fail to observe
> >                                              the ->state store due to
> >                                              lack of order)
> >                                   RMB
> >                                 [R] seq (== seq)
> > [S] seq++
> >
> > Conversely, if we re-order the (first) seq++ store such that it comes
> > first:
> >
> > [S] seq++
> >
> >                                 [R] seq
> >                                   RMB
> >                                 [R] ->running (== NULL)
> > [S] ->running = timer;
> >   WMB
> > [S] ->state = INACTIVE
> >                                 [R] ->state (== INACTIVE)
> >                                   RMB
> >                                 [R] seq (== seq)
> >
> > And we have another false negative.
> >
> > And in this case we need the read order the other way around, we'd need:
> >
> > 	active = timer->state != HRTIMER_STATE_INACTIVE;
> > 	if (!active) {
> > 		smp_rmb();
> > 		active = cpu_base->running == timer;
> > 	}
> >
> > Now I think we can fix this by either doing:
> >
> > 	WMB
> > 	seq++
> > 	WMB
> >
> > On both sides of __run_hrtimer(), or do
> >
> > bool hrtimer_active(const struct hrtimer *timer)
> > {
> > 	struct hrtimer_cpu_base *cpu_base;
> > 	unsigned int seq;
> >
> > 	do {
> > 		cpu_base = READ_ONCE(timer->base->cpu_base);
> > 		seq = raw_read_seqcount(&cpu_base->seq);
> >
> > 		if (timer->state != HRTIMER_STATE_INACTIVE)
> > 			return true;
> >
> > 		smp_rmb();
> >
> > 		if (cpu_base->running == timer)
> > 			return true;
> >
> > 		smp_rmb();
> >
> > 		if (timer->state != HRTIMER_STATE_INACTIVE)
> > 			return true;
> >
> > 	} while (read_seqcount_retry(&cpu_base->seq, seq) ||
> > 		 cpu_base != READ_ONCE(timer->base->cpu_base));
> >
> > 	return false;
> > }
>
> You know, I simply can't convince myself I understand why this code
> is correct... or not.
>
> But contrary to what I said before, I agree that we need to recheck
> timer->base. This probably needs more discussion, to me it is very
> unobvious why we can trust this cpu_base != READ_ONCE() check. Yes,
> we have a lot of barriers, but they do not pair with each other. Let's
> ignore this for now.
>
> > And since __run_hrtimer() is the more performance critical code, I think
> > it would be best to reduce the amount of memory barriers there.
>
> Yes, but wmb() is cheap on x86... Perhaps we can make this code
> "obviously correct" ?
>
>
> How about the following..... We add cpu_base->seq as before but
> limit its "write" scope so that we can use the regular read/retry.
>
> So,
>
> 	hrtimer_active(timer)
> 	{
>
> 		do {
> 			base = READ_ONCE(timer->base->cpu_base);
> 			seq = read_seqcount_begin(&cpu_base->seq);
>
> 			if (timer->state & ENQUEUED ||
> 			    base->running == timer)
> 				return true;
>
> 		} while (read_seqcount_retry(&cpu_base->seq, seq) ||
> 			 base != READ_ONCE(timer->base->cpu_base));
>
> 		return false;
> 	}
>
> And we need to avoid the races with 2 transitions in __run_hrtimer().
>
> The first race is trivial, we change __run_hrtimer() to do
>
> write_seqcount_begin(cpu_base->seq);
> cpu_base->running = timer;
> __remove_hrtimer(timer); // clears ENQUEUED
> write_seqcount_end(cpu_base->seq);

We use the seqcount because hrtimer_active() may miss timer->state or
cpu_base->running at the moment we clear one of them. If we use two
pairs of write_seqcount_{begin,end} in __run_hrtimer(), we can protect
only the places where we do the clearing:

	cpu_base->running = timer;
	write_seqcount_begin(cpu_base->seq);
	__remove_hrtimer(timer); // clears ENQUEUED
	write_seqcount_end(cpu_base->seq);

	....

	timer->state |= HRTIMER_STATE_ENQUEUED;
	write_seqcount_begin(cpu_base->seq);
	base->running = NULL;
	write_seqcount_end(cpu_base->seq);

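
To make the placement of the two narrowed write sections concrete, here
is a minimal user-space sketch in C11. It is not kernel code:
timer_model, cpu_base_model, seq_write_begin()/seq_write_end(),
run_timer_start()/run_timer_finish() and timer_active() are hypothetical
names chosen for illustration. It also assumes sequentially consistent
atomics, which are stronger than the wmb()/rmb() pairs discussed in this
thread, so it only shows where the sections sit, not the minimal set of
barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the fields discussed; not kernel code. */
#define STATE_INACTIVE 0u
#define STATE_ENQUEUED 1u

struct timer_model {
	atomic_uint state;
};

struct cpu_base_model {
	atomic_uint seq;                        /* odd while a write section is open */
	_Atomic(struct timer_model *) running;  /* timer whose callback is executing */
};

/* Stand-ins for write_seqcount_begin()/write_seqcount_end() (seq_cst assumed). */
static void seq_write_begin(struct cpu_base_model *b) { atomic_fetch_add(&b->seq, 1u); }
static void seq_write_end(struct cpu_base_model *b)   { atomic_fetch_add(&b->seq, 1u); }

/* First transition: set running first, then clear ENQUEUED inside the section. */
static void run_timer_start(struct cpu_base_model *b, struct timer_model *t)
{
	atomic_store(&b->running, t);
	seq_write_begin(b);
	atomic_fetch_and(&t->state, ~STATE_ENQUEUED);
	seq_write_end(b);
}

/* Second transition: optionally re-enqueue, then clear running inside the section. */
static void run_timer_finish(struct cpu_base_model *b, struct timer_model *t,
			     bool restart)
{
	if (restart)
		atomic_fetch_or(&t->state, STATE_ENQUEUED);
	seq_write_begin(b);
	atomic_store(&b->running, (struct timer_model *)NULL);
	seq_write_end(b);
}

/* Reader: retry if a write section was open or completed while we looked. */
static bool timer_active(struct cpu_base_model *b, struct timer_model *t)
{
	for (;;) {
		unsigned int seq = atomic_load(&b->seq);

		if (seq & 1u)
			continue;	/* write section open, look again */
		if (atomic_load(&t->state) & STATE_ENQUEUED)
			return true;
		if (atomic_load(&b->running) == t)
			return true;
		if (atomic_load(&b->seq) == seq)
			return false;
	}
}

/* Single-threaded walk through both transitions without a restart. */
static void model_walkthrough(void)
{
	struct cpu_base_model b;
	struct timer_model t;

	atomic_init(&b.seq, 0u);
	atomic_init(&b.running, (struct timer_model *)NULL);
	atomic_init(&t.state, STATE_ENQUEUED);

	assert(timer_active(&b, &t));		/* enqueued */
	run_timer_start(&b, &t);
	assert(timer_active(&b, &t));		/* running, no longer enqueued */
	run_timer_finish(&b, &t, false);
	assert(!timer_active(&b, &t));		/* neither */
}
```

In both transitions the flag that becomes true is written before the
section opens, and only the clearing store sits inside it. A reader that
overlaps a clearing store therefore sees the sequence change and retries.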
>
> and hrtimer_active() obviously can't race with this section.
>
> Then we change enqueue_hrtimer()
>
>
> +	bool need_lock = base->cpu_base->running == timer;
> +	if (need_lock)
> +		write_seqcount_begin(cpu_base->seq);
> +
> 	timer->state |= HRTIMER_STATE_ENQUEUED;
> +
> +	if (need_lock)
> +		write_seqcount_end(cpu_base->seq);
>
>
> Now. If the timer is re-queued by the time __run_hrtimer() clears
> ->running we have the following sequence:
>
> write_seqcount_begin(cpu_base->seq);
> timer->state |= HRTIMER_STATE_ENQUEUED;
> write_seqcount_end(cpu_base->seq);
>
> base->running = NULL;
>
> and I think this should equally work, because in this case we do not
> care if hrtimer_active() misses "running = NULL".
>
> Yes, we only have this 2nd write_seqcount_begin/end if the timer re-
> arms itself, but otherwise we do not race. If another thread does
> hrtimer_start() in between we can pretend that hrtimer_active() hits
> the "inactive".
>
> What do you think?
>
>
> And. Note that we can rewrite these 2 "write" critical sections in
> __run_hrtimer() and enqueue_hrtimer() as
>
> cpu_base->running = timer;
>
> write_seqcount_begin(cpu_base->seq);
> write_seqcount_end(cpu_base->seq);
>
> __remove_hrtimer(timer);
>
> and
>
> timer->state |= HRTIMER_STATE_ENQUEUED;
>
> write_seqcount_begin(cpu_base->seq);
> write_seqcount_end(cpu_base->seq);
>
> base->running = NULL;
>
> So we can probably use write_seqcount_barrier() except I am not sure
> about the 2nd wmb...