From: Peter Zijlstra <peterz@infradead.org>
To: Oleg Nesterov <oleg@redhat.com>
Cc: umgwanakikbuti@gmail.com, mingo@elte.hu, ktkhai@parallels.com,
rostedt@goodmis.org, tglx@linutronix.de, juri.lelli@gmail.com,
pang.xunlei@linaro.org, wanpeng.li@linux.intel.com,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 08/14] hrtimer: Allow hrtimer::function() to free the timer
Date: Thu, 11 Jun 2015 00:37:36 +0200
Message-ID: <20150610223736.GL3644@twins.programming.kicks-ass.net>
In-Reply-To: <20150609213318.GA12436@redhat.com>
On Tue, Jun 09, 2015 at 11:33:18PM +0200, Oleg Nesterov wrote:
> And. Note that we can rewrite these 2 "write" critical sections in
> __run_hrtimer() and enqueue_hrtimer() as
>
> cpu_base->running = timer;
>
> write_seqcount_begin(cpu_base->seq);
> write_seqcount_end(cpu_base->seq);
>
> __remove_hrtimer(timer);
>
> and
>
> timer->state |= HRTIMER_STATE_ENQUEUED;
>
> write_seqcount_begin(cpu_base->seq);
> write_seqcount_end(cpu_base->seq);
>
> base->running = NULL;
>
> So we can probably use write_seqcount_barrier() except I am not sure
> about the 2nd wmb...
Which second wmb?
In any case, you use that transform from your reply to Kirill, and I
cannot currently see a hole in that. Let's call this transformation A. It
gets us the quoted bit above.
Now the above is:
seq++;
smp_wmb();
smp_wmb();
seq++;
Now, double barriers are pointless, so I think we can all agree that the
above is identical to the below. Let's call this transformation B.
seq++;
smp_wmb();
seq++;
And then because you use the traditional seqcount read side, which
stalls when seq&1, we can transform the above into this. Transformation
C.
smp_wmb();
seq += 2;
Which is write_seqcount_barrier(), as you say above.
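For reference, the two write-side forms compared above can be modelled in userspace C11. This is only a sketch: `struct seqcount_model`, `model_wmb()`, `empty_write_section()` and `barrier_form()` are made-up names, and a release fence is used as an approximation of smp_wmb(), not as the kernel's definition of it.

```c
#include <stdatomic.h>

/* Userspace stand-in for seqcount_t (assumption: not the kernel type). */
struct seqcount_model {
	atomic_uint sequence;
};

/* Approximation of smp_wmb(): earlier stores are ordered before later ones. */
static inline void model_wmb(void)
{
	atomic_thread_fence(memory_order_release);
}

/* Empty write section after transformation A: seq++; wmb; wmb; seq++. */
static inline void empty_write_section(struct seqcount_model *s)
{
	atomic_fetch_add_explicit(&s->sequence, 1, memory_order_relaxed); /* odd */
	model_wmb();
	model_wmb();
	atomic_fetch_add_explicit(&s->sequence, 1, memory_order_relaxed); /* even */
}

/* Collapsed form after transformations B and C: wmb; seq += 2. */
static inline void barrier_form(struct seqcount_model *s)
{
	model_wmb();
	atomic_fetch_add_explicit(&s->sequence, 2, memory_order_relaxed);
}
```

Both helpers leave the counter at the same even value. What differs is that only the first one ever exposes an odd value to a concurrent reader.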
And since there are no odd numbers possible in that scheme, it's
identical to my modified read side with the single increment. Transform
D.
The only difference at this point is that I have my seq increment on the
'wrong' side on the first state.
cpu_base->running = timer;
seq++;
smp_wmb();
timer->state = 0;
...
timer->state = 1;
smp_wmb();
seq++;
cpu_base->running = NULL;
Which, per my previous mail, provides the following:
[S] seq++
[R] seq
RMB
[R] ->running (== NULL)
[S] ->running = timer;
WMB
[S] ->state = INACTIVE
[R] ->state (== INACTIVE)
RMB
[R] seq (== seq)
Which is where we had to modify the read side to do:
[R] ->state
RMB
[R] ->running
Now, if we use write_seqcount_barrier() that would become:
	__run_hrtimer()			hrtimer_active()

	[S] ->running = timer;		[R] seq
	    WMB				    RMB
	[S] seq += 2;			[R] ->running
	[S] ->state = 0;		[R] ->state
					    RMB
					[R] seq
Which we can reorder like:
[R] seq
RMB
[R] ->running (== NULL)
[S] ->running = timer
WMB
[S] ->state = 0
[R] ->state (== 0)
RMB
[R] seq (== seq)
[S] seq += 2
Which still gives us that false negative and would still require the
read side to be modified to do:
[R] ->state
RMB
[R] ->running
IOW, one of our transforms (A-D) is faulty for it requires a
modification to the read side.
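The reordered trace above can be replayed step by step in plain C. This is a single-threaded replay of that one interleaving, not a concurrency test, and `struct replay_result` and `replay_reordered_trace()` are names invented for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>

struct replay_result {
	bool reported_active;	/* what the reader concluded */
	bool would_retry;	/* whether the final seq check forces a retry */
	bool callback_running;	/* ground truth when the reader returns */
};

/* Executes the loads and stores in the order of the reordered trace. */
static struct replay_result replay_reordered_trace(void)
{
	int timer_obj;			/* dummy object standing in for the timer */
	const void *timer = &timer_obj;
	const void *running = NULL;
	unsigned int seq = 0;
	unsigned int state = 1;		/* 1: enqueued, 0: inactive */
	struct replay_result r;

	unsigned int seq_begin = seq;		/* [R] seq */
	bool saw_running = (running == timer);	/* [R] ->running (== NULL) */
	running = timer;			/* [S] ->running = timer */
	state = 0;				/* [S] ->state = 0 */
	bool saw_enqueued = (state != 0);	/* [R] ->state (== 0) */
	r.would_retry = (seq != seq_begin);	/* [R] seq (== seq) */
	r.callback_running = (running == timer);
	seq += 2;				/* [S] seq += 2, visible too late */

	r.reported_active = saw_running || saw_enqueued;
	return r;
}
```

In this replay the reader reports the timer as inactive and sees no reason to retry, even though the callback is running: the false negative.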
I suspect it's T-C, where we lose the odd count that holds up the read
side.
Because the moment we go from:
Y = true;
seq++
WMB
seq++
X = false;
to:
Y = true;
WMB
seq += 2;
X = false;
It becomes possible to re-order like:
Y = true;
WMB
X = false
seq += 2;
And we lose our read order; or rather, where previously we ordered the
read side by seq, the seq increments are no longer ordered.
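What the odd count buys on the read side can be sketched with a userspace model of the traditional seqcount reader. The `model_*` names are hypothetical and C11 acquire operations stand in for smp_rmb(), so treat this as an approximation rather than the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for seqcount_t (assumption: not the kernel type). */
struct seqcount_model {
	atomic_uint sequence;
};

/* True while a writer sits between its two increments (count is odd). */
static bool model_write_in_progress(struct seqcount_model *s)
{
	return atomic_load_explicit(&s->sequence, memory_order_acquire) & 1;
}

/* Traditional read-side entry: wait out an odd count, return the even value. */
static unsigned int model_read_begin(struct seqcount_model *s)
{
	unsigned int seq;

	do {
		seq = atomic_load_explicit(&s->sequence, memory_order_acquire);
	} while (seq & 1);

	return seq;
}

/* Read-side exit: report whether the count moved since model_read_begin(). */
static bool model_read_retry(struct seqcount_model *s, unsigned int start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->sequence, memory_order_relaxed) != start;
}
```

With a single `seq += 2` the count is never odd, so model_write_in_progress() never returns true and a reader is never held at the entry; it can only notice the writer at exit, and only if the increment is already visible by then.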
With this I think we can prove my code correct; however, it also suggests
that:
cpu_base->running = timer;
seq++;
smp_wmb();
seq++;
timer->state = 0;
...
timer->state = 1;
seq++;
smp_wmb();
seq++;
cpu_base->running = NULL;
vs
bool hrtimer_active(const struct hrtimer *timer)
{
	struct hrtimer_cpu_base *base;
	unsigned int seq;

	do {
		/* Sample the base first; the timer may migrate under us. */
		base = READ_ONCE(timer->base->cpu_base);
		seq = read_seqcount_begin(&base->seq);

		if (timer->state & HRTIMER_STATE_ENQUEUED ||
		    base->running == timer)
			return true;

	} while (read_seqcount_retry(&base->seq, seq) ||
		 base != READ_ONCE(timer->base->cpu_base));

	return false;
}
Is the all-round cheapest solution. Those extra seq increments are
almost free on all archs as the cacheline will be hot and modified on
the local cpu.
Only under the very rare condition of a concurrent hrtimer_active() call
will that seq line be pulled into shared state.
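The scheme above can be walked through in a small userspace model. This is a sketch under stated assumptions: all `model_*` names are invented, default sequentially consistent C11 atomics are used (stronger than the kernel barriers, so this shows the order of steps and proves nothing about the weaker ordering), and timer migration between bases is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ENQUEUED 1u

struct model_timer;

struct model_cpu_base {
	atomic_uint seq;
	_Atomic(struct model_timer *) running;
};

struct model_timer {
	atomic_uint state;
	struct model_cpu_base *base;	/* fixed in this model: no migration */
};

/* Start of the callback: running = timer; seq++; wmb; seq++; state = 0. */
static void model_run_begin(struct model_cpu_base *b, struct model_timer *t)
{
	atomic_store(&b->running, t);
	atomic_fetch_add(&b->seq, 1);	/* odd: readers are held up */
	atomic_fetch_add(&b->seq, 1);	/* even again */
	atomic_store(&t->state, 0);
}

/* End of the callback: optional re-enqueue; seq++; wmb; seq++; running = NULL. */
static void model_run_end(struct model_cpu_base *b, struct model_timer *t,
			  bool requeue)
{
	if (requeue)
		atomic_store(&t->state, MODEL_ENQUEUED);
	atomic_fetch_add(&b->seq, 1);
	atomic_fetch_add(&b->seq, 1);
	atomic_store(&b->running, (struct model_timer *)NULL);
}

/* Read side: traditional seqcount loop around the two checks. */
static bool model_hrtimer_active(struct model_timer *t)
{
	struct model_cpu_base *b = t->base;
	unsigned int seq;

	do {
		while ((seq = atomic_load(&b->seq)) & 1)
			;	/* writer between its two increments */

		if ((atomic_load(&t->state) & MODEL_ENQUEUED) ||
		    atomic_load(&b->running) == t)
			return true;

	} while (atomic_load(&b->seq) != seq);

	return false;
}
```

Calling model_hrtimer_active() on an enqueued timer, then after model_run_begin(), then after model_run_end() without a requeue, gives active, active, inactive in that order, with the sequence count moving by two at each writer step.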
I shall go sleep now, and update my patch tomorrow; let's see if I will
still agree with myself after a sleep :-)