Re: [RFC PATCH] sched_clock: Avoid tearing during read from NMI

public inbox for linux-kernel@vger.kernel.org

From: Daniel Thompson <daniel.thompson@linaro.org>
To: John Stultz <john.stultz@linaro.org>
Cc: lkml <linux-kernel@vger.kernel.org>,
	Patch Tracking <patches@linaro.org>,
	Linaro Kernel Mailman List <linaro-kernel@lists.linaro.org>,
	Sumit Semwal <sumit.semwal@linaro.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Stephen Boyd <sboyd@codeaurora.org>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [RFC PATCH] sched_clock: Avoid tearing during read from NMI
Date: Wed, 21 Jan 2015 20:20:23 +0000
Message-ID: <54C00A07.1000305@linaro.org>
In-Reply-To: <CALAqxLUmf_TNvTG2-=uavXjXk3040-MQ73V5z7S1UiGa+7c29Q@mail.gmail.com>

On 21/01/15 17:29, John Stultz wrote:
> On Wed, Jan 21, 2015 at 8:53 AM, Daniel Thompson
> <daniel.thompson@linaro.org> wrote:
>> Currently it is possible for an NMI (or FIQ on ARM) to come in and
>> read sched_clock() whilst update_sched_clock() has half updated the
>> state. This results in a bad time value being observed.
>>
>> This patch fixes that problem in a similar manner to Thomas Gleixner's
>> 4396e058c52e("timekeeping: Provide fast and NMI safe access to
>> CLOCK_MONOTONIC").
>>
>> Note that ripping out the seqcount lock from sched_clock_register() and
>> replacing it with a large comment is not nearly as bad as it looks! The
>> locking here is actually pretty useless since most of the variables
>> modified within the write lock are not covered by the read lock. As a
>> result a big comment and the sequence bump implicit in the call
>> to update_epoch() should work pretty much the same.
> 
> It still looks pretty bad, even with the current explanation.

I'm inclined to agree, although to be clear, the code I proposed should
not be more broken than the code we have today (and is arguably more honest).

>> -       raw_write_seqcount_begin(&cd.seq);
>> +       /*
>> +        * sched_clock will report a bad value if it executes
>> +        * concurrently with the following code. No locking exists to
>> +        * prevent this; we rely mostly on this function being called
>> +        * early during kernel boot up before we have lots of other
>> +        * stuff going on.
>> +        */
>>         read_sched_clock = read;
>>         sched_clock_mask = new_mask;
>>         cd.rate = rate;
>>         cd.wrap_kt = new_wrap_kt;
>>         cd.mult = new_mult;
>>         cd.shift = new_shift;
>> -       cd.epoch_cyc = new_epoch;
>> -       cd.epoch_ns = ns;
>> -       raw_write_seqcount_end(&cd.seq);
>> +       update_epoch(new_epoch, ns);
> 
> 
> So looking at this, the sched_clock_register() function may not be
> called super early, so I was looking to see what prevented bad reads
> prior to registration.

Certainly not super early, but from the WARN_ON() at the top of the
function I thought it was intended to be called before start_kernel()
unmasks interrupts...

> And from quick inspection, it's nothing. I
> suspect the undocumented trick that makes this work is that the mult
> value is initialized to zero, so sched_clock returns 0 until things
> have been registered.
> 
> So it does seem like it would be worthwhile to do the initialization
> under the lock, or possibly use the suspend flag to make the first
> initialization safe.

As mentioned, the existing write lock doesn't really do very much at the
moment, because the read side does not cover most of the variables it
protects.

The simple and (I think) strictly correct approach is to duplicate the
whole of the clock_data (minus the seqcount) and make the read lock in
sched_clock cover all accesses to the structure.

This would substantially enlarge the critical section in sched_clock()
meaning we might loop round the seqcount fractionally more often.
However if that causes any real problems it would be a sign the epoch
was being updated too frequently.
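For illustration only: a minimal user-space sketch of the "duplicate the
clock_data" idea, assuming it takes the form of the seqcount latch used in
4396e058c52e (two copies of the data, with readers choosing one by the low
bit of the sequence count) and a single writer. It uses C11 atomics rather
than kernel primitives, and the names (struct latch_clock, latch_update(),
latch_read_ns()) are hypothetical, not taken from sched_clock.c.

```c
/* Hypothetical user-space sketch of a seqcount latch; not kernel code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Every field the reader needs, so one retry loop covers all of them. */
struct clock_read_data {
	_Atomic uint64_t epoch_ns;
	_Atomic uint64_t epoch_cyc;
	_Atomic uint64_t mask;
	_Atomic uint32_t mult;
	_Atomic uint32_t shift;
};

struct clock_params {
	uint64_t epoch_ns, epoch_cyc, mask;
	uint32_t mult, shift;
};

struct latch_clock {
	atomic_uint seq;                /* low bit selects the copy readers use */
	struct clock_read_data copy[2];
};

static void store_copy(struct clock_read_data *d, const struct clock_params *p)
{
	atomic_store_explicit(&d->epoch_ns, p->epoch_ns, memory_order_relaxed);
	atomic_store_explicit(&d->epoch_cyc, p->epoch_cyc, memory_order_relaxed);
	atomic_store_explicit(&d->mask, p->mask, memory_order_relaxed);
	atomic_store_explicit(&d->mult, p->mult, memory_order_relaxed);
	atomic_store_explicit(&d->shift, p->shift, memory_order_relaxed);
}

/* Bump seq with a release fence on each side (modelled on the kernel's latch). */
static void seq_latch(struct latch_clock *lc)
{
	unsigned s = atomic_load_explicit(&lc->seq, memory_order_relaxed);

	atomic_thread_fence(memory_order_release); /* earlier data stores first */
	atomic_store_explicit(&lc->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release); /* then later data stores */
}

/* Single writer: rewrite copy[0] while readers use copy[1], then vice versa. */
static void latch_update(struct latch_clock *lc, const struct clock_params *p)
{
	assert(p->shift < 64);          /* keep the reader's shift well defined */
	seq_latch(lc);                  /* seq odd: readers move to copy[1] */
	store_copy(&lc->copy[0], p);
	seq_latch(lc);                  /* seq even: readers move back to copy[0] */
	store_copy(&lc->copy[1], p);
}

/* Never waits for the writer; retries only if seq changed during the read. */
static uint64_t latch_read_ns(struct latch_clock *lc, uint64_t cyc)
{
	unsigned seq;
	uint64_t ns;

	do {
		seq = atomic_load_explicit(&lc->seq, memory_order_acquire);
		struct clock_read_data *d = &lc->copy[seq & 1];
		uint64_t delta = (cyc - atomic_load_explicit(&d->epoch_cyc,
		                                             memory_order_relaxed)) &
		                 atomic_load_explicit(&d->mask, memory_order_relaxed);
		ns = atomic_load_explicit(&d->epoch_ns, memory_order_relaxed) +
		     ((delta * atomic_load_explicit(&d->mult, memory_order_relaxed)) >>
		      atomic_load_explicit(&d->shift, memory_order_relaxed));
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&lc->seq, memory_order_relaxed) != seq);

	return ns;
}
```

Because the reader never spins on an odd sequence count, an NMI that lands
between the two halves of latch_update() reads the copy that is not being
modified instead of waiting on a writer it has interrupted. The cost of
covering every field is only a retry when an update overlaps a read.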

Unless I get any objections (or you really want me to look closely at
using suspend), I'll try this approach in the next day or two.


Daniel.


Thread overview: 35+ messages
2015-01-21 16:53 [RFC PATCH] sched_clock: Avoid tearing during read from NMI Daniel Thompson
2015-01-21 17:29 ` John Stultz
2015-01-21 20:20   ` Daniel Thompson [this message]
2015-01-21 20:58   ` Stephen Boyd
2015-01-22 13:06 ` [PATCH v2] sched_clock: Avoid deadlock " Daniel Thompson
2015-01-30 19:03 ` [PATCH v3 0/4] sched_clock: Optimize and avoid " Daniel Thompson
2015-01-30 19:03   ` [PATCH v3 1/4] sched_clock: Match scope of read and write seqcounts Daniel Thompson
2015-01-30 19:03   ` [PATCH v3 2/4] sched_clock: Optimize cache line usage Daniel Thompson
2015-02-05  1:14     ` Stephen Boyd
2015-02-05 10:21       ` Daniel Thompson
2015-01-30 19:03   ` [PATCH v3 3/4] sched_clock: Remove suspend from clock_read_data Daniel Thompson
2015-01-30 19:03   ` [PATCH v3 4/4] sched_clock: Avoid deadlock during read from NMI Daniel Thompson
2015-02-05  1:23     ` Stephen Boyd
2015-02-05  1:48       ` Steven Rostedt
2015-02-05  6:23         ` Stephen Boyd
2015-02-05  0:50   ` [PATCH v3 0/4] sched_clock: Optimize and avoid " Stephen Boyd
2015-02-05  9:05     ` Daniel Thompson
2015-02-08 12:09       ` Daniel Thompson
2015-02-09 22:08         ` Stephen Boyd
2015-02-08 12:02 ` [PATCH v4 0/5] " Daniel Thompson
2015-02-08 12:02   ` [PATCH v4 1/5] sched_clock: Match scope of read and write seqcounts Daniel Thompson
2015-02-08 12:02   ` [PATCH v4 2/5] sched_clock: Optimize cache line usage Daniel Thompson
2015-02-09  1:28     ` Will Deacon
2015-02-09  9:47       ` Daniel Thompson
2015-02-10  2:37         ` Stephen Boyd
2015-02-08 12:02   ` [PATCH v4 3/5] sched_clock: Remove suspend from clock_read_data Daniel Thompson
2015-02-08 12:02   ` [PATCH v4 4/5] sched_clock: Remove redundant notrace from update function Daniel Thompson
2015-02-08 12:02   ` [PATCH v4 5/5] sched_clock: Avoid deadlock during read from NMI Daniel Thompson
2015-02-13  3:49   ` [PATCH v4 0/5] sched_clock: Optimize and avoid " Stephen Boyd
2015-03-02 15:56 ` [PATCH v5 " Daniel Thompson
2015-03-02 15:56   ` [PATCH v5 1/5] sched_clock: Match scope of read and write seqcounts Daniel Thompson
2015-03-02 15:56   ` [PATCH v5 2/5] sched_clock: Optimize cache line usage Daniel Thompson
2015-03-02 15:56   ` [PATCH v5 3/5] sched_clock: Remove suspend from clock_read_data Daniel Thompson
2015-03-02 15:56   ` [PATCH v5 4/5] sched_clock: Remove redundant notrace from update function Daniel Thompson
2015-03-02 15:56   ` [PATCH v5 5/5] sched_clock: Avoid deadlock during read from NMI Daniel Thompson
