Re: [PATCH 4/4] timekeeping: Use printk_deferred when holding timekeeping seqlock

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: John Stultz <john.stultz@linaro.org>
To: George Spelvin <linux@horizon.com>
Cc: linux-kernel@vger.kernel.org,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Subject: Re: [PATCH 4/4] timekeeping: Use printk_deferred when holding timekeeping seqlock
Date: Mon, 12 May 2014 10:49:31 -0700	[thread overview]
Message-ID: <537109AB.8010609@linaro.org> (raw)
In-Reply-To: <20140506213306.17674.qmail@ns.horizon.com>

First: My apologies, for some reason your mail got tagged as spam, so I
only saw it just now looking for another missing email.

On 05/06/2014 02:33 PM, George Spelvin wrote:
> One misfeature in the timekeeping seqlock code I noticed is that
> read_seqcount_begin returns "unsigned int", not "unsigned long".
>
> Casting to a larger type is harmless, but inefficient.
Thanks for pointing this out. If you want to spin up a patch for this,
I'd be fine applying it. Otherwise I'll try to clean this up sometime soon.

>> This is due to printk() triggering console sem wakeup, which can
>> cause scheduling code to trigger hrtimers which may try to read
>> the time.
> An alternative solution, which avoids the need for this entire patch
> series, is to make ktime_get completely nonblocking.
>
> To do that, use a seqlock variant wherein you mintain an array of "struct
> timekeeper" structures so that reading is always non-blocking if there
> is no writer progress.  (I.e. livelock is remotely possible, but deadlock
> is not.)
>
> In the basic version, there are two.  (You can add more to further
> reduce the chance of livelock.)  The currently stable one is indicated
> by (timekeeper_seq->sequence & 2).  Writers update the appropriate
> one ping-pong.  The low two bits indicate four phases:
>
> 0: Both timekeepers stable, [0] preferred for reading
> 1: Timekeeper[1] is being written; read timekeeper[0] only
> 2: Both timekeepers stable, [1] preferred for reading
> 3: Timekeeper[0] is being written; read timekeeper[1] only

So this has been suggested before (and I've spent some time awhile back
trying to implement it).  The problem is that should the update be
significantly delayed (by for instance, the host preempting a VM's cpu
for some extended period), and a frequency adjustment slowing the clock
was being applied to the new timekeeper, the continued use of the old
timekeeper could cause the current time to over-shoot the new
timekeeper, causing time to go backwards when we finally made the switch. :(

But I believe the sched_clock code does something like this on x86,
since it doesn't have to handle frequency changes.

I would very much like to have a non-blocking implementation! We've also
looked at variants where we keep a valid-interval range in the structure
so readers can be sure the timekeeper they're using is still active
(sort of an expiration date), so we would only block if an update was
delayed. However this is problematic, since there are a number of
timekeeping calls in the hrtimer logic required to trigger the
timekeeping update. So if the update interrupt was delayed, those reads
would block, deadlocking the box.

So maybe you can come up with a tweak to make it possible, but when
discussing this at length w/ Matheiu Desnoyers, it seemed like the main
problem was there was no way to atomically validate that we hadn't been
delayed and update the pointer to the new structure without stopping all
the cpus with some sort of serializing lock.

thanks
-john

next prev parent reply	other threads:[~2014-05-12 17:49 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-06 21:33 [PATCH 4/4] timekeeping: Use printk_deferred when holding timekeeping seqlock George Spelvin
2014-05-12 16:21 ` [rough draft PATCH] avoid stalls on the " George Spelvin
2014-05-12 18:23   ` John Stultz
2014-05-12 17:49 ` John Stultz [this message]
2014-05-13  2:44   ` [PATCH 4/4] timekeeping: Use printk_deferred when holding " George Spelvin
2014-05-13  3:39     ` John Stultz
2014-05-13  5:13       ` George Spelvin
2014-05-13 12:07         ` Mathieu Desnoyers
2014-05-13 13:29           ` George Spelvin
2014-05-13 13:39             ` Mathieu Desnoyers
2014-05-13 16:18               ` George Spelvin
2014-05-13  2:45   ` [PATCH 1/2] timekeeping: Use unsigned int for seqlock sequence George Spelvin
2014-05-13 11:49     ` Mathieu Desnoyers
2014-05-13  2:48   ` [PATCH 2/2] timekeeping: Mark struct timekeeper * passed to notifiers as const George Spelvin
  -- strict thread matches above, loose matches on Subject: below --
2014-05-05 20:47 [PATCH 0/4] Convert timekeeping core to use printk_deferred (v3) John Stultz
2014-05-05 20:47 ` [PATCH 4/4] timekeeping: Use printk_deferred when holding timekeeping seqlock John Stultz
2014-05-02 22:09 [PATCH 0/4] Convert timekeeping core to use printk_deferred (v2) John Stultz
2014-05-02 22:09 ` [PATCH 4/4] timekeeping: Use printk_deferred when holding timekeeping seqlock John Stultz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=537109AB.8010609@linaro.org \
    --to=john.stultz@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@horizon.com \
    --cc=mathieu.desnoyers@efficios.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).