Re: [rough draft PATCH] avoid stalls on the timekeeping seqlock

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: John Stultz <john.stultz@linaro.org>
To: George Spelvin <linux@horizon.com>
Cc: linux-kernel@vger.kernel.org,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Subject: Re: [rough draft PATCH] avoid stalls on the timekeeping seqlock
Date: Mon, 12 May 2014 11:23:04 -0700	[thread overview]
Message-ID: <53711188.3050901@linaro.org> (raw)
In-Reply-To: <20140512162141.22331.qmail@ns.horizon.com>

On 05/12/2014 09:21 AM, George Spelvin wrote:
> Here's a non-working rough draft of that idea I suggested to make
> reading the time non-blocking, even if an update is in progress.
>
> Basically, it uses the idea proposed in a comment in update_wall_time,
> switching pointers so there's always one valid structure.
>
> This is non-working because last year the NTP variables lost their
> own locking and inherited the timekeeping locks I am redesigning.
> I haven't updated NTP yet.

An important part here is that the NTP state is really tied to the
current timekeeping structure. When everything was updated in lockstep,
we split the locks to simplify some of the locking rules. But when we
added the mirrored update in 3.10 (which is a lighter version of what
you're proposing), we had to go back to using the same locking for
everything.

Matheiu took a similar swing last year, and in doing so moved most of
the ntp state into the timekeeper. This seemed like a nice cleanup, but
since his appraoch ran into trouble, and stalled out, so we didn't get
the cleaup patches merged either.

You can check his series out here:
    https://lkml.org/lkml/2013/9/14/136


> One interesting possibility is that the write side of the locking
> is identical to a standard seqlock.  It would be possible to
> divide the timekeeping variables into non-blocking variables which
> are mirrored, and ones that require stalling during write
> seqlock updates.
>
> But that's somewhat deeper magic than I've attempted so far.
> This is a demonstration of the idea.
>
> Does it seem worth pursuing?


So again, I'd love to find a way to make it work, but I worry that the
freq changes make the non-blocking route not very feasible (though my
concerns may be overwrought - so feel free to push back here).

There's also the extra complexity of the vdso updates, which basically
are a similar update to a separate arch specific subsystem, which is
currently done under the lock. So that would have to get unified in this
non-blocking update as well.


A few small notes below:

> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index f7df8ea217..0dfa4aa6fb 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -29,15 +29,15 @@
>  #include "timekeeping_internal.h"
>  
>  #define TK_CLEAR_NTP		(1 << 0)
> -#define TK_MIRROR		(1 << 1)
>  #define TK_CLOCK_WAS_SET	(1 << 2)
>  
> -static struct timekeeper timekeeper;
> +static struct timekeeper timekeeper[2];
>  static DEFINE_RAW_SPINLOCK(timekeeper_lock);
> +/* The following is NOT used as a standard seqlock */
>  static seqcount_t timekeeper_seq;
> -static struct timekeeper shadow_timekeeper;
>  
>  /* flag for if timekeeping is suspended */
> +/* Q: What are the locking rules for this variable? */
>  int __read_mostly timekeeping_suspended;

Really this is a left over bit that needs to be cleaned up and moved to
the timekeeping structure. Its just a global flag that we use to make
sure nothing calls into timekeeping logic while we're suspended, and was
added to sort out a few suspend/resume issues that cropped up when the
original generic timkeeping core was added.


> @@ -291,6 +289,89 @@ static void timekeeping_forward_now(struct timekeeper *tk)
>  }
>  
>  /**
> + * timekeeper_write_begin: Return a timekeeper that can be updated.
> + *
> + * Must be called with the timekeeper_lock held.
> + */
> +static inline struct timekeeper *timekeeper_write_begin(void)
> +{
> +	bool b;
> +
> +	write_seqcount_begin(&timekeeper_seq);
> +	b = (timekeeper_seq.sequence >> 1) & 1;
> +	timekeeper[!b] = timekeeper[b];
> +	return timekeeper + !b;
> +}
> +
> +/**
> + * timekeeper_write_end: Finish write, mark the modified timekeeper as current.
> + *
> + * Must be called with the timekeeper_lock held.
> + */
> +static inline void timekeeper_write_end(void)
> +{
> +	write_seqcount_end(&timekeeper_seq);
> +}
> +
> +/**
> + * __timekeeper_current: Return the current (for reading) timekeeper
> + * @seq: The current sequence number
> + *
> + * Return the timekeeper corresponding to the given sequence number.
> + */
> +static inline struct timekeeper const *__timekeeper_current(unsigned seq)
> +{
> +	return timekeeper + ((seq >> 1) & 1);
> +}
> +
> +/**
> + * timekeeper_current: Return the current (for reading) timekeeper
> + *
> + * On rare occasions, we want the current timekeeper without obtaining
> + * the seqlock.  For example, if we hold the timekeeper_loc but don't
> + * intend to write it.
> + */
> +static inline struct timekeeper const *timekeeper_current(void)
> +{
> +	return __timekeeper_current(timekeeper_seq.sequence);
> +}
> +
> +/**
> + * timekeeper_read_begin: Begin reading a timekeeper.
> + * @seqp: Pointer to variable to receive sequence number.
> + *	(Because this is inline, the compiler can optimize out
> + *	the memory access.)
> + *
> + * Returns a pointer to a readable timekeeper structure.
> + *
> + * Because we have two timekeeper structures that we ping-pong
> + * between, this never blocks.  Only if there are two calls
> + * to timekeeper_write_begin between read_begin and read_retry
> + * will a retry be forced.
> + */
> +static inline struct timekeeper const *timekeeper_read_begin(unsigned *seqp)
> +{
> +	unsigned seq = ACCESS_ONCE(timekeeper_seq.sequence);
> +	smp_rmb();
> +	*seqp = seq &= ~1u;
> +	return __timekeeper_current(seq);
> +}
> +
> +/**
> + * timekeeper_read_retry: Return true if read was inconsistent, must retry
> + * @seq: The return value from timekeeper_read_begin
> + *
> + * Because we ping-pong between two timekeeper structures, the window
> + * of validity is wider than a normal seqlock, and a retry is very
> + * unlikely.
> + */
> +static inline bool timekeeper_read_retry(unsigned seq)
> +{
> +	unsigned delta = timekeeper_seq.sequence - seq;
> +	return unlikely(delta > 2);
> +}

This all looks very clean and nice! I suspect even if the non-blocking
logic doesn't work out, there's probably some similar style cleanups
that could be done to make the current mirroring code nicer to read.

thanks
-john

next prev parent reply	other threads:[~2014-05-12 18:23 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-06 21:33 [PATCH 4/4] timekeeping: Use printk_deferred when holding timekeeping seqlock George Spelvin
2014-05-12 16:21 ` [rough draft PATCH] avoid stalls on the " George Spelvin
2014-05-12 18:23   ` John Stultz [this message]
2014-05-12 17:49 ` [PATCH 4/4] timekeeping: Use printk_deferred when holding " John Stultz
2014-05-13  2:44   ` George Spelvin
2014-05-13  3:39     ` John Stultz
2014-05-13  5:13       ` George Spelvin
2014-05-13 12:07         ` Mathieu Desnoyers
2014-05-13 13:29           ` George Spelvin
2014-05-13 13:39             ` Mathieu Desnoyers
2014-05-13 16:18               ` George Spelvin
2014-05-13  2:45   ` [PATCH 1/2] timekeeping: Use unsigned int for seqlock sequence George Spelvin
2014-05-13 11:49     ` Mathieu Desnoyers
2014-05-13  2:48   ` [PATCH 2/2] timekeeping: Mark struct timekeeper * passed to notifiers as const George Spelvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53711188.3050901@linaro.org \
    --to=john.stultz@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@horizon.com \
    --cc=mathieu.desnoyers@efficios.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).