[PATCH 3.4 29/30] timekeeping: Avoid possible deadlock from clock_was_set_delayed

stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	Prarit Bhargava <prarit@redhat.com>,
	Richard Cochran <richardcochran@gmail.com>,
	Ingo Molnar <mingo@kernel.org>,
	Sasha Levin <sasha.levin@oracle.com>,
	John Stultz <john.stultz@linaro.org>
Subject: [PATCH 3.4 29/30] timekeeping: Avoid possible deadlock from clock_was_set_delayed
Date: Tue, 11 Feb 2014 11:06:29 -0800	[thread overview]
Message-ID: <20140211184647.974439302@linuxfoundation.org> (raw)
In-Reply-To: <20140211184647.149512538@linuxfoundation.org>

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: John Stultz <john.stultz@linaro.org>

commit 6fdda9a9c5db367130cf32df5d6618d08b89f46a upstream.

As part of normal operaions, the hrtimer subsystem frequently calls
into the timekeeping code, creating a locking order of
  hrtimer locks -> timekeeping locks

clock_was_set_delayed() was suppoed to allow us to avoid deadlocks
between the timekeeping the hrtimer subsystem, so that we could
notify the hrtimer subsytem the time had changed while holding
the timekeeping locks. This was done by scheduling delayed work
that would run later once we were out of the timekeeing code.

But unfortunately the lock chains are complex enoguh that in
scheduling delayed work, we end up eventually trying to grab
an hrtimer lock.

Sasha Levin noticed this in testing when the new seqlock lockdep
enablement triggered the following (somewhat abrieviated) message:

[  251.100221] ======================================================
[  251.100221] [ INFO: possible circular locking dependency detected ]
[  251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted
[  251.101967] -------------------------------------------------------
[  251.101967] kworker/10:1/4506 is trying to acquire lock:
[  251.101967]  (timekeeper_seq){----..}, at: [<ffffffff81160e96>] retrigger_next_event+0x56/0x70
[  251.101967]
[  251.101967] but task is already holding lock:
[  251.101967]  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
[  251.101967]
[  251.101967] which lock already depends on the new lock.
[  251.101967]
[  251.101967]
[  251.101967] the existing dependency chain (in reverse order) is:
[  251.101967]
-> #5 (hrtimer_bases.lock#11){-.-...}:
[snipped]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[snipped]
-> #3 (&rq->lock){-.-.-.}:
[snipped]
-> #2 (&p->pi_lock){-.-.-.}:
[snipped]
-> #1 (&(&pool->lock)->rlock){-.-...}:
[  251.101967]        [<ffffffff81194803>] validate_chain+0x6c3/0x7b0
[  251.101967]        [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580
[  251.101967]        [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0
[  251.101967]        [<ffffffff84398500>] _raw_spin_lock+0x40/0x80
[  251.101967]        [<ffffffff81153e69>] __queue_work+0x1a9/0x3f0
[  251.101967]        [<ffffffff81154168>] queue_work_on+0x98/0x120
[  251.101967]        [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30
[  251.101967]        [<ffffffff811c4bd1>] do_adjtimex+0x111/0x160
[  251.101967]        [<ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70
[  251.101967]        [<ffffffff843a4b49>] ia32_sysret+0x0/0x5
[  251.101967]
-> #0 (timekeeper_seq){----..}:
[snipped]
[  251.101967] other info that might help us debug this:
[  251.101967]
[  251.101967] Chain exists of:
  timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11

[  251.101967]  Possible unsafe locking scenario:
[  251.101967]
[  251.101967]        CPU0                    CPU1
[  251.101967]        ----                    ----
[  251.101967]   lock(hrtimer_bases.lock#11);
[  251.101967]                                lock(&rt_b->rt_runtime_lock);
[  251.101967]                                lock(hrtimer_bases.lock#11);
[  251.101967]   lock(timekeeper_seq);
[  251.101967]
[  251.101967]  *** DEADLOCK ***
[  251.101967]
[  251.101967] 3 locks held by kworker/10:1/4506:
[  251.101967]  #0:  (events){.+.+.+}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
[  251.101967]  #1:  (hrtimer_work){+.+...}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
[  251.101967]  #2:  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
[  251.101967]
[  251.101967] stack backtrace:
[  251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053
[  251.101967] Workqueue: events clock_was_set_work

So the best solution is to avoid calling clock_was_set_delayed() while
holding the timekeeping lock, and instead using a flag variable to
decide if we should call clock_was_set() once we've released the locks.

This works for the case here, where the do_adjtimex() was the deadlock
trigger point. Unfortuantely, in update_wall_time() we still hold
the jiffies lock, which would deadlock with the ipi triggered by
clock_was_set(), preventing us from calling it even after we drop the
timekeeping lock. So instead call clock_was_set_delayed() at that point.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 kernel/time/timekeeping.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -988,7 +988,8 @@ static void timekeeping_adjust(s64 offse
  *
  * Returns the unconsumed cycles.
  */
-static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
+static cycle_t logarithmic_accumulation(cycle_t offset, int shift,
+							unsigned int *clock_set)
 {
 	u64 nsecps = (u64)NSEC_PER_SEC << timekeeper.shift;
 	u64 raw_nsecs;
@@ -1010,7 +1011,7 @@ static cycle_t logarithmic_accumulation(
 		timekeeper.xtime.tv_sec += leap;
 		timekeeper.wall_to_monotonic.tv_sec -= leap;
 		if (leap)
-			clock_was_set_delayed();
+			*clock_set = 1;
 	}
 
 	/* Accumulate raw time */
@@ -1042,6 +1043,7 @@ static void update_wall_time(void)
 	struct clocksource *clock;
 	cycle_t offset;
 	int shift = 0, maxshift;
+	unsigned int clock_set = 0;
 	unsigned long flags;
 
 	write_seqlock_irqsave(&timekeeper.lock, flags);
@@ -1077,7 +1079,7 @@ static void update_wall_time(void)
 	maxshift = (64 - (ilog2(ntp_tick_length())+1)) - 1;
 	shift = min(shift, maxshift);
 	while (offset >= timekeeper.cycle_interval) {
-		offset = logarithmic_accumulation(offset, shift);
+		offset = logarithmic_accumulation(offset, shift, &clock_set);
 		if(offset < timekeeper.cycle_interval<<shift)
 			shift--;
 	}
@@ -1131,7 +1133,7 @@ static void update_wall_time(void)
 		timekeeper.xtime.tv_sec += leap;
 		timekeeper.wall_to_monotonic.tv_sec -= leap;
 		if (leap)
-			clock_was_set_delayed();
+			clock_set = 1;
 	}
 
 	timekeeping_update(false);
@@ -1139,6 +1141,8 @@ static void update_wall_time(void)
 out:
 	write_sequnlock_irqrestore(&timekeeper.lock, flags);
 
+	if (clock_set)
+		clock_was_set_delayed();
 }
 
 /**

next prev parent reply	other threads:[~2014-02-11 19:06 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-11 19:06 [PATCH 3.4 00/30] 3.4.80-stable review Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 01/30] SELinux: Fix memory leak upon loading policy Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 02/30] intel-iommu: fix off-by-one in pagetable freeing Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 03/30] audit: correct a type mismatch in audit_syscall_exit() Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 04/30] mmc: atmel-mci: fix timeout errors in SDIO mode when using DMA Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 05/30] slub: Fix calculation of cpu slabs Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 07/30] ACPI / init: Flag use of ACPI and ACPI idioms for power supplies to regulator API Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 08/30] mtd: mxc_nand: remove duplicated ecc_stats counting Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 09/30] ore: Fix wrong math in allocation of per device BIO Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 10/30] IB/qib: Fix QP check when looping back to/from QP1 Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 11/30] spidev: fix hang when transfer_one_message fails Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 12/30] NFSv4: OPEN must handle the NFS4ERR_IO return code correctly Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 13/30] nfs4.1: properly handle ENOTSUP in SECINFO_NO_NAME Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 14/30] sunrpc: Fix infinite loop in RPC state machine Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 15/30] dm: wait until embedded kobject is released before destroying a device Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 16/30] dm space map common: make sure new space is used during extend Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 20/30] drm/radeon: set the full cache bit for fences on r7xx+ Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 21/30] drm/radeon/DCE4+: clear bios scratch dpms bit (v2) Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 22/30] PCI: Enable ARI if dev and upstream bridge support it; disable otherwise Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 23/30] hpfs: deadlock and race in directory lseek() Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 24/30] sched/rt: Fix SCHED_RR across cgroups Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 25/30] sched,rt: fix isolated CPUs leaving root_task_group indefinitely throttled Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 26/30] sched: Unthrottle rt runqueues in __disable_runtime() Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 27/30] sched/rt: Avoid updating RT entry timeout twice within one tick period Greg Kroah-Hartman
2014-02-11 19:06 ` [PATCH 3.4 28/30] rtc-cmos: Add an alarm disable quirk Greg Kroah-Hartman
2014-02-11 19:06 ` Greg Kroah-Hartman [this message]
2014-02-11 19:06 ` [PATCH 3.4 30/30] 3.4.y: timekeeping: fix 32-bit overflow in get_monotonic_boottime Greg Kroah-Hartman
2014-02-12  4:19 ` [PATCH 3.4 00/30] 3.4.80-stable review Guenter Roeck
2014-02-12 18:55   ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140211184647.974439302@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=john.stultz@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=prarit@redhat.com \
    --cc=richardcochran@gmail.com \
    --cc=sasha.levin@oracle.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).