From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
john stultz <johnstul@us.ibm.com>,
linux-kernel@vger.kernel.org
Subject: Re: [circular locking bug] Re: [patch 00/15] clocksource / timekeeping rework V4 (resend V3 + bug fix)
Date: Mon, 17 Aug 2009 19:53:32 +0200
Message-ID: <20090817195332.1e1f98b0@skybase>
In-Reply-To: <20090817092807.GA10460@elte.hu>
On Mon, 17 Aug 2009 11:28:07 +0200
Ingo Molnar <mingo@elte.hu> wrote:
>
> -tip testing still triggers lockdep troubles with latest
> timers/core. I think timekeeping_notify() might be involved.
>
> Bootlog below, config attached.
> [ 43.081726] =======================================================
> [ 43.089512] [ INFO: possible circular locking dependency detected ]
> [ 43.090020] 2.6.31-rc6-tip-01039-ge54d9cb-dirty #4937
> [ 43.090020] -------------------------------------------------------
> [ 43.090020] S99local/5002 is trying to acquire lock:
> [ 43.090020] (events){+.+.+.}, at: [<ffffffff8106ff4c>] cleanup_workqueue_thread+0x2d/0xdd
> [ 43.090020]
> [ 43.090020] but task is already holding lock:
> [ 43.090020] (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff818387aa>] cpu_down+0x3b/0xb4
> [ 43.090020]
> [ 43.090020] which lock already depends on the new lock.
> [ 43.090020]
> [ 43.090020]
> [ 43.090020] the existing dependency chain (in reverse order) is:
> [ 43.090020]
> [ 43.090020] -> #4 (cpu_add_remove_lock){+.+.+.}:
> [ 43.090020] [<ffffffff8108925d>] check_prev_add+0x18e/0x225
> [ 43.090020] [<ffffffff8108976a>] validate_chain+0x476/0x556
> [ 43.090020] [<ffffffff81089bfa>] __lock_acquire+0x3b0/0x43b
> [ 43.090020] [<ffffffff81089d5a>] lock_acquire+0xd5/0x10e
> [ 43.090020] [<ffffffff81872229>] __mutex_lock_common+0x5b/0x3b0
> [ 43.090020] [<ffffffff81872683>] mutex_lock_nested+0x45/0x62
> [ 43.090020] [<ffffffff8105de75>] cpu_maps_update_begin+0x25/0x3b
> [ 43.090020] [<ffffffff81070348>] __create_workqueue_key+0x137/0x200
> [ 43.090020] [<ffffffff810a2397>] stop_machine_create+0x4d/0xc2
> [ 43.090020] [<ffffffff810a2438>] stop_machine+0x2c/0x74
> [ 43.090020] [<ffffffff8107f163>] timekeeping_notify+0x31/0x4c
> [ 43.090020] [<ffffffff81080258>] clocksource_select+0xc9/0xe8
> [ 43.090020] [<ffffffff81080b8c>] clocksource_register+0x70/0xa1
> [ 43.090020] [<ffffffff81d4cc6a>] init_acpi_pm_clocksource+0xf3/0x114
> [ 43.090020] [<ffffffff8100a092>] do_one_initcall+0x65/0x15b
> [ 43.090020] [<ffffffff81d0d992>] do_basic_setup+0x59/0x82
> [ 43.090020] [<ffffffff81d0da3e>] kernel_init+0x83/0xdd
> [ 43.090020] [<ffffffff8101520a>] child_rip+0xa/0x20
> [ 43.090020]
> [ 43.090020] -> #3 (setup_lock){+.+.+.}:
> [ 43.090020] [<ffffffff8108925d>] check_prev_add+0x18e/0x225
> [ 43.090020] [<ffffffff8108976a>] validate_chain+0x476/0x556
> [ 43.090020] [<ffffffff81089bfa>] __lock_acquire+0x3b0/0x43b
> [ 43.090020] [<ffffffff81089d5a>] lock_acquire+0xd5/0x10e
> [ 43.090020] [<ffffffff81872229>] __mutex_lock_common+0x5b/0x3b0
> [ 43.090020] [<ffffffff81872683>] mutex_lock_nested+0x45/0x62
> [ 43.090020] [<ffffffff810a236f>] stop_machine_create+0x25/0xc2
> [ 43.090020] [<ffffffff810a2438>] stop_machine+0x2c/0x74
> [ 43.090020] [<ffffffff8107f163>] timekeeping_notify+0x31/0x4c
> [ 43.090020] [<ffffffff81080258>] clocksource_select+0xc9/0xe8
> [ 43.090020] [<ffffffff81080b8c>] clocksource_register+0x70/0xa1
> [ 43.090020] [<ffffffff81d4cc6a>] init_acpi_pm_clocksource+0xf3/0x114
> [ 43.090020] [<ffffffff8100a092>] do_one_initcall+0x65/0x15b
> [ 43.090020] [<ffffffff81d0d992>] do_basic_setup+0x59/0x82
> [ 43.090020] [<ffffffff81d0da3e>] kernel_init+0x83/0xdd
> [ 43.090020] [<ffffffff8101520a>] child_rip+0xa/0x20
> [ 43.090020]
> [ 43.090020] -> #2 (clocksource_mutex){+.+.+.}:
> [ 43.090020] [<ffffffff8108925d>] check_prev_add+0x18e/0x225
> [ 43.090020] [<ffffffff8108976a>] validate_chain+0x476/0x556
> [ 43.090020] [<ffffffff81089bfa>] __lock_acquire+0x3b0/0x43b
> [ 43.090020] [<ffffffff81089d5a>] lock_acquire+0xd5/0x10e
> [ 43.090020] [<ffffffff81872229>] __mutex_lock_common+0x5b/0x3b0
> [ 43.090020] [<ffffffff81872683>] mutex_lock_nested+0x45/0x62
> [ 43.090020] [<ffffffff8108076c>] clocksource_change_rating+0x34/0xaf
> [ 43.090020] [<ffffffff8108091a>] clocksource_watchdog_work+0x133/0x16f
> [ 43.090020] [<ffffffff8106f9ca>] run_workqueue+0x161/0x265
> [ 43.090020] [<ffffffff8106fbb3>] worker_thread+0xe5/0x10c
> [ 43.090020] [<ffffffff81074b64>] kthread+0x98/0xa0
> [ 43.090020] [<ffffffff8101520a>] child_rip+0xa/0x20
> [ 43.090020]
> [ 43.090020] -> #1 (&watchdog_work){+.+...}:
> [ 43.090020] [<ffffffff8108925d>] check_prev_add+0x18e/0x225
> [ 43.090020] [<ffffffff8108976a>] validate_chain+0x476/0x556
> [ 43.090020] [<ffffffff81089bfa>] __lock_acquire+0x3b0/0x43b
> [ 43.090020] [<ffffffff81089d5a>] lock_acquire+0xd5/0x10e
> [ 43.090020] [<ffffffff8106f9c4>] run_workqueue+0x15b/0x265
> [ 43.090020] [<ffffffff8106fbb3>] worker_thread+0xe5/0x10c
> [ 43.090020] [<ffffffff81074b64>] kthread+0x98/0xa0
> [ 43.090020] [<ffffffff8101520a>] child_rip+0xa/0x20
> [ 43.090020]
> [ 43.090020] -> #0 (events){+.+.+.}:
> [ 43.090020] [<ffffffff81089146>] check_prev_add+0x77/0x225
> [ 43.090020] [<ffffffff8108976a>] validate_chain+0x476/0x556
> [ 43.090020] [<ffffffff81089bfa>] __lock_acquire+0x3b0/0x43b
> [ 43.090020] [<ffffffff81089d5a>] lock_acquire+0xd5/0x10e
> [ 43.090020] [<ffffffff8106ff73>] cleanup_workqueue_thread+0x54/0xdd
> [ 43.090020] [<ffffffff81839c41>] workqueue_cpu_callback+0xdb/0x13f
> [ 43.090020] [<ffffffff8187714e>] notifier_call_chain+0x6d/0xb5
> [ 43.090020] [<ffffffff8107a19e>] raw_notifier_call_chain+0x22/0x38
> [ 43.090020] [<ffffffff81838735>] _cpu_down+0x2c2/0x2fc
> [ 43.090020] [<ffffffff818387cc>] cpu_down+0x5d/0xb4
> [ 43.090020] [<ffffffff8183988e>] store_online+0x3f/0x9a
> [ 43.090020] [<ffffffff813c2e41>] sysdev_store+0x2e/0x44
> [ 43.090020] [<ffffffff811559a5>] sysfs_write_file+0xf3/0x13c
> [ 43.090020] [<ffffffff810f6f7a>] vfs_write+0xbe/0x130
> [ 43.090020] [<ffffffff810f70e2>] sys_write+0x56/0x93
> [ 43.090020] [<ffffffff81014042>] system_call_fastpath+0x16/0x1b
> [ 43.090020]
> [ 43.090020] other info that might help us debug this:
> [ 43.090020]
> [ 43.090020] 2 locks held by S99local/5002:
> [ 43.090020] #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff811558f9>] sysfs_write_file+0x47/0x13c
> [ 43.090020] #1: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff818387aa>] cpu_down+0x3b/0xb4
> [ 43.090020]
> [ 43.090020] stack backtrace:
> [ 43.090020] Pid: 5002, comm: S99local Not tainted 2.6.31-rc6-tip-01039-ge54d9cb-dirty #4937
> [ 43.090020] Call Trace:
> [ 43.090020] [<ffffffff81088bb0>] print_circular_bug+0xcf/0xf4
> [ 43.090020] [<ffffffff81089146>] check_prev_add+0x77/0x225
> [ 43.090020] [<ffffffff818710e5>] ? wait_for_common+0xeb/0x150
> [ 43.090020] [<ffffffff8108976a>] validate_chain+0x476/0x556
> [ 43.090020] [<ffffffff81087d17>] ? mark_lock+0x31/0x17d
> [ 43.090020] [<ffffffff81089bfa>] __lock_acquire+0x3b0/0x43b
> [ 43.090020] [<ffffffff8106ff4c>] ? cleanup_workqueue_thread+0x2d/0xdd
> [ 43.090020] [<ffffffff8106ff4c>] ? cleanup_workqueue_thread+0x2d/0xdd
> [ 43.090020] [<ffffffff81089d5a>] lock_acquire+0xd5/0x10e
> [ 43.090020] [<ffffffff8106ff4c>] ? cleanup_workqueue_thread+0x2d/0xdd
> [ 43.090020] [<ffffffff81871283>] ? wait_for_completion+0x2b/0x41
> [ 43.090020] [<ffffffff81074be5>] ? kthread_stop+0x79/0xd9
> [ 43.090020] [<ffffffff8106ff73>] cleanup_workqueue_thread+0x54/0xdd
> [ 43.090020] [<ffffffff8106ff4c>] ? cleanup_workqueue_thread+0x2d/0xdd
> [ 43.090020] [<ffffffff81839c41>] workqueue_cpu_callback+0xdb/0x13f
> [ 43.090020] [<ffffffff8187714e>] notifier_call_chain+0x6d/0xb5
> [ 43.090020] [<ffffffff8107a19e>] raw_notifier_call_chain+0x22/0x38
> [ 43.090020] [<ffffffff81838735>] _cpu_down+0x2c2/0x2fc
> [ 43.090020] [<ffffffff818710e5>] ? wait_for_common+0xeb/0x150
> [ 43.090020] [<ffffffff818387cc>] cpu_down+0x5d/0xb4
> [ 43.090020] [<ffffffff8183988e>] store_online+0x3f/0x9a
> [ 43.090020] [<ffffffff813c2e41>] sysdev_store+0x2e/0x44
> [ 43.090020] [<ffffffff811559a5>] sysfs_write_file+0xf3/0x13c
> [ 43.090020] [<ffffffff810f65fa>] ? rw_verify_area+0x97/0xd1
> [ 43.090020] [<ffffffff810f6f7a>] vfs_write+0xbe/0x130
> [ 43.090020] [<ffffffff810f70e2>] sys_write+0x56/0x93
> [ 43.090020] [<ffffffff81014042>] system_call_fastpath+0x16/0x1b
> [ 43.829633] lockdep: fixing up alternatives.
Urg. As far as I read the lockdep message, it complains about this
cycle: cpu_down() holds cpu_add_remove_lock while it tears down the
workqueue; the workqueue's pending work tries to take
clocksource_mutex; timekeeping_notify() is called with
clocksource_mutex held and tries to execute stop_machine(), which
creates a workqueue of its own and therefore blocks on the
cpu_add_remove_lock that cpu_down() already holds.
A solution would be to use a single-threaded workqueue, no?
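Abstractly, what lockdep objects to is a closed cycle in its lock
dependency graph. Below is a toy sketch of that check in plain Python
(not the kernel's actual algorithm), using the five lock classes named
in the splat above; an edge (held, acquired) means `acquired` was
taken while `held` was already held:

```python
# Toy model of lockdep's cycle detection: build a directed graph of
# observed lock dependencies and look for a cycle via DFS.

def find_cycle(edges, start):
    """Return a lock-order cycle reachable from `start`, or None."""
    graph = {}
    for held, acquired in edges:
        graph.setdefault(held, []).append(acquired)

    def dfs(node, path):
        if node in path:                      # re-entered a held lock: cycle
            return path[path.index(node):] + [node]
        for nxt in graph.get(node, []):
            found = dfs(nxt, path + [node])
            if found:
                return found
        return None

    return dfs(start, [])

# Dependencies #1..#4 from the report, plus the new edge #0
# (cpu_down() holds cpu_add_remove_lock while flushing "events").
observed = [
    ("events", "&watchdog_work"),             # run_workqueue
    ("&watchdog_work", "clocksource_mutex"),  # clocksource_watchdog_work
    ("clocksource_mutex", "setup_lock"),      # stop_machine_create
    ("setup_lock", "cpu_add_remove_lock"),    # __create_workqueue_key
    ("cpu_add_remove_lock", "events"),        # cleanup_workqueue_thread
]

print(" -> ".join(find_cycle(observed, "events")))
# -> events -> &watchdog_work -> clocksource_mutex -> setup_lock
#    -> cpu_add_remove_lock -> events
```

Dropping any one edge breaks the cycle, which is why changing where
the watchdog work runs (e.g. a dedicated single-threaded workqueue
that is not torn down on cpu_down) would make the report go away.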
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.