From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S966584AbaFRNCX (ORCPT ); Wed, 18 Jun 2014 09:02:23 -0400
Received: from mx1.redhat.com ([209.132.183.28]:12127 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S966310AbaFRNCV (ORCPT ); Wed, 18 Jun 2014 09:02:21 -0400
Date: Wed, 18 Jun 2014 09:02:11 -0400
From: Don Zickus
To: David Rientjes
Cc: LKML, akpm@linux-foundation.org, peter@lekensteyn.nl, mhocko@suse.cz
Subject: Re: [PATCH] watchdog: Remove preemption restrictions when restarting lockup detector
Message-ID: <20140618130211.GN7959@redhat.com>
References: <1403010653-243385-1-git-send-email-dzickus@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 17, 2014 at 05:59:05PM -0700, David Rientjes wrote:
> On Tue, 17 Jun 2014, Don Zickus wrote:
>
> > Peter Wu noticed the following splat on his machine when updating
> > /proc/sys/kernel/watchdog_thresh:
> >
> > [ 0.676701] BUG: sleeping function called from invalid context at /tmp/linux/mm/slub.c:965
> > [ 0.679396] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
> > [ 0.681204] 3 locks held by init/1:
> > [ 0.682371] #0: (sb_writers#3){.+.+.+}, at: [] vfs_write+0x143/0x180
> > [ 0.685887] #1: (watchdog_proc_mutex){+.+.+.}, at: [] proc_dowatchdog+0x33/0x110
> > [ 0.689631] #2: (cpu_hotplug.lock){.+.+.+}, at: [] get_online_cpus+0x32/0x80
> > [ 0.693117] Preemption disabled at:[] proc_dowatchdog+0xe4/0x110
> > [ 0.695753]
> > [ 0.696588] CPU: 0 PID: 1 Comm: init Not tainted 3.16.0-rc1-testing #34
> > [ 0.698404] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [ 0.704622] ffff88003f250000 ffff88003f1ffd10 ffffffff81624c59 0000000000000000
> > [ 0.707749] ffff88003f1ffd30 ffffffff8108df1d 0000000000000010 ffffffff81c56360
> > [ 0.711010] ffff88003f1ffd78 ffffffff8116a5de ffffffff811159a5 ffffffff810a5f4a
> > [ 0.714053] Call Trace:
> > [ 0.715015] [] dump_stack+0x4e/0x7a
> > [ 0.716619] [] __might_sleep+0x11d/0x190
> > [ 0.718232] [] kmem_cache_alloc_trace+0x4e/0x1e0
> > [ 0.720214] [] ? perf_event_alloc+0x55/0x440
> > [ 0.721910] [] ? mark_held_locks+0x6a/0x90
> > [ 0.723558] [] perf_event_alloc+0x55/0x440
> > [ 0.725304] [] ? restart_watchdog_hrtimer+0x50/0x50
> > [ 0.727279] [] perf_event_create_kernel_counter+0x26/0xe0
> > [ 0.729269] [] watchdog_nmi_enable+0x75/0x140
> > [ 0.730965] [] update_timers_all_cpus+0x53/0xa0
> > [ 0.732953] [] proc_dowatchdog+0xe4/0x110
> > [ 0.738408] [] proc_sys_call_handler+0xb3/0xc0
> > [ 0.740266] [] proc_sys_write+0x14/0x20
> > [ 0.742086] [] vfs_write+0xad/0x180
> > [ 0.743669] [] ? trace_hardirqs_on_caller+0xfd/0x1c0
> > [ 0.745593] [] SyS_write+0x49/0xb0
> > [ 0.747101] [] system_call_fastpath+0x16/0x1b
> > [ 0.749069] NMI watchdog: disabled (cpu0): hardware events not enabled
> >
> > What happened is after updating the watchdog_thresh, the lockup detector is
> > restarted to utilize the new value. Part of this process involved disabling
> > preemption. Once preemption was disabled, perf tried to allocate a new event
> > (as part of the restart). This caused the above BUG_ON as you can't sleep with
> > preemption disabled.
> >
> > The preemption restriction seemed aggressive as we are not doing anything
> > on that particular cpu, but with all the online cpus (which are protected by
> > the get_online_cpus lock). Remove the restriction and the BUG_ON goes away.
> >
> > Reported-and-Tested-by: Peter Wu
> > Acked-by: Michal Hocko
> > Signed-off-by: Don Zickus
>
> Acked-by: David Rientjes
>
> I think this deserves a Cc: stable@vger.kernel.org # 3.13+

Agreed. :-) Thanks!

Cheers,
Don
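
For readers following the quoted changelog, here is a minimal userspace sketch of why the splat fires. It is a toy model, not the kernel code or the actual patch: every toy_* name is hypothetical, the counter only imitates the idea of the kernel's preemption count, and toy_might_sleep() only imitates the kind of check that __might_sleep() makes in the trace above.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kernel's preemption counter. */
static int toy_preempt_count;
/* Number of "sleeping while atomic" complaints in the current run. */
static int toy_violations;

static void toy_preempt_disable(void) { toy_preempt_count++; }
static void toy_preempt_enable(void)  { toy_preempt_count--; }

/* Imitates the idea of might_sleep(): complain when called in atomic context. */
static void toy_might_sleep(const char *who)
{
    if (toy_preempt_count > 0) {
        toy_violations++;
        printf("BUG: sleeping function called from invalid context (%s)\n", who);
    }
}

/* Stands in for an allocation that may sleep, like the one perf performs. */
static void toy_alloc_event(int cpu)
{
    (void)cpu;
    toy_might_sleep("toy_alloc_event");
}

/* Old shape (assumed): the per-CPU loop runs with preemption disabled. */
static int toy_restart_old(int ncpus)
{
    toy_violations = 0;
    toy_preempt_disable();
    for (int cpu = 0; cpu < ncpus; cpu++)
        toy_alloc_event(cpu);
    toy_preempt_enable();
    return toy_violations;
}

/* New shape (assumed): same loop, without the preemption restriction. */
static int toy_restart_new(int ncpus)
{
    toy_violations = 0;
    for (int cpu = 0; cpu < ncpus; cpu++)
        toy_alloc_event(cpu);
    return toy_violations;
}
```

Calling toy_restart_old(4) reports one complaint per simulated CPU, while toy_restart_new(4) reports none. In the real code the CPU set is kept stable by get_online_cpus(), which this sketch does not model.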