From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752191AbbHBWOh (ORCPT ); Sun, 2 Aug 2015 18:14:37 -0400 Received: from mx1.redhat.com ([209.132.183.28]:57962 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751635AbbHBWOg (ORCPT ); Sun, 2 Aug 2015 18:14:36 -0400 Date: Mon, 3 Aug 2015 00:14:29 +0200 From: Jiri Olsa To: Ulrich Obergfell Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, dzickus@redhat.com, atomlin@redhat.com, jolsa@kernel.org, mhocko@suse.cz, eranian@google.com, cmetcalf@ezchip.com, fweisbec@gmail.com Subject: Re: [PATCH 0/4] watchdog: avoid races in watchdog_nmi_{en|disable} functions Message-ID: <20150802221429.GD11901@krava.redhat.com> References: <1438433365-2979-1-git-send-email-uobergfe@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1438433365-2979-1-git-send-email-uobergfe@redhat.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Aug 01, 2015 at 02:49:21PM +0200, Ulrich Obergfell wrote: > Originally watchdog_nmi_enable(cpu) and watchdog_nmi_disable(cpu) were > only called in watchdog thread context. However, the following commits > utilize these functions outside of watchdog thread context too. > > commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca > Author: Michal Hocko > Date: Tue Sep 24 15:27:30 2013 -0700 > > watchdog: update watchdog_thresh properly > > commit b3738d29323344da3017a91010530cf3a58590fc > Author: Stephane Eranian > Date: Mon Nov 17 20:07:03 2014 +0100 > > watchdog: Add watchdog enable/disable all functions > > Hence, it is now possible that these functions execute concurrently with > the same 'cpu' argument. This concurrency is problematic because per-cpu > 'watchdog_ev' can be accessed/modified without adequate synchronization. > > The patch series aims to address the above problem. However, instead of > introducing locks to protect per-cpu 'watchdog_ev' a different approach > is taken: Invoke these functions by parking and unparking the watchdog > threads (to ensure they are always called in watchdog thread context). > > static struct smp_hotplug_thread watchdog_threads = { > ... > .park = watchdog_disable, // calls watchdog_nmi_disable() > .unpark = watchdog_enable, // calls watchdog_nmi_enable() > }; > > Both previously mentioned commits call these functions in a similar way > and thus in principle contain some duplicate code. The patch series also > avoids this duplication by providing a commonly usable mechanism. > > - Patch 1/4 introduces the watchdog_{park|unpark}_threads functions that > park/unpark all watchdog threads specified in 'watchdog_cpumask'. They > are intended to be called inside of kernel/watchdog.c only. > > - Patch 2/4 introduces the watchdog_{suspend|resume} functions which can > be utilized by external callers to deactivate the hard and soft lockup > detector temporarily. > > - Patch 3/4 utilizes watchdog_{park|unpark}_threads to replace some code > that was introduced by commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca. > > - Patch 4/4 utilizes watchdog_{suspend|resume} to replace some code that > was introduced by commit b3738d29323344da3017a91010530cf3a58590fc. > > A few corner cases should be mentioned here for completeness. > > - kthread_park() of watchdog/N could hang if cpu N is already locked up. > However, if watchdog is enabled the lockup will be detected anyway. > > - kthread_unpark() of watchdog/N could hang if cpu N got locked up after > kthread_park(). The occurrence of this scenario should be _very_ rare > in practice, in particular because it is not expected that temporary > deactivation will happen frequently, and if it happens at all it is > expected that the duration of deactivation will be short. > > Ulrich Obergfell (4): > watchdog: introduce watchdog_park_threads() and > watchdog_unpark_threads() > watchdog: introduce watchdog_suspend() and watchdog_resume() > watchdog: use park/unpark functions in update_watchdog_all_cpus() > watchdog: use suspend/resume interface in fixup_ht_bug() tested the same way as for I did for my other patch for this issue: http://marc.info/?l=linux-kernel&m=143834383400803&w=2 works nicely ;-) thanks, jirka