linux-arm-kernel.lists.infradead.org archive mirror
From: Pingfan Liu <kernelfans@gmail.com>
To: Petr Mladek <pmladek@suse.com>
Cc: linux-kernel@vger.kernel.org, Sumit Garg <sumit.garg@linaro.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
	Marc Zyngier <maz@kernel.org>, Kees Cook <keescook@chromium.org>,
	Masahiro Yamada <masahiroy@kernel.org>,
	Sami Tolvanen <samitolvanen@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Wang Qing <wangqing@vivo.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Santosh Sivaraj <santosh@fossix.org>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCHv2 3/4] kernel/watchdog: adapt the watchdog_hld interface for async model
Date: Fri, 8 Oct 2021 23:10:12 +0800
Message-ID: <YWBfVBXBY5ykK+qT@piliu.users.ipa.redhat.com>
In-Reply-To: <YV/c6X7bPT5pBg/R@piliu.users.ipa.redhat.com>

On Fri, Oct 08, 2021 at 01:53:45PM +0800, Pingfan Liu wrote:
> On Tue, Oct 05, 2021 at 09:03:17AM +0200, Petr Mladek wrote:
> [...]
> > > +static void lockup_detector_delay_init(struct work_struct *work);
> > > +bool hld_detector_delay_initialized __initdata;
> > > +
> > > +struct wait_queue_head hld_detector_wait __initdata =
> > > +		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> > > +
> > > +static struct work_struct detector_work __initdata =
> > > +		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> > > +
> > > +static void __init lockup_detector_delay_init(struct work_struct *work)
> > > +{
> > > +	int ret;
> > > +
> > > +	wait_event(hld_detector_wait, hld_detector_delay_initialized);
> > > +	ret = watchdog_nmi_probe();
> > > +	if (!ret) {
> > > +		nmi_watchdog_available = true;
> > > +		lockup_detector_setup();
> > 
> > Is it really safe to call the entire lockup_detector_setup()
> > later?
> > 
> > It manipulates also softlockup detector. And more importantly,
> > the original call is before smp_init(). It means that it was
> > running when only single CPU was on.
> > 
> For the race analysis, lockup_detector_reconfigure() is at centre stage.
> Since proc_watchdog_update() can already call lockup_detector_reconfigure()
> to re-initialize both the soft and hard lockup detectors, the race
> should already be taken into account.
> 
> > It seems that x86 has some problem with hardlockup detector as
> > well. It later manipulates only the hardlockup detector. Also it uses
> > cpus_read_lock() to prevent races with CPU hotplug, see
> > fixup_ht_bug().
> > 
> Yes. But hardlockup_detector_perf_{stop,start}() can not meet the
> requirement, since no perf_event is created yet. So there is no handy
> interface to re-initialize hardlockup detector directly.
> 
> > 
> > > +	} else {
> > > +		WARN_ON(ret == -EBUSY);
> > > +		pr_info("Perf NMI watchdog permanently disabled\n");
> > > +	}
> > > +}
> > > +
> > >  void __init lockup_detector_init(void)
> > >  {
> > > +	int ret;
> > > +
> > >  	if (tick_nohz_full_enabled())
> > >  		pr_info("Disabling watchdog on nohz_full cores by default\n");
> > >  
> > >  	cpumask_copy(&watchdog_cpumask,
> > >  		     housekeeping_cpumask(HK_FLAG_TIMER));
> > >  
> > > -	if (!watchdog_nmi_probe())
> > > +	ret = watchdog_nmi_probe();
> > > +	if (!ret)
> > >  		nmi_watchdog_available = true;
> > > +	else if (ret == -EBUSY)
> > > +		queue_work_on(smp_processor_id(), system_wq, &detector_work);
> > 
> > IMHO, this is not acceptable. It will block one worker until someone
> > wakes it. Only arm64 will have a code to wake up the work and only
> > when pmu is successfully initialized. In all other cases, the worker
> > will stay blocked forever.
> > 
> What about treating -EBUSY and hld_detector_delay_initialized as a unit?
                                                                     ^^^
								     unity
> If watchdog_nmi_probe() returns -EBUSY, then set the state of
> hld_detector_delay_initialized to "waiting", and later move it to the
> state "finished".
> 
> And at the end of do_initcalls(), check the state is "finished". If not,
> then throw a warning and wake up the worker.
> 
> > The right solution is to do it the other way. Queue the work
> > from arm64-specific code when armv8_pmu_driver_init() succeeded.
> > 
> Would it be better if the watchdog provided a common framework for future
> extension instead of an arch-specific one? The second argument is to avoid
> the message "Perf NMI watchdog permanently disabled" being printed when the
> watchdog is enabled later. (Please see
> lockup_detector_init()->watchdog_nmi_probe()->hardlockup_detector_perf_init();
> with an arch-specific probe method, it can be avoided.)
> 
Sorry for the poor wording. I did not explain the second point
completely.

If an arch-specific watchdog_nmi_probe() is used to avoid the misleading
message "Perf NMI watchdog permanently disabled", then it should return
-EBUSY. The watchdog core should then know how to handle that error,
that is, by calling
queue_work_on(smp_processor_id(), system_wq, &detector_work).
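
To make the intended flow concrete, here is a minimal userspace sketch in
C. It is only a model under stated assumptions, not the patch itself:
enum hld_state, pmu_ready, pending_work, arch_watchdog_nmi_probe() and
pmu_driver_init_done() are hypothetical names, the function pointer stands
in for the queued work item, and the work is run directly instead of being
woken through a wait queue.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: none of these are the kernel's real definitions. */
enum hld_state { HLD_DISABLED, HLD_WAITING, HLD_ENABLED };

static bool pmu_ready;              /* hypothetical: set once the PMU driver has probed */
static enum hld_state hld_state = HLD_DISABLED;
static void (*pending_work)(void);  /* stand-in for a queued work item */

/* Hypothetical arch-specific probe: -EBUSY means "not yet, retry later". */
static int arch_watchdog_nmi_probe(void)
{
	return pmu_ready ? 0 : -EBUSY;
}

/* Deferred work: re-run the probe once the PMU is available. */
static void lockup_detector_delay_init(void)
{
	/* The deferred path is only valid after an -EBUSY result. */
	assert(hld_state == HLD_WAITING);
	hld_state = arch_watchdog_nmi_probe() == 0 ? HLD_ENABLED : HLD_DISABLED;
}

/* Core init: enable now, defer on -EBUSY, give up on any other error. */
static void lockup_detector_init(void)
{
	int ret = arch_watchdog_nmi_probe();

	if (!ret) {
		hld_state = HLD_ENABLED;
	} else if (ret == -EBUSY) {
		hld_state = HLD_WAITING;
		pending_work = lockup_detector_delay_init; /* models queue_work_on() */
	} else {
		hld_state = HLD_DISABLED; /* only here would "permanently disabled" fit */
	}
}

/* Models the PMU driver finishing its init and running the pending work. */
static void pmu_driver_init_done(void)
{
	pmu_ready = true;
	if (pending_work) {
		void (*work)(void) = pending_work;

		pending_work = NULL;
		work();
	}
}
```

In this model, calling lockup_detector_init() before the PMU is ready
leaves the detector in the waiting state without printing a misleading
message, and pmu_driver_init_done() later moves it to enabled.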

Thanks,

	Pingfan

> > Also I suggest to flush the work to make sure that it is finished
> > before __init code gets removed.
> > 
> Good point, and very interesting. I will look into it.
> 
> > 
> > The open question is what code the work will call. As mentioned
> > above, I am not sure that lockup_detector_delay_init() is safe.
> > IMHO, we need to manipulate only hardlockup detector and
> > we have to serialize it against CPU hotplug.
> > 
> As explained above, the race against CPU hotplug has already been
> considered.
> 
> Thanks,
> 
> 	Pingfan
> 

