From: Lecopzer Chen <lecopzer.chen@mediatek.com>
To: <pmladek@suse.com>
Cc: <acme@kernel.org>, <akpm@linux-foundation.org>,
<alexander.shishkin@linux.intel.com>, <catalin.marinas@arm.com>,
<davem@davemloft.net>, <jolsa@redhat.com>, <jthierry@redhat.com>,
<keescook@chromium.org>, <kernelfans@gmail.com>,
<lecopzer.chen@mediatek.com>,
<linux-arm-kernel@lists.infradead.org>,
<linux-kernel@vger.kernel.org>,
<linux-mediatek@lists.infradead.org>,
<linux-perf-users@vger.kernel.org>, <mark.rutland@arm.com>,
<masahiroy@kernel.org>, <matthias.bgg@gmail.com>,
<maz@kernel.org>, <mcgrof@kernel.org>, <mingo@redhat.com>,
<namhyung@kernel.org>, <nixiaoming@huawei.com>,
<peterz@infradead.org>, <sparclinux@vger.kernel.org>,
<sumit.garg@linaro.org>, <wangqing@vivo.com>, <will@kernel.org>,
<yj.chiang@mediatek.com>
Subject: Re: [PATCH v3 4/5] kernel/watchdog: Adapt the watchdog_hld interface for async model
Date: Fri, 8 Apr 2022 00:21:43 +0800 [thread overview]
Message-ID: <20220407162143.1127-1-lecopzer.chen@mediatek.com> (raw)
In-Reply-To: <YkxeAM+SwYHAnJE1@alley>
> On Tue 2022-04-05 21:35:03, Lecopzer Chen wrote:
> > > On Thu 2022-03-24 22:14:04, Lecopzer Chen wrote:
> > > > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > > > yet. E.g. on arm64, PMU is not ready until
> > > > device_initcall(armv8_pmu_driver_init). And it is deeply integrated
> > > > with the driver model and cpuhp. Hence it is hard to push this
> > > > initialization before smp_init().
> > > >
> > > > But it is easy to take an opposite approach and try to initialize
> > > > the watchdog once again later.
> > > > The delayed probe is called using workqueues. It need to allocate
> > > > memory and must be proceed in a normal context.
> > > > The delayed probe is queued only when the early one returns -EBUSY.
> > > > It is the return code returned when PMU is not ready yet.
> > > >
> > > > Provide an API - retry_lockup_detector_init() for anyone who needs
> > > > to delayed init lockup detector.
> > > >
> > > > The original assumption is: nobody should use delayed probe after
> > > > lockup_detector_check() which has __init attribute.
> > > > That is, anyone uses this API must call between lockup_detector_init()
> > > > and lockup_detector_check(), and the caller must have __init attribute
> > > >
> > > > --- a/kernel/watchdog.c
> > > > +++ b/kernel/watchdog.c
> > > > +}
> > > > +
> > > > +/*
> > > > + * retry_lockup_detector_init - retry init lockup detector if possible.
> > > > + *
> > > > + * Only take effect when allow_lockup_detector_init_retry is true, which
> > > > + * means it must call between lockup_detector_init() and lockup_detector_check().
> > > > + * Be aware that caller must have __init attribute, relative functions
> > > > + * will be freed after kernel initialization.
> > > > + */
> > > > +void __init retry_lockup_detector_init(void)
> > > > +{
> > > > + if (!allow_lockup_detector_init_retry)
> > > > + return;
> > > > +
> > > > + queue_work_on(__smp_processor_id(), system_wq, &detector_work);
> > > > +}
> > > > +
> > > > +/* Ensure the check is called after the initialization of driver */
> > > > +static int __init lockup_detector_check(void)
> > > > +{
> > > > + /* Make sure no work is pending. */
> > > > + flush_work(&detector_work);
> > >
> > > This is racy. We should first disable
> > > "allow_lockup_detector_init_retry" to make sure
> > > that retry_lockup_detector_init() will not queue
> > > the work any longer.
> >
> > But disable before flush_work will make the
> > lockup_detector_delay_init() ->
> > watchdog_nmi_probe ->
> > + if (!allow_lockup_detector_init_retry)
> > + return -EBUSY;
>
> I see. It is exactly the reason why I suggest to remove the
> optimization and keep the code simple.
>
> > how about:
> > ...
> > static bool __init delayed_init_allowed = true;
> > ...
> > /*
> > * retry_lockup_detector_init - retry init lockup detector if possible.
> > *
> > * Only take effect when allow_lockup_detector_init_retry is true, which
> > * means it must call between lockup_detector_init() and lockup_detector_check().
> > * Be aware that caller must have __init attribute, relative functions
> > * will be freed after kernel initialization.
> > */
> > void __init retry_lockup_detector_init(void)
> > {
> > if (!allow_lockup_detector_init_retry || !delayed_init_allowed)
> > return;
> >
> > /*
> > * we shouldn't queue any delayed init work twice to avoid
> > * any unwanted racy.
> > */
> > delayed_init_allowed = false;
>
> Grrr, this is so complicated and confusing. It might be because of
> badly selected variable names or comments. But I think that it is
> simply a bad approach.
>
> OK, you suggest two variables. If I get it correctly:
>
> + The variable "delayed_init_allowed"
> tries to prevent the race in lockup_detector_check().
>
> It will make sure that the work could not be queued after
> flush_work() finishes.
>
> Is this obvious from the comment?
> Is this obvious from the variable name?
>
> I am sorry. But it is not obvious to me. I understand it only
> because I see it together in this mail. It will be pretty
> hard to get it from the code when I see it one year later.
>
>
> + The variable "allow_lockup_detector_init_retry" has an unclear
> meaning. It might mean:
>
> + watchdog_nmi_probe() ended with -EBUSY in
> lockup_detector_init() and we can try the delayed init.
>
> + but it also means that watchdog_nmi_probe() succeeded in
> lockup_detector_delay_init() and there is no need to
> try the delayed init any longer.
>
> Is this obvious from the variable name?
> Is it explained anywhere?
> Is it easy to understand?
>
> No, from my POV. It is really bad idea to have a single
> variable with so many meanings.
>
>
> And this is my problem with this approach. There was one variable with
> unclear meanting. And you are trying to fix it by two variables
> with unclear meaning.
>
I really apreciate for your reply, many thanks for it.
For my point of view, the naming for "delayed_init_allowed" is the
whole system state now is able to(allowed) do delayed init.
The "allow_lockup_detector_init_retry" is that delayed init is ready
to retry register NMI watchdog.
Thus the meaning of delayed_init_"allowed" is, we are allowed to
do delayed init including
1. initialization of delayed init
2. the retry of delayed init
I'm sorry for that I didn't express the meaning clearly.
I'll have definition in detail in the later version patch not only
a brief comment and I hope you can review much easier.
This only explain the thought how I made decision for the naming.
So let me back to the discussion below.
> > queue_work_on(__smp_processor_id(), system_wq, &detector_work);
> > }
> >
> >
> > /*
> > * Ensure the check is called after the initialization of driver
> > * and before removing init code.
> > */
> > static int __init lockup_detector_check(void)
> > {
> > delayed_init_allowed = false;
> > flush_work(&detector_work);
> > allow_lockup_detector_init_retry = false;
> >
> > return 0;
> > }
>
> No, please keep it simple. Just have one variable that will say
> whether we are allowed to queue the work:
>
> + It will be allowed when watchdog_nmi_probe() ended
> with -EBUSY in lockup_detector_init()
>
> + It will not longer be allowed when watchdog_nmi_probe()
> succeeded or when lockup_detector_check() flushes
> the pending works.
>
Okay, let me think about it. I'll try to find a better solution that
only uses one variable.
And it's strongly about how users use it in 5th patch, I'll give further
reply in 5th patch
thanks
BRs,
Lecopzer
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2022-04-07 16:37 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-24 14:14 [PATCH v3 0/5] Support hld delayed init based on Pseudo-NMI for arm64 Lecopzer Chen
2022-03-24 14:14 ` [PATCH v3 1/5] kernel/watchdog: remove WATCHDOG_DEFAULT Lecopzer Chen
2022-03-24 14:14 ` [PATCH v3 2/5] kernel/watchdog: change watchdog_nmi_enable() to void Lecopzer Chen
2022-03-24 14:14 ` [PATCH v3 3/5] kernel/watchdog_hld: Ensure CPU-bound context when creating hardlockup detector event Lecopzer Chen
2022-03-24 14:14 ` [PATCH v3 4/5] kernel/watchdog: Adapt the watchdog_hld interface for async model Lecopzer Chen
2022-04-04 14:41 ` Petr Mladek
2022-04-05 13:35 ` Lecopzer Chen
2022-04-05 15:19 ` Petr Mladek
2022-04-07 16:21 ` Lecopzer Chen [this message]
2022-03-24 14:14 ` [PATCH v3 5/5] arm64: Enable perf events based hard lockup detector Lecopzer Chen
2022-04-04 14:17 ` Petr Mladek
2022-04-05 12:53 ` Lecopzer Chen
2022-04-05 14:36 ` Petr Mladek
2022-04-07 16:59 ` Lecopzer Chen
2022-04-13 10:25 ` Petr Mladek
2022-04-21 16:30 ` Lecopzer Chen
2022-04-26 16:38 ` Lecopzer Chen
2022-04-28 8:27 ` Petr Mladek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220407162143.1127-1-lecopzer.chen@mediatek.com \
--to=lecopzer.chen@mediatek.com \
--cc=acme@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=alexander.shishkin@linux.intel.com \
--cc=catalin.marinas@arm.com \
--cc=davem@davemloft.net \
--cc=jolsa@redhat.com \
--cc=jthierry@redhat.com \
--cc=keescook@chromium.org \
--cc=kernelfans@gmail.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mediatek@lists.infradead.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=masahiroy@kernel.org \
--cc=matthias.bgg@gmail.com \
--cc=maz@kernel.org \
--cc=mcgrof@kernel.org \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=nixiaoming@huawei.com \
--cc=peterz@infradead.org \
--cc=pmladek@suse.com \
--cc=sparclinux@vger.kernel.org \
--cc=sumit.garg@linaro.org \
--cc=wangqing@vivo.com \
--cc=will@kernel.org \
--cc=yj.chiang@mediatek.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).