All of lore.kernel.org
 help / color / mirror / Atom feed
From: Geraldo Nascimento <geraldogabriel@gmail.com>
To: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] workqueue: missing NOT while checking if Workqueue is offline
Date: Sun, 29 May 2022 03:41:00 -0300	[thread overview]
Message-ID: <YpMVfN7UUGxX0UxF@geday> (raw)
In-Reply-To: <YpMPPyIZVlBwUrNe@slm.duckdns.org>

On Sat, May 28, 2022 at 08:14:23PM -1000, Tejun Heo wrote:
> On Sun, May 29, 2022 at 02:53:39AM -0300, Geraldo Nascimento wrote:
> > On Sat, May 28, 2022 at 07:24:41PM -1000, Tejun Heo wrote:
> > > On Sun, May 29, 2022 at 01:29:32AM -0300, Geraldo Nascimento wrote:
> > > > I would like very much to hear the opinion of the maintainers!
> > > 
> > > I have a hard time understanding what you're trying to do. Can you please
> > > slow down and start from describing the problem itself?
> > 
> > Hi Tejun,
> > 
> > Sorry for the hurry.
> > 
> > The problem is best described in https://gitlab.freedesktop.org/drm/amd/-/issues/1898
> > 
> > From my understanding from the context of __cancel_work_timer() we should not
> > ever call __flush_work() but I may be wrong. In the present case as
> 
> Yeah, you're wrong.
>

No problem from my side, sorry for wasting your time.

> > described in AMD's GitLab __cancel_work_timer() is being called by
> > cancel_delayed_work_sync() inside kfd_process_notifier_release()
> > from drivers/gpu/drm/amd/amdkfd/kfd_process.c:1157 (Linux 5.18).
> 
> Have you confirmed that that actually is the warning which is triggering? I
> don't see how that condition would trigger that late during the boot and the
> warning line being reported doesn't match v5.16 source code, so I'm not sure
> but skimming the instructon sequence, that's the second UD2 sequence, so I'm
> gonna guess that's the second WARN_ON - the !work->func one and someone else
> on the gitlab bug report seems to agree too.

While I can confirm from my dmesg traces it's the second WARN_ON (the one on
(!work->func)) that triggers, remember it's being called from
__flush_work() due to the lack of NOT operator on wq_online, inside
__cancel_work_timer().

To be honest, for me, it only makes sense to call __flush_work() from
the context of __cancel_work_timer() if we are sure the work isn't
executing. One of the few ways to be sure is when kthreads haven't
initialized yet, that means workqueue_init() hasn't fired yet and
wq_online is still false, so it makes sense to negate that false and
exceptionally call __flush_work() without waiting for completion of the
work - i.e., __flush_work() will WARN_ON workqueue offline condition and
return false immediately because task was already idle.

I know I may be repeating myself, but I'm trying to make my point, from
the little understanding I have of the kernel. And I know that you know best,
and that the possibility of a bug like that lying undiscovered on a
code-base as scrutinized as the Linux kernel is, is very small.

> 
> It's usually a lot more helpful if the bug report is complete - include the
> full warning message with some context at least, make sure that the kernel
> you're using is an upstream one or something close enough. If not, point to
> the source tree. Also, try to clearly distinguish what you know and what you
> suspect. Both can help but mixing them up together tends to cause confusion
> for everyone involved.

Sorry. Will try to do better.

> 
> It just looks like the code is trying to cancel a work item which hasn't
> been initialized and what it prolly needs is an ifdef around that cancel
> call depending on the config option.

Thanks for looking into it,
Geraldo Nascimento

> 
> Thanks.
> 
> -- 
> tejun

  reply	other threads:[~2022-05-29  6:41 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-28 20:07 [PATCH] workqueue: missing NOT while checking if Workqueue is offline Geraldo Nascimento
2022-05-29  1:25 ` Geraldo Nascimento
2022-05-29  4:29   ` Geraldo Nascimento
2022-05-29  5:24     ` Tejun Heo
2022-05-29  5:53       ` Geraldo Nascimento
2022-05-29  6:14         ` Tejun Heo
2022-05-29  6:41           ` Geraldo Nascimento [this message]
2022-05-31  4:13           ` Geraldo Nascimento
2022-05-31 17:43             ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YpMVfN7UUGxX0UxF@geday \
    --to=geraldogabriel@gmail.com \
    --cc=jiangshanlai@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.