From: Daniel Thompson <daniel.thompson@linaro.org>
To: Liuye <liu.yeC@h3c.com>
Cc: "jason.wessel@windriver.com" <jason.wessel@windriver.com>,
"dianders@chromium.org" <dianders@chromium.org>,
"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
"jirislaby@kernel.org" <jirislaby@kernel.org>,
"kgdb-bugreport@lists.sourceforge.net"
<kgdb-bugreport@lists.sourceforge.net>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-serial@vger.kernel.org" <linux-serial@vger.kernel.org>
Subject: Re: 答复: 答复: 答复: 答复: 答复: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
Date: Thu, 14 Mar 2024 13:09:16 +0000 [thread overview]
Message-ID: <20240314130916.GE202685@aspen.lan> (raw)
In-Reply-To: <56ed54fd241c462189d2d030ad51eac6@h3c.com>
On Thu, Mar 14, 2024 at 07:06:22AM +0000, Liuye wrote:
> >On Wed, Mar 13, 2024 at 01:22:17AM +0000, Liuye wrote:
> >> >On Tue, Mar 12, 2024 at 10:04:54AM +0000, Liuye wrote:
> >> >> >On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
> >> >> >> I know that you said schedule_work is not NMI save, which is
> >> >> >> the first issue. Perhaps it can be fixed using
> >> >> >> irq_work_queue. But even if irq_work_queue is used to
> >> >> >> implement it, there will still be a deadlock problem because
> >> >> >> slave cpu1 still has not released the running queue lock of
> >> >> >> master CPU0.
> >> >> >
> >> >> >This doesn't sound right to me. Why do you think CPU1 won't
> >> >> >release the run queue lock?
> >> >>
> >> >> In this example, CPU1 is waiting for CPU0 to release
> >> >> dbg_slave_lock.
> >> >
> >> >That shouldn't be a problem. CPU0 will have released that lock by
> >> >the time the irq work is dispatched.
> >>
> >> Release dbg_slave_lock in CPU0. Before that, shcedule_work needs to
> >> be handled, and we are back to the previous issue.
> >
> > Sorry but I still don't understand what problem you think can happen
> > here. What is wrong with calling schedule_work() from the IRQ work
> > handler?
> >
> > Both irq_work_queue() and schedule_work() are calls to queue deferred
> > work. It does not matter when the work is queued (providing we are
> > lock safe). What matters is when the work is actually executed.
> >
> > Please can you describe the problem you think exists based on when
> > the work is executed.
>
> CPU0 enters the KDB process when processing serial port interrupts and
> triggers an IPI (NMI) to other CPUs. After entering a stable state,
> CPU0 is in interrupt context, while other CPUs are in NMI context.
> Before other CPUs enter NMI context, there is a chance to obtain the
> running queue of CPU0.
Focusing on the run queue locks in this analysis is a mistake. Before
the other CPUs enter NMI context there is a chance for them to obtain
*any* locks, including the timer wheel locks.
> At this time, when CPU0 is processing kgdboc_restore_input, calling
> schedule_work, need_more_worker here determines the chance to wake up
> processes on system_wq.
>
> This will cause CPU0 to acquire the running queue lock of this core,
> which is held by other CPUs. but other CPUs are still in NMI context
> and have not exited because waiting for CPU0 to release the
> dbg_slave_lock after schedule_work.
>
> After thinking about it, the problem is not whether schedule_work is
> NMI safe, but that processes on system_wq should not be awakened
> immediately when schedule_work is called.
I disagree with this conclusion.
The problem *is* that schedue_work() is not NMI-safe.
You cannot solve an NMI safety problem by replacing schedule_work()
with another function that is also not NMI-safe. That simply changes
the locks that need to be taken to provoke a deadlock.
> I replaced schedule_work with schedule_delayed_work, and this solved
> my problem.
This may stop some specific reproduction from taking place but it
does not fix the potential deadlock.
I still believe that using irq_work is the only way to solve this
properly. Please try the following change:
diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 7ce7bb1640054..161b25ecc5e15 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -22,6 +22,7 @@
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/serial_core.h>
+#include <linux/irq_work.h>
#define MAX_CONFIG_LEN 40
@@ -99,10 +100,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
+static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
+{
+ schedule_work(&kgdboc_restore_input_work);
+}
+
+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper);
+
static void kgdboc_restore_input(void)
{
if (likely(system_state == SYSTEM_RUNNING))
- schedule_work(&kgdboc_restore_input_work);
+ irq_work_queue(&kgdboc_restore_input_irq_work);
}
static int kgdboc_register_kbd(char **cptr)
@@ -133,6 +141,7 @@ static void kgdboc_unregister_kbd(void)
i--;
}
}
+ irq_work_sync(&kgdboc_restore_input_irq_work);
flush_work(&kgdboc_restore_input_work);
}
#else /* ! CONFIG_KDB_KEYBOARD */
next prev parent reply other threads:[~2024-03-14 13:09 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-28 2:56 [PATCH] kdb: Fix the deadlock issue in KDB debugging LiuYe
2024-02-28 12:05 ` Daniel Thompson
2024-03-01 3:30 ` 答复: " Liuye
2024-03-01 10:59 ` Daniel Thompson
2024-03-12 8:37 ` 答复: " Liuye
2024-03-12 9:57 ` Daniel Thompson
2024-03-12 10:04 ` 答复: " Liuye
2024-03-12 10:24 ` Daniel Thompson
2024-03-13 1:22 ` 答复: " Liuye
2024-03-13 14:17 ` Daniel Thompson
2024-03-14 7:06 ` 答复: " Liuye
2024-03-14 13:09 ` Daniel Thompson [this message]
2024-03-15 9:59 ` 答复: " Liuye
2024-03-16 2:34 ` [PATCH v1] " liu.yec
2024-03-20 16:28 ` Daniel Thompson
2024-03-21 2:26 ` [PATCH V3] " liu.yec
2024-03-21 7:38 ` Greg KH
2024-03-21 7:57 ` 答复: " Liuye
2024-03-21 11:04 ` Daniel Thompson
2024-03-21 11:50 ` [PATCH V4] " liu.yec
2024-03-22 6:54 ` Jiri Slaby
2024-03-22 7:50 ` 答复: " Liuye
2024-03-22 15:58 ` Daniel Thompson
2024-03-23 1:41 ` [PATCH V5] " liu.yec
2024-03-25 16:54 ` Daniel Thompson
2024-03-26 0:47 ` 答复: " Liuye
2024-03-26 7:40 ` [PATCH V6] " liu.yec
2024-03-26 8:22 ` Greg KH
2024-03-26 8:54 ` [PATCH V7] " liu.yec
2024-04-02 12:58 ` Daniel Thompson
2024-04-03 6:11 ` [PATCH V8] " liu.yec
2024-04-03 13:58 ` Daniel Thompson
2024-04-03 22:22 ` Andy Shevchenko
2024-04-08 1:44 ` LiuYe
2024-04-08 10:29 ` Andy Shevchenko
2024-04-09 2:03 ` [PATCH V9] " liu.yec
2024-04-10 2:06 ` [PATCH V10] " liu.yec
2024-04-10 3:59 ` Andy Shevchenko
2024-04-10 5:30 ` Greg KH
2024-04-10 5:54 ` 答复: " Liuye
2024-04-10 5:59 ` Greg KH
2024-04-10 6:10 ` 答复: " Liuye
2024-04-10 6:15 ` Greg KH
2024-04-10 6:30 ` 答复: " Liuye
2024-04-10 7:18 ` [PATCH V11] " liu.yec
2024-04-10 8:24 ` 答复: 答复: 答复: [PATCH V10] " Greg KH
2024-04-10 8:38 ` 答复: " Liuye
2024-03-02 20:44 ` [PATCH] " Greg KH
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240314130916.GE202685@aspen.lan \
--to=daniel.thompson@linaro.org \
--cc=dianders@chromium.org \
--cc=gregkh@linuxfoundation.org \
--cc=jason.wessel@windriver.com \
--cc=jirislaby@kernel.org \
--cc=kgdb-bugreport@lists.sourceforge.net \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-serial@vger.kernel.org \
--cc=liu.yeC@h3c.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.