linux-serial.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Liuye <liu.yeC@h3c.com>
To: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "jason.wessel@windriver.com" <jason.wessel@windriver.com>,
	"dianders@chromium.org" <dianders@chromium.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"jirislaby@kernel.org" <jirislaby@kernel.org>,
	"kgdb-bugreport@lists.sourceforge.net"
	<kgdb-bugreport@lists.sourceforge.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-serial@vger.kernel.org" <linux-serial@vger.kernel.org>
Subject: 答复: 答复: 答复: 答复: 答复: 答复: [PATCH] kdb: Fix the deadlock issue in KDB debugging.
Date: Fri, 15 Mar 2024 09:59:08 +0000	[thread overview]
Message-ID: <ae45b2736f364629afe6806c354cec14@h3c.com> (raw)
In-Reply-To: <20240314130916.GE202685@aspen.lan>

>On Thu, Mar 14, 2024 at 07:06:22AM +0000, Liuye wrote:
>> >On Wed, Mar 13, 2024 at 01:22:17AM +0000, Liuye wrote:
>> >> >On Tue, Mar 12, 2024 at 10:04:54AM +0000, Liuye wrote:
>> >> >> >On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
>> >> >> >> I know that you said schedule_work is not NMI save, which is 
>> >> >> >> the first issue. Perhaps it can be fixed using 
>> >> >> >> irq_work_queue. But even if irq_work_queue is used to 
>> >> >> >> implement it, there will still be a deadlock problem because 
>> >> >> >> slave cpu1 still has not released the running queue lock of 
>> >> >> >> master CPU0.
>> >> >> >
>> >> >> >This doesn't sound right to me. Why do you think CPU1 won't 
>> >> >> >release the run queue lock?
>> >> >>
>> >> >> In this example, CPU1 is waiting for CPU0 to release 
>> >> >> dbg_slave_lock.
>> >> >
>> >> >That shouldn't be a problem. CPU0 will have released that lock by 
>> >> >the time the irq work is dispatched.
>> >>
>> >> Release dbg_slave_lock in CPU0. Before that, shcedule_work needs to 
>> >> be handled, and we are back to the previous issue.
>> >
>> > Sorry but I still don't understand what problem you think can happen 
>> > here. What is wrong with calling schedule_work() from the IRQ work 
>> > handler?
>> >
>> > Both irq_work_queue() and schedule_work() are calls to queue 
>> > deferred work. It does not matter when the work is queued (providing 
>> > we are lock safe). What matters is when the work is actually executed.
>> >
>> > Please can you describe the problem you think exists based on when 
>> > the work is executed.
>>
>> CPU0 enters the KDB process when processing serial port interrupts and 
>> triggers an IPI (NMI) to other CPUs.  After entering a stable state,
>> CPU0 is in interrupt context, while other CPUs are in NMI context.
>> Before other CPUs enter NMI context, there is a chance to obtain the 
>> running queue of CPU0.
>
>Focusing on the run queue locks in this analysis is a mistake. Before the other CPUs enter NMI context there is a chance for them to obtain
>*any* locks, including the timer wheel locks.

Yes, you are right. I did not consider it comprehensively.

>> At this time, when CPU0 is processing kgdboc_restore_input, calling 
>> schedule_work, need_more_worker here determines the chance to wake up 
>> processes on system_wq.
>>
>> This will cause CPU0 to acquire the running queue lock of this core, 
>> which is held by other CPUs.  but other CPUs are still in NMI context 
>> and have not exited because waiting for CPU0 to release the 
>> dbg_slave_lock after schedule_work.
>>
>> After thinking about it, the problem is not whether schedule_work is 
>> NMI safe, but that processes on system_wq should not be awakened 
>> immediately when schedule_work is called.
>
>I disagree with this conclusion.
>
>The problem *is* that schedue_work() is not NMI-safe.
>
>You cannot solve an NMI safety problem by replacing schedule_work() with another function that is also not NMI-safe. That simply changes the locks that need to be taken to provoke a deadlock.
>
>
>> I replaced schedule_work with schedule_delayed_work, and this solved 
>> my problem.
>
>This may stop some specific reproduction from taking place but it does not fix the potential deadlock.
>
>I still believe that using irq_work is the only way to solve this properly. Please try the following change:

I tried the following patch and it also resolved the issue.
Thank you for your guidance and suggestions. I will organize this issue and resend a patch.

>diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c index 7ce7bb1640054..161b25ecc5e15 100644
>--- a/drivers/tty/serial/kgdboc.c
>+++ b/drivers/tty/serial/kgdboc.c
>@@ -22,6 +22,7 @@
> #include <linux/module.h>
> #include <linux/platform_device.h>
> #include <linux/serial_core.h>
>+#include <linux/irq_work.h>
>
> #define MAX_CONFIG_LEN		40
>
>@@ -99,10 +100,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
>
> static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
>
>+static void kgdboc_queue_restore_input_helper(struct irq_work *unused) 
>+{
>+	schedule_work(&kgdboc_restore_input_work);
>+}
>+
>+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, 
>+kgdboc_queue_restore_input_helper);
>+
> static void kgdboc_restore_input(void)
> {
> 	if (likely(system_state == SYSTEM_RUNNING))
>-		schedule_work(&kgdboc_restore_input_work);
>+		irq_work_queue(&kgdboc_restore_input_irq_work);
> }
>
> static int kgdboc_register_kbd(char **cptr) @@ -133,6 +141,7 @@ static void kgdboc_unregister_kbd(void)
> 			i--;
> 		}
> 	}
>+	irq_work_sync(&kgdboc_restore_input_irq_work);
> 	flush_work(&kgdboc_restore_input_work);
> }
> #else /* ! CONFIG_KDB_KEYBOARD */

  reply	other threads:[~2024-03-15 10:00 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-28  2:56 [PATCH] kdb: Fix the deadlock issue in KDB debugging LiuYe
2024-02-28 12:05 ` Daniel Thompson
2024-03-01  3:30   ` 答复: " Liuye
2024-03-01 10:59     ` Daniel Thompson
2024-03-12  8:37       ` 答复: " Liuye
2024-03-12  9:57         ` Daniel Thompson
2024-03-12 10:04           ` 答复: " Liuye
2024-03-12 10:24             ` Daniel Thompson
2024-03-13  1:22               ` 答复: " Liuye
2024-03-13 14:17                 ` Daniel Thompson
2024-03-14  7:06                   ` 答复: " Liuye
2024-03-14 13:09                     ` Daniel Thompson
2024-03-15  9:59                       ` Liuye [this message]
2024-03-16  2:34                       ` [PATCH v1] " liu.yec
2024-03-20 16:28                         ` Daniel Thompson
2024-03-21  2:26                           ` [PATCH V3] " liu.yec
2024-03-21  7:38                             ` Greg KH
2024-03-21  7:57                               ` 答复: " Liuye
2024-03-21 11:04                                 ` Daniel Thompson
2024-03-21 11:50                                   ` [PATCH V4] " liu.yec
2024-03-22  6:54                                     ` Jiri Slaby
2024-03-22  7:50                                       ` 答复: " Liuye
2024-03-22 15:58                                         ` Daniel Thompson
2024-03-23  1:41                                           ` [PATCH V5] " liu.yec
2024-03-25 16:54                                             ` Daniel Thompson
2024-03-26  0:47                                               ` 答复: " Liuye
2024-03-26  7:40                                               ` [PATCH V6] " liu.yec
2024-03-26  8:22                                                 ` Greg KH
2024-03-26  8:54                                                   ` [PATCH V7] " liu.yec
2024-04-02 12:58                                                     ` Daniel Thompson
2024-04-03  6:11                                                       ` [PATCH V8] " liu.yec
2024-04-03 13:58                                                         ` Daniel Thompson
2024-04-03 22:22                                                         ` Andy Shevchenko
2024-04-08  1:44                                                           ` LiuYe
2024-04-08 10:29                                                             ` Andy Shevchenko
2024-04-09  2:03                                                               ` [PATCH V9] " liu.yec
2024-04-10  2:06                                                                 ` [PATCH V10] " liu.yec
2024-04-10  3:59                                                                   ` Andy Shevchenko
2024-04-10  5:30                                                                   ` Greg KH
2024-04-10  5:54                                                                     ` 答复: " Liuye
2024-04-10  5:59                                                                       ` Greg KH
2024-04-10  6:10                                                                         ` 答复: " Liuye
2024-04-10  6:15                                                                           ` Greg KH
2024-04-10  6:30                                                                             ` 答复: " Liuye
2024-04-10  7:18                                                                               ` [PATCH V11] " liu.yec
2024-04-10  8:24                                                                               ` 答复: 答复: 答复: [PATCH V10] " Greg KH
2024-04-10  8:38                                                                                 ` 答复: " Liuye
2024-03-02 20:44 ` [PATCH] " Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ae45b2736f364629afe6806c354cec14@h3c.com \
    --to=liu.yec@h3c.com \
    --cc=daniel.thompson@linaro.org \
    --cc=dianders@chromium.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jason.wessel@windriver.com \
    --cc=jirislaby@kernel.org \
    --cc=kgdb-bugreport@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-serial@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).