public inbox for linux-kernel@vger.kernel.org
From: Ethan Zhao <haifeng.zhao@linux.intel.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Baolu Lu <baolu.lu@linux.intel.com>,
	Yunhui Cui <cuiyunhui@bytedance.com>,
	dwmw2@infradead.org, joro@8bytes.org, will@kernel.org,
	robin.murphy@arm.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] iommu/vt-d: fix system hang on reboot -f
Date: Wed, 26 Feb 2025 11:50:45 +0800
Message-ID: <888f41b7-dac6-4faf-9f71-4d7bea050e41@linux.intel.com>
In-Reply-To: <20250225142610.GB545008@ziepe.ca>

On 2025/2/25 22:26, Jason Gunthorpe wrote:
> On Tue, Feb 25, 2025 at 04:54:54PM +0800, Ethan Zhao wrote:
>>> On 2025/2/25 14:48, Yunhui Cui wrote:
>>>> We found that executing the command ./a.out &;reboot -f (where a.out
>>>> is a program that only executes a while(1) infinite loop) can
>>>> probabilistically cause the system to hang in the intel_iommu_shutdown()
>>>> function, rendering it unresponsive. Through analysis, we identified
>>>> that the factors contributing to this issue are as follows:
>>>>
>>>> 1. The reboot -f command does not prompt the kernel to notify the
>>>> application layer to perform cleanup actions, allowing the application
>>>> to continue running.
>>>>
>>>> 2. When the kernel reaches the intel_iommu_shutdown() function, only the
>>>> BSP (Bootstrap Processor) CPU is operational in the system.
>>>>
>>>> 3. During the execution of intel_iommu_shutdown(), the function
>>>> down_write(&dmar_global_lock) causes the process to sleep and be
>>>> scheduled out.
> Why does this happen? If the kernel has shutdown other CPUs then what
> thread is holding the other side of this lock and why?
>
>>>> 4. At this point, though the processor's interrupt flag is not cleared,
>>>>    allowing interrupts to be accepted. However, only legacy devices
>>>> and NMI (Non-Maskable Interrupt) interrupts could come in, as other
>>>> interrupts routing have already been disabled. If no legacy or NMI
>>>> interrupts occur at this stage, the scheduler will not be able to run.
>>>> 5. If the application got scheduled at this time is executing a
>>>> while(1)-type loop, it will be unable to be preempted, leading to an
>>>> infinite loop and causing the system to become unresponsive.
> If the scheduler doesn't run how did we get from 4 -> 5?
>
> Maybe the issue is the shutdown handler here is running at the wrong
> time and it should not be running after the scheduler has been shut
> down.
>
> I don't think removing the lock is a great idea without more
> explanation.

It does not seem simple to explain why there is no race window between
intel_iommu_shutdown() and the following dmar_global_lock holders:

1. PCIe hotplug: dmar_pci_bus_notifier()

2. mm_core_init: detect_intel_iommu()

3. late_initcall: dmar_free_unused_resources()

4. ACPI attach: dmar_device_hotplug()

5. pci_iommu_init: intel_iommu_init() -> init_dmars()

6. rootfs_initcall: ir_dev_scope_init()

Even though this is the last stage of reboot, how about we go back to v1
and just replace down_write() with down_write_trylock()?

Thanks,

Ethan


>
> Jason

-- 
"firm, enduring, strong, and long-lived"



Thread overview: 14+ messages
2025-02-25  6:48 [PATCH v2] iommu/vt-d: fix system hang on reboot -f Yunhui Cui
2025-02-25  7:01 ` Baolu Lu
2025-02-25  8:54   ` Ethan Zhao
2025-02-25 14:26     ` Jason Gunthorpe
2025-02-26  0:35       ` Ethan Zhao
2025-02-26  3:50       ` Ethan Zhao [this message]
2025-02-26  5:18         ` Baolu Lu
2025-02-26  5:55           ` Ethan Zhao
2025-02-26 13:04             ` Jason Gunthorpe
2025-02-27  0:40               ` Ethan Zhao
2025-02-27 20:38                 ` Jason Gunthorpe
2025-02-28  0:51                   ` Ethan Zhao
2025-02-28  2:18                     ` [External] " yunhui cui
2025-02-28  4:34                       ` Ethan Zhao
