From: Takao Indoh <indou.takao@jp.fujitsu.com>
To: bhelgaas@google.com, vgoyal@redhat.com
Cc: alex.williamson@redhat.com, linux-pci@vger.kernel.org,
kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
hbabu@us.ibm.com, iommu@lists.linux-foundation.org,
ddutile@redhat.com, ishii.hironobu@jp.fujitsu.com,
bill.sumner@hp.com
Subject: Re: [PATCH v2] PCI: Reset PCIe devices to stop ongoing DMA
Date: Tue, 30 Jul 2013 15:09:58 +0900
Message-ID: <51F758B6.9090204@jp.fujitsu.com>
In-Reply-To: <CAErSpo4JtG5qVJb-nCywe3vkft=-cDeRDKn_V92RCZ-sGXpwbg@mail.gmail.com>
(2013/07/29 23:17), Bjorn Helgaas wrote:
> On Sun, Jul 28, 2013 at 6:37 PM, Takao Indoh <indou.takao@jp.fujitsu.com> wrote:
>> (2013/07/26 2:00), Bjorn Helgaas wrote:
>>> On Wed, Jul 24, 2013 at 12:29 AM, Takao Indoh
>>> <indou.takao@jp.fujitsu.com> wrote:
>>>> Sorry for letting this discussion slide; I was busy with other work :-(
>>>> Anyway, the summary of the previous discussion is:
>>>> - My patch adds a new initcall (fs_initcall) to reset all PCIe endpoints
>>>>   on boot. This expects PCI enumeration to be done before IOMMU
>>>>   initialization, as follows:
>>>>   (1) PCI enumeration
>>>>   (2) fs_initcall ---> device reset
>>>>   (3) IOMMU initialization
>>>> - This works on x86 but not on other architectures, because on some
>>>>   of them the IOMMU is initialized before PCI enumeration. So the
>>>>   device reset should be done where the IOMMU is initialized instead
>>>>   of in an initcall.
>>>> - Or, as another idea, we can reset devices in the first kernel (panic
>>>>   kernel).
>>>>
>>>> Resetting devices in the panic kernel is against kdump policy and seems
>>>> not to be a good idea. So I think adding the reset code to IOMMU
>>>> initialization is better. I'll post patches for that.
>>>
>>> Of course nobody *wants* to do anything in the panic kernel. But
>>> simply saying "it's against kdump policy and seems not to be a good
>>> idea" is not a technical argument. There are things that are
>>> impractical to do in the kdump kernel, so they have to be done in the
>>> panic kernel even though we know the kernel is unreliable and the
>>> attempt may fail.
>>
>> Accessing kernel data in the panic kernel can cause another panic, so:
>> - Don't touch kernel data in a panic situation
>> - Jump to the kdump kernel as quickly as possible, and do things in the
>>   safe kernel
>> These are the basic "kdump policy". Of course, if there is any work that
>> we cannot do in the kdump kernel and can do only in the panic kernel, for
>> example saving registers or stopping CPUs, we should do it in the panic
>> kernel.
>>
>> Resetting devices in the panic kernel is worth considering if we can
>> safely find each pci_dev and reset it, but I have no idea how to do that
>> because, for example, struct pci_dev may be broken.
>
> Nobody can guarantee that the panic kernel can do *anything* safely
> because any arbitrary kernel data or text may be corrupted. But if
> you consider any specific data structure, e.g., CPU or PCI device
> lists, it's not very likely that it will be corrupted.
To reset devices we need to scan the PCI device tree using
for_each_pci_dev. Something like bust_spinlocks() is also needed to
forcibly clear pci_lock. Vivek, is adding these to kdump acceptable to
you? Or do you have any other ideas? I think iterating over a list with
for_each_pci_dev in the panic kernel is dangerous.
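
One way to picture the risk is a list whose links have been damaged so that
the walk never terminates. The following is a minimal userspace sketch, not
kernel code: struct dev_node, DEV_MAGIC, MAX_WALK and walk_devices_bounded()
are hypothetical names introduced only for illustration, and the sketch says
nothing about how for_each_pci_dev is implemented. It walks a list without
taking any lock and gives up on the first node that fails a sanity check or
once an arbitrary bound is exceeded.

```c
#include <stddef.h>

/* Hypothetical stand-in for a device list node; this is NOT struct pci_dev. */
struct dev_node {
    struct dev_node *next;
    unsigned int magic;   /* sanity marker written when the node is created */
    int id;
};

#define DEV_MAGIC 0x50434944u
#define MAX_WALK  1024    /* arbitrary upper bound on the list length */

/*
 * Walk the list without locking. Stop at the first node whose marker is
 * wrong, or when more than MAX_WALK nodes have been seen (for example
 * because the list has become circular).
 * Returns the number of nodes visited, or -1 if the walk was cut short.
 */
static int walk_devices_bounded(const struct dev_node *head,
                                void (*visit)(const struct dev_node *))
{
    int seen = 0;

    for (const struct dev_node *n = head; n != NULL; n = n->next) {
        if (seen >= MAX_WALK || n->magic != DEV_MAGIC)
            return -1;
        if (visit)
            visit(n);
        seen++;
    }
    return seen;
}
```

A check like this only catches loops and overwritten nodes. It cannot catch a
next pointer that points at unmapped memory, which is part of why walking such
a list after a panic is hard to make safe.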
>
>>> My point about IOMMU and PCI initialization order doesn't go away just
>>> because it doesn't fit "kdump policy." Having system initialization
>>> occur in a logical order is far more important than making kdump work.
>>
>> My next plan is as follows. I think this is matched to logical order
>> on boot.
>>
>> drivers/pci/pci.c
>> - Add function to reset bus, for example, pci_reset_bus(struct pci_bus *bus)
>>
>> drivers/iommu/intel-iommu.c
>> - On initialization, if IOMMU is already enabled, call this bus reset
>> function before disabling and re-enabling IOMMU.
>
> I raised this issue because of arches like sparc that enumerate the
> IOMMU before the PCI devices that use it. In that situation, I think
> you're proposing this:
>
>   panic kernel
>     enable IOMMU
>     panic
>   kdump kernel
>     initialize IOMMU (already enabled)
>       pci_reset_bus
>       disable IOMMU
>       enable IOMMU
>     enumerate PCI devices
>
> But the problem is that when you call pci_reset_bus(), you haven't
> enumerated the PCI devices, so you don't know what to reset.
Right, so my idea is to add the reset code to intel-iommu.c. intel-iommu
initialization assumes that enumeration of PCI devices is already done.
We can find the target devices from the IOMMU page tables instead of
scanning all devices in the PCI tree.
Therefore, this idea applies only to intel-iommu. Other architectures need
to implement their own reset code.
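
The lookup described above can be sketched roughly as follows. This is a
minimal userspace model, not code from intel-iommu.c or from the patch. It
assumes that "IOMMU page table" refers to the VT-d legacy-mode root and
context tables, with one root entry per bus, one context entry per devfn, and
bit 0 of a context entry as its present bit. The struct layouts are
simplified, and find_dma_devices() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model: real root and context entries are 128 bits wide and
 * hold physical addresses. Here a plain pointer stands in for the context
 * table address, and a NULL pointer means "root entry not present".
 */
#define NR_BUSES  256
#define NR_DEVFNS 256

struct context_entry {
    uint64_t lo;               /* bit 0: present */
    uint64_t hi;
};

struct root_entry {
    struct context_entry *ctx; /* context table for this bus, or NULL */
};

struct bdf {
    uint8_t bus;
    uint8_t devfn;             /* device in bits 7:3, function in bits 2:0 */
};

static int context_present(const struct context_entry *ce)
{
    return (int)(ce->lo & 1);
}

/*
 * Collect every bus/devfn with a present context entry, i.e. every device
 * for which the previous kernel set up DMA translation.
 * Returns the number of such devices; at most max are written to out.
 */
static size_t find_dma_devices(const struct root_entry *root,
                               struct bdf *out, size_t max)
{
    size_t n = 0;

    assert(root != NULL && out != NULL);
    for (unsigned int bus = 0; bus < NR_BUSES; bus++) {
        const struct context_entry *ctx = root[bus].ctx;

        if (!ctx)
            continue;
        for (unsigned int devfn = 0; devfn < NR_DEVFNS; devfn++) {
            if (!context_present(&ctx[devfn]))
                continue;
            if (n < max) {
                out[n].bus = (uint8_t)bus;
                out[n].devfn = (uint8_t)devfn;
            }
            n++;
        }
    }
    return n;
}
```

The point of this approach is that it needs no struct pci_dev list: the
candidates for reset come straight from the tables the old kernel left
behind. How each bus/devfn found this way is then reset is outside this
sketch.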
Thanks,
Takao Indoh