From: Takao Indoh <indou.takao@jp.fujitsu.com>
To: vgoyal@redhat.com
Cc: alex.williamson@redhat.com, linux-pci@vger.kernel.org,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
	hbabu@us.ibm.com, iommu@lists.linux-foundation.org,
	ddutile@redhat.com, ishii.hironobu@jp.fujitsu.com,
	bhelgaas@google.com, bill.sumner@hp.com
Subject: Re: [PATCH v2] PCI: Reset PCIe devices to stop ongoing DMA
Date: Mon, 29 Jul 2013 09:20:21 +0900
Message-ID: <51F5B545.5050300@jp.fujitsu.com>
In-Reply-To: <20130725142446.GK11993@redhat.com>

(2013/07/25 23:24), Vivek Goyal wrote:
> On Wed, Jul 24, 2013 at 03:29:58PM +0900, Takao Indoh wrote:
>> Sorry for letting this discussion slide, I was busy on other works:-(
>> Anyway, the summary of previous discussion is:
>> - My patch adds new initcall(fs_initcall) to reset all PCIe endpoints on
>>    boot. This expects PCI enumeration is done before IOMMU
>>    initialization as follows.
>>      (1) PCI enumeration
>>      (2) fs_initcall ---> device reset
>>      (3) IOMMU initialization
>> - This works on x86, but does not work on other architecture because
>>    IOMMU is initialized before PCI enumeration on some architectures. So,
>>    device reset should be done where IOMMU is initialized instead of
>>    initcall.
>> - Or, as another idea, we can reset devices in first kernel(panic kernel)
>>
>> Resetting devices in panic kernel is against kdump policy and seems not to
>> be good idea. So I think adding reset code into iommu initialization is
>> better. I'll post patches for that.
> 
> I don't understand all the details but I agree that idea of trying to
> reset IOMMU in crashed kernel might not fly.
> 
>>
>> Another discussion point is how to handle buggy devices. Resetting buggy
>> devices makes system more unstable. One of ideas is using boot parameter
>> so that user can choose to reset devices or not.
> 
> So who would decide which device is buggy and don't reset it. Give
> some details here.

I found a case where kdump does not work after resetting devices but
works once the reset patch is removed. The cause of the problem is a bug
in a PCIe switch chip. If there is a boot parameter to disable the device
reset, users can use it as a workaround.

I think in this case we should add a PCI quirk to avoid this buggy
hardware, but we need to wait for errata from the vendor, and that
usually takes a long time.
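
To make the two options above concrete, here is a rough user-space
sketch of how a boot-parameter opt-out and a quirk list could combine
into one "should this device be reset?" decision. It is only an
illustration under assumptions, not code from the patch: the parameter
name "pcie_reset=off", the function names, and the vendor/device IDs in
the table are all hypothetical placeholders, and real kernel code would
use the kernel's own parameter and quirk mechanisms rather than string
matching.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical quirk entry: hardware that must not be reset. */
struct no_reset_id {
    uint16_t vendor;
    uint16_t device;
};

/* Placeholder IDs for illustration only; not real hardware. */
static const struct no_reset_id no_reset_quirks[] = {
    { 0xabcd, 0x1234 },
};

/*
 * Return false if the space-separated command line contains the
 * hypothetical token "pcie_reset=off", true otherwise.
 */
static bool reset_enabled(const char *cmdline)
{
    const char *key = "pcie_reset=off";
    size_t klen = strlen(key);
    const char *p = cmdline;

    assert(cmdline != NULL);

    while ((p = strstr(p, key)) != NULL) {
        /* Accept only a whole token, not a substring of another one. */
        bool starts = (p == cmdline) || (p[-1] == ' ');
        bool ends = (p[klen] == '\0') || (p[klen] == ' ');

        if (starts && ends)
            return false;
        p += klen;
    }
    return true;
}

/* Return true if the device is listed in the quirk table. */
static bool is_no_reset_quirk(uint16_t vendor, uint16_t device)
{
    size_t n = sizeof(no_reset_quirks) / sizeof(no_reset_quirks[0]);

    for (size_t i = 0; i < n; i++) {
        if (no_reset_quirks[i].vendor == vendor &&
            no_reset_quirks[i].device == device)
            return true;
    }
    return false;
}

/* Reset only if the user did not opt out and no quirk applies. */
static bool should_reset(const char *cmdline, uint16_t vendor,
                         uint16_t device)
{
    return reset_enabled(cmdline) && !is_no_reset_quirk(vendor, device);
}
```

In this sketch the boot parameter is a global switch the user can flip
right away, while the quirk table is a per-device exception that can be
filled in later, once the vendor errata are available.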

> 
> Can't we simply blacklist associated module, so that it never loads
> and then it never tries to reset the devices?
> 

So you mean that the device reset should be done when its driver is loaded?

Thanks,
Takao Indoh




