Kexec Archive on lore.kernel.org
* Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
       [not found]   ` <b62c2a8e-fe14-6d7d-147c-0ce3b0c0ab2f@codeaurora.org>
@ 2018-04-27 21:12     ` Bjorn Helgaas
  2018-04-28  0:56       ` Dave Young
  0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2018-04-27 21:12 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-pci, Paul Menzel, kexec, linux-kernel, Lukas Wunner,
	Eric Biederman, Bjorn Helgaas, Vivek Goyal

[+cc Eric, Vivek, kexec list]

On Fri, Apr 27, 2018 at 03:34:30PM -0400, Sinan Kaya wrote:
> On 4/27/2018 3:22 PM, Bjorn Helgaas wrote:
> > Sinan mooted the idea of using a "no-wait" path of sending the "don't
> > generate hotplug interrupts" command.  I think we should work on this
> > idea a little more.  If we're shutting down the whole system, I can't
> > believe there's much value in *anything* we do in the pciehp_remove()
> > path.
> > 
> > Maybe we should just get rid of pciehp_remove() (and probably
> > pcie_port_remove_service() and the other service driver remove methods)
> > completely.  That dates from when the service drivers could be modules that
> > could be potentially unloaded, but unloading them hasn't been possible for
> > years.
> 
> Shutdown path is also used for kexec. Leaving hotplug interrupts
> pending is dangerous for the newly loaded kernel as it leaves
> spurious interrupts during the new kernel boot.
> 
> I think we should always disable the hotplug interrupt on shutdown.
> We might think of not waiting for command-completion as a
> middle-ground or go to polling path instead of interrupts all the
> time.

Ah, I forgot about the kexec path.  The kexec path is used for
crashdump, too, so ideally the newly-loaded kernel would defend itself
when possible so it doesn't depend on the original kernel doing things
correctly.

Seems like this question of whether to do things in the original
kernel or the kexec-ed kernel comes up periodically, but I can never
remember a definitive answer.  My initial reaction is that it'd be
nice if we didn't have to do *any* shutdown in the original kernel,
but I'm sure there are reasons that's not practical.

I copied Eric (kexec maintainer) and Vivek (contact listed in
Documentation/kdump/kdump.txt) in case they have suggestions or would
consider some sort of Documentation/ update.

Bjorn

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


* Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
  2018-04-27 21:12     ` pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago) Bjorn Helgaas
@ 2018-04-28  0:56       ` Dave Young
  2018-04-28  1:18         ` Dave Young
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Young @ 2018-04-28  0:56 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Paul Menzel, kexec, linux-kernel, Sinan Kaya,
	Lukas Wunner, Eric Biederman, Bjorn Helgaas, Vivek Goyal

On 04/27/18 at 04:12pm, Bjorn Helgaas wrote:
> [+cc Eric, Vivek, kexec list]
> 
> On Fri, Apr 27, 2018 at 03:34:30PM -0400, Sinan Kaya wrote:
> > On 4/27/2018 3:22 PM, Bjorn Helgaas wrote:
> > > Sinan mooted the idea of using a "no-wait" path of sending the "don't
> > > generate hotplug interrupts" command.  I think we should work on this
> > > idea a little more.  If we're shutting down the whole system, I can't
> > > believe there's much value in *anything* we do in the pciehp_remove()
> > > path.
> > > 
> > > Maybe we should just get rid of pciehp_remove() (and probably
> > > pcie_port_remove_service() and the other service driver remove methods)
> > > completely.  That dates from when the service drivers could be modules that
> > > could be potentially unloaded, but unloading them hasn't been possible for
> > > years.
> > 
> > Shutdown path is also used for kexec. Leaving hotplug interrupts
> > pending is dangerous for the newly loaded kernel as it leaves
> > spurious interrupts during the new kernel boot.
> > 
> > I think we should always disable the hotplug interrupt on shutdown.
> > We might think of not waiting for command-completion as a
> > middle-ground or go to polling path instead of interrupts all the
> > time.
> 
> Ah, I forgot about the kexec path.  The kexec path is used for
> crashdump, too, so ideally the newly-loaded kernel would defend itself
> when possible so it doesn't depend on the original kernel doing things
> correctly.

It is true for kdump.  But kexec needs device shutdown.

> 
> Seems like this question of whether to do things in the original
> kernel or the kexec-ed kernel comes up periodically, but I can never
> remember a definitive answer.  My initial reaction is that it'd be
> nice if we didn't have to do *any* shutdown in the original kernel,
> but I'm sure there are reasons that's not practical.

Devices sometimes assume they are in the good state that firmware set up
during boot, so we need a shutdown in the 1st kernel so that the kexec kernel
can boot correctly with those devices.  For kdump, the kernel has already
panicked and is not reliable, so we do as little as we can in the 1st kernel's
crash path.  Instead, various drivers have special handling for kdump that
resets the devices in the 2nd kernel, e.g. when they see the "reset_devices"
kernel parameter.
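
As a rough illustration of the "reset_devices" handling described above, here
is a minimal user-space sketch.  Everything in it (demo_cmdline_has(),
demo_probe(), struct demo_dev, and the idea of scanning a command-line string
directly) is hypothetical and only models the decision a driver makes; it is
not code from any real driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if `token` appears as a whole
 * space-separated word in `cmdline`. */
static bool demo_cmdline_has(const char *cmdline, const char *token)
{
    size_t n = strlen(token);
    const char *p = cmdline;

    if (n == 0)
        return false;

    while ((p = strstr(p, token)) != NULL) {
        bool starts = (p == cmdline) || (p[-1] == ' ');
        bool ends = (p[n] == '\0') || (p[n] == ' ');

        if (starts && ends)
            return true;
        p += n;
    }
    return false;
}

/* Hypothetical device state, for illustration only. */
struct demo_dev {
    bool was_reset;
    bool initialized;
};

/* Probe in the 2nd kernel: do not trust the state left by the 1st
 * kernel when "reset_devices" was passed; reset the device first. */
static void demo_probe(struct demo_dev *dev, const char *cmdline)
{
    if (demo_cmdline_has(cmdline, "reset_devices"))
        dev->was_reset = true;  /* stand-in for a real hardware reset */
    dev->initialized = true;
}
```

The point of the sketch is only that the defensive work happens in the 2nd
kernel, which is still trustworthy, rather than in the crashed 1st kernel.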

> 
> I copied Eric (kexec maintainer) and Vivek (contact listed in
> Documentation/kdump/kdump.txt) in case they have suggestions or would
> consider some sort of Documentation/ update.
> 
> Bjorn
> 

Thanks
Dave


* Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
  2018-04-28  0:56       ` Dave Young
@ 2018-04-28  1:18         ` Dave Young
  2018-04-28 13:03           ` okaya
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Young @ 2018-04-28  1:18 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Paul Menzel, kexec, linux-kernel, Sinan Kaya,
	Lukas Wunner, Eric Biederman, Bjorn Helgaas, Vivek Goyal

On 04/28/18 at 08:56am, Dave Young wrote:
> On 04/27/18 at 04:12pm, Bjorn Helgaas wrote:
> > [+cc Eric, Vivek, kexec list]
> > 
> > On Fri, Apr 27, 2018 at 03:34:30PM -0400, Sinan Kaya wrote:
> > > On 4/27/2018 3:22 PM, Bjorn Helgaas wrote:
> > > > Sinan mooted the idea of using a "no-wait" path of sending the "don't
> > > > generate hotplug interrupts" command.  I think we should work on this
> > > > idea a little more.  If we're shutting down the whole system, I can't
> > > > believe there's much value in *anything* we do in the pciehp_remove()
> > > > path.
> > > > 
> > > > Maybe we should just get rid of pciehp_remove() (and probably
> > > > pcie_port_remove_service() and the other service driver remove methods)
> > > > completely.  That dates from when the service drivers could be modules that

Hmm, if it is the remove() method then kexec does not use it.  kexec uses
the shutdown() method instead.  I missed this detail when I replied.

> > > > could be potentially unloaded, but unloading them hasn't been possible for
> > > > years.
> > > 
> > > Shutdown path is also used for kexec. Leaving hotplug interrupts
> > > pending is dangerous for the newly loaded kernel as it leaves
> > > spurious interrupts during the new kernel boot.
> > > 
> > > I think we should always disable the hotplug interrupt on shutdown.
> > > We might think of not waiting for command-completion as a
> > > middle-ground or go to polling path instead of interrupts all the
> > > time.
> > 
> > Ah, I forgot about the kexec path.  The kexec path is used for
> > crashdump, too, so ideally the newly-loaded kernel would defend itself
> > when possible so it doesn't depend on the original kernel doing things
> > correctly.
> 
> It is true for kdump.  But kexec needs device shutdown.
> 
> > 
> > Seems like this question of whether to do things in the original
> > kernel or the kexec-ed kernel comes up periodically, but I can never
> > remember a definitive answer.  My initial reaction is that it'd be
> > nice if we didn't have to do *any* shutdown in the original kernel,
> > but I'm sure there are reasons that's not practical.
> 
> Devices sometimes assume it is in a good state initialized in firmware boot
> phase, so we need a shutdown in 1st kernel so that kexec kernel can boot
> correctly for those devices.  For kdump since kernel already panicked
> and it is not reliable so we do as less as we can in the 1st kernel
> crash path, but there are some special handling for kdump in various drivers
> to reset the devices in 2nd kernel, eg. when it see "reset_devices" kernel parameter.
> 
> > 
> > I copied Eric (kexec maintainer) and Vivek (contact listed in
> > Documentation/kdump/kdump.txt) in case they have suggestions or would
> > consider some sort of Documentation/ update.
> > 
> > Bjorn
> > 
> 
> Thanks
> Dave
> 


* Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
  2018-04-28  1:18         ` Dave Young
@ 2018-04-28 13:03           ` okaya
  2018-04-30 20:48             ` Sinan Kaya
  0 siblings, 1 reply; 12+ messages in thread
From: okaya @ 2018-04-28 13:03 UTC (permalink / raw)
  To: Dave Young
  Cc: linux-pci, Paul Menzel, kexec, linux-kernel, Lukas Wunner,
	Bjorn Helgaas, Eric Biederman, Bjorn Helgaas, Vivek Goyal

On 2018-04-27 21:18, Dave Young wrote:
> On 04/28/18 at 08:56am, Dave Young wrote:
>> On 04/27/18 at 04:12pm, Bjorn Helgaas wrote:
>> > [+cc Eric, Vivek, kexec list]
>> >
>> > On Fri, Apr 27, 2018 at 03:34:30PM -0400, Sinan Kaya wrote:
>> > > On 4/27/2018 3:22 PM, Bjorn Helgaas wrote:
>> > > > Sinan mooted the idea of using a "no-wait" path of sending the "don't
>> > > > generate hotplug interrupts" command.  I think we should work on this
>> > > > idea a little more.  If we're shutting down the whole system, I can't
>> > > > believe there's much value in *anything* we do in the pciehp_remove()
>> > > > path.
>> > > >
>> > > > Maybe we should just get rid of pciehp_remove() (and probably
>> > > > pcie_port_remove_service() and the other service driver remove methods)
>> > > > completely.  That dates from when the service drivers could be modules that
> 
> Hmm, if it is the remove() method then kexec does not use it.  kexec 
> use
> the shutdown() method instead.  I missed this details when I replied.

Portdrv hooks up the remove handler to shutdown. That's why remove is
getting called.
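
A minimal user-space sketch of what "hooking up the remove handler to
shutdown" means.  The struct and all demo_* names are hypothetical stand-ins,
not the kernel's actual structures or the portdrv source; the sketch only
shows why a shutdown pass ends up running remove code.

```c
#include <stddef.h>

/* Hypothetical, simplified driver-ops table. */
struct demo_driver {
    const char *name;
    void (*remove)(void *dev);
    void (*shutdown)(void *dev);
};

static int demo_remove_calls;

/* Stand-in for tearing down the port's service drivers. */
static void demo_port_remove(void *dev)
{
    (void)dev;
    demo_remove_calls++;
}

/* Both callbacks point at the same function, so a shutdown
 * (including the one done before kexec) runs the remove path. */
static struct demo_driver demo_portdrv = {
    .name     = "demo-portdrv",
    .remove   = demo_port_remove,
    .shutdown = demo_port_remove,
};

/* What a reboot/kexec-style shutdown pass does for one driver. */
static void demo_device_shutdown(struct demo_driver *drv, void *dev)
{
    if (drv->shutdown)
        drv->shutdown(dev);
}
```

Calling demo_device_shutdown(&demo_portdrv, NULL) increments
demo_remove_calls even though nobody asked for a remove.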

> 
>> > > > could be potentially unloaded, but unloading them hasn't been possible for
>> > > > years.
>> > >
>> > > Shutdown path is also used for kexec. Leaving hotplug interrupts
>> > > pending is dangerous for the newly loaded kernel as it leaves
>> > > spurious interrupts during the new kernel boot.
>> > >
>> > > I think we should always disable the hotplug interrupt on shutdown.
>> > > We might think of not waiting for command-completion as a
>> > > middle-ground or go to polling path instead of interrupts all the
>> > > time.
>> >
>> > Ah, I forgot about the kexec path.  The kexec path is used for
>> > crashdump, too, so ideally the newly-loaded kernel would defend itself
>> > when possible so it doesn't depend on the original kernel doing things
>> > correctly.
>> 
>> It is true for kdump.  But kexec needs device shutdown.
>> 
>> >
>> > Seems like this question of whether to do things in the original
>> > kernel or the kexec-ed kernel comes up periodically, but I can never
>> > remember a definitive answer.  My initial reaction is that it'd be
>> > nice if we didn't have to do *any* shutdown in the original kernel,
>> > but I'm sure there are reasons that's not practical.
>> 
>> Devices sometimes assume it is in a good state initialized in firmware 
>> boot
>> phase, so we need a shutdown in 1st kernel so that kexec kernel can 
>> boot
>> correctly for those devices.  For kdump since kernel already panicked
>> and it is not reliable so we do as less as we can in the 1st kernel
>> crash path, but there are some special handling for kdump in various 
>> drivers
>> to reset the devices in 2nd kernel, eg. when it see "reset_devices" 
>> kernel parameter.
>> 
>> >
>> > I copied Eric (kexec maintainer) and Vivek (contact listed in
>> > Documentation/kdump/kdump.txt) in case they have suggestions or would
>> > consider some sort of Documentation/ update.
>> >
>> > Bjorn
>> >
>> 
>> Thanks
>> Dave
>> 


* Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
  2018-04-28 13:03           ` okaya
@ 2018-04-30 20:48             ` Sinan Kaya
  2018-04-30 21:17               ` Bjorn Helgaas
  0 siblings, 1 reply; 12+ messages in thread
From: Sinan Kaya @ 2018-04-30 20:48 UTC (permalink / raw)
  To: Paul Menzel
  Cc: linux-pci, kexec, linux-kernel, Lukas Wunner, Bjorn Helgaas,
	Eric Biederman, Bjorn Helgaas, Dave Young, Vivek Goyal

Bjorn,

On 4/28/2018 9:03 AM, okaya@codeaurora.org wrote:
>> Hmm, if it is the remove() method then kexec does not use it.  kexec use
>> the shutdown() method instead.  I missed this details when I replied.
> 
> Portdrv hooks up remove handler to shutdown. That's why remove is getting called.

What should we do about this?

Since there is an actual HW erratum involved, should we quirk this root port and
not wait, as if remove/shutdown doesn't exist?

Paul,
You might want to file a bugzilla so that we can keep our debug efforts out of this
list.

Sinan

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.


* Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
  2018-04-30 20:48             ` Sinan Kaya
@ 2018-04-30 21:17               ` Bjorn Helgaas
  2018-04-30 21:27                 ` Sinan Kaya
  0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2018-04-30 21:17 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-pci, Paul Menzel, kexec, linux-kernel, Lukas Wunner,
	Eric Biederman, Bjorn Helgaas, Dave Young, Vivek Goyal

On Mon, Apr 30, 2018 at 04:48:15PM -0400, Sinan Kaya wrote:
> Bjorn,
> 
> On 4/28/2018 9:03 AM, okaya@codeaurora.org wrote:
> >> Hmm, if it is the remove() method then kexec does not use it.  kexec use
> >> the shutdown() method instead.  I missed this details when I replied.
> > 
> > Portdrv hooks up remove handler to shutdown. That's why remove is getting called.
> 
> What should we do about this?
> 
> Since there is an actual HW errata involved, should we quirk this
> root port and not wait as if remove/shutdown doesn't exist?

I was hoping to avoid a quirk because AFAIK all Intel parts have this
issue so it will be an ongoing maintenance issue.  I tried to avoid
the timeout delays, e.g., with 40b960831cfa ("PCI: pciehp: Compute
timeout from hotplug command start time").
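
Going only by that commit title, the idea can be sketched as follows: measure
the wait from when the command was issued, so that time already elapsed is not
waited for again.  The function name, the millisecond units, and the 1000 ms
timeout in the usage note are assumptions for illustration, not taken from the
commit itself.

```c
/* Remaining time to wait, in ms, for a command issued at
 * cmd_started_ms, as seen at now_ms, with a total budget of
 * timeout_ms.  Returns 0 once the budget is already used up. */
static long demo_remaining_ms(long cmd_started_ms, long now_ms,
                              long timeout_ms)
{
    long elapsed = now_ms - cmd_started_ms;

    if (elapsed >= timeout_ms)
        return 0;
    return timeout_ms - elapsed;
}
```

With an assumed 1000 ms budget, a command issued 65284 msec ago (as in the
subject line) yields demo_remaining_ms(0, 65284, 1000) == 0, so there is no
further delay, only the message.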

But we still see the alarming messages, so we should probably add a
quirk to get rid of those.

But I haven't given up on the idea of getting rid of the
pciehp_remove() path.  I'm not convinced yet that we actually need to
do anything to shut this device down.  I don't like the assumption
that kexec requires this.  The kexec is fundamentally just a branch,
and anything we do before the branch (i.e., in the old kernel), we
should also be able to do after the branch (i.e., in the kexec-ed
kernel).

> Paul,
> You might want to file a bugzilla so that we can keep our debug
> efforts out of this list.


* Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
  2018-04-30 21:17               ` Bjorn Helgaas
@ 2018-04-30 21:27                 ` Sinan Kaya
  2018-05-01 12:38                   ` Sinan Kaya
  0 siblings, 1 reply; 12+ messages in thread
From: Sinan Kaya @ 2018-04-30 21:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Paul Menzel, kexec, linux-kernel, Lukas Wunner,
	Eric Biederman, Bjorn Helgaas, Dave Young, Vivek Goyal

On 4/30/2018 5:17 PM, Bjorn Helgaas wrote:
>> What should we do about this?
>>
>> Since there is an actual HW errata involved, should we quirk this
>> root port and not wait as if remove/shutdown doesn't exist?
> I was hoping to avoid a quirk because AFAIK all Intel parts have this
> issue so it will be an ongoing maintenance issue.  I tried to avoid
> the timeout delays, e.g., with 40b960831cfa ("PCI: pciehp: Compute
> timeout from hotplug command start time").
> 
> But we still see the alarming messages, so we should probably add a
> quirk to get rid of those.
> 
> But I haven't given up on the idea of getting rid of the
> pciehp_remove() path.  I'm not convinced yet that we actually need to
> do anything to shut this device down.  I don't like the assumption
> that kexec requires this.  The kexec is fundamentally just a branch,
> and anything we do before the branch (i.e., in the old kernel), we
> should also be able to do after the branch (i.e., in the kexec-ed
> kernel).
> 

In my experience with kexec, MSI type edge interrupts are harmless.
You might just see a few unhandled interrupt messages during boot
if something is pending from the first kernel.

It is the level interrupts that are more concerning. They remain pending
until the interrupt source is cleared. The CPU never returns from the
interrupt handler to actually continue booting the second kernel.

Execution doesn't reach the PCIe hp driver initialization that would
acknowledge the interrupt.

How about calling remove() only if MSI is disabled? Most root port
interrupts are MSI based anyhow.
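
A toy model of the edge/level difference described above, assuming no driver
is bound yet in the second kernel to clear the source.  The struct, its
fields, and demo_count_deliveries() are hypothetical; this is a simulation,
not interrupt-controller code.

```c
#include <stdbool.h>

/* Hypothetical model of one interrupt line at second-kernel boot. */
struct demo_irq_line {
    bool level_triggered;  /* true: level, false: edge (MSI-like) */
    bool source_asserted;  /* device still asserts its condition  */
    bool latched;          /* one edge left over from 1st kernel  */
};

/* Count how often the CPU is interrupted, up to max_deliveries,
 * while nothing acknowledges the device. */
static int demo_count_deliveries(struct demo_irq_line *line,
                                 int max_deliveries)
{
    int taken = 0;

    while (taken < max_deliveries) {
        if (line->level_triggered) {
            /* Fires again for as long as the source stays asserted. */
            if (!line->source_asserted)
                break;
        } else {
            /* A stale edge is consumed by being delivered once. */
            if (!line->latched)
                break;
            line->latched = false;
        }
        taken++;
    }
    return taken;
}
```

In this model a stale edge costs one spurious delivery, while an asserted
level line always exhausts max_deliveries, which corresponds to the CPU never
getting out of the handler.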

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.


* Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
  2018-04-30 21:27                 ` Sinan Kaya
@ 2018-05-01 12:38                   ` Sinan Kaya
  2018-05-01 12:59                     ` Marc Zyngier
  0 siblings, 1 reply; 12+ messages in thread
From: Sinan Kaya @ 2018-05-01 12:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Marc Zyngier, linux-pci, Paul Menzel, kexec, linux-kernel,
	Lukas Wunner, Eric Biederman, Bjorn Helgaas, Dave Young,
	Vivek Goyal

+Marc,

On 4/30/2018 5:27 PM, Sinan Kaya wrote:
> On 4/30/2018 5:17 PM, Bjorn Helgaas wrote:
>>> What should we do about this?
>>>
>>> Since there is an actual HW errata involved, should we quirk this
>>> root port and not wait as if remove/shutdown doesn't exist?
>> I was hoping to avoid a quirk because AFAIK all Intel parts have this
>> issue so it will be an ongoing maintenance issue.  I tried to avoid
>> the timeout delays, e.g., with 40b960831cfa ("PCI: pciehp: Compute
>> timeout from hotplug command start time").
>>
>> But we still see the alarming messages, so we should probably add a
>> quirk to get rid of those.
>>
>> But I haven't given up on the idea of getting rid of the
>> pciehp_remove() path.  I'm not convinced yet that we actually need to
>> do anything to shut this device down.  I don't like the assumption
>> that kexec requires this.  The kexec is fundamentally just a branch,
>> and anything we do before the branch (i.e., in the old kernel), we
>> should also be able to do after the branch (i.e., in the kexec-ed
>> kernel).
>>
> 
> In my experience with kexec, MSI type edge interrupts are harmless.
> You might just see a few unhandled interrupt messages during boot
> if something is pending from the first kernel.
> 
> It is the level interrupts that are more concerning. It remains pending
> until the interrupt source is cleared. CPU never returns from the
> interrupt handler to actually continue booting the second kernel.

This makes me wonder why kexec doesn't disable all interrupt sources by
itself instead of relying on the drivers' shutdown routines. Some drivers
don't even have a shutdown callback. Kexec could also do both.
Something like:

1. Call shutdown for all drivers if available.
2. Disable all interrupt sources in the interrupt controller
3. Start the new kernel.
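
The three steps above can be sketched in user-space C as follows.  All names
(struct demo_drv, demo_irq_enabled, demo_kexec(), and so on) are hypothetical,
and the "interrupt controller" is just an array of flags; this shows the
proposed ordering only, not how kexec is implemented.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_IRQS 8

/* Hypothetical driver with an optional shutdown callback. */
struct demo_drv {
    void (*shutdown)(void);
};

static bool demo_irq_enabled[DEMO_NR_IRQS];
static int demo_shutdown_calls;
static bool demo_jumped;

static void demo_shutdown_cb(void)
{
    demo_shutdown_calls++;
}

static void demo_kexec(struct demo_drv *drvs, size_t n)
{
    /* 1. Call shutdown for all drivers that provide one. */
    for (size_t i = 0; i < n; i++)
        if (drvs[i].shutdown)
            drvs[i].shutdown();

    /* 2. Disable every source at the (modelled) interrupt
     *    controller, covering drivers without a callback. */
    for (size_t i = 0; i < DEMO_NR_IRQS; i++)
        demo_irq_enabled[i] = false;

    /* 3. Stand-in for jumping into the new kernel. */
    demo_jumped = true;
}
```

Step 2 is what would catch drivers that have no shutdown callback; whether
real interrupt controllers can always be put into that state is a separate
question.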

> 
> Execution doesn't reach to PCIe hp driver initialization for
> acknowledging the interrupt.
> 
> How about remove() only if MSI is disabled? Most root port interrupts
> are MSI based anyhow.
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.


* Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
  2018-05-01 12:38                   ` Sinan Kaya
@ 2018-05-01 12:59                     ` Marc Zyngier
  2018-05-01 13:25                       ` Bjorn Helgaas
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Zyngier @ 2018-05-01 12:59 UTC (permalink / raw)
  To: Sinan Kaya, Bjorn Helgaas
  Cc: linux-pci, Paul Menzel, kexec, linux-kernel, Lukas Wunner,
	Eric Biederman, Bjorn Helgaas, Dave Young, Vivek Goyal

On 01/05/18 13:38, Sinan Kaya wrote:
> +Marc,
> 
> On 4/30/2018 5:27 PM, Sinan Kaya wrote:
>> On 4/30/2018 5:17 PM, Bjorn Helgaas wrote:
>>>> What should we do about this?
>>>>
>>>> Since there is an actual HW errata involved, should we quirk this
>>>> root port and not wait as if remove/shutdown doesn't exist?
>>> I was hoping to avoid a quirk because AFAIK all Intel parts have this
>>> issue so it will be an ongoing maintenance issue.  I tried to avoid
>>> the timeout delays, e.g., with 40b960831cfa ("PCI: pciehp: Compute
>>> timeout from hotplug command start time").
>>>
>>> But we still see the alarming messages, so we should probably add a
>>> quirk to get rid of those.
>>>
>>> But I haven't given up on the idea of getting rid of the
>>> pciehp_remove() path.  I'm not convinced yet that we actually need to
>>> do anything to shut this device down.  I don't like the assumption
>>> that kexec requires this.  The kexec is fundamentally just a branch,
>>> and anything we do before the branch (i.e., in the old kernel), we
>>> should also be able to do after the branch (i.e., in the kexec-ed
>>> kernel).
>>>
>>
>> In my experience with kexec, MSI type edge interrupts are harmless.
>> You might just see a few unhandled interrupt messages during boot
>> if something is pending from the first kernel.

Unfortunately, that's not always the case.

A number of GICv3/v4 implementations (a very common interrupt controller
on ARM servers) cannot be disabled, which means they will keep writing
to their pending tables long after kexec has started the new
kernel. And since we don't track memory allocation across kexec, you
end up with a significant chance of observing single-bit corruption as
interrupts carry on being delivered. Oh, and you won't actually be able
to take MSIs because you can't even reprogram the damn thing.

Yes, this can be considered a HW bug.
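
A toy model of the corruption described above: hardware that was never
disabled keeps setting a "pending" bit at an address the new kernel now uses
for something else.  The flat byte array, the one-bit-per-interrupt layout,
and all demo_* names are assumptions made for illustration; they are not the
GIC's actual table format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MEM_SIZE 64

/* Stand-in for physical memory shared by both kernels. */
static uint8_t demo_mem[DEMO_MEM_SIZE];

/* The new kernel reuses the memory, unaware of the old table. */
static void demo_new_kernel_takes_memory(void)
{
    memset(demo_mem, 0, sizeof(demo_mem));
}

/* The still-running hardware marks interrupt `intid` pending in
 * the table the old kernel placed at `table_off`. */
static void demo_hw_mark_pending(size_t table_off, unsigned int intid)
{
    size_t byte = table_off + intid / 8;

    if (byte < DEMO_MEM_SIZE)
        demo_mem[byte] |= (uint8_t)(1u << (intid % 8));
}

/* Number of set bits in memory the new kernel believes is zero. */
static int demo_count_set_bits(void)
{
    int bits = 0;

    for (size_t i = 0; i < DEMO_MEM_SIZE; i++)
        for (int b = 0; b < 8; b++)
            bits += (demo_mem[i] >> b) & 1;
    return bits;
}
```

After demo_new_kernel_takes_memory(), a single demo_hw_mark_pending(16, 10)
leaves exactly one stray bit set in memory the new kernel believes it owns.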

>> It is the level interrupts that are more concerning. It remains pending
>> until the interrupt source is cleared. CPU never returns from the
>> interrupt handler to actually continue booting the second kernel.
> 
> This makes me wonder why kexec doesn't disable all interrupt sources by
> itself instead of relying on the drivers shutdown routine. Some drivers
> don't even have a shutdown callback. Kexec could have done both as another
> example. Something like.
> 
> 1. Call shutdown for all drivers if available.
> 2. Disable all interrupt sources in the interrupt controller
> 3. Start the new kernel.

See above. Although you can shut off the end-point and to some extent
mask interrupts before jumping into the payload, it is not always
possible to go back to a reasonable state where you can actually take MSIs.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
  2018-05-01 12:59                     ` Marc Zyngier
@ 2018-05-01 13:25                       ` Bjorn Helgaas
  2018-05-01 16:31                         ` Marc Zyngier
  0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2018-05-01 13:25 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-pci, Paul Menzel, kexec, linux-kernel, Sinan Kaya,
	Lukas Wunner, Eric Biederman, Bjorn Helgaas, Dave Young,
	Vivek Goyal

On Tue, May 01, 2018 at 01:59:20PM +0100, Marc Zyngier wrote:
> On 01/05/18 13:38, Sinan Kaya wrote:
> > +Marc,
> > 
> > On 4/30/2018 5:27 PM, Sinan Kaya wrote:
> >> On 4/30/2018 5:17 PM, Bjorn Helgaas wrote:
> >>>> What should we do about this?
> >>>>
> >>>> Since there is an actual HW errata involved, should we quirk this
> >>>> root port and not wait as if remove/shutdown doesn't exist?
> >>> I was hoping to avoid a quirk because AFAIK all Intel parts have this
> >>> issue so it will be an ongoing maintenance issue.  I tried to avoid
> >>> the timeout delays, e.g., with 40b960831cfa ("PCI: pciehp: Compute
> >>> timeout from hotplug command start time").
> >>>
> >>> But we still see the alarming messages, so we should probably add a
> >>> quirk to get rid of those.
> >>>
> >>> But I haven't given up on the idea of getting rid of the
> >>> pciehp_remove() path.  I'm not convinced yet that we actually need to
> >>> do anything to shut this device down.  I don't like the assumption
> >>> that kexec requires this.  The kexec is fundamentally just a branch,
> >>> and anything we do before the branch (i.e., in the old kernel), we
> >>> should also be able to do after the branch (i.e., in the kexec-ed
> >>> kernel).
> >>>
> >>
> >> In my experience with kexec, MSI type edge interrupts are harmless.
> >> You might just see a few unhandled interrupt messages during boot
> >> if something is pending from the first kernel.
> 
> Unfortunately, that's not always the case.
> 
> A number of GICv3/v4 implementations (a very common interrupt controller
> on ARM servers) cannot be disabled, which means they will keep writing
> to their pending tables long after kexec will have started the new
> kernel. And since we don't track memory allocation across kexec, you
> end-up with significant chances of observing single bit corruption as
> interrupts carry on being delivered. Oh, and you won't actually be able
> to take MSIs because you can't even reprogram the damn thing.
> 
> Yes, this can be considered a HW bug.
> 
> >> It is the level interrupts that are more concerning. It remains pending
> >> until the interrupt source is cleared. CPU never returns from the
> >> interrupt handler to actually continue booting the second kernel.
> > 
> > This makes me wonder why kexec doesn't disable all interrupt sources by
> > itself instead of relying on the drivers shutdown routine. Some drivers
> > don't even have a shutdown callback. Kexec could have done both as another
> > example. Something like.
> > 
> > 1. Call shutdown for all drivers if available.
> > 2. Disable all interrupt sources in the interrupt controller
> > 3. Start the new kernel.
> 
> See above. Although you can shut off the end-point and to some extent
> mask interrupts before jumping into the payload, it is not always
> possible to go back to a reasonable state where you can take actually MSIs.

This is exactly the sort of thing it would be nice to collect and
document as part of the background of "why kexec works the way it
does."  It certainly helps explain things that are far from obvious if
you don't have the background.


* Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
  2018-05-01 13:25                       ` Bjorn Helgaas
@ 2018-05-01 16:31                         ` Marc Zyngier
  2018-05-01 22:32                           ` Eric W. Biederman
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Zyngier @ 2018-05-01 16:31 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Paul Menzel, kexec, linux-kernel, Sinan Kaya,
	Lukas Wunner, Eric Biederman, Bjorn Helgaas, Dave Young,
	Vivek Goyal

On Tue, 01 May 2018 14:25:54 +0100,
Bjorn Helgaas wrote:

Hi Bjorn,

> On Tue, May 01, 2018 at 01:59:20PM +0100, Marc Zyngier wrote:
> > On 01/05/18 13:38, Sinan Kaya wrote:
> > > +Marc,
> > > 
> > > On 4/30/2018 5:27 PM, Sinan Kaya wrote:
> > >> On 4/30/2018 5:17 PM, Bjorn Helgaas wrote:
> > >>>> What should we do about this?
> > >>>>
> > >>>> Since there is an actual HW errata involved, should we quirk this
> > >>>> root port and not wait as if remove/shutdown doesn't exist?
> > >>> I was hoping to avoid a quirk because AFAIK all Intel parts have this
> > >>> issue so it will be an ongoing maintenance issue.  I tried to avoid
> > >>> the timeout delays, e.g., with 40b960831cfa ("PCI: pciehp: Compute
> > >>> timeout from hotplug command start time").
> > >>>
> > >>> But we still see the alarming messages, so we should probably add a
> > >>> quirk to get rid of those.
> > >>>
> > >>> But I haven't given up on the idea of getting rid of the
> > >>> pciehp_remove() path.  I'm not convinced yet that we actually need to
> > >>> do anything to shut this device down.  I don't like the assumption
> > >>> that kexec requires this.  The kexec is fundamentally just a branch,
> > >>> and anything we do before the branch (i.e., in the old kernel), we
> > >>> should also be able to do after the branch (i.e., in the kexec-ed
> > >>> kernel).
> > >>>
> > >>
> > >> In my experience with kexec, MSI type edge interrupts are harmless.
> > >> You might just see a few unhandled interrupt messages during boot
> > >> if something is pending from the first kernel.
> > 
> > Unfortunately, that's not always the case.
> > 
> > A number of GICv3/v4 implementations (a very common interrupt controller
> > on ARM servers) cannot be disabled, which means they will keep writing
> > to their pending tables long after kexec will have started the new
> > kernel. And since we don't track memory allocation across kexec, you
> > end-up with significant chances of observing single bit corruption as
> > interrupts carry on being delivered. Oh, and you won't actually be able
> > to take MSIs because you can't even reprogram the damn thing.
> > 
> > Yes, this can be considered a HW bug.
> > 
> > >> It is the level interrupts that are more concerning. It remains pending
> > >> until the interrupt source is cleared. CPU never returns from the
> > >> interrupt handler to actually continue booting the second kernel.
> > > 
> > > This makes me wonder why kexec doesn't disable all interrupt sources by
> > > itself instead of relying on the drivers shutdown routine. Some drivers
> > > don't even have a shutdown callback. Kexec could have done both as another
> > > example. Something like.
> > > 
> > > 1. Call shutdown for all drivers if available.
> > > 2. Disable all interrupt sources in the interrupt controller
> > > 3. Start the new kernel.
> > 
> > See above. Although you can shut off the end-point and to some extent
> > mask interrupts before jumping into the payload, it is not always
> > possible to go back to a reasonable state where you can actually take MSIs.
> 
> This is exactly the sort of thing it would be nice to collect and
> document as part of the background of "why kexec works the way it
> does."  It certainly helps explain things that are far from obvious if
> you don't have the background.

I'd certainly be happy to help with it if someone was willing to
kickstart such a document. kexec/kdump is a huge bag of "interesting"
tricks, and it has driven me mad over the past couple of months (I'm
typing this from a laptop that uses kexec as its bootloader, and it is
*not fun*).

	M.

-- 
Jazz is not dead, it just smell funny.


* Re: pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
  2018-05-01 16:31                         ` Marc Zyngier
@ 2018-05-01 22:32                           ` Eric W. Biederman
  0 siblings, 0 replies; 12+ messages in thread
From: Eric W. Biederman @ 2018-05-01 22:32 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-pci, Paul Menzel, kexec, linux-kernel, Sinan Kaya,
	Lukas Wunner, Bjorn Helgaas, Bjorn Helgaas, Dave Young,
	Vivek Goyal

Marc Zyngier <marc.zyngier@arm.com> writes:

> On Tue, 01 May 2018 14:25:54 +0100,
> Bjorn Helgaas wrote:
>
> Hi Bjorn,
>
>> On Tue, May 01, 2018 at 01:59:20PM +0100, Marc Zyngier wrote:
>> > On 01/05/18 13:38, Sinan Kaya wrote:
>> > > +Marc,
>> > > 
>> > > On 4/30/2018 5:27 PM, Sinan Kaya wrote:
>> > >> On 4/30/2018 5:17 PM, Bjorn Helgaas wrote:
>> > >>>> What should we do about this?
>> > >>>>
>> > >>>> Since there is an actual HW errata involved, should we quirk this
>> > >>>> root port and not wait as if remove/shutdown doesn't exist?
>> > >>> I was hoping to avoid a quirk because AFAIK all Intel parts have this
>> > >>> issue so it will be an ongoing maintenance issue.  I tried to avoid
>> > >>> the timeout delays, e.g., with 40b960831cfa ("PCI: pciehp: Compute
>> > >>> timeout from hotplug command start time").
>> > >>>
>> > >>> But we still see the alarming messages, so we should probably add a
>> > >>> quirk to get rid of those.
>> > >>>
>> > >>> But I haven't given up on the idea of getting rid of the
>> > >>> pciehp_remove() path.  I'm not convinced yet that we actually need to
>> > >>> do anything to shut this device down.  I don't like the assumption
>> > >>> that kexec requires this.  The kexec is fundamentally just a branch,
>> > >>> and anything we do before the branch (i.e., in the old kernel), we
>> > >>> should also be able to do after the branch (i.e., in the kexec-ed
>> > >>> kernel).
>> > >>>
>> > >>
>> > >> In my experience with kexec, MSI type edge interrupts are harmless.
>> > >> You might just see a few unhandled interrupt messages during boot
>> > >> if something is pending from the first kernel.
>> > 
>> > Unfortunately, that's not always the case.
>> > 
>> > A number of GICv3/v4 implementations (a very common interrupt controller
>> > on ARM servers) cannot be disabled, which means they will keep writing
>> > to their pending tables long after kexec will have started the new
>> > kernel. And since we don't track memory allocation across kexec, you
>> > end-up with significant chances of observing single bit corruption as
>> > interrupts carry on being delivered. Oh, and you won't actually be able
>> > to take MSIs because you can't even reprogram the damn thing.
>> > 
>> > Yes, this can be considered a HW bug.
>> > 
>> > >> It is the level interrupts that are more concerning. It remains pending
>> > >> until the interrupt source is cleared. CPU never returns from the
>> > >> interrupt handler to actually continue booting the second kernel.
>> > > 
>> > > This makes me wonder why kexec doesn't disable all interrupt sources by
>> > > itself instead of relying on the drivers shutdown routine. Some drivers
>> > > don't even have a shutdown callback. Kexec could have done both as another
>> > > example. Something like.
>> > > 
>> > > 1. Call shutdown for all drivers if available.
>> > > 2. Disable all interrupt sources in the interrupt controller
>> > > 3. Start the new kernel.
>> > 
>> > See above. Although you can shut off the end-point and to some extent
>> > mask interrupts before jumping into the payload, it is not always
>> > possible to go back to a reasonable state where you can actually take MSIs.
>> 
>> This is exactly the sort of thing it would be nice to collect and
>> document as part of the background of "why kexec works the way it
>> does."  It certainly helps explain things that are far from obvious if
>> you don't have the background.
>
> I'd certainly be happy to help with it if someone was willing to
> kickstart such a document. kexec/kdump is a huge bag of "interesting"
> tricks, and it has driven me mad over the past couple of months (I'm
> typing this from a laptop that uses kexec as its bootloader, and it is
> *not fun*).

I don't know if it helps documentation wise but here is my memory of why
things are the way they are.

Case 1) kexec-on-panic.

In this case we run the new kernel in memory that has been reserved
since boot of the previous kernel and has never been used by any device
driver.  This means on-going DMA transactions that we don't manage to
shut off are harmless.
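
A minimal sketch of how one might check for that reservation from
userspace, assuming the usual /proc/iomem line format in which the
reserved region is labelled "Crash kernel".  The helper names are
hypothetical, and unprivileged readers typically see the addresses
zeroed out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one /proc/iomem line and, if it describes
 * the "Crash kernel" reservation, store the range bounds.
 * Returns 1 on a match, 0 otherwise. */
static int parse_crash_kernel_line(const char *line,
                                   unsigned long long *start,
                                   unsigned long long *end)
{
    if (strstr(line, "Crash kernel") == NULL)
        return 0;
    /* Lines look like "  2b000000-32ffffff : Crash kernel". */
    return sscanf(line, " %llx-%llx", start, end) == 2;
}

/* Scan a file in /proc/iomem format for the reservation.
 * Returns 1 if one was found, 0 otherwise (including on open failure). */
static int find_crash_kernel(const char *path,
                             unsigned long long *start,
                             unsigned long long *end)
{
    char line[256];
    int found = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return 0;
    while (!found && fgets(line, sizeof(line), f) != NULL)
        found = parse_crash_kernel_line(line, start, end);
    fclose(f);
    return found;
}
```

Calling find_crash_kernel("/proc/iomem", &start, &end) on a system
booted with a crashkernel= reservation should report the range the
crash kernel will run in.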

In actual execution a bare minimum of hardware is shut down on the
kexec-on-panic path.  Ideally it would be nothing.  The crashing kernel
simply cannot be trusted to shut things down itself.

The kernel that executes after the crash loads a bare minimum of
drivers and does its best to initialize the hardware.  Ideally if
something goes wrong the kernel will hang before we write to hardware
and mess anything up.

With this we get something like a 50% or a 60% success rate of capturing
a crashdump in practice in the field.

Everything else that has been tried relies more on the crashing kernel
and looks great in testing and then turns out to not have a measurable
success rate in practice.

Using lkdtm you can set up tests of various kinds of kernel corruption
and failure and see some approximation of the success rate kexec will
see in practice.
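
As a rough illustration, lkdtm exposes trigger files under debugfs,
and writing a crash type name such as "PANIC" to
/sys/kernel/debug/provoke-crash/DIRECT provokes that failure
immediately.  The helper below is a hypothetical sketch that takes the
path as a parameter so it can be tried against an ordinary file first.
It assumes debugfs is mounted at /sys/kernel/debug and that lkdtm is
built in or loaded.

```c
#include <stdio.h>

/* Hypothetical helper: write a crash type name to an lkdtm-style
 * trigger file.  Returns 0 on success, -1 on failure. */
static int provoke_crash(const char *trigger_path, const char *crash_type)
{
    FILE *f = fopen(trigger_path, "w");

    if (f == NULL)
        return -1;
    if (fputs(crash_type, f) == EOF) {
        fclose(f);
        return -1;
    }
    /* The write reaches the kernel when the stream is flushed. */
    return fclose(f) == 0 ? 0 : -1;
}
```

With a crash kernel loaded, provoke_crash() pointed at the DIRECT file
with "PANIC" should exercise the kexec-on-panic path.  Only run it on a
machine you are prepared to crash.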

I forget where we are with iommus, but the principles remain and iommus
tend to be tricky just because they get in the middle of everything.

If someone stares hard enough we are probably at the point on x86 where
we can remove the irq shutdown code.

The kexec on panic case tends to be tested more on enterprise kernels
than on normal ones.

Case 2) Ordinary kexec.

The goal is to have a fully functional, uncompromised system (unlike
kexec on panic).  Hardware bugs mean that in the general case the only
place we can shut down hardware reliably is the drivers themselves.

All devices doing DMA must be shut down in the kexec'ing kernel, in
part because there is no guarantee that we will even load a driver for
that hardware.

The presence of DMA drove most of the decisions.  But from this thread I
see that irq handling follows the same pattern.  The best place to shut
anything down is in the driver where there is full knowledge of how
things work.

One of the more annoying things that has been discovered is that the
generic PCI DMA disable bit doesn't work uniformly across hardware,
which means there is no known generic way to shut down DMA across the
board.
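
For reference, a small sketch of what clearing that bit looks like,
assuming the bit meant here is Bus Master Enable in the PCI command
register (bit 2 of the 16-bit register at config offset 0x04).  The
helper names are hypothetical.  In the kernel, pci_clear_master()
clears the same bit.

```c
#include <stdint.h>

/* Values from the PCI specification (also in <linux/pci_regs.h>). */
#define PCI_COMMAND         0x04  /* config offset of the command register */
#define PCI_COMMAND_MASTER  0x4   /* Bus Master Enable bit */

/* Return the command register value with bus mastering (DMA) disabled. */
static uint16_t pci_command_disable_dma(uint16_t command)
{
    return (uint16_t)(command & ~PCI_COMMAND_MASTER);
}

/* Return nonzero if the command register value allows bus mastering. */
static int pci_command_dma_enabled(uint16_t command)
{
    return (command & PCI_COMMAND_MASTER) != 0;
}
```

The point above is that writing the cleared value back is not enough on
all hardware, so this cannot replace a driver-specific shutdown.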

In the prototypes there was only the "remove" method of drivers and that
worked well.  When it came time to merge the original kexec
implementation the maintainer of the power management subsystem
insisted we add a new "shutdown" method instead, because while it is
necessary to shut down the hardware you should not need to clean up the
data structures.
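
To make the distinction concrete, here is a userspace toy model, not
kernel code.  Every name in it is hypothetical.  It only mirrors the
idea that "shutdown" quiesces the hardware while "remove" quiesces it
and also tears down the driver's data structures.

```c
#include <stdlib.h>

/* Hypothetical model of a device and the driver state attached to it. */
struct toy_device {
    int dma_enabled;   /* 1 while the device may still write to memory */
    int irq_enabled;   /* 1 while the device may still raise interrupts */
    int *ring;         /* driver-owned data structure */
};

struct toy_driver {
    void (*remove)(struct toy_device *dev);
    void (*shutdown)(struct toy_device *dev);
};

/* Stop the device from generating interrupts and DMA. */
static void toy_quiesce(struct toy_device *dev)
{
    dev->irq_enabled = 0;
    dev->dma_enabled = 0;
}

/* "shutdown": quiesce the hardware, leave data structures alone. */
static void toy_shutdown(struct toy_device *dev)
{
    toy_quiesce(dev);
}

/* "remove": quiesce the hardware and free the driver's state. */
static void toy_remove(struct toy_device *dev)
{
    toy_quiesce(dev);
    free(dev->ring);
    dev->ring = NULL;
}

static const struct toy_driver toy_ops = {
    .remove = toy_remove,
    .shutdown = toy_shutdown,
};
```

In this model both callbacks leave the device quiet.  The practical
difference is that the remove path is the one that rmmod exercises.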

In practice that idea flopped.  The most reliable way I know to run
kexec is to rmmod all of the drivers before running sys_reboot(...,
LINUX_REBOOT_CMD_KEXEC, ...) so that the shutdown methods get run.
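
A minimal userspace sketch of that final call, assuming a kernel image
has already been loaded with kexec_load(2) or kexec_file_load(2) and
the drivers have been unloaded beforehand.  The wrapper name
kexec_now() is hypothetical.  The constants come from
<linux/reboot.h>.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <linux/reboot.h>   /* LINUX_REBOOT_MAGIC1/2, LINUX_REBOOT_CMD_KEXEC */
#include <sys/syscall.h>    /* SYS_reboot */
#include <unistd.h>         /* syscall() */

/* Ask the kernel to jump into the previously loaded kexec image.
 * Returns only on failure, with -1 and errno set (for example EPERM
 * when the caller lacks CAP_SYS_BOOT). */
long kexec_now(void)
{
    return syscall(SYS_reboot, LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2,
                   LINUX_REBOOT_CMD_KEXEC, NULL);
}
```

This goes straight to the kernel and skips userspace shutdown scripts,
so anything that should be synced or unmounted has to be handled before
calling it.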

It has been asked and I have given my approval to anyone who wants to do
the work to switch from the "shutdown" methods to "remove" on the kexec
path.  But so far it is a big enough project that no one has done that
yet.

It has been suggested that hardware does not need to be shut down at
the end of the kernel before returning to a firmware method.  That is
incorrect.  Most firmware when it regains control triggers a system
reset to get the hardware back into a usable state, and be able to
reboot the system.  There is a magic register for this on x86.  But
older x86 systems and others transfer control to firmware without doing
a soft hardware reset of the system and all of the devices.  Without
shutting down the devices they will work about as well as kexec does
when you don't remove the devices.  That is why I merged the reboot
and the kexec code paths.  Well that and so that there is a little
more testing.

In practice it still seems that rmmod is the only testing that reliably
happens to drivers.  So not sharing that code path makes kexec more
fragile than necessary.

Hopefully this helps put things into perspective and can help with your
document.

Eric


end of thread, other threads:[~2018-05-01 22:33 UTC | newest]

Thread overview: 12+ messages
     [not found] <8770820b-85a0-172b-7230-3a44524e6c9f@molgen.mpg.de>
     [not found] ` <20180427192207.GG8199@bhelgaas-glaptop.roam.corp.google.com>
     [not found]   ` <b62c2a8e-fe14-6d7d-147c-0ce3b0c0ab2f@codeaurora.org>
2018-04-27 21:12     ` pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago) Bjorn Helgaas
2018-04-28  0:56       ` Dave Young
2018-04-28  1:18         ` Dave Young
2018-04-28 13:03           ` okaya
2018-04-30 20:48             ` Sinan Kaya
2018-04-30 21:17               ` Bjorn Helgaas
2018-04-30 21:27                 ` Sinan Kaya
2018-05-01 12:38                   ` Sinan Kaya
2018-05-01 12:59                     ` Marc Zyngier
2018-05-01 13:25                       ` Bjorn Helgaas
2018-05-01 16:31                         ` Marc Zyngier
2018-05-01 22:32                           ` Eric W. Biederman