* Crashdump and IOMMU problems
@ 2011-05-11 12:46 Andrew Cooper
2011-05-11 22:11 ` Kay, Allen M
0 siblings, 1 reply; 4+ messages in thread
From: Andrew Cooper @ 2011-05-11 12:46 UTC (permalink / raw)
To: xen-devel@lists.xensource.com; +Cc: Wei Wang, Allen Kay
Hello,
I have been debugging kexec interaction problems with XenServer and
found that the problem lies in how Xen tears down the computer in a crash.
The Xen kexec path does not touch the IOMMU at all, which leaves the kexec
native kernel with interrupt remapping enabled without realizing it.
This leads to the kexec kernel failing to understand why its interrupts
aren't working.
As a debugging measure, I have put iommu_ops->suspend() and
iommu_disable_IR() on the kexec path and this 'fixes' the problem,
although it is far from safe.
From a correctness point of view, Xen really does need to shut down all
IOMMU remapping before it jumps to the crash kernel. I know that kdump
is a "seat of the pants best effort" in the best case, but there is more
which Xen needs to do to help it along. I was considering adding a
crash_shutdown function to iommu_ops which goes and twiddles the
relevant disable bits, without saving state.
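
A minimal sketch of what such a hook could look like, kept separate from suspend() so that nothing is saved on the crash path. Everything here is an assumption for illustration: struct fake_iommu, struct iommu_ops_sketch, and the helper names are hypothetical and are not Xen's actual iommu_ops. The two enable bits use the positions of Intel VT-d's global command/status registers (translation enable at bit 31, interrupt remapping enable at bit 25); other IOMMUs lay these out differently. The order in which the two bits are cleared is arbitrary in this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions follow Intel VT-d's global command/status registers.
 * Other IOMMU implementations use different layouts. */
#define GCMD_TE  (UINT32_C(1) << 31)  /* DMA translation enable */
#define GCMD_IRE (UINT32_C(1) << 25)  /* interrupt remapping enable */

/* Hypothetical stand-in for one IOMMU unit; not a Xen structure. */
struct fake_iommu {
    uint32_t gsts;        /* status: which features are currently on */
    uint32_t saved_gsts;  /* filled in by suspend() only */
    int      state_saved; /* nonzero once suspend() has saved state */
};

/* Simulated command write: status follows the command immediately.
 * Real hardware generally needs the status register polled instead. */
static void fake_write_gcmd(struct fake_iommu *u, uint32_t val)
{
    u->gsts = val & (GCMD_TE | GCMD_IRE);
}

/* Hypothetical ops table with the proposed extra hook. */
struct iommu_ops_sketch {
    void (*suspend)(struct fake_iommu *u);
    void (*crash_shutdown)(struct fake_iommu *u);
};

/* Normal suspend: save state so it can be restored, then disable. */
static void sketch_suspend(struct fake_iommu *u)
{
    u->saved_gsts = u->gsts;
    u->state_saved = 1;
    fake_write_gcmd(u, 0);
}

/* Crash path: only clear the enable bits, save nothing. */
static void sketch_crash_shutdown(struct fake_iommu *u)
{
    uint32_t cmd = u->gsts;

    cmd &= ~GCMD_TE;            /* stop DMA remapping */
    fake_write_gcmd(u, cmd);
    cmd &= ~GCMD_IRE;           /* stop interrupt remapping */
    fake_write_gcmd(u, cmd);
}

const struct iommu_ops_sketch sketch_ops = {
    sketch_suspend,
    sketch_crash_shutdown,
};

/* Called once on the crash path, before jumping to the kdump kernel. */
void crash_disable_all(const struct iommu_ops_sketch *ops,
                       struct fake_iommu *units, size_t n)
{
    assert(ops != NULL && ops->crash_shutdown != NULL);
    for (size_t i = 0; i < n; i++)
        ops->crash_shutdown(&units[i]);
}
```

Calling crash_disable_all(&sketch_ops, units, n) leaves every unit with both enable bits clear and with state_saved untouched, which is the difference from the suspend() path used as a debugging measure above.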
However, disabling DMA remapping while transfers are still ongoing is
likely asking for trouble. Seeing as people on here are likely to know
far more than me on this subject:
1) Is there a systematic way to find and disable active DMA transfers,
or indeed a systematic way to shut down PCI (etc) devices which is safe
for the kexec path?
2) Are there any other PC subsystems which could do with being shut down
in a sensible manner to make life easier for the kdump kernel?
Thanks in advance,
~Andrew
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
* RE: Crashdump and IOMMU problems
2011-05-11 12:46 Crashdump and IOMMU problems Andrew Cooper
@ 2011-05-11 22:11 ` Kay, Allen M
2011-05-12 8:24 ` Tim Deegan
2011-05-12 9:01 ` Jan Beulich
0 siblings, 2 replies; 4+ messages in thread
From: Kay, Allen M @ 2011-05-11 22:11 UTC (permalink / raw)
To: Andrew Cooper, xen-devel@lists.xensource.com; +Cc: Wei Wang, Jan Beulich
I believe Jan was involved with adding kexec support in iommu code.
As for a systematic way to disable active DMA, isn't this similar to the OS shutdown case, when all the drivers are unloaded? Does kexec unload all of the device drivers? Once all the drivers are unloaded, there shouldn't be any DMA transactions going on.
I don't know much about the kexec flow, but it sounds like the high-level flow is: 1) the dom0 kernel shuts down all of the device drivers and then 2) iommu_ops->suspend() or crash_shutdown() is called to disable all of the IOMMU hardware.
Allen
* Re: RE: Crashdump and IOMMU problems
2011-05-11 22:11 ` Kay, Allen M
@ 2011-05-12 8:24 ` Tim Deegan
2011-05-12 9:01 ` Jan Beulich
1 sibling, 0 replies; 4+ messages in thread
From: Tim Deegan @ 2011-05-12 8:24 UTC (permalink / raw)
To: Kay, Allen M
Cc: Wei Wang, Andrew Cooper, xen-devel@lists.xensource.com,
Jan Beulich
At 23:11 +0100 on 11 May (1305155493), Kay, Allen M wrote:
> I believe Jan was involved with adding kexec support in iommu code.
>
> As for a systematic way to disable active DMA, isn't this similar to the OS shutdown case, when all the drivers are unloaded? Does kexec unload all of the device drivers? Once all the drivers are unloaded, there shouldn't be any DMA transactions going on.
>
> I don't know much about the kexec flow, but it sounds like the high-level flow is: 1) the dom0 kernel shuts down all of the device drivers and then 2) iommu_ops->suspend() or crash_shutdown() is called to disable all of the IOMMU hardware.
>
In Xen we might have to kexec on crash without even knowing which VM is
driving the devices, so we can't do a clean unload. It might be
worthwhile just walking the PCI buses and disabling everything, though.
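
A rough sketch of such a walk, assuming "disabling" means clearing the Bus Master Enable bit (bit 2 of the PCI command register at config offset 0x04) on every function that answers. The accessor table struct pci_cfg_ops, the in-memory fake config space, and pci_crash_quiesce() are all hypothetical names for illustration, not Xen interfaces. The scan is brute force: it probes all 256 buses, 32 devices, and 8 functions, and skips any slot whose vendor ID reads back as 0xFFFF.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_VENDOR_ID      0x00
#define PCI_COMMAND        0x04
#define PCI_COMMAND_MASTER 0x0004  /* Bus Master Enable */
#define PCI_VENDOR_NONE    0xFFFF  /* read back when no function is present */

/* Hypothetical config-space accessors; a hypervisor would plug in real ones. */
struct pci_cfg_ops {
    uint16_t (*read16)(void *ctx, unsigned bus, unsigned dev,
                       unsigned fn, unsigned reg);
    void (*write16)(void *ctx, unsigned bus, unsigned dev,
                    unsigned fn, unsigned reg, uint16_t val);
};

/* Walk every bus/device/function and clear Bus Master Enable.
 * Returns how many functions had the bit set. This stops new DMA being
 * issued; it makes no claim about transactions already in flight. */
unsigned pci_crash_quiesce(const struct pci_cfg_ops *ops, void *ctx)
{
    unsigned changed = 0;

    assert(ops && ops->read16 && ops->write16);
    for (unsigned bus = 0; bus < 256; bus++)
        for (unsigned dev = 0; dev < 32; dev++)
            for (unsigned fn = 0; fn < 8; fn++) {
                uint16_t cmd;

                if (ops->read16(ctx, bus, dev, fn, PCI_VENDOR_ID)
                    == PCI_VENDOR_NONE)
                    continue;
                cmd = ops->read16(ctx, bus, dev, fn, PCI_COMMAND);
                if (cmd & PCI_COMMAND_MASTER) {
                    cmd = (uint16_t)(cmd & ~PCI_COMMAND_MASTER);
                    ops->write16(ctx, bus, dev, fn, PCI_COMMAND, cmd);
                    changed++;
                }
            }
    return changed;
}

/* In-memory fake config space, only so the walk can be exercised. */
struct fake_dev { unsigned bus, dev, fn; uint16_t vendor, command; };
struct fake_pci { struct fake_dev *devs; unsigned count; };

static struct fake_dev *fake_find(struct fake_pci *p, unsigned bus,
                                  unsigned dev, unsigned fn)
{
    for (unsigned i = 0; i < p->count; i++)
        if (p->devs[i].bus == bus && p->devs[i].dev == dev &&
            p->devs[i].fn == fn)
            return &p->devs[i];
    return 0;
}

static uint16_t fake_read16(void *ctx, unsigned bus, unsigned dev,
                            unsigned fn, unsigned reg)
{
    struct fake_dev *d = fake_find(ctx, bus, dev, fn);

    if (!d)
        return PCI_VENDOR_NONE;
    if (reg == PCI_VENDOR_ID)
        return d->vendor;
    return reg == PCI_COMMAND ? d->command : 0;
}

static void fake_write16(void *ctx, unsigned bus, unsigned dev,
                         unsigned fn, unsigned reg, uint16_t val)
{
    struct fake_dev *d = fake_find(ctx, bus, dev, fn);

    if (d && reg == PCI_COMMAND)
        d->command = val;
}

const struct pci_cfg_ops fake_cfg_ops = { fake_read16, fake_write16 };
```

Because the walk needs no knowledge of which VM owns which device, it fits the crash case described here; whether clearing bus mastering alone is enough for every device is left open.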
Tim.
--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
* RE: Crashdump and IOMMU problems
2011-05-11 22:11 ` Kay, Allen M
2011-05-12 8:24 ` Tim Deegan
@ 2011-05-12 9:01 ` Jan Beulich
1 sibling, 0 replies; 4+ messages in thread
From: Jan Beulich @ 2011-05-12 9:01 UTC (permalink / raw)
To: Andrew Cooper, Allen M Kay; +Cc: Wei Wang, xen-devel@lists.xensource.com
>>> "Kay, Allen M" <allen.m.kay@intel.com> 12.05.11 00:11 >>>
>I believe Jan was involved with adding kexec support in iommu code.
No, I definitely wasn't (nevertheless I'm interested in the subject).
Jan