* [PATCH] amd iommu: force flush of iommu prior during shutdown
@ 2010-03-31 15:24 Neil Horman
  2010-03-31 15:54 ` Vivek Goyal
  2010-03-31 21:25 ` Chris Wright
  0 siblings, 2 replies; 41+ messages in thread
From: Neil Horman @ 2010-03-31 15:24 UTC (permalink / raw)
  To: iommu; +Cc: joerg.roedel, nhorman, vgoyal, hbabu, kexec, linux-kernel

Flush iommu during shutdown

When using an iommu, it's possible, if a kdump kernel boot follows a primary
kernel crash, that DMA operations might still be in flight from the previous
kernel during the kdump kernel boot. This can lead to memory corruption,
crashes, and other erroneous behavior. Specifically, I've seen it manifest
during a kdump boot as endless iommu error log entries of the form:

AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d
address=0x000000000245a0c0 flags=0x0070]

followed by an inability to access hard drives, and various other resources.

I've written this fix for it. In short, it just forces a flush of the
in-flight DMA operations on shutdown, so that the new kernel is certain not
to have any in-flight DMAs trying to complete after we've reset all the
iommu page tables, causing the above errors. I've tested it and it fixes
the problem for me quite well.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 amd_iommu_init.c |   25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/amd_iommu_init.c b/arch/x86/kernel/amd_iommu_init.c
index 9dc91b4..8fbdf58 100644
--- a/arch/x86/kernel/amd_iommu_init.c
+++ b/arch/x86/kernel/amd_iommu_init.c
@@ -265,8 +265,26 @@ static void iommu_enable(struct amd_iommu *iommu)
 	iommu_feature_enable(iommu, CONTROL_IOMMU_EN);
 }
 
-static void iommu_disable(struct amd_iommu *iommu)
+static void iommu_disable(struct amd_iommu *iommu, bool flush)
 {
+
+	/*
+	 * This ensures that all in-flight dmas for this iommu
+	 * are complete prior to shutting it down.
+	 * It's a bit racy, but I think it's ok, given that if we're flushing
+	 * we're in a shutdown path (either a graceful shutdown or a
+	 * crash leading to a kdump boot). That means we're down to one
+	 * cpu, and the other system hardware isn't going to issue
+	 * subsequent dma operations.
+	 * Also note that we gate the flushing on the flush boolean because
+	 * the enable_iommus path uses this function and we can't flush any
+	 * data in that path until later when the iommus are fully initialized.
+	 */
+	if (flush) {
+		amd_iommu_flush_all_devices();
+		amd_iommu_flush_all_domains();
+	}
+
 	/* Disable command buffer */
 	iommu_feature_disable(iommu, CONTROL_CMDBUF_EN);
 
@@ -276,6 +294,7 @@ static void iommu_disable(struct amd_iommu *iommu)
 
 	/* Disable IOMMU hardware itself */
 	iommu_feature_disable(iommu, CONTROL_IOMMU_EN);
+
 }
 
 /*
@@ -1114,7 +1133,7 @@ static void enable_iommus(void)
 	struct amd_iommu *iommu;
 
 	for_each_iommu(iommu) {
-		iommu_disable(iommu);
+		iommu_disable(iommu, false);
 		iommu_set_device_table(iommu);
 		iommu_enable_command_buffer(iommu);
 		iommu_enable_event_buffer(iommu);
@@ -1129,7 +1148,7 @@ static void disable_iommus(void)
 	struct amd_iommu *iommu;
 
 	for_each_iommu(iommu)
-		iommu_disable(iommu);
+		iommu_disable(iommu, true);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 15:24 [PATCH] amd iommu: force flush of iommu prior during shutdown Neil Horman @ 2010-03-31 15:54 ` Vivek Goyal 2010-03-31 18:28 ` Neil Horman 2010-03-31 18:43 ` [PATCH] amd iommu: force flush of iommu prior during shutdown Eric W. Biederman 2010-03-31 21:25 ` Chris Wright 1 sibling, 2 replies; 41+ messages in thread From: Vivek Goyal @ 2010-03-31 15:54 UTC (permalink / raw) To: Neil Horman Cc: iommu, joerg.roedel, hbabu, kexec, linux-kernel, Eric W. Biederman On Wed, Mar 31, 2010 at 11:24:17AM -0400, Neil Horman wrote: > Flush iommu during shutdown > > When using an iommu, its possible, if a kdump kernel boot follows a primary > kernel crash, that dma operations might still be in flight from the previous > kernel during the kdump kernel boot. This can lead to memory corruption, > crashes, and other erroneous behavior, specifically I've seen it manifest during > a kdump boot as endless iommu error log entries of the form: > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > address=0x000000000245a0c0 flags=0x0070] > > Followed by an inability to access hard drives, and various other resources. > > I've written this fix for it. In short it just forces a flush of the in flight > dma operations on shutdown, so that the new kernel is certain not to have any > in-flight dmas trying to complete after we've reset all the iommu page tables, > causing the above errors. I've tested it and it fixes the problem for me quite > well. CCing Eric also. Neil, this is interesting. In the past we noticed similar issues, especially on PPC. But I was told that we could not clear the iommu mapping entries as we had no control on in flight DMA and if a DMA comes later after clearing an entry and entry is not present, it is an error. Hence one of the suggestions was not to clear iommu mapping entries but reserve some for kdump operation and use those in kdump kernel. 
So this call amd_iommu_flush_all_devices() will be able to tell devices that don't do any more DMAs and hence it is safe to reprogram iommu mapping entries. Thanks Vivek > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > > > amd_iommu_init.c | 25 ++++++++++++++++++++++--- > 1 file changed, 22 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kernel/amd_iommu_init.c b/arch/x86/kernel/amd_iommu_init.c > index 9dc91b4..8fbdf58 100644 > --- a/arch/x86/kernel/amd_iommu_init.c > +++ b/arch/x86/kernel/amd_iommu_init.c > @@ -265,8 +265,26 @@ static void iommu_enable(struct amd_iommu *iommu) > iommu_feature_enable(iommu, CONTROL_IOMMU_EN); > } > > -static void iommu_disable(struct amd_iommu *iommu) > +static void iommu_disable(struct amd_iommu *iommu, bool flush) > { > + > + /* > + * This ensures that all in-flight dmas for this iommu > + * are complete prior to shutting it down > + * its a bit racy, but I think its ok, given that if we're flushing > + * we're in a shutdown path (either a graceful shutdown or a > + * crash leading to a kdump boot. That means we're down to one > + * cpu, and the other system hardware isn't going to issue > + * subsequent dma operations. 
> + * Also note that we gate the flusing on the flush boolean because > + * the enable_iommus path uses this function and we can't flush any > + * data in that path until later when the iommus are fully initialized > + */ > + if (flush) { > + amd_iommu_flush_all_devices(); > + amd_iommu_flush_all_domains(); > + } > + > /* Disable command buffer */ > iommu_feature_disable(iommu, CONTROL_CMDBUF_EN); > > @@ -276,6 +294,7 @@ static void iommu_disable(struct amd_iommu *iommu) > > /* Disable IOMMU hardware itself */ > iommu_feature_disable(iommu, CONTROL_IOMMU_EN); > + > } > > /* > @@ -1114,7 +1133,7 @@ static void enable_iommus(void) > struct amd_iommu *iommu; > > for_each_iommu(iommu) { > - iommu_disable(iommu); > + iommu_disable(iommu, false); > iommu_set_device_table(iommu); > iommu_enable_command_buffer(iommu); > iommu_enable_event_buffer(iommu); > @@ -1129,7 +1148,7 @@ static void disable_iommus(void) > struct amd_iommu *iommu; > > for_each_iommu(iommu) > - iommu_disable(iommu); > + iommu_disable(iommu, true); > } > > /* ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 15:54 ` Vivek Goyal @ 2010-03-31 18:28 ` Neil Horman 2010-03-31 18:57 ` Eric W. Biederman 2010-03-31 18:43 ` [PATCH] amd iommu: force flush of iommu prior during shutdown Eric W. Biederman 1 sibling, 1 reply; 41+ messages in thread From: Neil Horman @ 2010-03-31 18:28 UTC (permalink / raw) To: Vivek Goyal Cc: iommu, joerg.roedel, hbabu, kexec, linux-kernel, Eric W. Biederman On Wed, Mar 31, 2010 at 11:54:30AM -0400, Vivek Goyal wrote: > On Wed, Mar 31, 2010 at 11:24:17AM -0400, Neil Horman wrote: > > Flush iommu during shutdown > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > kernel crash, that dma operations might still be in flight from the previous > > kernel during the kdump kernel boot. This can lead to memory corruption, > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > a kdump boot as endless iommu error log entries of the form: > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > address=0x000000000245a0c0 flags=0x0070] > > > > Followed by an inability to access hard drives, and various other resources. > > > > I've written this fix for it. In short it just forces a flush of the in flight > > dma operations on shutdown, so that the new kernel is certain not to have any > > in-flight dmas trying to complete after we've reset all the iommu page tables, > > causing the above errors. I've tested it and it fixes the problem for me quite > > well. > > CCing Eric also. > > Neil, this is interesting. In the past we noticed similar issues, > especially on PPC. But I was told that we could not clear the iommu > mapping entries as we had no control on in flight DMA and if a DMA comes > later after clearing an entry and entry is not present, it is an error. 
>
Yes, the problem (as I understand it) is that the triggering of DMA
operations to/from a device doesn't have synchronization with the iommu
itself. I.e. to conduct a dma you have to:

1) map the in-memory buffer to a dma address using something like
   pci_map_single. This results (in systems with an iommu) in getting page
   table space allocated in the iommu for the translation.

2) trigger the dma to/from the device by tickling whatever hardware the
   device has mapped.

3) complete the dma by calling pci_unmap_single (or other function), which
   frees the page table space in the iommu.

The problem, exactly as you indicate, is that on a kdump panic, we might
boot the new kernel and re-enable the iommu with these dmas still in
flight. If we start messing about with the iommu page tables then, we
start getting all sorts of errors, and other various failures.

> Hence one of the suggestions was not to clear iommu mapping entries but
> reserve some for kdump operation and use those in kdump kernel.
>
Yeah, that's a solution, but it seems awfully complex to me. To do that, we
need to teach every iommu we support about kdump, by telling it how much
space to reserve, and when to use it and when not to (i.e. we'd have to
tell it to use the kdump space vs. the normal space dependent on the status
of the reset_devices flag, or something equally unpleasant). Actually,
thinking about it, I'm not sure that will even work, as IIRC the iommu only
has one page table base pointer. So we would either need to re-write that
pointer to point into the kdump kernel's memory space (invalidating the old
table entries, which perpetuates this bug), or we would need to further
enhance the iommu code to be able to access the old page tables via
read_from_oldmem/write_to_oldmem when booting a kdump kernel, wouldn't we?

Using this method, all we really do is try to ensure that, prior to
disabling the iommu, any pending dmas are complete. That way, when we
re-enable the iommu in the kdump kernel, we can safely manipulate the new
page tables, knowing that no pending dma is using them.

In fairness to this debate, my proposal does have a small race condition.
In the above sequence, because the cpu triggers a dma independently of the
setup of the mapping in the iommu, it is possible that a dma might be
triggered immediately after we flush the iotlb, which may leave an
in-flight dma pending while we boot the kdump kernel. In practice though,
this will never happen. By the time we arrive at this code, we've already
executed native_machine_crash_shutdown which:

1) halts all the other cpus in the system
2) disables local interrupts

Because of those two events, we're effectively on a path that we can't be
preempted from. So as long as we don't trigger any dma operations between
our return from iommu_shutdown and machine_kexec (which is the next call),
we're safe.

> So this call amd_iommu_flush_all_devices() will be able to tell devices
> that don't do any more DMAs and hence it is safe to reprogram iommu
> mapping entries.
>
It blocks the cpu until any pending DMA operations are complete. Hmm, as I
think about it, there is still a small possibility that a device like a NIC
which has several buffers pre-dma-mapped could start a new dma before we
completely disabled the iommu, although that's small. I never saw that in
my testing, but hitting that would be fairly difficult I think, since it's
literally just a few hundred cycles between the flush and the actual
hardware disable operation.

According to this though:
http://support.amd.com/us/Processor_TechDocs/34434-IOMMU-Rev_1.26_2-11-09.pdf
that window could be closed fairly easily, by simply disabling read and
write permissions for each device table entry prior to calling flush. If we
do that and then flush the device table, any subsequently started dma
operation would just get noted in the error log, which we could ignore,
since we're about to boot to the kdump kernel anyway.

Would you like me to respin w/ that modification?
Neil

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 18:28 ` Neil Horman @ 2010-03-31 18:57 ` Eric W. Biederman 2010-03-31 19:18 ` Neil Horman 0 siblings, 1 reply; 41+ messages in thread From: Eric W. Biederman @ 2010-03-31 18:57 UTC (permalink / raw) To: Neil Horman; +Cc: Vivek Goyal, iommu, joerg.roedel, hbabu, kexec, linux-kernel Neil Horman <nhorman@tuxdriver.com> writes: > On Wed, Mar 31, 2010 at 11:54:30AM -0400, Vivek Goyal wrote: >> So this call amd_iommu_flush_all_devices() will be able to tell devices >> that don't do any more DMAs and hence it is safe to reprogram iommu >> mapping entries. >> > It blocks the cpu until any pending DMA operations are complete. Hmm, as I > think about it, there is still a small possibility that a device like a NIC > which has several buffers pre-dma-mapped could start a new dma before we > completely disabled the iommu, althought thats small. I never saw that in my > testing, but hitting that would be fairly difficult I think, since its literally > just a few hundred cycles between the flush and the actual hardware disable > operation. > > According to this though: > http://support.amd.com/us/Processor_TechDocs/34434-IOMMU-Rev_1.26_2-11-09.pdf > That window could be closed fairly easily, but simply disabling read and write > permissions for each device table entry prior to calling flush. If we do that, > then flush the device table, any subsequently started dma operation would just > get noted in the error log, which we could ignore, since we're abot to boot to > the kdump kernel anyway. > > Would you like me to respin w/ that modification? Disabling permissions on all devices sounds good for the new virtualization capable iommus. I think older iommus will still be challenged. I think on x86 we have simply been able to avoid using those older iommus. I like the direction you are going but please let's put this in a paranoid iommu enable routine. 
Eric ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 18:57 ` Eric W. Biederman @ 2010-03-31 19:18 ` Neil Horman 2010-03-31 19:51 ` Eric W. Biederman 0 siblings, 1 reply; 41+ messages in thread From: Neil Horman @ 2010-03-31 19:18 UTC (permalink / raw) To: Eric W. Biederman Cc: Vivek Goyal, iommu, joerg.roedel, hbabu, kexec, linux-kernel On Wed, Mar 31, 2010 at 11:57:46AM -0700, Eric W. Biederman wrote: > Neil Horman <nhorman@tuxdriver.com> writes: > > > On Wed, Mar 31, 2010 at 11:54:30AM -0400, Vivek Goyal wrote: > > >> So this call amd_iommu_flush_all_devices() will be able to tell devices > >> that don't do any more DMAs and hence it is safe to reprogram iommu > >> mapping entries. > >> > > It blocks the cpu until any pending DMA operations are complete. Hmm, as I > > think about it, there is still a small possibility that a device like a NIC > > which has several buffers pre-dma-mapped could start a new dma before we > > completely disabled the iommu, althought thats small. I never saw that in my > > testing, but hitting that would be fairly difficult I think, since its literally > > just a few hundred cycles between the flush and the actual hardware disable > > operation. > > > > According to this though: > > http://support.amd.com/us/Processor_TechDocs/34434-IOMMU-Rev_1.26_2-11-09.pdf > > That window could be closed fairly easily, but simply disabling read and write > > permissions for each device table entry prior to calling flush. If we do that, > > then flush the device table, any subsequently started dma operation would just > > get noted in the error log, which we could ignore, since we're abot to boot to > > the kdump kernel anyway. > > > > Would you like me to respin w/ that modification? > > Disabling permissions on all devices sounds good for the new virtualization > capable iommus. I think older iommus will still be challenged. I think > on x86 we have simply been able to avoid using those older iommus. 
> > I like the direction you are going but please let's put this in a > paranoid iommu enable routine. > You mean like initialize the device table so that all devices are default disabled on boot, and then selectively enable them (perhaps during a device_attach)? I can give that a spin. Neil > Eric > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 19:18 ` Neil Horman @ 2010-03-31 19:51 ` Eric W. Biederman 2010-03-31 20:27 ` Neil Horman 0 siblings, 1 reply; 41+ messages in thread From: Eric W. Biederman @ 2010-03-31 19:51 UTC (permalink / raw) To: Neil Horman; +Cc: Vivek Goyal, iommu, joerg.roedel, hbabu, kexec, linux-kernel Neil Horman <nhorman@tuxdriver.com> writes: > On Wed, Mar 31, 2010 at 11:57:46AM -0700, Eric W. Biederman wrote: >> Neil Horman <nhorman@tuxdriver.com> writes: >> >> > On Wed, Mar 31, 2010 at 11:54:30AM -0400, Vivek Goyal wrote: >> >> >> So this call amd_iommu_flush_all_devices() will be able to tell devices >> >> that don't do any more DMAs and hence it is safe to reprogram iommu >> >> mapping entries. >> >> >> > It blocks the cpu until any pending DMA operations are complete. Hmm, as I >> > think about it, there is still a small possibility that a device like a NIC >> > which has several buffers pre-dma-mapped could start a new dma before we >> > completely disabled the iommu, althought thats small. I never saw that in my >> > testing, but hitting that would be fairly difficult I think, since its literally >> > just a few hundred cycles between the flush and the actual hardware disable >> > operation. >> > >> > According to this though: >> > http://support.amd.com/us/Processor_TechDocs/34434-IOMMU-Rev_1.26_2-11-09.pdf >> > That window could be closed fairly easily, but simply disabling read and write >> > permissions for each device table entry prior to calling flush. If we do that, >> > then flush the device table, any subsequently started dma operation would just >> > get noted in the error log, which we could ignore, since we're abot to boot to >> > the kdump kernel anyway. >> > >> > Would you like me to respin w/ that modification? >> >> Disabling permissions on all devices sounds good for the new virtualization >> capable iommus. I think older iommus will still be challenged. 
I think >> on x86 we have simply been able to avoid using those older iommus. >> >> I like the direction you are going but please let's put this in a >> paranoid iommu enable routine. >> > You mean like initialize the device table so that all devices are default > disabled on boot, and then selectively enable them (perhaps during a > device_attach)? I can give that a spin. That sounds good. Eric ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 19:51 ` Eric W. Biederman @ 2010-03-31 20:27 ` Neil Horman 2010-04-01 4:04 ` Eric W. Biederman 2010-04-01 14:29 ` Joerg Roedel 0 siblings, 2 replies; 41+ messages in thread From: Neil Horman @ 2010-03-31 20:27 UTC (permalink / raw) To: Eric W. Biederman Cc: Vivek Goyal, iommu, joerg.roedel, hbabu, kexec, linux-kernel On Wed, Mar 31, 2010 at 12:51:25PM -0700, Eric W. Biederman wrote: > Neil Horman <nhorman@tuxdriver.com> writes: > > > On Wed, Mar 31, 2010 at 11:57:46AM -0700, Eric W. Biederman wrote: > >> Neil Horman <nhorman@tuxdriver.com> writes: > >> > >> > On Wed, Mar 31, 2010 at 11:54:30AM -0400, Vivek Goyal wrote: > >> > >> >> So this call amd_iommu_flush_all_devices() will be able to tell devices > >> >> that don't do any more DMAs and hence it is safe to reprogram iommu > >> >> mapping entries. > >> >> > >> > It blocks the cpu until any pending DMA operations are complete. Hmm, as I > >> > think about it, there is still a small possibility that a device like a NIC > >> > which has several buffers pre-dma-mapped could start a new dma before we > >> > completely disabled the iommu, althought thats small. I never saw that in my > >> > testing, but hitting that would be fairly difficult I think, since its literally > >> > just a few hundred cycles between the flush and the actual hardware disable > >> > operation. > >> > > >> > According to this though: > >> > http://support.amd.com/us/Processor_TechDocs/34434-IOMMU-Rev_1.26_2-11-09.pdf > >> > That window could be closed fairly easily, but simply disabling read and write > >> > permissions for each device table entry prior to calling flush. If we do that, > >> > then flush the device table, any subsequently started dma operation would just > >> > get noted in the error log, which we could ignore, since we're abot to boot to > >> > the kdump kernel anyway. > >> > > >> > Would you like me to respin w/ that modification? 
> >>
> >> Disabling permissions on all devices sounds good for the new virtualization
> >> capable iommus. I think older iommus will still be challenged. I think
> >> on x86 we have simply been able to avoid using those older iommus.
> >>
> >> I like the direction you are going but please let's put this in a
> >> paranoid iommu enable routine.
> >>
> > You mean like initialize the device table so that all devices are default
> > disabled on boot, and then selectively enable them (perhaps during a
> > device_attach)? I can give that a spin.
>
> That sounds good.
>
So I'm officially rescinding this patch. It apparently just covered up the
problem, rather than solved it outright. This is going to take some more
thought on my part. I read the code a bit closer, and the amd iommu on boot
up currently marks all its entries as valid and having a valid translation
(because if they're marked as invalid they're passed through untranslated,
which strikes me as dangerous, since it means a dma address treated as a
bus address could lead to memory corruption). The saving grace is that they
are marked as non-readable and non-writeable, so any device doing a dma
after the reinit should get logged (which it does), and then target aborted
(which should effectively squash the translation).

I'm starting to wonder if:

1) some dmas are so long lived they start aliasing new dmas that get mapped
   in the kdump kernel, leading to various erroneous behavior

or

2) a slew of target aborts to some hardware results in them being in an
   inconsistent state

I'm going to try marking the dev table on shutdown such that all devices
have no read/write permissions to see if that changes the situation. It
should, I think, give me a pointer as to whether (1) or (2) is the more
likely problem.

Lots more thinking to do....
Neil > Eric > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 20:27 ` Neil Horman @ 2010-04-01 4:04 ` Eric W. Biederman 2010-04-01 12:49 ` Neil Horman 2010-04-01 14:29 ` Joerg Roedel 1 sibling, 1 reply; 41+ messages in thread From: Eric W. Biederman @ 2010-04-01 4:04 UTC (permalink / raw) To: Neil Horman; +Cc: Vivek Goyal, iommu, joerg.roedel, hbabu, kexec, linux-kernel Neil Horman <nhorman@tuxdriver.com> writes: > On Wed, Mar 31, 2010 at 12:51:25PM -0700, Eric W. Biederman wrote: >> Neil Horman <nhorman@tuxdriver.com> writes: >> >> > On Wed, Mar 31, 2010 at 11:57:46AM -0700, Eric W. Biederman wrote: >> >> Neil Horman <nhorman@tuxdriver.com> writes: >> >> >> >> > On Wed, Mar 31, 2010 at 11:54:30AM -0400, Vivek Goyal wrote: >> >> >> >> >> So this call amd_iommu_flush_all_devices() will be able to tell devices >> >> >> that don't do any more DMAs and hence it is safe to reprogram iommu >> >> >> mapping entries. >> >> >> >> >> > It blocks the cpu until any pending DMA operations are complete. Hmm, as I >> >> > think about it, there is still a small possibility that a device like a NIC >> >> > which has several buffers pre-dma-mapped could start a new dma before we >> >> > completely disabled the iommu, althought thats small. I never saw that in my >> >> > testing, but hitting that would be fairly difficult I think, since its literally >> >> > just a few hundred cycles between the flush and the actual hardware disable >> >> > operation. >> >> > >> >> > According to this though: >> >> > http://support.amd.com/us/Processor_TechDocs/34434-IOMMU-Rev_1.26_2-11-09.pdf >> >> > That window could be closed fairly easily, but simply disabling read and write >> >> > permissions for each device table entry prior to calling flush. If we do that, >> >> > then flush the device table, any subsequently started dma operation would just >> >> > get noted in the error log, which we could ignore, since we're abot to boot to >> >> > the kdump kernel anyway. 
>> >> > >> >> > Would you like me to respin w/ that modification? >> >> >> >> Disabling permissions on all devices sounds good for the new virtualization >> >> capable iommus. I think older iommus will still be challenged. I think >> >> on x86 we have simply been able to avoid using those older iommus. >> >> >> >> I like the direction you are going but please let's put this in a >> >> paranoid iommu enable routine. >> >> >> > You mean like initialize the device table so that all devices are default >> > disabled on boot, and then selectively enable them (perhaps during a >> > device_attach)? I can give that a spin. >> >> That sounds good. >> > > So I'm officially rescinding this patch. It apparently just covered up the > problem, rather than solved it outright. This is going to take some more > thought on my part. I read the code a bit closer, and the amd iommu on boot up > currently marks all its entries as valid and having a valid translation (because > if they're marked as invalid they're passed through untranslated which strikes > me as dangerous, since it means a dma address treated as a bus address could > lead to memory corruption. The saving grace is that they are marked as > non-readable and non-writeable, so any device doing a dma after the reinit > should get logged (which it does), and then target aborted (which should > effectively squash the translation) > > I'm starting to wonder if: > > 1) some dmas are so long lived they start aliasing new dmas that get mapped in > the kdump kernel leading to various erroneous behavior I do know things like arp refreshes used to cause me trouble. I have a particular memory of kexec into memtest86 and a little while later memory corruption. > 2) a slew of target aborts to some hardware results in them being in an > inconsistent state > > I'm going to try marking the dev table on shutdown such that all devices have no > read/write permissions to see if that changes the situation. 
> It should I think
> give me a pointer as to weather (1) or (2) is the more likely problem.
>
> Lots more thinking to do....

I guess I can see devices getting confused by target aborts.
I'm wondering if (a) we can suppress these DMAs, or (b) we can reset the pci
devices before we use them. With pcie that should be possible.

We used to be able to simply not use the IOMMU on x86 and avoid this trouble.
Now with per-device enables it looks like we need to do something with it.

Eric

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 4:04 ` Eric W. Biederman @ 2010-04-01 12:49 ` Neil Horman 0 siblings, 0 replies; 41+ messages in thread From: Neil Horman @ 2010-04-01 12:49 UTC (permalink / raw) To: Eric W. Biederman Cc: Vivek Goyal, iommu, joerg.roedel, hbabu, kexec, linux-kernel On Wed, Mar 31, 2010 at 09:04:27PM -0700, Eric W. Biederman wrote: > Neil Horman <nhorman@tuxdriver.com> writes: > > > On Wed, Mar 31, 2010 at 12:51:25PM -0700, Eric W. Biederman wrote: > >> Neil Horman <nhorman@tuxdriver.com> writes: > >> > >> > On Wed, Mar 31, 2010 at 11:57:46AM -0700, Eric W. Biederman wrote: > >> >> Neil Horman <nhorman@tuxdriver.com> writes: > >> >> > >> >> > On Wed, Mar 31, 2010 at 11:54:30AM -0400, Vivek Goyal wrote: > >> >> > >> >> >> So this call amd_iommu_flush_all_devices() will be able to tell devices > >> >> >> that don't do any more DMAs and hence it is safe to reprogram iommu > >> >> >> mapping entries. > >> >> >> > >> >> > It blocks the cpu until any pending DMA operations are complete. Hmm, as I > >> >> > think about it, there is still a small possibility that a device like a NIC > >> >> > which has several buffers pre-dma-mapped could start a new dma before we > >> >> > completely disabled the iommu, althought thats small. I never saw that in my > >> >> > testing, but hitting that would be fairly difficult I think, since its literally > >> >> > just a few hundred cycles between the flush and the actual hardware disable > >> >> > operation. > >> >> > > >> >> > According to this though: > >> >> > http://support.amd.com/us/Processor_TechDocs/34434-IOMMU-Rev_1.26_2-11-09.pdf > >> >> > That window could be closed fairly easily, but simply disabling read and write > >> >> > permissions for each device table entry prior to calling flush. 
If we do that, > >> >> > then flush the device table, any subsequently started dma operation would just > >> >> > get noted in the error log, which we could ignore, since we're abot to boot to > >> >> > the kdump kernel anyway. > >> >> > > >> >> > Would you like me to respin w/ that modification? > >> >> > >> >> Disabling permissions on all devices sounds good for the new virtualization > >> >> capable iommus. I think older iommus will still be challenged. I think > >> >> on x86 we have simply been able to avoid using those older iommus. > >> >> > >> >> I like the direction you are going but please let's put this in a > >> >> paranoid iommu enable routine. > >> >> > >> > You mean like initialize the device table so that all devices are default > >> > disabled on boot, and then selectively enable them (perhaps during a > >> > device_attach)? I can give that a spin. > >> > >> That sounds good. > >> > > > > So I'm officially rescinding this patch. It apparently just covered up the > > problem, rather than solved it outright. This is going to take some more > > thought on my part. I read the code a bit closer, and the amd iommu on boot up > > currently marks all its entries as valid and having a valid translation (because > > if they're marked as invalid they're passed through untranslated which strikes > > me as dangerous, since it means a dma address treated as a bus address could > > lead to memory corruption. The saving grace is that they are marked as > > non-readable and non-writeable, so any device doing a dma after the reinit > > should get logged (which it does), and then target aborted (which should > > effectively squash the translation) > > > > I'm starting to wonder if: > > > > 1) some dmas are so long lived they start aliasing new dmas that get mapped in > > the kdump kernel leading to various erroneous behavior > > I do know things like arp refreshes used to cause me trouble. 
I have > a particular memory of kexec into memtest86 and a little while later > memory corruption. > > > > 2) a slew of target aborts to some hardware results in them being in an > > inconsistent state > > > > I'm going to try marking the dev table on shutdown such that all devices have no > > read/write permissions to see if that changes the situation. It should I think > > give me a pointer as to whether (1) or (2) is the more likely problem. > > > > Lots more thinking to do.... > > I guess I can see devices getting confused by target aborts. > I'm wondering if (a) we can suppress these DMAs. or (b) we can reset the pci > devices before we use them. With pcie that should be possible. > > We used to be able simply not to use the IOMMU in x86 and avoid this trouble. > Now with per device enables it looks like we need to do something with it. > Agreed. Another strategy I think worth considering is:

1) Leave the iommu on throughout the kdump reboot process, so as to allow dma's to complete without any interference
2) Make sure the log configuration is enabled prior to kdump reboot
3) Flush the page table cache in the iommu prior to shutdown
4) On re-init in the kdump kernel, query the log. If it's non-empty, recognize that we're in a kdump boot, and instead of creating a new devtable/page table/domain table, just clone the old ones from the previous kernel's memory
5) Use the log to detect which entries in the iommu have been touched, and assume those touched entries are done, marking them as invalid, until a minimum of free entries in the table are available
6) Continue use with those available free entries

With this approach, we could let the 'old' dmas complete without interference, and just allocate new dma's from the unused entries in the new kernel. Just a thought. Neil > Eric > > ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 20:27 ` Neil Horman 2010-04-01 4:04 ` Eric W. Biederman @ 2010-04-01 14:29 ` Joerg Roedel 2010-04-01 14:47 ` Neil Horman 1 sibling, 1 reply; 41+ messages in thread From: Joerg Roedel @ 2010-04-01 14:29 UTC (permalink / raw) To: Neil Horman Cc: Eric W. Biederman, kexec, linux-kernel, hbabu, iommu, Vivek Goyal Hi Neil, first some general words about the problem you discovered: The problem is not caused by in-flight DMA. The problem is that the IOMMU hardware has cached the old DTE entry for the device (including the old page-table root pointer) and is still using it when the kdump kernel has booted. We had this problem once and fixed it by flushing a DTE in the IOMMU before it is used for the first time. This seems to be broken now. Which kernel have you seen this on? I am back in the office next Tuesday and will look into this problem too. On Wed, Mar 31, 2010 at 04:27:45PM -0400, Neil Horman wrote: > So I'm officially rescinding this patch. Yeah, the right solution to this problem is to find out why every DTE is no longer flushed before first use. > It apparently just covered up the problem, rather than solved it > outright. This is going to take some more thought on my part. I read > the code a bit closer, and the amd iommu on boot up currently marks > all its entries as valid and having a valid translation (because if > they're marked as invalid they're passed through untranslated which > strikes me as dangerous, since it means a dma address treated as a bus > address could lead to memory corruption. The saving grace is that > they are marked as non-readable and non-writeable, so any device doing > a dma after the reinit should get logged (which it does), and then > target aborted (which should effectively squash the translation) Right. The default for all devices is to forbid DMA. 
> I'm starting to wonder if: > > 1) some dmas are so long lived they start aliasing new dmas that get mapped in > the kdump kernel leading to various erroneous behavior At least not in this case. Even when this is true the DMA would target memory of the crashed kernel and not the kdump area. This is not even memory corruption because the device will write to memory the driver has allocated for it. > 2) a slew of target aborts to some hardware results in them being in an > inconsistent state That's indeed true. I have seen that with ixgbe cards for example. They seem to be really confused after a target abort. > I'm going to try marking the dev table on shutdown such that all devices have no > read/write permissions to see if that changes the situation. It should I think > give me a pointer as to whether (1) or (2) is the more likely problem. Probably not. You still need to flush the old entries out of the IOMMU. Thanks, Joerg ^ permalink raw reply [flat|nested] 41+ messages in thread
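Joerg's diagnosis above, a stale cached DTE rather than in-flight DMA, can be illustrated with a toy userspace model (hypothetical names, not driver code): the hardware keeps consulting its cached copy of a device table entry until an explicit invalidate evicts it, so a kdump kernel that rewrites the table in memory without flushing still gets the old page-table root.

```c
#include <assert.h>

#define NUM_DEVS 4

/* Toy model: "dte_mem" is the device table the kernel writes in RAM,
 * "dte_cache" is the copy the IOMMU hardware actually consults. */
static int dte_mem[NUM_DEVS];   /* page-table root per device, in RAM */
static int dte_cache[NUM_DEVS]; /* IOMMU's cached copy                */
static int cache_valid[NUM_DEVS];

/* Hardware walk: uses the cached DTE if present, else fetches from RAM. */
static int iommu_lookup(int dev)
{
    if (!cache_valid[dev]) {
        dte_cache[dev] = dte_mem[dev];
        cache_valid[dev] = 1;
    }
    return dte_cache[dev];
}

/* Models an INVALIDATE_DEVTAB_ENTRY command: drop the cached copy so
 * the next walk refetches from the (new) table in memory. */
static void flush_dte(int dev)
{
    cache_valid[dev] = 0;
}
```

Without the flush, the second lookup still returns the old kernel's root pointer, which is exactly the stale-translation symptom described in the thread.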
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 14:29 ` Joerg Roedel @ 2010-04-01 14:47 ` Neil Horman 2010-04-01 15:56 ` Joerg Roedel 0 siblings, 1 reply; 41+ messages in thread From: Neil Horman @ 2010-04-01 14:47 UTC (permalink / raw) To: Joerg Roedel Cc: Eric W. Biederman, kexec, linux-kernel, hbabu, iommu, Vivek Goyal On Thu, Apr 01, 2010 at 04:29:02PM +0200, Joerg Roedel wrote: > Hi Neil, > > first some general words about the problem you discovered: The problem > is not caused by in-flight DMA. The problem is that the IOMMU hardware > has cached the old DTE entry for the device (including the old > page-table root pointer) and is using it still when the kdump kernel has > booted. We had this problem once and fixed it by flushing a DTE in the > IOMMU before it is used for the first time. This seems to be broken > now. Which kernel have you seen this on? > First noted on 2.6.32 (the RHEL6 beta kernel), but I've reproduced with the latest linus tree as well. > I am back in office next tuesday and will look into this problem too. > Thank you. > On Wed, Mar 31, 2010 at 04:27:45PM -0400, Neil Horman wrote: > > So I'm officially rescinding this patch. > > Yeah, the right solution to this problem is to find out why every DTE is > not longer flushed before first use. > Right, I've checked the commits that chris noted in his previous email and they're in place, so I'm not sure how we're getting stale dte's > > It apparently just covered up the problem, rather than solved it > > outright. This is going to take some more thought on my part. I read > > the code a bit closer, and the amd iommu on boot up currently marks > > all its entries as valid and having a valid translation (because if > > they're marked as invalid they're passed through untranslated which > > strikes me as dangerous, since it means a dma address treated as a bus > > address could lead to memory corruption. 
The saving grace is that > > they are marked as non-readable and non-writeable, so any device doing > > a dma after the reinit should get logged (which it does), and then > > target aborted (which should effectively squash the translation) > > Right. The default for all devices is to forbid DMA. > Thanks, glad to know I read that right, took me a bit to understand it :) > > I'm starting to wonder if: > > > > 1) some dmas are so long lived they start aliasing new dmas that get mapped in > > the kdump kernel leading to various erroneous behavior > > At least not in this case. Even when this is true the DMA would target > memory of the crashed kernel and not the kdump area. This is not even > memory corruption because the device will write to memory the driver has > allocated for it. > Yeah, I figured that old dma's going to old locations were ok. I was more concerned that an 'old' dma might live through our resetting of the iommu page table, leading to us pointing an old dma address to a new physical address within the new kernel memory space. Although, given the reset state of the tables, for that to happen a dma would have to not attempt a memory transaction until sometime later in the boot, which seems...unlikely to say the least, so I agree this is almost certainly not happening. > > 2) a slew of target aborts to some hardware results in them being in an > > inconsistent state > > Thats indeed true. I have seen that with ixgbe cards for example. They > seem to be really confused after an target abort. > Yeah, this part worries me, target aborts lead to various brain dead hardware pieces. What are your thoughts on leaving the iommu on through a reboot to avoid this issue (possibly resetting any pci device that encounters a target abort, as noted in the error log on the iommu)?
It should I think > > give me a pointer as to whether (1) or (2) is the more likely problem. > > Probably not. You still need to flush the old entries out of the IOMMU. > Yeah, after reading your explanation above, I agree Neil > ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 14:47 ` Neil Horman @ 2010-04-01 15:56 ` Joerg Roedel 2010-04-01 17:11 ` Neil Horman 0 siblings, 1 reply; 41+ messages in thread From: Joerg Roedel @ 2010-04-01 15:56 UTC (permalink / raw) To: Neil Horman Cc: Eric W. Biederman, kexec, linux-kernel, hbabu, iommu, Vivek Goyal On Thu, Apr 01, 2010 at 10:47:36AM -0400, Neil Horman wrote: > On Thu, Apr 01, 2010 at 04:29:02PM +0200, Joerg Roedel wrote: > > I am back in office next tuesday and will look into this problem too. > > > Thank you. Just took a look and I think the problem is that the devices are attached to domains before the IOMMU hardware is enabled. This happens in the function prealloc_protection_domains(). The attach code issues the dte-invalidate commands but they are not executed because the hardware is off. I will verify this when I have access to hardware again. The possible fix will be to enable the hardware earlier in the initialization path. > > Right. The default for all devices is to forbid DMA. > > > Thanks, glad to know I read that right, took me a bit to understand it :) I should probably add a comment :-) > > Thats indeed true. I have seen that with ixgbe cards for example. They > > seem to be really confused after an target abort. > > > Yeah, this part worries me, target aborts lead to various brain dead hardware > pieces. What are you thoughts on leaving the iommu on through a reboot to avoid > this issue (possibly resetting any pci device that encounters a target abort, as > noted in the error log on the iommu? This would only prevent possible data corruption. When the IOMMU is off the devices will not get a target abort but will only write to different physical memory locations. The window where a target abort can happen starts when the kdump kernel re-enables the IOMMU and ends when the new driver for that device attaches. This is a small window but there is not a lot we can do to avoid this small time window. 
Joerg ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 15:56 ` Joerg Roedel @ 2010-04-01 17:11 ` Neil Horman 2010-04-01 20:14 ` Joerg Roedel 0 siblings, 1 reply; 41+ messages in thread From: Neil Horman @ 2010-04-01 17:11 UTC (permalink / raw) To: Joerg Roedel Cc: Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal On Thu, Apr 01, 2010 at 05:56:43PM +0200, Joerg Roedel wrote: > On Thu, Apr 01, 2010 at 10:47:36AM -0400, Neil Horman wrote: > > On Thu, Apr 01, 2010 at 04:29:02PM +0200, Joerg Roedel wrote: > > > I am back in office next tuesday and will look into this problem too. > > > > > Thank you. > > Just took a look and I think the problem is that the devices are > attached to domains before the IOMMU hardware is enabled. This happens > in the function prealloc_protection_domains(). The attach code issues > the dte-invalidate commands but they are not executed because the > hardware is off. I will verify this when I have access to hardware > again. > The possible fix will be to enable the hardware earlier in the > initialization path. > That sounds like a reasonable theory, I'll try hack something together shortly. > > > Right. The default for all devices is to forbid DMA. > > > > > Thanks, glad to know I read that right, took me a bit to understand it :) > > I should probably add a comment :-) > > > > Thats indeed true. I have seen that with ixgbe cards for example. They > > > seem to be really confused after an target abort. > > > > > Yeah, this part worries me, target aborts lead to various brain dead hardware > > pieces. What are you thoughts on leaving the iommu on through a reboot to avoid > > this issue (possibly resetting any pci device that encounters a target abort, as > > noted in the error log on the iommu? > > This would only prevent possible data corruption. When the IOMMU is off > the devices will not get a target abort but will only write to different > physical memory locations. 
The window where a target abort can happen > starts when the kdump kernel re-enables the IOMMU and ends when the new > driver for that device attaches. This is a small window but there is not > a lot we can do to avoid this small time window. > Can you explain this a bit further please? From what I read, when the iommu is disabled, AIUI it does no translations. That means that any dma addresses which the driver mapped via the iommu prior to a crash that are stored in devices will just get strobed on the bus without any translation. If those dma addresses do not lie on top of any physical ram, won't that lead to bus errors, and transaction aborts? Worse, if those dma addresses do lie on top of real physical addresses, won't we get corruption in various places? Or am I missing part of how that works? Regards Neil > Joerg > > > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 17:11 ` Neil Horman @ 2010-04-01 20:14 ` Joerg Roedel 2010-04-02 0:00 ` Neil Horman 0 siblings, 1 reply; 41+ messages in thread From: Joerg Roedel @ 2010-04-01 20:14 UTC (permalink / raw) To: Neil Horman Cc: Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal On Thu, Apr 01, 2010 at 01:11:49PM -0400, Neil Horman wrote: > On Thu, Apr 01, 2010 at 05:56:43PM +0200, Joerg Roedel wrote: > > The possible fix will be to enable the hardware earlier in the > > initialization path. > > > That sounds like a reasonable theory, I'll try hack something together > shortly. Great. So the problem might be already fixed when I am back in the office ;-) > > This would only prevent possible data corruption. When the IOMMU is off > > the devices will not get a target abort but will only write to different > > physical memory locations. The window where a target abort can happen > > starts when the kdump kernel re-enables the IOMMU and ends when the new > > driver for that device attaches. This is a small window but there is not > > a lot we can do to avoid this small time window. > > > Can you explain this a bit further please? From what I read, when the iommu is > disabled, AIUI it does no translations. That means that any dma addresses which > the driver mapped via the iommu prior to a crash that are stored in devices will > just get strobed on the bus without any translation. If those dma address do > not lay on top of any physical ram, won't that lead to bus errors, and > transaction aborts? Worse, if those dma addresses do lie on top of real > physical addresses, won't we get corruption in various places? Or am I missing > part of how that works? Hm, the device address may not be a valid host physical address, thats true. But the problem with the small time-window when the IOMMU hardware is re-programmed from the kdump kernel still exists. 
I need to think about other possible side-effects of leaving the IOMMU enabled on shutdown^Wboot into a kdump kernel. Joerg ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 20:14 ` Joerg Roedel @ 2010-04-02 0:00 ` Neil Horman 2010-04-02 0:30 ` Chris Wright 0 siblings, 1 reply; 41+ messages in thread From: Neil Horman @ 2010-04-02 0:00 UTC (permalink / raw) To: Joerg Roedel Cc: Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal On Thu, Apr 01, 2010 at 10:14:34PM +0200, Joerg Roedel wrote: > On Thu, Apr 01, 2010 at 01:11:49PM -0400, Neil Horman wrote: > > On Thu, Apr 01, 2010 at 05:56:43PM +0200, Joerg Roedel wrote: > > > > The possible fix will be to enable the hardware earlier in the > > > initialization path. > > > > > That sounds like a reasonable theory, I'll try hack something together > > shortly. > > Great. So the problem might be already fixed when I am back in the > office ;-) > Don't hold your breath, but I'll try my best :) > > > This would only prevent possible data corruption. When the IOMMU is off > > > the devices will not get a target abort but will only write to different > > > physical memory locations. The window where a target abort can happen > > > starts when the kdump kernel re-enables the IOMMU and ends when the new > > > driver for that device attaches. This is a small window but there is not > > > a lot we can do to avoid this small time window. > > > > > Can you explain this a bit further please? From what I read, when the iommu is > > disabled, AIUI it does no translations. That means that any dma addresses which > > the driver mapped via the iommu prior to a crash that are stored in devices will > > just get strobed on the bus without any translation. If those dma address do > > not lay on top of any physical ram, won't that lead to bus errors, and > > transaction aborts? Worse, if those dma addresses do lie on top of real > > physical addresses, won't we get corruption in various places? Or am I missing > > part of how that works? 
> > Hm, the device address may not be a valid host physical address, thats > true. But the problem with the small time-window when the IOMMU hardware > is re-programmed from the kdump kernel still exists. > I need to think about other possible side-effects of leaving the IOMMU > enabled on shutdown^Wboot into a kdump kernel. > I think its an interesting angle to consider. Thats why I was talking about cloning the old tables in the new kdump kernel and using the error log to filter out entries that we could safely assume were complete until enough of the iommu page tables were free, so that we could continue to hobble along in the kdump kernel until we got to a proper reboot. All just thought experiment of course. I'll try tinkering with your idea above first. Neil > Joerg > > ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-02 0:00 ` Neil Horman @ 2010-04-02 0:30 ` Chris Wright 2010-04-02 1:23 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Chris Wright 0 siblings, 1 reply; 41+ messages in thread From: Chris Wright @ 2010-04-02 0:30 UTC (permalink / raw) To: Neil Horman Cc: Joerg Roedel, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal * Neil Horman (nhorman@tuxdriver.com) wrote: > On Thu, Apr 01, 2010 at 10:14:34PM +0200, Joerg Roedel wrote: > > On Thu, Apr 01, 2010 at 01:11:49PM -0400, Neil Horman wrote: > > > On Thu, Apr 01, 2010 at 05:56:43PM +0200, Joerg Roedel wrote: > > > > > > The possible fix will be to enable the hardware earlier in the > > > > initialization path. > > > > > > > That sounds like a reasonable theory, I'll try hack something together > > > shortly. > > > > Great. So the problem might be already fixed when I am back in the > > office ;-) > > > Don't hold your breath, but I'll try my best :) The problem we had before was a combo of not clearing PDE and not having the command buffer set up properly. Now we have basically the same problem. The attach is running too early, so the invalidate commands are falling on deaf ears. thanks, -chris ^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 0:30 ` Chris Wright @ 2010-04-02 1:23 ` Chris Wright 2010-04-02 1:31 ` [PATCH 2/2] x86/amd-iommu: warn when issuing command to uninitiailed cmd buffer Chris Wright ` (4 more replies) 0 siblings, 5 replies; 41+ messages in thread From: Chris Wright @ 2010-04-02 1:23 UTC (permalink / raw) To: Joerg Roedel Cc: Chris Wright, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal

Hit another kdump problem as reported by Neil Horman. When initializing the IOMMU, we attach devices to their domains before the IOMMU is fully (re)initialized. Attaching a device will issue some important invalidations. In the context of the newly kexec'd kdump kernel, the IOMMU may have stale cached data from the original kernel. Because we do the attach too early, the invalidation commands are placed in the new command buffer before the IOMMU is updated w/ that buffer. This leaves the stale entries in the kdump context and can render the device unusable. Simply enable the IOMMU before we do the attach.

Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 arch/x86/kernel/amd_iommu_init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/amd_iommu_init.c
+++ b/arch/x86/kernel/amd_iommu_init.c
@@ -1288,6 +1288,8 @@ static int __init amd_iommu_init(void)
 	if (ret)
 		goto free;
 
+	enable_iommus();
+
 	if (iommu_pass_through)
 		ret = amd_iommu_init_passthrough();
 	else
@@ -1300,8 +1302,6 @@ static int __init amd_iommu_init(void)
 
 	amd_iommu_init_notifier();
 
-	enable_iommus();
-
 	if (iommu_pass_through)
 		goto out;

^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 2/2] x86/amd-iommu: warn when issuing command to uninitiailed cmd buffer 2010-04-02 1:23 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Chris Wright @ 2010-04-02 1:31 ` Chris Wright 2010-04-02 1:35 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Neil Horman ` (3 subsequent siblings) 4 siblings, 0 replies; 41+ messages in thread From: Chris Wright @ 2010-04-02 1:31 UTC (permalink / raw) To: Joerg Roedel Cc: Chris Wright, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal

To catch future potential issues we can add a warning whenever we issue a command before the command buffer is fully initialized.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
Or not...this is just if you think it's useful ;-)

 arch/x86/include/asm/amd_iommu_types.h | 1 +
 arch/x86/kernel/amd_iommu.c | 1 +
 arch/x86/kernel/amd_iommu_init.c | 5 +++--
 3 files changed, 5 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/amd_iommu_types.h
+++ b/arch/x86/include/asm/amd_iommu_types.h
@@ -140,6 +140,7 @@
 /* constants to configure the command buffer */
 #define CMD_BUFFER_SIZE 8192
+#define CMD_BUFFER_UNINITIALIZED 1
 #define CMD_BUFFER_ENTRIES 512
 #define MMIO_CMD_SIZE_SHIFT 56
 #define MMIO_CMD_SIZE_512 (0x9ULL << MMIO_CMD_SIZE_SHIFT)
--- a/arch/x86/kernel/amd_iommu.c
+++ b/arch/x86/kernel/amd_iommu.c
@@ -392,6 +392,7 @@ static int __iommu_queue_command(struct
 	u32 tail, head;
 	u8 *target;
 
+	WARN_ON(iommu->cmd_buf_size & CMD_BUFFER_UNINITIALIZED);
 	tail = readl(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
 	target = iommu->cmd_buf + tail;
 	memcpy_toio(target, cmd, sizeof(*cmd));
--- a/arch/x86/kernel/amd_iommu_init.c
+++ b/arch/x86/kernel/amd_iommu_init.c
@@ -436,7 +436,7 @@ static u8 * __init alloc_command_buffer(
 	if (cmd_buf == NULL)
 		return NULL;
 
-	iommu->cmd_buf_size = CMD_BUFFER_SIZE;
+	iommu->cmd_buf_size = CMD_BUFFER_SIZE | CMD_BUFFER_UNINITIALIZED;
 
 	return cmd_buf;
 }
@@ -453,6 +453,7 @@ void amd_iommu_reset_cmd_buffer(struct a
 	writel(0x00, iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
 
 	iommu_feature_enable(iommu, CONTROL_CMDBUF_EN);
+	iommu->cmd_buf_size &= ~(CMD_BUFFER_UNINITIALIZED);
 }
 
 /*
@@ -477,7 +478,7 @@ static void iommu_enable_command_buffer(
 static void __init free_command_buffer(struct amd_iommu *iommu)
 {
 	free_pages((unsigned long)iommu->cmd_buf,
-		   get_order(iommu->cmd_buf_size));
+		   get_order(iommu->cmd_buf_size & ~(CMD_BUFFER_UNINITIALIZED)));
 }
 
 /* allocates the memory where the IOMMU will log its events to */

^ permalink raw reply [flat|nested] 41+ messages in thread
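The patch above borrows bit 0 of cmd_buf_size as a flag, which works because CMD_BUFFER_SIZE (8192) is a power of two, so its low bit is always clear. A minimal userspace sketch of the same trick (hypothetical helper names, not the driver functions):

```c
#include <assert.h>

#define CMD_BUFFER_SIZE          8192
#define CMD_BUFFER_UNINITIALIZED 1

static unsigned int cmd_buf_size;

/* Models alloc_command_buffer(): record the size with the flag set,
 * because the hardware does not know about the buffer yet. */
static void alloc_cmd_buf(void)
{
    cmd_buf_size = CMD_BUFFER_SIZE | CMD_BUFFER_UNINITIALIZED;
}

/* Models amd_iommu_reset_cmd_buffer(): once the hardware has been
 * pointed at the buffer, clear the flag so queuing is legal. */
static void reset_cmd_buf(void)
{
    cmd_buf_size &= ~CMD_BUFFER_UNINITIALIZED;
}

/* What the free path must pass to get_order(): the size with the
 * flag masked off, otherwise the order could be computed wrong. */
static unsigned int real_size(void)
{
    return cmd_buf_size & ~CMD_BUFFER_UNINITIALIZED;
}
```

The WARN_ON in __iommu_queue_command() then reduces to a single AND against the flag bit, costing nothing in the common (initialized) case.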
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 1:23 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Chris Wright 2010-04-02 1:31 ` [PATCH 2/2] x86/amd-iommu: warn when issuing command to uninitiailed cmd buffer Chris Wright @ 2010-04-02 1:35 ` Neil Horman 2010-04-02 1:38 ` Chris Wright 2010-04-02 9:11 ` Joerg Roedel ` (2 subsequent siblings) 4 siblings, 1 reply; 41+ messages in thread From: Neil Horman @ 2010-04-02 1:35 UTC (permalink / raw) To: Chris Wright Cc: Joerg Roedel, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal On Thu, Apr 01, 2010 at 06:23:53PM -0700, Chris Wright wrote: > Hit another kdump problem as reported by Neil Horman. When initializaing > the IOMMU, we attach devices to their domains before the IOMMU is > fully (re)initialized. Attaching a device will issue some important > invalidations. In the context of the newly kexec'd kdump kernel, the > IOMMU may have stale cached data from the original kernel. Because we > do the attach too early, the invalidation commands are placed in the new > command buffer before the IOMMU is updated w/ that buffer. This leaves > the stale entries in the kdump context and can renders device unusable. > Simply enable the IOMMU before we do the attach. > > Cc: Neil Horman <nhorman@tuxdriver.com> > Cc: Vivek Goyal <vgoyal@redhat.com> > Signed-off-by: Chris Wright <chrisw@sous-sol.org> I'll test this out this weekend, thanks Chris! 
Neil > --- > arch/x86/kernel/amd_iommu_init.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > --- a/arch/x86/kernel/amd_iommu_init.c > +++ b/arch/x86/kernel/amd_iommu_init.c > @@ -1288,6 +1288,8 @@ static int __init amd_iommu_init(void) > if (ret) > goto free; > > + enable_iommus(); > + > if (iommu_pass_through) > ret = amd_iommu_init_passthrough(); > else > @@ -1300,8 +1302,6 @@ static int __init amd_iommu_init(void) > > amd_iommu_init_notifier(); > > - enable_iommus(); > - > if (iommu_pass_through) > goto out; > ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 1:35 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Neil Horman @ 2010-04-02 1:38 ` Chris Wright 0 siblings, 0 replies; 41+ messages in thread From: Chris Wright @ 2010-04-02 1:38 UTC (permalink / raw) To: Neil Horman Cc: Chris Wright, Joerg Roedel, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal * Neil Horman (nhorman@redhat.com) wrote: > On Thu, Apr 01, 2010 at 06:23:53PM -0700, Chris Wright wrote: > > Hit another kdump problem as reported by Neil Horman. When initializaing > > the IOMMU, we attach devices to their domains before the IOMMU is > > fully (re)initialized. Attaching a device will issue some important > > invalidations. In the context of the newly kexec'd kdump kernel, the > > IOMMU may have stale cached data from the original kernel. Because we > > do the attach too early, the invalidation commands are placed in the new > > command buffer before the IOMMU is updated w/ that buffer. This leaves > > the stale entries in the kdump context and can renders device unusable. > > Simply enable the IOMMU before we do the attach. > > > > Cc: Neil Horman <nhorman@tuxdriver.com> > > Cc: Vivek Goyal <vgoyal@redhat.com> > > Signed-off-by: Chris Wright <chrisw@sous-sol.org> > > I'll test this out this weekend, thanks Chris! Great, thanks! I tested w/ both default and iommu=pt. Both worked, didn't spot any regressions. But additional testing is very welcome. thanks, -chris ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 1:23 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Chris Wright 2010-04-02 1:31 ` [PATCH 2/2] x86/amd-iommu: warn when issuing command to uninitiailed cmd buffer Chris Wright 2010-04-02 1:35 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Neil Horman @ 2010-04-02 9:11 ` Joerg Roedel 2010-04-02 23:59 ` Chris Wright 2010-04-02 15:59 ` Vivek Goyal 2010-04-05 15:34 ` Neil Horman 4 siblings, 1 reply; 41+ messages in thread From: Joerg Roedel @ 2010-04-02 9:11 UTC (permalink / raw) To: Chris Wright Cc: Joerg Roedel, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal On Thu, Apr 01, 2010 at 06:23:53PM -0700, Chris Wright wrote: > Hit another kdump problem as reported by Neil Horman. When initializaing > the IOMMU, we attach devices to their domains before the IOMMU is > fully (re)initialized. Attaching a device will issue some important > invalidations. In the context of the newly kexec'd kdump kernel, the > IOMMU may have stale cached data from the original kernel. Because we > do the attach too early, the invalidation commands are placed in the new > command buffer before the IOMMU is updated w/ that buffer. This leaves > the stale entries in the kdump context and can renders device unusable. > Simply enable the IOMMU before we do the attach. 
> > Cc: Neil Horman <nhorman@tuxdriver.com> > Cc: Vivek Goyal <vgoyal@redhat.com> > Signed-off-by: Chris Wright <chrisw@sous-sol.org> > --- > arch/x86/kernel/amd_iommu_init.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > --- a/arch/x86/kernel/amd_iommu_init.c > +++ b/arch/x86/kernel/amd_iommu_init.c > @@ -1288,6 +1288,8 @@ static int __init amd_iommu_init(void) > if (ret) > goto free; > > + enable_iommus(); > + > if (iommu_pass_through) > ret = amd_iommu_init_passthrough(); > else > @@ -1300,8 +1302,6 @@ static int __init amd_iommu_init(void) > > amd_iommu_init_notifier(); > > - enable_iommus(); > - > if (iommu_pass_through) > goto out; Ok, good to know this fixes the problem. One issue: If the initialization of the domains fails the iommu hardware needs to be disabled again in the free path. Thanks, Joerg ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 9:11 ` Joerg Roedel @ 2010-04-02 23:59 ` Chris Wright 0 siblings, 0 replies; 41+ messages in thread From: Chris Wright @ 2010-04-02 23:59 UTC (permalink / raw) To: Joerg Roedel Cc: Chris Wright, Joerg Roedel, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal * Joerg Roedel (joro@8bytes.org) wrote: > Ok, good to know this fixes the problem. One issue: If the > initialization of the domains fails the iommu hardware needs to be > disabled again in the free path. Sure, thanks for catching. Will resend shortly thanks, -chris ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 1:23 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Chris Wright ` (2 preceding siblings ...) 2010-04-02 9:11 ` Joerg Roedel @ 2010-04-02 15:59 ` Vivek Goyal 2010-04-02 22:38 ` Chris Wright 2010-04-03 17:38 ` Joerg Roedel 2010-04-05 15:34 ` Neil Horman 4 siblings, 2 replies; 41+ messages in thread From: Vivek Goyal @ 2010-04-02 15:59 UTC (permalink / raw) To: Chris Wright Cc: Joerg Roedel, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman On Thu, Apr 01, 2010 at 06:23:53PM -0700, Chris Wright wrote: > Hit another kdump problem as reported by Neil Horman. When initializaing > the IOMMU, we attach devices to their domains before the IOMMU is > fully (re)initialized. Attaching a device will issue some important > invalidations. In the context of the newly kexec'd kdump kernel, the > IOMMU may have stale cached data from the original kernel. Because we > do the attach too early, the invalidation commands are placed in the new > command buffer before the IOMMU is updated w/ that buffer. This leaves > the stale entries in the kdump context and can renders device unusable. > Simply enable the IOMMU before we do the attach. > > Cc: Neil Horman <nhorman@tuxdriver.com> > Cc: Vivek Goyal <vgoyal@redhat.com> > Signed-off-by: Chris Wright <chrisw@sous-sol.org> > --- > arch/x86/kernel/amd_iommu_init.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > --- a/arch/x86/kernel/amd_iommu_init.c > +++ b/arch/x86/kernel/amd_iommu_init.c > @@ -1288,6 +1288,8 @@ static int __init amd_iommu_init(void) > if (ret) > goto free; > > + enable_iommus(); > + > if (iommu_pass_through) > ret = amd_iommu_init_passthrough(); > else > @@ -1300,8 +1302,6 @@ static int __init amd_iommu_init(void) > > amd_iommu_init_notifier(); > > - enable_iommus(); > - Ok, so now we do enable_iommu() before we attach devices and flush DTE and domain PDE, IO TLB. 
That makes sense. The following is just for my education; I'm trying to better understand the impact of in-flight DMAs. So IIUC, in the kdump context the following seems to be the sequence of events. 1. The kernel crashes; we leave the IOMMU enabled. 2. We boot into the capture kernel and call enable_iommus(). This function first disables the iommu, sets up a new device table, and enables the iommus again. a. So during the small window when the iommu is disabled before we enable it back, any in-flight DMA will pass through, possibly to an unintended physical address, since translation is disabled, and it can corrupt the kdump kernel. b. Even after enabling the iommu, I guess we will continue to use the cached DTE and translation information to handle any in-flight DMA. The difference is that now the iommus are enabled, so any in-flight DMA should go to the address intended in the first kernel and should not corrupt anything. 3. Once the iommus are enabled again, we allocate and initialize protection domains and attach devices to domains. In the process we flush the DTE, PDE and IO TLBs. c. Looks like do_attach->set_dte_entry() by default gives write permission (IW) to all the devices. I am assuming that at this point translation is enabled and possibly unity mapped. If that's the case, any in-flight DMA will not be allowed to happen at a unity mapped address, and this can possibly corrupt the kernel? I understand this patch should fix the case where in the second kernel a device is not doing DMA because of a possibly cached DTE and translation information. But it looks like the in-flight DMA issues will still need a closer look. That is a separate issue, though, and needs to be addressed in a separate set of patches. Thanks Vivek ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 15:59 ` Vivek Goyal @ 2010-04-02 22:38 ` Chris Wright 2010-04-02 22:55 ` Eric W. Biederman 2010-04-03 17:38 ` Joerg Roedel 1 sibling, 1 reply; 41+ messages in thread From: Chris Wright @ 2010-04-02 22:38 UTC (permalink / raw) To: Vivek Goyal Cc: Chris Wright, Joerg Roedel, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman * Vivek Goyal (vgoyal@redhat.com) wrote: > Following is just for my education purposes. Trying to understand better > the impact of in-flight DMAs. So IIUC, in kdump context following seems to be > sequence of events. > > 1. kernel crashes, we leave IOMMU enabled. No, we actually disable the IOMMU: panic() crash_kexec() machine_crash_shutdown() native_machine_crash_shutdown() x86_platform.iommu_shutdown() == amd_iommu_init.c::disable_iommus() > 2. boot into capture kernel and we call enable_iommus(). This function > first disables iommu, sets up new device table and enables iommus > again. > a. So during this small window when iommu is disabled and we enable > it back, any inflight DMA will passthrough possibly to an > unintended physical address as translation is disabled and it > can corrupt the kdump kernel. Yes, that should be possible. > b. Even after enabling the iommu, I guess we will continue to > use cached DTE, and translation information to handle any > in-flight DMA. The difference is that now iommus are enabled > so any in-flight DMA should go to the address as intended in > first kernel and should not corrupt anything. A cached DTE only gets you the domainID and pt root, so results depend upon other caches too. > 3. Once iommus are enabled again, we allocate and initialize protection > domains. We attach devices to domains. In the process we flush the > DTE, PDE and IO TLBs. Yes, but this is immediately after 2b above. We can't send the invalidate commands until the iommu is enabled. > c.
Looks like do_attach->set_dte_entry(), by default gives write > permission (IW) to all the devices. I am assuming that at > this point of time translation is enabled and possibly unity > mapped. Unlikely to have full unity map, it's typically just for a range, and this is also typically for devices that have interaction w/ BIOS (typical examples are usb/SMM for keyboard and integrated graphics). And the mapping is to reserved memory areas (BIOS regions). On my test box, for example, there are none of the BIOS specified ranges. > If that's the case, any in-flight DMA will not be > allowed to happen at unity mapped address and this can possibly > corrupt the kernel? I don't understand what you mean there. > I understand this patch should fix the case when in second kernel a > device is not doing DMA because of possibly cached DTE, and translation > information. But looks like in-flight DMA issues will still need a closer > look. But that is a separate issue and needs to be addressed in separate > set of patches. All of the in-flight DMA corrupts kdump kernel memory issues can exist w/out an IOMMU. On the one hand, the IOMMU actually reduces the window of opportunity, but on the other it adds possible translation issue. About the only thing we can do here is modify the crashing kernel to update the DTE to disable dma, invalidate, and leave IOMMU enabled. This is all in the context of crashing, so any further errors here mean we probably won't get to kdump kernel. The kexec'd kernel will still need to defensively prepare things as it always does since it can't trust anything that came before it. thanks, -chris ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 22:38 ` Chris Wright @ 2010-04-02 22:55 ` Eric W. Biederman 2010-04-02 23:57 ` Chris Wright 0 siblings, 1 reply; 41+ messages in thread From: Eric W. Biederman @ 2010-04-02 22:55 UTC (permalink / raw) To: Chris Wright Cc: Vivek Goyal, Joerg Roedel, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu Chris Wright <chrisw@sous-sol.org> writes: > * Vivek Goyal (vgoyal@redhat.com) wrote: >> Following is just for my education purposes. Trying to understand better >> the impact of in-flight DMAs. So IIUC, in kudmp context following seems to be >> sequence of events. >> >> 1. kernel crashes, we leave IOMMU enabled. > > No, we actually disable the IOMMU > > panic() > crash_exec() > machine_crash_shutdown() > native_machine_crash_shutdown() > x86_platform.iommu_shutdown() == amd_iommu_init.c::disable_iommus() That call exists but is there any good reason for it? Nothing should be on that can be reasonably done anywhere else. This round of bug fixing and testing would be a good time to remove that extraneous call. >> 2. boot into capture kernel and we call enable_iommus(). This function >> first disables iommu, sets up new device table and enables iommus >> again. >> a. So during this small window when iommu is disabled and we enable >> it back, any inflight DMA will passthrough possibly to an >> unintended physical address as translation is disabled and it >> can corrupt the kdump kenrel. > > Yes, should be possible. > >> b. Even after enabling the iommu, I guess we will continue to >> use cached DTE, and translation information to handle any >> in-flight DMA. The difference is that now iommus are enabled >> so any in-flight DMA should go to the address as intended in >> first kenrel and should not corrupt anything. > > Cached DTE only gets you domainID and pt root, so results depend upon > other caches too. > >> 3. 
Once iommus are enabled again, we allocated and initilize protection >> domains. We attach devices to domains. In the process we flush the >> DTE, PDE and IO TLBs. > > Yes, but this is immediately after 2b above. We can't send the invalidate > commands until the iommu is enabled. > >> c. Looks like do_attach->set_dte_entry(), by default gives write >> permission (IW) to all the devices. I am assuming that at >> this point of time translation is enabled and possibly unity >> mapped. > > Unlikely to have full unity map, it's typically just for a range, > and this is also typically for devices that have interaction w/ BIOS > (typical examples are usb/SMM for keyboard and integrated graphics). > And the mapping is to reserved memory areas (BIOS regions). On my test > box, for example, there are none of the BIOS specified ranges. > >> If that's the case, any in-flight DMA will not be >> allowed to happen at unity mapped address and this can possibly >> corrupt the kernel? > > I don't understand what you mean there. > >> I understand this patch should fix the case when in second kernel a >> device is not doing DMA because of possibly cached DTE, and translation >> information. But looks like in-flight DMA issues will still need a closer >> look. But that is a separate issue and needs to be addressed in separate >> set of patches. > > All of the in-flight DMA corrupts kdump kernel memory issues can exist > w/out an IOMMU. On the one hand, the IOMMU actually reduces the window > of opportunity, but on the other it adds possible translation issue. ? If disabling the IOMMU changes the destination in RAM of in-flight DMA the issues are not at all the same. > About the only thing we can do here is modify the crashing kernel to > update the DTE to disable dma, invalidate, and leave IOMMU enabled. > This is all in the context of crashing, so any further errors here mean > we probably won't get to kdump kernel. Why even touch the IOMMU? 
> The kexec'd kernel will still need to defensively prepare things as it > always does since it can't trust anything that came before it. Yes. Eric ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 22:55 ` Eric W. Biederman @ 2010-04-02 23:57 ` Chris Wright 0 siblings, 0 replies; 41+ messages in thread From: Chris Wright @ 2010-04-02 23:57 UTC (permalink / raw) To: Eric W. Biederman Cc: Chris Wright, Vivek Goyal, Joerg Roedel, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu * Eric W. Biederman (ebiederm@xmission.com) wrote: > Chris Wright <chrisw@sous-sol.org> writes: > > > * Vivek Goyal (vgoyal@redhat.com) wrote: > >> Following is just for my education purposes. Trying to understand better > >> the impact of in-flight DMAs. So IIUC, in kudmp context following seems to be > >> sequence of events. > >> > >> 1. kernel crashes, we leave IOMMU enabled. > > > > No, we actually disable the IOMMU > > > > panic() > > crash_exec() > > machine_crash_shutdown() > > native_machine_crash_shutdown() > > x86_platform.iommu_shutdown() == amd_iommu_init.c::disable_iommus() > > That call exists but is there any good reason for it? It was originally added it for this very case, although I don't think it should be there. I too am a proponent of removing it, because I think all work should be done in the newly kexec'd kernel. > Nothing should be on that can be reasonably done anywhere else. > This round of bug fixing and testing would be a good time to remove > that extraneous call. > > >> 2. boot into capture kernel and we call enable_iommus(). This function > >> first disables iommu, sets up new device table and enables iommus > >> again. > >> a. So during this small window when iommu is disabled and we enable > >> it back, any inflight DMA will passthrough possibly to an > >> unintended physical address as translation is disabled and it > >> can corrupt the kdump kenrel. > > > > Yes, should be possible. > > > >> b. Even after enabling the iommu, I guess we will continue to > >> use cached DTE, and translation information to handle any > >> in-flight DMA. 
The difference is that now iommus are enabled > >> so any in-flight DMA should go to the address as intended in > >> first kenrel and should not corrupt anything. > > > > Cached DTE only gets you domainID and pt root, so results depend upon > > other caches too. > > > >> 3. Once iommus are enabled again, we allocated and initilize protection > >> domains. We attach devices to domains. In the process we flush the > >> DTE, PDE and IO TLBs. > > > > Yes, but this is immediately after 2b above. We can't send the invalidate > > commands until the iommu is enabled. > > > >> c. Looks like do_attach->set_dte_entry(), by default gives write > >> permission (IW) to all the devices. I am assuming that at > >> this point of time translation is enabled and possibly unity > >> mapped. > > > > Unlikely to have full unity map, it's typically just for a range, > > and this is also typically for devices that have interaction w/ BIOS > > (typical examples are usb/SMM for keyboard and integrated graphics). > > And the mapping is to reserved memory areas (BIOS regions). On my test > > box, for example, there are none of the BIOS specified ranges. > > > >> If that's the case, any in-flight DMA will not be > >> allowed to happen at unity mapped address and this can possibly > >> corrupt the kernel? > > > > I don't understand what you mean there. > > > >> I understand this patch should fix the case when in second kernel a > >> device is not doing DMA because of possibly cached DTE, and translation > >> information. But looks like in-flight DMA issues will still need a closer > >> look. But that is a separate issue and needs to be addressed in separate > >> set of patches. > > > > All of the in-flight DMA corrupts kdump kernel memory issues can exist > > w/out an IOMMU. On the one hand, the IOMMU actually reduces the window > > of opportunity, but on the other it adds possible translation issue. > > ? 
> If disabling the IOMMU changes the destination in RAM of in-flight DMA > the issues are not at all the same. > > > About the only thing we can do here is modify the crashing kernel to > > update the DTE to disable dma, invalidate, and leave IOMMU enabled. > > This is all in the context of crashing, so any further errors here mean > > we probably won't get to kdump kernel. > > Why even touch the IOMMU? I agree, and was trying to show that it's risky to do that while crashing. Removing the IOMMU disable may still allow some DMA to succeed. To actually close that off, you have to force all DMA transactions to fail. Personally, I'm skeptical that the tradeoff is worth it. thanks, -chris ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 15:59 ` Vivek Goyal 2010-04-02 22:38 ` Chris Wright @ 2010-04-03 17:38 ` Joerg Roedel 2010-04-05 14:17 ` Vivek Goyal 1 sibling, 1 reply; 41+ messages in thread From: Joerg Roedel @ 2010-04-03 17:38 UTC (permalink / raw) To: Vivek Goyal Cc: Chris Wright, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman On Fri, Apr 02, 2010 at 11:59:32AM -0400, Vivek Goyal wrote: > 1. kernel crashes, we leave IOMMU enabled. True for everything except gart and amd iommu. > a. So during this small window when iommu is disabled and we enable > it back, any inflight DMA will passthrough possibly to an > unintended physical address as translation is disabled and it > can corrupt the kdump kernel. Right. > b. Even after enabling the iommu, I guess we will continue to > use cached DTE, and translation information to handle any > in-flight DMA. The difference is that now iommus are enabled > so any in-flight DMA should go to the address as intended in > first kernel and should not corrupt anything. Right. > > 3. Once iommus are enabled again, we allocate and initialize protection > domains. We attach devices to domains. In the process we flush the > DTE, PDE and IO TLBs. > > c. Looks like do_attach->set_dte_entry(), by default gives write > permission (IW) to all the devices. I am assuming that at > this point of time translation is enabled and possibly unity > mapped. No. The IW bit in the DTE must be set because all write permission bits (DTE and page tables) are ANDed to determine if a device can write to a particular address. So as long as the paging mode is non-zero, the hardware will walk the page table first to find out if the device has write permission. With paging mode == 0 your statement about read-write unity-mapping is true. This is used for a pass-through domain (iommu=pt), btw. Joerg ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-03 17:38 ` Joerg Roedel @ 2010-04-05 14:17 ` Vivek Goyal 2010-04-05 14:32 ` Joerg Roedel 0 siblings, 1 reply; 41+ messages in thread From: Vivek Goyal @ 2010-04-05 14:17 UTC (permalink / raw) To: Joerg Roedel Cc: Chris Wright, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman On Sat, Apr 03, 2010 at 07:38:36PM +0200, Joerg Roedel wrote: > On Fri, Apr 02, 2010 at 11:59:32AM -0400, Vivek Goyal wrote: > > 1. kernel crashes, we leave IOMMU enabled. > > True for everything except gart and amd iommu. > > > a. So during this small window when iommu is disabled and we enable > > it back, any inflight DMA will passthrough possibly to an > > unintended physical address as translation is disabled and it > > can corrupt the kdump kenrel. > > Right. > > > b. Even after enabling the iommu, I guess we will continue to > > use cached DTE, and translation information to handle any > > in-flight DMA. The difference is that now iommus are enabled > > so any in-flight DMA should go to the address as intended in > > first kenrel and should not corrupt anything. > > Right. > > > > > 3. Once iommus are enabled again, we allocated and initilize protection > > domains. We attach devices to domains. In the process we flush the > > DTE, PDE and IO TLBs. > > > > c. Looks like do_attach->set_dte_entry(), by default gives write > > permission (IW) to all the devices. I am assuming that at > > this point of time translation is enabled and possibly unity > > mapped. > > No, The IW bit in the DTE must be set because all write permission bits > (DTE and page tabled) are ANDed to determine if a device can write to a > particular address. So as long as the paging mode is unequal to zero the > hardware will walk the page-table first to find out if the device has > write permission. 
And by default valid PTEs are not present (except for some unity mappings as specified by ACPI tables), so we will end the transaction with an IO_PAGE_FAULT? I am assuming that we will not set up unity mappings for the kernel reserved area, and so either an in-flight DMA will not be allowed and an IO_PAGE_FAULT will be logged, or it will be allowed to some unity mapping which is not mapped to the kdump kernel area, hence no corruption of the capture kernel? > With paging mode == 0 your statement about read-write > unity-mapping is true. This is used for a pass-through domain (iommu=pt) > btw. Ok, so in the case of pass-through, I think one just needs to make sure not to use iommu=pt in the second kernel if one did not use iommu=pt in the first kernel. Otherwise you can redirect the in-flight DMAs in the second kernel to entirely unintended physical memory. So the following seems to be the summary. - Don't disable the AMD IOMMU after a crash in machine_crash_shutdown(), because disabling it can direct in-flight DMAs to unintended physical memory areas and can corrupt other data structures. - Once the iommu is enabled in the second kernel, most likely in-flight DMAs will end with an IO_PAGE_FAULT (iommu!=pt). Only selective unity mapping areas will be set up based on ACPI tables, and these should be BIOS regions and should not overlap with kdump reserved memory. iommu=pt should also be safe if iommu=pt was used in the first kernel too. - The only small window where an in-flight DMA can corrupt things is when we are initializing the iommu in the second kernel (we first disable the iommu and then enable it back). During this small period translation will be disabled and some IO can go to an unintended address. And there does not seem to be any easy way to plug this hole. Have I got it right? Thanks Vivek ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-05 14:17 ` Vivek Goyal @ 2010-04-05 14:32 ` Joerg Roedel 0 siblings, 0 replies; 41+ messages in thread From: Joerg Roedel @ 2010-04-05 14:32 UTC (permalink / raw) To: Vivek Goyal Cc: Chris Wright, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman On Mon, Apr 05, 2010 at 10:17:50AM -0400, Vivek Goyal wrote: > And by default valid PTEs are not present (except for some unity mappings > as specified by ACPI tables), so we will end the transaction with > IO_PAGE_FAULT? I am assuming that we will not set unity mappings for > kernel reserved area and so either an in-flight DMA will not be allowed > and IO_PAGE_FAULT will be logged or it will be allowed to some unity > mapping which is not mapped to kdump kernel area hence no corruption of > capture kernel? Right. The unity-mappings are typically used for devices that are controlled by the BIOS and define memory regions owned by the BIOS. So Linux will not use the unity mapped regions anyway, not in the first kernel and not in the kdump kernel. > > With paging mode == 0 your statement about read-write > > unity-mapping is true. This is used for a pass-through domain (iommu=pt) > > btw. > > Ok, so in case of pass through, I think one just needs to make sure not to > use iommu=pt in second kernel if one did not use iommu=pt in first kernel. > Otherwise you can redirect the in-flight DMAs in second kernel to > entirely unintended physical memory. The kdump kernel should use the same setting as the plain kernel. > So following seems to be the summary. > > - Don't disable AMD IOMMU after crash in machine_crash_shutdown(), because > disabling it can direct in-flight DMAs to unintended physical memory > areas and can corrupt other data structures. Right, that really seems to be the best solution. > - Once the iommu is enabled in second kernel, most likely in-flight DMAs > will end with IO_PAGE_FAULT (iommu!=pt).
Only selective unity mapping > areas will be setup based on ACPI tables and these should be BIOS region > and should not overlap with kdump reserved memory. iommu=pt should also > be safe if iommu=pt was used in first kernel also. Right. With Chris' patches the DTE entries of newly attached domains are flushed at IOMMU initialization in the kdump kernel. So the new data structures are in place and used by the hardware. > - Only small window where in-flight DMA can corrupt things is when we > are initializing iommu in second kernel. (We first disable iommu and then > enable it back). During this small period translation will be disabled and > some IO can go to unintended address. And there does not seem to be any easy > way to plug this hole. Right. > Have I got it right? Yes :-) Joerg ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 1:23 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Chris Wright ` (3 preceding siblings ...) 2010-04-02 15:59 ` Vivek Goyal @ 2010-04-05 15:34 ` Neil Horman 4 siblings, 0 replies; 41+ messages in thread From: Neil Horman @ 2010-04-05 15:34 UTC (permalink / raw) To: Chris Wright Cc: Joerg Roedel, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal On Thu, Apr 01, 2010 at 06:23:53PM -0700, Chris Wright wrote: > Hit another kdump problem as reported by Neil Horman. When initializing > the IOMMU, we attach devices to their domains before the IOMMU is > fully (re)initialized. Attaching a device will issue some important > invalidations. In the context of the newly kexec'd kdump kernel, the > IOMMU may have stale cached data from the original kernel. Because we > do the attach too early, the invalidation commands are placed in the new > command buffer before the IOMMU is updated w/ that buffer. This leaves > the stale entries in the kdump context and can render devices unusable. > Simply enable the IOMMU before we do the attach. > > Cc: Neil Horman <nhorman@tuxdriver.com> > Cc: Vivek Goyal <vgoyal@redhat.com> > Signed-off-by: Chris Wright <chrisw@sous-sol.org> > --- > arch/x86/kernel/amd_iommu_init.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > --- a/arch/x86/kernel/amd_iommu_init.c > +++ b/arch/x86/kernel/amd_iommu_init.c > @@ -1288,6 +1288,8 @@ static int __init amd_iommu_init(void) > if (ret) > goto free; > > + enable_iommus(); > + > if (iommu_pass_through) > ret = amd_iommu_init_passthrough(); > else > @@ -1300,8 +1302,6 @@ static int __init amd_iommu_init(void) > > amd_iommu_init_notifier(); > > - enable_iommus(); > - > if (iommu_pass_through) > goto out; > > FWIW, this patch series fixed the kdump issues I was having on my AMD system here.
Thanks Chris Acked-by: Neil Horman <nhorman@tuxdriver.com> ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 15:54 ` Vivek Goyal 2010-03-31 18:28 ` Neil Horman @ 2010-03-31 18:43 ` Eric W. Biederman 1 sibling, 0 replies; 41+ messages in thread From: Eric W. Biederman @ 2010-03-31 18:43 UTC (permalink / raw) To: Vivek Goyal; +Cc: Neil Horman, iommu, joerg.roedel, hbabu, kexec, linux-kernel Vivek Goyal <vgoyal@redhat.com> writes: > On Wed, Mar 31, 2010 at 11:24:17AM -0400, Neil Horman wrote: >> Flush iommu during shutdown >> >> When using an iommu, its possible, if a kdump kernel boot follows a primary >> kernel crash, that dma operations might still be in flight from the previous >> kernel during the kdump kernel boot. This can lead to memory corruption, >> crashes, and other erroneous behavior, specifically I've seen it manifest during >> a kdump boot as endless iommu error log entries of the form: >> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d >> address=0x000000000245a0c0 flags=0x0070] >> >> Followed by an inability to access hard drives, and various other resources. >> >> I've written this fix for it. In short it just forces a flush of the in flight >> dma operations on shutdown, so that the new kernel is certain not to have any >> in-flight dmas trying to complete after we've reset all the iommu page tables, >> causing the above errors. I've tested it and it fixes the problem for me quite >> well. > > CCing Eric also. Thanks. > Neil, this is interesting. In the past we noticed similar issues, > especially on PPC. But I was told that we could not clear the iommu > mapping entries as we had no control on in flight DMA and if a DMA comes > later after clearing an entry and entry is not present, it is an error. Which is exactly what the reported error looks like. > Hence one of the suggestions was not to clear iommu mapping entries but > reserve some for kdump operation and use those in kdump kernel. 
> > So this call amd_iommu_flush_all_devices() will be able to tell devices > that don't do any more DMAs and hence it is safe to reprogram iommu > mapping entries. I took a quick look at our crash shutdown path and I am very disappointed with the way it has gone lately. Regardless of the merits flushing an iommu versus doing things with an iommu I don't see how we are in any better position in the crash kernel then we are in the kdump kernel. So what are we doing touching it in the kdump path? Likewise for the hpet. We also seem to be at a point where if we have a tsc we don't need to enable interrupts until we are ready to enable them in native mode. And except for a few weird SMP 486's tsc and apics came in at the same time. So my grumpy code review says we should gut crash.c (like below) and fix the initialization paths so they do the right thing. --- arch/x86/kernel/crash.c | 18 ------------------ 1 files changed, 0 insertions(+), 18 deletions(-) diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index a4849c1..8e33c50 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -22,12 +22,10 @@ #include <asm/nmi.h> #include <asm/hw_irq.h> #include <asm/apic.h> -#include <asm/hpet.h> #include <linux/kdebug.h> #include <asm/cpu.h> #include <asm/reboot.h> #include <asm/virtext.h> -#include <asm/x86_init.h> #if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC) @@ -56,15 +54,11 @@ static void kdump_nmi_callback(int cpu, struct die_args *args) */ cpu_emergency_vmxoff(); cpu_emergency_svm_disable(); - - disable_local_APIC(); } static void kdump_nmi_shootdown_cpus(void) { nmi_shootdown_cpus(kdump_nmi_callback); - - disable_local_APIC(); } #else @@ -96,17 +90,5 @@ void native_machine_crash_shutdown(struct pt_regs *regs) cpu_emergency_vmxoff(); cpu_emergency_svm_disable(); - lapic_shutdown(); -#if defined(CONFIG_X86_IO_APIC) - disable_IO_APIC(); -#endif -#ifdef CONFIG_HPET_TIMER - hpet_disable(); -#endif - -#ifdef CONFIG_X86_64 - 
x86_platform.iommu_shutdown(); -#endif - crash_save_cpu(regs, safe_smp_processor_id()); } -- 1.6.5.2.143.g8cc62 ^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 15:24 [PATCH] amd iommu: force flush of iommu prior during shutdown Neil Horman 2010-03-31 15:54 ` Vivek Goyal @ 2010-03-31 21:25 ` Chris Wright 2010-04-01 1:13 ` Neil Horman 2010-04-01 2:44 ` Vivek Goyal 1 sibling, 2 replies; 41+ messages in thread From: Chris Wright @ 2010-03-31 21:25 UTC (permalink / raw) To: Neil Horman; +Cc: iommu, joerg.roedel, vgoyal, hbabu, kexec, linux-kernel * Neil Horman (nhorman@tuxdriver.com) wrote: > Flush iommu during shutdown > > When using an iommu, it's possible, if a kdump kernel boot follows a primary > kernel crash, that dma operations might still be in flight from the previous > kernel during the kdump kernel boot. This can lead to memory corruption, > crashes, and other erroneous behavior; specifically, I've seen it manifest during > a kdump boot as endless iommu error log entries of the form: > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > address=0x000000000245a0c0 flags=0x0070] We've already fixed this problem once before, so some code shift must have brought it back. Personally, I prefer to do this on the bringup path rather than the teardown path. Besides keeping the teardown path as simple as possible (the goal is to get to the kdump kernel asap), there's also reason to completely flush on startup in general, in case the BIOS has done anything unsavory. thanks, -chris ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 21:25 ` Chris Wright @ 2010-04-01 1:13 ` Neil Horman 2010-04-01 1:39 ` Chris Wright 2010-04-01 2:24 ` Vivek Goyal 2010-04-01 2:44 ` Vivek Goyal 1 sibling, 2 replies; 41+ messages in thread From: Neil Horman @ 2010-04-01 1:13 UTC (permalink / raw) To: Chris Wright; +Cc: iommu, joerg.roedel, vgoyal, hbabu, kexec, linux-kernel On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote: > * Neil Horman (nhorman@tuxdriver.com) wrote: > > Flush iommu during shutdown > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > kernel crash, that dma operations might still be in flight from the previous > > kernel during the kdump kernel boot. This can lead to memory corruption, > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > a kdump boot as endless iommu error log entries of the form: > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > address=0x000000000245a0c0 flags=0x0070] > > We've already fixed this problem once before, so some code shift must > have brought it back. Personally, I prefer to do this on the bringup > path than the teardown path. Besides keeping the teardown path as > simple as possible (goal is to get to kdump kernel asap), there's also > reason to competely flush on startup in genernal in case BIOS has done > anything unsavory. > Chris, Can you elaborate on what you did with the iommu to make this safe? It will save me time digging through the history on this code, and help me understand better whats going on here. I was starting to think that we should just leave the iommu on through a kdump, and re-construct a new page table based on the old table (filtered by the error log) on kdump boot, but it sounds like a better solution might be in place. Thanks Neil > thanks, > -chris > ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 1:13 ` Neil Horman @ 2010-04-01 1:39 ` Chris Wright 2010-04-01 2:24 ` Vivek Goyal 1 sibling, 0 replies; 41+ messages in thread From: Chris Wright @ 2010-04-01 1:39 UTC (permalink / raw) To: Neil Horman Cc: Chris Wright, iommu, joerg.roedel, vgoyal, hbabu, kexec, linux-kernel * Neil Horman (nhorman@tuxdriver.com) wrote: > On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote: > > * Neil Horman (nhorman@tuxdriver.com) wrote: > > > Flush iommu during shutdown > > > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > > kernel crash, that dma operations might still be in flight from the previous > > > kernel during the kdump kernel boot. This can lead to memory corruption, > > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > > a kdump boot as endless iommu error log entries of the form: > > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > > address=0x000000000245a0c0 flags=0x0070] > > > > We've already fixed this problem once before, so some code shift must > > have brought it back. Personally, I prefer to do this on the bringup > > path than the teardown path. Besides keeping the teardown path as > > simple as possible (goal is to get to kdump kernel asap), there's also > > reason to competely flush on startup in genernal in case BIOS has done > > anything unsavory. > > > Chris, > Can you elaborate on what you did with the iommu to make this safe? It > will save me time digging through the history on this code, and help me > understand better whats going on here. > > I was starting to think that we should just leave the iommu on through a kdump, > and re-construct a new page table based on the old table (filtered by the error > log) on kdump boot, but it sounds like a better solution might be in place. 
The code used to simply ensure a clean slate on startup by flushing the relevant domain table entry and the cached translations as devices were attached (this happens during kernel init, whether the base kernel or the kdump one). See here: 42a49f965a8d24ed92af04f5b564d63f17fd9c56 a8c485bb6857811807d42f9fd1fde2f5f89cc5c9 What's changed is that the initialization doesn't appear to do the proper flushes anymore. Your patch has the effect of putting them back, but during shutdown rather than initialization. thanks, -chris ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 1:13 ` Neil Horman 2010-04-01 1:39 ` Chris Wright @ 2010-04-01 2:24 ` Vivek Goyal 2010-04-01 12:53 ` Neil Horman 1 sibling, 1 reply; 41+ messages in thread From: Vivek Goyal @ 2010-04-01 2:24 UTC (permalink / raw) To: Neil Horman; +Cc: Chris Wright, iommu, joerg.roedel, hbabu, kexec, linux-kernel On Wed, Mar 31, 2010 at 09:13:11PM -0400, Neil Horman wrote: > On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote: > > * Neil Horman (nhorman@tuxdriver.com) wrote: > > > Flush iommu during shutdown > > > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > > kernel crash, that dma operations might still be in flight from the previous > > > kernel during the kdump kernel boot. This can lead to memory corruption, > > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > > a kdump boot as endless iommu error log entries of the form: > > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > > address=0x000000000245a0c0 flags=0x0070] > > > > We've already fixed this problem once before, so some code shift must > > have brought it back. Personally, I prefer to do this on the bringup > > path than the teardown path. Besides keeping the teardown path as > > simple as possible (goal is to get to kdump kernel asap), there's also > > reason to competely flush on startup in genernal in case BIOS has done > > anything unsavory. > > > Chris, > Can you elaborate on what you did with the iommu to make this safe? It > will save me time digging through the history on this code, and help me > understand better whats going on here. > > I was starting to think that we should just leave the iommu on through a kdump, > and re-construct a new page table based on the old table (filtered by the error > log) on kdump boot, but it sounds like a better solution might be in place. > Hi Neil, Is the following sequence possible?
- In crashed kernel, take away the write permission from all the devices. Mark bit 62 zero for all devices in the device table. - Leave the iommu on and let the device entries be valid in the kdump kernel, so that any in-flight dma does not become pass-through (which could cause more damage and corrupt the kdump kernel). - During kdump kernel initialization, load a new device table where, again, all the devices don't have write permission. Looks like by default we create a device table with all bits zero except the DEV_ENTRY_VALID and DEV_ENTRY_TRANSLATION bits. - Reset the device we want to set up any dma on or operate on. - Allow the device to do DMA/write. So by default none of the devices will be able to write to memory, and selected devices are given access only after a reset. I am not sure what the dependencies are for loading a new device table in the second kernel. If it requires disabling the IOMMU, then we leave a window where in-flight dma will become pass-through and has the potential to corrupt the kdump kernel. Thanks Vivek ^ permalink raw reply [flat|nested] 41+ messages in thread
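Vivek's proposed device-table manipulation can be sketched as a small user-space model. This is purely illustrative: the macro names mirror the kernel's, and the bit positions are the ones given in his message (bit 62 is the write-permission bit in an AMD IOMMU device table entry), but the helper functions are invented for this sketch and are not kernel API.

```c
#include <stdint.h>

/* Device table entry bits, per the sequence above: V/TV keep the entry
 * valid and translated; IW (bit 62) gates whether the device may write. */
#define DEV_ENTRY_VALID        (1ULL << 0)
#define DEV_ENTRY_TRANSLATION  (1ULL << 1)
#define DEV_ENTRY_IW           (1ULL << 62)  /* write permission */

/* Crashed kernel: clear bit 62 but keep the entry valid, so stale
 * in-flight DMA writes fault instead of passing through untranslated. */
uint64_t strip_write_permission(uint64_t entry)
{
	return entry & ~DEV_ENTRY_IW;
}

/* Kdump kernel: the default entry is all zero except V and TV, i.e.
 * still no write permission until the device has been reset. */
uint64_t default_kdump_entry(void)
{
	return DEV_ENTRY_VALID | DEV_ENTRY_TRANSLATION;
}

/* Only after the driver has reset the device are DMA writes re-enabled. */
uint64_t allow_writes(uint64_t entry)
{
	return entry | DEV_ENTRY_IW;
}
```

The point of the model is the invariant it preserves: at every step the entry stays valid and translated, so a stray DMA is translated (and faults) rather than hitting memory untranslated.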
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 2:24 ` Vivek Goyal @ 2010-04-01 12:53 ` Neil Horman 2010-04-01 15:02 ` Vivek Goyal 0 siblings, 1 reply; 41+ messages in thread From: Neil Horman @ 2010-04-01 12:53 UTC (permalink / raw) To: Vivek Goyal Cc: Neil Horman, joerg.roedel, kexec, linux-kernel, Chris Wright, hbabu, iommu On Wed, Mar 31, 2010 at 10:24:18PM -0400, Vivek Goyal wrote: > On Wed, Mar 31, 2010 at 09:13:11PM -0400, Neil Horman wrote: > > On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote: > > > * Neil Horman (nhorman@tuxdriver.com) wrote: > > > > Flush iommu during shutdown > > > > > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > > > kernel crash, that dma operations might still be in flight from the previous > > > > kernel during the kdump kernel boot. This can lead to memory corruption, > > > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > > > a kdump boot as endless iommu error log entries of the form: > > > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > > > address=0x000000000245a0c0 flags=0x0070] > > > > > > We've already fixed this problem once before, so some code shift must > > > have brought it back. Personally, I prefer to do this on the bringup > > > path than the teardown path. Besides keeping the teardown path as > > > simple as possible (goal is to get to kdump kernel asap), there's also > > > reason to competely flush on startup in genernal in case BIOS has done > > > anything unsavory. > > > > > Chris, > > Can you elaborate on what you did with the iommu to make this safe? It > > will save me time digging through the history on this code, and help me > > understand better whats going on here. 
> > > > I was starting to think that we should just leave the iommu on through a kdump, > > and re-construct a new page table based on the old table (filtered by the error > > log) on kdump boot, but it sounds like a better solution might be in place. > > > > Hi Neil, > > Is following sequence possible. > > - In crashed kernel, take away the write permission from all the devices. > Mark bit 62 zero for all devices in device table. > > - Leave the iommu on and let the device entries be valid in kdump kernel > so that any in-flight dma does not become pass through (which can cause > more damage and corrupt kdump kernel). > > - During kdump kernel initialization, load a new device table where again > all the devices don't have write permission. looks like by default > we create a device table with all bits zero except DEV_ENTRY_VALID > and DEV_ENTRY_TRANSLATION bit. > > - Reset the device where we want to setup any dma or operate on. > > - Allow device to do DMA/write. > > So by default all the devices will not be able to do write to memory > and selective devices are given access only after a reset. > > I am not sure what are the dependencies for loading a new device table > in second kernel. If it requires disabling the IOMMU, then we leave a > window where in-flight dma will become passthrough and has the potential > to corrupt kdump kernel. > I think this is possible, but I'm a bit concerned with how some devices will handle a reset. For instance, what will happen to an HBA or a disk, if we reset it as the module is loading? Is that safe? Neil > Thanks > Vivek > > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 12:53 ` Neil Horman @ 2010-04-01 15:02 ` Vivek Goyal 2010-04-01 15:13 ` Neil Horman 0 siblings, 1 reply; 41+ messages in thread From: Vivek Goyal @ 2010-04-01 15:02 UTC (permalink / raw) To: Neil Horman Cc: Neil Horman, joerg.roedel, kexec, linux-kernel, Chris Wright, hbabu, iommu On Thu, Apr 01, 2010 at 08:53:04AM -0400, Neil Horman wrote: > On Wed, Mar 31, 2010 at 10:24:18PM -0400, Vivek Goyal wrote: > > On Wed, Mar 31, 2010 at 09:13:11PM -0400, Neil Horman wrote: > > > On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote: > > > > * Neil Horman (nhorman@tuxdriver.com) wrote: > > > > > Flush iommu during shutdown > > > > > > > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > > > > kernel crash, that dma operations might still be in flight from the previous > > > > > kernel during the kdump kernel boot. This can lead to memory corruption, > > > > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > > > > a kdump boot as endless iommu error log entries of the form: > > > > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > > > > address=0x000000000245a0c0 flags=0x0070] > > > > > > > > We've already fixed this problem once before, so some code shift must > > > > have brought it back. Personally, I prefer to do this on the bringup > > > > path than the teardown path. Besides keeping the teardown path as > > > > simple as possible (goal is to get to kdump kernel asap), there's also > > > > reason to competely flush on startup in genernal in case BIOS has done > > > > anything unsavory. > > > > > > > Chris, > > > Can you elaborate on what you did with the iommu to make this safe? It > > > will save me time digging through the history on this code, and help me > > > understand better whats going on here. 
> > > > > > I was starting to think that we should just leave the iommu on through a kdump, > > > and re-construct a new page table based on the old table (filtered by the error > > > log) on kdump boot, but it sounds like a better solution might be in place. > > > > > > > Hi Neil, > > > > Is following sequence possible. > > > > - In crashed kernel, take away the write permission from all the devices. > > Mark bit 62 zero for all devices in device table. > > > > - Leave the iommu on and let the device entries be valid in kdump kernel > > so that any in-flight dma does not become pass through (which can cause > > more damage and corrupt kdump kernel). > > > > - During kdump kernel initialization, load a new device table where again > > all the devices don't have write permission. looks like by default > > we create a device table with all bits zero except DEV_ENTRY_VALID > > and DEV_ENTRY_TRANSLATION bit. > > > > - Reset the device where we want to setup any dma or operate on. > > > > - Allow device to do DMA/write. > > > > So by default all the devices will not be able to do write to memory > > and selective devices are given access only after a reset. > > > > I am not sure what are the dependencies for loading a new device table > > in second kernel. If it requires disabling the IOMMU, then we leave a > > window where in-flight dma will become passthrough and has the potential > > to corrupt kdump kernel. > > > I think this is possible, but I'm a bit concerned with how some devices will > handle a reset. For instance, what will happen to an HBA or a disk, if we reset > it as the module is loading? Is that safe? I think we need to reset devices in driver if "reset_devices" is set. So we will not reset these during normal boot. Regarding being safe, I don't know. I am assuming that driver knows (or need to know), how to reset device safely while driver is initializing. 
That's the whole assumption kdump is built on: once the driver is initializing, it will first reset the device (if reset_devices is set), so that the chances of the device working properly in the second kernel increase. Vivek ^ permalink raw reply [flat|nested] 41+ messages in thread
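The convention Vivek describes — drivers consult the kernel-wide `reset_devices` flag during probe and only then force a hardware reset — can be illustrated with a hedged user-space sketch. The `my_probe`/`my_hw_reset` names are invented for a hypothetical device; in the kernel, `reset_devices` is a global set by the `reset_devices` boot parameter that kdump initrds pass.

```c
/* Stand-in for the kernel's global set by the "reset_devices" boot
 * parameter; a kdump boot sets it so drivers start from a clean state. */
static int reset_devices;

static int hw_was_reset; /* observable side effect for this sketch */

/* Invented hardware-reset hook for a hypothetical device. */
static void my_hw_reset(void)
{
	hw_was_reset = 1;
}

/* A probe routine following the convention: only force a full hardware
 * reset when the boot asked for it (e.g. in a kdump kernel), so a
 * normal boot is not slowed down or disturbed. */
static int my_probe(void)
{
	if (reset_devices)
		my_hw_reset(); /* quiesce state left by the crashed kernel */
	/* ... normal device init continues here ... */
	return 0;
}
```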
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 15:02 ` Vivek Goyal @ 2010-04-01 15:13 ` Neil Horman 0 siblings, 0 replies; 41+ messages in thread From: Neil Horman @ 2010-04-01 15:13 UTC (permalink / raw) To: Vivek Goyal Cc: Neil Horman, joerg.roedel, kexec, linux-kernel, Chris Wright, hbabu, iommu On Thu, Apr 01, 2010 at 11:02:03AM -0400, Vivek Goyal wrote: > On Thu, Apr 01, 2010 at 08:53:04AM -0400, Neil Horman wrote: > > On Wed, Mar 31, 2010 at 10:24:18PM -0400, Vivek Goyal wrote: > > > On Wed, Mar 31, 2010 at 09:13:11PM -0400, Neil Horman wrote: > > > > On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote: > > > > > * Neil Horman (nhorman@tuxdriver.com) wrote: > > > > > > Flush iommu during shutdown > > > > > > > > > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > > > > > kernel crash, that dma operations might still be in flight from the previous > > > > > > kernel during the kdump kernel boot. This can lead to memory corruption, > > > > > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > > > > > a kdump boot as endless iommu error log entries of the form: > > > > > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > > > > > address=0x000000000245a0c0 flags=0x0070] > > > > > > > > > > We've already fixed this problem once before, so some code shift must > > > > > have brought it back. Personally, I prefer to do this on the bringup > > > > > path than the teardown path. Besides keeping the teardown path as > > > > > simple as possible (goal is to get to kdump kernel asap), there's also > > > > > reason to competely flush on startup in genernal in case BIOS has done > > > > > anything unsavory. > > > > > > > > > Chris, > > > > Can you elaborate on what you did with the iommu to make this safe? It > > > > will save me time digging through the history on this code, and help me > > > > understand better whats going on here. 
> > > > > > > > I was starting to think that we should just leave the iommu on through a kdump, > > > > and re-construct a new page table based on the old table (filtered by the error > > > > log) on kdump boot, but it sounds like a better solution might be in place. > > > > > > > > > > Hi Neil, > > > > > > Is following sequence possible. > > > > > > - In crashed kernel, take away the write permission from all the devices. > > > Mark bit 62 zero for all devices in device table. > > > > > > - Leave the iommu on and let the device entries be valid in kdump kernel > > > so that any in-flight dma does not become pass through (which can cause > > > more damage and corrupt kdump kernel). > > > > > > - During kdump kernel initialization, load a new device table where again > > > all the devices don't have write permission. looks like by default > > > we create a device table with all bits zero except DEV_ENTRY_VALID > > > and DEV_ENTRY_TRANSLATION bit. > > > > > > - Reset the device where we want to setup any dma or operate on. > > > > > > - Allow device to do DMA/write. > > > > > > So by default all the devices will not be able to do write to memory > > > and selective devices are given access only after a reset. > > > > > > I am not sure what are the dependencies for loading a new device table > > > in second kernel. If it requires disabling the IOMMU, then we leave a > > > window where in-flight dma will become passthrough and has the potential > > > to corrupt kdump kernel. > > > > > I think this is possible, but I'm a bit concerned with how some devices will > > handle a reset. For instance, what will happen to an HBA or a disk, if we reset > > it as the module is loading? Is that safe? > > I think we need to reset devices in driver if "reset_devices" is set. So > we will not reset these during normal boot. > > Regarding being safe, I don't know. I am assuming that driver knows (or > need to know), how to reset device safely while driver is initializing. 
> That's the whole assumption kdump is built on, that once driver is > initializing, it will first reset the device (if reset_devices is set), so > that chances of device working properly in second kernel increase. > Yes, I agree. I was really just asking whether it's safe to unilaterally reset devices during boot. I suppose it is, but I'm not entirely sure. Neil > Vivek > ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 21:25 ` Chris Wright 2010-04-01 1:13 ` Neil Horman @ 2010-04-01 2:44 ` Vivek Goyal 2010-04-01 7:10 ` Chris Wright 1 sibling, 1 reply; 41+ messages in thread From: Vivek Goyal @ 2010-04-01 2:44 UTC (permalink / raw) To: Chris Wright; +Cc: Neil Horman, iommu, joerg.roedel, hbabu, kexec, linux-kernel On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote: > * Neil Horman (nhorman@tuxdriver.com) wrote: > > Flush iommu during shutdown > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > kernel crash, that dma operations might still be in flight from the previous > > kernel during the kdump kernel boot. This can lead to memory corruption, > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > a kdump boot as endless iommu error log entries of the form: > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > address=0x000000000245a0c0 flags=0x0070] > > We've already fixed this problem once before, so some code shift must > have brought it back. Personally, I prefer to do this on the bringup > path than the teardown path. Besides keeping the teardown path as > simple as possible (goal is to get to kdump kernel asap), there's also > reason to competely flush on startup in genernal in case BIOS has done > anything unsavory. Can we flush domains (all the I/O TLBs associated with each domain) during initialization? I think all the domain data built by the previous kernel will be lost, and the new kernel will have no idea about it. Thanks Vivek ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 2:44 ` Vivek Goyal @ 2010-04-01 7:10 ` Chris Wright 2010-04-01 12:56 ` Neil Horman 0 siblings, 1 reply; 41+ messages in thread From: Chris Wright @ 2010-04-01 7:10 UTC (permalink / raw) To: Vivek Goyal Cc: Chris Wright, Neil Horman, iommu, joerg.roedel, hbabu, kexec, linux-kernel * Vivek Goyal (vgoyal@redhat.com) wrote: > On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote: > > * Neil Horman (nhorman@tuxdriver.com) wrote: > > > Flush iommu during shutdown > > > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > > kernel crash, that dma operations might still be in flight from the previous > > > kernel during the kdump kernel boot. This can lead to memory corruption, > > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > > a kdump boot as endless iommu error log entries of the form: > > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > > address=0x000000000245a0c0 flags=0x0070] > > > > We've already fixed this problem once before, so some code shift must > > have brought it back. Personally, I prefer to do this on the bringup > > path than the teardown path. Besides keeping the teardown path as > > simple as possible (goal is to get to kdump kernel asap), there's also > > reason to competely flush on startup in genernal in case BIOS has done > > anything unsavory. > > Can we flush domains (all the I/O TLBs assciated with each domain), during > initialization? I think all the domain data built by previous kernel will > be lost and new kernel will have no idea about. We first invalidate the device table entry, so new translation requests will see the new domainid for a given BDF. Then we invalidate the whole set of page tables associated w/ the new domainid. Now all dma transactions will need page table walk (page tables will be empty excpet for any 1:1 mappings). 
Any old domainid's from previous kernel that aren't found in new device table entries are effectively moot. Just so happens that in kexec/kdump case, they'll be the same domainid's, but that doesn't matter. thanks, -chris ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 7:10 ` Chris Wright @ 2010-04-01 12:56 ` Neil Horman 0 siblings, 0 replies; 41+ messages in thread From: Neil Horman @ 2010-04-01 12:56 UTC (permalink / raw) To: Chris Wright Cc: Vivek Goyal, Neil Horman, joerg.roedel, kexec, linux-kernel, hbabu, iommu On Thu, Apr 01, 2010 at 12:10:40AM -0700, Chris Wright wrote: > * Vivek Goyal (vgoyal@redhat.com) wrote: > > On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote: > > > * Neil Horman (nhorman@tuxdriver.com) wrote: > > > > Flush iommu during shutdown > > > > > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > > > kernel crash, that dma operations might still be in flight from the previous > > > > kernel during the kdump kernel boot. This can lead to memory corruption, > > > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > > > a kdump boot as endless iommu error log entries of the form: > > > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > > > address=0x000000000245a0c0 flags=0x0070] > > > > > > We've already fixed this problem once before, so some code shift must > > > have brought it back. Personally, I prefer to do this on the bringup > > > path than the teardown path. Besides keeping the teardown path as > > > simple as possible (goal is to get to kdump kernel asap), there's also > > > reason to competely flush on startup in genernal in case BIOS has done > > > anything unsavory. > > > > Can we flush domains (all the I/O TLBs assciated with each domain), during > > initialization? I think all the domain data built by previous kernel will > > be lost and new kernel will have no idea about. > > We first invalidate the device table entry, so new translation requests > will see the new domainid for a given BDF. Then we invalidate the > whole set of page tables associated w/ the new domainid. 
Now all dma > transactions will need page table walk (page tables will be empty excpet > for any 1:1 mappings). Any old domainid's from previous kernel that > aren't found in new device table entries are effectively moot. Just so > happens that in kexec/kdump case, they'll be the same domainid's, but > that doesn't matter. > > thanks, > -chris > Additionally, Chris (this is just for my own education here): what happens when we disable the iommu while DMAs are in flight? I ask because, from what I read, my assumption is that the iommu effectively enters a passive mode where bus accesses from devices holding DMA addresses that were previously provided by an iommu translation will just get strobed onto the bus without being translated back to physical addresses. Won't that result in bus errors causing master aborts? If so, that would seem to be further cause to leave the iommu on during a crash/kdump boot. Neil > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 41+ messages in thread
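The invalidation ordering Chris describes — device table entry first, then the domain's cached translations, after which every DMA takes a fresh walk of empty page tables — can be modelled with a toy user-space sketch. All names and data structures here are invented for illustration; this is a model of the behavior, not the kernel's AMD IOMMU code.

```c
#include <stdint.h>

#define FAULT     UINT64_MAX
#define NUM_BDF   4
#define TLB_SLOTS 8

/* Toy device table: dte[bdf] is the domainid the device translates through. */
static uint16_t dte[NUM_BDF];

/* Toy IOTLB: cached iova -> phys translations tagged with a domainid. */
static struct tlb_ent { int valid; uint16_t domid; uint64_t iova, phys; } tlb[TLB_SLOTS];

static void tlb_insert(uint16_t domid, uint64_t iova, uint64_t phys)
{
	for (int i = 0; i < TLB_SLOTS; i++)
		if (!tlb[i].valid) {
			tlb[i].valid = 1;
			tlb[i].domid = domid;
			tlb[i].iova = iova;
			tlb[i].phys = phys;
			return;
		}
}

/* Step 1: (re)point the BDF at its domainid, i.e. the new device table
 * entry takes effect for subsequent lookups. */
static void set_dte(int bdf, uint16_t domid)
{
	dte[bdf] = domid;
}

/* Step 2: drop every cached translation for that domainid. */
static void flush_domain(uint16_t domid)
{
	for (int i = 0; i < TLB_SLOTS; i++)
		if (tlb[i].domid == domid)
			tlb[i].valid = 0;
}

/* A DMA from bdf: a TLB hit returns the cached phys; a miss walks the
 * (empty) page tables and faults -- which is what makes translations
 * left behind by the previous kernel harmless. */
static uint64_t translate(int bdf, uint64_t iova)
{
	uint16_t domid = dte[bdf];
	for (int i = 0; i < TLB_SLOTS; i++)
		if (tlb[i].valid && tlb[i].domid == domid && tlb[i].iova == iova)
			return tlb[i].phys;
	return FAULT; /* empty page tables: no translation */
}
```

In the kexec/kdump case the domainid happens to stay the same, so without the domain flush a stale cached translation from the crashed kernel would still hit; with both invalidations, every transaction faults against the empty tables until new mappings are installed.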
end of thread, other threads:[~2010-04-05 15:35 UTC | newest] Thread overview: 41+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-03-31 15:24 [PATCH] amd iommu: force flush of iommu prior during shutdown Neil Horman 2010-03-31 15:54 ` Vivek Goyal 2010-03-31 18:28 ` Neil Horman 2010-03-31 18:57 ` Eric W. Biederman 2010-03-31 19:18 ` Neil Horman 2010-03-31 19:51 ` Eric W. Biederman 2010-03-31 20:27 ` Neil Horman 2010-04-01 4:04 ` Eric W. Biederman 2010-04-01 12:49 ` Neil Horman 2010-04-01 14:29 ` Joerg Roedel 2010-04-01 14:47 ` Neil Horman 2010-04-01 15:56 ` Joerg Roedel 2010-04-01 17:11 ` Neil Horman 2010-04-01 20:14 ` Joerg Roedel 2010-04-02 0:00 ` Neil Horman 2010-04-02 0:30 ` Chris Wright 2010-04-02 1:23 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Chris Wright 2010-04-02 1:31 ` [PATCH 2/2] x86/amd-iommu: warn when issuing command to uninitiailed cmd buffer Chris Wright 2010-04-02 1:35 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Neil Horman 2010-04-02 1:38 ` Chris Wright 2010-04-02 9:11 ` Joerg Roedel 2010-04-02 23:59 ` Chris Wright 2010-04-02 15:59 ` Vivek Goyal 2010-04-02 22:38 ` Chris Wright 2010-04-02 22:55 ` Eric W. Biederman 2010-04-02 23:57 ` Chris Wright 2010-04-03 17:38 ` Joerg Roedel 2010-04-05 14:17 ` Vivek Goyal 2010-04-05 14:32 ` Joerg Roedel 2010-04-05 15:34 ` Neil Horman 2010-03-31 18:43 ` [PATCH] amd iommu: force flush of iommu prior during shutdown Eric W. Biederman 2010-03-31 21:25 ` Chris Wright 2010-04-01 1:13 ` Neil Horman 2010-04-01 1:39 ` Chris Wright 2010-04-01 2:24 ` Vivek Goyal 2010-04-01 12:53 ` Neil Horman 2010-04-01 15:02 ` Vivek Goyal 2010-04-01 15:13 ` Neil Horman 2010-04-01 2:44 ` Vivek Goyal 2010-04-01 7:10 ` Chris Wright 2010-04-01 12:56 ` Neil Horman