* [PATCH] amd iommu: force flush of iommu prior during shutdown
@ 2010-03-31 15:24 Neil Horman
  2010-03-31 15:54 ` Vivek Goyal
  2010-03-31 21:25 ` Chris Wright
  0 siblings, 2 replies; 41+ messages in thread
From: Neil Horman @ 2010-03-31 15:24 UTC (permalink / raw)
  To: iommu; +Cc: joerg.roedel, nhorman, vgoyal, hbabu, kexec, linux-kernel

Flush iommu during shutdown

When using an iommu, it's possible, if a kdump kernel boot follows a primary
kernel crash, that DMA operations might still be in flight from the previous
kernel during the kdump kernel boot. This can lead to memory corruption,
crashes, and other erroneous behavior. Specifically, I've seen it manifest
during a kdump boot as endless iommu error log entries of the form:

AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d
address=0x000000000245a0c0 flags=0x0070]

followed by an inability to access hard drives, and various other resources.

I've written this fix for it. In short, it just forces a flush of the
in-flight DMA operations on shutdown, so that the new kernel is certain not
to have any in-flight DMAs trying to complete after we've reset all the
iommu page tables, causing the above errors. I've tested it and it fixes
the problem for me quite well.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 amd_iommu_init.c |   25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/amd_iommu_init.c b/arch/x86/kernel/amd_iommu_init.c
index 9dc91b4..8fbdf58 100644
--- a/arch/x86/kernel/amd_iommu_init.c
+++ b/arch/x86/kernel/amd_iommu_init.c
@@ -265,8 +265,26 @@ static void iommu_enable(struct amd_iommu *iommu)
 	iommu_feature_enable(iommu, CONTROL_IOMMU_EN);
 }
 
-static void iommu_disable(struct amd_iommu *iommu)
+static void iommu_disable(struct amd_iommu *iommu, bool flush)
 {
+
+	/*
+	 * This ensures that all in-flight dmas for this iommu
+	 * are complete prior to shutting it down.
+	 * It's a bit racy, but I think it's ok, given that if we're flushing
+	 * we're in a shutdown path (either a graceful shutdown or a
+	 * crash leading to a kdump boot). That means we're down to one
+	 * cpu, and the other system hardware isn't going to issue
+	 * subsequent dma operations.
+	 * Also note that we gate the flushing on the flush boolean because
+	 * the enable_iommus path uses this function and we can't flush any
+	 * data in that path until later when the iommus are fully initialized.
+	 */
+	if (flush) {
+		amd_iommu_flush_all_devices();
+		amd_iommu_flush_all_domains();
+	}
+
 	/* Disable command buffer */
 	iommu_feature_disable(iommu, CONTROL_CMDBUF_EN);
 
@@ -276,6 +294,7 @@ static void iommu_disable(struct amd_iommu *iommu)
 
 	/* Disable IOMMU hardware itself */
 	iommu_feature_disable(iommu, CONTROL_IOMMU_EN);
+
 }
 
 /*
@@ -1114,7 +1133,7 @@ static void enable_iommus(void)
 	struct amd_iommu *iommu;
 
 	for_each_iommu(iommu) {
-		iommu_disable(iommu);
+		iommu_disable(iommu, false);
 		iommu_set_device_table(iommu);
 		iommu_enable_command_buffer(iommu);
 		iommu_enable_event_buffer(iommu);
@@ -1129,7 +1148,7 @@ static void disable_iommus(void)
 	struct amd_iommu *iommu;
 
 	for_each_iommu(iommu)
-		iommu_disable(iommu);
+		iommu_disable(iommu, true);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 15:24 [PATCH] amd iommu: force flush of iommu prior during shutdown Neil Horman @ 2010-03-31 15:54 ` Vivek Goyal 2010-03-31 18:28 ` Neil Horman 2010-03-31 18:43 ` [PATCH] amd iommu: force flush of iommu prior during shutdown Eric W. Biederman 2010-03-31 21:25 ` Chris Wright 1 sibling, 2 replies; 41+ messages in thread From: Vivek Goyal @ 2010-03-31 15:54 UTC (permalink / raw) To: Neil Horman Cc: iommu, joerg.roedel, hbabu, kexec, linux-kernel, Eric W. Biederman On Wed, Mar 31, 2010 at 11:24:17AM -0400, Neil Horman wrote: > Flush iommu during shutdown > > When using an iommu, its possible, if a kdump kernel boot follows a primary > kernel crash, that dma operations might still be in flight from the previous > kernel during the kdump kernel boot. This can lead to memory corruption, > crashes, and other erroneous behavior, specifically I've seen it manifest during > a kdump boot as endless iommu error log entries of the form: > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > address=0x000000000245a0c0 flags=0x0070] > > Followed by an inability to access hard drives, and various other resources. > > I've written this fix for it. In short it just forces a flush of the in flight > dma operations on shutdown, so that the new kernel is certain not to have any > in-flight dmas trying to complete after we've reset all the iommu page tables, > causing the above errors. I've tested it and it fixes the problem for me quite > well. CCing Eric also. Neil, this is interesting. In the past we noticed similar issues, especially on PPC. But I was told that we could not clear the iommu mapping entries as we had no control on in flight DMA and if a DMA comes later after clearing an entry and entry is not present, it is an error. Hence one of the suggestions was not to clear iommu mapping entries but reserve some for kdump operation and use those in kdump kernel. 
So this call amd_iommu_flush_all_devices() will be able to tell devices that don't do any more DMAs and hence it is safe to reprogram iommu mapping entries. Thanks Vivek > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > > > amd_iommu_init.c | 25 ++++++++++++++++++++++--- > 1 file changed, 22 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kernel/amd_iommu_init.c b/arch/x86/kernel/amd_iommu_init.c > index 9dc91b4..8fbdf58 100644 > --- a/arch/x86/kernel/amd_iommu_init.c > +++ b/arch/x86/kernel/amd_iommu_init.c > @@ -265,8 +265,26 @@ static void iommu_enable(struct amd_iommu *iommu) > iommu_feature_enable(iommu, CONTROL_IOMMU_EN); > } > > -static void iommu_disable(struct amd_iommu *iommu) > +static void iommu_disable(struct amd_iommu *iommu, bool flush) > { > + > + /* > + * This ensures that all in-flight dmas for this iommu > + * are complete prior to shutting it down > + * its a bit racy, but I think its ok, given that if we're flushing > + * we're in a shutdown path (either a graceful shutdown or a > + * crash leading to a kdump boot. That means we're down to one > + * cpu, and the other system hardware isn't going to issue > + * subsequent dma operations. 
> + * Also note that we gate the flusing on the flush boolean because > + * the enable_iommus path uses this function and we can't flush any > + * data in that path until later when the iommus are fully initialized > + */ > + if (flush) { > + amd_iommu_flush_all_devices(); > + amd_iommu_flush_all_domains(); > + } > + > /* Disable command buffer */ > iommu_feature_disable(iommu, CONTROL_CMDBUF_EN); > > @@ -276,6 +294,7 @@ static void iommu_disable(struct amd_iommu *iommu) > > /* Disable IOMMU hardware itself */ > iommu_feature_disable(iommu, CONTROL_IOMMU_EN); > + > } > > /* > @@ -1114,7 +1133,7 @@ static void enable_iommus(void) > struct amd_iommu *iommu; > > for_each_iommu(iommu) { > - iommu_disable(iommu); > + iommu_disable(iommu, false); > iommu_set_device_table(iommu); > iommu_enable_command_buffer(iommu); > iommu_enable_event_buffer(iommu); > @@ -1129,7 +1148,7 @@ static void disable_iommus(void) > struct amd_iommu *iommu; > > for_each_iommu(iommu) > - iommu_disable(iommu); > + iommu_disable(iommu, true); > } > > /* ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 15:54 ` Vivek Goyal @ 2010-03-31 18:28 ` Neil Horman 2010-03-31 18:57 ` Eric W. Biederman 2010-03-31 18:43 ` [PATCH] amd iommu: force flush of iommu prior during shutdown Eric W. Biederman 1 sibling, 1 reply; 41+ messages in thread From: Neil Horman @ 2010-03-31 18:28 UTC (permalink / raw) To: Vivek Goyal Cc: iommu, joerg.roedel, hbabu, kexec, linux-kernel, Eric W. Biederman On Wed, Mar 31, 2010 at 11:54:30AM -0400, Vivek Goyal wrote: > On Wed, Mar 31, 2010 at 11:24:17AM -0400, Neil Horman wrote: > > Flush iommu during shutdown > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > kernel crash, that dma operations might still be in flight from the previous > > kernel during the kdump kernel boot. This can lead to memory corruption, > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > a kdump boot as endless iommu error log entries of the form: > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > address=0x000000000245a0c0 flags=0x0070] > > > > Followed by an inability to access hard drives, and various other resources. > > > > I've written this fix for it. In short it just forces a flush of the in flight > > dma operations on shutdown, so that the new kernel is certain not to have any > > in-flight dmas trying to complete after we've reset all the iommu page tables, > > causing the above errors. I've tested it and it fixes the problem for me quite > > well. > > CCing Eric also. > > Neil, this is interesting. In the past we noticed similar issues, > especially on PPC. But I was told that we could not clear the iommu > mapping entries as we had no control on in flight DMA and if a DMA comes > later after clearing an entry and entry is not present, it is an error. 
>
Yes, the problem (as I understand it) is that the triggering of DMA
operations to/from a device doesn't have synchronization with the iommu
itself. I.e. to conduct a dma you have to:

1) map the in-memory buffer to a dma address using something like
   pci_map_single. This results (in systems with an iommu) in getting page
   table space allocated in the iommu for the translation.

2) trigger the dma to/from the device by tickling whatever hardware the
   device has mapped.

3) complete the dma by calling pci_unmap_single (or other function), which
   frees the page table space in the iommu.

The problem, exactly as you indicate, is that on a kdump panic, we might
boot the new kernel and re-enable the iommu with these dmas still in
flight. If we start messing about with the iommu page tables then, we
start getting all sorts of errors, and other various failures.

> Hence one of the suggestions was not to clear iommu mapping entries but
> reserve some for kdump operation and use those in kdump kernel.
>
Yeah, that's a solution, but it seems awfully complex to me. To do that, we
need to teach every iommu we support about kdump, by telling it how much
space to reserve, and when to use it and when not to (i.e. we'd have to
tell it to use the kdump space vs. the normal space dependent on the status
of the reset_devices flag, or something equally unpleasant). Actually,
thinking about it, I'm not sure that will even work, as IIRC the iommu only
has one page table base pointer. So we would either need to re-write that
pointer to point into the kdump kernel's memory space (invalidating the old
table entries, which perpetuates this bug), or we would need to further
enhance the iommu code to be able to access the old page tables via
read_from_oldmem/write_to_oldmem when booting a kdump kernel, wouldn't we?

Using this method, all we really do is try to ensure that, prior to
disabling the iommu, any pending dmas are complete. That way, when we
re-enable the iommu in the kdump kernel, we can safely manipulate the new
page tables, knowing that no pending dma is using them.

In fairness to this debate, my proposal does have a small race condition.
In the above sequence, because the cpu triggers a dma independently of the
setup of the mapping in the iommu, it is possible that a dma might be
triggered immediately after we flush the iotlb, which may leave an
in-flight dma pending while we boot the kdump kernel. In practice though,
this will never happen. By the time we arrive at this code, we've already
executed native_machine_crash_shutdown which:

1) halts all the other cpus in the system
2) disables local interrupts

Because of those two events, we're effectively on a path that we can't be
preempted from. So as long as we don't trigger any dma operations between
our return from iommu_shutdown and machine_kexec (which is the next call),
we're safe.

> So this call amd_iommu_flush_all_devices() will be able to tell devices
> that don't do any more DMAs and hence it is safe to reprogram iommu
> mapping entries.
>
It blocks the cpu until any pending DMA operations are complete. Hmm, as I
think about it, there is still a small possibility that a device like a NIC
which has several buffers pre-dma-mapped could start a new dma before we
completely disabled the iommu, although that's small. I never saw that in
my testing, but hitting that would be fairly difficult I think, since it's
literally just a few hundred cycles between the flush and the actual
hardware disable operation.

According to this though:
http://support.amd.com/us/Processor_TechDocs/34434-IOMMU-Rev_1.26_2-11-09.pdf
that window could be closed fairly easily, by simply disabling read and
write permissions for each device table entry prior to calling flush. If we
do that and then flush the device table, any subsequently started dma
operation would just get noted in the error log, which we could ignore,
since we're about to boot to the kdump kernel anyway.

Would you like me to respin w/ that modification?
Neil

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 18:28 ` Neil Horman @ 2010-03-31 18:57 ` Eric W. Biederman 2010-03-31 19:18 ` Neil Horman 0 siblings, 1 reply; 41+ messages in thread From: Eric W. Biederman @ 2010-03-31 18:57 UTC (permalink / raw) To: Neil Horman; +Cc: Vivek Goyal, iommu, joerg.roedel, hbabu, kexec, linux-kernel Neil Horman <nhorman@tuxdriver.com> writes: > On Wed, Mar 31, 2010 at 11:54:30AM -0400, Vivek Goyal wrote: >> So this call amd_iommu_flush_all_devices() will be able to tell devices >> that don't do any more DMAs and hence it is safe to reprogram iommu >> mapping entries. >> > It blocks the cpu until any pending DMA operations are complete. Hmm, as I > think about it, there is still a small possibility that a device like a NIC > which has several buffers pre-dma-mapped could start a new dma before we > completely disabled the iommu, althought thats small. I never saw that in my > testing, but hitting that would be fairly difficult I think, since its literally > just a few hundred cycles between the flush and the actual hardware disable > operation. > > According to this though: > http://support.amd.com/us/Processor_TechDocs/34434-IOMMU-Rev_1.26_2-11-09.pdf > That window could be closed fairly easily, but simply disabling read and write > permissions for each device table entry prior to calling flush. If we do that, > then flush the device table, any subsequently started dma operation would just > get noted in the error log, which we could ignore, since we're abot to boot to > the kdump kernel anyway. > > Would you like me to respin w/ that modification? Disabling permissions on all devices sounds good for the new virtualization capable iommus. I think older iommus will still be challenged. I think on x86 we have simply been able to avoid using those older iommus. I like the direction you are going but please let's put this in a paranoid iommu enable routine. 
Eric ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 18:57 ` Eric W. Biederman @ 2010-03-31 19:18 ` Neil Horman 2010-03-31 19:51 ` Eric W. Biederman 0 siblings, 1 reply; 41+ messages in thread From: Neil Horman @ 2010-03-31 19:18 UTC (permalink / raw) To: Eric W. Biederman Cc: Vivek Goyal, iommu, joerg.roedel, hbabu, kexec, linux-kernel On Wed, Mar 31, 2010 at 11:57:46AM -0700, Eric W. Biederman wrote: > Neil Horman <nhorman@tuxdriver.com> writes: > > > On Wed, Mar 31, 2010 at 11:54:30AM -0400, Vivek Goyal wrote: > > >> So this call amd_iommu_flush_all_devices() will be able to tell devices > >> that don't do any more DMAs and hence it is safe to reprogram iommu > >> mapping entries. > >> > > It blocks the cpu until any pending DMA operations are complete. Hmm, as I > > think about it, there is still a small possibility that a device like a NIC > > which has several buffers pre-dma-mapped could start a new dma before we > > completely disabled the iommu, althought thats small. I never saw that in my > > testing, but hitting that would be fairly difficult I think, since its literally > > just a few hundred cycles between the flush and the actual hardware disable > > operation. > > > > According to this though: > > http://support.amd.com/us/Processor_TechDocs/34434-IOMMU-Rev_1.26_2-11-09.pdf > > That window could be closed fairly easily, but simply disabling read and write > > permissions for each device table entry prior to calling flush. If we do that, > > then flush the device table, any subsequently started dma operation would just > > get noted in the error log, which we could ignore, since we're abot to boot to > > the kdump kernel anyway. > > > > Would you like me to respin w/ that modification? > > Disabling permissions on all devices sounds good for the new virtualization > capable iommus. I think older iommus will still be challenged. I think > on x86 we have simply been able to avoid using those older iommus. 
> > I like the direction you are going but please let's put this in a > paranoid iommu enable routine. > You mean like initialize the device table so that all devices are default disabled on boot, and then selectively enable them (perhaps during a device_attach)? I can give that a spin. Neil > Eric > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 19:18 ` Neil Horman @ 2010-03-31 19:51 ` Eric W. Biederman 2010-03-31 20:27 ` Neil Horman 0 siblings, 1 reply; 41+ messages in thread From: Eric W. Biederman @ 2010-03-31 19:51 UTC (permalink / raw) To: Neil Horman; +Cc: Vivek Goyal, iommu, joerg.roedel, hbabu, kexec, linux-kernel Neil Horman <nhorman@tuxdriver.com> writes: > On Wed, Mar 31, 2010 at 11:57:46AM -0700, Eric W. Biederman wrote: >> Neil Horman <nhorman@tuxdriver.com> writes: >> >> > On Wed, Mar 31, 2010 at 11:54:30AM -0400, Vivek Goyal wrote: >> >> >> So this call amd_iommu_flush_all_devices() will be able to tell devices >> >> that don't do any more DMAs and hence it is safe to reprogram iommu >> >> mapping entries. >> >> >> > It blocks the cpu until any pending DMA operations are complete. Hmm, as I >> > think about it, there is still a small possibility that a device like a NIC >> > which has several buffers pre-dma-mapped could start a new dma before we >> > completely disabled the iommu, althought thats small. I never saw that in my >> > testing, but hitting that would be fairly difficult I think, since its literally >> > just a few hundred cycles between the flush and the actual hardware disable >> > operation. >> > >> > According to this though: >> > http://support.amd.com/us/Processor_TechDocs/34434-IOMMU-Rev_1.26_2-11-09.pdf >> > That window could be closed fairly easily, but simply disabling read and write >> > permissions for each device table entry prior to calling flush. If we do that, >> > then flush the device table, any subsequently started dma operation would just >> > get noted in the error log, which we could ignore, since we're abot to boot to >> > the kdump kernel anyway. >> > >> > Would you like me to respin w/ that modification? >> >> Disabling permissions on all devices sounds good for the new virtualization >> capable iommus. I think older iommus will still be challenged. 
I think >> on x86 we have simply been able to avoid using those older iommus. >> >> I like the direction you are going but please let's put this in a >> paranoid iommu enable routine. >> > You mean like initialize the device table so that all devices are default > disabled on boot, and then selectively enable them (perhaps during a > device_attach)? I can give that a spin. That sounds good. Eric ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 19:51 ` Eric W. Biederman @ 2010-03-31 20:27 ` Neil Horman 2010-04-01 4:04 ` Eric W. Biederman 2010-04-01 14:29 ` Joerg Roedel 0 siblings, 2 replies; 41+ messages in thread From: Neil Horman @ 2010-03-31 20:27 UTC (permalink / raw) To: Eric W. Biederman Cc: Vivek Goyal, iommu, joerg.roedel, hbabu, kexec, linux-kernel On Wed, Mar 31, 2010 at 12:51:25PM -0700, Eric W. Biederman wrote: > Neil Horman <nhorman@tuxdriver.com> writes: > > > On Wed, Mar 31, 2010 at 11:57:46AM -0700, Eric W. Biederman wrote: > >> Neil Horman <nhorman@tuxdriver.com> writes: > >> > >> > On Wed, Mar 31, 2010 at 11:54:30AM -0400, Vivek Goyal wrote: > >> > >> >> So this call amd_iommu_flush_all_devices() will be able to tell devices > >> >> that don't do any more DMAs and hence it is safe to reprogram iommu > >> >> mapping entries. > >> >> > >> > It blocks the cpu until any pending DMA operations are complete. Hmm, as I > >> > think about it, there is still a small possibility that a device like a NIC > >> > which has several buffers pre-dma-mapped could start a new dma before we > >> > completely disabled the iommu, althought thats small. I never saw that in my > >> > testing, but hitting that would be fairly difficult I think, since its literally > >> > just a few hundred cycles between the flush and the actual hardware disable > >> > operation. > >> > > >> > According to this though: > >> > http://support.amd.com/us/Processor_TechDocs/34434-IOMMU-Rev_1.26_2-11-09.pdf > >> > That window could be closed fairly easily, but simply disabling read and write > >> > permissions for each device table entry prior to calling flush. If we do that, > >> > then flush the device table, any subsequently started dma operation would just > >> > get noted in the error log, which we could ignore, since we're abot to boot to > >> > the kdump kernel anyway. > >> > > >> > Would you like me to respin w/ that modification? 
> >>
> >> Disabling permissions on all devices sounds good for the new virtualization
> >> capable iommus. I think older iommus will still be challenged. I think
> >> on x86 we have simply been able to avoid using those older iommus.
> >>
> >> I like the direction you are going but please let's put this in a
> >> paranoid iommu enable routine.
> >>
> > You mean like initialize the device table so that all devices are default
> > disabled on boot, and then selectively enable them (perhaps during a
> > device_attach)? I can give that a spin.
>
> That sounds good.
>
So I'm officially rescinding this patch. It apparently just covered up the
problem, rather than solved it outright. This is going to take some more
thought on my part. I read the code a bit closer, and the amd iommu on boot
up currently marks all its entries as valid and having a valid translation
(because if they're marked as invalid they're passed through untranslated,
which strikes me as dangerous, since it means a dma address treated as a
bus address could lead to memory corruption). The saving grace is that they
are marked as non-readable and non-writeable, so any device doing a dma
after the reinit should get logged (which it does), and then target aborted
(which should effectively squash the translation).

I'm starting to wonder if:

1) some dmas are so long lived they start aliasing new dmas that get mapped
   in the kdump kernel, leading to various erroneous behavior

or

2) a slew of target aborts to some hardware results in them being in an
   inconsistent state

I'm going to try marking the dev table on shutdown such that all devices
have no read/write permissions to see if that changes the situation. It
should, I think, give me a pointer as to whether (1) or (2) is the more
likely problem.

Lots more thinking to do....
Neil > Eric > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 20:27 ` Neil Horman @ 2010-04-01 4:04 ` Eric W. Biederman 2010-04-01 12:49 ` Neil Horman 2010-04-01 14:29 ` Joerg Roedel 1 sibling, 1 reply; 41+ messages in thread From: Eric W. Biederman @ 2010-04-01 4:04 UTC (permalink / raw) To: Neil Horman; +Cc: Vivek Goyal, iommu, joerg.roedel, hbabu, kexec, linux-kernel Neil Horman <nhorman@tuxdriver.com> writes: > On Wed, Mar 31, 2010 at 12:51:25PM -0700, Eric W. Biederman wrote: >> Neil Horman <nhorman@tuxdriver.com> writes: >> >> > On Wed, Mar 31, 2010 at 11:57:46AM -0700, Eric W. Biederman wrote: >> >> Neil Horman <nhorman@tuxdriver.com> writes: >> >> >> >> > On Wed, Mar 31, 2010 at 11:54:30AM -0400, Vivek Goyal wrote: >> >> >> >> >> So this call amd_iommu_flush_all_devices() will be able to tell devices >> >> >> that don't do any more DMAs and hence it is safe to reprogram iommu >> >> >> mapping entries. >> >> >> >> >> > It blocks the cpu until any pending DMA operations are complete. Hmm, as I >> >> > think about it, there is still a small possibility that a device like a NIC >> >> > which has several buffers pre-dma-mapped could start a new dma before we >> >> > completely disabled the iommu, althought thats small. I never saw that in my >> >> > testing, but hitting that would be fairly difficult I think, since its literally >> >> > just a few hundred cycles between the flush and the actual hardware disable >> >> > operation. >> >> > >> >> > According to this though: >> >> > http://support.amd.com/us/Processor_TechDocs/34434-IOMMU-Rev_1.26_2-11-09.pdf >> >> > That window could be closed fairly easily, but simply disabling read and write >> >> > permissions for each device table entry prior to calling flush. If we do that, >> >> > then flush the device table, any subsequently started dma operation would just >> >> > get noted in the error log, which we could ignore, since we're abot to boot to >> >> > the kdump kernel anyway. 
>> >> > >> >> > Would you like me to respin w/ that modification? >> >> >> >> Disabling permissions on all devices sounds good for the new virtualization >> >> capable iommus. I think older iommus will still be challenged. I think >> >> on x86 we have simply been able to avoid using those older iommus. >> >> >> >> I like the direction you are going but please let's put this in a >> >> paranoid iommu enable routine. >> >> >> > You mean like initialize the device table so that all devices are default >> > disabled on boot, and then selectively enable them (perhaps during a >> > device_attach)? I can give that a spin. >> >> That sounds good. >> > > So I'm officially rescinding this patch. It apparently just covered up the > problem, rather than solved it outright. This is going to take some more > thought on my part. I read the code a bit closer, and the amd iommu on boot up > currently marks all its entries as valid and having a valid translation (because > if they're marked as invalid they're passed through untranslated which strikes > me as dangerous, since it means a dma address treated as a bus address could > lead to memory corruption. The saving grace is that they are marked as > non-readable and non-writeable, so any device doing a dma after the reinit > should get logged (which it does), and then target aborted (which should > effectively squash the translation) > > I'm starting to wonder if: > > 1) some dmas are so long lived they start aliasing new dmas that get mapped in > the kdump kernel leading to various erroneous behavior I do know things like arp refreshes used to cause me trouble. I have a particular memory of kexec into memtest86 and a little while later memory corruption. > 2) a slew of target aborts to some hardware results in them being in an > inconsistent state > > I'm going to try marking the dev table on shutdown such that all devices have no > read/write permissions to see if that changes the situation. 
> It should I think
> give me a pointer as to weather (1) or (2) is the more likely problem.
>
> Lots more thinking to do....

I guess I can see devices getting confused by target aborts.
I'm wondering if (a) we can suppress these DMAs, or (b) we can reset the pci
devices before we use them. With pcie that should be possible.

We used to be able to simply not use the IOMMU on x86 and avoid this trouble.
Now with per-device enables it looks like we need to do something with it.

Eric

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 4:04 ` Eric W. Biederman @ 2010-04-01 12:49 ` Neil Horman 0 siblings, 0 replies; 41+ messages in thread From: Neil Horman @ 2010-04-01 12:49 UTC (permalink / raw) To: Eric W. Biederman Cc: Vivek Goyal, iommu, joerg.roedel, hbabu, kexec, linux-kernel On Wed, Mar 31, 2010 at 09:04:27PM -0700, Eric W. Biederman wrote: > Neil Horman <nhorman@tuxdriver.com> writes: > > > On Wed, Mar 31, 2010 at 12:51:25PM -0700, Eric W. Biederman wrote: > >> Neil Horman <nhorman@tuxdriver.com> writes: > >> > >> > On Wed, Mar 31, 2010 at 11:57:46AM -0700, Eric W. Biederman wrote: > >> >> Neil Horman <nhorman@tuxdriver.com> writes: > >> >> > >> >> > On Wed, Mar 31, 2010 at 11:54:30AM -0400, Vivek Goyal wrote: > >> >> > >> >> >> So this call amd_iommu_flush_all_devices() will be able to tell devices > >> >> >> that don't do any more DMAs and hence it is safe to reprogram iommu > >> >> >> mapping entries. > >> >> >> > >> >> > It blocks the cpu until any pending DMA operations are complete. Hmm, as I > >> >> > think about it, there is still a small possibility that a device like a NIC > >> >> > which has several buffers pre-dma-mapped could start a new dma before we > >> >> > completely disabled the iommu, althought thats small. I never saw that in my > >> >> > testing, but hitting that would be fairly difficult I think, since its literally > >> >> > just a few hundred cycles between the flush and the actual hardware disable > >> >> > operation. > >> >> > > >> >> > According to this though: > >> >> > http://support.amd.com/us/Processor_TechDocs/34434-IOMMU-Rev_1.26_2-11-09.pdf > >> >> > That window could be closed fairly easily, but simply disabling read and write > >> >> > permissions for each device table entry prior to calling flush. 
If we do that, > >> >> > then flush the device table, any subsequently started dma operation would just > >> >> > get noted in the error log, which we could ignore, since we're abot to boot to > >> >> > the kdump kernel anyway. > >> >> > > >> >> > Would you like me to respin w/ that modification? > >> >> > >> >> Disabling permissions on all devices sounds good for the new virtualization > >> >> capable iommus. I think older iommus will still be challenged. I think > >> >> on x86 we have simply been able to avoid using those older iommus. > >> >> > >> >> I like the direction you are going but please let's put this in a > >> >> paranoid iommu enable routine. > >> >> > >> > You mean like initialize the device table so that all devices are default > >> > disabled on boot, and then selectively enable them (perhaps during a > >> > device_attach)? I can give that a spin. > >> > >> That sounds good. > >> > > > > So I'm officially rescinding this patch. It apparently just covered up the > > problem, rather than solved it outright. This is going to take some more > > thought on my part. I read the code a bit closer, and the amd iommu on boot up > > currently marks all its entries as valid and having a valid translation (because > > if they're marked as invalid they're passed through untranslated which strikes > > me as dangerous, since it means a dma address treated as a bus address could > > lead to memory corruption. The saving grace is that they are marked as > > non-readable and non-writeable, so any device doing a dma after the reinit > > should get logged (which it does), and then target aborted (which should > > effectively squash the translation) > > > > I'm starting to wonder if: > > > > 1) some dmas are so long lived they start aliasing new dmas that get mapped in > > the kdump kernel leading to various erroneous behavior > > I do know things like arp refreshes used to cause me trouble. 
I have > a particular memory of kexec into memtest86 and a little while later > memory corruption. > > > > 2) a slew of target aborts to some hardware results in them being in an > > inconsistent state > > > > I'm going to try marking the dev table on shutdown such that all devices have no > > read/write permissions to see if that changes the situation. It should I think > > give me a pointer as to whether (1) or (2) is the more likely problem. > > > > Lots more thinking to do.... > > I guess I can see devices getting confused by target aborts. > I'm wondering if (a) we can suppress these DMAs. or (b) we can reset the pci > devices before we use them. With pcie that should be possible. > > We used to be able simply not to use the IOMMU in x86 and avoid this trouble. > Now with per device enables it looks like we need to do something with it. > Agreed. Another strategy I think worth considering is:

1) Leave the iommu on throughout the kdump reboot process, so as to allow dma's to complete without any interference
2) Make sure the log configuration is enabled prior to kdump reboot
3) Flush the page table cache in the iommu prior to shutdown
4) On re-init in the kdump kernel, query the log. If it's non-empty, recognize that we're in a kdump boot, and instead of creating a new devtable/page table/domain table, just clone the old ones from the previous kernel's memory
5) Use the log to detect which entries in the iommu have been touched, and assume those touched entries are done, marking them as invalid, until a minimum of free entries in the table are available
6) Continue use with those available free entries

With this approach, we could let the 'old' dmas complete without interference, and just allocate new dma's from the unused entries in the new kernel. Just a thought. Neil > Eric > > ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 20:27 ` Neil Horman 2010-04-01 4:04 ` Eric W. Biederman @ 2010-04-01 14:29 ` Joerg Roedel 2010-04-01 14:47 ` Neil Horman 1 sibling, 1 reply; 41+ messages in thread From: Joerg Roedel @ 2010-04-01 14:29 UTC (permalink / raw) To: Neil Horman Cc: Eric W. Biederman, kexec, linux-kernel, hbabu, iommu, Vivek Goyal Hi Neil, first some general words about the problem you discovered: The problem is not caused by in-flight DMA. The problem is that the IOMMU hardware has cached the old DTE entry for the device (including the old page-table root pointer) and is still using it when the kdump kernel has booted. We had this problem once and fixed it by flushing a DTE in the IOMMU before it is used for the first time. This seems to be broken now. Which kernel have you seen this on? I am back in the office next Tuesday and will look into this problem too. On Wed, Mar 31, 2010 at 04:27:45PM -0400, Neil Horman wrote: > So I'm officially rescinding this patch. Yeah, the right solution to this problem is to find out why every DTE is no longer flushed before first use. > It apparently just covered up the problem, rather than solved it > outright. This is going to take some more thought on my part. I read > the code a bit closer, and the amd iommu on boot up currently marks > all its entries as valid and having a valid translation (because if > they're marked as invalid they're passed through untranslated which > strikes me as dangerous, since it means a dma address treated as a bus > address could lead to memory corruption. The saving grace is that > they are marked as non-readable and non-writeable, so any device doing > a dma after the reinit should get logged (which it does), and then > target aborted (which should effectively squash the translation) Right. The default for all devices is to forbid DMA. 
> I'm starting to wonder if: > > 1) some dmas are so long lived they start aliasing new dmas that get mapped in > the kdump kernel leading to various erroneous behavior At least not in this case. Even when this is true the DMA would target memory of the crashed kernel and not the kdump area. This is not even memory corruption because the device will write to memory the driver has allocated for it. > 2) a slew of target aborts to some hardware results in them being in an > inconsistent state That's indeed true. I have seen that with ixgbe cards for example. They seem to be really confused after a target abort. > I'm going to try marking the dev table on shutdown such that all devices have no > read/write permissions to see if that changes the situation. It should I think > give me a pointer as to whether (1) or (2) is the more likely problem. Probably not. You still need to flush the old entries out of the IOMMU. Thanks, Joerg ^ permalink raw reply [flat|nested] 41+ messages in thread
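Joerg's diagnosis above, a stale cached DTE rather than in-flight DMA, can be illustrated with a toy userspace model (hypothetical names, not driver code): the hardware keeps consulting its cached copy of a device table entry until an explicit invalidate evicts it, so a kdump kernel that rewrites the table in memory without flushing still gets the old page-table root.

```c
#include <assert.h>

#define NUM_DEVS 4

/* Toy model: "dte_mem" is the device table the kernel writes in RAM,
 * "dte_cache" is the copy the IOMMU hardware actually consults. */
static int dte_mem[NUM_DEVS];   /* page-table root per device, in RAM */
static int dte_cache[NUM_DEVS]; /* IOMMU's cached copy                */
static int cache_valid[NUM_DEVS];

/* Hardware walk: uses the cached DTE if present, else fetches from RAM. */
static int iommu_lookup(int dev)
{
    if (!cache_valid[dev]) {
        dte_cache[dev] = dte_mem[dev];
        cache_valid[dev] = 1;
    }
    return dte_cache[dev];
}

/* Models an INVALIDATE_DEVTAB_ENTRY command: drop the cached copy so
 * the next walk refetches from the (new) table in memory. */
static void flush_dte(int dev)
{
    cache_valid[dev] = 0;
}
```

Without the flush, the second lookup still returns the old kernel's root pointer, which is exactly the stale-translation symptom described in the thread.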
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 14:29 ` Joerg Roedel @ 2010-04-01 14:47 ` Neil Horman 2010-04-01 15:56 ` Joerg Roedel 0 siblings, 1 reply; 41+ messages in thread From: Neil Horman @ 2010-04-01 14:47 UTC (permalink / raw) To: Joerg Roedel Cc: Eric W. Biederman, kexec, linux-kernel, hbabu, iommu, Vivek Goyal On Thu, Apr 01, 2010 at 04:29:02PM +0200, Joerg Roedel wrote: > Hi Neil, > > first some general words about the problem you discovered: The problem > is not caused by in-flight DMA. The problem is that the IOMMU hardware > has cached the old DTE entry for the device (including the old > page-table root pointer) and is using it still when the kdump kernel has > booted. We had this problem once and fixed it by flushing a DTE in the > IOMMU before it is used for the first time. This seems to be broken > now. Which kernel have you seen this on? > First noted on 2.6.32 (the RHEL6 beta kernel), but I've reproduced with the latest linus tree as well. > I am back in office next tuesday and will look into this problem too. > Thank you. > On Wed, Mar 31, 2010 at 04:27:45PM -0400, Neil Horman wrote: > > So I'm officially rescinding this patch. > > Yeah, the right solution to this problem is to find out why every DTE is > not longer flushed before first use. > Right, I've checked the commits that chris noted in his previous email and they're in place, so I'm not sure how we're getting stale dte's > > It apparently just covered up the problem, rather than solved it > > outright. This is going to take some more thought on my part. I read > > the code a bit closer, and the amd iommu on boot up currently marks > > all its entries as valid and having a valid translation (because if > > they're marked as invalid they're passed through untranslated which > > strikes me as dangerous, since it means a dma address treated as a bus > > address could lead to memory corruption. 
The saving grace is that > > they are marked as non-readable and non-writeable, so any device doing > > a dma after the reinit should get logged (which it does), and then > > target aborted (which should effectively squash the translation) > > Right. The default for all devices is to forbid DMA. > Thanks, glad to know I read that right, took me a bit to understand it :) > > I'm starting to wonder if: > > > > 1) some dmas are so long lived they start aliasing new dmas that get mapped in > > the kdump kernel leading to various erroneous behavior > > At least not in this case. Even when this is true the DMA would target > memory of the crashed kernel and not the kdump area. This is not even > memory corruption because the device will write to memory the driver has > allocated for it. > Yeah, I figured that old dma's going to old locations were ok. I was more concerned that an 'old' dma might live through our resetting of the iommu page table, leading to us pointing an old dma address to a new physical address within the new kernel memory space. Although, given the reset state of the tables, for that to happen a dma would have to not attempt a memory transaction until sometime later in the boot, which seems...unlikely to say the least, so I agree this is almost certainly not happening. > > 2) a slew of target aborts to some hardware results in them being in an > > inconsistent state > > Thats indeed true. I have seen that with ixgbe cards for example. They > seem to be really confused after an target abort. > Yeah, this part worries me, target aborts lead to various brain dead hardware pieces. What are your thoughts on leaving the iommu on through a reboot to avoid this issue (possibly resetting any pci device that encounters a target abort, as noted in the error log on the iommu)?
It should I think > > give me a pointer as to whether (1) or (2) is the more likely problem. > > Probably not. You still need to flush the old entries out of the IOMMU. > Yeah, after reading your explanation above, I agree Neil > ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 14:47 ` Neil Horman @ 2010-04-01 15:56 ` Joerg Roedel 2010-04-01 17:11 ` Neil Horman 0 siblings, 1 reply; 41+ messages in thread From: Joerg Roedel @ 2010-04-01 15:56 UTC (permalink / raw) To: Neil Horman Cc: Eric W. Biederman, kexec, linux-kernel, hbabu, iommu, Vivek Goyal On Thu, Apr 01, 2010 at 10:47:36AM -0400, Neil Horman wrote: > On Thu, Apr 01, 2010 at 04:29:02PM +0200, Joerg Roedel wrote: > > I am back in office next tuesday and will look into this problem too. > > > Thank you. Just took a look and I think the problem is that the devices are attached to domains before the IOMMU hardware is enabled. This happens in the function prealloc_protection_domains(). The attach code issues the dte-invalidate commands but they are not executed because the hardware is off. I will verify this when I have access to hardware again. The possible fix will be to enable the hardware earlier in the initialization path. > > Right. The default for all devices is to forbid DMA. > > > Thanks, glad to know I read that right, took me a bit to understand it :) I should probably add a comment :-) > > Thats indeed true. I have seen that with ixgbe cards for example. They > > seem to be really confused after an target abort. > > > Yeah, this part worries me, target aborts lead to various brain dead hardware > pieces. What are you thoughts on leaving the iommu on through a reboot to avoid > this issue (possibly resetting any pci device that encounters a target abort, as > noted in the error log on the iommu? This would only prevent possible data corruption. When the IOMMU is off the devices will not get a target abort but will only write to different physical memory locations. The window where a target abort can happen starts when the kdump kernel re-enables the IOMMU and ends when the new driver for that device attaches. This is a small window but there is not a lot we can do to avoid this small time window. 
Joerg ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 15:56 ` Joerg Roedel @ 2010-04-01 17:11 ` Neil Horman 2010-04-01 20:14 ` Joerg Roedel 0 siblings, 1 reply; 41+ messages in thread From: Neil Horman @ 2010-04-01 17:11 UTC (permalink / raw) To: Joerg Roedel Cc: Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal On Thu, Apr 01, 2010 at 05:56:43PM +0200, Joerg Roedel wrote: > On Thu, Apr 01, 2010 at 10:47:36AM -0400, Neil Horman wrote: > > On Thu, Apr 01, 2010 at 04:29:02PM +0200, Joerg Roedel wrote: > > > I am back in office next tuesday and will look into this problem too. > > > > > Thank you. > > Just took a look and I think the problem is that the devices are > attached to domains before the IOMMU hardware is enabled. This happens > in the function prealloc_protection_domains(). The attach code issues > the dte-invalidate commands but they are not executed because the > hardware is off. I will verify this when I have access to hardware > again. > The possible fix will be to enable the hardware earlier in the > initialization path. > That sounds like a reasonable theory, I'll try hack something together shortly. > > > Right. The default for all devices is to forbid DMA. > > > > > Thanks, glad to know I read that right, took me a bit to understand it :) > > I should probably add a comment :-) > > > > Thats indeed true. I have seen that with ixgbe cards for example. They > > > seem to be really confused after an target abort. > > > > > Yeah, this part worries me, target aborts lead to various brain dead hardware > > pieces. What are you thoughts on leaving the iommu on through a reboot to avoid > > this issue (possibly resetting any pci device that encounters a target abort, as > > noted in the error log on the iommu? > > This would only prevent possible data corruption. When the IOMMU is off > the devices will not get a target abort but will only write to different > physical memory locations. 
The window where a target abort can happen > starts when the kdump kernel re-enables the IOMMU and ends when the new > driver for that device attaches. This is a small window but there is not > a lot we can do to avoid this small time window. > Can you explain this a bit further please? From what I read, when the iommu is disabled, AIUI it does no translations. That means that any dma addresses which the driver mapped via the iommu prior to a crash that are stored in devices will just get strobed on the bus without any translation. If those dma addresses do not lie on top of any physical ram, won't that lead to bus errors, and transaction aborts? Worse, if those dma addresses do lie on top of real physical addresses, won't we get corruption in various places? Or am I missing part of how that works? Regards Neil > Joerg > > > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 17:11 ` Neil Horman @ 2010-04-01 20:14 ` Joerg Roedel 2010-04-02 0:00 ` Neil Horman 0 siblings, 1 reply; 41+ messages in thread From: Joerg Roedel @ 2010-04-01 20:14 UTC (permalink / raw) To: Neil Horman Cc: Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal On Thu, Apr 01, 2010 at 01:11:49PM -0400, Neil Horman wrote: > On Thu, Apr 01, 2010 at 05:56:43PM +0200, Joerg Roedel wrote: > > The possible fix will be to enable the hardware earlier in the > > initialization path. > > > That sounds like a reasonable theory, I'll try hack something together > shortly. Great. So the problem might be already fixed when I am back in the office ;-) > > This would only prevent possible data corruption. When the IOMMU is off > > the devices will not get a target abort but will only write to different > > physical memory locations. The window where a target abort can happen > > starts when the kdump kernel re-enables the IOMMU and ends when the new > > driver for that device attaches. This is a small window but there is not > > a lot we can do to avoid this small time window. > > > Can you explain this a bit further please? From what I read, when the iommu is > disabled, AIUI it does no translations. That means that any dma addresses which > the driver mapped via the iommu prior to a crash that are stored in devices will > just get strobed on the bus without any translation. If those dma address do > not lay on top of any physical ram, won't that lead to bus errors, and > transaction aborts? Worse, if those dma addresses do lie on top of real > physical addresses, won't we get corruption in various places? Or am I missing > part of how that works? Hm, the device address may not be a valid host physical address, thats true. But the problem with the small time-window when the IOMMU hardware is re-programmed from the kdump kernel still exists. 
I need to think about other possible side-effects of leaving the IOMMU enabled on shutdown^Wboot into a kdump kernel. Joerg ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 20:14 ` Joerg Roedel @ 2010-04-02 0:00 ` Neil Horman 2010-04-02 0:30 ` Chris Wright 0 siblings, 1 reply; 41+ messages in thread From: Neil Horman @ 2010-04-02 0:00 UTC (permalink / raw) To: Joerg Roedel Cc: Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal On Thu, Apr 01, 2010 at 10:14:34PM +0200, Joerg Roedel wrote: > On Thu, Apr 01, 2010 at 01:11:49PM -0400, Neil Horman wrote: > > On Thu, Apr 01, 2010 at 05:56:43PM +0200, Joerg Roedel wrote: > > > > The possible fix will be to enable the hardware earlier in the > > > initialization path. > > > > > That sounds like a reasonable theory, I'll try hack something together > > shortly. > > Great. So the problem might be already fixed when I am back in the > office ;-) > Don't hold your breath, but I'll try my best :) > > > This would only prevent possible data corruption. When the IOMMU is off > > > the devices will not get a target abort but will only write to different > > > physical memory locations. The window where a target abort can happen > > > starts when the kdump kernel re-enables the IOMMU and ends when the new > > > driver for that device attaches. This is a small window but there is not > > > a lot we can do to avoid this small time window. > > > > > Can you explain this a bit further please? From what I read, when the iommu is > > disabled, AIUI it does no translations. That means that any dma addresses which > > the driver mapped via the iommu prior to a crash that are stored in devices will > > just get strobed on the bus without any translation. If those dma address do > > not lay on top of any physical ram, won't that lead to bus errors, and > > transaction aborts? Worse, if those dma addresses do lie on top of real > > physical addresses, won't we get corruption in various places? Or am I missing > > part of how that works? 
> > Hm, the device address may not be a valid host physical address, thats > true. But the problem with the small time-window when the IOMMU hardware > is re-programmed from the kdump kernel still exists. > I need to think about other possible side-effects of leaving the IOMMU > enabled on shutdown^Wboot into a kdump kernel. > I think its an interesting angle to consider. Thats why I was talking about cloning the old tables in the new kdump kernel and using the error log to filter out entries that we could safely assume were complete until enough of the iommu page tables were free, so that we could continue to hobble along in the kdump kernel until we got to a proper reboot. All just thought experiment of course. I'll try tinkering with your idea above first. Neil > Joerg > > ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-02 0:00 ` Neil Horman @ 2010-04-02 0:30 ` Chris Wright 2010-04-02 1:23 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Chris Wright 0 siblings, 1 reply; 41+ messages in thread From: Chris Wright @ 2010-04-02 0:30 UTC (permalink / raw) To: Neil Horman Cc: Joerg Roedel, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal * Neil Horman (nhorman@tuxdriver.com) wrote: > On Thu, Apr 01, 2010 at 10:14:34PM +0200, Joerg Roedel wrote: > > On Thu, Apr 01, 2010 at 01:11:49PM -0400, Neil Horman wrote: > > > On Thu, Apr 01, 2010 at 05:56:43PM +0200, Joerg Roedel wrote: > > > > > > The possible fix will be to enable the hardware earlier in the > > > > initialization path. > > > > > > > That sounds like a reasonable theory, I'll try hack something together > > > shortly. > > > > Great. So the problem might be already fixed when I am back in the > > office ;-) > > > Don't hold your breath, but I'll try my best :) The problem we had before was a combo of not clearing PDE and not having the command buffer set up properly. Now we have basically the same problem. The attach is running too early, so the invalidate commands are falling on deaf ears. thanks, -chris ^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 0:30 ` Chris Wright @ 2010-04-02 1:23 ` Chris Wright 2010-04-02 1:31 ` [PATCH 2/2] x86/amd-iommu: warn when issuing command to uninitiailed cmd buffer Chris Wright ` (4 more replies) 0 siblings, 5 replies; 41+ messages in thread From: Chris Wright @ 2010-04-02 1:23 UTC (permalink / raw) To: Joerg Roedel Cc: Chris Wright, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal

Hit another kdump problem as reported by Neil Horman. When initializing the IOMMU, we attach devices to their domains before the IOMMU is fully (re)initialized. Attaching a device will issue some important invalidations. In the context of the newly kexec'd kdump kernel, the IOMMU may have stale cached data from the original kernel. Because we do the attach too early, the invalidation commands are placed in the new command buffer before the IOMMU is updated w/ that buffer. This leaves the stale entries in the kdump context and can render the device unusable. Simply enable the IOMMU before we do the attach.

Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 arch/x86/kernel/amd_iommu_init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/amd_iommu_init.c
+++ b/arch/x86/kernel/amd_iommu_init.c
@@ -1288,6 +1288,8 @@ static int __init amd_iommu_init(void)
 	if (ret)
 		goto free;
 
+	enable_iommus();
+
 	if (iommu_pass_through)
 		ret = amd_iommu_init_passthrough();
 	else
@@ -1300,8 +1302,6 @@ static int __init amd_iommu_init(void)
 
 	amd_iommu_init_notifier();
 
-	enable_iommus();
-
 	if (iommu_pass_through)
 		goto out;

^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 2/2] x86/amd-iommu: warn when issuing command to uninitiailed cmd buffer 2010-04-02 1:23 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Chris Wright @ 2010-04-02 1:31 ` Chris Wright 2010-04-02 1:35 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Neil Horman ` (3 subsequent siblings) 4 siblings, 0 replies; 41+ messages in thread From: Chris Wright @ 2010-04-02 1:31 UTC (permalink / raw) To: Joerg Roedel Cc: Chris Wright, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal

To catch future potential issues we can add a warning whenever we issue a command before the command buffer is fully initialized.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
Or not...this is just if you think it's useful ;-)

 arch/x86/include/asm/amd_iommu_types.h | 1 +
 arch/x86/kernel/amd_iommu.c | 1 +
 arch/x86/kernel/amd_iommu_init.c | 5 +++--
 3 files changed, 5 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/amd_iommu_types.h
+++ b/arch/x86/include/asm/amd_iommu_types.h
@@ -140,6 +140,7 @@
 /* constants to configure the command buffer */
 #define CMD_BUFFER_SIZE 8192
+#define CMD_BUFFER_UNINITIALIZED 1
 #define CMD_BUFFER_ENTRIES 512
 #define MMIO_CMD_SIZE_SHIFT 56
 #define MMIO_CMD_SIZE_512 (0x9ULL << MMIO_CMD_SIZE_SHIFT)
--- a/arch/x86/kernel/amd_iommu.c
+++ b/arch/x86/kernel/amd_iommu.c
@@ -392,6 +392,7 @@ static int __iommu_queue_command(struct
 	u32 tail, head;
 	u8 *target;
 
+	WARN_ON(iommu->cmd_buf_size & CMD_BUFFER_UNINITIALIZED);
 	tail = readl(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
 	target = iommu->cmd_buf + tail;
 	memcpy_toio(target, cmd, sizeof(*cmd));
--- a/arch/x86/kernel/amd_iommu_init.c
+++ b/arch/x86/kernel/amd_iommu_init.c
@@ -436,7 +436,7 @@ static u8 * __init alloc_command_buffer(
 	if (cmd_buf == NULL)
 		return NULL;
 
-	iommu->cmd_buf_size = CMD_BUFFER_SIZE;
+	iommu->cmd_buf_size = CMD_BUFFER_SIZE | CMD_BUFFER_UNINITIALIZED;
 
 	return cmd_buf;
 }
@@ -453,6 +453,7 @@ void amd_iommu_reset_cmd_buffer(struct a
 	writel(0x00, iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
 
 	iommu_feature_enable(iommu, CONTROL_CMDBUF_EN);
+	iommu->cmd_buf_size &= ~(CMD_BUFFER_UNINITIALIZED);
 }
 
 /*
@@ -477,7 +478,7 @@ static void iommu_enable_command_buffer(
 static void __init free_command_buffer(struct amd_iommu *iommu)
 {
 	free_pages((unsigned long)iommu->cmd_buf,
-		   get_order(iommu->cmd_buf_size));
+		   get_order(iommu->cmd_buf_size & ~(CMD_BUFFER_UNINITIALIZED)));
 }
 
 /* allocates the memory where the IOMMU will log its events to */

^ permalink raw reply [flat|nested] 41+ messages in thread
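The patch above borrows bit 0 of cmd_buf_size as a flag, which works because CMD_BUFFER_SIZE (8192) is a power of two, so its low bit is always clear. A minimal userspace sketch of the same trick (hypothetical helper names, not the driver functions):

```c
#include <assert.h>

#define CMD_BUFFER_SIZE          8192
#define CMD_BUFFER_UNINITIALIZED 1

static unsigned int cmd_buf_size;

/* Models alloc_command_buffer(): record the size with the flag set,
 * because the hardware does not know about the buffer yet. */
static void alloc_cmd_buf(void)
{
    cmd_buf_size = CMD_BUFFER_SIZE | CMD_BUFFER_UNINITIALIZED;
}

/* Models amd_iommu_reset_cmd_buffer(): once the hardware has been
 * pointed at the buffer, clear the flag so queuing is legal. */
static void reset_cmd_buf(void)
{
    cmd_buf_size &= ~CMD_BUFFER_UNINITIALIZED;
}

/* What the free path must pass to get_order(): the size with the
 * flag masked off, otherwise the order could be computed wrong. */
static unsigned int real_size(void)
{
    return cmd_buf_size & ~CMD_BUFFER_UNINITIALIZED;
}
```

The WARN_ON in __iommu_queue_command() then reduces to a single AND against the flag bit, costing nothing in the common (initialized) case.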
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 1:23 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Chris Wright 2010-04-02 1:31 ` [PATCH 2/2] x86/amd-iommu: warn when issuing command to uninitiailed cmd buffer Chris Wright @ 2010-04-02 1:35 ` Neil Horman 2010-04-02 1:38 ` Chris Wright 2010-04-02 9:11 ` Joerg Roedel ` (2 subsequent siblings) 4 siblings, 1 reply; 41+ messages in thread From: Neil Horman @ 2010-04-02 1:35 UTC (permalink / raw) To: Chris Wright Cc: Joerg Roedel, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal On Thu, Apr 01, 2010 at 06:23:53PM -0700, Chris Wright wrote: > Hit another kdump problem as reported by Neil Horman. When initializaing > the IOMMU, we attach devices to their domains before the IOMMU is > fully (re)initialized. Attaching a device will issue some important > invalidations. In the context of the newly kexec'd kdump kernel, the > IOMMU may have stale cached data from the original kernel. Because we > do the attach too early, the invalidation commands are placed in the new > command buffer before the IOMMU is updated w/ that buffer. This leaves > the stale entries in the kdump context and can renders device unusable. > Simply enable the IOMMU before we do the attach. > > Cc: Neil Horman <nhorman@tuxdriver.com> > Cc: Vivek Goyal <vgoyal@redhat.com> > Signed-off-by: Chris Wright <chrisw@sous-sol.org> I'll test this out this weekend, thanks Chris! 
Neil > --- > arch/x86/kernel/amd_iommu_init.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > --- a/arch/x86/kernel/amd_iommu_init.c > +++ b/arch/x86/kernel/amd_iommu_init.c > @@ -1288,6 +1288,8 @@ static int __init amd_iommu_init(void) > if (ret) > goto free; > > + enable_iommus(); > + > if (iommu_pass_through) > ret = amd_iommu_init_passthrough(); > else > @@ -1300,8 +1302,6 @@ static int __init amd_iommu_init(void) > > amd_iommu_init_notifier(); > > - enable_iommus(); > - > if (iommu_pass_through) > goto out; > ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 1:35 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Neil Horman @ 2010-04-02 1:38 ` Chris Wright 0 siblings, 0 replies; 41+ messages in thread From: Chris Wright @ 2010-04-02 1:38 UTC (permalink / raw) To: Neil Horman Cc: Chris Wright, Joerg Roedel, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal * Neil Horman (nhorman@redhat.com) wrote: > On Thu, Apr 01, 2010 at 06:23:53PM -0700, Chris Wright wrote: > > Hit another kdump problem as reported by Neil Horman. When initializaing > > the IOMMU, we attach devices to their domains before the IOMMU is > > fully (re)initialized. Attaching a device will issue some important > > invalidations. In the context of the newly kexec'd kdump kernel, the > > IOMMU may have stale cached data from the original kernel. Because we > > do the attach too early, the invalidation commands are placed in the new > > command buffer before the IOMMU is updated w/ that buffer. This leaves > > the stale entries in the kdump context and can renders device unusable. > > Simply enable the IOMMU before we do the attach. > > > > Cc: Neil Horman <nhorman@tuxdriver.com> > > Cc: Vivek Goyal <vgoyal@redhat.com> > > Signed-off-by: Chris Wright <chrisw@sous-sol.org> > > I'll test this out this weekend, thanks Chris! Great, thanks! I tested w/ both default and iommu=pt. Both worked, didn't spot any regressions. But additional testing is very welcome. thanks, -chris ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 1:23 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Chris Wright 2010-04-02 1:31 ` [PATCH 2/2] x86/amd-iommu: warn when issuing command to uninitiailed cmd buffer Chris Wright 2010-04-02 1:35 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Neil Horman @ 2010-04-02 9:11 ` Joerg Roedel 2010-04-02 23:59 ` Chris Wright 2010-04-02 15:59 ` Vivek Goyal 2010-04-05 15:34 ` Neil Horman 4 siblings, 1 reply; 41+ messages in thread From: Joerg Roedel @ 2010-04-02 9:11 UTC (permalink / raw) To: Chris Wright Cc: Joerg Roedel, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal On Thu, Apr 01, 2010 at 06:23:53PM -0700, Chris Wright wrote: > Hit another kdump problem as reported by Neil Horman. When initializaing > the IOMMU, we attach devices to their domains before the IOMMU is > fully (re)initialized. Attaching a device will issue some important > invalidations. In the context of the newly kexec'd kdump kernel, the > IOMMU may have stale cached data from the original kernel. Because we > do the attach too early, the invalidation commands are placed in the new > command buffer before the IOMMU is updated w/ that buffer. This leaves > the stale entries in the kdump context and can renders device unusable. > Simply enable the IOMMU before we do the attach. 
> > Cc: Neil Horman <nhorman@tuxdriver.com> > Cc: Vivek Goyal <vgoyal@redhat.com> > Signed-off-by: Chris Wright <chrisw@sous-sol.org> > --- > arch/x86/kernel/amd_iommu_init.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > --- a/arch/x86/kernel/amd_iommu_init.c > +++ b/arch/x86/kernel/amd_iommu_init.c > @@ -1288,6 +1288,8 @@ static int __init amd_iommu_init(void) > if (ret) > goto free; > > + enable_iommus(); > + > if (iommu_pass_through) > ret = amd_iommu_init_passthrough(); > else > @@ -1300,8 +1302,6 @@ static int __init amd_iommu_init(void) > > amd_iommu_init_notifier(); > > - enable_iommus(); > - > if (iommu_pass_through) > goto out; Ok, good to know this fixes the problem. One issue: If the initialization of the domains fails the iommu hardware needs to be disabled again in the free path. Thanks, Joerg ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 9:11 ` Joerg Roedel @ 2010-04-02 23:59 ` Chris Wright 0 siblings, 0 replies; 41+ messages in thread From: Chris Wright @ 2010-04-02 23:59 UTC (permalink / raw) To: Joerg Roedel Cc: Chris Wright, Joerg Roedel, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal * Joerg Roedel (joro@8bytes.org) wrote: > Ok, good to know this fixes the problem. One issue: If the > initialization of the domains fails the iommu hardware needs to be > disabled again in the free path. Sure, thanks for catching. Will resend shortly thanks, -chris ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 1:23 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Chris Wright ` (2 preceding siblings ...) 2010-04-02 9:11 ` Joerg Roedel @ 2010-04-02 15:59 ` Vivek Goyal 2010-04-02 22:38 ` Chris Wright 2010-04-03 17:38 ` Joerg Roedel 2010-04-05 15:34 ` Neil Horman 4 siblings, 2 replies; 41+ messages in thread From: Vivek Goyal @ 2010-04-02 15:59 UTC (permalink / raw) To: Chris Wright Cc: Joerg Roedel, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman On Thu, Apr 01, 2010 at 06:23:53PM -0700, Chris Wright wrote: > Hit another kdump problem as reported by Neil Horman. When initializaing > the IOMMU, we attach devices to their domains before the IOMMU is > fully (re)initialized. Attaching a device will issue some important > invalidations. In the context of the newly kexec'd kdump kernel, the > IOMMU may have stale cached data from the original kernel. Because we > do the attach too early, the invalidation commands are placed in the new > command buffer before the IOMMU is updated w/ that buffer. This leaves > the stale entries in the kdump context and can renders device unusable. > Simply enable the IOMMU before we do the attach. > > Cc: Neil Horman <nhorman@tuxdriver.com> > Cc: Vivek Goyal <vgoyal@redhat.com> > Signed-off-by: Chris Wright <chrisw@sous-sol.org> > --- > arch/x86/kernel/amd_iommu_init.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > --- a/arch/x86/kernel/amd_iommu_init.c > +++ b/arch/x86/kernel/amd_iommu_init.c > @@ -1288,6 +1288,8 @@ static int __init amd_iommu_init(void) > if (ret) > goto free; > > + enable_iommus(); > + > if (iommu_pass_through) > ret = amd_iommu_init_passthrough(); > else > @@ -1300,8 +1302,6 @@ static int __init amd_iommu_init(void) > > amd_iommu_init_notifier(); > > - enable_iommus(); > - Ok, so now we do enable_iommu() before we attach devices and flush DTE and domain PDE, IO TLB. 
That makes sense. The following is just for my education; I'm trying to better understand the impact of in-flight DMAs. So IIUC, in the kdump context the following seems to be the sequence of events. 1. The kernel crashes; we leave the IOMMU enabled. 2. We boot into the capture kernel and call enable_iommus(). This function first disables the iommu, sets up a new device table, and enables the iommus again. a. So during the small window when the iommu is disabled before we enable it back, any in-flight DMA will pass through, possibly to an unintended physical address, since translation is disabled, and it can corrupt the kdump kernel. b. Even after enabling the iommu, I guess we will continue to use the cached DTE and translation information to handle any in-flight DMA. The difference is that now the iommus are enabled, so any in-flight DMA should go to the address intended in the first kernel and should not corrupt anything. 3. Once the iommus are enabled again, we allocate and initialize protection domains and attach devices to domains. In the process we flush the DTE, PDE and IO TLBs. c. Looks like do_attach->set_dte_entry() by default gives write permission (IW) to all the devices. I am assuming that at this point translation is enabled and possibly unity mapped. If that's the case, any in-flight DMA will not be allowed to happen at a unity mapped address, and this can possibly corrupt the kernel? I understand this patch should fix the case where in the second kernel a device is not doing DMA because of a possibly cached DTE and translation information. But it looks like the in-flight DMA issues will still need a closer look. That is a separate issue, though, and needs to be addressed in a separate set of patches. Thanks Vivek ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 15:59 ` Vivek Goyal @ 2010-04-02 22:38 ` Chris Wright 2010-04-02 22:55 ` Eric W. Biederman 2010-04-03 17:38 ` Joerg Roedel 1 sibling, 1 reply; 41+ messages in thread From: Chris Wright @ 2010-04-02 22:38 UTC (permalink / raw) To: Vivek Goyal Cc: Chris Wright, Joerg Roedel, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman * Vivek Goyal (vgoyal@redhat.com) wrote: > Following is just for my education purposes. Trying to understand better > the impact of in-flight DMAs. So IIUC, in kdump context following seems to be > sequence of events. > > 1. kernel crashes, we leave IOMMU enabled. No, we actually disable the IOMMU: panic() crash_kexec() machine_crash_shutdown() native_machine_crash_shutdown() x86_platform.iommu_shutdown() == amd_iommu_init.c::disable_iommus() > 2. boot into capture kernel and we call enable_iommus(). This function > first disables iommu, sets up new device table and enables iommus > again. > a. So during this small window when iommu is disabled and we enable > it back, any inflight DMA will passthrough possibly to an > unintended physical address as translation is disabled and it > can corrupt the kdump kernel. Yes, that should be possible. > b. Even after enabling the iommu, I guess we will continue to > use cached DTE, and translation information to handle any > in-flight DMA. The difference is that now iommus are enabled > so any in-flight DMA should go to the address as intended in > first kernel and should not corrupt anything. A cached DTE only gets you the domainID and pt root, so results depend upon other caches too. > 3. Once iommus are enabled again, we allocate and initialize protection > domains. We attach devices to domains. In the process we flush the > DTE, PDE and IO TLBs. Yes, but this is immediately after 2b above. We can't send the invalidate commands until the iommu is enabled. > c.
Looks like do_attach->set_dte_entry(), by default gives write > permission (IW) to all the devices. I am assuming that at > this point of time translation is enabled and possibly unity > mapped. Unlikely to have full unity map, it's typically just for a range, and this is also typically for devices that have interaction w/ BIOS (typical examples are usb/SMM for keyboard and integrated graphics). And the mapping is to reserved memory areas (BIOS regions). On my test box, for example, there are none of the BIOS specified ranges. > If that's the case, any in-flight DMA will not be > allowed to happen at unity mapped address and this can possibly > corrupt the kernel? I don't understand what you mean there. > I understand this patch should fix the case when in second kernel a > device is not doing DMA because of possibly cached DTE, and translation > information. But looks like in-flight DMA issues will still need a closer > look. But that is a separate issue and needs to be addressed in separate > set of patches. All of the in-flight DMA corrupts kdump kernel memory issues can exist w/out an IOMMU. On the one hand, the IOMMU actually reduces the window of opportunity, but on the other it adds possible translation issue. About the only thing we can do here is modify the crashing kernel to update the DTE to disable dma, invalidate, and leave IOMMU enabled. This is all in the context of crashing, so any further errors here mean we probably won't get to kdump kernel. The kexec'd kernel will still need to defensively prepare things as it always does since it can't trust anything that came before it. thanks, -chris ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 22:38 ` Chris Wright @ 2010-04-02 22:55 ` Eric W. Biederman 2010-04-02 23:57 ` Chris Wright 0 siblings, 1 reply; 41+ messages in thread From: Eric W. Biederman @ 2010-04-02 22:55 UTC (permalink / raw) To: Chris Wright Cc: Vivek Goyal, Joerg Roedel, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu Chris Wright <chrisw@sous-sol.org> writes: > * Vivek Goyal (vgoyal@redhat.com) wrote: >> Following is just for my education purposes. Trying to understand better >> the impact of in-flight DMAs. So IIUC, in kudmp context following seems to be >> sequence of events. >> >> 1. kernel crashes, we leave IOMMU enabled. > > No, we actually disable the IOMMU > > panic() > crash_exec() > machine_crash_shutdown() > native_machine_crash_shutdown() > x86_platform.iommu_shutdown() == amd_iommu_init.c::disable_iommus() That call exists but is there any good reason for it? Nothing should be on that can be reasonably done anywhere else. This round of bug fixing and testing would be a good time to remove that extraneous call. >> 2. boot into capture kernel and we call enable_iommus(). This function >> first disables iommu, sets up new device table and enables iommus >> again. >> a. So during this small window when iommu is disabled and we enable >> it back, any inflight DMA will passthrough possibly to an >> unintended physical address as translation is disabled and it >> can corrupt the kdump kenrel. > > Yes, should be possible. > >> b. Even after enabling the iommu, I guess we will continue to >> use cached DTE, and translation information to handle any >> in-flight DMA. The difference is that now iommus are enabled >> so any in-flight DMA should go to the address as intended in >> first kenrel and should not corrupt anything. > > Cached DTE only gets you domainID and pt root, so results depend upon > other caches too. > >> 3. 
Once iommus are enabled again, we allocated and initilize protection >> domains. We attach devices to domains. In the process we flush the >> DTE, PDE and IO TLBs. > > Yes, but this is immediately after 2b above. We can't send the invalidate > commands until the iommu is enabled. > >> c. Looks like do_attach->set_dte_entry(), by default gives write >> permission (IW) to all the devices. I am assuming that at >> this point of time translation is enabled and possibly unity >> mapped. > > Unlikely to have full unity map, it's typically just for a range, > and this is also typically for devices that have interaction w/ BIOS > (typical examples are usb/SMM for keyboard and integrated graphics). > And the mapping is to reserved memory areas (BIOS regions). On my test > box, for example, there are none of the BIOS specified ranges. > >> If that's the case, any in-flight DMA will not be >> allowed to happen at unity mapped address and this can possibly >> corrupt the kernel? > > I don't understand what you mean there. > >> I understand this patch should fix the case when in second kernel a >> device is not doing DMA because of possibly cached DTE, and translation >> information. But looks like in-flight DMA issues will still need a closer >> look. But that is a separate issue and needs to be addressed in separate >> set of patches. > > All of the in-flight DMA corrupts kdump kernel memory issues can exist > w/out an IOMMU. On the one hand, the IOMMU actually reduces the window > of opportunity, but on the other it adds possible translation issue. ? If disabling the IOMMU changes the destination in RAM of in-flight DMA the issues are not at all the same. > About the only thing we can do here is modify the crashing kernel to > update the DTE to disable dma, invalidate, and leave IOMMU enabled. > This is all in the context of crashing, so any further errors here mean > we probably won't get to kdump kernel. Why even touch the IOMMU? 
> The kexec'd kernel will still need to defensively prepare things as it > always does since it can't trust anything that came before it. Yes. Eric ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 22:55 ` Eric W. Biederman @ 2010-04-02 23:57 ` Chris Wright 0 siblings, 0 replies; 41+ messages in thread From: Chris Wright @ 2010-04-02 23:57 UTC (permalink / raw) To: Eric W. Biederman Cc: Chris Wright, Vivek Goyal, Joerg Roedel, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu * Eric W. Biederman (ebiederm@xmission.com) wrote: > Chris Wright <chrisw@sous-sol.org> writes: > > > * Vivek Goyal (vgoyal@redhat.com) wrote: > >> Following is just for my education purposes. Trying to understand better > >> the impact of in-flight DMAs. So IIUC, in kudmp context following seems to be > >> sequence of events. > >> > >> 1. kernel crashes, we leave IOMMU enabled. > > > > No, we actually disable the IOMMU > > > > panic() > > crash_exec() > > machine_crash_shutdown() > > native_machine_crash_shutdown() > > x86_platform.iommu_shutdown() == amd_iommu_init.c::disable_iommus() > > That call exists but is there any good reason for it? It was originally added it for this very case, although I don't think it should be there. I too am a proponent of removing it, because I think all work should be done in the newly kexec'd kernel. > Nothing should be on that can be reasonably done anywhere else. > This round of bug fixing and testing would be a good time to remove > that extraneous call. > > >> 2. boot into capture kernel and we call enable_iommus(). This function > >> first disables iommu, sets up new device table and enables iommus > >> again. > >> a. So during this small window when iommu is disabled and we enable > >> it back, any inflight DMA will passthrough possibly to an > >> unintended physical address as translation is disabled and it > >> can corrupt the kdump kenrel. > > > > Yes, should be possible. > > > >> b. Even after enabling the iommu, I guess we will continue to > >> use cached DTE, and translation information to handle any > >> in-flight DMA. 
The difference is that now iommus are enabled > >> so any in-flight DMA should go to the address as intended in > >> first kenrel and should not corrupt anything. > > > > Cached DTE only gets you domainID and pt root, so results depend upon > > other caches too. > > > >> 3. Once iommus are enabled again, we allocated and initilize protection > >> domains. We attach devices to domains. In the process we flush the > >> DTE, PDE and IO TLBs. > > > > Yes, but this is immediately after 2b above. We can't send the invalidate > > commands until the iommu is enabled. > > > >> c. Looks like do_attach->set_dte_entry(), by default gives write > >> permission (IW) to all the devices. I am assuming that at > >> this point of time translation is enabled and possibly unity > >> mapped. > > > > Unlikely to have full unity map, it's typically just for a range, > > and this is also typically for devices that have interaction w/ BIOS > > (typical examples are usb/SMM for keyboard and integrated graphics). > > And the mapping is to reserved memory areas (BIOS regions). On my test > > box, for example, there are none of the BIOS specified ranges. > > > >> If that's the case, any in-flight DMA will not be > >> allowed to happen at unity mapped address and this can possibly > >> corrupt the kernel? > > > > I don't understand what you mean there. > > > >> I understand this patch should fix the case when in second kernel a > >> device is not doing DMA because of possibly cached DTE, and translation > >> information. But looks like in-flight DMA issues will still need a closer > >> look. But that is a separate issue and needs to be addressed in separate > >> set of patches. > > > > All of the in-flight DMA corrupts kdump kernel memory issues can exist > > w/out an IOMMU. On the one hand, the IOMMU actually reduces the window > > of opportunity, but on the other it adds possible translation issue. > > ? 
> If disabling the IOMMU changes the destination in RAM of in-flight DMA > the issues are not at all the same. > > > About the only thing we can do here is modify the crashing kernel to > > update the DTE to disable dma, invalidate, and leave IOMMU enabled. > > This is all in the context of crashing, so any further errors here mean > > we probably won't get to kdump kernel. > > Why even touch the IOMMU? I agree, and was trying to show that it's risky to do that while crashing. Removing the IOMMU disable may still allow some DMA to succeed. To actually close that off, you have to force all DMA transactions to fail. Personally, I'm skeptical that the tradeoff is worth it. thanks, -chris ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 15:59 ` Vivek Goyal 2010-04-02 22:38 ` Chris Wright @ 2010-04-03 17:38 ` Joerg Roedel 2010-04-05 14:17 ` Vivek Goyal 1 sibling, 1 reply; 41+ messages in thread From: Joerg Roedel @ 2010-04-03 17:38 UTC (permalink / raw) To: Vivek Goyal Cc: Chris Wright, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman On Fri, Apr 02, 2010 at 11:59:32AM -0400, Vivek Goyal wrote: > 1. kernel crashes, we leave IOMMU enabled. True for everything except gart and amd iommu. > a. So during this small window when iommu is disabled and we enable > it back, any inflight DMA will passthrough possibly to an > unintended physical address as translation is disabled and it > can corrupt the kdump kernel. Right. > b. Even after enabling the iommu, I guess we will continue to > use cached DTE, and translation information to handle any > in-flight DMA. The difference is that now iommus are enabled > so any in-flight DMA should go to the address as intended in > first kernel and should not corrupt anything. Right. > > 3. Once iommus are enabled again, we allocate and initialize protection > domains. We attach devices to domains. In the process we flush the > DTE, PDE and IO TLBs. > > c. Looks like do_attach->set_dte_entry(), by default gives write > permission (IW) to all the devices. I am assuming that at > this point of time translation is enabled and possibly unity > mapped. No. The IW bit in the DTE must be set because all write permission bits (DTE and page tables) are ANDed to determine if a device can write to a particular address. So as long as the paging mode is non-zero, the hardware will walk the page table first to find out if the device has write permission. With paging mode == 0 your statement about read-write unity-mapping is true. This is used for a pass-through domain (iommu=pt), btw. Joerg ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-03 17:38 ` Joerg Roedel @ 2010-04-05 14:17 ` Vivek Goyal 2010-04-05 14:32 ` Joerg Roedel 0 siblings, 1 reply; 41+ messages in thread From: Vivek Goyal @ 2010-04-05 14:17 UTC (permalink / raw) To: Joerg Roedel Cc: Chris Wright, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman On Sat, Apr 03, 2010 at 07:38:36PM +0200, Joerg Roedel wrote: > On Fri, Apr 02, 2010 at 11:59:32AM -0400, Vivek Goyal wrote: > > 1. kernel crashes, we leave IOMMU enabled. > > True for everything except gart and amd iommu. > > > a. So during this small window when iommu is disabled and we enable > > it back, any inflight DMA will passthrough possibly to an > > unintended physical address as translation is disabled and it > > can corrupt the kdump kenrel. > > Right. > > > b. Even after enabling the iommu, I guess we will continue to > > use cached DTE, and translation information to handle any > > in-flight DMA. The difference is that now iommus are enabled > > so any in-flight DMA should go to the address as intended in > > first kenrel and should not corrupt anything. > > Right. > > > > > 3. Once iommus are enabled again, we allocated and initilize protection > > domains. We attach devices to domains. In the process we flush the > > DTE, PDE and IO TLBs. > > > > c. Looks like do_attach->set_dte_entry(), by default gives write > > permission (IW) to all the devices. I am assuming that at > > this point of time translation is enabled and possibly unity > > mapped. > > No, The IW bit in the DTE must be set because all write permission bits > (DTE and page tabled) are ANDed to determine if a device can write to a > particular address. So as long as the paging mode is unequal to zero the > hardware will walk the page-table first to find out if the device has > write permission. 
And by default valid PTEs are not present (except for some unity mappings as specified by ACPI tables), so we will end the transaction with an IO_PAGE_FAULT? I am assuming that we will not set up unity mappings for the kernel reserved area, and so either an in-flight DMA will not be allowed and an IO_PAGE_FAULT will be logged, or it will be allowed to some unity mapping which is not mapped to the kdump kernel area, hence no corruption of the capture kernel? > With paging mode == 0 your statement about read-write > unity-mapping is true. This is used for a pass-through domain (iommu=pt) > btw. Ok, so in the case of pass-through, I think one just needs to make sure not to use iommu=pt in the second kernel if one did not use iommu=pt in the first kernel. Otherwise you can redirect the in-flight DMAs in the second kernel to entirely unintended physical memory. So the following seems to be the summary. - Don't disable the AMD IOMMU after a crash in machine_crash_shutdown(), because disabling it can direct in-flight DMAs to unintended physical memory areas and can corrupt other data structures. - Once the iommu is enabled in the second kernel, most likely in-flight DMAs will end with an IO_PAGE_FAULT (iommu!=pt). Only selective unity mapping areas will be set up based on ACPI tables, and these should be BIOS regions and should not overlap with kdump reserved memory. iommu=pt should also be safe if iommu=pt was used in the first kernel too. - The only small window where an in-flight DMA can corrupt things is when we are initializing the iommu in the second kernel (we first disable the iommu and then enable it back). During this small period translation will be disabled and some IO can go to an unintended address. And there does not seem to be any easy way to plug this hole. Have I got it right? Thanks Vivek ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-05 14:17 ` Vivek Goyal @ 2010-04-05 14:32 ` Joerg Roedel 0 siblings, 0 replies; 41+ messages in thread From: Joerg Roedel @ 2010-04-05 14:32 UTC (permalink / raw) To: Vivek Goyal Cc: Chris Wright, Neil Horman, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman On Mon, Apr 05, 2010 at 10:17:50AM -0400, Vivek Goyal wrote: > And by default valid PTEs are not present (except for some unity mappings > as specified by ACPI tables), so we will end the transaction with > IO_PAGE_FAULT? I am assuming that we will not set unity mappings for > kernel reserved area and so either an in-flight DMA will not be allowed > and IO_PAGE_FAULT will be logged or it will be allowed to some unity > mapping which is not mapped to kdump kernel area hence no corruption of > capture kernel? Right. The unity-mappings are typically used for devices that are controlled by the BIOS and define memory regions owned by the BIOS. So Linux will not use the unity mapped regions anyway, not in the first kernel and not in the kdump kernel. > > With paging mode == 0 your statement about read-write > > unity-mapping is true. This is used for a pass-through domain (iommu=pt) > > btw. > > Ok, so in case of pass through, I think one just needs to make sure not to > use iommu=pt in second kernel if one did not use iommu=pt in first kernel. > Otherwise you can redirect the in-flight DMAs in second kernel to > entirely unintended physical memory. The kdump kernel should use the same setting as the plain kernel. > So following seems to be the summary. > > - Don't disable AMD IOMMU after crash in machine_crash_shutdown(), because > disabling it can direct in-flight DMAs to unintended physical memory > areas and can corrupt other data structures. Right, that really seems to be the best solution. > - Once the iommu is enabled in second kernel, most likely in-flight DMAs > will end with IO_PAGE_FAULT (iommu!=pt).
Only selective unity mapping > areas will be setup based on ACPI tables and these should be BIOS region > and should not overlap with kdump reserved memory. iommu=pt should also > be safe if iommu=pt was used in first kernel also. Right. With Chris' patches the DTE entries of newly attached domains are flushed at IOMMU initialization in the kdump kernel. So the new data structures are in place and used by the hardware. > - Only small window where in-flight DMA can corrupt things is when we > are initializing iommu in second kernel. (We first disable iommu and then > enable it back). During this small period translation will be disabled and > some IO can go to unintended address. And there does not seem to be any easy > way to plug this hole. Right. > Have I got it right? Yes :-) Joerg ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices 2010-04-02 1:23 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Chris Wright ` (3 preceding siblings ...) 2010-04-02 15:59 ` Vivek Goyal @ 2010-04-05 15:34 ` Neil Horman 4 siblings, 0 replies; 41+ messages in thread From: Neil Horman @ 2010-04-05 15:34 UTC (permalink / raw) To: Chris Wright Cc: Joerg Roedel, Neil Horman, kexec, linux-kernel, hbabu, iommu, Eric W. Biederman, Vivek Goyal On Thu, Apr 01, 2010 at 06:23:53PM -0700, Chris Wright wrote: > Hit another kdump problem as reported by Neil Horman. When initializing > the IOMMU, we attach devices to their domains before the IOMMU is > fully (re)initialized. Attaching a device will issue some important > invalidations. In the context of the newly kexec'd kdump kernel, the > IOMMU may have stale cached data from the original kernel. Because we > do the attach too early, the invalidation commands are placed in the new > command buffer before the IOMMU is updated w/ that buffer. This leaves > the stale entries in the kdump context and can render devices unusable. > Simply enable the IOMMU before we do the attach. > > Cc: Neil Horman <nhorman@tuxdriver.com> > Cc: Vivek Goyal <vgoyal@redhat.com> > Signed-off-by: Chris Wright <chrisw@sous-sol.org> > --- > arch/x86/kernel/amd_iommu_init.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > --- a/arch/x86/kernel/amd_iommu_init.c > +++ b/arch/x86/kernel/amd_iommu_init.c > @@ -1288,6 +1288,8 @@ static int __init amd_iommu_init(void) > if (ret) > goto free; > > + enable_iommus(); > + > if (iommu_pass_through) > ret = amd_iommu_init_passthrough(); > else > @@ -1300,8 +1302,6 @@ static int __init amd_iommu_init(void) > > amd_iommu_init_notifier(); > > - enable_iommus(); > - > if (iommu_pass_through) > goto out; > > FWIW, this patch series fixed the kdump issues I was having on my AMD system here.
Thanks Chris Acked-by: Neil Horman <nhorman@tuxdriver.com> ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 15:54 ` Vivek Goyal 2010-03-31 18:28 ` Neil Horman @ 2010-03-31 18:43 ` Eric W. Biederman 1 sibling, 0 replies; 41+ messages in thread From: Eric W. Biederman @ 2010-03-31 18:43 UTC (permalink / raw) To: Vivek Goyal; +Cc: Neil Horman, iommu, joerg.roedel, hbabu, kexec, linux-kernel Vivek Goyal <vgoyal@redhat.com> writes: > On Wed, Mar 31, 2010 at 11:24:17AM -0400, Neil Horman wrote: >> Flush iommu during shutdown >> >> When using an iommu, its possible, if a kdump kernel boot follows a primary >> kernel crash, that dma operations might still be in flight from the previous >> kernel during the kdump kernel boot. This can lead to memory corruption, >> crashes, and other erroneous behavior, specifically I've seen it manifest during >> a kdump boot as endless iommu error log entries of the form: >> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d >> address=0x000000000245a0c0 flags=0x0070] >> >> Followed by an inability to access hard drives, and various other resources. >> >> I've written this fix for it. In short it just forces a flush of the in flight >> dma operations on shutdown, so that the new kernel is certain not to have any >> in-flight dmas trying to complete after we've reset all the iommu page tables, >> causing the above errors. I've tested it and it fixes the problem for me quite >> well. > > CCing Eric also. Thanks. > Neil, this is interesting. In the past we noticed similar issues, > especially on PPC. But I was told that we could not clear the iommu > mapping entries as we had no control on in flight DMA and if a DMA comes > later after clearing an entry and entry is not present, it is an error. Which is exactly what the reported error looks like. > Hence one of the suggestions was not to clear iommu mapping entries but > reserve some for kdump operation and use those in kdump kernel. 
> > So this call amd_iommu_flush_all_devices() will be able to tell devices > that don't do any more DMAs and hence it is safe to reprogram iommu > mapping entries. I took a quick look at our crash shutdown path and I am very disappointed with the way it has gone lately. Regardless of the merits flushing an iommu versus doing things with an iommu I don't see how we are in any better position in the crash kernel then we are in the kdump kernel. So what are we doing touching it in the kdump path? Likewise for the hpet. We also seem to be at a point where if we have a tsc we don't need to enable interrupts until we are ready to enable them in native mode. And except for a few weird SMP 486's tsc and apics came in at the same time. So my grumpy code review says we should gut crash.c (like below) and fix the initialization paths so they do the right thing. --- arch/x86/kernel/crash.c | 18 ------------------ 1 files changed, 0 insertions(+), 18 deletions(-) diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index a4849c1..8e33c50 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -22,12 +22,10 @@ #include <asm/nmi.h> #include <asm/hw_irq.h> #include <asm/apic.h> -#include <asm/hpet.h> #include <linux/kdebug.h> #include <asm/cpu.h> #include <asm/reboot.h> #include <asm/virtext.h> -#include <asm/x86_init.h> #if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC) @@ -56,15 +54,11 @@ static void kdump_nmi_callback(int cpu, struct die_args *args) */ cpu_emergency_vmxoff(); cpu_emergency_svm_disable(); - - disable_local_APIC(); } static void kdump_nmi_shootdown_cpus(void) { nmi_shootdown_cpus(kdump_nmi_callback); - - disable_local_APIC(); } #else @@ -96,17 +90,5 @@ void native_machine_crash_shutdown(struct pt_regs *regs) cpu_emergency_vmxoff(); cpu_emergency_svm_disable(); - lapic_shutdown(); -#if defined(CONFIG_X86_IO_APIC) - disable_IO_APIC(); -#endif -#ifdef CONFIG_HPET_TIMER - hpet_disable(); -#endif - -#ifdef CONFIG_X86_64 - 
x86_platform.iommu_shutdown(); -#endif - crash_save_cpu(regs, safe_smp_processor_id()); } -- 1.6.5.2.143.g8cc62 ^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 15:24 [PATCH] amd iommu: force flush of iommu prior during shutdown Neil Horman 2010-03-31 15:54 ` Vivek Goyal @ 2010-03-31 21:25 ` Chris Wright 2010-04-01 1:13 ` Neil Horman 2010-04-01 2:44 ` Vivek Goyal 1 sibling, 2 replies; 41+ messages in thread From: Chris Wright @ 2010-03-31 21:25 UTC (permalink / raw) To: Neil Horman; +Cc: iommu, joerg.roedel, vgoyal, hbabu, kexec, linux-kernel * Neil Horman (nhorman@tuxdriver.com) wrote: > Flush iommu during shutdown > > When using an iommu, it's possible, if a kdump kernel boot follows a primary > kernel crash, that dma operations might still be in flight from the previous > kernel during the kdump kernel boot. This can lead to memory corruption, > crashes, and other erroneous behavior; specifically, I've seen it manifest during > a kdump boot as endless iommu error log entries of the form: > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > address=0x000000000245a0c0 flags=0x0070] We've already fixed this problem once before, so some code shift must have brought it back. Personally, I prefer to do this on the bringup path rather than the teardown path. Besides keeping the teardown path as simple as possible (the goal is to get to the kdump kernel asap), there's also reason to completely flush on startup in general, in case the BIOS has done anything unsavory. thanks, -chris ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 21:25 ` Chris Wright @ 2010-04-01 1:13 ` Neil Horman 2010-04-01 1:39 ` Chris Wright 2010-04-01 2:24 ` Vivek Goyal 2010-04-01 2:44 ` Vivek Goyal 1 sibling, 2 replies; 41+ messages in thread From: Neil Horman @ 2010-04-01 1:13 UTC (permalink / raw) To: Chris Wright; +Cc: iommu, joerg.roedel, vgoyal, hbabu, kexec, linux-kernel On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote: > * Neil Horman (nhorman@tuxdriver.com) wrote: > > Flush iommu during shutdown > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > kernel crash, that dma operations might still be in flight from the previous > > kernel during the kdump kernel boot. This can lead to memory corruption, > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > a kdump boot as endless iommu error log entries of the form: > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > address=0x000000000245a0c0 flags=0x0070] > > We've already fixed this problem once before, so some code shift must > have brought it back. Personally, I prefer to do this on the bringup > path than the teardown path. Besides keeping the teardown path as > simple as possible (goal is to get to kdump kernel asap), there's also > reason to competely flush on startup in genernal in case BIOS has done > anything unsavory. > Chris, Can you elaborate on what you did with the iommu to make this safe? It will save me time digging through the history on this code, and help me understand better whats going on here. I was starting to think that we should just leave the iommu on through a kdump, and re-construct a new page table based on the old table (filtered by the error log) on kdump boot, but it sounds like a better solution might be in place. Thanks Neil > thanks, > -chris > ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 1:13 ` Neil Horman @ 2010-04-01 1:39 ` Chris Wright 2010-04-01 2:24 ` Vivek Goyal 1 sibling, 0 replies; 41+ messages in thread From: Chris Wright @ 2010-04-01 1:39 UTC (permalink / raw) To: Neil Horman Cc: Chris Wright, iommu, joerg.roedel, vgoyal, hbabu, kexec, linux-kernel * Neil Horman (nhorman@tuxdriver.com) wrote: > On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote: > > * Neil Horman (nhorman@tuxdriver.com) wrote: > > > Flush iommu during shutdown > > > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > > kernel crash, that dma operations might still be in flight from the previous > > > kernel during the kdump kernel boot. This can lead to memory corruption, > > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > > a kdump boot as endless iommu error log entries of the form: > > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > > address=0x000000000245a0c0 flags=0x0070] > > > > We've already fixed this problem once before, so some code shift must > > have brought it back. Personally, I prefer to do this on the bringup > > path than the teardown path. Besides keeping the teardown path as > > simple as possible (goal is to get to kdump kernel asap), there's also > > reason to competely flush on startup in genernal in case BIOS has done > > anything unsavory. > > > Chris, > Can you elaborate on what you did with the iommu to make this safe? It > will save me time digging through the history on this code, and help me > understand better whats going on here. > > I was starting to think that we should just leave the iommu on through a kdump, > and re-construct a new page table based on the old table (filtered by the error > log) on kdump boot, but it sounds like a better solution might be in place. 
The code used to simply ensure a clean slate on startup by flushing the relevant domain table entry and the cached translations as devices were attached (this happens during kernel init, whether the base kernel or the kdump one). See here: 42a49f965a8d24ed92af04f5b564d63f17fd9c56 a8c485bb6857811807d42f9fd1fde2f5f89cc5c9 What's changed is that the initialization doesn't appear to do the proper flushes anymore. Your patch has the effect of putting them back, but during shutdown rather than initialization. thanks, -chris ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 1:13 ` Neil Horman 2010-04-01 1:39 ` Chris Wright @ 2010-04-01 2:24 ` Vivek Goyal 2010-04-01 12:53 ` Neil Horman 1 sibling, 1 reply; 41+ messages in thread From: Vivek Goyal @ 2010-04-01 2:24 UTC (permalink / raw) To: Neil Horman; +Cc: Chris Wright, iommu, joerg.roedel, hbabu, kexec, linux-kernel On Wed, Mar 31, 2010 at 09:13:11PM -0400, Neil Horman wrote: > On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote: > > * Neil Horman (nhorman@tuxdriver.com) wrote: > > > Flush iommu during shutdown > > > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > > kernel crash, that dma operations might still be in flight from the previous > > > kernel during the kdump kernel boot. This can lead to memory corruption, > > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > > a kdump boot as endless iommu error log entries of the form: > > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > > address=0x000000000245a0c0 flags=0x0070] > > > > We've already fixed this problem once before, so some code shift must > > have brought it back. Personally, I prefer to do this on the bringup > > path than the teardown path. Besides keeping the teardown path as > > simple as possible (goal is to get to kdump kernel asap), there's also > > reason to competely flush on startup in genernal in case BIOS has done > > anything unsavory. > > > Chris, > Can you elaborate on what you did with the iommu to make this safe? It > will save me time digging through the history on this code, and help me > understand better whats going on here. > > I was starting to think that we should just leave the iommu on through a kdump, > and re-construct a new page table based on the old table (filtered by the error > log) on kdump boot, but it sounds like a better solution might be in place. > Hi Neil, Is the following sequence possible?
- In crashed kernel, take away the write permission from all the devices. Mark bit 62 zero for all devices in the device table. - Leave the iommu on and let the device entries be valid in the kdump kernel, so that any in-flight dma does not become pass-through (which could cause more damage and corrupt the kdump kernel). - During kdump kernel initialization, load a new device table where, again, all the devices don't have write permission. Looks like by default we create a device table with all bits zero except the DEV_ENTRY_VALID and DEV_ENTRY_TRANSLATION bits. - Reset the device we want to set up any dma on or operate on. - Allow the device to do DMA/write. So by default none of the devices will be able to write to memory, and selected devices are given access only after a reset. I am not sure what the dependencies are for loading a new device table in the second kernel. If it requires disabling the IOMMU, then we leave a window where in-flight dma will become pass-through and has the potential to corrupt the kdump kernel. Thanks Vivek ^ permalink raw reply [flat|nested] 41+ messages in thread
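Vivek's proposed device-table manipulation can be sketched as a small user-space model. This is purely illustrative: the macro names mirror the kernel's, and the bit positions are the ones given in his message (bit 62 is the write-permission bit in an AMD IOMMU device table entry), but the helper functions are invented for this sketch and are not kernel API.

```c
#include <stdint.h>

/* Device table entry bits, per the sequence above: V/TV keep the entry
 * valid and translated; IW (bit 62) gates whether the device may write. */
#define DEV_ENTRY_VALID        (1ULL << 0)
#define DEV_ENTRY_TRANSLATION  (1ULL << 1)
#define DEV_ENTRY_IW           (1ULL << 62)  /* write permission */

/* Crashed kernel: clear bit 62 but keep the entry valid, so stale
 * in-flight DMA writes fault instead of passing through untranslated. */
uint64_t strip_write_permission(uint64_t entry)
{
	return entry & ~DEV_ENTRY_IW;
}

/* Kdump kernel: the default entry is all zero except V and TV, i.e.
 * still no write permission until the device has been reset. */
uint64_t default_kdump_entry(void)
{
	return DEV_ENTRY_VALID | DEV_ENTRY_TRANSLATION;
}

/* Only after the driver has reset the device are DMA writes re-enabled. */
uint64_t allow_writes(uint64_t entry)
{
	return entry | DEV_ENTRY_IW;
}
```

The point of the model is the invariant it preserves: at every step the entry stays valid and translated, so a stray DMA is translated (and faults) rather than hitting memory untranslated.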
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 2:24 ` Vivek Goyal @ 2010-04-01 12:53 ` Neil Horman 2010-04-01 15:02 ` Vivek Goyal 0 siblings, 1 reply; 41+ messages in thread From: Neil Horman @ 2010-04-01 12:53 UTC (permalink / raw) To: Vivek Goyal Cc: Neil Horman, joerg.roedel, kexec, linux-kernel, Chris Wright, hbabu, iommu On Wed, Mar 31, 2010 at 10:24:18PM -0400, Vivek Goyal wrote: > On Wed, Mar 31, 2010 at 09:13:11PM -0400, Neil Horman wrote: > > On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote: > > > * Neil Horman (nhorman@tuxdriver.com) wrote: > > > > Flush iommu during shutdown > > > > > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > > > kernel crash, that dma operations might still be in flight from the previous > > > > kernel during the kdump kernel boot. This can lead to memory corruption, > > > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > > > a kdump boot as endless iommu error log entries of the form: > > > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > > > address=0x000000000245a0c0 flags=0x0070] > > > > > > We've already fixed this problem once before, so some code shift must > > > have brought it back. Personally, I prefer to do this on the bringup > > > path than the teardown path. Besides keeping the teardown path as > > > simple as possible (goal is to get to kdump kernel asap), there's also > > > reason to competely flush on startup in genernal in case BIOS has done > > > anything unsavory. > > > > > Chris, > > Can you elaborate on what you did with the iommu to make this safe? It > > will save me time digging through the history on this code, and help me > > understand better whats going on here. 
> > > > I was starting to think that we should just leave the iommu on through a kdump, > > and re-construct a new page table based on the old table (filtered by the error > > log) on kdump boot, but it sounds like a better solution might be in place. > > > > Hi Neil, > > Is following sequence possible. > > - In crashed kernel, take away the write permission from all the devices. > Mark bit 62 zero for all devices in device table. > > - Leave the iommu on and let the device entries be valid in kdump kernel > so that any in-flight dma does not become pass through (which can cause > more damage and corrupt kdump kernel). > > - During kdump kernel initialization, load a new device table where again > all the devices don't have write permission. looks like by default > we create a device table with all bits zero except DEV_ENTRY_VALID > and DEV_ENTRY_TRANSLATION bit. > > - Reset the device where we want to setup any dma or operate on. > > - Allow device to do DMA/write. > > So by default all the devices will not be able to do write to memory > and selective devices are given access only after a reset. > > I am not sure what are the dependencies for loading a new device table > in second kernel. If it requires disabling the IOMMU, then we leave a > window where in-flight dma will become passthrough and has the potential > to corrupt kdump kernel. > I think this is possible, but I'm a bit concerned with how some devices will handle a reset. For instance, what will happen to an HBA or a disk, if we reset it as the module is loading? Is that safe? Neil > Thanks > Vivek > > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 12:53 ` Neil Horman @ 2010-04-01 15:02 ` Vivek Goyal 2010-04-01 15:13 ` Neil Horman 0 siblings, 1 reply; 41+ messages in thread From: Vivek Goyal @ 2010-04-01 15:02 UTC (permalink / raw) To: Neil Horman Cc: Neil Horman, joerg.roedel, kexec, linux-kernel, Chris Wright, hbabu, iommu On Thu, Apr 01, 2010 at 08:53:04AM -0400, Neil Horman wrote: > On Wed, Mar 31, 2010 at 10:24:18PM -0400, Vivek Goyal wrote: > > On Wed, Mar 31, 2010 at 09:13:11PM -0400, Neil Horman wrote: > > > On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote: > > > > * Neil Horman (nhorman@tuxdriver.com) wrote: > > > > > Flush iommu during shutdown > > > > > > > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > > > > kernel crash, that dma operations might still be in flight from the previous > > > > > kernel during the kdump kernel boot. This can lead to memory corruption, > > > > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > > > > a kdump boot as endless iommu error log entries of the form: > > > > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > > > > address=0x000000000245a0c0 flags=0x0070] > > > > > > > > We've already fixed this problem once before, so some code shift must > > > > have brought it back. Personally, I prefer to do this on the bringup > > > > path than the teardown path. Besides keeping the teardown path as > > > > simple as possible (goal is to get to kdump kernel asap), there's also > > > > reason to competely flush on startup in genernal in case BIOS has done > > > > anything unsavory. > > > > > > > Chris, > > > Can you elaborate on what you did with the iommu to make this safe? It > > > will save me time digging through the history on this code, and help me > > > understand better whats going on here. 
> > > > > > I was starting to think that we should just leave the iommu on through a kdump, > > > and re-construct a new page table based on the old table (filtered by the error > > > log) on kdump boot, but it sounds like a better solution might be in place. > > > > > > > Hi Neil, > > > > Is following sequence possible. > > > > - In crashed kernel, take away the write permission from all the devices. > > Mark bit 62 zero for all devices in device table. > > > > - Leave the iommu on and let the device entries be valid in kdump kernel > > so that any in-flight dma does not become pass through (which can cause > > more damage and corrupt kdump kernel). > > > > - During kdump kernel initialization, load a new device table where again > > all the devices don't have write permission. looks like by default > > we create a device table with all bits zero except DEV_ENTRY_VALID > > and DEV_ENTRY_TRANSLATION bit. > > > > - Reset the device where we want to setup any dma or operate on. > > > > - Allow device to do DMA/write. > > > > So by default all the devices will not be able to do write to memory > > and selective devices are given access only after a reset. > > > > I am not sure what are the dependencies for loading a new device table > > in second kernel. If it requires disabling the IOMMU, then we leave a > > window where in-flight dma will become passthrough and has the potential > > to corrupt kdump kernel. > > > I think this is possible, but I'm a bit concerned with how some devices will > handle a reset. For instance, what will happen to an HBA or a disk, if we reset > it as the module is loading? Is that safe? I think we need to reset devices in driver if "reset_devices" is set. So we will not reset these during normal boot. Regarding being safe, I don't know. I am assuming that driver knows (or need to know), how to reset device safely while driver is initializing. 
That's the whole assumption kdump is built on: once the driver is initializing, it will first reset the device (if reset_devices is set), so that the chances of the device working properly in the second kernel increase. Vivek ^ permalink raw reply [flat|nested] 41+ messages in thread
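The convention Vivek describes — drivers consult the kernel-wide `reset_devices` flag during probe and only then force a hardware reset — can be illustrated with a hedged user-space sketch. The `my_probe`/`my_hw_reset` names are invented for a hypothetical device; in the kernel, `reset_devices` is a global set by the `reset_devices` boot parameter that kdump initrds pass.

```c
/* Stand-in for the kernel's global set by the "reset_devices" boot
 * parameter; a kdump boot sets it so drivers start from a clean state. */
static int reset_devices;

static int hw_was_reset; /* observable side effect for this sketch */

/* Invented hardware-reset hook for a hypothetical device. */
static void my_hw_reset(void)
{
	hw_was_reset = 1;
}

/* A probe routine following the convention: only force a full hardware
 * reset when the boot asked for it (e.g. in a kdump kernel), so a
 * normal boot is not slowed down or disturbed. */
static int my_probe(void)
{
	if (reset_devices)
		my_hw_reset(); /* quiesce state left by the crashed kernel */
	/* ... normal device init continues here ... */
	return 0;
}
```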
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 15:02 ` Vivek Goyal @ 2010-04-01 15:13 ` Neil Horman 0 siblings, 0 replies; 41+ messages in thread From: Neil Horman @ 2010-04-01 15:13 UTC (permalink / raw) To: Vivek Goyal Cc: Neil Horman, joerg.roedel, kexec, linux-kernel, Chris Wright, hbabu, iommu On Thu, Apr 01, 2010 at 11:02:03AM -0400, Vivek Goyal wrote: > On Thu, Apr 01, 2010 at 08:53:04AM -0400, Neil Horman wrote: > > On Wed, Mar 31, 2010 at 10:24:18PM -0400, Vivek Goyal wrote: > > > On Wed, Mar 31, 2010 at 09:13:11PM -0400, Neil Horman wrote: > > > > On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote: > > > > > * Neil Horman (nhorman@tuxdriver.com) wrote: > > > > > > Flush iommu during shutdown > > > > > > > > > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > > > > > kernel crash, that dma operations might still be in flight from the previous > > > > > > kernel during the kdump kernel boot. This can lead to memory corruption, > > > > > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > > > > > a kdump boot as endless iommu error log entries of the form: > > > > > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > > > > > address=0x000000000245a0c0 flags=0x0070] > > > > > > > > > > We've already fixed this problem once before, so some code shift must > > > > > have brought it back. Personally, I prefer to do this on the bringup > > > > > path than the teardown path. Besides keeping the teardown path as > > > > > simple as possible (goal is to get to kdump kernel asap), there's also > > > > > reason to competely flush on startup in genernal in case BIOS has done > > > > > anything unsavory. > > > > > > > > > Chris, > > > > Can you elaborate on what you did with the iommu to make this safe? It > > > > will save me time digging through the history on this code, and help me > > > > understand better whats going on here. 
> > > > > > > > I was starting to think that we should just leave the iommu on through a kdump, > > > > and re-construct a new page table based on the old table (filtered by the error > > > > log) on kdump boot, but it sounds like a better solution might be in place. > > > > > > > > > > Hi Neil, > > > > > > Is following sequence possible. > > > > > > - In crashed kernel, take away the write permission from all the devices. > > > Mark bit 62 zero for all devices in device table. > > > > > > - Leave the iommu on and let the device entries be valid in kdump kernel > > > so that any in-flight dma does not become pass through (which can cause > > > more damage and corrupt kdump kernel). > > > > > > - During kdump kernel initialization, load a new device table where again > > > all the devices don't have write permission. looks like by default > > > we create a device table with all bits zero except DEV_ENTRY_VALID > > > and DEV_ENTRY_TRANSLATION bit. > > > > > > - Reset the device where we want to setup any dma or operate on. > > > > > > - Allow device to do DMA/write. > > > > > > So by default all the devices will not be able to do write to memory > > > and selective devices are given access only after a reset. > > > > > > I am not sure what are the dependencies for loading a new device table > > > in second kernel. If it requires disabling the IOMMU, then we leave a > > > window where in-flight dma will become passthrough and has the potential > > > to corrupt kdump kernel. > > > > > I think this is possible, but I'm a bit concerned with how some devices will > > handle a reset. For instance, what will happen to an HBA or a disk, if we reset > > it as the module is loading? Is that safe? > > I think we need to reset devices in driver if "reset_devices" is set. So > we will not reset these during normal boot. > > Regarding being safe, I don't know. I am assuming that driver knows (or > need to know), how to reset device safely while driver is initializing. 
> That's the whole assumption kdump is built on, that once driver is > initializing, it will first reset the device (if reset_devices is set), so > that chances of device working properly in second kernel increase. > Yes, I agree. I was really just asking whether it's safe to unilaterally reset devices during boot. I suppose it is, but I'm not entirely sure. Neil > Vivek > ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-03-31 21:25 ` Chris Wright 2010-04-01 1:13 ` Neil Horman @ 2010-04-01 2:44 ` Vivek Goyal 2010-04-01 7:10 ` Chris Wright 1 sibling, 1 reply; 41+ messages in thread From: Vivek Goyal @ 2010-04-01 2:44 UTC (permalink / raw) To: Chris Wright; +Cc: Neil Horman, iommu, joerg.roedel, hbabu, kexec, linux-kernel On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote: > * Neil Horman (nhorman@tuxdriver.com) wrote: > > Flush iommu during shutdown > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > kernel crash, that dma operations might still be in flight from the previous > > kernel during the kdump kernel boot. This can lead to memory corruption, > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > a kdump boot as endless iommu error log entries of the form: > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > address=0x000000000245a0c0 flags=0x0070] > > We've already fixed this problem once before, so some code shift must > have brought it back. Personally, I prefer to do this on the bringup > path than the teardown path. Besides keeping the teardown path as > simple as possible (goal is to get to kdump kernel asap), there's also > reason to competely flush on startup in genernal in case BIOS has done > anything unsavory. Can we flush domains (all the I/O TLBs associated with each domain) during initialization? I think all the domain data built by the previous kernel will be lost, and the new kernel will have no idea about it. Thanks Vivek ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 2:44 ` Vivek Goyal @ 2010-04-01 7:10 ` Chris Wright 2010-04-01 12:56 ` Neil Horman 0 siblings, 1 reply; 41+ messages in thread From: Chris Wright @ 2010-04-01 7:10 UTC (permalink / raw) To: Vivek Goyal Cc: Chris Wright, Neil Horman, iommu, joerg.roedel, hbabu, kexec, linux-kernel * Vivek Goyal (vgoyal@redhat.com) wrote: > On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote: > > * Neil Horman (nhorman@tuxdriver.com) wrote: > > > Flush iommu during shutdown > > > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > > kernel crash, that dma operations might still be in flight from the previous > > > kernel during the kdump kernel boot. This can lead to memory corruption, > > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > > a kdump boot as endless iommu error log entries of the form: > > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > > address=0x000000000245a0c0 flags=0x0070] > > > > We've already fixed this problem once before, so some code shift must > > have brought it back. Personally, I prefer to do this on the bringup > > path than the teardown path. Besides keeping the teardown path as > > simple as possible (goal is to get to kdump kernel asap), there's also > > reason to competely flush on startup in genernal in case BIOS has done > > anything unsavory. > > Can we flush domains (all the I/O TLBs assciated with each domain), during > initialization? I think all the domain data built by previous kernel will > be lost and new kernel will have no idea about. We first invalidate the device table entry, so new translation requests will see the new domainid for a given BDF. Then we invalidate the whole set of page tables associated w/ the new domainid. Now all dma transactions will need page table walk (page tables will be empty excpet for any 1:1 mappings). 
Any old domainid's from previous kernel that aren't found in new device table entries are effectively moot. Just so happens that in kexec/kdump case, they'll be the same domainid's, but that doesn't matter. thanks, -chris ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] amd iommu: force flush of iommu prior during shutdown 2010-04-01 7:10 ` Chris Wright @ 2010-04-01 12:56 ` Neil Horman 0 siblings, 0 replies; 41+ messages in thread From: Neil Horman @ 2010-04-01 12:56 UTC (permalink / raw) To: Chris Wright Cc: Vivek Goyal, Neil Horman, joerg.roedel, kexec, linux-kernel, hbabu, iommu On Thu, Apr 01, 2010 at 12:10:40AM -0700, Chris Wright wrote: > * Vivek Goyal (vgoyal@redhat.com) wrote: > > On Wed, Mar 31, 2010 at 02:25:35PM -0700, Chris Wright wrote: > > > * Neil Horman (nhorman@tuxdriver.com) wrote: > > > > Flush iommu during shutdown > > > > > > > > When using an iommu, its possible, if a kdump kernel boot follows a primary > > > > kernel crash, that dma operations might still be in flight from the previous > > > > kernel during the kdump kernel boot. This can lead to memory corruption, > > > > crashes, and other erroneous behavior, specifically I've seen it manifest during > > > > a kdump boot as endless iommu error log entries of the form: > > > > AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x000d > > > > address=0x000000000245a0c0 flags=0x0070] > > > > > > We've already fixed this problem once before, so some code shift must > > > have brought it back. Personally, I prefer to do this on the bringup > > > path than the teardown path. Besides keeping the teardown path as > > > simple as possible (goal is to get to kdump kernel asap), there's also > > > reason to competely flush on startup in genernal in case BIOS has done > > > anything unsavory. > > > > Can we flush domains (all the I/O TLBs assciated with each domain), during > > initialization? I think all the domain data built by previous kernel will > > be lost and new kernel will have no idea about. > > We first invalidate the device table entry, so new translation requests > will see the new domainid for a given BDF. Then we invalidate the > whole set of page tables associated w/ the new domainid. 
Now all dma > transactions will need page table walk (page tables will be empty excpet > for any 1:1 mappings). Any old domainid's from previous kernel that > aren't found in new device table entries are effectively moot. Just so > happens that in kexec/kdump case, they'll be the same domainid's, but > that doesn't matter. > > thanks, > -chris > Additionally, Chris (this is just for my own education here): what happens when we disable the iommu while DMAs are in flight? I ask because, from what I read, my assumption is that the iommu effectively enters a passive mode where bus accesses from devices holding DMA addresses that were previously provided by an iommu translation will just get strobed onto the bus without being translated back to physical addresses. Won't that result in bus errors causing master aborts? If so, that would seem to be further cause to leave the iommu on during a crash/kdump boot. Neil > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 41+ messages in thread
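The invalidation ordering Chris describes — device table entry first, then the domain's cached translations, after which every DMA takes a fresh walk of empty page tables — can be modelled with a toy user-space sketch. All names and data structures here are invented for illustration; this is a model of the behavior, not the kernel's AMD IOMMU code.

```c
#include <stdint.h>

#define FAULT     UINT64_MAX
#define NUM_BDF   4
#define TLB_SLOTS 8

/* Toy device table: dte[bdf] is the domainid the device translates through. */
static uint16_t dte[NUM_BDF];

/* Toy IOTLB: cached iova -> phys translations tagged with a domainid. */
static struct tlb_ent { int valid; uint16_t domid; uint64_t iova, phys; } tlb[TLB_SLOTS];

static void tlb_insert(uint16_t domid, uint64_t iova, uint64_t phys)
{
	for (int i = 0; i < TLB_SLOTS; i++)
		if (!tlb[i].valid) {
			tlb[i].valid = 1;
			tlb[i].domid = domid;
			tlb[i].iova = iova;
			tlb[i].phys = phys;
			return;
		}
}

/* Step 1: (re)point the BDF at its domainid, i.e. the new device table
 * entry takes effect for subsequent lookups. */
static void set_dte(int bdf, uint16_t domid)
{
	dte[bdf] = domid;
}

/* Step 2: drop every cached translation for that domainid. */
static void flush_domain(uint16_t domid)
{
	for (int i = 0; i < TLB_SLOTS; i++)
		if (tlb[i].domid == domid)
			tlb[i].valid = 0;
}

/* A DMA from bdf: a TLB hit returns the cached phys; a miss walks the
 * (empty) page tables and faults -- which is what makes translations
 * left behind by the previous kernel harmless. */
static uint64_t translate(int bdf, uint64_t iova)
{
	uint16_t domid = dte[bdf];
	for (int i = 0; i < TLB_SLOTS; i++)
		if (tlb[i].valid && tlb[i].domid == domid && tlb[i].iova == iova)
			return tlb[i].phys;
	return FAULT; /* empty page tables: no translation */
}
```

In the kexec/kdump case the domainid happens to stay the same, so without the domain flush a stale cached translation from the crashed kernel would still hit; with both invalidations, every transaction faults against the empty tables until new mappings are installed.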
end of thread, other threads:[~2010-04-05 15:35 UTC | newest] Thread overview: 41+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-03-31 15:24 [PATCH] amd iommu: force flush of iommu prior during shutdown Neil Horman 2010-03-31 15:54 ` Vivek Goyal 2010-03-31 18:28 ` Neil Horman 2010-03-31 18:57 ` Eric W. Biederman 2010-03-31 19:18 ` Neil Horman 2010-03-31 19:51 ` Eric W. Biederman 2010-03-31 20:27 ` Neil Horman 2010-04-01 4:04 ` Eric W. Biederman 2010-04-01 12:49 ` Neil Horman 2010-04-01 14:29 ` Joerg Roedel 2010-04-01 14:47 ` Neil Horman 2010-04-01 15:56 ` Joerg Roedel 2010-04-01 17:11 ` Neil Horman 2010-04-01 20:14 ` Joerg Roedel 2010-04-02 0:00 ` Neil Horman 2010-04-02 0:30 ` Chris Wright 2010-04-02 1:23 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Chris Wright 2010-04-02 1:31 ` [PATCH 2/2] x86/amd-iommu: warn when issuing command to uninitiailed cmd buffer Chris Wright 2010-04-02 1:35 ` [PATCH 1/2] x86/amd-iommu: enable iommu before attaching devices Neil Horman 2010-04-02 1:38 ` Chris Wright 2010-04-02 9:11 ` Joerg Roedel 2010-04-02 23:59 ` Chris Wright 2010-04-02 15:59 ` Vivek Goyal 2010-04-02 22:38 ` Chris Wright 2010-04-02 22:55 ` Eric W. Biederman 2010-04-02 23:57 ` Chris Wright 2010-04-03 17:38 ` Joerg Roedel 2010-04-05 14:17 ` Vivek Goyal 2010-04-05 14:32 ` Joerg Roedel 2010-04-05 15:34 ` Neil Horman 2010-03-31 18:43 ` [PATCH] amd iommu: force flush of iommu prior during shutdown Eric W. Biederman 2010-03-31 21:25 ` Chris Wright 2010-04-01 1:13 ` Neil Horman 2010-04-01 1:39 ` Chris Wright 2010-04-01 2:24 ` Vivek Goyal 2010-04-01 12:53 ` Neil Horman 2010-04-01 15:02 ` Vivek Goyal 2010-04-01 15:13 ` Neil Horman 2010-04-01 2:44 ` Vivek Goyal 2010-04-01 7:10 ` Chris Wright 2010-04-01 12:56 ` Neil Horman