iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv3 0/6] Crashdump Accepting Active IOMMU
@ 2014-01-10 22:07 Bill Sumner
  2014-01-10 22:07 ` [PATCHv3 5/6] Crashdump-Accepting-Active-IOMMU-Debug-Print-IOMMU Bill Sumner
       [not found] ` <1389391652-52422-1-git-send-email-bill.sumner-VXdhtT5mjnY@public.gmane.org>
  0 siblings, 2 replies; 18+ messages in thread
From: Bill Sumner @ 2014-01-10 22:07 UTC (permalink / raw)
  To: dwmw2, indou.takao, bhe, joro
  Cc: iommu, kexec, alex.williamson, linux-pci, linux-kernel, ddutile,
	ishii.hironobu, bhelgaas, bill.sumner, doug.hatch, zhenhua

v2->v3:
1. Commented-out "#define DEBUG 1" to eliminate debug messages
2. Updated the comments about changes in each version in all patches in the set.
3. Fixed: one-line added to Copy-Translations" patch to initialize the iovad
          struct as recommended by Baoquan He [bhe@redhat.com]
          init_iova_domain(&domain->iovad, DMA_32BIT_PFN);

v1->v2:
The following series implements a fix for:
A kdump problem about DMA that has been discussed for a long time. That is,
when a kernel panics and boots into the kdump kernel, DMA started by the 
panicked kernel is not stopped before the kdump kernel is booted and the 
kdump kernel disables the IOMMU while this DMA continues.  This causes the
IOMMU to stop translating the DMA addresses as IOVAs and begin to treat them 
as physical memory addresses -- which causes the DMA to either:
(1) generate DMAR errors or (2) generate PCI SERR errors or (3) transfer  
data to or from incorrect areas of memory. Often this causes the dump to fail.

This patch set modifies the behavior of the iommu in the (new) crashdump kernel: 
1. to accept the iommu hardware in an active state, 
2. to leave the current translations in-place so that legacy DMA will continue
   using its current buffers until the device drivers in the crashdump kernel
   initialize and initialize their devices,
3. to use different portions of the iova address ranges for the device drivers
   in the crashdump kernel than the iova ranges that were in-use at the time
   of the panic.  

Advantages of this approach:
1. All manipulation of the IO-device is done by the Linux device-driver
   for that device.
2. This approach behaves in a manner very similar to operation without an
   active iommu.
3. Any activity between the IO-device and its RMRR areas is handled by the
   device-driver in the same manner as during a non-kdump boot.
4. If an IO-device has no driver in the kdump kernel, it is simply left alone.
   This supports the practice of creating a special kdump kernel without
   drivers for any devices that are not required for taking a crashdump. 

Changes since the RFC version of this patch:
1. Consolidated all of the operational code into the "copy..." functions.
   The "process..." functions were primarily used for diagnostics and
   exploration; however, there was a small amount of operational code that
   used the "process..." functions.
   This operational code has been moved into the "copy..." functions.

2. Removed the "Process ..." functions and the diagnostic code that ran
   on that function set.  This removed about 1/4 of the code -- which this
   operational patch set no longer needs.  These portions of the RFC patch
   could be formatted as a separate patch and submitted independently
   at a later date. 

3. Re-formatted the code to the Linux Coding Standards.
   The checkpatch script still finds some lines to complain about;
   however most of these lines are either (1) lines that I did not change,
   or (2) lines that only changed by adding a level of indent which pushed
   them over 80-characters, or (3) new lines whose intent is far clearer when
   longer than 80-characters.

4. Updated the remaining debug print to be significantly more flexible.
   This allows control over the amount of debug print to the console --
   which can vary widely.

5. Fixed a couple of minor bugs found by testing on a machine with a
   very large IO configuration.

At a high level, this code operates primarily during iommu initialization
and device-driver initialization

During intel-iommu hardware initialization:
In intel_iommu_init(void)
* If (This is the crash kernel)
  .  Set flag: crashdump_accepting_active_iommu (all changes below check this)
  .  Skip disabling the iommu hardware translations

In init_dmars()
* Duplicate the intel iommu translation tables from the old kernel
  in the new kernel
  . The root-entry table, all context-entry tables,
    and all page-translation-entry tables
  . The duplicate tables contain updated physical addresses to link them together.
  . The duplicate tables are mapped into kernel virtual addresses
    in the new kernel which allows most of the existing iommu code
    to operate without change.
  . Do some minimal sanity-checks during the copy
  . Place the address of the new root-entry structure into "struct intel_iommu"

* Skip setting-up new domains for 'si', 'rmrr', 'isa' 
  . Translations for 'rmrr' and 'isa' ranges have been copied from the old kernel
  . This patch has not yet been tested with iommu pass-through enabled

* Existing (unchanged) code near the end of dmar_init:
  . Loads the address of the (now new) root-entry structure from
    "struct intel_iommu" into the iommu hardware and does the hardware flushes.
    This changes the active translation tables from the ones in the old kernel
    to the copies in the new kernel.
  . This is legal because the translations in the two sets of tables are
    currently identical:
      Virtualization Technology for Directed I/O. Architecture Specification,
      February 2011, Rev. 1.3  (section 11.2, paragraph 2) 

In iommu_init_domains()
* Mark as in-use all domain-id's from the old kernel
  . In case the new kernel contains a device that was not in the old kernel
    and a new, unused domain-id is actually needed, the bitmap will give us one.

When a new domain is created for a device:
* If (this device has a context in the old kernel)
  . Get domain-id, address-width, and IOVA ranges from the old kernel context;
  . Get address(page-entry-tables) from the copy in the new kernel;
  . And apply all of the above values to the new domain structure.
* Else
  . Create a new domain as normal


Bill Sumner (6):
  Crashdump-Accepting-Active-IOMMU-Flags-and-Prototype
  Crashdump-Accepting-Active-IOMMU-Utility-functions
  Crashdump-Accepting-Active-IOMMU-Domain-Interfaces
  Crashdump-Accepting-Active-IOMMU-Copy-Translations
  Crashdump-Accepting-Active-IOMMU-Debug-Print-IOMMU
  Crashdump-Accepting-Active-IOMMU-Call-From-Mainline

 drivers/iommu/intel-iommu.c | 1293 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 1225 insertions(+), 68 deletions(-)

-- 
Bill Sumner <bill.sumner@hp.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCHv3 0/6] Crashdump Accepting Active IOMMU
@ 2014-03-12 16:15 Sumner, William
  0 siblings, 0 replies; 18+ messages in thread
From: Sumner, William @ 2014-03-12 16:15 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: bhe-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	Li, Zhen-Hua, Hatch, Douglas B (HPS Linux PM),
	ishii.hironobu-+CUm20s59erQFUHtdCDX3A@public.gmane.org,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
	dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org


[-- Attachment #1.1: Type: text/plain, Size: 5229 bytes --]

On Mon, Mar 10, 2014 at 04:43PM, Vivek Goyal wrote:
>On Fri, Jan 10, 2014 at 03:07:26PM -0700, Bill Sumner wrote:
>
>[..]
>> This patch set modifies the behavior of the iommu in the (new) crashdump kernel:
>> 1. to accept the iommu hardware in an active state, 2. to leave the
>> current translations in-place so that legacy DMA will continue
>>    using its current buffers until the device drivers in the crashdump kernel>>    initialize and initialize their devices, 3. to use different
>> portions of the iova address ranges for the device drivers
>>    in the crashdump kernel than the iova ranges that were in-use at the time
>>    of the panic.
>
>Conceptually, above makes sense to me. I have few queries.
>
>- Do we need to pass any kind of data from first kernel to second kernel,
>  like table size etc? Calgary IOMMU was using first kernel's tables
>  also and it was determining previous kernel's table size using saved_max_pfn.

Yes. We need to pass the intel-iommu translation tables from the first kernel to the second kernel - well, to allow the second kernel to read them.  Only the tree of tables that the intel-iommu hardware references are needed: the root-entry table, the context-entry tables, and the page-translation tables.  This patch involves only the intel-iommu -- and none of the other iommu hardware types.

During the early initialization of the new kdump kernel this patch copies (duplicates) the tree of iommu translation tables from the old kernel into the new kernel.  These tables are all linked together as a tree by physical memory addresses. The physical address of the top of the tree -- the root-entry table -- is obtained from a register in the iommu hardware. The copy operation traverses the tree in depth-first order, sanity-checking each table and then duplicating it into the kernel address space of the new kernel, linking the new tables appropriately.  If the copy succeeds then the iommu register is updated to point to the copy in the new kernel and the iommu continues translating DMA requests from IO devices.  If a sanity-check fails, the patch currently errors-out of the dump (might want to revisit this error case and see if there is something better to do.)

By copying the tables into the new kernel, all of the existing code in intel-iommu.c continues to work nearly unchanged, with only a few added initializations of fields when kdump is active and when intel-iommu.c creates its "bookkeeping" structures for the device -- to make sure they correspond to the contents of the translate tables. I made a determined effort to avoid changing the existing execution path for the non-kdump case, and to minimize the changes even when kdump is active.

Note also that this leaves the copy of the translate tables in the old panicked kernel unused and unchanged during the operation of 'makedumpfile', so they appear correctly in the dump as they were at the time of the panic.

>
>- I don't know how IOMMU translation tables look like, but are new DMA
>  zones setup by drivers in second kernel part of same table?

Yes. The copied translation tables in the kdump kernel contain both the translations that were active at the time of the dump (which will never go away until the dump is finished and the platform is rebooted), plus any translations needed by the kdump kernel (which come and go as necessary).  As the "bookkeeping" structures are created in the new kernel (typically the first time that a device requests a translation for a buffer) they are initialized such that all IOVAs that already exist in the translate tables for tht device are marked as "not available for allocation" so all requests by device drivers in the new kernel receive IOVAs that were not in-use at the time of the panic.

> How do we
>  make sure that sufficient space is available.

So far I have not seen any problem with running out of space because of this copy;  however, I believe that this may be a valid concern with very large IO configurations -- and it deserves some attention as the community reviews and tests this patch.

> I am not sure if possible
>  table corruption from crashed kernel is an issue here or not.

Any corrupted translation tables from the crashed kernel would be a problem that would prevent using copies of them. I hope and expect that this happens rarely -- and that we would catch it during the copy when it does.

Fortunately, these tables are only manipulated by the code in intel-iommu.c - which limits the amount of code that has to be "right".  Of course, there is always the possibility that other code in the kernel could "hose" one of these tables -- so we need to sanity-check the tables before the new kernel uses them.

>
>In general, I think changelogs of these patches need to be little better.
>There seem to be lot of text and still I can't quickly wrap my head around what a particular patch is supposed to be doing.

I will work on the changelogs as I re-arrange the code and the structure of the patch-set for the next submitted version.

>
>But we definitely need to fix this issue. IOMMU issues with kdump have been troubling us for a very long time.
Strongly agree.

-- Bill

[-- Attachment #1.2: Type: text/html, Size: 9293 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2014-04-25 18:11 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-10 22:07 [PATCHv3 0/6] Crashdump Accepting Active IOMMU Bill Sumner
2014-01-10 22:07 ` [PATCHv3 5/6] Crashdump-Accepting-Active-IOMMU-Debug-Print-IOMMU Bill Sumner
     [not found] ` <1389391652-52422-1-git-send-email-bill.sumner-VXdhtT5mjnY@public.gmane.org>
2014-01-10 22:07   ` [PATCHv3 1/6] Crashdump-Accepting-Active-IOMMU-Flags-and-Prototype Bill Sumner
2014-01-10 22:07   ` [PATCHv3 2/6] Crashdump-Accepting-Active-IOMMU-Utility-functions Bill Sumner
2014-01-10 22:07   ` [PATCHv3 3/6] Crashdump-Accepting-Active-IOMMU-Domain-Interfaces Bill Sumner
     [not found]     ` <1389391652-52422-4-git-send-email-bill.sumner-VXdhtT5mjnY@public.gmane.org>
2014-03-04 14:59       ` Joerg Roedel
2014-03-10 19:23         ` Sumner, William
2014-01-10 22:07   ` [PATCHv3 4/6] Crashdump-Accepting-Active-IOMMU-Copy-Translations Bill Sumner
2014-01-10 22:07   ` [PATCHv3 6/6] Crashdump-Accepting-Active-IOMMU-Call-From-Mainline Bill Sumner
2014-01-25  2:44   ` [PATCHv3 0/6] Crashdump Accepting Active IOMMU Baoquan He
2014-03-04 15:06   ` Joerg Roedel
     [not found]     ` <20140304150611.GE2799-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2014-03-10 19:10       ` Sumner, William
2014-03-10 21:43   ` Vivek Goyal
2014-04-07 20:43   ` Don Dutile
2014-04-08 16:14     ` David Woodhouse
     [not found]       ` <1396973686.25235.36.camel-W2I5cNIroUsVm/YvaOjsyQ@public.gmane.org>
2014-04-08 18:42         ` Don Dutile
     [not found]     ` <53430DE9.9080303-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-04-25 18:11       ` Sumner, William
  -- strict thread matches above, loose matches on Subject: below --
2014-03-12 16:15 Sumner, William

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).