From: Suravee Suthikulanit <suravee.suthikulpanit-5C7GfCeVMHo@public.gmane.org>
To: "Jörg Rödel" <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>,
	"iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org"
	<iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
Cc: "linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: RFC: IOMMU/AMD: Error Handling
Date: Mon, 29 Apr 2013 14:45:30 -0500
Message-ID: <517ECDDA.3000606@amd.com>

Joerg,

We are in the process of implementing AMD IOMMU error handling, and I 
would like some comments from you and the community.

Currently, the AMD IOMMU driver only reports events from the event log
in dmesg and does not try to handle them when they indicate errors.  AMD
IOMMU errors can be categorized as device-specific errors and IOMMU errors.

1. For IOMMU errors such as:
     - DEV_TAB_HARDWARE_ERROR
     - PAGE_TAB_ERROR
     - COMMAND_HARDWARE_ERROR
If the error is detected during IOMMU initialization, we could disable
the IOMMU and proceed.  If the error occurs after the IOMMU is
initialized, we won't be able to recover from it and might need to panic.
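
As a rough illustration of the policy in item 1, here is a user-space
sketch (not driver code).  The enum names and the function
iommu_hw_error_policy() are hypothetical and exist only for this example;
the event values are arbitrary and do not match the hardware encoding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event codes; names follow the list above, values are arbitrary. */
enum iommu_hw_event {
	EV_DEV_TAB_HARDWARE_ERROR,
	EV_PAGE_TAB_ERROR,
	EV_COMMAND_HARDWARE_ERROR,
};

/* Hypothetical outcomes of the policy decision. */
enum iommu_hw_action {
	ACT_DISABLE_IOMMU,	/* seen during init: continue without the IOMMU */
	ACT_PANIC,		/* seen after init: no recovery path is proposed */
};

/*
 * All three IOMMU-level events are treated the same way here; only the
 * point in time (before or after initialization completed) matters.
 */
static enum iommu_hw_action
iommu_hw_error_policy(enum iommu_hw_event ev, bool init_done)
{
	assert((int)ev >= 0 && (int)ev <= EV_COMMAND_HARDWARE_ERROR);

	return init_done ? ACT_PANIC : ACT_DISABLE_IOMMU;
}
```

Whether a panic is really the right answer after initialization is one of
the points on which feedback is requested; the sketch only encodes the
proposal as stated.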

2. For device-specific errors such as:
     - ILLEGAL_DEV_TABLE_ENTRY
     - IO_PAGE_FAULT
     - INVALID_DEVICE_REQUEST
We think the AMD IOMMU driver should try to isolate the device.  This
involves blocking the device's transactions at the IOMMU DTE (device
table entry) and trying to disable the device (e.g. by calling the
remove(struct pci_dev *pdev) interface generally provided by device
drivers).  This could prevent the device from continuing to fault and
reduce the risk of system instability.
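
The isolation sequence in item 2 could look roughly like the following
user-space sketch.  Everything in it is hypothetical: struct mock_device,
mock_remove() and isolate_device() are stand-ins, the dte_blocked flag
only models "transactions are blocked at the DTE", and the ordering
(block first, then call remove) is an assumption rather than a settled
design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a device as seen by the IOMMU driver. */
struct mock_device {
	uint16_t devid;		/* index into the device table */
	bool dte_blocked;	/* models a DTE that rejects all transactions */
	bool bound;		/* a device driver is currently bound */
	void (*remove)(struct mock_device *dev); /* models the driver's remove() */
};

/* Stand-in for a driver's remove() callback: the driver lets go of the device. */
static void mock_remove(struct mock_device *dev)
{
	dev->bound = false;
}

/*
 * Isolate a faulting device.  Blocking at the DTE comes first so that no
 * further DMA gets through while the driver is being torn down.
 */
static void isolate_device(struct mock_device *dev)
{
	assert(dev != NULL);

	dev->dte_blocked = true;

	if (dev->bound && dev->remove)
		dev->remove(dev);
}
```

Calling isolate_device() a second time on the same device is harmless in
this model, which matters because a misbehaving device may log several
events before the first one is processed.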

3. In the case of a posted memory write transaction, the device driver
might not be aware that the transaction has failed and been blocked at
the IOMMU.  If there is no HW IOMMU, I believe this is handled by the
PCI error handling code.  If the IOMMU hardware reports such a case,
could we potentially leverage the Linux IOMMU fault handling interface,
iommu_set_fault_handler() and report_iommu_fault(), to communicate the
failure to the device driver or the PCI driver?
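
To make the question in item 3 concrete, here is a user-space mock of the
callback pattern.  It is not the kernel API: the mock_* names, struct
mock_domain and struct mock_driver_state are hypothetical, and the
signatures are simplified (no struct device argument).  It only mirrors
the general shape of registering a per-domain handler with a token and
having the IOMMU driver invoke it when a fault is logged.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MOCK_FAULT_WRITE 1	/* hypothetical flag: the faulting access was a write */

struct mock_domain;

typedef int (*mock_fault_handler_t)(struct mock_domain *dom,
				    unsigned long iova, int flags, void *token);

/* Hypothetical domain: just enough state to hold one handler. */
struct mock_domain {
	mock_fault_handler_t handler;
	void *token;
};

/* Driver side: register a callback plus private data for this domain. */
static void mock_set_fault_handler(struct mock_domain *dom,
				   mock_fault_handler_t handler, void *token)
{
	assert(dom != NULL);
	dom->handler = handler;
	dom->token = token;
}

/* IOMMU side: called when the event log shows a fault for this domain. */
static int mock_report_fault(struct mock_domain *dom,
			     unsigned long iova, int flags)
{
	if (!dom->handler)
		return -ENOSYS;	/* nobody registered, caller falls back to logging */

	return dom->handler(dom, iova, flags, dom->token);
}

/* Hypothetical per-driver bookkeeping updated from the callback. */
struct mock_driver_state {
	unsigned long last_fault_iova;
	int faults;
};

/* Example handler: the driver learns that a posted write never landed. */
static int mock_driver_on_fault(struct mock_domain *dom, unsigned long iova,
				int flags, void *token)
{
	struct mock_driver_state *st = token;

	(void)dom;
	(void)flags;
	st->last_fault_iova = iova;
	st->faults++;
	return 0;
}
```

In this model the device driver, not the IOMMU driver, decides what a
failed posted write means for it; the open question is whether the real
interface is the right channel for that notification.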

Any feedback or comments are appreciated.

Thank you,
Suravee
