From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark Hounschell
Subject: Re: Kernel Oops: iommu related?
Date: Thu, 12 Feb 2015 14:59:03 -0500
Message-ID: <54DD0607.7070308@compro.net>
References: <54DCE8A6.4000608@compro.net> <20150212180846.GD29106@8bytes.org> <54DCF024.30309@compro.net> <54DD00FE.2050009@compro.net>
Reply-To: markh-n2QNKt385d+sTnJN9+BGXg@public.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <54DD00FE.2050009-n2QNKt385d+sTnJN9+BGXg@public.gmane.org>
Sender: iommu-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: iommu-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: Joerg Roedel
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
List-Id: iommu@lists.linux-foundation.org

On 02/12/2015 02:37 PM, Mark Hounschell wrote:
> On 02/12/2015 01:25 PM, Mark Hounschell wrote:
>> On 02/12/2015 01:08 PM, Joerg Roedel wrote:
>>> On Thu, Feb 12, 2015 at 12:53:42PM -0500, Mark Hounschell wrote:
>>>> This happens immediately after unloading one of our out of kernel GPL
>>>> drivers. The driver has done NOTHING other than load at bootup. I'm
>>>> running a 3.18.7 kernel (x86_64) on an AMD platform. I can't see
>>>> anything obviously wrong in our driver. It works fine when the iommu
>>>> is disabled. This particular machine has 7 of our cards in it. Four in
>>>> one expansion rack and 3 in another. The 2 PCI expansion racks use
>>>> pci-e interface cards installed in the MB.
>>>>
>>>> Feb 12 10:47:15 harley kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0000 address=0x00000000000ae640 flags=0x0070]
>>>> Feb 12 10:47:15 harley kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0000 address=0x00000000000ae660 flags=0x0070]
>>>> Feb 12 10:47:15 harley kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=0f:00.0 domain=0x0000 address=0x00000000000ae670 flags=0x0070]
>>>> Feb 12 10:47:27 harley kernel: ------------[ cut here ]------------
>>>> Feb 12 10:47:27 harley kernel: WARNING: CPU: 3 PID: 0 at drivers/iommu/amd_iommu.c:2637 dma_ops_domain_unmap.part.13+0x65/0x70()
>>>
>>> This warning indicates that some driver is unmapping a dma range that
>>> was not mapped previously (meaning that a pte in the io-page-tables is
>>> zeroed out).
>>> The reason for this (and the IO_PAGE_FAULTs) you see are almost
>>> certainly because some driver does not use the DMA-API correctly.
>>>
>>>
>>
>> I wonder what driver that could be. It certainly isn't the one that I
>> just unloaded as it for sure has not done anything dma related. I'm
>> pretty sure I uninstalled all our other drivers but will go back and
>> verify.
>>
>
> I've cleaned the machine of all our drivers and also the nvidia driver.
> If the problem is as you say, it is an in-kernel driver. I've attached a
> dmesg taken after it started. I used the dgap driver from the staging
> directory to unload and trigger this as I have one. That driver does NO
> dma. I know because I'm one of the maintainers and have done lots of
> work on it.
>

There was a dmesg attached to my previous email BTW

Mark