From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mathieu Taillefumier
Subject: Re: [Intel-gfx] Failure with swiotlb
Date: Wed, 06 Jan 2010 16:25:51 +0100
Message-ID: <4B44AB7F.6090304@free.fr>
References: <20091230030200.GA2249@zhen-devel.sh.intel.com>
 <1262168787.3181.3219.camel@macbook.infradead.org>
 <20091231043306.GB4801@zhen-devel.sh.intel.com>
 <20100104092745.GB9054@zhen-devel.sh.intel.com>
 <87fx6la82r.fsf@pollan.anholt.net>
 <20100105152003.GA14108@localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20100105152003.GA14108@localdomain>
Errors-To: dri-devel-bounces@lists.sourceforge.net
To: Zhenyu Wang
Cc: intel-gfx, dri-devel, David Woodhouse
List-Id: dri-devel@lists.freedesktop.org

On 01/05/2010 04:20 PM, Zhenyu Wang wrote:
> On 2010.01.04 13:11:56 -0800, Eric Anholt wrote:
>> On Mon, 4 Jan 2010 17:27:45 +0800, Zhenyu Wang wrote:
>>> On 2009.12.31 12:33:06 +0800, Zhenyu Wang wrote:
>>>> On 2009.12.30 10:26:27 +0000, David Woodhouse wrote:
>>>>> On Wed, 2009-12-30 at 11:02 +0800, Zhenyu Wang wrote:
>>>>>> We have a .31->.32 regression, as reported in
>>>>>> http://bugs.freedesktop.org/show_bug.cgi?id=25690
>>>>>> http://bugzilla.kernel.org/show_bug.cgi?id=14627
>>>>>>
>>>>>> It's triggered on non-VT-d machines (or machines that should have VT-d,
>>>>>> but with no way to turn it on in the BIOS) with large memory, where swiotlb
>>>>>> is used for PCI dma ops. swiotlb uses a bounce buffer to copy between
>>>>>> CPU pages and the real pages used for DMA, but we can't make it really coherent
>>>>>> as we don't call pci_dma_sync_single_for_cpu()-like APIs. And on a GEM
>>>>>> domain change, we also can't flush pages for the bounce buffer. It looks like
>>>>>> our usual non-cache-coherent graphics device can't love swiotlb.
>>>>>>
>>>>>> This patch tries to only handle PCI DMA mapping when real IOMMU
>>>>>> hardware is detected (the only case for that is VT-d), and falls back to the
>>>>>> original method of inserting the physical page directly otherwise. This fixes
>>>>>> the GPU hang on our Q965 with 8G memory in a 64-bit OS. Comments?
>>>>>
>>>>> I don't understand. Why is swiotlb doing anything here anyway, when the
>>>>> device has a dma_mask of 36 bits?
>>>>>
>>>>> Shouldn't dma_capable() return 1, causing swiotlb_map_page() to return
>>>>> the original address unmangled?
>>>>
>>>> Good point. I didn't look into the swiotlb code, because my debugging showed it
>>>> returned a mangled dma address. So it looks like the real problem is that the
>>>> 36-bit dma mask got corrupted somehow, which matches the first report in fd.o
>>>> bug 25690.
>>>>
>>>> Looks like we should set up the dma mask in the drm/i915 driver too, as they both
>>>> operate on the graphics device. But I can't test that on our 8G mem machine until
>>>> after the new year.
>>>>
>>>
>>> Finally caught it! It's within drm_pci_alloc(), which tries to set up the dma mask
>>> for the pci_dev again! That is used for the physical-address-based hardware status
>>> page on 965G (i915_init_phys_hws()), which is allocated with the PCI coherent
>>> interface. But setting the mask again in an alloc function looks wrong to me;
>>> drivers should set up their own consistent dma mask according to the hw.
>>>
>>> So the following patch tries to remove the mask setting in drm_pci_alloc(), which
>>> fixed the original problem, as the dma mask now has the right 36-bit setting on
>>> Intel hw. I can't test whether the ATI bits look correct, Dave?
>>>
>>> As the Intel hws page does support 36-bit physical addresses, setting up a 36-bit
>>> PCI consistent mask for it will be another patch. Any comments?
>>
>> Looks like this patch doesn't set the dma mask that used to get set for
>> the drivers that were relying on it. Once all the drivers are fixed to
>> set it up at load time, this seems like a good interface fix.
>
> In my patch all the removed ones were 32-bit masks, which is the PCI DMA default
> mask. So if a driver didn't set a dma mask before, it should also be fine with
> this change.

This failure also seems to be responsible for bug 25510, since applying the patch
to the latest git kernel fixes it. I will add a comment to bug 25510. I was not
able to apply the patch to the v2.6.32.x series, though, because of multiple
declarations.

Mathieu