From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: dom0 / hypervisor hang on dom0 boot
Date: Fri, 17 May 2013 18:28:16 -0400
Message-ID: <20130517222814.GA3255@localhost.localdomain>
References: <3374329.VOz1gdFjBv@amur.mch.fsc.net>
 <1801825.vc7PjyMSRn@amur.mch.fsc.net>
 <5193749902000078000D6569@nat28.tlf.novell.com>
 <1630888.LbRauWP15S@amur.mch.fsc.net>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="liOOAslEiF7prFVr"
Content-Disposition: inline
In-Reply-To: <1630888.LbRauWP15S@amur.mch.fsc.net>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Dietmar Hahn
Cc: Andrew Cooper, Konrad Rzeszutek Wilk, Jan Beulich, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

--liOOAslEiF7prFVr
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Thu, May 16, 2013 at 01:07:05PM +0200, Dietmar Hahn wrote:
> On Wednesday, 15 May 2013, 10:42:17, Jan Beulich wrote:
> > >>> On 15.05.13 at 11:12, Dietmar Hahn wrote:
> > > On Wednesday, 15 May 2013, 09:35:46, Jan Beulich wrote:
> > >> >>> On 15.05.13 at 08:53, Dietmar Hahn wrote:
> > >> > I tried iommu=debug and I can't see any faulting messages but I am not
> > >> > familiar with this code.
> > >> > I attached the logging, maybe anyone can have a look at this.
> >
> > Perhaps only (if at all) by instrumenting the hypervisor. The
> > question of course is how easily/quickly you can narrow down the
> > code region that it might be dying in. And whether it's a hypervisor
> > action at all that causes the hang (as opposed to something the
> > DRM code in Dom0 does).
>
> I added some debug code to the Linux kernel and could track down the
> point of the hang. I used openSuSE kernel 3.7.10-1.4 but I looked at newer
> kernels and found that the code is similar.
>
> i915_gem_init_global_gtt(...)
>   ...
>   intel_gtt_clear_range(start / PAGE_SIZE, (end-start) / PAGE_SIZE);
>   ...
>
> void intel_gtt_clear_range(unsigned int first_entry, unsigned int num_entries)
> {
>         unsigned int i;
>
> ---> A printk(...) here is seen on serial line!
>
>         for (i = first_entry; i < (first_entry + num_entries); i++) {
>                 intel_private.driver->write_entry(intel_private.base.scratch_page_dma,
>                                                   i, 0);
>         }
>
> ---> A printk(...) here is never seen!
>
>         readl(intel_private.gtt+i-1);
> }
>
> The function behind the pointer intel_private.driver->write_entry is
> i965_write_entry(). And the interesting instruction seems to be:
>   writel(addr | pte_flags, intel_private.gtt + entry);
>
> I added another printk() at the start of the function i965_write_entry().
> And surprisingly after printing a lot of messages the kernel came up!!!
> But now I had other problems like losing the audio device (maybe timeouts).
> So maybe the hang is a timing problem?
>
> What I wanted to check is what the hypervisor is doing while the system hangs.
> Does anybody have an idea? Maybe a timer that, after 30s, prints a dump of
> the stack of all CPUs?

Yes. Can you try the two attached patches, please?

> Thanks.
>
> Dietmar.
>
>
> -- 
> Company details: http://ts.fujitsu.com/imprint.html
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>

--liOOAslEiF7prFVr
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="0001-drm-i915-Don-t-leak-a-page-in-case-of-DMA-error-mapp.patch"

>>From 4201962b743a44325ff848ba6387d3710343c123 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk
Date: Fri, 17 May 2013 18:13:35 -0400
Subject: [PATCH 1/2] drm/i915: Don't leak a page in case of DMA error mapping.

We don't free the allocated page if we fail to set up the DMA mapping.
This fixes it.

Signed-off-by: Konrad Rzeszutek Wilk
---
 drivers/char/agp/intel-gtt.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index dbd901e..701b328 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -294,9 +294,10 @@ static int intel_gtt_setup_scratch_page(void)
 	if (intel_private.base.needs_dmar) {
 		dma_addr = pci_map_page(intel_private.pcidev, page, 0,
 				    PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-		if (pci_dma_mapping_error(intel_private.pcidev, dma_addr))
+		if (pci_dma_mapping_error(intel_private.pcidev, dma_addr)) {
+			__intel_gtt_teardown_scratch_page();
 			return -EINVAL;
-
+		}
 		intel_private.base.scratch_page_dma = dma_addr;
 	} else
 		intel_private.base.scratch_page_dma = page_to_phys(page);
@@ -542,15 +543,18 @@ static unsigned int intel_gtt_mappable_entries(void)
 	return aperture_size >> PAGE_SHIFT;
 }
 
-
-static void intel_gtt_teardown_scratch_page(void)
+static void __intel_gtt_teardown_scratch_page(void)
 {
 	set_pages_wb(intel_private.scratch_page, 1);
-	pci_unmap_page(intel_private.pcidev, intel_private.base.scratch_page_dma,
-		       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	put_page(intel_private.scratch_page);
 	__free_page(intel_private.scratch_page);
 }
+static void intel_gtt_teardown_scratch_page(void)
+{
+	pci_unmap_page(intel_private.pcidev, intel_private.base.scratch_page_dma,
+		       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+	__intel_gtt_teardown_scratch_page();
+}
 
 static void intel_gtt_cleanup(void)
 {
-- 
1.8.1.2

--liOOAslEiF7prFVr
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="0002-drm-i915-Sync-the-scratch-page-after-writting-values.patch"

>>From 51908f611fb00195d98f1a552106c6d1709720c0 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk
Date: Fri, 17 May 2013 18:20:46 -0400
Subject: [PATCH 2/2] drm/i915: Sync the scratch page after writing values to it.

We don't sync the page after we have written to it - this is what
you are supposed to do when doing:

 pci_map_page
 .. write some values [ was missing a call to pci_dma_sync_single_for_device]
 .. read some values
 pci_unmap_page

Signed-off-by: Konrad Rzeszutek Wilk
---
 drivers/char/agp/intel-gtt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 701b328..89dd698 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -902,6 +902,9 @@ void intel_gtt_clear_range(unsigned int first_entry, unsigned int num_entries)
 		intel_private.driver->write_entry(intel_private.base.scratch_page_dma,
 						  i, 0);
 	}
+	pci_dma_sync_single_for_device(intel_private.pcidev,
+				       intel_private.base.scratch_page_dma,
+				       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	readl(intel_private.gtt+i-1);
 }
 EXPORT_SYMBOL(intel_gtt_clear_range);
-- 
1.8.1.2

--liOOAslEiF7prFVr--