From: Konrad Rzeszutek Wilk <konrad@kernel.org>
To: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Jan Beulich <JBeulich@suse.com>,
	xen-devel@lists.xen.org
Subject: Re: dom0 / hypervisor hang on dom0 boot
Date: Fri, 17 May 2013 18:28:16 -0400
Message-ID: <20130517222814.GA3255@localhost.localdomain>
In-Reply-To: <1630888.LbRauWP15S@amur.mch.fsc.net>

[-- Attachment #1: Type: text/plain, Size: 2581 bytes --]

On Thu, May 16, 2013 at 01:07:05PM +0200, Dietmar Hahn wrote:
> On Wednesday, 15 May 2013, 10:42:17, Jan Beulich wrote:
> > >>> On 15.05.13 at 11:12, Dietmar Hahn <dietmar.hahn@ts.fujitsu.com> wrote:
> > > On Wednesday, 15 May 2013, 09:35:46, Jan Beulich wrote:
> > >> >>> On 15.05.13 at 08:53, Dietmar Hahn <dietmar.hahn@ts.fujitsu.com> wrote:
> > >> > I tried iommu=debug and I can't see any faulting messages, but I am not
> > >> > familiar with this code.
> > >> > I attached the log; maybe someone can have a look at it.
> > 
> > Perhaps only (if at all) by instrumenting the hypervisor. The
> > question of course is how easily/quickly you can narrow down the
> > code region that it might be dying in. And whether it's a hypervisor
> > action at all that causes the hang (as opposed to something the
> > DRM code in Dom0 does).
> 
> I added some debug code to the Linux kernel and could track down the
> point of the hang. I used openSUSE kernel 3.7.10-1.4, but I looked at newer
> kernels and found that the code is similar.
> 
> i915_gem_init_global_gtt(...)
>  ...
>  intel_gtt_clear_range(start / PAGE_SIZE, (end-start) / PAGE_SIZE);
>  ...
> 
> void intel_gtt_clear_range(unsigned int first_entry, unsigned int num_entries)
> {
>         unsigned int i;
> 
>     ---> A printk(...) here is seen on the serial line!
> 
>         for (i = first_entry; i < (first_entry + num_entries); i++) {
>                 intel_private.driver->write_entry(intel_private.base.scratch_page_dma,
>                                                   i, 0);
>         }
> 
>     ---> A printk(...) here is never seen!
> 
>         readl(intel_private.gtt+i-1);
> }
> 
> The function behind the pointer intel_private.driver->write_entry is
> i965_write_entry(). And the interesting instruction seems to be:
>   writel(addr | pte_flags, intel_private.gtt + entry);
> 
> I added another printk() at the start of the function i965_write_entry().
> Surprisingly, after printing a lot of messages, the kernel came up!
> But now I had other problems, like losing the audio device (maybe timeouts).
> So maybe the hang is a timing problem?
> 
> What I wanted to check is what the hypervisor is doing while the system hangs.
> Does anybody have an idea, maybe a timer that prints a stack dump of
> all CPUs after 30s?

Yes. Can you please try the two attached patches?
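
To make the quoted loop easier to reason about, here is a minimal userspace sketch of it. It is not kernel code: the GTT is modelled as a plain array, and fake_gtt, fake_write_entry(), fake_clear_range() and FAKE_PTE_VALID are hypothetical stand-ins for the ioremapped table, i965_write_entry(), intel_gtt_clear_range() and the real PTE flags. It only shows which entries get written and that the trailing read targets the last entry written (index i-1).

```c
#include <assert.h>
#include <stdint.h>

#define GTT_ENTRIES    16u
#define FAKE_PTE_VALID 0x1u	/* hypothetical flag bit, not the real PTE layout */

/* Hypothetical stand-in for the ioremapped GTT (real code uses writel/readl). */
static volatile uint32_t fake_gtt[GTT_ENTRIES];

/* Models the writel(addr | pte_flags, gtt + entry) step of the write_entry hook. */
static void fake_write_entry(uint32_t addr, unsigned int entry, unsigned int flags)
{
	uint32_t pte_flags = FAKE_PTE_VALID;

	(void)flags;	/* the quoted caller always passes 0 */
	fake_gtt[entry] = addr | pte_flags;
}

/*
 * Point every entry in [first_entry, first_entry + num_entries) at the
 * scratch page, then read back the last entry written.
 * Returns the index used for that final read.
 */
static unsigned int fake_clear_range(uint32_t scratch_dma,
				     unsigned int first_entry,
				     unsigned int num_entries)
{
	unsigned int i;

	/* The model assumes a non-empty range that fits in the table. */
	assert(num_entries > 0);
	assert(first_entry + num_entries <= GTT_ENTRIES);

	for (i = first_entry; i < first_entry + num_entries; i++)
		fake_write_entry(scratch_dma, i, 0);

	(void)fake_gtt[i - 1];	/* models readl(gtt + i - 1) */
	return i - 1;
}
```

Calling fake_clear_range(0x1000, 4, 8) in this model touches entries 4 through 11 and reads back entry 11; entries outside that range are left alone.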

> Thanks.
> 
> Dietmar.
> 
> 

[-- Attachment #2: 0001-drm-i915-Don-t-leak-a-page-in-case-of-DMA-error-mapp.patch --]
[-- Type: text/plain, Size: 1947 bytes --]

From 4201962b743a44325ff848ba6387d3710343c123 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Fri, 17 May 2013 18:13:35 -0400
Subject: [PATCH 1/2] drm/i915: Don't leak a page in case of DMA error mapping.

We don't free the allocated page if we fail to set up the DMA
mapping. This fixes it.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/char/agp/intel-gtt.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index dbd901e..701b328 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -294,9 +294,10 @@ static int intel_gtt_setup_scratch_page(void)
 	if (intel_private.base.needs_dmar) {
 		dma_addr = pci_map_page(intel_private.pcidev, page, 0,
 				    PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-		if (pci_dma_mapping_error(intel_private.pcidev, dma_addr))
+		if (pci_dma_mapping_error(intel_private.pcidev, dma_addr)) {
+			__intel_gtt_teardown_scratch_page();
 			return -EINVAL;
-
+		}
 		intel_private.base.scratch_page_dma = dma_addr;
 	} else
 		intel_private.base.scratch_page_dma = page_to_phys(page);
@@ -542,15 +543,18 @@ static unsigned int intel_gtt_mappable_entries(void)
 
 	return aperture_size >> PAGE_SHIFT;
 }
-
-static void intel_gtt_teardown_scratch_page(void)
+static void __intel_gtt_teardown_scratch_page(void)
 {
 	set_pages_wb(intel_private.scratch_page, 1);
-	pci_unmap_page(intel_private.pcidev, intel_private.base.scratch_page_dma,
-		       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	put_page(intel_private.scratch_page);
 	__free_page(intel_private.scratch_page);
 }
+static void intel_gtt_teardown_scratch_page(void)
+{
+	pci_unmap_page(intel_private.pcidev, intel_private.base.scratch_page_dma,
+		       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
+	__intel_gtt_teardown_scratch_page();
+}
 
 static void intel_gtt_cleanup(void)
 {
-- 
1.8.1.2
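
The cleanup pattern patch 1 is aiming for can be sketched in userspace as follows. This is an illustration under assumptions, not the driver code: struct scratch, try_map(), scratch_setup(), scratch_teardown() and the live_pages counter are hypothetical names, malloc() stands in for the page allocation, and the mapping step is a stub that can be forced to fail. Note that the failure path in this sketch frees the local pointer, because the structure field has not been assigned yet at that point.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct scratch {
	void *page;	/* stands in for the allocated scratch page */
	bool  mapped;	/* stands in for a valid DMA mapping */
};

static int live_pages;	/* outstanding allocations, for leak checking */

/* Hypothetical mapping step that can be forced to fail. */
static int try_map(bool fail)
{
	return fail ? -EINVAL : 0;
}

static int scratch_setup(struct scratch *s, bool fail_map)
{
	void *page = malloc(4096);
	int err;

	if (!page)
		return -ENOMEM;
	live_pages++;

	err = try_map(fail_map);
	if (err)
		goto err_free;	/* without this unwind the page would leak */

	s->page = page;
	s->mapped = true;
	return 0;

err_free:
	free(page);	/* free the local pointer; s->page is not set yet */
	live_pages--;
	return err;
}

static void scratch_teardown(struct scratch *s)
{
	assert(s->page != NULL);
	s->mapped = false;	/* undo the mapping first ... */
	free(s->page);		/* ... then release the page */
	s->page = NULL;
	live_pages--;
}
```

With a forced mapping failure, scratch_setup() returns -EINVAL and live_pages stays at 0; a successful setup followed by scratch_teardown() also ends at 0.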


[-- Attachment #3: 0002-drm-i915-Sync-the-scratch-page-after-writting-values.patch --]
[-- Type: text/plain, Size: 1221 bytes --]

From 51908f611fb00195d98f1a552106c6d1709720c0 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Fri, 17 May 2013 18:20:46 -0400
Subject: [PATCH 2/2] drm/i915: Sync the scratch page after writing values to
 it.

We don't sync the page after we have written to it, which is what
you are supposed to do in this sequence:

  pci_map_page
    .. write some values
    pci_dma_sync_single_for_device   [this call was missing]
    .. read some values
  pci_unmap_page

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/char/agp/intel-gtt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 701b328..89dd698 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -902,6 +902,9 @@ void intel_gtt_clear_range(unsigned int first_entry, unsigned int num_entries)
 		intel_private.driver->write_entry(intel_private.base.scratch_page_dma,
 						  i, 0);
 	}
+	pci_dma_sync_single_for_device(intel_private.pcidev,
+				       intel_private.base.scratch_page_dma,
+				       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	readl(intel_private.gtt+i-1);
 }
 EXPORT_SYMBOL(intel_gtt_clear_range);
-- 
1.8.1.2
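
One situation where the sync in patch 2 matters is a DMA layer that backs the mapping with a bounce buffer. The userspace sketch below models only that case, and nothing in this thread establishes that it is what causes the hang. All names (struct bounce_mapping, bounce_map(), bounce_sync_for_device(), bounce_device_read()) are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <string.h>

#define BUF_SIZE 64

/* Hypothetical model of a streaming DMA mapping backed by a bounce buffer. */
struct bounce_mapping {
	unsigned char *cpu_buf;			/* what the CPU reads and writes */
	unsigned char  bounce[BUF_SIZE];	/* what the device would see */
};

/* Models the map step: the device-visible copy is taken at map time. */
static void bounce_map(struct bounce_mapping *m, unsigned char *cpu_buf)
{
	m->cpu_buf = cpu_buf;
	memcpy(m->bounce, cpu_buf, BUF_SIZE);
}

/* Models sync-for-device: push later CPU writes to the device-visible copy. */
static void bounce_sync_for_device(struct bounce_mapping *m)
{
	assert(m->cpu_buf != NULL);
	memcpy(m->bounce, m->cpu_buf, BUF_SIZE);
}

/* Models a device access: it only ever sees the bounce copy. */
static unsigned char bounce_device_read(const struct bounce_mapping *m, size_t off)
{
	assert(off < BUF_SIZE);
	return m->bounce[off];
}
```

In this model, a CPU write made after bounce_map() is invisible to bounce_device_read() until bounce_sync_for_device() has run, which is the ordering the commit message describes.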

