xen-devel.lists.xenproject.org archive mirror
From: Daniel De Graaf <dgdegra@tycho.nsa.gov>
To: David Vrabel <david.vrabel@citrix.com>
Cc: xen-devel@lists.xen.org, Ian Campbell <Ian.Campbell@citrix.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	Vincent Bernardoff <vb@luminar.eu.org>
Subject: Re: Crashing kernel with dom0/libxc gnttab/gntshr
Date: Tue, 30 Jul 2013 17:03:31 -0400
Message-ID: <51F82A23.2030209@tycho.nsa.gov>
In-Reply-To: <51F7F0BE.1040503@citrix.com>

On 07/30/2013 12:58 PM, David Vrabel wrote:
[...]
>
> [  902.729307] BUG: Bad page map in process vchan-node1  pte:12bfff167 pmd:b9b5c067
> [  902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping:          (null) index:0xffffffffffffffff
>
> I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
>   This has looked up the page using the PTE it is trying to clear.  Has
> it found the correct page?  Since the MFN is currently mapped into the
> same domain, has the m2p_override stuff confused the look up and it is
> checking the grantee page not the granter?
>
> David

I think something like this is happening. While reproducing this on my
test system, I found linked list corruption that I believe is the cause
of the problem. On PV, gnttab_map_refs calls m2p_add_override on the
page, which threads page->lru onto an m2p_overrides list. However,
something else is also using page->lru while gntdev has the page in
use, as shown by the following debug patch:
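The failure mode described here, one list node being threaded onto a list while another user also links it elsewhere, can be sketched in user space. This is only an illustration, not kernel code: the helpers below are simplified stand-ins for the kernel's list_head API, and the roles of the two lists are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal user-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Make head an empty circular list (it points at itself). */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert node right after head, in the style of the kernel's list_add(). */
static void list_add_after(struct list_head *node, struct list_head *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/*
 * In a healthy circular list, both neighbours of any element (including
 * the head) point back at it. This is the same property the debug patch
 * prints as prev->next and next->prev.
 */
static bool node_is_consistent(const struct list_head *node)
{
	return node->prev->next == node && node->next->prev == node;
}
```

If a node is added to one list and then added to a second list without being removed from the first, the node itself still looks consistent (its neighbours on the second list point back at it), but the first list's head is left pointing at a node that no longer points back. A later removal through the first list then writes through stale pointers.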

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 3c8803f..198e57e 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -294,6 +294,11 @@ static int map_grant_pages(struct grant_map *map)
 	if (err)
 		return err;
 
+	printk("map page0 lru: %p prev=%p:%p next=%p:%p\n",
+		&map->pages[0]->lru,
+		map->pages[0]->lru.prev, map->pages[0]->lru.prev->next,
+		map->pages[0]->lru.next, map->pages[0]->lru.next->prev);
+
 	for (i = 0; i < map->count; i++) {
 		if (map->map_ops[i].status)
 			err = -EINVAL;
@@ -320,6 +325,10 @@ static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
 		}
 	}
 
+	printk("unmap page0 lru: %p prev=%p:%p next=%p:%p\n",
+		&map->pages[0]->lru,
+		map->pages[0]->lru.prev, map->pages[0]->lru.prev->next,
+		map->pages[0]->lru.next, map->pages[0]->lru.next->prev);
 	err = gnttab_unmap_refs(map->unmap_ops + offset,
 			use_ptemod ? map->kmap_ops + offset : NULL, map->pages + offset,
 			pages);

Output:
[   88.610644] map page0 lru: ffffea0001dee160 prev=ffffffff82f2d510:ffffea0001dee160 next=ffffffff82f2d510:ffffea0001dee160
[   88.611515] BUG: Bad page map in process a.out  pte:8000000077b85167 pmd:2541a067
[   88.611525] page:ffffea0001dee140 count:1 mapcount:-1 mapping:          (null) index:0xffffffffffffffff
[   88.611532] page flags: 0x1000000000000814(referenced|dirty|private)
[   88.611541] addr:00007f1adaef3000 vm_flags:140400fb anon_vma:          (null) mapping:ffff8800692974a0 index:0
[   88.611547] vma->vm_ops->fault:           (null)
[   88.611555] vma->vm_file->f_op->mmap: gntalloc_mmap+0x0/0x1d0
[...backtrace cropped...]
[   88.614301] unmap page0 lru: ffffea0001dee160 prev=ffff8800254c9d08:ffff88001ea0b120 next=ffff8800254c9d08:ffff88001ea0b938

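At map time the lru node is the only element on its list, so the
address 0xffffffff82f2d510 is the m2p_overrides entry. By unmap time,
neither prev->next nor next->prev points back at the node, so the list
has been modified behind gntdev's back. This means the page being found
by zap_pte_range is not a valid struct page.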
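The two debug lines can be read mechanically. The sketch below is a user-space illustration only: the struct and function names are hypothetical, and the values are copied from the "map" and "unmap" lines in the output above. It checks whether the neighbours of the lru node point back at it.

```c
#include <stdbool.h>
#include <stdint.h>

/* The five values one debug printk line reports for page0's lru node. */
struct lru_sample {
	uint64_t node;      /* &page->lru */
	uint64_t prev;      /* lru.prev */
	uint64_t prev_next; /* lru.prev->next */
	uint64_t next;      /* lru.next */
	uint64_t next_prev; /* lru.next->prev */
};

/* Both neighbours point back at the node: the node is properly linked. */
static bool sample_is_linked(const struct lru_sample *s)
{
	return s->prev_next == s->node && s->next_prev == s->node;
}

/* Properly linked and prev == next: the node is alone with its list head. */
static bool sample_is_only_element(const struct lru_sample *s)
{
	return sample_is_linked(s) && s->prev == s->next;
}

/* Values from the "map page0 lru" line at 88.610644. */
static const struct lru_sample map_sample = {
	0xffffea0001dee160ULL,
	0xffffffff82f2d510ULL, 0xffffea0001dee160ULL,
	0xffffffff82f2d510ULL, 0xffffea0001dee160ULL,
};

/* Values from the "unmap page0 lru" line at 88.614301. */
static const struct lru_sample unmap_sample = {
	0xffffea0001dee160ULL,
	0xffff8800254c9d08ULL, 0xffff88001ea0b120ULL,
	0xffff8800254c9d08ULL, 0xffff88001ea0b938ULL,
};
```

With these values, the map sample is a properly linked single element, while the unmap sample fails the check on both sides: prev and next now point at 0xffff8800254c9d08, whose own next and prev pointers refer to other nodes.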

For reference, the struct page * used by the gntalloc device was
0xffffea0000952740, so this is not a direct collision between the pages
used by the gntdev and gntalloc devices.
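The addresses in the log can be compared with a little arithmetic. The sketch below is user-space code under two assumptions that are not stated in this message: that struct page is 64 bytes on this kernel, and that the struct page array starts on a 64-byte boundary. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: sizeof(struct page) is 64 bytes on this x86_64 kernel. */
#define ASSUMED_STRUCT_PAGE_SIZE 64ULL

/* Round an address down to the start of the struct page slot holding it. */
static uint64_t page_slot(uint64_t addr)
{
	return addr & ~(ASSUMED_STRUCT_PAGE_SIZE - 1);
}

/* Do two addresses fall inside the same (assumed 64-byte) struct page? */
static bool same_struct_page(uint64_t a, uint64_t b)
{
	return page_slot(a) == page_slot(b);
}

/* Addresses copied from the output and text above. */
static const uint64_t gntdev_page0_lru = 0xffffea0001dee160ULL; /* &map->pages[0]->lru */
static const uint64_t bug_report_page  = 0xffffea0001dee140ULL; /* page: in the BUG line */
static const uint64_t gntalloc_page    = 0xffffea0000952740ULL; /* gntalloc struct page * */
```

Under these assumptions, the lru address printed for map->pages[0] lies 0x20 bytes into the struct page reported in the BUG line, while the gntalloc page is a different struct page, which is consistent with the two devices not colliding directly on one page.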

Not sure what the best fix is for this at the moment.

-- 
Daniel De Graaf
National Security Agency

Thread overview: 12+ messages
2013-07-30 10:50 Crashing kernel with dom0/libxc gnttab/gntshr Vincent Bernardoff
2013-07-30 10:59 ` Ian Campbell
2013-07-30 13:41   ` Vincent Bernardoff
2013-07-30 15:50     ` Vincent Bernardoff
2013-07-30 15:55       ` Ian Campbell
2013-07-30 16:58       ` David Vrabel
2013-07-30 21:03         ` Daniel De Graaf [this message]
2013-08-02 13:50           ` Stefano Stabellini
2013-08-02 14:10             ` Ian Campbell
2013-08-02 16:49             ` Jeremy Fitzhardinge
2013-08-02 17:02               ` Stefano Stabellini
2013-08-03 10:06                 ` Ian Campbell