From: konrad wilk <konrad.wilk@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: Xen 4.3 + tmem =  Xen BUG at domain_page.c:143
Date: Tue, 11 Jun 2013 13:30:22 -0400
Message-ID: <51B75EAE.7010802@oracle.com>
In-Reply-To: <51B76EA602000078000DD36D@nat28.tlf.novell.com>

[-- Attachment #1: Type: text/plain, Size: 9316 bytes --]


On 6/11/2013 12:38 PM, Jan Beulich wrote:
>>>> On 11.06.13 at 17:30, konrad wilk <konrad.wilk@oracle.com> wrote:
>> I think this is a more subtle bug.
>> I applied a debug patch (see attached) and with the help of it and the logs:
>>
>> (XEN) domain_page.c:160:d1 mfn (1ebe96) -> 6 idx: 32(i:1,j:0), branch:1
>> (XEN) domain_page.c:166:d1 [0] idx=26, mfn=0x1ebcd8, refcnt: 0
>> (XEN) domain_page.c:166:d1 [1] idx=12, mfn=0x1ebcd9, refcnt: 0
>> (XEN) domain_page.c:166:d1 [2] idx=2, mfn=0x210e9a, refcnt: 0
>> (XEN) domain_page.c:166:d1 [3] idx=14, mfn=0x210e9b, refcnt: 0
>> (XEN) domain_page.c:166:d1 [4] idx=7, mfn=0x210e9c, refcnt: 0
>> (XEN) domain_page.c:166:d1 [5] idx=10, mfn=0x210e9d, refcnt: 0
>> (XEN) domain_page.c:166:d1 [6] idx=5, mfn=0x210e9e, refcnt: 0
>> (XEN) domain_page.c:166:d1 [7] idx=13, mfn=0x1ebe97, refcnt: 0
>> (XEN) Xen BUG at domain_page.c:169
>> (XEN) ----[ Xen-4.3-unstable  x86_64  debug=y  Not tainted ]----
>> (XEN) CPU:    3
>> (XEN) RIP:    e008:[<ffff82c4c01606a7>] map_domain_page+0x61d/0x6e1
>> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
>> (XEN) rax: 0000000000000000   rbx: ffff8300c68f9000   rcx: 0000000000000000
>> (XEN) rdx: ffff8302125b2020   rsi: 000000000000000a   rdi: ffff82c4c027a6e8
>> (XEN) rbp: ffff8302125afcc8   rsp: ffff8302125afc48   r8: 0000000000000004
>> (XEN) r9:  0000000000000004   r10: 0000000000000004   r11: 0000000000000001
>> (XEN) r12: ffff83022e2ef000   r13: 00000000001ebe96   r14: 0000000000000020
>> (XEN) r15: ffff8300c68f9080   cr0: 0000000080050033   cr4: 00000000000426f0
>> (XEN) cr3: 0000000209541000   cr2: ffffffffff600400
>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>> (XEN) Xen stack trace from rsp=ffff8302125afc48:
>> (XEN)    00000000001ebe97 0000000000000000 0000000000000000 ffff830200000001
>> (XEN)    ffff8302125afcc8 ffff82c400000000 00000000001ebe97 000000080000000d
>> (XEN)    ffff83022e2ef2d8 0000000000000286 ffff82c4c0127b6b ffff83022e2ef000
>> (XEN)    ffff82e003d7d2c0 ffff8302125afd60 00000000001ebe96 0000000000000000
>> (XEN)    ffff8302125afd38 ffff82c4c01373de 0000000000000000 ffffffffffffffff
>> (XEN)    0000000000000001 ffff8302125afd58 ffff83022e2ef2d8 0000000000000286
>> (XEN)    0000000000000027 0000000000000000 0000000000001000 0000000000000000
>> (XEN)    0000000000000000 00000000001ebe96 ffff8302125afd98 ffff82c4c01377c4
>> (XEN)    0000000000000000 ffff820040017000 ffff82e003d7d2c0 00000000001ebe96
>> (XEN)    ffff8302125afd98 ffff830210ecf390 00000000fffffff4 ffff820040009010
>> (XEN)    ffff820040000f50 ffff83022e2f0c90 ffff8302125afe18 ffff82c4c0135929
>> (XEN)    000000160000001e ffff820040000f50 0000000000000000 00000000001ebe96
>> (XEN)    0000000000000000 0000000000000000 0000a2f6125afe28 ffff8302125afe00
>> (XEN)    0000001675f02b51 ffff83022e2f0c90 ffff830210ecf390 0000000000000000
>> (XEN)    0000000000000001 0000000000000065 ffff8302125afef8 ffff82c4c0136510
>> (XEN)    ffff830200001000 0000000000000000 ffff8302125afe90 255ece02125b2040
>> (XEN)    00000003125afe68 00000016742667d1 ffff8302125b2100 0000003d52299000
>> (XEN)    ffff8300c68f9000 0000000001c9c380 ffff8302125b2100 ffff8302125b1808
>> (XEN)    0000000000000004 0000000000000004 0000000000000000 0000000000000000
>> (XEN)    000000000000a2f6 0000000000000000 00000000001ebe96 ffff82c4c0126e77
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82c4c01606a7>] map_domain_page+0x61d/0x6e1
>> (XEN)    [<ffff82c4c01373de>] cli_get_page+0x15e/0x17b
>> (XEN)    [<ffff82c4c01377c4>] tmh_copy_from_client+0x150/0x284
>> (XEN)    [<ffff82c4c0135929>] do_tmem_put+0x323/0x5c4
>> (XEN)    [<ffff82c4c0136510>] do_tmem_op+0x5a0/0xbd0
>> (XEN)    [<ffff82c4c022391b>] syscall_enter+0xeb/0x145
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 3:
>> (XEN) Xen BUG at domain_page.c:169
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Manual reset required ('noreboot' specified)
>>
>> It looks as if the path that is taken is:
>>
>> 110     idx = find_next_zero_bit(dcache->inuse, dcache->entries, dcache->cursor);
>> 111     if ( unlikely(idx >= dcache->entries) )
>> 112     {
>>
>> 115         /* /First/, clean the garbage map and update the inuse list. */
>> 116         for ( i = 0; i < BITS_TO_LONGS(dcache->entries); i++ )
>> 117         {
>> 118             dcache->inuse[i] &= ~xchg(&dcache->garbage[i], 0);
>> 119             accum |= ~dcache->inuse[i];
>>
>> Here it computes accum.
>> 120         }
>> 121
>> 122         if ( accum )
>> 123             idx = find_first_zero_bit(dcache->inuse, dcache->entries)
>>
>> Ok, finds the idx (32),
>> 124         else
>> 125         {
>> .. does not go here.
>> 142         }
>> 143         BUG_ON(idx >= dcache->entries);
>>
>> And hits the BUG_ON().
>>
>> But I am not sure if that is appropriate. Perhaps the BUG_ON was meant as a
>> check for the loop (lines 128 -> 141), in case it looped around and never
>> found an empty place. But if that is the condition, then that would also look
>> suspect, as it might have found an empty hash entry and the idx would still
>> end up being 32.
> The BUG_ON() here is definitely valid - a few lines down, after the
> enclosing if(), we use it in ways that require this to not have
> triggered. It basically tells you whether an in-range idx was found,
> which apparently isn't the case here.
>
> As I think George already pointed out - printing accum here would
> be quite useful: It should have at least one of the low 32 bits set,
> given that dcache->entries must be at most 32 according to the
> data you already got logged.

Of course. Here is the new log (the updated debug patch is attached):

(XEN) domain_page.c:122:d1 [0]: ffffffff, idx: 32
(XEN) domain_page.c:167:d1 mfn (1eba98) -> 0 idx: 32(i:1,j:0), branch:9 0xffffffff00000000
(XEN) domain_page.c:173:d1 [0] idx=0, mfn=0x182790, refcnt: 0
(XEN) domain_page.c:173:d1 [1] idx=29, mfn=0x1946f9, refcnt: 0
(XEN) domain_page.c:173:d1 [2] idx=15, mfn=0x1946fa, refcnt: 0
(XEN) domain_page.c:173:d1 [3] idx=11, mfn=0x1946fb, refcnt: 0
(XEN) domain_page.c:173:d1 [4] idx=17, mfn=0x1946fc, refcnt: 0
(XEN) domain_page.c:173:d1 [5] idx=21, mfn=0x1946fd, refcnt: 0
(XEN) domain_page.c:173:d1 [6] idx=10, mfn=0x180296, refcnt: 0
(XEN) domain_page.c:173:d1 [7] idx=4, mfn=0x180297, refcnt: 0
(XEN) Xen BUG at domain_page.c:176
(XEN) ----[ Xen-4.3-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    e008:[<ffff82c4c0160742>] map_domain_page+0x6b8/0x77c
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff8300c68f9000   rcx: 0000000000000000
(XEN) rdx: ffff83020e84c020   rsi: 000000000000000a   rdi: ffff82c4c027a6e8
(XEN) rbp: ffff83020e847cc8   rsp: ffff83020e847c28   r8: 0000000000000004
(XEN) r9:  0000000000000004   r10: 0000000000000004   r11: 0000000000000001
(XEN) r12: ffff83022d815000   r13: 00000000001eba98   r14: 0000000000000020
(XEN) r15: ffff8300c68f9080   cr0: 0000000080050033   cr4: 00000000000426f0
(XEN) cr3: 000000019c644000   cr2: ffff88000ef124b0
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff83020e847c28:
(XEN)    0000000000180297 0000000000000000 ffff830200000000 0000000000000009
(XEN)    ffffffff00000000 ffff82c4c0116542 0000000000000297 ffffffff00000000
(XEN)    ffff830200000020 00000000c01714f6 0000000000180297 0000000800000004
(XEN)    ffff83022d8152d8 0000000000000286 ffff82c4c012760a ffff83022d815000
(XEN)    ffff82e003d75300 ffff83020e847d60 00000000001eba98 0000000000000000
(XEN)    ffff83020e847d38 ffff82c4c01373de 0000000000000000 ffffffffffffffff
(XEN)    0000000000000001 ffff83020e847d58 ffff83022d8152d8 0000000000000286
(XEN)    0000000000000027 0000000000000000 0000000000001000 0000000000000000
(XEN)    0000000000000000 00000000001eba98 ffff83020e847d98 ffff82c4c01377c4
(XEN)    0000000000000000 ffff82004001a000 ffff82e003d75300 00000000001eba98
(XEN)    ffff83020e847d98 ffff83020354f390 00000000fffffff4 ffff820040002010
(XEN)    ffff820040001580 ffff83022d816c90 ffff83020e847e18 ffff82c4c0135929
(XEN)    ffff83020e847db8 ffff820040001580 0000000000000000 00000000001eba98
(XEN)    0000000000000000 0000000000000000 000001f200000000 ffff83020e847e00
(XEN)    ffff83020e847e18 ffff83022d816c90 ffff83020354f390 0000000000000000
(XEN)    0000000000000001 0000000000000091 ffff83020e847ef8 ffff82c4c0136510
(XEN)    ffff830200001000 0000000000000000 ffff83020e847e90 bbbc0ca3c027bba0
(XEN)    ffff82c4c027bba0 ffff82c4c02e0000 0000000000000002 ffff83020e847e78
(XEN)    ffff82c4c0127b09 ffff82c4c027bba0 ffff83020e847e98 ffff82c4c01299af
(XEN)    0000000000000004 0000000000000005 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c4c0160742>] map_domain_page+0x6b8/0x77c
(XEN)    [<ffff82c4c01373de>] cli_get_page+0x15e/0x17b
(XEN)    [<ffff82c4c01377c4>] tmh_copy_from_client+0x150/0x284
(XEN)    [<ffff82c4c0135929>] do_tmem_put+0x323/0x5c4
(XEN)    [<ffff82c4c0136510>] do_tmem_op+0x5a0/0xbd0
(XEN)    [<ffff82c4c02239bb>] syscall_enter+0xeb/0x145
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) Xen BUG at domain_page.c:176
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)
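
One possible reading of the two values in the log above, inuse[0] = ffffffff and accum = 0xffffffff00000000, can be checked with a small standalone sketch. This is not Xen code. It assumes a 64-bit bitmap word and dcache->entries == 32 (idx 32 is reported as out of range), and the helper names below are hypothetical stand-ins rather than the hypervisor's own functions. Under those assumptions accum is non-zero only because of bits 32..63, which lie beyond the valid entries, so the "if ( accum )" branch is taken even though no slot below 32 is free.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a find_first_zero_bit()-style search on one word:
 * returns the index of the first clear bit below nbits, or nbits if every
 * bit in [0, nbits) is set. */
static unsigned int first_zero_bit(uint64_t word, unsigned int nbits)
{
    unsigned int i;

    for (i = 0; i < nbits; i++)
        if (!(word & ((uint64_t)1 << i)))
            return i;
    return nbits;
}

/* Mask covering only the bits that correspond to valid entries. */
static uint64_t valid_mask(unsigned int nbits)
{
    return nbits >= 64 ? ~(uint64_t)0 : (((uint64_t)1 << nbits) - 1);
}

/* accum as computed over the whole word: bits at or above the entry count
 * show up as "free" even though they are not usable slots. */
static uint64_t accum_unmasked(uint64_t inuse)
{
    return ~inuse;
}

/* accum restricted to the valid entries (an illustrative variant only). */
static uint64_t accum_masked(uint64_t inuse, unsigned int nbits)
{
    return ~inuse & valid_mask(nbits);
}
```

With the logged value, accum_unmasked(0xffffffff) is 0xffffffff00000000 and first_zero_bit(0xffffffff, 32) is 32, which matches "idx: 32" and "branch:9" in the log. accum_masked(0xffffffff, 32) is 0, which would send the code down the hash-replacement path instead. Whether masking is the right fix is a separate question; the sketch only shows how the two logged values can coexist.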


[-- Attachment #2: xen-domain_page-v2.patch --]
[-- Type: text/x-patch, Size: 4427 bytes --]

diff --git a/xen/Rules.mk b/xen/Rules.mk
index 3f0b262..bc6b437 100644
--- a/xen/Rules.mk
+++ b/xen/Rules.mk
@@ -3,6 +3,7 @@
 # If you change any of these configuration options then you must
 # 'make clean' before rebuilding.
 #
+debug := y
 verbose       ?= n
 perfc         ?= n
 perfc_arrays  ?= n
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index efda6af..9ad6193 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -59,12 +59,12 @@ void __init mapcache_override_current(struct vcpu *v)
 void *map_domain_page(unsigned long mfn)
 {
     unsigned long flags;
-    unsigned int idx, i;
+    unsigned int idx, i, j = 0;
     struct vcpu *v;
     struct mapcache_domain *dcache;
     struct mapcache_vcpu *vcache;
     struct vcpu_maphash_entry *hashent;
-
+    int branch = 0;
 #ifdef NDEBUG
     if ( mfn <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
         return mfn_to_virt(mfn);
@@ -116,30 +116,63 @@ void *map_domain_page(unsigned long mfn)
         for ( i = 0; i < BITS_TO_LONGS(dcache->entries); i++ )
         {
             dcache->inuse[i] &= ~xchg(&dcache->garbage[i], 0);
+            if (v->domain->domain_id) {
+                if (~dcache->inuse[i]) {
+                    gdprintk(XENLOG_INFO, "[%d]: %lx, idx: %d\n", i, dcache->inuse[i],
+                             find_first_zero_bit(dcache->inuse, dcache->entries));
+                    branch |= 8;
+                }
+            }
             accum |= ~dcache->inuse[i];
         }
 
-        if ( accum )
+        if ( accum ) {
             idx = find_first_zero_bit(dcache->inuse, dcache->entries);
+            branch |= 1;
+        }
         else
         {
+            branch |= 2;
             /* Replace a hash entry instead. */
             i = MAPHASH_HASHFN(mfn);
             do {
                 hashent = &vcache->hash[i];
                 if ( hashent->idx != MAPHASHENT_NOTINUSE && !hashent->refcnt )
                 {
+                    branch |= 4;
                     idx = hashent->idx;
                     ASSERT(l1e_get_pfn(MAPCACHE_L1ENT(idx)) == hashent->mfn);
                     l1e_write(&MAPCACHE_L1ENT(idx), l1e_empty());
                     hashent->idx = MAPHASHENT_NOTINUSE;
                     hashent->mfn = ~0UL;
+                    if (idx >= dcache->entries) {
+                        branch |= 8;
+                        gdprintk(XENLOG_INFO, "mfn (%lx) -> %ld idx (iter:%d)\n", mfn,  MAPHASH_HASHFN(mfn), j);
+
+                        for (i = 0; i < MAPHASH_ENTRIES;i++) {
+                            hashent = &vcache->hash[i];
+
+                            gdprintk(XENLOG_INFO, "[%d] idx=%d, mfn=0x%lx, refcnt: %d\n",
+                                    i, hashent->idx, hashent->mfn, hashent->refcnt);
+                        }
+                    }
                     break;
                 }
                 if ( ++i == MAPHASH_ENTRIES )
                     i = 0;
+                j++;
             } while ( i != MAPHASH_HASHFN(mfn) );
         }
+        if (idx >= dcache->entries) {
+           gdprintk(XENLOG_INFO, "mfn (%lx) -> %ld idx: %d(i:%d,j:%d), branch:%x 0x%lx\n", mfn,  MAPHASH_HASHFN(mfn), idx,  i, j, branch, accum);
+
+           for (i = 0; i < MAPHASH_ENTRIES;i++) {
+                    hashent = &vcache->hash[i];
+
+                    gdprintk(XENLOG_INFO, "[%d] idx=%d, mfn=0x%lx, refcnt: %d\n",
+                                    i, hashent->idx, hashent->mfn, hashent->refcnt);
+           }
+        }
         BUG_ON(idx >= dcache->entries);
 
         /* /Second/, flush TLBs. */
@@ -254,6 +287,7 @@ int mapcache_domain_init(struct domain *d)
                  2 * PFN_UP(BITS_TO_LONGS(MAPCACHE_ENTRIES) * sizeof(long))) >
                  MAPCACHE_VIRT_START + (PERDOMAIN_SLOT_MBYTES << 20));
     bitmap_pages = PFN_UP(BITS_TO_LONGS(MAPCACHE_ENTRIES) * sizeof(long));
+    gdprintk(XENLOG_INFO, "domain bitmap pages: %d\n", bitmap_pages);
     dcache->inuse = (void *)MAPCACHE_VIRT_END + PAGE_SIZE;
     dcache->garbage = dcache->inuse +
                       (bitmap_pages + 1) * PAGE_SIZE / sizeof(long);
@@ -276,6 +310,7 @@ int mapcache_vcpu_init(struct vcpu *v)
     if ( is_hvm_vcpu(v) || !dcache->inuse )
         return 0;
 
+    gdprintk(XENLOG_INFO, "ents: %d, entries: %d\n", ents, dcache->entries);
     if ( ents > dcache->entries )
     {
         /* Populate page tables. */


