From: konrad wilk <konrad.wilk@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: Xen 4.3 + tmem =  Xen BUG at domain_page.c:143
Date: Tue, 11 Jun 2013 17:06:13 -0400	[thread overview]
Message-ID: <51B79145.2010401@oracle.com> (raw)
In-Reply-To: <51B7720B.10607@oracle.com>

[-- Attachment #1: Type: text/plain, Size: 4830 bytes --]


On 6/11/2013 2:52 PM, konrad wilk wrote:
>
>> The BUG_ON() here is definitely valid - a few lines down, after the
>> enclosing if(), we use it in ways that requires this to not have
>> triggered. It basically tells you whether an in range idx was found,
>> which apparently isn't the case here.
>>
>> As I think George already pointed out - printing accum here would
>> be quite useful: It should have at least one of the low 32 bits set,
>> given that dcache->entries must be at most 32 according to the
>> data you already got logged.
>
> With extra debugging (see attached patch)
>
> (XEN) domain_page.c:125:d1 mfn: 1eb483, [0]: bffff1ff, ~ffffffff40000e00, idx: 9 garbage: 40000e00, inuse: ffffffff
> (XEN) domain_page.c:125:d1 mfn: 1eb480, [0]: fdbfffff, ~ffffffff02400000, idx: 22 garbage: 2400000, inuse: ffffffff
> (XEN) domain_page.c:125:d1 mfn: 2067ca, [0]: fffff7ff, ~ffffffff00000800, idx: 11 garbage: 800, inuse: ffffffff
> (XEN) domain_page.c:125:d1 mfn: 183642, [0]: ffffffff, ~ffffffff00000000, idx: 32 garbage: 0, inuse: ffffffff
> (XEN) domain_page.c:170:d1 mfn (183642) -> 2 idx: 32(i:1,j:0), branch:9 0xffffffff00000000
> (XEN) domain_page.c:176:d1 [0] idx=13, mfn=0x203b00, refcnt: 0
> (XEN) domain_page.c:176:d1 [1] idx=25, mfn=0x1839e1, refcnt: 0
> (XEN) domain_page.c:176:d1 [2] idx=3, mfn=0x1824d2, refcnt: 0
> (XEN) domain_page.c:176:d1 [3] idx=5, mfn=0x1eb48b, refcnt: 0
> (XEN) domain_page.c:176:d1 [4] idx=28, mfn=0x203b04, refcnt: 0
> (XEN) domain_page.c:176:d1 [5] idx=0, mfn=0x1eb485, refcnt: 0
> (XEN) domain_page.c:176:d1 [6] idx=30, mfn=0x203afe, refcnt: 0
> (XEN) domain_page.c:176:d1 [7] idx=20, mfn=0x203aff, refcnt: 0
>
> And that does paint the picture that we have exhausted the full 32
> entries of the mapcache.
>
> Now off to find out who is holding them and why. Aren't these
> operations (map/unmap domain_page) supposed to be short-lived?
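
(For reference, the expectation is that these mappings are transient:
map_domain_page() hands out one of the (here 32) mapcache slots, the
caller touches the page, and unmap_domain_page() gives the slot back
straight away. A minimal sketch of that intended short-lived pattern is
below; the caller and its memcpy are made up for illustration, only
map_domain_page()/unmap_domain_page() are the real interfaces.)

    /* Sketch of the intended short-lived usage; the caller is hypothetical. */
    static void copy_from_mfn(void *dst, unsigned long mfn)
    {
        void *va = map_domain_page(mfn);   /* takes a mapcache slot */

        memcpy(dst, va, PAGE_SIZE);        /* brief access ... */

        unmap_domain_page(va);             /* ... slot released immediately */
    }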

And found the culprit. With some EIP logging:

(XEN) domain_page.c:214:d1 [0] mfn=0x1ff67a idx=0, mfn=0x1ff67a, refcnt: 0 [EIP=0]
(XEN) domain_page.c:216:d1 [1] mfn=18fef2, [EIP=0]
(XEN) domain_page.c:216:d1 [2] mfn=1eb518, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [3] mfn=170a08, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [4] mfn=18feef, [EIP=0]
(XEN) domain_page.c:216:d1 [5] mfn=1eb4c8, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [6] mfn=202699, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [7] mfn=18fef0, [EIP=0]
(XEN) domain_page.c:216:d1 [8] mfn=0, [EIP=0]
(XEN) domain_page.c:214:d1 [9] mfn=0x18e7ed idx=9, mfn=0x18e7ed, refcnt: 0 [EIP=0]
(XEN) domain_page.c:214:d1 [10] mfn=0x18f629 idx=10, mfn=0x18f629, refcnt: 0 [EIP=0]
(XEN) domain_page.c:216:d1 [11] mfn=1eb47e, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:214:d1 [12] mfn=0x18209c idx=12, mfn=0x18209c, refcnt: 0 [EIP=0]
(XEN) domain_page.c:216:d1 [13] mfn=18fef5, [EIP=0]
(XEN) domain_page.c:214:d1 [14] mfn=0x18f62b idx=14, mfn=0x18f62b, refcnt: 0 [EIP=0]
(XEN) domain_page.c:216:d1 [15] mfn=1eb459, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [16] mfn=1eb512, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [17] mfn=170d2b, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [18] mfn=20272b, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [19] mfn=16c22c, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [20] mfn=18fef4, [EIP=0]
(XEN) domain_page.c:216:d1 [21] mfn=18e7e9, [EIP=0]
(XEN) domain_page.c:216:d1 [22] mfn=18feee, [EIP=0]
(XEN) domain_page.c:216:d1 [23] mfn=1eb4a3, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [24] mfn=18fef3, [EIP=0]
(XEN) domain_page.c:214:d1 [25] mfn=0x18f62f idx=25, mfn=0x18f62f, refcnt: 0 [EIP=0]
(XEN) domain_page.c:216:d1 [26] mfn=18ff02, [__get_page_type+0x1001/0x146a]
(XEN) domain_page.c:214:d1 [27] mfn=0x18fefe idx=27, mfn=0x18fefe, refcnt: 0 [EIP=0]
(XEN) domain_page.c:216:d1 [28] mfn=18ff00, [__get_page_type+0xcc3/0x146a]
(XEN) domain_page.c:216:d1 [29] mfn=0, [EIP=0]
(XEN) domain_page.c:214:d1 [30] mfn=0x18f628 idx=30, mfn=0x18f628, refcnt: 0 [EIP=0]
(XEN) domain_page.c:216:d1 [31] mfn=1eb4ed, [tmh_persistent_pool_page_get+0x26d/0x2d8]

From a brief look at the code, it looks as if any allocation from the
xmalloc pool ends up calling map_domain_page. Since most of the tmem code
uses that pool to store guest pages (looking briefly at tmem_malloc), this
would explain why we ran out of the 32 slots, especially as we don't free
them until the guest puts the persistent pages back.
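
To make that concrete, the pattern is roughly the sketch below. This is a
simplification rather than the literal tmem code; only map_domain_page() /
unmap_domain_page() and the tmh_persistent_pool_page_get symbol from the
trace above are real names, the rest is illustrative:

    /* Simplified sketch of the problematic pattern (not the actual code). */
    static void *persistent_pool_page_get(unsigned long mfn)
    {
        /*
         * The returned mapping is handed to the xmalloc pool and stays live
         * until the guest takes its persistent page back, so the mapcache
         * slot is pinned for an arbitrarily long time.
         */
        return map_domain_page(mfn);
    }

    static void persistent_pool_page_put(void *va)
    {
        /* Only here, possibly much later, does the slot get returned. */
        unmap_domain_page(va);
    }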

The fix... well, it is not here yet, but I think it will mostly be
concentrated in the tmem code.
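
One direction that comes to mind (purely speculative, I have not tried it)
would be for tmem to remember the mfn instead of holding a long-lived
virtual address, and to map the page only around each access. The structure
and helper below are invented names, just to illustrate the idea:

    /* Speculative sketch only; pool_page and pool_page_read are made up. */
    struct pool_page {
        unsigned long mfn;      /* keep the frame number, not a mapping */
    };

    static void pool_page_read(struct pool_page *pp, void *dst, size_t len)
    {
        void *va = map_domain_page(pp->mfn);   /* map just for this access */

        memcpy(dst, va, len);

        unmap_domain_page(va);                 /* slot freed right away */
    }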

Thanks for the suggestion to look at the accum value.

[-- Attachment #2: xen-domain_page-v4.patch --]
[-- Type: text/x-patch, Size: 6936 bytes --]

diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index efda6af..d297037 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -14,6 +14,7 @@
 #include <asm/current.h>
 #include <asm/flushtlb.h>
 #include <asm/hardirq.h>
+#include <xen/symbols.h>
 
 static struct vcpu *__read_mostly override;
 
@@ -59,12 +60,12 @@ void __init mapcache_override_current(struct vcpu *v)
 void *map_domain_page(unsigned long mfn)
 {
     unsigned long flags;
-    unsigned int idx, i;
+    unsigned int idx, i, j = 0;
     struct vcpu *v;
     struct mapcache_domain *dcache;
     struct mapcache_vcpu *vcache;
     struct vcpu_maphash_entry *hashent;
-
+    int branch = 0;
 #ifdef NDEBUG
     if ( mfn <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
         return mfn_to_virt(mfn);
@@ -115,31 +116,102 @@ void *map_domain_page(unsigned long mfn)
         /* /First/, clean the garbage map and update the inuse list. */
         for ( i = 0; i < BITS_TO_LONGS(dcache->entries); i++ )
         {
+            unsigned long garbage = dcache->garbage[i];
+            unsigned long _inuse = dcache->inuse[i];
+            barrier();
             dcache->inuse[i] &= ~xchg(&dcache->garbage[i], 0);
+            if (v->domain->domain_id) {
+                if (~dcache->inuse[i]) {
+                    gdprintk(XENLOG_INFO, "mfn: %lx, [%d]: %lx->%lx, garbage: %lx -> ~%lx, idx: %d,  ~garbage: %lx \n", mfn, i, _inuse, dcache->inuse[i], garbage, ~dcache->inuse[i],
+                             find_first_zero_bit(dcache->inuse, dcache->entries),~garbage);
+                    branch |= 8;
+                }
+            }
             accum |= ~dcache->inuse[i];
         }
 
-        if ( accum )
+        if ( accum ) {
             idx = find_first_zero_bit(dcache->inuse, dcache->entries);
+            branch |= 1;
+        }
         else
         {
+            branch |= 2;
             /* Replace a hash entry instead. */
             i = MAPHASH_HASHFN(mfn);
             do {
                 hashent = &vcache->hash[i];
                 if ( hashent->idx != MAPHASHENT_NOTINUSE && !hashent->refcnt )
                 {
+                    branch |= 4;
                     idx = hashent->idx;
                     ASSERT(l1e_get_pfn(MAPCACHE_L1ENT(idx)) == hashent->mfn);
                     l1e_write(&MAPCACHE_L1ENT(idx), l1e_empty());
                     hashent->idx = MAPHASHENT_NOTINUSE;
                     hashent->mfn = ~0UL;
+                    if (idx >= dcache->entries) {
+                        branch |= 8;
+                        gdprintk(XENLOG_INFO, "mfn (%lx) -> %ld idx (iter:%d)\n", mfn,  MAPHASH_HASHFN(mfn), j);
+
+                        for (i = 0; i < MAPHASH_ENTRIES;i++) {
+                            hashent = &vcache->hash[i];
+
+                            gdprintk(XENLOG_INFO, "[%d] idx=%d, mfn=0x%lx, refcnt: %d\n",
+                                    i, hashent->idx, hashent->mfn, hashent->refcnt);
+                        }
+                    }
                     break;
                 }
                 if ( ++i == MAPHASH_ENTRIES )
                     i = 0;
+                j++;
             } while ( i != MAPHASH_HASHFN(mfn) );
         }
+        if (idx >= dcache->entries) {
+            unsigned long _mfn;
+            const char *name;
+            unsigned long offset, size;
+
+            static char namebuf[KSYM_NAME_LEN+1];
+            #define BUFFER_SIZE sizeof("%s+%#lx/%#lx [%s]") + KSYM_NAME_LEN + \
+			2*(BITS_PER_LONG*3/10) + 1
+            static char buffer[BUFFER_SIZE];
+
+
+
+           gdprintk(XENLOG_INFO, "mfn (%lx) -> %ld idx: %d(i:%d,j:%d), branch:%x 0x%lx\n", mfn,  MAPHASH_HASHFN(mfn), idx,  i, j, branch, accum);
+
+           for (i = 0; i < dcache->entries; i++) {
+             int in_mapcache = 0;
+            _mfn = l1e_get_pfn(MAPCACHE_L1ENT(i));
+             hashent = NULL;
+
+             for (j = 0; j < MAPHASH_ENTRIES;j++) {
+                    hashent = &vcache->hash[j];
+
+                    if (hashent->mfn == _mfn) {
+                            in_mapcache = 1;
+                            break;
+                    }
+             }
+            if (i <= 32 && dcache->ips[i]) {
+                    name = symbols_lookup(dcache->ips[i], &size, &offset, namebuf);
+
+                    if (!name)
+                        snprintf(buffer, BUFFER_SIZE, "0x%lx", dcache->ips[i]);
+                    else
+                        snprintf(buffer, BUFFER_SIZE, "%s+%#lx/%#lx", name, offset, size);
+            } else
+                  snprintf(buffer, BUFFER_SIZE, "EIP=0");
+
+             if (in_mapcache)
+                    gdprintk(XENLOG_INFO, "[%d] mfn=0x%lx idx=%d, mfn=0x%lx, refcnt: %d [%s]\n",
+                                    i, _mfn, hashent->idx, hashent->mfn, hashent->refcnt, buffer);
+             else
+                    gdprintk(XENLOG_INFO, "[%d] mfn=%lx, [%s]\n", i, _mfn, buffer);
+           }
+
+        }
         BUG_ON(idx >= dcache->entries);
 
         /* /Second/, flush TLBs. */
@@ -152,6 +224,9 @@ void *map_domain_page(unsigned long mfn)
     set_bit(idx, dcache->inuse);
     dcache->cursor = idx + 1;
 
+    if (v->domain->domain_id && idx <= 32)
+        dcache->ips[idx] = (unsigned long)__builtin_return_address(0);
+
     spin_unlock(&dcache->lock);
 
     l1e_write(&MAPCACHE_L1ENT(idx), l1e_from_pfn(mfn, __PAGE_HYPERVISOR));
@@ -215,7 +290,8 @@ void unmap_domain_page(const void *ptr)
         /* /Second/, mark as garbage. */
         set_bit(idx, dcache->garbage);
     }
-
+    if (v->domain->domain_id && idx <= 32)
+        dcache->ips[idx] = 0;
     local_irq_restore(flags);
 }
 
@@ -254,6 +330,7 @@ int mapcache_domain_init(struct domain *d)
                  2 * PFN_UP(BITS_TO_LONGS(MAPCACHE_ENTRIES) * sizeof(long))) >
                  MAPCACHE_VIRT_START + (PERDOMAIN_SLOT_MBYTES << 20));
     bitmap_pages = PFN_UP(BITS_TO_LONGS(MAPCACHE_ENTRIES) * sizeof(long));
+    gdprintk(XENLOG_INFO, "domain bitmap pages: %d\n", bitmap_pages);
     dcache->inuse = (void *)MAPCACHE_VIRT_END + PAGE_SIZE;
     dcache->garbage = dcache->inuse +
                       (bitmap_pages + 1) * PAGE_SIZE / sizeof(long);
@@ -276,6 +353,7 @@ int mapcache_vcpu_init(struct vcpu *v)
     if ( is_hvm_vcpu(v) || !dcache->inuse )
         return 0;
 
+    gdprintk(XENLOG_INFO, "ents: %d, entries: %d\n", ents, dcache->entries);
     if ( ents > dcache->entries )
     {
         /* Populate page tables. */
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index d79464d..5389ea9 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -67,6 +67,7 @@ struct mapcache_domain {
     /* Which mappings are in use, and which are garbage to reap next epoch? */
     unsigned long *inuse;
     unsigned long *garbage;
+    unsigned long ips[33];
 };
 
 int mapcache_domain_init(struct domain *);

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Thread overview: 28+ messages
2013-06-11 13:45 Xen 4.3 + tmem = Xen BUG at domain_page.c:143 konrad wilk
2013-06-11 14:46 ` Jan Beulich
2013-06-11 15:30   ` konrad wilk
2013-06-11 15:56     ` George Dunlap
2013-06-11 16:38     ` Jan Beulich
2013-06-11 17:30       ` konrad wilk
2013-06-11 18:52       ` konrad wilk
2013-06-11 21:06         ` konrad wilk [this message]
2013-06-12  6:38           ` Jan Beulich
2013-06-12 11:00         ` George Dunlap
2013-06-12 11:15           ` Processed: " xen
2013-06-12 11:37           ` George Dunlap
2013-06-12 12:46             ` Jan Beulich
2013-06-12 14:13             ` Konrad Rzeszutek Wilk
2013-06-12 12:12           ` Jan Beulich
2013-06-12 13:16             ` George Dunlap
2013-06-12 13:27               ` Jan Beulich
2013-06-12 15:11             ` Keir Fraser
2013-06-12 15:27               ` Keir Fraser
2013-06-12 15:54                 ` Jan Beulich
2013-06-12 15:48               ` Jan Beulich
2013-06-12 17:26                 ` Keir Fraser
2013-07-05 16:56                   ` George Dunlap
2013-07-08  8:58                     ` Jan Beulich
2013-07-08  9:07                       ` George Dunlap
2013-07-08  9:15                         ` Processed: " xen
2013-07-08  9:25                         ` George Dunlap
2013-07-08  9:30                           ` Processed: " xen
