From mboxrd@z Thu Jan 1 00:00:00 1970
From: konrad wilk
Subject: Re: Xen 4.3 + tmem = Xen BUG at domain_page.c:143
Date: Tue, 11 Jun 2013 17:06:13 -0400
Message-ID: <51B79145.2010401@oracle.com>
References: <51B72A09.8080709@oracle.com> <51B7547702000078000DD225@nat28.tlf.novell.com> <51B742B0.3070500@oracle.com> <51B76EA602000078000DD36D@nat28.tlf.novell.com> <51B7720B.10607@oracle.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------080500020508090302040509"
In-Reply-To: <51B7720B.10607@oracle.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

This is a multi-part message in MIME format.
--------------080500020508090302040509
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

On 6/11/2013 2:52 PM, konrad wilk wrote:
>
>> The BUG_ON() here is definitely valid - a few lines down, after the
>> enclosing if(), we use it in ways that require this to not have
>> triggered. It basically tells you whether an in-range idx was found,
>> which apparently isn't the case here.
>>
>> As I think George already pointed out - printing accum here would
>> be quite useful: It should have at least one of the low 32 bits set,
>> given that dcache->entries must be at most 32 according to the
>> data you already got logged.
>
> With extra debugging (see attached patch)
>
> (XEN) domain_page.c:125:d1 mfn: 1eb483, [0]: bffff1ff, ~ffffffff40000e00, idx: 9 garbage: 40000e00, inuse: ffffffff
> (XEN) domain_page.c:125:d1 mfn: 1eb480, [0]: fdbfffff, ~ffffffff02400000, idx: 22 garbage: 2400000, inuse: ffffffff
> (XEN) domain_page.c:125:d1 mfn: 2067ca, [0]: fffff7ff, ~ffffffff00000800, idx: 11 garbage: 800, inuse: ffffffff
> (XEN) domain_page.c:125:d1 mfn: 183642, [0]: ffffffff, ~ffffffff00000000, idx: 32 garbage: 0, inuse: ffffffff
> (XEN) domain_page.c:170:d1 mfn (183642) -> 2 idx: 32(i:1,j:0), branch:9 0xffffffff00000000
> (XEN) domain_page.c:176:d1 [0] idx=13, mfn=0x203b00, refcnt: 0
> (XEN) domain_page.c:176:d1 [1] idx=25, mfn=0x1839e1, refcnt: 0
> (XEN) domain_page.c:176:d1 [2] idx=3, mfn=0x1824d2, refcnt: 0
> (XEN) domain_page.c:176:d1 [3] idx=5, mfn=0x1eb48b, refcnt: 0
> (XEN) domain_page.c:176:d1 [4] idx=28, mfn=0x203b04, refcnt: 0
> (XEN) domain_page.c:176:d1 [5] idx=0, mfn=0x1eb485, refcnt: 0
> (XEN) domain_page.c:176:d1 [6] idx=30, mfn=0x203afe, refcnt: 0
> (XEN) domain_page.c:176:d1 [7] idx=20, mfn=0x203aff, refcnt: 0
>
> And that does paint the picture that we have exhausted the full 32
> entries of the mapcache.
>
> Now off to find out who is holding them and why. Aren't these
> operations (map/unmap domain_page) supposed to be short-lived?

And found the culprit.
With some EIP logging:

(XEN) domain_page.c:214:d1 [0] mfn=0x1ff67a idx=0, mfn=0x1ff67a, refcnt: 0 [EIP=0]
(XEN) domain_page.c:216:d1 [1] mfn=18fef2, [EIP=0]
(XEN) domain_page.c:216:d1 [2] mfn=1eb518, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [3] mfn=170a08, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [4] mfn=18feef, [EIP=0]
(XEN) domain_page.c:216:d1 [5] mfn=1eb4c8, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [6] mfn=202699, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [7] mfn=18fef0, [EIP=0]
(XEN) domain_page.c:216:d1 [8] mfn=0, [EIP=0]
(XEN) domain_page.c:214:d1 [9] mfn=0x18e7ed idx=9, mfn=0x18e7ed, refcnt: 0 [EIP=0]
(XEN) domain_page.c:214:d1 [10] mfn=0x18f629 idx=10, mfn=0x18f629, refcnt: 0 [EIP=0]
(XEN) domain_page.c:216:d1 [11] mfn=1eb47e, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:214:d1 [12] mfn=0x18209c idx=12, mfn=0x18209c, refcnt: 0 [EIP=0]
(XEN) domain_page.c:216:d1 [13] mfn=18fef5, [EIP=0]
(XEN) domain_page.c:214:d1 [14] mfn=0x18f62b idx=14, mfn=0x18f62b, refcnt: 0 [EIP=0]
(XEN) domain_page.c:216:d1 [15] mfn=1eb459, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [16] mfn=1eb512, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [17] mfn=170d2b, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [18] mfn=20272b, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [19] mfn=16c22c, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [20] mfn=18fef4, [EIP=0]
(XEN) domain_page.c:216:d1 [21] mfn=18e7e9, [EIP=0]
(XEN) domain_page.c:216:d1 [22] mfn=18feee, [EIP=0]
(XEN) domain_page.c:216:d1 [23] mfn=1eb4a3, [tmh_persistent_pool_page_get+0x26d/0x2d8]
(XEN) domain_page.c:216:d1 [24] mfn=18fef3, [EIP=0]
(XEN) domain_page.c:214:d1 [25] mfn=0x18f62f idx=25, mfn=0x18f62f, refcnt: 0 [EIP=0]
(XEN) domain_page.c:216:d1 [26] mfn=18ff02, [__get_page_type+0x1001/0x146a]
(XEN) domain_page.c:214:d1 [27] mfn=0x18fefe idx=27, mfn=0x18fefe, refcnt: 0 [EIP=0]
(XEN) domain_page.c:216:d1 [28] mfn=18ff00, [__get_page_type+0xcc3/0x146a]
(XEN) domain_page.c:216:d1 [29] mfn=0, [EIP=0]
(XEN) domain_page.c:214:d1 [30] mfn=0x18f628 idx=30, mfn=0x18f628, refcnt: 0 [EIP=0]
(XEN) domain_page.c:216:d1 [31] mfn=1eb4ed, [tmh_persistent_pool_page_get+0x26d/0x2d8]

And from a brief look at the code, it looks as if any call into the xmalloc pool code ends up calling map_domain_page. Since most of the tmem code uses the pool to store guest pages (looking briefly at tmem_malloc), this would explain why we ran out of the 32 slots, especially as we don't free them until the guest puts the persistent pages back.

The fix... well, it is not here yet, but I think it will mostly concentrate on the tmem code.

Thanks for the suggestion of looking at the accum value.

--------------080500020508090302040509
Content-Type: text/x-patch;
 name="xen-domain_page-v4.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="xen-domain_page-v4.patch"

diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index efda6af..d297037 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 
 static struct vcpu *__read_mostly override;
 
@@ -59,12 +60,12 @@ void __init mapcache_override_current(struct vcpu *v)
 void *map_domain_page(unsigned long mfn)
 {
     unsigned long flags;
-    unsigned int idx, i;
+    unsigned int idx, i, j = 0;
     struct vcpu *v;
     struct mapcache_domain *dcache;
     struct mapcache_vcpu *vcache;
     struct vcpu_maphash_entry *hashent;
-
+    int branch = 0;
 #ifdef NDEBUG
     if ( mfn <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
         return mfn_to_virt(mfn);
@@ -115,31 +116,102 @@ void *map_domain_page(unsigned long mfn)
 
     /* /First/, clean the garbage map and update the inuse list. */
     for ( i = 0; i < BITS_TO_LONGS(dcache->entries); i++ )
     {
+        unsigned long garbage = dcache->garbage[i];
+        unsigned long _inuse = dcache->inuse[i];
+        barrier();
         dcache->inuse[i] &= ~xchg(&dcache->garbage[i], 0);
+        if (v->domain->domain_id) {
+            if (~dcache->inuse[i]) {
+                gdprintk(XENLOG_INFO, "mfn: %lx, [%d]: %lx->%lx, garbage: %lx -> ~%lx, idx: %d, ~garbage: %lx \n",
+                         mfn, i, _inuse, dcache->inuse[i], garbage, ~dcache->inuse[i],
+                         find_first_zero_bit(dcache->inuse, dcache->entries), ~garbage);
+                branch |= 8;
+            }
+        }
         accum |= ~dcache->inuse[i];
     }
 
-    if ( accum )
+    if ( accum ) {
         idx = find_first_zero_bit(dcache->inuse, dcache->entries);
+        branch |= 1;
+    } else {
+        branch |= 2;
         /* Replace a hash entry instead. */
         i = MAPHASH_HASHFN(mfn);
         do {
             hashent = &vcache->hash[i];
             if ( hashent->idx != MAPHASHENT_NOTINUSE && !hashent->refcnt )
             {
+                branch |= 4;
                 idx = hashent->idx;
                 ASSERT(l1e_get_pfn(MAPCACHE_L1ENT(idx)) == hashent->mfn);
                 l1e_write(&MAPCACHE_L1ENT(idx), l1e_empty());
                 hashent->idx = MAPHASHENT_NOTINUSE;
                 hashent->mfn = ~0UL;
+                if (idx >= dcache->entries) {
+                    branch |= 8;
+                    gdprintk(XENLOG_INFO, "mfn (%lx) -> %ld idx (iter:%d)\n", mfn, MAPHASH_HASHFN(mfn), j);
+
+                    for (i = 0; i < MAPHASH_ENTRIES; i++) {
+                        hashent = &vcache->hash[i];
+
+                        gdprintk(XENLOG_INFO, "[%d] idx=%d, mfn=0x%lx, refcnt: %d\n",
+                                 i, hashent->idx, hashent->mfn, hashent->refcnt);
+                    }
+                }
                 break;
             }
             if ( ++i == MAPHASH_ENTRIES )
                 i = 0;
+            j++;
         } while ( i != MAPHASH_HASHFN(mfn) );
     }
+    if (idx >= dcache->entries) {
+        unsigned long _mfn;
+        const char *name;
+        unsigned long offset, size;
+
+        static char namebuf[KSYM_NAME_LEN+1];
+        #define BUFFER_SIZE sizeof("%s+%#lx/%#lx [%s]") + KSYM_NAME_LEN + \
+                2*(BITS_PER_LONG*3/10) + 1
+        static char buffer[BUFFER_SIZE];
+
+        gdprintk(XENLOG_INFO, "mfn (%lx) -> %ld idx: %d(i:%d,j:%d), branch:%x 0x%lx\n",
+                 mfn, MAPHASH_HASHFN(mfn), idx, i, j, branch, accum);
+
+        for (i = 0; i < dcache->entries; i++) {
+            int in_mapcache = 0;
+            _mfn = l1e_get_pfn(MAPCACHE_L1ENT(i));
+
+            hashent = NULL;
+
+            for (j = 0; j < MAPHASH_ENTRIES; j++) {
+                hashent = &vcache->hash[j];
+
+                if (hashent->mfn == _mfn) {
+                    in_mapcache = 1;
+                    break;
+                }
+            }
+            if (i <= 32 && dcache->ips[i]) {
+                name = symbols_lookup(dcache->ips[i], &size, &offset, namebuf);
+
+                if (!name)
+                    snprintf(buffer, BUFFER_SIZE, "0x%lx", dcache->ips[i]);
+                else
+                    snprintf(buffer, BUFFER_SIZE, "%s+%#lx/%#lx", name, offset, size);
+            } else
+                snprintf(buffer, BUFFER_SIZE, "EIP=0");
+
+            if (in_mapcache)
+                gdprintk(XENLOG_INFO, "[%d] mfn=0x%lx idx=%d, mfn=0x%lx, refcnt: %d [%s]\n",
+                         i, _mfn, hashent->idx, hashent->mfn, hashent->refcnt, buffer);
+            else
+                gdprintk(XENLOG_INFO, "[%d] mfn=%lx, [%s]\n", i, _mfn, buffer);
+        }
+    }
     BUG_ON(idx >= dcache->entries);
 
     /* /Second/, flush TLBs. */
@@ -152,6 +224,9 @@ void *map_domain_page(unsigned long mfn)
     set_bit(idx, dcache->inuse);
     dcache->cursor = idx + 1;
 
+    if (v->domain->domain_id && idx <= 32)
+        dcache->ips[idx] = (unsigned long)__builtin_return_address(0);
+
     spin_unlock(&dcache->lock);
 
     l1e_write(&MAPCACHE_L1ENT(idx), l1e_from_pfn(mfn, __PAGE_HYPERVISOR));
@@ -215,7 +290,8 @@ void unmap_domain_page(const void *ptr)
         /* /Second/, mark as garbage. */
         set_bit(idx, dcache->garbage);
     }
-
+    if (v->domain->domain_id && idx <= 32)
+        dcache->ips[idx] = 0;
     local_irq_restore(flags);
 }
 
@@ -254,6 +330,7 @@ int mapcache_domain_init(struct domain *d)
                  2 * PFN_UP(BITS_TO_LONGS(MAPCACHE_ENTRIES) * sizeof(long))) >
                  MAPCACHE_VIRT_START + (PERDOMAIN_SLOT_MBYTES << 20));
     bitmap_pages = PFN_UP(BITS_TO_LONGS(MAPCACHE_ENTRIES) * sizeof(long));
+    gdprintk(XENLOG_INFO, "domain bitmap pages: %d\n", bitmap_pages);
     dcache->inuse = (void *)MAPCACHE_VIRT_END + PAGE_SIZE;
     dcache->garbage = dcache->inuse +
                       (bitmap_pages + 1) * PAGE_SIZE / sizeof(long);
@@ -276,6 +353,7 @@ int mapcache_vcpu_init(struct vcpu *v)
     if ( is_hvm_vcpu(v) || !dcache->inuse )
         return 0;
 
+    gdprintk(XENLOG_INFO, "ents: %d, entries: %d\n", ents, dcache->entries);
     if ( ents > dcache->entries )
     {
         /* Populate page tables. */
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index d79464d..5389ea9 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -67,6 +67,7 @@ struct mapcache_domain {
     /* Which mappings are in use, and which are garbage to reap next epoch? */
     unsigned long *inuse;
    unsigned long *garbage;
+    unsigned long ips[33];
 };
 
 int mapcache_domain_init(struct domain *);

--------------080500020508090302040509
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
--------------080500020508090302040509--