From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Cooper Subject: Race condition with scheduler runqueues Date: Mon, 18 Feb 2013 18:11:07 +0000 Message-ID: <51226EBB.3030503@citrix.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Xen-devel List , George Dunlap , Keir Fraser , Jan Beulich List-Id: xen-devel@lists.xenproject.org Hello, Our testing has discovered a crash (pagefault at 0x0000000000000008) which I have tracked down to bad __runq_remove() in csched_vcpu_sleep() in sched_credit.c (because a static function of the same name also exists in sched_credit2.c, which confused matters to start with) The test case was a loop of localhost migrate of a 1vcpu HVM win8 domain. The test case itself has passed many times in the past on the same Xen codebase (Xen-4.1.3), indicating that it is very rare. There does not appear to be any relevant changes between the version of Xen in the test and xen-4.1-testing. The failure itself is because of a XEN_DOMCTL_scheduler_op (trace below) from dom0, targeting the VCPU of the migrating domain. (XEN) Xen call trace: (XEN) [] csched_vcpu_sleep+0x44/0x70 (XEN) 0[] vcpu_sleep_nosync+0xe7/0x3b0 (XEN) 12[] vcpu_sleep_sync+0x9/0x50 (XEN) 14[] sched_adjust+0xac/0x230 (XEN) 24[] do_domctl+0x731/0x1130 (XEN) 64[] compat_hypercall+0x74/0x80 The relevant part of csched_vcpu_sleep() is else if ( __vcpu_on_runq(svc) ) __runq_remove(svc); which disassembles to ffff82c480116a01: 49 8b 10 mov (%r8),%rdx ffff82c480116a04: 4c 39 c2 cmp %r8,%rdx ffff82c480116a07: 75 07 jne ffff82c480116a10 ffff82c480116a09: f3 c3 repz retq ffff82c480116a0b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) ffff82c480116a10: 49 8b 40 08 mov 0x8(%r8),%rax ffff82c480116a14: 48 89 42 08 mov %rax,0x8(%rdx) # <- Pagefault here ffff82c480116a18: 48 89 10 mov %rdx,(%rax) ffff82c480116a1b: 4d 89 40 08 mov %r8,0x8(%r8) ffff82c480116a1f: 4d 89 00 mov %r8,(%r8) The relevant crash registers from the pagefault are: rax: 0000000000000000 rdx: 0000000000000000 r8: ffff83080c89ed90 If I am reading the code correctly, this means that runq->next is NULL, so we fail list_empty() and erroneously pass __vcpu_on_runq(). We then fail with a fault when trying to update runq->prev, which is also NULL. The only place I can spot in the code where the runq->{next,prev} could conceivably be NULL is in csched_alloc_vdata() between the memset() and INIT_LIST_HEAD(). This is logically sensible in combination with the localhost migrate loop, and I cant immediately see anything to prevent this race happening. Can anyone with more knowledge than me in this area of code comment on the plausibility of my analysis? Unfortunately, I cant see an easy solution. ~Andrew