All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Xen-devel List <xen-devel@lists.xen.org>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Keir Fraser <keir@xen.org>, Jan Beulich <JBeulich@suse.com>
Subject: Race condition with scheduler runqueues
Date: Mon, 18 Feb 2013 18:11:07 +0000	[thread overview]
Message-ID: <51226EBB.3030503@citrix.com> (raw)

Hello,

Our testing has discovered a crash (pagefault at 0x0000000000000008)
which I have tracked down to bad __runq_remove() in csched_vcpu_sleep()
in sched_credit.c (because a static function of the same name also
exists in sched_credit2.c, which confused matters to start with)

The test case was a loop of localhost migrate of a 1vcpu HVM win8
domain.  The test case itself has passed many times in the past on the
same Xen codebase (Xen-4.1.3), indicating that it is very rare.  There
does not appear to be any relevant changes between the version of Xen in
the test and xen-4.1-testing.

The failure itself is because of a XEN_DOMCTL_scheduler_op (trace below)
from dom0, targeting the VCPU of the migrating domain.

(XEN) Xen call trace:
(XEN)       [<ffff82c480116a14>] csched_vcpu_sleep+0x44/0x70
(XEN)      0[<ffff82c480120117>] vcpu_sleep_nosync+0xe7/0x3b0
(XEN)     12[<ffff82c4801203e9>] vcpu_sleep_sync+0x9/0x50
(XEN)     14[<ffff82c48011fd4c>] sched_adjust+0xac/0x230
(XEN)     24[<ffff82c480102bc1>] do_domctl+0x731/0x1130
(XEN)     64[<ffff82c4802013c4>] compat_hypercall+0x74/0x80

The relevant part of csched_vcpu_sleep() is

    else if ( __vcpu_on_runq(svc) )
        __runq_remove(svc);

which disassembles to

ffff82c480116a01:       49 8b 10                mov    (%r8),%rdx
ffff82c480116a04:       4c 39 c2                cmp    %r8,%rdx
ffff82c480116a07:       75 07                   jne    ffff82c480116a10
<csched_vcpu_sleep+0x40>
ffff82c480116a09:       f3 c3                   repz retq
ffff82c480116a0b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
ffff82c480116a10:       49 8b 40 08             mov    0x8(%r8),%rax
ffff82c480116a14:       48 89 42 08             mov    %rax,0x8(%rdx) #
<- Pagefault here
ffff82c480116a18:       48 89 10                mov    %rdx,(%rax)
ffff82c480116a1b:       4d 89 40 08             mov    %r8,0x8(%r8)
ffff82c480116a1f:       4d 89 00                mov    %r8,(%r8)

The relevant crash registers from the pagefault are:
rax: 0000000000000000
rdx: 0000000000000000
 r8: ffff83080c89ed90

If I am reading the code correctly, this means that runq->next is NULL,
so we fail list_empty() and erroneously pass __vcpu_on_runq().  We then
fail with a fault when trying to update runq->prev, which is also NULL.

The only place I can spot in the code where the runq->{next,prev} could
conceivably be NULL is in csched_alloc_vdata() between the memset() and
INIT_LIST_HEAD().  This is logically sensible in combination with the
localhost migrate loop, and I cant immediately see anything to prevent
this race happening.

Can anyone with more knowledge than me in this area of code comment on
the plausibility of my analysis?  Unfortunately, I cant see an easy
solution.

~Andrew

             reply	other threads:[~2013-02-18 18:11 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-18 18:11 Andrew Cooper [this message]
2013-02-19  9:28 ` Race condition with scheduler runqueues Jan Beulich
2013-02-19 11:47   ` Andrew Cooper
2013-02-26 12:08     ` George Dunlap
2013-02-26 13:44       ` Andrew Cooper
2013-02-27  6:19   ` Juergen Gross
2013-02-27  6:28     ` Juergen Gross
2013-02-27  9:12       ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51226EBB.3030503@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=JBeulich@suse.com \
    --cc=keir@xen.org \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.