From: Juergen Gross <juergen.gross@ts.fujitsu.com>
To: Jan Beulich <JBeulich@novell.com>
Cc: xen-devel@lists.xensource.com, Ian Jackson <Ian.Jackson@eu.citrix.com>
Subject: Re: [xen-unstable test] 6374: regressions - FAIL
Date: Mon, 14 Mar 2011 15:40:27 +0100
Message-ID: <4D7E28DB.6080005@ts.fujitsu.com>
In-Reply-To: <4D7DFD130200007800036344@vpn.id2.novell.com>

On 03/14/11 11:33, Jan Beulich wrote:
>>>> On 11.03.11 at 18:51, Ian Jackson<Ian.Jackson@eu.citrix.com>  wrote:
>> xen.org writes ("[Xen-devel] [xen-unstable test] 6374: regressions - FAIL"):
>>> flight 6374 xen-unstable real [real]
>>> Tests which did not succeed and are blocking:
>>>   test-amd64-i386-pv            5 xen-boot               fail REGR. vs. 6369
>>
>> Xen crash in scheduler (non-credit2).
>>
>> Mar 11 13:46:53.646796 (XEN) Watchdog timer detects that CPU1 is stuck!
>> Mar 11 13:46:57.922794 (XEN) ----[ Xen-4.1.0-rc7-pre  x86_64  debug=y  Not tainted ]----
>> Mar 11 13:46:57.931763 (XEN) CPU:    1
>> Mar 11 13:46:57.931784 (XEN) RIP:    e008:[<ffff82c480100140>] __bitmap_empty+0x0/0x7f
>> Mar 11 13:46:57.931817 (XEN) RFLAGS: 0000000000000047   CONTEXT: hypervisor
>> Mar 11 13:46:57.946773 (XEN) rax: ffff82c4802d1ac0   rbx: ffff8301a7fafc78   rcx: 0000000000000002
>> Mar 11 13:46:57.946813 (XEN) rdx: ffff82c4802d0cc0   rsi: 0000000000000080   rdi: ffff8301a7fafc78
>> Mar 11 13:46:57.954780 (XEN) rbp: ffff8301a7fafcb8   rsp: ffff8301a7fafc00   r8:  0000000000000002
>> Mar 11 13:46:57.966770 (XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
>> Mar 11 13:46:57.966805 (XEN) r12: ffff8301a7fafc68   r13: 0000000000000001   r14: 0000000000000001
>> Mar 11 13:46:57.975780 (XEN) r15: ffff82c4802d1ac0   cr0: 000000008005003b   cr4: 00000000000006f0
>> Mar 11 13:46:57.987771 (XEN) cr3: 00000000d7c9c000   cr2: 00000000c45e5770
>> Mar 11 13:46:57.987800 (XEN) ds: 007b   es: 007b   fs: 00d8   gs: 0033   ss: 0000   cs: e008
>> Mar 11 13:46:57.998773 (XEN) Xen stack trace from rsp=ffff8301a7fafc00:
>> ...
>> Mar 11 13:46:58.154777 (XEN) Xen call trace:
>> Mar 11 13:46:58.154798 (XEN)    [<ffff82c480100140>] __bitmap_empty+0x0/0x7f
>> Mar 11 13:46:58.163767 (XEN)    [<ffff82c480119582>] csched_cpu_pick+0xe/0x10
>> Mar 11 13:46:58.163802 (XEN)    [<ffff82c480122c8d>] vcpu_migrate+0xfb/0x230
>> Mar 11 13:46:58.178768 (XEN)    [<ffff82c480122e24>] context_saved+0x62/0x7b
>> Mar 11 13:46:58.178799 (XEN)    [<ffff82c480157f17>] context_switch+0xd98/0xdca
>> Mar 11 13:46:58.183766 (XEN)    [<ffff82c4801226b4>] schedule+0x5fc/0x624
>> Mar 11 13:46:58.183795 (XEN)    [<ffff82c480123837>] __do_softirq+0x88/0x99
>> Mar 11 13:46:58.198784 (XEN)    [<ffff82c4801238b2>] do_softirq+0x6a/0x7a
>
> I suppose that's a result of 22957:c5c4688d5654 - as I understand it
> exiting the loop is only possible if two consecutive invocations of
> pick_cpu return the same result. This, however, is precisely what the
> pCPU's idle_bias is supposed to prevent on hyper-threaded/multi-core
> systems (so that it's not always the same entity that gets selected).
>
> But even beyond that particular aspect, relying on any form of
> "stability" of the returned value isn't correct.
>
> Plus running pick_cpu repeatedly without actually using its result
> is wrong wrt idle_bias updating too - that's why
> csched_vcpu_acct() calls _csched_cpu_pick() with the commit
> argument set to false (which will result in a subsequent call -
> through pick_cpu - with the argument set to true to be likely
> to return the same value, but there's no correctness dependency
> on that). So 22948:2d35823a86e7 already wasn't really correct
> in putting a loop around pick_cpu.
>
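
To make the termination concern above concrete, here is a toy model, not
the Xen code: pick_cpu(), struct picker and picks_until_stable() are
hypothetical names, and the bias handling is deliberately simplified to a
rotation over the idle CPUs. It only illustrates why a loop that waits for
two consecutive identical picks may never exit when each committed pick
moves the bias.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a picker with an idle bias. */
struct picker {
    unsigned int idle_bias;   /* index of the CPU preferred next */
    unsigned int nr_idle;     /* number of equally good idle CPUs */
};

/*
 * Return the currently preferred CPU. When commit is true, advance the
 * bias so that the next call prefers a different CPU (assumed behaviour
 * of this toy model only).
 */
static unsigned int pick_cpu(struct picker *p, bool commit)
{
    unsigned int cpu = p->idle_bias % p->nr_idle;

    if (commit)
        p->idle_bias = (p->idle_bias + 1) % p->nr_idle;
    return cpu;
}

/*
 * Call pick_cpu() until two consecutive results agree. Returns the number
 * of extra calls that were needed, or -1 if 'limit' calls were not enough.
 */
static int picks_until_stable(struct picker *p, bool commit, int limit)
{
    unsigned int prev = pick_cpu(p, commit);

    for (int i = 1; i <= limit; i++) {
        unsigned int cur = pick_cpu(p, commit);

        if (cur == prev)
            return i;
        prev = cur;
    }
    return -1;
}
```

In this model, two idle CPUs and committed picks alternate forever, so the
loop only ends because of the artificial limit; with commit set to false
the second pick already matches the first.
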
> It's also not clear to me what the surrounding
> if ( old_lock == per_cpu(schedule_data, old_cpu).schedule_lock )
> is supposed to filter, as the lock pointer gets set only when a
> CPU gets brought up.

Yeah, but the vcpu can change cpus while we don't hold the lock.
This means old_cpu can change between selecting the lock and actually
taking it...
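
A minimal sketch of the pattern I mean, assuming nothing about the real
implementation: struct cpu_lock, struct vcpu, schedule_lock[] and
vcpu_lock_current_cpu() below are hypothetical stand-ins, and the sketch
re-checks the CPU number where the code under discussion compares lock
pointers. The point is only that the CPU has to be re-read after the lock
is taken, and the attempt retried if it changed in between.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical stand-ins for a per-CPU scheduler lock and a vcpu. */
struct cpu_lock { atomic_bool held; };
struct vcpu { _Atomic unsigned int processor; };

static struct cpu_lock schedule_lock[NR_CPUS];

static void cpu_lock_acquire(struct cpu_lock *l)
{
    /* Spin until the previous value was "not held". */
    while (atomic_exchange_explicit(&l->held, true, memory_order_acquire))
        ;
}

static void cpu_lock_release(struct cpu_lock *l)
{
    assert(atomic_load(&l->held));   /* releasing a free lock is a bug */
    atomic_store_explicit(&l->held, false, memory_order_release);
}

/*
 * Take the lock of the CPU the vcpu is currently on. The vcpu may move
 * between reading v->processor and acquiring the lock, so re-check after
 * locking and retry if it moved. Returns with the lock held.
 */
static struct cpu_lock *vcpu_lock_current_cpu(struct vcpu *v)
{
    for (;;) {
        unsigned int cpu = atomic_load(&v->processor);
        struct cpu_lock *lock = &schedule_lock[cpu];

        cpu_lock_acquire(lock);
        if (atomic_load(&v->processor) == cpu)
            return lock;             /* still on the same CPU: done */
        cpu_lock_release(lock);      /* moved meanwhile: try again */
    }
}
```

The caller releases the returned lock with cpu_lock_release() once it is
done with the vcpu.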

> As I don't really understand what this is trying to achieve,
> I also can't really suggest a possible fix other than reverting both
> offending changesets.

I'll send a patch as a suggestion :-)


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
