From: Juergen Gross <juergen.gross@ts.fujitsu.com>
To: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Andre Przywara <andre.przywara@amd.com>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	"Diestelhorst, Stephan" <Stephan.Diestelhorst@amd.com>
Subject: Re: Hypervisor crash(!) on xl cpupool-numa-split
Date: Thu, 17 Feb 2011 10:11:25 +0100
Message-ID: <4D5CE63D.5040704@ts.fujitsu.com>
In-Reply-To: <4D5CC89C.7020306@ts.fujitsu.com>

On 02/17/11 08:05, Juergen Gross wrote:
> On 02/16/11 14:54, George Dunlap wrote:
>> Andre (and Juergen), can you try again with the attached patch?
>>
>> What the patch basically does is try to make "cpu_disable_scheduler()"
>> do what it seems to say it does. :-) Namely, the various
>> scheduler-related interrupts (both per-cpu ticks and the master tick)
>> are part of the scheduler, so disable them before doing anything, and
>> don't enable them until the cpu is really ready to go again.
>>
>> To be precise:
>> * cpu_disable_scheduler() disables ticks
>> * schedule_cpu_switch() only enables ticks if adding a cpu to a pool,
>> and does it after inserting the idle vcpu
>> * Modify semantics, s.t., {alloc,free}_pdata() don't actually start or
>> stop tickers
>> + Call tick_{resume,suspend} in cpu_{up,down}, respectively
>> * Modify credit1's tick_{suspend,resume} to handle the master ticker
>> as well.
>>
>> With this patch (if dom0 doesn't get wedged due to all 8 vcpus being
>> on one pcpu), I can perform thousands of operations successfully.
>>
>> (NB this is not ready for application yet, I just wanted to check to
>> see if it fixes Andre's problem)

Tried again, this time with the following patch:

diff -r 72470de157ce xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Wed Feb 16 09:49:33 2011 +0000
+++ b/xen/common/sched_credit.c Wed Feb 16 15:09:54 2011 +0100
@@ -1268,7 +1268,8 @@ csched_load_balance(struct csched_privat
          /*
           * Any work over there to steal?
           */
-        speer = csched_runq_steal(peer_cpu, cpu, snext->pri);
+        speer = cpu_isset(peer_cpu, *online) ?
+            csched_runq_steal(peer_cpu, cpu, snext->pri) : NULL;
          pcpu_schedule_unlock(peer_cpu);
          if ( speer != NULL )
          {


Worked without any flaw for 30000 iterations.
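
For illustration only, here is a stand-alone sketch of the guard the patch
adds: only try to steal from a peer cpu if that cpu is set in the online
mask. Everything in it is a simplified assumption and not the Xen code:
the 64-bit mask, struct work_item, struct runq, runq_steal() and
steal_if_online() are hypothetical stand-ins for the cpumask, the vcpu,
the per-cpu runqueue and csched_runq_steal().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS 64

/* Hypothetical stand-in for a vcpu waiting on a runqueue. */
struct work_item {
    int pri;
};

/* Hypothetical per-cpu runqueue holding at most one stealable item. */
struct runq {
    struct work_item *head;
};

/* Return non-zero if 'cpu' is set in 'mask' (assumed 64-bit cpu mask). */
static int cpu_in_mask(unsigned int cpu, uint64_t mask)
{
    assert(cpu < MAX_CPUS);
    return (int)((mask >> cpu) & 1u);
}

/*
 * Simplified steal: take the head item only if its priority is strictly
 * higher than 'pri'. Returns NULL if there is nothing worth taking.
 */
static struct work_item *runq_steal(struct runq *rq, int pri)
{
    struct work_item *item = rq->head;

    if (item == NULL || item->pri <= pri)
        return NULL;

    rq->head = NULL;
    return item;
}

/*
 * Same shape as the patched line: skip the steal completely when the
 * peer cpu is not in the online mask, so its runqueue is never touched.
 */
static struct work_item *steal_if_online(struct runq *rqs,
                                         unsigned int peer_cpu,
                                         uint64_t online, int pri)
{
    return cpu_in_mask(peer_cpu, online) ?
        runq_steal(&rqs[peer_cpu], pri) : NULL;
}
```

With cpu 9 cleared in the mask, steal_if_online() returns NULL and leaves
that cpu's queue alone; with the bit set it behaves like the plain steal.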


Juergen

>
> After some thousand iterations the machine hung and, after dumping Dom0
> registers to the console, it continued running and crashed about a second later:
>
> (XEN) cpupool_unassign_cpu(pool=0,cpu=9)
> (XEN) cpupool_unassign_cpu(pool=0,cpu=9) ffff83083fff74c0
> (XEN) cpupool_unassign_cpu ret=0
> (XEN) cpupool_unassign_cpu(pool=0,cpu=4)
> (XEN) cpupool_unassign_cpu(pool=0,cpu=4) ffff83083fff74c0
> (XEN) cpupool_unassign_cpu ret=0
> (XEN) cpupool_assign_cpu(pool=1,cpu=9)
> (XEN) cpupool_assign_cpu(pool=1,cpu=9) ffff83083002de40
> (XEN) Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
> (XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]----
> (XEN) CPU: 9
> (XEN) RIP: e008:[<ffff82c480126100>] active_timer+0xc/0x37
> (XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor
> (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
> (XEN) rdx: ffff830839d8ff18 rsi: 0000010dbb628a80 rdi: ffff83083ffbcf98
> (XEN) rbp: ffff830839d8fd50 rsp: ffff830839d8fd50 r8: ffff83083ffbcf90
> (XEN) r9: ffff82c480213680 r10: 00000000ffffffff r11: 0000000000000010
> (XEN) r12: ffff82c4802d3f80 r13: ffff82c4802d3f80 r14: ffff83083ffbcf98
> (XEN) r15: ffff83083ffbcfc0 cr0: 000000008005003b cr4: 00000000000026f0
> (XEN) cr3: 000000007809c000 cr2: 0000000000620048
> (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Xen stack trace from rsp=ffff830839d8fd50:
> (XEN) ffff830839d8fda0 ffff82c480126ef9 0000000000000000 0000010dbb628a80
> (XEN) 0000000000000086 0000000000000009 ffff83083002de40 ffff83083002dd50
> (XEN) 0000000000000009 0000000000000009 ffff830839d8fdc0 ffff82c480117906
> (XEN) ffff83083ffa3b40 ffff83083ffa5d70 ffff830839d8fe30 ffff82c4801214fa
> (XEN) ffff83083002dd00 0000000900000100 0000000000000286 ffff8300780da000
> (XEN) ffff83083ffbcf80 ffff83083ffbcf90 ffff82c480247e00 0000000000000009
> (XEN) 00000000fffffff0 ffff83083002dd00 0000000000000000 ffff8300781cc198
> (XEN) ffff830839d8fe60 ffff82c4801019ff 0000000000000009 0000000000000009
> (XEN) ffff8300781cc198 ffff830839d990d0 ffff830839d8fe80 ffff82c480101bd9
> (XEN) ffff83107e80c5b0 ffff8300781cc000 ffff830839d8fea0 ffff82c480104f21
> (XEN) 0000000000000009 ffff830839d990e0 ffff830839d8fee0 ffff82c480125b6c
> (XEN) ffff82c48024a020 ffff830839d8ff18 ffff82c48024a020 ffff830839d8ff18
> (XEN) ffff830839d99060 ffff830839d99040 ffff830839d8ff10 ffff82c48015645a
> (XEN) 0000000000000000 ffff8300780da000 ffff8300780da000 ffffffffffffffff
> (XEN) ffff830839d8fe00 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 ffffffff8062bda0 ffff880fbb1e5fd8 0000000000000246
> (XEN) 0000000000000000 000000010003347d 0000000000000000 0000000000000000
> (XEN) ffffffff800033aa 00000000deadbeef 00000000deadbeef 00000000deadbeef
> (XEN) 0000010000000000 ffffffff800033aa 000000000000e033 0000000000000246
> (XEN) ffff880fbb1e5f08 000000000000e02b 0000000000000000 0000000000000000
> (XEN) Xen call trace:
> (XEN) [<ffff82c480126100>] active_timer+0xc/0x37
> (XEN) [<ffff82c480126ef9>] set_timer+0x102/0x218
> (XEN) [<ffff82c480117906>] csched_tick_resume+0x53/0x75
> (XEN) [<ffff82c4801214fa>] schedule_cpu_switch+0x1f1/0x25c
> (XEN) [<ffff82c4801019ff>] cpupool_assign_cpu_locked+0x61/0xd6
> (XEN) [<ffff82c480101bd9>] cpupool_assign_cpu_helper+0x9f/0xcd
> (XEN) [<ffff82c480104f21>] continue_hypercall_tasklet_handler+0x51/0xc3
> (XEN) [<ffff82c480125b6c>] do_tasklet+0xe1/0x155
> (XEN) [<ffff82c48015645a>] idle_loop+0x5f/0x67
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 9:
> (XEN) Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
> (XEN) ****************************************
>
>
> Juergen
>


-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
