Re: Hypervisor crash(!) on xl cpupool-numa-split

From: Juergen Gross <juergen.gross@ts.fujitsu.com>
To: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Andre Przywara <andre.przywara@amd.com>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	"Diestelhorst, Stephan" <Stephan.Diestelhorst@amd.com>
Subject: Re: Hypervisor crash(!) on xl cpupool-numa-split
Date: Wed, 16 Feb 2011 10:47:23 +0100
Message-ID: <4D5B9D2B.107@ts.fujitsu.com>
In-Reply-To: <4D5A29C0.4050702@ts.fujitsu.com>

Okay, I have some more data.

I activated cpupool_dprintk() and added checks in sched_credit.c to
test for weight inconsistencies. To reduce the chance of races I've also
applied my patch that always executes cpu assignment/unassignment in a
tasklet on the cpu being moved.

Here is the result:

(XEN) cpupool_unassign_cpu(pool=0,cpu=6)
(XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16
(XEN) cpupool_unassign_cpu(pool=0,cpu=6)
(XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16
(XEN) cpupool_assign_cpu(pool=0,cpu=1)
(XEN) cpupool_assign_cpu(pool=0,cpu=1) ffff83083fff74c0
(XEN) cpupool_assign_cpu(cpu=1) ret 0
(XEN) cpupool_assign_cpu(pool=1,cpu=4)
(XEN) cpupool_assign_cpu(pool=1,cpu=4) ffff831002ad5e40
(XEN) cpupool_assign_cpu(cpu=4) ret 0
(XEN) cpu 4, weight 0,prv ffff831002ad5e40, dom 0:
(XEN) sdom->weight: 256, sdom->active_vcpu_count: 1
(XEN) Xen BUG at sched_credit.c:570
(XEN) ----[ Xen-4.1.0-rc5-pre  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    4
(XEN) RIP:    e008:[<ffff82c4801197d7>] csched_tick+0x186/0x37f
(XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff830839d3ec30   rcx: 0000000000000000
(XEN) rdx: ffff830839dcff18   rsi: 000000000000000a   rdi: ffff82c4802542e8
(XEN) rbp: ffff830839dcfe38   rsp: ffff830839dcfde8   r8:  0000000000000004
(XEN) r9:  ffff82c480213520   r10: 00000000fffffffc   r11: 0000000000000001
(XEN) r12: 0000000000000004   r13: ffff830839d3ec40   r14: ffff831002ad5e40
(XEN) r15: ffff830839d66f90   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000001020a98000   cr2: 00007fc5e9b79d98
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff830839dcfde8:
(XEN)    ffff83083ffa3ba0 ffff831002ad5e40 0000000000000246 ffff830839d6c000
(XEN)    0000000000000000 ffff830839dd1100 0000000000000004 ffff82c480119651
(XEN)    ffff831002b28018 ffff831002b28010 ffff830839dcfe68 ffff82c480126204
(XEN)    0000000000000002 ffff83083ffa3bb8 ffff830839dd1100 000000cae439ea7e
(XEN)    ffff830839dcfeb8 ffff82c480126539 00007fc5e9fa5b20 ffff830839dd1100
(XEN)    ffff831002b28010 0000000000000004 0000000000000004 ffff82c4802b0880
(XEN)    ffff830839dcff18 ffffffffffffffff ffff830839dcfef8 ffff82c480123647
(XEN)    ffff830839dcfed8 ffff830077eee000 00007fc5e9b79d98 00007fc5e9fa5b20
(XEN)    0000000000000002 00007fff46826f20 ffff830839dcff08 ffff82c4801236c2
(XEN)    00007cf7c62300c7 ffff82c480206ad6 00007fff46826f20 0000000000000002
(XEN)    00007fc5e9fa5b20 00007fc5e9b79d98 00007fff46827260 00007fff46826f50
(XEN)    0000000000000246 0000000000000032 0000000000000000 00000000ffffffff
(XEN)    0000000000000009 00007fc5e9d9de1a 0000000000000003 0000000000004848
(XEN)    00007fc5e9b7a000 0000010000000000 ffffffff800073f0 000000000000e033
(XEN)    0000000000000246 ffff880f97b51fc8 000000000000e02b 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000004
(XEN)    ffff830077eee000 00000043b9afd180 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c4801197d7>] csched_tick+0x186/0x37f
(XEN)    [<ffff82c480126204>] execute_timer+0x4e/0x6c
(XEN)    [<ffff82c480126539>] timer_softirq_action+0xf6/0x239
(XEN)    [<ffff82c480123647>] __do_softirq+0x88/0x99
(XEN)    [<ffff82c4801236c2>] do_softirq+0x6a/0x7a
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 4:
(XEN) Xen BUG at sched_credit.c:570
(XEN) ****************************************

As you can see, a Dom0 vcpu is becoming active on a pool 1 cpu. The BUG_ON
triggered in csched_acct() is a logical consequence of this.
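
A minimal sketch of why the weight check must fail in this situation. This
is a simplified Python model, not hypervisor code: the names Domain and
weight_violations are hypothetical, and the only things taken from the log
are the pool weight (0), sdom->weight (256) and sdom->active_vcpu_count (1).

```python
from dataclasses import dataclass


@dataclass
class Domain:
    domid: int
    weight: int             # sdom->weight in the log
    active_vcpu_count: int  # sdom->active_vcpu_count in the log


def weight_violations(pool_weight, domains):
    """Return the domains whose weight share exceeds what the pool has left.

    Models the condition (weight * active_vcpu_count) > weight_left, where
    weight_left starts at the pool's total weight (hypothetical helper).
    """
    weight_left = pool_weight
    bad = []
    for dom in domains:
        share = dom.weight * dom.active_vcpu_count
        if share > weight_left:
            bad.append(dom)       # the hypervisor would hit the BUG here
        else:
            weight_left -= share  # consume this domain's share
    return bad


# Values from the log: pool 1 has total weight 0, Dom0 has weight 256
# and one active vcpu on cpu 4.
dom0 = Domain(domid=0, weight=256, active_vcpu_count=1)
print(len(weight_violations(0, [dom0])), "violation(s) in pool 1")
```

In this model the check passes as soon as the pool's total weight includes
the domain (for example weight_violations(256, [dom0]) is empty), so the
failure comes down to a vcpu being accounted in a pool its domain was never
added to.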

I don't know yet how this can happen.
Does anyone have an idea? I'll keep searching...
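
For anyone who wants to scan longer logs for the same pattern, here is a
rough Python sketch. It assumes that the pointer printed after
cpupool_assign_cpu(pool=..,cpu=..) is the same scheduler-private pointer
that the check prints as "prv" (the two values match in the log above), and
that Dom0 lives in pool 0. The function name find_foreign_vcpus and the
home_pool mapping are hypothetical.

```python
import re

# Three lines copied from the log above.
LOG = """\
(XEN) cpupool_assign_cpu(pool=0,cpu=1) ffff83083fff74c0
(XEN) cpupool_assign_cpu(pool=1,cpu=4) ffff831002ad5e40
(XEN) cpu 4, weight 0,prv ffff831002ad5e40, dom 0:
"""

ASSIGN = re.compile(r"cpupool_assign_cpu\(pool=(\d+),cpu=(\d+)\) ([0-9a-f]{16})")
CHECK = re.compile(r"cpu (\d+), weight (\d+),prv ([0-9a-f]{16}), dom (\d+):")


def find_foreign_vcpus(log, home_pool):
    """Return (cpu, domid, pool) for vcpus seen on a cpu of a foreign pool.

    home_pool maps domid -> pool id the domain is assumed to belong to.
    """
    prv_to_pool = {}
    hits = []
    for line in log.splitlines():
        m = ASSIGN.search(line)
        if m:
            # Remember which pool the private pointer belongs to.
            prv_to_pool[m.group(3)] = int(m.group(1))
            continue
        m = CHECK.search(line)
        if m:
            cpu, prv, dom = int(m.group(1)), m.group(3), int(m.group(4))
            pool = prv_to_pool.get(prv)
            if pool is not None and dom in home_pool and pool != home_pool[dom]:
                hits.append((cpu, dom, pool))
    return hits


# Assumption: Dom0 (domid 0) belongs to pool 0.
for cpu, dom, pool in find_foreign_vcpus(LOG, {0: 0}):
    print(f"dom {dom} has an active vcpu on cpu {cpu}, which is in pool {pool}")
```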


Juergen

On 02/15/11 08:22, Juergen Gross wrote:
> On 02/14/11 18:57, George Dunlap wrote:
>> The good news is, I've managed to reproduce this on my local test
>> hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the
>> attached script. It's time to go home now, but I should be able to
>> dig something up tomorrow.
>>
>> To use the script:
>> * Rename cpupool0 to "p0", and create an empty second pool, "p1"
>> * You can modify elements by adding "arg=val" as arguments.
>> * Arguments are:
>> + dryrun={true,false} Do the work, but don't actually execute any xl
>> commands. Default false.
>> + left: Number of commands to execute. Default 10.
>> + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is
>> 8 cpus).
>> + verbose={true,false} Print what you're doing. Default is true.
>>
>> The script sometimes attempts to remove the last cpu from cpupool0; in
>> this case, libxl will print an error. If the script gets an error
>> under that condition, it will ignore it; under any other condition, it
>> will print diagnostic information.
>>
>> What finally crashed it for me was this command:
>> # ./cpupool-test.sh verbose=false left=1000
>
> Nice!
> With your script I finally managed to get the error, too. On my box (2
> sockets with 6 cores each) I had to use
>
> ./cpupool-test.sh verbose=false left=10000 maxcpus=11
>
> to trigger it.
> Looking for more data now...
>
>
> Juergen
>
>>
>> -George
>>
>> On Fri, Feb 11, 2011 at 7:39 AM, Andre Przywara <andre.przywara@amd.com> wrote:
>>> Juergen Gross wrote:
>>>>
>>>> On 02/10/11 15:18, Andre Przywara wrote:
>>>>>
>>>>> Andre Przywara wrote:
>>>>>>
>>>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote:
>>>>>>>
>>>>>>> On 02/09/11 15:21, Juergen Gross wrote:
>>>>>>>>
>>>>>>>> Andre, George,
>>>>>>>>
>>>>>>>>
>>>>>>>> What seems to be interesting: I think the problem always occurred
>>>>>>>> when a new cpupool was created and the first cpu was moved to it.
>>>>>>>>
>>>>>>>> I think my previous assumption regarding the master_ticker was not
>>>>>>>> too bad. I think somehow the master_ticker of the new cpupool is
>>>>>>>> becoming active before the scheduler is really initialized properly.
>>>>>>>> This could happen if enough time is spent between alloc_pdata for
>>>>>>>> the cpu to be moved and the critical section in
>>>>>>>> schedule_cpu_switch().
>>>>>>>>
>>>>>>>> The solution should be to activate the timers only if the scheduler
>>>>>>>> is ready for them.
>>>>>>>>
>>>>>>>> George, do you think the master_ticker should be stopped in
>>>>>>>> suspend_ticker as well? I still see potential problems for entering
>>>>>>>> deep C-States. I think I'll prepare a patch which will keep the
>>>>>>>> master_ticker active for the C-State case and migrate it for the
>>>>>>>> schedule_cpu_switch() case.
>>>>>>>
>>>>>>> Okay, here is a patch for this. It ran on my 4-core machine without
>>>>>>> any problems.
>>>>>>> Andre, could you give it a try?
>>>>>>
>>>>>> Did, but unfortunately it crashed as always. Tried twice and made
>>>>>> sure I booted the right kernel. Sorry.
>>>>>> The idea of a race between the timer and the state change sounded
>>>>>> very appealing; actually, that was suspicious to me from the
>>>>>> beginning.
>>>>>>
>>>>>> I will add some code to dump the state of all cpupools to the BUG_ON
>>>>>> to see in which situation we are when the bug triggers.
>>>>>
>>>>> OK, here is a first try at this. The patch iterates over all CPU pools
>>>>> and outputs some data if the
>>>>> BUG_ON((sdom->weight * sdom->active_vcpu_count) > weight_left)
>>>>> condition triggers:
>>>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: fffffffc003f
>>>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0
>>>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000
>>>>> (XEN) Xen BUG at sched_credit.c:1010
>>>>> ....
>>>>> The masks look proper (6 cores per node), the bug triggers when the
>>>>> first CPU is about to be(?) inserted.
>>>>
>>>> Sure? I'm missing the cpu with mask 2000.
>>>> I'll try to reproduce the problem on a larger machine here (24 cores,
>>>> 4 numa nodes).
>>>> Andre, can you give me your xen boot parameters? Which xen changeset
>>>> are you running, and do you have any additional patches in use?
>>>
>>> The grub lines:
>>> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200
>>> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0
>>> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0
>>>
>>> All of my experiments use c/s 22858 as a base.
>>> If you use an AMD Magny-Cours box for your experiments (socket C32 or
>>> G34), you should add the following patch (removing the line):
>>> --- a/xen/arch/x86/traps.c
>>> +++ b/xen/arch/x86/traps.c
>>> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs)
>>> __clear_bit(X86_FEATURE_SKINIT % 32,&c);
>>> __clear_bit(X86_FEATURE_WDT % 32,&c);
>>> __clear_bit(X86_FEATURE_LWP % 32,&c);
>>> - __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c);
>>> __clear_bit(X86_FEATURE_TOPOEXT % 32,&c);
>>> break;
>>> case 5: /* MONITOR/MWAIT */
>>>
>>> This is not necessary (in fact it reverts my patch, c/s 22815), but it
>>> raises the probability of triggering the bug, probably because it
>>> increases the pressure on the Dom0 scheduler. If you cannot trigger it
>>> with Dom0, try to create a guest with many VCPUs and squeeze it into a
>>> small CPU-pool.
>>>
>>> Good luck ;-)
>>> Andre.
>>>
>>> --
>>> Andre Przywara
>>> AMD-OSRC (Dresden)
>>> Tel: x29712
>>>
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xensource.com
>>> http://lists.xensource.com/xen-devel
>
>
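
As a side note on the pool masks in Andre's quoted dump (fffffffc003f, fc0,
1000): they can be decoded into cpu ranges with a few lines of Python. This
is only a reading aid, assuming the masks are plain hexadecimal bitmaps with
bit N standing for cpu N. The helper names cpus_in_mask and to_ranges are
hypothetical.

```python
def cpus_in_mask(mask_hex):
    """Return the sorted list of cpu numbers set in a hex cpumask string."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def to_ranges(cpus):
    """Compress a sorted cpu list into inclusive (first, last) ranges."""
    ranges = []
    for cpu in cpus:
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu         # extend the current range
        else:
            ranges.append([cpu, cpu])   # start a new range
    return [(first, last) for first, last in ranges]


# Masks exactly as printed in the quoted dump.
POOL_MASKS = {0: "fffffffc003f", 1: "fc0", 2: "1000"}

assigned = set()
for pool, mask_hex in POOL_MASKS.items():
    cpus = cpus_in_mask(mask_hex)
    assigned.update(cpus)
    print(f"pool {pool}: {to_ranges(cpus)}")

# Cpus below the highest assigned cpu that appear in no pool mask.
unassigned = [cpu for cpu in range(max(assigned) + 1) if cpu not in assigned]
print(f"in no pool: {to_ranges(unassigned)}")
```

The first cpu that appears in no pool is the one whose mask would be 2000.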


-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
