From: Andre Przywara <andre.przywara@amd.com>
To: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Juergen Gross <juergen.gross@ts.fujitsu.com>,
"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
"Diestelhorst, Stephan" <Stephan.Diestelhorst@amd.com>
Subject: Re: Hypervisor crash(!) on xl cpupool-numa-split
Date: Mon, 21 Feb 2011 11:00:38 +0100 [thread overview]
Message-ID: <4D6237C6.1050206@amd.com> (raw)
In-Reply-To: <AANLkTin+rE1=+vpmTg9xeQdYn7_hucSFkrz1qCtiKfkY@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 12198 bytes --]
George Dunlap wrote:
> Andre (and Juergen), can you try again with the attached patch?
I applied this patch on top of 22931 and it did _not_ work.
The crash occurred almost immediately after I started my script, so the
same behaviour as without the patch.
(My script is attached for reference, though it will most likely only make
sense on bigger NUMA machines.)
Regards,
Andre.
> What the patch basically does is try to make "cpu_disable_scheduler()"
> do what it seems to say it does. :-) Namely, the various
> scheduler-related interrupts (both per-cpu ticks and the master tick)
> are part of the scheduler, so disable them before doing anything, and
> don't enable them until the cpu is really ready to go again.
>
> To be precise:
> * cpu_disable_scheduler() disables ticks
> * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool,
> and does it after inserting the idle vcpu
> * Modify semantics, s.t., {alloc,free}_pdata() don't actually start or
> stop tickers
> + Call tick_{resume,suspend} in cpu_{up,down}, respectively
> * Modify credit1's tick_{suspend,resume} to handle the master ticker as well.
>
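To make that ordering concrete for anyone following along, here is a toy
model of what the patch is aiming for. This is purely illustrative: the
function names are borrowed from George's description above, but the
bodies, the flags and the standalone framing are invented and are not the
actual Xen code or patch.

/* Toy model: the (per-cpu and master) ticker must never be active while
 * the cpu's scheduler state is in flux. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static bool tick_active;   /* per-cpu (and master) ticker running?       */
static bool cpu_ready;     /* idle vcpu inserted, pool state consistent? */

static void csched_tick(void)          /* stand-in for the timer handler */
{
    assert(cpu_ready);     /* firing on a half-initialised pool == BUG   */
}

static void cpu_disable_scheduler(void)
{
    tick_active = false;   /* 1. stop the ticks before doing anything    */
    cpu_ready = false;     /* 2. only now may the pool state be changed  */
}

static void schedule_cpu_switch(bool adding_to_pool)
{
    cpu_ready = true;      /* 3. idle vcpu inserted, state is consistent */
    if (adding_to_pool)
        tick_active = true; /* 4. only now are the tickers re-enabled    */
}

int main(void)
{
    cpu_disable_scheduler();
    schedule_cpu_switch(true);
    if (tick_active)
        csched_tick();     /* safe: cpu_ready is already true            */
    puts("ordering ok");
    return 0;
}

The point of the ordering is simply that step 4 can never precede step 3,
so a tick handler can no longer observe the window that the old code left
open between {alloc,free}_pdata() and the end of schedule_cpu_switch().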
> With this patch (if dom0 doesn't get wedged due to all 8 vcpus being
> on one pcpu), I can perform thousands of operations successfully.
>
> (NB this is not ready for application yet, I just wanted to check to
> see if it fixes Andre's problem)
>
> -George
>
> On Wed, Feb 16, 2011 at 9:47 AM, Juergen Gross
> <juergen.gross@ts.fujitsu.com> wrote:
>> Okay, I have some more data.
>>
>> I activated cpupool_dprintk() and included checks in sched_credit.c to
>> test for weight inconsistencies. To reduce race possibilities I've added
>> my patch that always executes cpu assignment/unassignment in a tasklet on
>> the cpu to be moved.
>>
>> Here is the result:
>>
>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6)
>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16
>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6)
>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16
>> (XEN) cpupool_assign_cpu(pool=0,cpu=1)
>> (XEN) cpupool_assign_cpu(pool=0,cpu=1) ffff83083fff74c0
>> (XEN) cpupool_assign_cpu(cpu=1) ret 0
>> (XEN) cpupool_assign_cpu(pool=1,cpu=4)
>> (XEN) cpupool_assign_cpu(pool=1,cpu=4) ffff831002ad5e40
>> (XEN) cpupool_assign_cpu(cpu=4) ret 0
>> (XEN) cpu 4, weight 0,prv ffff831002ad5e40, dom 0:
>> (XEN) sdom->weight: 256, sdom->active_vcpu_count: 1
>> (XEN) Xen BUG at sched_credit.c:570
>> (XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]----
>> (XEN) CPU: 4
>> (XEN) RIP: e008:[<ffff82c4801197d7>] csched_tick+0x186/0x37f
>> (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor
>> (XEN) rax: 0000000000000000 rbx: ffff830839d3ec30 rcx: 0000000000000000
>> (XEN) rdx: ffff830839dcff18 rsi: 000000000000000a rdi: ffff82c4802542e8
>> (XEN) rbp: ffff830839dcfe38 rsp: ffff830839dcfde8 r8: 0000000000000004
>> (XEN) r9: ffff82c480213520 r10: 00000000fffffffc r11: 0000000000000001
>> (XEN) r12: 0000000000000004 r13: ffff830839d3ec40 r14: ffff831002ad5e40
>> (XEN) r15: ffff830839d66f90 cr0: 000000008005003b cr4: 00000000000026f0
>> (XEN) cr3: 0000001020a98000 cr2: 00007fc5e9b79d98
>> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
>> (XEN) Xen stack trace from rsp=ffff830839dcfde8:
>> (XEN) ffff83083ffa3ba0 ffff831002ad5e40 0000000000000246 ffff830839d6c000
>> (XEN) 0000000000000000 ffff830839dd1100 0000000000000004 ffff82c480119651
>> (XEN) ffff831002b28018 ffff831002b28010 ffff830839dcfe68 ffff82c480126204
>> (XEN) 0000000000000002 ffff83083ffa3bb8 ffff830839dd1100 000000cae439ea7e
>> (XEN) ffff830839dcfeb8 ffff82c480126539 00007fc5e9fa5b20 ffff830839dd1100
>> (XEN) ffff831002b28010 0000000000000004 0000000000000004 ffff82c4802b0880
>> (XEN) ffff830839dcff18 ffffffffffffffff ffff830839dcfef8 ffff82c480123647
>> (XEN) ffff830839dcfed8 ffff830077eee000 00007fc5e9b79d98 00007fc5e9fa5b20
>> (XEN) 0000000000000002 00007fff46826f20 ffff830839dcff08 ffff82c4801236c2
>> (XEN) 00007cf7c62300c7 ffff82c480206ad6 00007fff46826f20 0000000000000002
>> (XEN) 00007fc5e9fa5b20 00007fc5e9b79d98 00007fff46827260 00007fff46826f50
>> (XEN) 0000000000000246 0000000000000032 0000000000000000 00000000ffffffff
>> (XEN) 0000000000000009 00007fc5e9d9de1a 0000000000000003 0000000000004848
>> (XEN) 00007fc5e9b7a000 0000010000000000 ffffffff800073f0 000000000000e033
>> (XEN) 0000000000000246 ffff880f97b51fc8 000000000000e02b 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000004
>> (XEN) ffff830077eee000 00000043b9afd180 0000000000000000
>> (XEN) Xen call trace:
>> (XEN) [<ffff82c4801197d7>] csched_tick+0x186/0x37f
>> (XEN) [<ffff82c480126204>] execute_timer+0x4e/0x6c
>> (XEN) [<ffff82c480126539>] timer_softirq_action+0xf6/0x239
>> (XEN) [<ffff82c480123647>] __do_softirq+0x88/0x99
>> (XEN) [<ffff82c4801236c2>] do_softirq+0x6a/0x7a
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 4:
>> (XEN) Xen BUG at sched_credit.c:570
>> (XEN) ****************************************
>>
>> As you can see, a Dom0 vcpu is becoming active on a pool 1 cpu. The BUG_ON
>> triggered in csched_acct() is a logical result of this.
>>
>> How this can happen I don't know yet.
>> Anyone any idea? I'll keep searching...
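For readers not familiar with the credit scheduler's accounting: the check
in question is of the form (sdom->weight * sdom->active_vcpu_count) >
weight_left, as quoted further down in this thread, and the dump above
shows exactly that situation (pool weight 0, but a domain with weight 256
and one active vcpu). The small program below only models that invariant
-- it is not the Xen source -- to show why a vcpu whose domain was never
accounted in the new pool's weight sum trips the BUG_ON.

/* Toy model of the per-pool weight accounting invariant (illustrative). */
#include <assert.h>
#include <stdio.h>

struct sdom { int weight; int active_vcpu_count; };

int main(void)
{
    /* Pool 1 was just created: nothing has been accounted on it yet...  */
    int prv_weight = 0;

    /* ...yet a Dom0 vcpu (weight 256) shows up active on a pool-1 cpu.  */
    struct sdom dom0 = { 256, 1 };

    int weight_left = prv_weight;
    /* The same kind of check the accounting code performs per domain:   */
    assert(dom0.weight * dom0.active_vcpu_count <= weight_left); /* boom */
    weight_left -= dom0.weight * dom0.active_vcpu_count;
    printf("weight_left = %d\n", weight_left);
    return 0;
}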
>>
>>
>> Juergen
>>
>> On 02/15/11 08:22, Juergen Gross wrote:
>>> On 02/14/11 18:57, George Dunlap wrote:
>>>> The good news is, I've managed to reproduce this on my local test
>>>> hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the
>>>> attached script. It's time to go home now, but I should be able to
>>>> dig something up tomorrow.
>>>>
>>>> To use the script:
>>>> * Rename cpupool0 to "p0", and create an empty second pool, "p1"
>>>> * You can modify elements by adding "arg=val" as arguments.
>>>> * Arguments are:
>>>> + dryrun={true,false} Do the work, but don't actually execute any xl
>>>> commands. Default false.
>>>> + left: Number of commands to execute. Default 10.
>>>> + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is
>>>> 8 cpus).
>>>> + verbose={true,false} Print what you're doing. Default is true.
>>>>
>>>> The script sometimes attempts to remove the last cpu from cpupool0; in
>>>> this case, libxl will print an error. If the script gets an error
>>>> under that condition, it will ignore it; under any other condition, it
>>>> will print diagnostic information.
>>>>
>>>> What finally crashed it for me was this command:
>>>> # ./cpupool-test.sh verbose=false left=1000
>>> Nice!
>>> With your script I finally managed to get the error, too. On my box
>>> (2 sockets with 6 cores each) I had to use
>>>
>>> ./cpupool-test.sh verbose=false left=10000 maxcpus=11
>>>
>>> to trigger it.
>>> Looking for more data now...
>>>
>>>
>>> Juergen
>>>
>>>> -George
>>>>
>>>> On Fri, Feb 11, 2011 at 7:39 AM, Andre Przywara
>>>> <andre.przywara@amd.com> wrote:
>>>>> Juergen Gross wrote:
>>>>>> On 02/10/11 15:18, Andre Przywara wrote:
>>>>>>> Andre Przywara wrote:
>>>>>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote:
>>>>>>>>> On 02/09/11 15:21, Juergen Gross wrote:
>>>>>>>>>> Andre, George,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> What seems to be interesting: I think the problem always
>>>>>>>>>> occurred when a new cpupool was created and the first cpu was
>>>>>>>>>> moved to it.
>>>>>>>>>>
>>>>>>>>>> I think my previous assumption regarding the master_ticker was
>>>>>>>>>> not too bad. Somehow the master_ticker of the new cpupool seems
>>>>>>>>>> to become active before the scheduler is really initialized
>>>>>>>>>> properly. This could happen if enough time is spent between
>>>>>>>>>> alloc_pdata for the cpu to be moved and the critical section in
>>>>>>>>>> schedule_cpu_switch().
>>>>>>>>>>
>>>>>>>>>> The solution should be to activate the timers only if the
>>>>>>>>>> scheduler is ready for them.
>>>>>>>>>>
>>>>>>>>>> George, do you think the master_ticker should be stopped in
>>>>>>>>>> suspend_ticker as well? I still see potential problems for
>>>>>>>>>> entering deep C-States. I think I'll prepare a patch which will
>>>>>>>>>> keep the master_ticker active for the C-State case and migrate
>>>>>>>>>> it for the schedule_cpu_switch() case.
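To spell the suspected window out (the timing here is purely hypothetical,
reconstructed only from the description above):

    cpu being moved                        new pool's master ticker
    ---------------                        ------------------------
    alloc_pdata() for the new pool         (ticker already armed)
        ... time passes ...                ticker fires
        ... time passes ...                -> csched_tick()/csched_acct()
        ... time passes ...                   see half-initialised state
    critical section in
    schedule_cpu_switch()                  (too late, BUG already hit)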
>>>>>>>>> Okay, here is a patch for this. It ran on my 4-core machine
>>>>>>>>> without any
>>>>>>>>> problems.
>>>>>>>>> Andre, could you give it a try?
>>>>>>>> Did, but unfortunately it crashed as always. Tried twice and made
>>>>>>>> sure I booted the right kernel. Sorry.
>>>>>>>> The idea of a race between the timer and the state change sounded
>>>>>>>> very appealing; that was actually suspicious to me from the
>>>>>>>> beginning.
>>>>>>>>
>>>>>>>> I will add some code to the BUG_ON path to dump the state of all
>>>>>>>> cpupools, to see in which situation we are when the bug triggers.
>>>>>>> OK, here is a first try of this. The patch iterates over all CPU pools
>>>>>>> and outputs some data if the BUG_ON condition
>>>>>>> ((sdom->weight * sdom->active_vcpu_count) > weight_left) triggers:
>>>>>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: fffffffc003f
>>>>>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0
>>>>>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000
>>>>>>> (XEN) Xen BUG at sched_credit.c:1010
>>>>>>> ....
>>>>>>> The masks look proper (6 cores per node), the bug triggers when the
>>>>>>> first CPU is about to be(?) inserted.
>>>>>> Sure? I'm missing the cpu with mask 2000.
>>>>>> I'll try to reproduce the problem on a larger machine here (24 cores,
>>>>>> 4 NUMA nodes).
>>>>>> Andre, can you give me your xen boot parameters? Which xen changeset
>>>>>> are you running, and do you have any additional patches in use?
>>>>> The grub lines:
>>>>> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200
>>>>> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0
>>>>> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0
>>>>>
>>>>> All of my experiments use c/s 22858 as a base.
>>>>> If you use an AMD Magny-Cours box for your experiments (socket C32 or
>>>>> G34), you should add the following patch (removing the line):
>>>>> --- a/xen/arch/x86/traps.c
>>>>> +++ b/xen/arch/x86/traps.c
>>>>> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs)
>>>>> __clear_bit(X86_FEATURE_SKINIT % 32,&c);
>>>>> __clear_bit(X86_FEATURE_WDT % 32,&c);
>>>>> __clear_bit(X86_FEATURE_LWP % 32,&c);
>>>>> - __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c);
>>>>> __clear_bit(X86_FEATURE_TOPOEXT % 32,&c);
>>>>> break;
>>>>> case 5: /* MONITOR/MWAIT */
>>>>>
>>>>> This is not necessary (in fact it reverts my patch c/s 22815), but it
>>>>> raises the probability of triggering the bug, probably because it
>>>>> increases the pressure on the Dom0 scheduler. If you cannot trigger it
>>>>> with Dom0, try to create a guest with many VCPUs and squeeze it into a
>>>>> small CPU pool.
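To make that last hint concrete: a guest config with something like

    vcpus = 8
    pool = "Pool-node1"

(where Pool-node1 holds only one or two cpus) started with "xl create"
should give a similar overcommit. The vcpu count and pool name are just
examples, and I am assuming this xl build already honours a pool= setting
in the guest config; if it does not, moving the guest into the small pool
after creation (xl cpupool-migrate, if this build has it) achieves the
same.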
>>>>>
>>>>> Good luck ;-)
>>>>> Andre.
>>>>>
>>>>> --
>>>>> Andre Przywara
>>>>> AMD-OSRC (Dresden)
>>>>> Tel: x29712
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Xen-devel mailing list
>>>>> Xen-devel@lists.xensource.com
>>>>> http://lists.xensource.com/xen-devel
>>>>>
>>>
>>
>> --
>> Juergen Gross                  Principal Developer Operating Systems
>> TSP ES&S SWE OS6               Telephone: +49 (0) 89 3222 2967
>> Fujitsu Technology Solutions   e-mail: juergen.gross@ts.fujitsu.com
>> Domagkstr. 28                  Internet: ts.fujitsu.com
>> D-80807 Muenchen               Company details: ts.fujitsu.com/imprint.html
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel
>>
--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
[-- Attachment #2: numasplit.sh --]
[-- Type: text/plain, Size: 1778 bytes --]
#!/bin/sh
# Split the host into one cpupool per NUMA node ("create"/"create2"),
# or undo/shrink/grow the setup ("revert", "remove", "add").

XL=./ldxl
ROOTPOOL=Pool-0
NUMAPREFIX=Pool-node

# Parse the node and cpu counts out of "xl info".
numnodes=`xl info | sed -e 's/^nr_nodes *: \([0-9]*\)/\1/;t;d'`
numcores=`xl info | sed -e 's/^nr_cpus *: \([0-9]*\)/\1/;t;d'`

# Fall back to the xl from $PATH if the local wrapper is not there.
if [ ! -x ${XL} ]
then
    XL=xl
fi

if [ $# -gt 0 ]
then
    action=$1
else
    action=create
fi

if [ "$action" = "create" ]
then
    $XL cpupool-rename $ROOTPOOL ${NUMAPREFIX}0
    for i in `seq 1 $((numnodes-1))`
    do
        echo "Removing CPUs from Pool 0"
        $XL cpupool-cpu-remove ${NUMAPREFIX}0 node:$i
        echo "Rewriting config file"
        sed -i -e "s/${NUMAPREFIX}./${NUMAPREFIX}${i}/" cpupool.test
        echo "Creating new pool"
        $XL cpupool-create cpupool.test
        echo "Populating new pool"
        $XL cpupool-cpu-add ${NUMAPREFIX}${i} node:$i
    done
elif [ "$action" = "create2" ]
then
    $XL cpupool-rename $ROOTPOOL ${NUMAPREFIX}0
    echo "Removing CPUs from Pool 0"
    for i in `seq 1 $((numnodes-1))`
    do
        $XL cpupool-cpu-remove ${NUMAPREFIX}0 node:$i
    done
    for i in `seq 1 $((numnodes-1))`
    do
        echo "Rewriting config file"
        sed -i -e "s/${NUMAPREFIX}./${NUMAPREFIX}${i}/" cpupool.test
        echo "Creating new pool"
        $XL cpupool-create cpupool.test
        echo "Populating new pool"
        $XL cpupool-cpu-add ${NUMAPREFIX}${i} node:$i
    done
elif [ "$action" = "revert" ]
then
    for i in `seq 1 $((numnodes-1))`
    do
        echo "Destroying Pool $i"
        $XL cpupool-destroy ${NUMAPREFIX}${i}
        echo "Adding freed CPUs to pool 0"
        $XL cpupool-cpu-add ${NUMAPREFIX}0 node:$i
    done
    $XL cpupool-rename ${NUMAPREFIX}0 $ROOTPOOL
elif [ "$action" = "remove" ]
then
    for i in `seq 1 $((numcores-1))`
    do
        echo "Removing CPU $i from Pool-0"
        $XL cpupool-cpu-remove $ROOTPOOL $i
    done
elif [ "$action" = "add" ]
then
    for i in `seq 1 $((numcores-1))`
    do
        echo "Adding CPU $i to Pool-0"
        $XL cpupool-cpu-add $ROOTPOOL $i
    done
fi
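For reference, the script takes one of the actions it checks for as its
only argument and defaults to "create":

# ./numasplit.sh create    (or create2, revert, remove, add)

Note that the "create" paths rewrite a cpupool.test config file in place,
so that file must already exist in the working directory and contain a
pool name of the form Pool-node<N>.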
Thread overview (53+ messages):
2011-01-27 23:18 Hypervisor crash(!) on xl cpupool-numa-split Andre Przywara
2011-01-28 6:47 ` Juergen Gross
2011-01-28 11:07 ` Andre Przywara
2011-01-28 11:44 ` Juergen Gross
2011-01-28 13:14 ` Andre Przywara
2011-01-31 7:04 ` Juergen Gross
2011-01-31 14:59 ` Andre Przywara
2011-01-31 15:28 ` George Dunlap
2011-02-01 16:32 ` Andre Przywara
2011-02-02 6:27 ` Juergen Gross
2011-02-02 8:49 ` Juergen Gross
2011-02-02 10:05 ` Juergen Gross
2011-02-02 10:59 ` Andre Przywara
2011-02-02 14:39 ` Stephan Diestelhorst
2011-02-02 15:14 ` Juergen Gross
2011-02-02 16:01 ` Stephan Diestelhorst
2011-02-03 5:57 ` Juergen Gross
2011-02-03 9:18 ` Juergen Gross
2011-02-04 14:09 ` Andre Przywara
2011-02-07 12:38 ` Andre Przywara
2011-02-07 13:32 ` Juergen Gross
2011-02-07 15:55 ` George Dunlap
2011-02-08 5:43 ` Juergen Gross
2011-02-08 12:08 ` George Dunlap
2011-02-08 12:14 ` George Dunlap
2011-02-08 16:33 ` Andre Przywara
2011-02-09 12:27 ` George Dunlap
2011-02-09 12:27 ` George Dunlap
2011-02-09 13:04 ` Juergen Gross
2011-02-09 13:39 ` Andre Przywara
2011-02-09 13:51 ` Andre Przywara
2011-02-09 14:21 ` Juergen Gross
2011-02-10 6:42 ` Juergen Gross
2011-02-10 9:25 ` Andre Przywara
2011-02-10 14:18 ` Andre Przywara
2011-02-11 6:17 ` Juergen Gross
2011-02-11 7:39 ` Andre Przywara
2011-02-14 17:57 ` George Dunlap
2011-02-15 7:22 ` Juergen Gross
2011-02-16 9:47 ` Juergen Gross
2011-02-16 13:54 ` George Dunlap
[not found] ` <4D6237C6.1050206@amd.com>
2011-02-16 14:11 ` Juergen Gross
2011-02-16 14:28 ` Juergen Gross
2011-02-17 0:05 ` André Przywara
2011-02-17 7:05 ` Juergen Gross
2011-02-17 9:11 ` Juergen Gross
2011-02-21 10:00 ` Andre Przywara [this message]
2011-02-21 13:19 ` Juergen Gross
2011-02-21 14:45 ` Andre Przywara
2011-02-21 14:50 ` Juergen Gross
2011-02-08 12:23 ` Juergen Gross
2011-01-28 11:13 ` George Dunlap
2011-01-28 13:05 ` Andre Przywara