* windows 2008 guest causing rcu_shed to emit NMI
@ 2013-01-22 18:00 Andrey Korolyov
2013-01-24 0:52 ` Marcelo Tosatti
0 siblings, 1 reply; 12+ messages in thread
From: Andrey Korolyov @ 2013-01-22 18:00 UTC (permalink / raw)
To: kvm
[-- Attachment #1: Type: text/plain, Size: 595 bytes --]
Hi,
The problem described in the subject happens under heavy I/O pressure on the
host. Without idle=poll the trace is almost always the same and involves
mwait; with idle=poll and nohz=off the RIP varies from time to time (at the
previous hang it was in tg_throttle_down rather than in test_ti_thread_flag
as in the attached one). Both available clocksource drivers, hpet and tsc,
reproduce this with equal probability. The VMs are pinned to one of two NUMA
sets on a two-socket machine, meaning the emulator thread and each vCPU
thread has its own cpuset cgroup with '0-5,12-17' or '6-11,18-23'.
I'll appreciate any suggestions to try.
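For reference, the pinning scheme described above could be set up roughly
like this. This is only a sketch: the cgroup v1 mount point, the group name,
and the pgrep pattern are my assumptions, not taken from the report, and the
report actually uses one cpuset per thread rather than per VM.

```shell
# Sketch: a cpuset cgroup for one VM, pinned to one NUMA half of the cores.
# Paths and names are illustrative; adjust to the real cgroup layout.
CG=/sys/fs/cgroup/cpuset
mkdir -p "$CG/vm1"
echo "0-5,12-17" > "$CG/vm1/cpuset.cpus"   # first NUMA set from the report
echo 0           > "$CG/vm1/cpuset.mems"   # memory node matching those cores
# Move every thread of the qemu-kvm process (emulator + vCPUs) into the group.
QEMU_PID=$(pgrep -f 'kvm.*vm1' | head -n1)
for tid in /proc/"$QEMU_PID"/task/*; do
    basename "$tid" > "$CG/vm1/tasks"
done
```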
[-- Attachment #2: dmesg2.txt.gz --]
[-- Type: application/x-gzip, Size: 16610 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: windows 2008 guest causing rcu_shed to emit NMI
2013-01-22 18:00 windows 2008 guest causing rcu_shed to emit NMI Andrey Korolyov
@ 2013-01-24 0:52 ` Marcelo Tosatti
2013-01-24 10:54 ` Andrey Korolyov
0 siblings, 1 reply; 12+ messages in thread
From: Marcelo Tosatti @ 2013-01-24 0:52 UTC (permalink / raw)
To: Andrey Korolyov; +Cc: kvm
On Tue, Jan 22, 2013 at 09:00:25PM +0300, Andrey Korolyov wrote:
> Hi,
>
> problem described in the title happens on heavy I/O pressure on the
> host, without idle=poll trace almost always is the same, involving
> mwait, with poll and nohz=off RIP varies from time to time, at the
> previous hang it was tg_throttle_down, rather than test_ti_thread_flag
> in attached one. Both possible clocksource drivers, hpet and tsc, able
> to reproduce that with equal probability. VMs are pinned over one of
> two numa sets on two-head machine, mean emulator thread and each of
> vcpu threads has its own cpuset cg with '0-5,12-17' or '6-11,18-23'.
> I`ll appreciate any suggestions to try.
Andrey,
Can you reproduce with an upstream kernel? Commit
5cfc2aabcb282f fixes a livelock.
d2 75 c3 eb 03 41 89 c6 48 83 c4 18 44 89 f0 5b 5d 41 5c 41 5d 41 5e 41
5f c3 <31> c0 c3 48 63 ff 48 c7 c2 80 37 01 00 48 8b 0c fd e0 d6 68 81
[12738.508644] Call Trace:
[12738.508648] [<ffffffff81035a66>] ? walk_tg_tree_from+0x70/0x99
[12738.508652] [<ffffffff81014c03>] ? __switch_to_xtra+0x14c/0x160
[12738.508656] [<ffffffff8103bcce>] ? throttle_cfs_rq+0x4d/0x109
[12738.508660] [<ffffffff8103be70>] ? put_prev_task_fair+0x3f/0x65
[12738.508663] [<ffffffff8134c8ae>] ? __schedule+0x32e/0x5c3
[12738.508666] [<ffffffff8134ceee>] ? yield_to+0xfa/0x10c
[12738.508669] [<ffffffff8105d5af>] ? atomic_inc+0x3/0x4
[12738.508678] [<ffffffffa03a8fc4>] ? kvm_vcpu_on_spin+0x8c/0xf7 [kvm]
[12738.508684] [<ffffffffa030602f>] ? handle_pause+0x11/0x18
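For anyone wanting to verify whether the fix Marcelo mentions is already in
a given tree, a quick check against a kernel git checkout might look like
this (the checkout path is an assumption; the abbreviated hash is taken
from the mail):

```shell
# Does this tree contain the livelock fix?
cd /usr/src/linux            # assumption: your kernel git checkout
git log --oneline | grep -i 5cfc2aab
# Or list the release tags that already contain it:
git tag --contains 5cfc2aabcb282f
```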
* Re: windows 2008 guest causing rcu_shed to emit NMI
2013-01-24 0:52 ` Marcelo Tosatti
@ 2013-01-24 10:54 ` Andrey Korolyov
2013-01-24 12:20 ` Marcelo Tosatti
0 siblings, 1 reply; 12+ messages in thread
From: Andrey Korolyov @ 2013-01-24 10:54 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm
[-- Attachment #1: Type: text/plain, Size: 2076 bytes --]
Thank you Marcelo,
The host node now locks up somewhat later than yesterday, but the problem is
still here; please see the attached dmesg. The stuck process looks like
root 19251 0.0 0.0 228476 12488 ? D 14:42 0:00
/usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device
virtio-blk-pci,? -device
and it is the fourth VM by start order.
Should I try an upstream kernel instead of applying the patch to the latest
3.4, or would that be useless?
On Thu, Jan 24, 2013 at 4:52 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Tue, Jan 22, 2013 at 09:00:25PM +0300, Andrey Korolyov wrote:
>> Hi,
>>
>> problem described in the title happens on heavy I/O pressure on the
>> host, without idle=poll trace almost always is the same, involving
>> mwait, with poll and nohz=off RIP varies from time to time, at the
>> previous hang it was tg_throttle_down, rather than test_ti_thread_flag
>> in attached one. Both possible clocksource drivers, hpet and tsc, able
>> to reproduce that with equal probability. VMs are pinned over one of
>> two numa sets on two-head machine, mean emulator thread and each of
>> vcpu threads has its own cpuset cg with '0-5,12-17' or '6-11,18-23'.
>> I`ll appreciate any suggestions to try.
>
> Andrey,
>
> Can you reproduce with an upstream kernel? Commit
> 5cfc2aabcb282f fixes a livelock.
>
> d2 75 c3 eb 03 41 89 c6 48 83 c4 18 44 89 f0 5b 5d 41 5c 41 5d 41 5e 41
> 5f c3 <31> c0 c3 48 63 ff 48 c7 c2 80 37 01 00 48 8b 0c fd e0 d6 68 81
> [12738.508644] Call Trace:
> [12738.508648] [<ffffffff81035a66>] ? walk_tg_tree_from+0x70/0x99
> [12738.508652] [<ffffffff81014c03>] ? __switch_to_xtra+0x14c/0x160
> [12738.508656] [<ffffffff8103bcce>] ? throttle_cfs_rq+0x4d/0x109
> [12738.508660] [<ffffffff8103be70>] ? put_prev_task_fair+0x3f/0x65
> [12738.508663] [<ffffffff8134c8ae>] ? __schedule+0x32e/0x5c3
> [12738.508666] [<ffffffff8134ceee>] ? yield_to+0xfa/0x10c
> [12738.508669] [<ffffffff8105d5af>] ? atomic_inc+0x3/0x4
> [12738.508678] [<ffffffffa03a8fc4>] ? kvm_vcpu_on_spin+0x8c/0xf7 [kvm]
> [12738.508684] [<ffffffffa030602f>] ? handle_pause+0x11/0x18
[-- Attachment #2: dmesg.txt.gz --]
[-- Type: application/x-gzip, Size: 3165 bytes --]
* Re: windows 2008 guest causing rcu_shed to emit NMI
2013-01-24 10:54 ` Andrey Korolyov
@ 2013-01-24 12:20 ` Marcelo Tosatti
2013-01-25 7:45 ` Andrey Korolyov
0 siblings, 1 reply; 12+ messages in thread
From: Marcelo Tosatti @ 2013-01-24 12:20 UTC (permalink / raw)
To: Andrey Korolyov; +Cc: kvm
On Thu, Jan 24, 2013 at 01:54:03PM +0300, Andrey Korolyov wrote:
> Thank you Marcelo,
>
> Host node locking up sometimes later than yesterday, bur problem still
> here, please see attached dmesg. Stuck process looks like
> root 19251 0.0 0.0 228476 12488 ? D 14:42 0:00
> /usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device
> virtio-blk-pci,? -device
>
> on fourth vm by count.
>
> Should I try upstream kernel instead of applying patch to the latest
> 3.4 or it is useless?
If you can upgrade to an upstream kernel, please do that.
* Re: windows 2008 guest causing rcu_shed to emit NMI
2013-01-24 12:20 ` Marcelo Tosatti
@ 2013-01-25 7:45 ` Andrey Korolyov
2013-01-25 20:49 ` Marcelo Tosatti
0 siblings, 1 reply; 12+ messages in thread
From: Andrey Korolyov @ 2013-01-25 7:45 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm
[-- Attachment #1: Type: text/plain, Size: 1266 bytes --]
On Thu, Jan 24, 2013 at 4:20 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Thu, Jan 24, 2013 at 01:54:03PM +0300, Andrey Korolyov wrote:
>> Thank you Marcelo,
>>
>> Host node locking up sometimes later than yesterday, bur problem still
>> here, please see attached dmesg. Stuck process looks like
>> root 19251 0.0 0.0 228476 12488 ? D 14:42 0:00
>> /usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device
>> virtio-blk-pci,? -device
>>
>> on fourth vm by count.
>>
>> Should I try upstream kernel instead of applying patch to the latest
>> 3.4 or it is useless?
>
> If you can upgrade to an upstream kernel, please do that.
>
With vanilla 3.7.4 there is almost no change, and the NMIs started firing
again. The external symptoms look like this: starting from some VM count,
maybe the third or sixth VM, the qemu-kvm process allocates its memory very
slowly and in jumps, 20M-200M-700M-1.6G over minutes. The patch does help,
of course: on both patched 3.4 and vanilla 3.7 I am able to kill the stuck
kvm processes and the node returns to normal, whereas on 3.2 sending
SIGKILL to the process produces zombies and a hung ``ps'' output (the
problem, and a workaround for the case where no scheduler is involved, are
described here: http://www.spinics.net/lists/kvm/msg84799.html).
[-- Attachment #2: dmesg-3.7.4.txt.gz --]
[-- Type: application/x-gzip, Size: 16656 bytes --]
* Re: windows 2008 guest causing rcu_shed to emit NMI
2013-01-25 7:45 ` Andrey Korolyov
@ 2013-01-25 20:49 ` Marcelo Tosatti
2013-01-27 21:04 ` Andrey Korolyov
0 siblings, 1 reply; 12+ messages in thread
From: Marcelo Tosatti @ 2013-01-25 20:49 UTC (permalink / raw)
To: Andrey Korolyov; +Cc: kvm
On Fri, Jan 25, 2013 at 10:45:02AM +0300, Andrey Korolyov wrote:
> On Thu, Jan 24, 2013 at 4:20 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Thu, Jan 24, 2013 at 01:54:03PM +0300, Andrey Korolyov wrote:
> >> Thank you Marcelo,
> >>
> >> Host node locking up sometimes later than yesterday, bur problem still
> >> here, please see attached dmesg. Stuck process looks like
> >> root 19251 0.0 0.0 228476 12488 ? D 14:42 0:00
> >> /usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device
> >> virtio-blk-pci,? -device
> >>
> >> on fourth vm by count.
> >>
> >> Should I try upstream kernel instead of applying patch to the latest
> >> 3.4 or it is useless?
> >
> > If you can upgrade to an upstream kernel, please do that.
> >
>
> With vanilla 3.7.4 there is almost no changes, and NMI started firing
> again. External symptoms looks like following: starting from some
> count, may be third or sixth vm, qemu-kvm process allocating its
> memory very slowly and by jumps, 20M-200M-700M-1.6G in minutes. Patch
> helps, of course - on both patched 3.4 and vanilla 3.7 I`m able to
> kill stuck kvm processes and node returned back to the normal, when on
> 3.2 sending SIGKILL to the process causing zombies and hanged ``ps''
> output (problem and workaround when no scheduler involved described
> here http://www.spinics.net/lists/kvm/msg84799.html).
Try disabling pause loop exiting with ple_gap=0 kvm-intel.ko module parameter.
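For anyone trying this suggestion, the parameter can be applied at module
load time or made persistent. A sketch, assuming no guests are running when
the module is reloaded (the modprobe.d file name is illustrative):

```shell
# Reload kvm-intel with pause-loop exiting disabled.
# All guests must be shut down first, or the unload will fail.
modprobe -r kvm_intel
modprobe kvm_intel ple_gap=0
# Verify the parameter took effect:
cat /sys/module/kvm_intel/parameters/ple_gap
# Make it persistent across reboots:
echo "options kvm_intel ple_gap=0" > /etc/modprobe.d/kvm-intel-ple.conf
```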
* Re: windows 2008 guest causing rcu_shed to emit NMI
2013-01-25 20:49 ` Marcelo Tosatti
@ 2013-01-27 21:04 ` Andrey Korolyov
[not found] ` <20130127231447.GB14721@amt.cnet>
0 siblings, 1 reply; 12+ messages in thread
From: Andrey Korolyov @ 2013-01-27 21:04 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm
On Sat, Jan 26, 2013 at 12:49 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Fri, Jan 25, 2013 at 10:45:02AM +0300, Andrey Korolyov wrote:
>> On Thu, Jan 24, 2013 at 4:20 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> > On Thu, Jan 24, 2013 at 01:54:03PM +0300, Andrey Korolyov wrote:
>> >> Thank you Marcelo,
>> >>
>> >> Host node locking up sometimes later than yesterday, bur problem still
>> >> here, please see attached dmesg. Stuck process looks like
>> >> root 19251 0.0 0.0 228476 12488 ? D 14:42 0:00
>> >> /usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device
>> >> virtio-blk-pci,? -device
>> >>
>> >> on fourth vm by count.
>> >>
>> >> Should I try upstream kernel instead of applying patch to the latest
>> >> 3.4 or it is useless?
>> >
>> > If you can upgrade to an upstream kernel, please do that.
>> >
>>
>> With vanilla 3.7.4 there is almost no changes, and NMI started firing
>> again. External symptoms looks like following: starting from some
>> count, may be third or sixth vm, qemu-kvm process allocating its
>> memory very slowly and by jumps, 20M-200M-700M-1.6G in minutes. Patch
>> helps, of course - on both patched 3.4 and vanilla 3.7 I`m able to
>> kill stuck kvm processes and node returned back to the normal, when on
>> 3.2 sending SIGKILL to the process causing zombies and hanged ``ps''
>> output (problem and workaround when no scheduler involved described
>> here http://www.spinics.net/lists/kvm/msg84799.html).
>
> Try disabling pause loop exiting with ple_gap=0 kvm-intel.ko module parameter.
>
Hi Marcelo,
Thanks, this parameter helped to increase the number of working VMs by half
an order of magnitude, from 3-4 to 10-15. A very high system (SY) load, 10
to 15 percent, persists at such counts for a long time, whereas Linux
guests in the same configuration do not go above one percent even under a
stress benchmark. After I disabled HT, the crash happens only in long runs,
and now it is a kernel panic :)
The stair-like memory allocation behaviour disappeared, but another symptom
leading to the crash, which I had not noted previously, persists: if the VM
count is ``enough'' for a crash, some qemu processes start to eat one core
each, and they will panic the system after running in that state for tens
of minutes, or when I try to attach a debugger to one of them. If needed, I
can log the entire crash output via netconsole; for now I have a tail which
is almost the same every time:
http://xdel.ru/downloads/btwin.png
* Re: windows 2008 guest causing rcu_shed to emit NMI
[not found] ` <20130127231447.GB14721@amt.cnet>
@ 2013-01-28 13:56 ` Andrey Korolyov
2013-01-28 23:35 ` Andrey Korolyov
0 siblings, 1 reply; 12+ messages in thread
From: Andrey Korolyov @ 2013-01-28 13:56 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm
On Mon, Jan 28, 2013 at 3:14 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Mon, Jan 28, 2013 at 12:04:50AM +0300, Andrey Korolyov wrote:
>> On Sat, Jan 26, 2013 at 12:49 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> > On Fri, Jan 25, 2013 at 10:45:02AM +0300, Andrey Korolyov wrote:
>> >> On Thu, Jan 24, 2013 at 4:20 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> >> > On Thu, Jan 24, 2013 at 01:54:03PM +0300, Andrey Korolyov wrote:
>> >> >> Thank you Marcelo,
>> >> >>
>> >> >> Host node locking up sometimes later than yesterday, bur problem still
>> >> >> here, please see attached dmesg. Stuck process looks like
>> >> >> root 19251 0.0 0.0 228476 12488 ? D 14:42 0:00
>> >> >> /usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device
>> >> >> virtio-blk-pci,? -device
>> >> >>
>> >> >> on fourth vm by count.
>> >> >>
>> >> >> Should I try upstream kernel instead of applying patch to the latest
>> >> >> 3.4 or it is useless?
>> >> >
>> >> > If you can upgrade to an upstream kernel, please do that.
>> >> >
>> >>
>> >> With vanilla 3.7.4 there is almost no changes, and NMI started firing
>> >> again. External symptoms looks like following: starting from some
>> >> count, may be third or sixth vm, qemu-kvm process allocating its
>> >> memory very slowly and by jumps, 20M-200M-700M-1.6G in minutes. Patch
>> >> helps, of course - on both patched 3.4 and vanilla 3.7 I`m able to
>> >> kill stuck kvm processes and node returned back to the normal, when on
>> >> 3.2 sending SIGKILL to the process causing zombies and hanged ``ps''
>> >> output (problem and workaround when no scheduler involved described
>> >> here http://www.spinics.net/lists/kvm/msg84799.html).
>> >
>> > Try disabling pause loop exiting with ple_gap=0 kvm-intel.ko module parameter.
>> >
>>
>> Hi Marcelo,
>>
>> thanks, this parameter helped to increase number of working VMs in a
>> half of order of magnitude, from 3-4 to 10-15. Very high SY load, 10
>> to 15 percents, persists on such numbers for a long time, where linux
>> guests in same configuration do not jump over one percent even under
>> stress bench. After I disabled HT, crash happens only in long runs and
>> now it is kernel panic :)
>> Stair-like memory allocation behaviour disappeared, but other symptom
>> leading to the crash which I have not counted previously, persists: if
>> VM count is ``enough'' for crash, some qemu processes starting to eat
>> one core, and they`ll panic system after run in tens of minutes in
>> such state or if I try to attach debugger to one of them. If needed, I
>> can log entire crash output via netconsole, now I have some tail,
>> almost the same every time:
>> http://xdel.ru/downloads/btwin.png
>
> Yes, please log entire crash output, thanks.
>
Here please, 3.7.4-vanilla, 16 vms, ple_gap=0:
http://xdel.ru/downloads/oops-default-kvmintel.txt
* Re: windows 2008 guest causing rcu_shed to emit NMI
2013-01-28 13:56 ` Andrey Korolyov
@ 2013-01-28 23:35 ` Andrey Korolyov
[not found] ` <20130129231546.GA29904@amt.cnet>
0 siblings, 1 reply; 12+ messages in thread
From: Andrey Korolyov @ 2013-01-28 23:35 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm
On Mon, Jan 28, 2013 at 5:56 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> On Mon, Jan 28, 2013 at 3:14 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> On Mon, Jan 28, 2013 at 12:04:50AM +0300, Andrey Korolyov wrote:
>>> On Sat, Jan 26, 2013 at 12:49 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>>> > On Fri, Jan 25, 2013 at 10:45:02AM +0300, Andrey Korolyov wrote:
>>> >> On Thu, Jan 24, 2013 at 4:20 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>>> >> > On Thu, Jan 24, 2013 at 01:54:03PM +0300, Andrey Korolyov wrote:
>>> >> >> Thank you Marcelo,
>>> >> >>
>>> >> >> Host node locking up sometimes later than yesterday, bur problem still
>>> >> >> here, please see attached dmesg. Stuck process looks like
>>> >> >> root 19251 0.0 0.0 228476 12488 ? D 14:42 0:00
>>> >> >> /usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device
>>> >> >> virtio-blk-pci,? -device
>>> >> >>
>>> >> >> on fourth vm by count.
>>> >> >>
>>> >> >> Should I try upstream kernel instead of applying patch to the latest
>>> >> >> 3.4 or it is useless?
>>> >> >
>>> >> > If you can upgrade to an upstream kernel, please do that.
>>> >> >
>>> >>
>>> >> With vanilla 3.7.4 there is almost no changes, and NMI started firing
>>> >> again. External symptoms looks like following: starting from some
>>> >> count, may be third or sixth vm, qemu-kvm process allocating its
>>> >> memory very slowly and by jumps, 20M-200M-700M-1.6G in minutes. Patch
>>> >> helps, of course - on both patched 3.4 and vanilla 3.7 I`m able to
>>> >> kill stuck kvm processes and node returned back to the normal, when on
>>> >> 3.2 sending SIGKILL to the process causing zombies and hanged ``ps''
>>> >> output (problem and workaround when no scheduler involved described
>>> >> here http://www.spinics.net/lists/kvm/msg84799.html).
>>> >
>>> > Try disabling pause loop exiting with ple_gap=0 kvm-intel.ko module parameter.
>>> >
>>>
>>> Hi Marcelo,
>>>
>>> thanks, this parameter helped to increase number of working VMs in a
>>> half of order of magnitude, from 3-4 to 10-15. Very high SY load, 10
>>> to 15 percents, persists on such numbers for a long time, where linux
>>> guests in same configuration do not jump over one percent even under
>>> stress bench. After I disabled HT, crash happens only in long runs and
>>> now it is kernel panic :)
>>> Stair-like memory allocation behaviour disappeared, but other symptom
>>> leading to the crash which I have not counted previously, persists: if
>>> VM count is ``enough'' for crash, some qemu processes starting to eat
>>> one core, and they`ll panic system after run in tens of minutes in
>>> such state or if I try to attach debugger to one of them. If needed, I
>>> can log entire crash output via netconsole, now I have some tail,
>>> almost the same every time:
>>> http://xdel.ru/downloads/btwin.png
>>
>> Yes, please log entire crash output, thanks.
>>
>
> Here please, 3.7.4-vanilla, 16 vms, ple_gap=0:
>
> http://xdel.ru/downloads/oops-default-kvmintel.txt
Just an update: I was able to reproduce this with pure Linux VMs using
qemu-1.3.0 and the ``stress'' benchmark running in them; the panic occurs
at the start of a VM (with ten working machines at that moment). Qemu-1.1.2
generally is not able to reproduce it, but the host node with the older
version crashes with a smaller number of Windows VMs (three to six instead
of ten to fifteen) than with 1.3; please see the trace below:
http://xdel.ru/downloads/oops-old-qemu.txt
* Re: windows 2008 guest causing rcu_shed to emit NMI
[not found] ` <20130129231546.GA29904@amt.cnet>
@ 2013-01-30 8:21 ` Andrey Korolyov
2013-01-30 20:11 ` Marcelo Tosatti
0 siblings, 1 reply; 12+ messages in thread
From: Andrey Korolyov @ 2013-01-30 8:21 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm
On Wed, Jan 30, 2013 at 3:15 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Tue, Jan 29, 2013 at 02:35:02AM +0300, Andrey Korolyov wrote:
>> On Mon, Jan 28, 2013 at 5:56 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
>> > On Mon, Jan 28, 2013 at 3:14 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> >> On Mon, Jan 28, 2013 at 12:04:50AM +0300, Andrey Korolyov wrote:
>> >>> On Sat, Jan 26, 2013 at 12:49 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> >>> > On Fri, Jan 25, 2013 at 10:45:02AM +0300, Andrey Korolyov wrote:
>> >>> >> On Thu, Jan 24, 2013 at 4:20 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> >>> >> > On Thu, Jan 24, 2013 at 01:54:03PM +0300, Andrey Korolyov wrote:
>> >>> >> >> Thank you Marcelo,
>> >>> >> >>
>> >>> >> >> Host node locking up sometimes later than yesterday, bur problem still
>> >>> >> >> here, please see attached dmesg. Stuck process looks like
>> >>> >> >> root 19251 0.0 0.0 228476 12488 ? D 14:42 0:00
>> >>> >> >> /usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device
>> >>> >> >> virtio-blk-pci,? -device
>> >>> >> >>
>> >>> >> >> on fourth vm by count.
>> >>> >> >>
>> >>> >> >> Should I try upstream kernel instead of applying patch to the latest
>> >>> >> >> 3.4 or it is useless?
>> >>> >> >
>> >>> >> > If you can upgrade to an upstream kernel, please do that.
>> >>> >> >
>> >>> >>
>> >>> >> With vanilla 3.7.4 there is almost no changes, and NMI started firing
>> >>> >> again. External symptoms looks like following: starting from some
>> >>> >> count, may be third or sixth vm, qemu-kvm process allocating its
>> >>> >> memory very slowly and by jumps, 20M-200M-700M-1.6G in minutes. Patch
>> >>> >> helps, of course - on both patched 3.4 and vanilla 3.7 I`m able to
>> >>> >> kill stuck kvm processes and node returned back to the normal, when on
>> >>> >> 3.2 sending SIGKILL to the process causing zombies and hanged ``ps''
>> >>> >> output (problem and workaround when no scheduler involved described
>> >>> >> here http://www.spinics.net/lists/kvm/msg84799.html).
>> >>> >
>> >>> > Try disabling pause loop exiting with ple_gap=0 kvm-intel.ko module parameter.
>> >>> >
>> >>>
>> >>> Hi Marcelo,
>> >>>
>> >>> thanks, this parameter helped to increase number of working VMs in a
>> >>> half of order of magnitude, from 3-4 to 10-15. Very high SY load, 10
>> >>> to 15 percents, persists on such numbers for a long time, where linux
>> >>> guests in same configuration do not jump over one percent even under
>> >>> stress bench. After I disabled HT, crash happens only in long runs and
>> >>> now it is kernel panic :)
>> >>> Stair-like memory allocation behaviour disappeared, but other symptom
>> >>> leading to the crash which I have not counted previously, persists: if
>> >>> VM count is ``enough'' for crash, some qemu processes starting to eat
>> >>> one core, and they`ll panic system after run in tens of minutes in
>> >>> such state or if I try to attach debugger to one of them. If needed, I
>> >>> can log entire crash output via netconsole, now I have some tail,
>> >>> almost the same every time:
>> >>> http://xdel.ru/downloads/btwin.png
>> >>
>> >> Yes, please log entire crash output, thanks.
>> >>
>> >
>> > Here please, 3.7.4-vanilla, 16 vms, ple_gap=0:
>> >
>> > http://xdel.ru/downloads/oops-default-kvmintel.txt
>>
>> Just an update: I was able to reproduce that on pure linux VMs using
>> qemu-1.3.0 and ``stress'' benchmark running on them - panic occurs at
>> start of vm(with count ten working machines at the moment). Qemu-1.1.2
>> generally is not able to reproduce that, but host node with older
>> version crashing on less amount of Windows VMs(three to six instead
>> ten to fifteen) than with 1.3, please see trace below:
>>
>> http://xdel.ru/downloads/oops-old-qemu.txt
>
> Single bit memory error, apparently. Try:
>
> 1. memtest86.
> 2. Boot with slub_debug=ZFPU kernel parameter.
> 3. Reproduce on different machine
>
>
Hi Marcelo,
I always follow the rule: if some weird bug exists, check it on an
ECC-enabled machine and check the IPMI logs too before starting to
complain :) I have finally managed to ``fix'' the problem, but my solution
seems a bit strange:
- I noticed that if the virtual machines are started without any cgroup
settings, they do not trigger this bug under any conditions;
- I had assumed, quite wrongly, that CONFIG_SCHED_AUTOGROUP would only
regroup tasks that are not in any cgroup and would leave tasks already
inside an existing cpu cgroup alone. A first look at the 200-line patch
shows that autogrouping always applies to all tasks, so I tried disabling
it;
- wild magic: the VMs no longer crash the host, and even at counts of 30+
they work fine.
I still do not know what exactly triggered this, or whether I will face it
again under different conditions, so my solution is more likely a patch of
mud in the wall of the dam than a proper fix.
There seem to be two possible origins of this error: either a very hideous
race condition involving cgroups and processes such as qemu-kvm that cause
frequent context switches, or a simple incompatibility between NUMA, the
logic of CONFIG_SCHED_AUTOGROUP, and qemu VMs already running inside a
cgroup, since I have not observed these errors on a single NUMA node (i.e.
a desktop) under relatively heavier load.
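As an aside, rebuilding the kernel is not strictly required to test this
theory: when the kernel was built with CONFIG_SCHED_AUTOGROUP, autogrouping
can be toggled at runtime or at boot. A sketch:

```shell
# Disable scheduler autogrouping without rebuilding the kernel.
echo 0 > /proc/sys/kernel/sched_autogroup_enabled
# Equivalent sysctl form:
sysctl kernel.sched_autogroup_enabled=0
# Or disable it for the whole boot by adding "noautogroup"
# to the kernel command line.
```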
* Re: windows 2008 guest causing rcu_shed to emit NMI
2013-01-30 8:21 ` Andrey Korolyov
@ 2013-01-30 20:11 ` Marcelo Tosatti
2013-01-31 17:40 ` Andrey Korolyov
0 siblings, 1 reply; 12+ messages in thread
From: Marcelo Tosatti @ 2013-01-30 20:11 UTC (permalink / raw)
To: Andrey Korolyov; +Cc: kvm
On Wed, Jan 30, 2013 at 11:21:08AM +0300, Andrey Korolyov wrote:
> On Wed, Jan 30, 2013 at 3:15 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Tue, Jan 29, 2013 at 02:35:02AM +0300, Andrey Korolyov wrote:
> >> On Mon, Jan 28, 2013 at 5:56 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
> >> > On Mon, Jan 28, 2013 at 3:14 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >> >> On Mon, Jan 28, 2013 at 12:04:50AM +0300, Andrey Korolyov wrote:
> >> >>> On Sat, Jan 26, 2013 at 12:49 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >> >>> > On Fri, Jan 25, 2013 at 10:45:02AM +0300, Andrey Korolyov wrote:
> >> >>> >> On Thu, Jan 24, 2013 at 4:20 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >> >>> >> > On Thu, Jan 24, 2013 at 01:54:03PM +0300, Andrey Korolyov wrote:
> >> >>> >> >> Thank you Marcelo,
> >> >>> >> >>
> >> >>> >> >> Host node locking up sometimes later than yesterday, bur problem still
> >> >>> >> >> here, please see attached dmesg. Stuck process looks like
> >> >>> >> >> root 19251 0.0 0.0 228476 12488 ? D 14:42 0:00
> >> >>> >> >> /usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device
> >> >>> >> >> virtio-blk-pci,? -device
> >> >>> >> >>
> >> >>> >> >> on fourth vm by count.
> >> >>> >> >>
> >> >>> >> >> Should I try upstream kernel instead of applying patch to the latest
> >> >>> >> >> 3.4 or it is useless?
> >> >>> >> >
> >> >>> >> > If you can upgrade to an upstream kernel, please do that.
> >> >>> >> >
> >> >>> >>
> >> >>> >> With vanilla 3.7.4 there is almost no changes, and NMI started firing
> >> >>> >> again. External symptoms looks like following: starting from some
> >> >>> >> count, may be third or sixth vm, qemu-kvm process allocating its
> >> >>> >> memory very slowly and by jumps, 20M-200M-700M-1.6G in minutes. Patch
> >> >>> >> helps, of course - on both patched 3.4 and vanilla 3.7 I`m able to
> >> >>> >> kill stuck kvm processes and node returned back to the normal, when on
> >> >>> >> 3.2 sending SIGKILL to the process causing zombies and hanged ``ps''
> >> >>> >> output (problem and workaround when no scheduler involved described
> >> >>> >> here http://www.spinics.net/lists/kvm/msg84799.html).
> >> >>> >
> >> >>> > Try disabling pause loop exiting with ple_gap=0 kvm-intel.ko module parameter.
> >> >>> >
> >> >>>
> >> >>> Hi Marcelo,
> >> >>>
> >> >>> thanks, this parameter helped to increase number of working VMs in a
> >> >>> half of order of magnitude, from 3-4 to 10-15. Very high SY load, 10
> >> >>> to 15 percents, persists on such numbers for a long time, where linux
> >> >>> guests in same configuration do not jump over one percent even under
> >> >>> stress bench. After I disabled HT, crash happens only in long runs and
> >> >>> now it is kernel panic :)
> >> >>> Stair-like memory allocation behaviour disappeared, but other symptom
> >> >>> leading to the crash which I have not counted previously, persists: if
> >> >>> VM count is ``enough'' for crash, some qemu processes starting to eat
> >> >>> one core, and they`ll panic system after run in tens of minutes in
> >> >>> such state or if I try to attach debugger to one of them. If needed, I
> >> >>> can log entire crash output via netconsole, now I have some tail,
> >> >>> almost the same every time:
> >> >>> http://xdel.ru/downloads/btwin.png
> >> >>
> >> >> Yes, please log entire crash output, thanks.
> >> >>
> >> >
> >> > Here please, 3.7.4-vanilla, 16 vms, ple_gap=0:
> >> >
> >> > http://xdel.ru/downloads/oops-default-kvmintel.txt
> >>
> >> Just an update: I was able to reproduce that on pure linux VMs using
> >> qemu-1.3.0 and ``stress'' benchmark running on them - panic occurs at
> >> start of vm(with count ten working machines at the moment). Qemu-1.1.2
> >> generally is not able to reproduce that, but host node with older
> >> version crashing on less amount of Windows VMs(three to six instead
> >> ten to fifteen) than with 1.3, please see trace below:
> >>
> >> http://xdel.ru/downloads/oops-old-qemu.txt
> >
> > Single bit memory error, apparently. Try:
> >
> > 1. memtest86.
> > 2. Boot with slub_debug=ZFPU kernel parameter.
> > 3. Reproduce on different machine
> >
> >
>
> Hi Marcelo,
>
> I always follow the rule - if some weird bug exists, check it on
> ECC-enabled machine and check IPMI logs too before start complaining
> :) I have finally managed to ``fix'' the problem, but my solution
> seems a bit strange:
> - I have noticed that if virtual machines started without any cgroup
> setting they will not cause this bug under any conditions,
> - I have thought, very wrong in my mind, that the
> CONFIG_SCHED_AUTOGROUP should regroup the tasks without any cgroup and
> should not touch tasks already inside any existing cpu cgroup. First
> sight on the 200-line patch shows that the autogrouping always applies
> to all tasks, so I tried to disable it,
> - wild magic appears - VMs didn`t crashed host any more, even in count
> 30+ they work fine.
> I still don`t know what exactly triggered that and will I face it
> again under different conditions, so my solution more likely to be a
> patch of mud in wall of the dam, instead of proper fixing.
>
> There seems to be two possible origins of such error - a very very
> hideous race condition involving cgroups and processes like qemu-kvm
> causing frequent context switches and simple incompatibility between
> NUMA, logic of CONFIG_SCHED_AUTOGROUP and qemu VMs already doing work
> in the cgroup, since I have not observed this errors on single numa
> node(mean, desktop) on relatively heavier condition.
Yes, it would be important to track it down, though. Enabling the
slub_debug=ZFPU kernel parameter should help.
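One way to enable this on the next boot, sketched for a GRUB2 system (the
config file path and update command are distro-dependent assumptions):

```shell
# Append slub_debug=ZFPU (redzoning, sanity checks, poisoning, user
# tracking) to the kernel command line in the GRUB2 defaults file.
sed -i 's/^\(GRUB_CMDLINE_LINUX="[^"]*\)"/\1 slub_debug=ZFPU"/' /etc/default/grub
update-grub   # or: grub2-mkconfig -o /boot/grub2/grub.cfg
# After reboot, confirm the flags are active:
grep -o 'slub_debug=[A-Z]*' /proc/cmdline
```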
* Re: windows 2008 guest causing rcu_shed to emit NMI
2013-01-30 20:11 ` Marcelo Tosatti
@ 2013-01-31 17:40 ` Andrey Korolyov
0 siblings, 0 replies; 12+ messages in thread
From: Andrey Korolyov @ 2013-01-31 17:40 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: kvm, libvirt-users
On Thu, Jan 31, 2013 at 12:11 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Wed, Jan 30, 2013 at 11:21:08AM +0300, Andrey Korolyov wrote:
>> On Wed, Jan 30, 2013 at 3:15 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> > On Tue, Jan 29, 2013 at 02:35:02AM +0300, Andrey Korolyov wrote:
>> >> On Mon, Jan 28, 2013 at 5:56 PM, Andrey Korolyov <andrey@xdel.ru> wrote:
>> >> > On Mon, Jan 28, 2013 at 3:14 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> >> >> On Mon, Jan 28, 2013 at 12:04:50AM +0300, Andrey Korolyov wrote:
>> >> >>> On Sat, Jan 26, 2013 at 12:49 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> >> >>> > On Fri, Jan 25, 2013 at 10:45:02AM +0300, Andrey Korolyov wrote:
>> >> >>> >> On Thu, Jan 24, 2013 at 4:20 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> >> >>> >> > On Thu, Jan 24, 2013 at 01:54:03PM +0300, Andrey Korolyov wrote:
>> >> >>> >> >> Thank you Marcelo,
>> >> >>> >> >>
>> >> >>> >> >> The host node is now locking up somewhat later than yesterday, but
>> >> >>> >> >> the problem is still here; please see the attached dmesg. The stuck
>> >> >>> >> >> process looks like
>> >> >>> >> >> root 19251 0.0 0.0 228476 12488 ? D 14:42 0:00
>> >> >>> >> >> /usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device
>> >> >>> >> >> virtio-blk-pci,? -device
>> >> >>> >> >>
>> >> >>> >> >> and it is the fourth vm by count.
>> >> >>> >> >>
>> >> >>> >> >> Should I try an upstream kernel instead of applying the patch to
>> >> >>> >> >> the latest 3.4, or is that useless?
>> >> >>> >> >
>> >> >>> >> > If you can upgrade to an upstream kernel, please do that.
>> >> >>> >> >
>> >> >>> >>
>> >> >>> >> With vanilla 3.7.4 there is almost no change, and the NMIs started
>> >> >>> >> firing again. The external symptoms look like the following: starting
>> >> >>> >> from some VM count, maybe the third or sixth, the qemu-kvm process
>> >> >>> >> allocates its memory very slowly and in jumps, 20M-200M-700M-1.6G
>> >> >>> >> over minutes. The patch helps, of course - on both patched 3.4 and
>> >> >>> >> vanilla 3.7 I am able to kill the stuck kvm processes and the node
>> >> >>> >> returns to normal, whereas on 3.2 sending SIGKILL to the process
>> >> >>> >> produces zombies and hung ``ps'' output (the problem, and a
>> >> >>> >> workaround when no scheduler is involved, are described here:
>> >> >>> >> http://www.spinics.net/lists/kvm/msg84799.html).
>> >> >>> >
>> >> >>> > Try disabling pause loop exiting with ple_gap=0 kvm-intel.ko module parameter.
>> >> >>> >
>> >> >>>
>> >> >>> Hi Marcelo,
>> >> >>>
>> >> >>> thanks, this parameter helped to increase the number of working VMs
>> >> >>> by half an order of magnitude, from 3-4 to 10-15. A very high SY
>> >> >>> load, 10 to 15 percent, persists at such counts for a long time,
>> >> >>> whereas linux guests in the same configuration do not go above one
>> >> >>> percent even under a stress benchmark. After I disabled HT, the
>> >> >>> crash happens only on long runs, and now it is a kernel panic :)
>> >> >>> The stair-like memory allocation behaviour disappeared, but another
>> >> >>> symptom leading to the crash, which I had not noticed previously,
>> >> >>> persists: if the VM count is ``enough'' for a crash, some qemu
>> >> >>> processes start to eat one core each, and they will panic the
>> >> >>> system after tens of minutes in such a state, or if I try to attach
>> >> >>> a debugger to one of them. If needed, I can log the entire crash
>> >> >>> output via netconsole; for now I have some tail, almost the same
>> >> >>> every time:
>> >> >>> http://xdel.ru/downloads/btwin.png
>> >> >>
>> >> >> Yes, please log entire crash output, thanks.
>> >> >>
>> >> >
>> >> > Here please, 3.7.4-vanilla, 16 vms, ple_gap=0:
>> >> >
>> >> > http://xdel.ru/downloads/oops-default-kvmintel.txt
>> >>
>> >> Just an update: I was able to reproduce this on pure linux VMs using
>> >> qemu-1.3.0 with the ``stress'' benchmark running on them - the panic
>> >> occurs at the start of a vm (with ten machines working at that
>> >> moment). Qemu-1.1.2 generally is not able to reproduce it, but a host
>> >> node with the older version crashes with fewer Windows VMs (three to
>> >> six instead of ten to fifteen) than with 1.3; please see the trace
>> >> below:
>> >>
>> >> http://xdel.ru/downloads/oops-old-qemu.txt
>> >
>> > A single-bit memory error, apparently. Try:
>> >
>> > 1. memtest86.
>> > 2. Booting with the slub_debug=ZFPU kernel parameter.
>> > 3. Reproducing on a different machine.
>> >
>> >
>>
>> Hi Marcelo,
>>
>> I always follow the rule: if some weird bug exists, check it on an
>> ECC-enabled machine and check the IPMI logs too before starting to
>> complain :) I have finally managed to ``fix'' the problem, but my
>> solution seems a bit strange:
>> - I noticed that if the virtual machines are started without any
>> cgroup settings, they do not trigger this bug under any conditions;
>> - I had assumed, quite wrongly, that CONFIG_SCHED_AUTOGROUP would
>> only regroup tasks that are not in any cgroup and would not touch
>> tasks already inside an existing cpu cgroup. A first look at the
>> 200-line patch shows that autogrouping always applies to all tasks,
>> so I tried disabling it;
>> - wild magic appeared - the VMs no longer crashed the host; even with
>> 30+ of them they work fine.
>> I still don't know what exactly triggered the bug or whether I will
>> face it again under different conditions, so my solution is more like
>> a patch of mud in the wall of a dam than a proper fix.
>>
>> There seem to be two possible origins for such an error: a very
>> hideous race condition involving cgroups and processes like qemu-kvm
>> that cause frequent context switches, or a plain incompatibility
>> between NUMA, the CONFIG_SCHED_AUTOGROUP logic, and qemu VMs already
>> running in a cgroup - since I have not observed these errors on a
>> single-NUMA-node machine (i.e. a desktop) under relatively heavier load.
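(Besides rebuilding the kernel without CONFIG_SCHED_AUTOGROUP, autogrouping can also be switched off at runtime or at boot, which may be enough to test the hypothesis above. A sketch of the standard knobs:)

```shell
# Disable scheduler autogrouping at runtime via sysctl:
sysctl kernel.sched_autogroup_enabled=0
# ...or equivalently by writing the procfs file directly:
echo 0 > /proc/sys/kernel/sched_autogroup_enabled

# To disable it from boot onwards, add "noautogroup" to the
# kernel command line instead.
```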
>
> Yes, it would be important to track it down, though. Enabling the
> slub_debug=ZFPU kernel parameter should help.
>
>
Hi Marcelo,
I have finally beaten that one. As I mentioned before in an off-list
message, the nested cgroups that libvirt creates for the vcpu/emulator
threads were the root cause of this problem. Today we disabled
creation of cgroups deeper than the qemu/vm/ level, and the trace did
not show up under various workloads. So for libvirt itself, it may be
a feature request to create the thread-based cgroups only if some
element of the VM's config actually requires them. As for cgroups, it
seems fatal with qemu-kvm to have a very large number of nested
elements inside the cpu controller, or a very large number of threads -
since I have a limited core count on each node, I cannot prove which
exactly, the complicated cgroup hierarchy or some side effect of
putting threads into dedicated cgroups, caused all this pain. And, of
course, without Windows(tm) the bug is very hard to observe in the
wild, since almost no synthetic test I have run on the linux VMs is
able to show it.
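As a rough illustration of the layout being discussed: libvirt can place every vcpu and emulator thread in its own sub-cgroup below the per-VM group, and the workaround above flattens that. The paths below are hypothetical (the real names depend on the libvirt version and cgroup mount layout); this sketch only shows how one might inspect the depth:

```shell
# Per-thread cgroups that libvirt may create under the cpu controller
# (hypothetical v1 layout, cpu controller mounted at /sys/fs/cgroup/cpu):
#   /sys/fs/cgroup/cpu/libvirt/qemu/vm1/vcpu0
#   /sys/fs/cgroup/cpu/libvirt/qemu/vm1/vcpu1
#   /sys/fs/cgroup/cpu/libvirt/qemu/vm1/emulator

# Report the maximum nesting depth under the cpu controller:
find /sys/fs/cgroup/cpu -type d | awk -F/ '{print NF}' | sort -n | tail -1

# With the workaround applied, all qemu threads stay in the per-VM
# group, which can be confirmed from its tasks file:
cat /sys/fs/cgroup/cpu/libvirt/qemu/vm1/tasks
```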
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2013-01-31 17:40 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-22 18:00 windows 2008 guest causing rcu_shed to emit NMI Andrey Korolyov
2013-01-24 0:52 ` Marcelo Tosatti
2013-01-24 10:54 ` Andrey Korolyov
2013-01-24 12:20 ` Marcelo Tosatti
2013-01-25 7:45 ` Andrey Korolyov
2013-01-25 20:49 ` Marcelo Tosatti
2013-01-27 21:04 ` Andrey Korolyov
[not found] ` <20130127231447.GB14721@amt.cnet>
2013-01-28 13:56 ` Andrey Korolyov
2013-01-28 23:35 ` Andrey Korolyov
[not found] ` <20130129231546.GA29904@amt.cnet>
2013-01-30 8:21 ` Andrey Korolyov
2013-01-30 20:11 ` Marcelo Tosatti
2013-01-31 17:40 ` Andrey Korolyov