* Xen paravirt frontend block hang
@ 2007-12-26 19:33 Christopher S. Aker
2007-12-28 23:14 ` Jeremy Fitzhardinge
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Christopher S. Aker @ 2007-12-26 19:33 UTC
To: virtualization
Sorry for the noise if this isn't the appropriate venue for this. I
posted this last month to xen-devel:
http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00777.html
I can reliably cause a paravirt_ops Xen guest to hang during intensive
IO. My current recipe is an untar/tar loop, without compression, of a
kernel tree. For example:
wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2
bzip2 -d linux-2.6.23.tar.bz2
while true; do
echo `date`
tar xf linux-2.6.23.tar
tar cf linux-2.6.23.tar linux-2.6.23
done
After a few loops, anything that touches the xvd device that hung will
get stuck in D state.
This happens on both a 2.6.16 and 2.6.18 dom0 (3.1.2 tools). Paravirt
guests I've tried that exhibit the problem: 2.6.23.8, 2.6.23.12, and
2.6.24-rc6. It does *not* occur using the Xensource 2.6.18 domU tree
from 3.1.2. In all cases, the host continues to run fine, nothing out
of the ordinary is logged on the dom0 side, xenstore reports the status
of the devices is fine.
Can anyone reproduce this problem, or let me know what else I can
provide to help track this down?
Thanks,
-Chris
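One way to confirm the "stuck in D state" symptom from inside the guest is to scan /proc for tasks in uninterruptible sleep. The sketch below is an illustration, not something from the thread: it assumes the Linux /proc/<pid>/stat layout ("pid (comm) state ..."), and the function names stat_state and list_d_state are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return the state letter from one line of /proc/<pid>/stat, or '?' if the
 * line is malformed. comm may contain spaces or ')', so use the LAST ')'. */
char stat_state(const char *line)
{
    const char *close = strrchr(line, ')');
    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '?';
    return close[2];
}

/* Hypothetical helper: print every task in uninterruptible sleep ('D').
 * Returns how many were found, or -1 if /proc cannot be opened. */
int list_d_state(void)
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;                      /* not a PID directory */

        char path[300], line[512];
        snprintf(path, sizeof path, "/proc/%s/stat", ent->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;                      /* task exited while scanning */
        if (fgets(line, sizeof line, f) != NULL && stat_state(line) == 'D') {
            fputs(line, stdout);
            count++;
        }
        fclose(f);
    }
    closedir(proc);
    return count;
}
```

Run from a shell that has not yet touched the hung xvd device, this would list tar and anything else blocked on it; the STAT column of `ps` shows the same letter.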
* Re: Xen paravirt frontend block hang
2007-12-26 19:33 Xen paravirt frontend block hang Christopher S. Aker
@ 2007-12-28 23:14 ` Jeremy Fitzhardinge
2007-12-29 1:12 ` Christopher S. Aker
2008-01-29 0:22 ` Christopher S. Aker
2008-02-28 20:00 ` Jeremy Fitzhardinge
[not found] ` <47C712EF.1060703@goop.org>
2 siblings, 2 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2007-12-28 23:14 UTC
To: Christopher S. Aker; +Cc: virtualization
Christopher S. Aker wrote:
> Sorry for the noise if this isn't the appropriate venue for this. I
> posted this last month to xen-devel:
>
> http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00777.html
>
> I can reliably cause a paravirt_ops Xen guest to hang during intensive
> IO. My current recipe is an untar/tar loop, without compression, of a
> kernel tree. For example:
>
> wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2
> bzip2 -d linux-2.6.23.tar.bz2
>
> while true; do
> echo `date`
> tar xf linux-2.6.23.tar
> tar cf linux-2.6.23.tar linux-2.6.23
> done
>
> After a few loops, anything that touches the xvd device that hung will
> get stuck in D state.
>
> This happens on both a 2.6.16 and 2.6.18 dom0 (3.1.2 tools). Paravirt
> guests I've tried that exhibit the problem: 2.6.23.8, 2.6.23.12, and
> 2.6.24-rc6. It does *not* occur using the Xensource 2.6.18 domU tree
> from 3.1.2. In all cases, the host continues to run fine, nothing out
> of the ordinary is logged on the dom0 side, xenstore reports the
> status of the devices is fine.
>
> Can anyone reproduce this problem, or let me know what else I can
> provide to help track this down?
Hi,
I'll try to track this down asap. Have you tried any other kernel
versions? In other words, did it just start happening, or has it always
done it? Also, could you try 2.6.24-rc6, just to make sure it hasn't
already been fixed (which is possible if it's something that happened in
a higher layer or something).
Thanks,
J
* Re: Xen paravirt frontend block hang
2007-12-28 23:14 ` Jeremy Fitzhardinge
@ 2007-12-29 1:12 ` Christopher S. Aker
2007-12-29 6:00 ` Jeremy Fitzhardinge
2008-01-29 0:22 ` Christopher S. Aker
1 sibling, 1 reply; 18+ messages in thread
From: Christopher S. Aker @ 2007-12-29 1:12 UTC
To: Jeremy Fitzhardinge; +Cc: virtualization
Jeremy Fitzhardinge wrote:
> I'll try to track this down asap. Have you tried any other kernel
> versions? In other words, did it just start happening, or has it always
> done it? Also, could you try 2.6.24-rc6, just to make sure it hasn't
> already been fixed (which is possible if it's something that happened in
> a higher layer or something).
I've only just recently started using the paravirt_ops kernels, but all
the ones I've tried have exhibited the problem, so I'm not sure when the
hang was introduced. It doesn't happen with a xensource 2.6.18 tree domU.
2.6.23.8, 2.6.23.12, and 2.6.24-rc6 <-- pv_ops kernels I've tried that
have the hang.
Thanks!
-Chris
* Re: Xen paravirt frontend block hang
2007-12-29 1:12 ` Christopher S. Aker
@ 2007-12-29 6:00 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2007-12-29 6:00 UTC
To: Christopher S. Aker; +Cc: virtualization
Christopher S. Aker wrote:
> Jeremy Fitzhardinge wrote:
>> I'll try to track this down asap. Have you tried any other kernel
>> versions? In other words, did it just start happening, or has it always
>> done it? Also, could you try 2.6.24-rc6, just to make sure it hasn't
>> already been fixed (which is possible if it's something that happened in
>> a higher layer or something).
>
> I've only just recently started using the paravirt_ops kernels, but
> all the ones I've tried have exhibited the problem, so I'm not sure
> when the hang was introduced. It doesn't happen with a xensource
> 2.6.18 tree domU.
>
> 2.6.23.8, 2.6.23.12, and 2.6.24-rc6 <-- pv_ops kernels I've tried that
> have the hang.
Yeah, it's all fairly new code. But the pvops blockfront shouldn't have
any functional differences from the 2.6.18 version of the driver, so
something seems to have got broken along the way.
J
* Re: Xen paravirt frontend block hang
2007-12-28 23:14 ` Jeremy Fitzhardinge
2007-12-29 1:12 ` Christopher S. Aker
@ 2008-01-29 0:22 ` Christopher S. Aker
2008-01-29 0:40 ` Jeremy Fitzhardinge
1 sibling, 1 reply; 18+ messages in thread
From: Christopher S. Aker @ 2008-01-29 0:22 UTC
To: Jeremy Fitzhardinge; +Cc: virtualization
Jeremy Fitzhardinge wrote:
> Christopher S. Aker wrote on 12/26/07 2:33 PM,
>> Sorry for the noise if this isn't the appropriate venue for this. I
>> posted this last month to xen-devel:
>>
>> http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00777.html
>>
>> I can reliably cause a paravirt_ops Xen guest to hang during intensive
>> IO. My current recipe is an untar/tar loop, without compression, of a
>> kernel tree. For example:
>>
>> wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2
>> bzip2 -d linux-2.6.23.tar.bz2
>>
>> while true; do
>> date
>> tar xf linux-2.6.23.tar
>> tar cf linux-2.6.23.tar linux-2.6.23
>> done
>>
>> After a few loops, anything that touches the xvd device that hung will
>> get stuck in D state.
>>
>> This happens on both a 2.6.16 and 2.6.18 dom0 (3.1.2 tools). Paravirt
>> guests I've tried that exhibit the problem: 2.6.23.8, 2.6.23.12, and
>> 2.6.24-rc6. It does *not* occur using the Xensource 2.6.18 domU tree
>> from 3.1.2. In all cases, the host continues to run fine, nothing out
>> of the ordinary is logged on the dom0 side, xenstore reports the
>> status of the devices is fine.
>>
>> Can anyone reproduce this problem, or let me know what else I can
>> provide to help track this down?
>
> Hi,
>
> I'll try to track this down asap. Have you tried any other kernel
> versions? In other words, did it just start happening, or has it always
> done it? Also, could you try 2.6.24-rc6, just to make sure it hasn't
> already been fixed (which is possible if it's something that happened in
> a higher layer or something).
Were you able to give this a try? Still doing it on pv_ops 2.6.24.
Thanks,
-Chris
* Re: Xen paravirt frontend block hang
2008-01-29 0:22 ` Christopher S. Aker
@ 2008-01-29 0:40 ` Jeremy Fitzhardinge
2008-01-29 1:04 ` Christopher S. Aker
0 siblings, 1 reply; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-01-29 0:40 UTC
To: Christopher S. Aker; +Cc: xming, virtualization
Christopher S. Aker wrote:
>>
>> I'll try to track this down asap. Have you tried any other kernel
>> versions? In other words, did it just start happening, or has it always
>> done it? Also, could you try 2.6.24-rc6, just to make sure it hasn't
>> already been fixed (which is possible if it's something that happened in
>> a higher layer or something).
>
> Were you able to give this a try? Still doing it on pv_ops 2.6.24.
Hm. xming reported similar symptoms to your report, which turned out to
be a result of problems with events getting lost. This patch - which is
in 2.6.24 - resolved the issue. Stock 2.6.24 still has this problem for
you? Can you also reproduce it with lots of console output?
Thanks,
J
changeset: 76060:66bba82b6e9b
parent: 76048:e10ad8f96525
user: Jeremy Fitzhardinge <jeremy@goop.org>
date: Wed Jan 23 18:04:54 2008 -0800
files: arch/x86/xen/enlighten.c
description:
xen: disable vcpu_info placement for now
There have been several reports of Xen guest domains locking up when
using vcpu_info structure placement. Disable it for now.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
committer: Linus Torvalds <torvalds@woody.linux-foundation.org>
diff -r e10ad8f96525 -r 66bba82b6e9b arch/x86/xen/enlighten.c
--- a/arch/x86/xen/enlighten.c Wed Jan 23 09:58:55 2008 -0800
+++ b/arch/x86/xen/enlighten.c Wed Jan 23 18:04:54 2008 -0800
@@ -95,7 +95,7 @@ struct shared_info *HYPERVISOR_shared_in
*
* 0: not available, 1: available
*/
-static int have_vcpu_info_placement = 1;
+static int have_vcpu_info_placement = 0;
static void __init xen_vcpu_setup(int cpu)
{
* Re: Xen paravirt frontend block hang
2008-01-29 0:40 ` Jeremy Fitzhardinge
@ 2008-01-29 1:04 ` Christopher S. Aker
2008-02-06 12:37 ` xming
0 siblings, 1 reply; 18+ messages in thread
From: Christopher S. Aker @ 2008-01-29 1:04 UTC
To: Jeremy Fitzhardinge; +Cc: xming, virtualization
Jeremy Fitzhardinge wrote:
> Hm. xming reported similar symptoms to your report, which turned out to
> be a result of problems with events getting lost. This patch - which is
> in 2.6.24 - resolved the issue. Stock 2.6.24 still has this problem for
> you? Can you also reproduce it with lots of console output?
Looks like my tree has this changeset already (stock 2.6.24), so that's
not it :/ ... I'll try flooding the console with output.
For what it's worth, it does it on both non-smp and smp compiled domU
kernels. SMP kernels continue to function after triggering the hang;
however, one CPU gets stuck in iowait, though I am still able to
read from the xvd device.
Are you able to reproduce it with the shell script I provided? It takes
maybe 3 or 4 times through the loop to trigger.
-Chris
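The observation that one CPU sits in iowait while the others keep running can be checked by sampling the per-CPU counters in /proc/stat. A minimal sketch, assuming the usual Linux line format "cpuN user nice system idle iowait ...", where iowait is the fifth counter (cumulative clock ticks since boot); parse_cpu_iowait and print_iowait are hypothetical helper names.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Parse one per-CPU line of /proc/stat. Returns 1 and fills *cpu and
 * *iowait on success; returns 0 for the aggregate "cpu " line (no digit
 * right after "cpu") and for any other line. */
int parse_cpu_iowait(const char *line, int *cpu, unsigned long long *iowait)
{
    unsigned long long user, nice, sys, idle;

    if (strncmp(line, "cpu", 3) != 0 || !isdigit((unsigned char)line[3]))
        return 0;
    return sscanf(line, "cpu%d %llu %llu %llu %llu %llu",
                  cpu, &user, &nice, &sys, &idle, iowait) == 6;
}

/* Print the cumulative iowait ticks of every CPU; returns the number of
 * CPUs seen, or -1 if /proc/stat cannot be read. */
int print_iowait(void)
{
    FILE *f = fopen("/proc/stat", "r");
    if (f == NULL)
        return -1;

    char line[1024];
    int cpu, n = 0;
    unsigned long long iowait;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_cpu_iowait(line, &cpu, &iowait)) {
            printf("cpu%d iowait=%llu\n", cpu, iowait);
            n++;
        }
    }
    fclose(f);
    return n;
}
```

Because the counters are cumulative, the useful signal is the difference between two samples taken a few seconds apart: a CPU whose iowait grows by roughly the whole interval while the others stay flat would fit the symptom Chris describes.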
* Re: Xen paravirt frontend block hang
2008-01-29 1:04 ` Christopher S. Aker
@ 2008-02-06 12:37 ` xming
2008-02-07 4:09 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 18+ messages in thread
From: xming @ 2008-02-06 12:37 UTC
To: Christopher S. Aker; +Cc: virtualization
On Jan 29, 2008 2:04 AM, Christopher S. Aker <caker@theshore.net> wrote:
> Jeremy Fitzhardinge wrote:
> > Hm. xming reported similar symptoms to your report, which turned out to
> > be a result of problems with events getting lost. This patch - which is
> > in 2.6.24 - resolved the issue. Stock 2.6.24 still has this problem for
> > you? Can you also reproduce it with lots of console output?
>
> Looks like my tree has this changeset already (stock 2.6.24), so that's
> not it :/ ... I'll try flooding the console with output.
>
> For what it's worth, it does it on both non-smp and smp compiled domU
> kernels. SMP kernels continue to function after triggering the hang;
> however, one CPU gets stuck in iowait, though I am still able to
> read from the xvd device.
>
> Are you able to reproduce it with the shell script I provided? It takes
> maybe 3 or 4 times through the loop to trigger.
>
> -Chris
I cannot trigger this, either with my tests
(http://marc.info/?l=linux-kernel&m=120066005505315&w=2)
or with your test after 12 loops:
Wed Feb 6 13:20:41 CET 2008
Wed Feb 6 13:21:31 CET 2008
Wed Feb 6 13:22:39 CET 2008
Wed Feb 6 13:23:46 CET 2008
Wed Feb 6 13:24:53 CET 2008
Wed Feb 6 13:26:04 CET 2008
Wed Feb 6 13:27:13 CET 2008
Wed Feb 6 13:28:19 CET 2008
Wed Feb 6 13:29:31 CET 2008
Wed Feb 6 13:30:40 CET 2008
Wed Feb 6 13:31:52 CET 2008
Wed Feb 6 13:33:02 CET 2008
But I do have one problem since I upgraded to Xen 3.2: 2.6.23.x domUs no
longer boot, and 2.6.24 boots but hangs after cpufreq changes the
frequency.
x.
* Re: Xen paravirt frontend block hang
2008-02-06 12:37 ` xming
@ 2008-02-07 4:09 ` Jeremy Fitzhardinge
2008-02-07 14:12 ` xming
0 siblings, 1 reply; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-02-07 4:09 UTC
To: xming; +Cc: Keir Fraser, virtualization
xming wrote:
> But I do have one problem since I upgraded to Xen 3.2: 2.6.23.x domUs no
> longer boot, and 2.6.24 boots but hangs after cpufreq changes the
> frequency.
>
Interesting. Do you mean dom0 cpufreq frequency changes will cause the
domU to hang?
J
* Re: Xen paravirt frontend block hang
2008-02-07 4:09 ` Jeremy Fitzhardinge
@ 2008-02-07 14:12 ` xming
2008-02-28 20:03 ` Jeremy Fitzhardinge
2008-03-18 16:02 ` Jeremy Fitzhardinge
0 siblings, 2 replies; 18+ messages in thread
From: xming @ 2008-02-07 14:12 UTC
To: Jeremy Fitzhardinge; +Cc: Keir Fraser, virtualization
On Feb 7, 2008 5:09 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> xming wrote:
> > But I do have one problem since I upgraded to Xen 3.2: 2.6.23.x domUs no
> > longer boot, and 2.6.24 boots but hangs after cpufreq changes the
> > frequency.
> >
>
> Interesting. Do you mean dom0 cpufreq frequency changes will cause the
> domU to hang?
>
> J
>
Yes, Dom0 changing the frequency while the domU is busy triggers this.
With the "ondemand" governor it triggers very easily.
This is from xm top when a domU hangs:
test32 ------ 4018 98.8 131072 6.4 131072 6.4 1 1 4516 50087 1 0 433908 300403 3084907223
So it appears to be running (eating CPU); sometimes the state is "r",
sometimes "-", but both console and network are dead.
* Re: Xen paravirt frontend block hang
2007-12-26 19:33 Xen paravirt frontend block hang Christopher S. Aker
2007-12-28 23:14 ` Jeremy Fitzhardinge
@ 2008-02-28 20:00 ` Jeremy Fitzhardinge
[not found] ` <47C712EF.1060703@goop.org>
2 siblings, 0 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-02-28 20:00 UTC
To: Christopher S. Aker; +Cc: Xen-devel, Linux Kernel Mailing List, virtualization
[-- Attachment #1: Type: text/plain, Size: 1664 bytes --]
Christopher S. Aker wrote:
> Sorry for the noise if this isn't the appropriate venue for this. I
> posted this last month to xen-devel:
>
> http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00777.html
>
> I can reliably cause a paravirt_ops Xen guest to hang during intensive
> IO. My current recipe is an untar/tar loop, without compression, of a
> kernel tree. For example:
>
> wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2
> bzip2 -d linux-2.6.23.tar.bz2
>
> while true; do
> echo `date`
> tar xf linux-2.6.23.tar
> tar cf linux-2.6.23.tar linux-2.6.23
> done
>
> After a few loops, anything that touches the xvd device that hung will
> get stuck in D state.
I've been running this all night without seeing any problem. I'm using
current x86.git#testing with a few local patches, but nothing especially
relevant-looking.
Could you try the attached patch to see if it makes any difference?
J
>
> This happens on both a 2.6.16 and 2.6.18 dom0 (3.1.2 tools). Paravirt
> guests I've tried that exhibit the problem: 2.6.23.8, 2.6.23.12, and
> 2.6.24-rc6. It does *not* occur using the Xensource 2.6.18 domU tree
> from 3.1.2. In all cases, the host continues to run fine, nothing out
> of the ordinary is logged on the dom0 side, xenstore reports the
> status of the devices is fine.
>
> Can anyone reproduce this problem, or let me know what else I can
> provide to help track this down?
>
> Thanks,
> -Chris
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/virtualization
[-- Attachment #2: xen-indirect-iret.patch --]
[-- Type: text/x-patch, Size: 2429 bytes --]
Subject: xen: use iret instruction all the time
Change iret implementation to not be dependent on direct-access vcpu
structure.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
---
arch/x86/xen/enlighten.c | 3 +--
arch/x86/xen/xen-asm.S | 11 +++--------
arch/x86/xen/xen-ops.h | 2 +-
3 files changed, 5 insertions(+), 11 deletions(-)
===================================================================
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -860,7 +860,6 @@ void __init xen_setup_vcpu_info_placemen
pv_irq_ops.irq_disable = xen_irq_disable_direct;
pv_irq_ops.irq_enable = xen_irq_enable_direct;
pv_mmu_ops.read_cr2 = xen_read_cr2_direct;
- pv_cpu_ops.iret = xen_iret_direct;
}
}
@@ -964,7 +963,7 @@ static const struct pv_cpu_ops xen_cpu_o
.read_tsc = native_read_tsc,
.read_pmc = native_read_pmc,
- .iret = (void *)&hypercall_page[__HYPERVISOR_iret],
+ .iret = xen_iret,
.irq_enable_syscall_ret = NULL, /* never called */
.load_tr_desc = paravirt_nop,
===================================================================
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -130,13 +130,8 @@ ENDPATCH(xen_restore_fl_direct)
current stack state in whatever form its in, we keep things
simple by only using a single register which is pushed/popped
on the stack.
-
- Non-direct iret could be done in the same way, but it would
- require an annoying amount of code duplication. We'll assume
- that direct mode will be the common case once the hypervisor
- support becomes commonplace.
*/
-ENTRY(xen_iret_direct)
+ENTRY(xen_iret)
/* test eflags for special cases */
testl $(X86_EFLAGS_VM | XEN_EFLAGS_NMI), 8(%esp)
jnz hyper_iret
@@ -150,9 +145,9 @@ ENTRY(xen_iret_direct)
GET_THREAD_INFO(%eax)
movl TI_cpu(%eax),%eax
movl __per_cpu_offset(,%eax,4),%eax
- lea per_cpu__xen_vcpu_info(%eax),%eax
+ mov per_cpu__xen_vcpu(%eax),%eax
#else
- movl $per_cpu__xen_vcpu_info, %eax
+ movl per_cpu__xen_vcpu, %eax
#endif
/* check IF state we're restoring */
===================================================================
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -63,5 +63,5 @@ DECL_ASM(unsigned long, xen_save_fl_dire
DECL_ASM(unsigned long, xen_save_fl_direct, void);
DECL_ASM(void, xen_restore_fl_direct, unsigned long);
-void xen_iret_direct(void);
+void xen_iret(void);
#endif /* XEN_OPS_H */
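In C terms, the xen-asm.S hunk replaces "take the address of this CPU's placed vcpu_info copy" (`lea per_cpu__xen_vcpu_info`) with "load this CPU's vcpu_info pointer" (`mov per_cpu__xen_vcpu`), so xen_iret no longer depends on vcpu_info placement. The sketch below models only that difference; the struct, arrays, and functions are simplified stand-ins invented for illustration (the single field is named after the real evtchn_upcall_mask), not the kernel's definitions.

```c
#define NR_CPUS_SKETCH 2

/* Stand-in for Xen's per-vCPU info; only the field iret consults. */
struct vcpu_info_sketch {
    unsigned char evtchn_upcall_mask;   /* nonzero = event delivery masked */
};

/* Slots in the page shared with the hypervisor (always valid). */
static struct vcpu_info_sketch shared_slot[NR_CPUS_SKETCH];
/* Per-CPU copies, live only if placement was registered with Xen. */
static struct vcpu_info_sketch placed_copy[NR_CPUS_SKETCH];
/* Per-CPU pointer to whichever structure is live for that CPU. */
static struct vcpu_info_sketch *vcpu_ptr[NR_CPUS_SKETCH];

void setup_vcpu(int cpu, int have_placement)
{
    vcpu_ptr[cpu] = have_placement ? &placed_copy[cpu] : &shared_slot[cpu];
}

/* Old xen_iret_direct style: always uses the placed copy's address. */
struct vcpu_info_sketch *vcpu_direct(int cpu)
{
    return &placed_copy[cpu];
}

/* New xen_iret style: follow the pointer, correct in both modes. */
struct vcpu_info_sketch *vcpu_indirect(int cpu)
{
    return vcpu_ptr[cpu];
}
```

The direct form is only correct while placement is active, which is why the old code installed xen_iret_direct only from xen_setup_vcpu_info_placement; following the pointer removes that dependency at the cost of one extra load.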
* Re: Xen paravirt frontend block hang
2008-02-07 14:12 ` xming
@ 2008-02-28 20:03 ` Jeremy Fitzhardinge
2008-03-18 16:02 ` Jeremy Fitzhardinge
1 sibling, 0 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-02-28 20:03 UTC
To: xming; +Cc: Linux Kernel Mailing List, Keir Fraser, virtualization, Xen-devel
xming wrote:
> On Feb 7, 2008 5:09 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>> xming wrote:
>>
>>> But I do have one problem since I upgraded to Xen 3.2: 2.6.23.x domUs no
>>> longer boot, and 2.6.24 boots but hangs after cpufreq changes the
>>> frequency.
>>>
>>>
>> Interesting. Do you mean dom0 cpufreq frequency changes will cause the
>> domU to hang?
>>
>> J
>>
>>
>
> Yes, Dom0 changing the frequency while the domU is busy triggers this.
> With the "ondemand" governor it triggers very easily.
>
> This is from xm top when a domU hangs:
>
> test32 ------ 4018 98.8 131072 6.4 131072 6.4 1 1 4516 50087 1 0 433908 300403 3084907223
>
> So it appears to be running (eating CPU); sometimes the state is "r",
> sometimes "-", but both console and network are dead.
>
I haven't tried to repro this yet, but I suspect I won't be able to
because all my test machines have constant_tsc. Does your CPU change its
TSC rate when the processor speed changes?
J
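Whether a machine has constant_tsc can be read from the "flags" line of /proc/cpuinfo on x86 Linux. A small sketch, assuming that format; has_flag and cpu_has_constant_tsc are hypothetical names. Inside a Xen guest the flags shown may be filtered by the hypervisor, so the check is most meaningful on the host.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` occurs in `flags` as a whole whitespace-separated
 * token (so "constant_tsc" does not match inside a longer name). */
int has_flag(const char *flags, const char *flag)
{
    size_t n = strlen(flag);
    if (n == 0)
        return 0;
    for (const char *p = strstr(flags, flag); p != NULL; p = strstr(p + n, flag)) {
        int starts = (p == flags) || isspace((unsigned char)p[-1]);
        int ends = p[n] == '\0' || isspace((unsigned char)p[n]);
        if (starts && ends)
            return 1;
    }
    return 0;
}

/* 1 = constant_tsc listed, 0 = not listed, -1 = /proc/cpuinfo unreadable.
 * Only the first "flags" line (first CPU) is examined. */
int cpu_has_constant_tsc(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return -1;

    char line[4096];
    int found = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            found = has_flag(line, "constant_tsc");
            break;
        }
    }
    fclose(f);
    return found;
}
```

A result of 0 on a host running a frequency-scaling governor suggests the TSC rate may follow the CPU clock, which is the situation Jeremy's question is probing.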
* Re: Xen paravirt frontend block hang
[not found] ` <47C712EF.1060703@goop.org>
@ 2008-03-02 0:43 ` Christopher S. Aker
[not found] ` <47C9F818.4020200@theshore.net>
1 sibling, 0 replies; 18+ messages in thread
From: Christopher S. Aker @ 2008-03-02 0:43 UTC
To: Jeremy Fitzhardinge; +Cc: Xen-devel, Linux Kernel Mailing List, virtualization
Jeremy Fitzhardinge wrote:
> I've been running this all night without seeing any problem. I'm using
> current x86.git#testing with a few local patches, but nothing especially
> relevant-looking.
Meh .. what backend are you using? We're using LVM volumes exported
directly into the domUs like so:
disk =[ 'phy:vg1/xencaker-56392,xvda,w', ... ]
> Could you try the attached patch to see if it makes any difference?
Unfortunately we're still in the same place... pv_ops kernels are still
hanging after heavy disk IO:
works - 2.6.18.x (from xen-unstable)
hangs - 2.6.25-rc3-git3
hangs - 2.6.25-rc3-git3 + your patch
Any other suggestions or debugging I can provide that would be useful to
squash this?
-Chris
* Re: Xen paravirt frontend block hang
[not found] ` <47C9F818.4020200@theshore.net>
@ 2008-03-02 15:35 ` Jeremy Fitzhardinge
[not found] ` <47CAC931.1000107@goop.org>
1 sibling, 0 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-02 15:35 UTC
To: Christopher S. Aker; +Cc: Xen-devel, Linux Kernel Mailing List, virtualization
Christopher S. Aker wrote:
> Jeremy Fitzhardinge wrote:
>> I've been running this all night without seeing any problem. I'm
>> using current x86.git#testing with a few local patches, but nothing
>> especially relevant-looking.
>
> Meh .. what backend are you using? We're using LVM volumes exported
> directly into the domUs like so:
>
> disk =[ 'phy:vg1/xencaker-56392,xvda,w', ... ]
>
>> Could you try the attached patch to see if it makes any difference?
>
> Unfortunately we're still in the same place... pv_ops kernels are
> still hanging after heavy disk IO:
>
> works - 2.6.18.x (from xen-unstable)
> hangs - 2.6.25-rc3-git3
> hangs - 2.6.25-rc3-git3 + your patch
>
> Any other suggestions or debugging I can provide that would be useful
> to squash this?
Are you running an SMP or UP domain? I found I could get hangs very
easily with UP (but I need to confirm it isn't a result of some other very
experimental patches).
J
* Re: Xen paravirt frontend block hang
[not found] ` <47CAC931.1000107@goop.org>
@ 2008-03-02 16:03 ` Christopher S. Aker
[not found] ` <47CACFBE.5010007@theshore.net>
1 sibling, 0 replies; 18+ messages in thread
From: Christopher S. Aker @ 2008-03-02 16:03 UTC
To: Jeremy Fitzhardinge; +Cc: Xen-devel, Linux Kernel Mailing List, virtualization
Jeremy Fitzhardinge wrote:
> Are you running an SMP or UP domain? I found I could get hangs very
> easily with UP (but I need to confirm it isn't a result of some other very
> experimental patches).
The hang occurs with both SMP and UP compiled pv_ops kernels. SMP
kernels are still slightly responsive after the hang occurs, which makes
me think only one proc gets stuck at a time, not the entire kernel.
-Chris
* Re: [Xen-devel] Re: Xen paravirt frontend block hang
[not found] ` <47CACFBE.5010007@theshore.net>
@ 2008-03-18 16:01 ` Jeremy Fitzhardinge
[not found] ` <47DFE75B.7080404@goop.org>
1 sibling, 0 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-18 16:01 UTC
To: Christopher S. Aker
Cc: xming, Xen-devel, Linux Kernel Mailing List, virtualization
Christopher S. Aker wrote:
> Jeremy Fitzhardinge wrote:
>> Are you running an SMP or UP domain? I found I could get hangs very
>> easily with UP (but I need to confirm it isn't a result of some other
>> very experimental patches).
>
> The hang occurs with both SMP and UP compiled pv_ops kernels. SMP
> kernels are still slightly responsive after the hang occurs, which
> makes me think only one proc gets stuck at a time, not the entire kernel.
The patch I posted yesterday - "xen: fix RMW when unmasking events" -
should definitively fix the hanging-under-load bugs (I hope). The
problem came from returning to userspace with pending events, which
would leave them hanging around on the vcpu unprocessed, and eventually
everything would deadlock. This was caused by using an unlocked
read-modify-write operation on the event pending flag - which can be set
by another (real) cpu - meaning that the pending event wasn't noticed
until too late. It would only be a problem on an SMP host.
The patch should back-apply to 2.6.24.
J
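The lost-update pattern described above can be reproduced deterministically in a few lines. This is a generic sketch, not the actual Xen events code or the "xen: fix RMW when unmasking events" patch: the bit names are invented, the "other CPU" is simulated by a function call placed inside the read-modify-write window, and C11 atomics stand in for whatever locked operations the kernel uses.

```c
#include <stdatomic.h>

#define PENDING_BIT 0x1u  /* set asynchronously by another (real) CPU */
#define MASK_BIT    0x2u  /* cleared by this vCPU when it unmasks     */

/* Simulated other CPU posting an event. */
static void other_cpu_posts_event(unsigned *word)
{
    *word |= PENDING_BIT;
}

/* Unlocked read-modify-write with the other CPU's store forced into the
 * window between our load and our store. */
unsigned racy_unmask(unsigned word)
{
    unsigned snapshot = word;        /* 1. load                            */
    other_cpu_posts_event(&word);    /* 2. other CPU sets PENDING_BIT      */
    word = snapshot & ~MASK_BIT;     /* 3. store stale value: PENDING lost */
    return word;
}

/* Both updates done as atomic read-modify-writes: whichever order they
 * execute in, neither can overwrite the other's bit. */
unsigned atomic_unmask(unsigned word)
{
    _Atomic unsigned w = word;
    atomic_fetch_or(&w, PENDING_BIT);   /* other CPU posts an event */
    atomic_fetch_and(&w, ~MASK_BIT);    /* this vCPU unmasks        */
    return atomic_load(&w);
}
```

In the racy version the vCPU ends up unmasked with no pending bit, so nothing prompts it to process the event. The window only opens when a second physical CPU can store concurrently, which fits the note that the bug only bites on SMP hosts.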
* Re: Xen paravirt frontend block hang
2008-02-07 14:12 ` xming
2008-02-28 20:03 ` Jeremy Fitzhardinge
@ 2008-03-18 16:02 ` Jeremy Fitzhardinge
1 sibling, 0 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-18 16:02 UTC
To: xming; +Cc: Keir Fraser, virtualization
xming wrote:
> On Feb 7, 2008 5:09 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>> xming wrote:
>>
>>> But I do have one problem since I upgraded to Xen 3.2: 2.6.23.x domUs no
>>> longer boot, and 2.6.24 boots but hangs after cpufreq changes the
>>> frequency.
>>>
>>>
>> Interesting. Do you mean dom0 cpufreq frequency changes will cause the
>> domU to hang?
>>
>> J
>>
>>
>
> Yes, Dom0 changing the frequency while the domU is busy triggers this.
> With the "ondemand" governor it triggers very easily.
>
> This is from xm top when a domU hangs:
>
> test32 ------ 4018 98.8 131072 6.4 131072 6.4 1 1 4516 50087 1 0 433908 300403 3084907223
>
> So it appears to be running (eating CPU); sometimes the state is "r",
> sometimes "-", but both console and network are dead.
>
Which version of Xen did you try this on? Some versions of xen-unstable
had horribly broken cpufreq support, in which it was failing to keep
track of cpu speed changes. Current versions should be OK.
J
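To see why failing to track speed changes matters: without constant_tsc the TSC rate follows the CPU clock, and any cycles-to-time conversion that keeps using the boot-time frequency is off by the ratio of the two rates. The model below is deliberately simplified (Xen's real conversion uses a fixed-point multiply and shift published to the guest), and the function name cycles_to_ns is hypothetical.

```c
#include <stdint.h>

/* Convert a TSC delta to nanoseconds given the TSC rate in kHz:
 * cycles / (kHz * 1000) seconds == cycles * 1e6 / kHz nanoseconds.
 * Fine for short deltas; the product overflows past ~1.8e13 cycles. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t tsc_khz)
{
    return cycles * 1000000u / tsc_khz;
}
```

For example, with a calibration of 2,000,000 kHz (2 GHz) and a CPU that has dropped to 1 GHz, one real second produces 1,000,000,000 cycles, which cycles_to_ns(1000000000, 2000000) turns into 500,000,000 ns: time would appear to pass at half speed until the frequency is updated.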
* Re: [Xen-devel] Re: Xen paravirt frontend block hang
[not found] ` <47DFE75B.7080404@goop.org>
@ 2008-03-25 1:37 ` Christopher S. Aker
0 siblings, 0 replies; 18+ messages in thread
From: Christopher S. Aker @ 2008-03-25 1:37 UTC
To: Jeremy Fitzhardinge
Cc: xming, Xen-devel, Linux Kernel Mailing List, virtualization
Jeremy Fitzhardinge wrote:
> Christopher S. Aker wrote:
>> Jeremy Fitzhardinge wrote:
>>> Are you running an SMP or UP domain? I found I could get hangs very
>>> easily with UP (but I need to confirm it isn't a result of some other
>>> very experimental patches).
>>
>> The hang occurs with both SMP and UP compiled pv_ops kernels. SMP
>> kernels are still slightly responsive after the hang occurs, which
>> makes me think only one proc gets stuck at a time, not the entire kernel.
>
> The patch I posted yesterday - "xen: fix RMW when unmasking events" -
> should definitively fix the hanging-under-load bugs (I hope).
Confirmed-by: caker@theshore.net
Nice work!
-Chris
end of thread
Thread overview: 18+ messages
2007-12-26 19:33 Xen paravirt frontend block hang Christopher S. Aker
2007-12-28 23:14 ` Jeremy Fitzhardinge
2007-12-29 1:12 ` Christopher S. Aker
2007-12-29 6:00 ` Jeremy Fitzhardinge
2008-01-29 0:22 ` Christopher S. Aker
2008-01-29 0:40 ` Jeremy Fitzhardinge
2008-01-29 1:04 ` Christopher S. Aker
2008-02-06 12:37 ` xming
2008-02-07 4:09 ` Jeremy Fitzhardinge
2008-02-07 14:12 ` xming
2008-02-28 20:03 ` Jeremy Fitzhardinge
2008-03-18 16:02 ` Jeremy Fitzhardinge
2008-02-28 20:00 ` Jeremy Fitzhardinge
[not found] ` <47C712EF.1060703@goop.org>
2008-03-02 0:43 ` Christopher S. Aker
[not found] ` <47C9F818.4020200@theshore.net>
2008-03-02 15:35 ` Jeremy Fitzhardinge
[not found] ` <47CAC931.1000107@goop.org>
2008-03-02 16:03 ` Christopher S. Aker
[not found] ` <47CACFBE.5010007@theshore.net>
2008-03-18 16:01 ` [Xen-devel] " Jeremy Fitzhardinge
[not found] ` <47DFE75B.7080404@goop.org>
2008-03-25 1:37 ` Christopher S. Aker