* Xen paravirt frontend block hang
@ 2007-12-26 19:33 Christopher S. Aker
2007-12-28 23:14 ` Jeremy Fitzhardinge
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Christopher S. Aker @ 2007-12-26 19:33 UTC (permalink / raw)
To: virtualization
Sorry for the noise if this isn't the appropriate venue for this. I
posted this last month to xen-devel:
http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00777.html
I can reliably cause a paravirt_ops Xen guest to hang during intensive
IO. My current recipe is an untar/tar loop, without compression, of a
kernel tree. For example:
wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2
bzip2 -d linux-2.6.23.tar.bz2
while true; do
    echo `date`
    tar xf linux-2.6.23.tar
    tar cf linux-2.6.23.tar linux-2.6.23
done
After a few loops, anything that touches the xvd device that hung will
get stuck in D state.
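One way to confirm this symptom from inside the guest is to look for tasks whose state field in /proc/<pid>/stat is D (uninterruptible sleep). The C sketch below does that. It is only an illustration: the function names are made up, it assumes the usual Linux procfs layout, and it looks only at thread-group leaders, not at individual threads.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Extract the state field from one /proc/<pid>/stat line.
 * The command name is wrapped in parentheses and may itself contain
 * spaces or ')', so the state is the first non-space character after
 * the LAST ')'. Returns 0 if the line cannot be parsed. */
char task_state_from_stat(const char *line)
{
    const char *p = strrchr(line, ')');

    if (p == NULL)
        return 0;
    p++;
    while (*p == ' ')
        p++;
    return *p;
}

/* Print every process currently in D state.
 * Returns the number found, or -1 if /proc cannot be opened. */
int list_d_state_tasks(void)
{
    DIR *proc = opendir("/proc");
    struct dirent *ent;
    int count = 0;

    if (proc == NULL)
        return -1;
    while ((ent = readdir(proc)) != NULL) {
        char path[300];
        char line[512];
        FILE *f;

        /* Only numeric directory names are process IDs. */
        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;
        snprintf(path, sizeof path, "/proc/%s/stat", ent->d_name);
        f = fopen(path, "r");
        if (f == NULL)
            continue; /* the process exited while we were scanning */
        if (fgets(line, sizeof line, f) != NULL &&
            task_state_from_stat(line) == 'D') {
            printf("pid %s is in D state\n", ent->d_name);
            count++;
        }
        fclose(f);
    }
    closedir(proc);
    return count;
}
```

Calling list_d_state_tasks() from a small main() while the tar loop is stuck should list the hung tar process and anything else that touched the affected xvd device.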
This happens on both a 2.6.16 and 2.6.18 dom0 (3.1.2 tools). Paravirt
guests I've tried that exhibit the problem: 2.6.23.8, 2.6.23.12, and
2.6.24-rc6. It does *not* occur using the Xensource 2.6.18 domU tree
from 3.1.2. In all cases, the host continues to run fine, nothing out
of the ordinary is logged on the dom0 side, xenstore reports the status
of the devices is fine.
Can anyone reproduce this problem, or let me know what else I can
provide to help track this down?
Thanks,
-Chris
* Re: Xen paravirt frontend block hang
2007-12-26 19:33 Xen paravirt frontend block hang Christopher S. Aker
@ 2007-12-28 23:14 ` Jeremy Fitzhardinge
2007-12-29 1:12 ` Christopher S. Aker
2008-01-29 0:22 ` Christopher S. Aker
2008-02-28 20:00 ` Jeremy Fitzhardinge
[not found] ` <47C712EF.1060703@goop.org>
2 siblings, 2 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2007-12-28 23:14 UTC (permalink / raw)
To: Christopher S. Aker; +Cc: virtualization

Christopher S. Aker wrote:
> Sorry for the noise if this isn't the appropriate venue for this. I
> posted this last month to xen-devel:
>
> http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00777.html
>
> I can reliably cause a paravirt_ops Xen guest to hang during intensive
> IO. My current recipe is an untar/tar loop, without compression, of a
> kernel tree. For example:
>
> wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2
> bzip2 -d linux-2.6.23.tar.bz2
>
> while true; do
>     echo `date`
>     tar xf linux-2.6.23.tar
>     tar cf linux-2.6.23.tar linux-2.6.23
> done
>
> After a few loops, anything that touches the xvd device that hung will
> get stuck in D state.
>
> This happens on both a 2.6.16 and 2.6.18 dom0 (3.1.2 tools). Paravirt
> guests I've tried that exhibit the problem: 2.6.23.8, 2.6.23.12, and
> 2.6.24-rc6. It does *not* occur using the Xensource 2.6.18 domU tree
> from 3.1.2. In all cases, the host continues to run fine, nothing out
> of the ordinary is logged on the dom0 side, xenstore reports the
> status of the devices is fine.
>
> Can anyone reproduce this problem, or let me know what else I can
> provide to help track this down?

Hi,

I'll try to track this down asap. Have you tried any other kernel
versions? In other words, did it just start happening, or has it always
done it?

Also, could you try 2.6.24-rc6, just to make sure it hasn't already been
fixed (which is possible if it's something that happened in a higher
layer or something).

Thanks,
J
* Re: Xen paravirt frontend block hang
2007-12-28 23:14 ` Jeremy Fitzhardinge
@ 2007-12-29 1:12 ` Christopher S. Aker
2007-12-29 6:00 ` Jeremy Fitzhardinge
2008-01-29 0:22 ` Christopher S. Aker
1 sibling, 1 reply; 18+ messages in thread
From: Christopher S. Aker @ 2007-12-29 1:12 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: virtualization

Jeremy Fitzhardinge wrote:
> I'll try to track this down asap. Have you tried any other kernel
> versions? In other words, did it just start happening, or has it always
> done it? Also, could you try 2.6.24-rc6, just to make sure it hasn't
> already been fixed (which is possible if it's something that happened in
> a higher layer or something).

I've only just recently started using the paravirt_ops kernels, but all
the ones I've tried have exhibited the problem, so I'm not sure when the
hang was introduced. It doesn't happen with a xensource 2.6.18 tree domU.

2.6.23.8, 2.6.23.12, and 2.6.24-rc6 <-- pv_ops kernels I've tried that
have the hang.

Thanks!
-Chris
* Re: Xen paravirt frontend block hang
2007-12-29 1:12 ` Christopher S. Aker
@ 2007-12-29 6:00 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2007-12-29 6:00 UTC (permalink / raw)
To: Christopher S. Aker; +Cc: virtualization

Christopher S. Aker wrote:
> Jeremy Fitzhardinge wrote:
>> I'll try to track this down asap. Have you tried any other kernel
>> versions? In other words, did it just start happening, or has it always
>> done it? Also, could you try 2.6.24-rc6, just to make sure it hasn't
>> already been fixed (which is possible if it's something that happened in
>> a higher layer or something).
>
> I've only just recently started using the paravirt_ops kernels, but
> all the ones I've tried have exhibited the problem, so I'm not sure
> when the hang was introduced. It doesn't happen with a xensource
> 2.6.18 tree domU.
>
> 2.6.23.8, 2.6.23.12, and 2.6.24-rc6 <-- pv_ops kernels I've tried that
> have the hang.

Yeah, it's all fairly new code. But the pvops blockfront shouldn't have
any functional differences from the 2.6.18 version of the driver, so
something seems to have got broken along the way.

J
* Re: Xen paravirt frontend block hang
2007-12-28 23:14 ` Jeremy Fitzhardinge
2007-12-29 1:12 ` Christopher S. Aker
@ 2008-01-29 0:22 ` Christopher S. Aker
2008-01-29 0:40 ` Jeremy Fitzhardinge
1 sibling, 1 reply; 18+ messages in thread
From: Christopher S. Aker @ 2008-01-29 0:22 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: virtualization

Jeremy Fitzhardinge wrote:
> Christopher S. Aker wrote on 12/26/07 2:33 PM,
>> Sorry for the noise if this isn't the appropriate venue for this. I
>> posted this last month to xen-devel:
>>
>> http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00777.html
>>
>> I can reliably cause a paravirt_ops Xen guest to hang during intensive
>> IO. My current recipe is an untar/tar loop, without compression, of a
>> kernel tree. For example:
>>
>> wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2
>> bzip2 -d linux-2.6.23.tar.bz2
>>
>> while true; do
>>     date
>>     tar xf linux-2.6.23.tar
>>     tar cf linux-2.6.23.tar linux-2.6.23
>> done
>>
>> After a few loops, anything that touches the xvd device that hung will
>> get stuck in D state.
>>
>> This happens on both a 2.6.16 and 2.6.18 dom0 (3.1.2 tools). Paravirt
>> guests I've tried that exhibit the problem: 2.6.23.8, 2.6.23.12, and
>> 2.6.24-rc6. It does *not* occur using the Xensource 2.6.18 domU tree
>> from 3.1.2. In all cases, the host continues to run fine, nothing out
>> of the ordinary is logged on the dom0 side, xenstore reports the
>> status of the devices is fine.
>>
>> Can anyone reproduce this problem, or let me know what else I can
>> provide to help track this down?
>
> Hi,
>
> I'll try to track this down asap. Have you tried any other kernel
> versions? In other words, did it just start happening, or has it always
> done it? Also, could you try 2.6.24-rc6, just to make sure it hasn't
> already been fixed (which is possible if it's something that happened in
> a higher layer or something).

Were you able to give this a try? Still doing it on pv_ops 2.6.24.

Thanks,
-Chris
* Re: Xen paravirt frontend block hang
2008-01-29 0:22 ` Christopher S. Aker
@ 2008-01-29 0:40 ` Jeremy Fitzhardinge
2008-01-29 1:04 ` Christopher S. Aker
0 siblings, 1 reply; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-01-29 0:40 UTC (permalink / raw)
To: Christopher S. Aker; +Cc: xming, virtualization

Christopher S. Aker wrote:
>>
>> I'll try to track this down asap. Have you tried any other kernel
>> versions? In other words, did it just start happening, or has it always
>> done it? Also, could you try 2.6.24-rc6, just to make sure it hasn't
>> already been fixed (which is possible if it's something that happened in
>> a higher layer or something).
>
> Were you able to give this a try? Still doing it on pv_ops 2.6.24.

Hm. xming reported similar symptoms to your report, which turned out to
be a result of problems with events getting lost. This patch - which is
in 2.6.24 - resolved the issue. Stock 2.6.24 still has this problem for
you? Can you also reproduce it with lots of console output?

Thanks,
J

changeset: 76060:66bba82b6e9b
parent: 76048:e10ad8f96525
user: Jeremy Fitzhardinge <jeremy@goop.org>
date: Wed Jan 23 18:04:54 2008 -0800
files: arch/x86/xen/enlighten.c
description:
xen: disable vcpu_info placement for now

There have been several reports of Xen guest domains locking up when
using vcpu_info structure placement. Disable it for now.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

committer: Linus Torvalds <torvalds@woody.linux-foundation.org>

diff -r e10ad8f96525 -r 66bba82b6e9b arch/x86/xen/enlighten.c
--- a/arch/x86/xen/enlighten.c Wed Jan 23 09:58:55 2008 -0800
+++ b/arch/x86/xen/enlighten.c Wed Jan 23 18:04:54 2008 -0800
@@ -95,7 +95,7 @@ struct shared_info *HYPERVISOR_shared_in
  *
  * 0: not available, 1: available
  */
-static int have_vcpu_info_placement = 1;
+static int have_vcpu_info_placement = 0;
 
 static void __init xen_vcpu_setup(int cpu)
 {
* Re: Xen paravirt frontend block hang
2008-01-29 0:40 ` Jeremy Fitzhardinge
@ 2008-01-29 1:04 ` Christopher S. Aker
2008-02-06 12:37 ` xming
0 siblings, 1 reply; 18+ messages in thread
From: Christopher S. Aker @ 2008-01-29 1:04 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: xming, virtualization

Jeremy Fitzhardinge wrote:
> Hm. xming reported similar symptoms to your report, which turned out to
> be a result of problems with events getting lost. This patch - which is
> in 2.6.24 - resolved the issue. Stock 2.6.24 still has this problem for
> you? Can you also reproduce it with lots of console output?

Looks like my tree has this changeset already (stock 2.6.24), so that's
not it :/ ... I'll try flooding the console with output.

For what it's worth, it happens with both non-SMP and SMP compiled domU
kernels. SMP kernels continue to function after triggering the hang;
however, one CPU gets stuck in iowait -- but I am able to continue to
read from the xvd device.

Are you able to reproduce it with the shell script I provided? It takes
maybe 3 or 4 times through the loop to trigger.

-Chris
* Re: Xen paravirt frontend block hang
2008-01-29 1:04 ` Christopher S. Aker
@ 2008-02-06 12:37 ` xming
2008-02-07 4:09 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 18+ messages in thread
From: xming @ 2008-02-06 12:37 UTC (permalink / raw)
To: Christopher S. Aker; +Cc: virtualization

On Jan 29, 2008 2:04 AM, Christopher S. Aker <caker@theshore.net> wrote:
> Jeremy Fitzhardinge wrote:
> > Hm. xming reported similar symptoms to your report, which turned out to
> > be a result of problems with events getting lost. This patch - which is
> > in 2.6.24 - resolved the issue. Stock 2.6.24 still has this problem for
> > you? Can you also reproduce it with lots of console output?
>
> Looks like my tree has this changeset already (stock 2.6.24), so that's
> not it :/ ... I'll try flooding the console with output.
>
> For what it's worth, it does it on both non-smp and smp compiled domU
> kernels. smp kernels continue to function after triggering the hang,
> however one CPU gets stuck in iowait -- but I am able to continue to
> read from the xvd device.
>
> Are you able to reproduce it with the shell script I provided? It takes
> maybe 3 or 4 times through the loop to trigger.
>
> -Chris

I cannot trigger this, either with my tests
(http://marc.info/?l=linux-kernel&m=120066005505315&w=2) or with your
test, after 12 loops:

Wed Feb 6 13:20:41 CET 2008
Wed Feb 6 13:21:31 CET 2008
Wed Feb 6 13:22:39 CET 2008
Wed Feb 6 13:23:46 CET 2008
Wed Feb 6 13:24:53 CET 2008
Wed Feb 6 13:26:04 CET 2008
Wed Feb 6 13:27:13 CET 2008
Wed Feb 6 13:28:19 CET 2008
Wed Feb 6 13:29:31 CET 2008
Wed Feb 6 13:30:40 CET 2008
Wed Feb 6 13:31:52 CET 2008
Wed Feb 6 13:33:02 CET 2008

But I do have one problem since I upgraded to xen 3.2: the 2.6.23.x domU
does not boot any more, and 2.6.24 does boot but will hang after cpufreq
changes the frequency.

x.
* Re: Xen paravirt frontend block hang
2008-02-06 12:37 ` xming
@ 2008-02-07 4:09 ` Jeremy Fitzhardinge
2008-02-07 14:12 ` xming
0 siblings, 1 reply; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-02-07 4:09 UTC (permalink / raw)
To: xming; +Cc: Keir Fraser, virtualization

xming wrote:
> But I do have one problem after I upgraded to xen 3.2, the 2.6.23.x domU do
> not boot any more and 2.6.24 does boot but will hang after cpufreq changes
> the frequency.
>

Interesting. Do you mean dom0 cpufreq frequency changes will cause the
domU to hang?

J
* Re: Xen paravirt frontend block hang
2008-02-07 4:09 ` Jeremy Fitzhardinge
@ 2008-02-07 14:12 ` xming
2008-02-28 20:03 ` Jeremy Fitzhardinge
2008-03-18 16:02 ` Jeremy Fitzhardinge
0 siblings, 2 replies; 18+ messages in thread
From: xming @ 2008-02-07 14:12 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Keir Fraser, virtualization

On Feb 7, 2008 5:09 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> xming wrote:
> > But I do have one problem after I upgraded to xen 3.2, the 2.6.23.x domU do
> > not boot any more and 2.6.24 does boot but will hang after cpufreq changes
> > the frequency.
> >
>
> Interesting. Do you mean dom0 cpufreq frequency changes will cause the
> domU to hang?
>
> J
>

Yes, Dom0 changing the frequency while the domU is doing something will
trigger this. Using "on demand" triggers it very easily.

This is from xm top when a domU hangs:

test32 ------ 4018 98.8 131072 6.4 131072 6.4 1 1 4516 50087 1 0 433908 300403 3084907223

So it appears to be running (eating CPU); sometimes the state is "r",
sometimes "-", but both console and network are dead.
* Re: Xen paravirt frontend block hang
2008-02-07 14:12 ` xming
@ 2008-02-28 20:03 ` Jeremy Fitzhardinge
2008-03-18 16:02 ` Jeremy Fitzhardinge
1 sibling, 0 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-02-28 20:03 UTC (permalink / raw)
To: xming; +Cc: Linux Kernel Mailing List, Keir Fraser, virtualization, Xen-devel

xming wrote:
> On Feb 7, 2008 5:09 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>> xming wrote:
>>
>>> But I do have one problem after I upgraded to xen 3.2, the 2.6.23.x domU do
>>> not boot any more and 2.6.24 does boot but will hang after cpufreq changes
>>> the frequency.
>>>
>> Interesting. Do you mean dom0 cpufreq frequency changes will cause the
>> domU to hang?
>>
>> J
>>
>
> Yes, when Dom0 changes freq while domU is doing something will trigger this.
> When using "on demand" will trigger this very easily.
>
> This is from xm top when a domU hangs:
>
> test32 ------ 4018 98.8 131072 6.4 131072 6.4 1 1 4516 50087 1 0 433908 300403 3084907223
>
> So it appears to be running (eating CPU) sometimes the state is "r"
> sometimes "-",
> but both console and network are dead.
>

I haven't tried to repro this yet, but I suspect I won't be able to,
because all my test machines have constant_tsc. Does your CPU change its
TSC rate on processor speed changes?

J
* Re: Xen paravirt frontend block hang
2008-02-07 14:12 ` xming
2008-02-28 20:03 ` Jeremy Fitzhardinge
@ 2008-03-18 16:02 ` Jeremy Fitzhardinge
1 sibling, 0 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-18 16:02 UTC (permalink / raw)
To: xming; +Cc: Keir Fraser, virtualization

xming wrote:
> On Feb 7, 2008 5:09 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>> xming wrote:
>>
>>> But I do have one problem after I upgraded to xen 3.2, the 2.6.23.x domU do
>>> not boot any more and 2.6.24 does boot but will hang after cpufreq changes
>>> the frequency.
>>>
>> Interesting. Do you mean dom0 cpufreq frequency changes will cause the
>> domU to hang?
>>
>> J
>>
>
> Yes, when Dom0 changes freq while domU is doing something will trigger this.
> When using "on demand" will trigger this very easily.
>
> This is from xm top when a domU hangs:
>
> test32 ------ 4018 98.8 131072 6.4 131072 6.4 1 1 4516 50087 1 0 433908 300403 3084907223
>
> So it appears to be running (eating CPU) sometimes the state is "r"
> sometimes "-",
> but both console and network are dead.
>

Which version of Xen did you try this on? Some versions of xen-unstable
had horribly broken cpufreq support, in which it was failing to keep
track of cpu speed changes. Current versions should be OK.

J
* Re: Xen paravirt frontend block hang
2007-12-26 19:33 Xen paravirt frontend block hang Christopher S. Aker
2007-12-28 23:14 ` Jeremy Fitzhardinge
@ 2008-02-28 20:00 ` Jeremy Fitzhardinge
[not found] ` <47C712EF.1060703@goop.org>
2 siblings, 0 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-02-28 20:00 UTC (permalink / raw)
To: Christopher S. Aker; +Cc: Xen-devel, Linux Kernel Mailing List, virtualization

[-- Attachment #1: Type: text/plain, Size: 1664 bytes --]

Christopher S. Aker wrote:
> Sorry for the noise if this isn't the appropriate venue for this. I
> posted this last month to xen-devel:
>
> http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00777.html
>
> I can reliably cause a paravirt_ops Xen guest to hang during intensive
> IO. My current recipe is an untar/tar loop, without compression, of a
> kernel tree. For example:
>
> wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2
> bzip2 -d linux-2.6.23.tar.bz2
>
> while true; do
>     echo `date`
>     tar xf linux-2.6.23.tar
>     tar cf linux-2.6.23.tar linux-2.6.23
> done
>
> After a few loops, anything that touches the xvd device that hung will
> get stuck in D state.

I've been running this all night without seeing any problem. I'm using
current x86.git#testing with a few local patches, but nothing especially
relevant-looking.

Could you try the attached patch to see if it makes any difference?

J

>
> This happens on both a 2.6.16 and 2.6.18 dom0 (3.1.2 tools). Paravirt
> guests I've tried that exhibit the problem: 2.6.23.8, 2.6.23.12, and
> 2.6.24-rc6. It does *not* occur using the Xensource 2.6.18 domU tree
> from 3.1.2. In all cases, the host continues to run fine, nothing out
> of the ordinary is logged on the dom0 side, xenstore reports the
> status of the devices is fine.
>
> Can anyone reproduce this problem, or let me know what else I can
> provide to help track this down?
>
> Thanks,
> -Chris
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/virtualization

[-- Attachment #2: xen-indirect-iret.patch --]
[-- Type: text/x-patch, Size: 2429 bytes --]

Subject: xen: use iret instruction all the time

Change iret implementation to not be dependent on direct-access vcpu
structure.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
---
 arch/x86/xen/enlighten.c | 3 +--
 arch/x86/xen/xen-asm.S | 11 +++--------
 arch/x86/xen/xen-ops.h | 2 +-
 3 files changed, 5 insertions(+), 11 deletions(-)

===================================================================
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -860,7 +860,6 @@ void __init xen_setup_vcpu_info_placemen
 		pv_irq_ops.irq_disable = xen_irq_disable_direct;
 		pv_irq_ops.irq_enable = xen_irq_enable_direct;
 		pv_mmu_ops.read_cr2 = xen_read_cr2_direct;
-		pv_cpu_ops.iret = xen_iret_direct;
 	}
 }
 
@@ -964,7 +963,7 @@ static const struct pv_cpu_ops xen_cpu_o
 	.read_tsc = native_read_tsc,
 	.read_pmc = native_read_pmc,
 
-	.iret = (void *)&hypercall_page[__HYPERVISOR_iret],
+	.iret = xen_iret,
 	.irq_enable_syscall_ret = NULL, /* never called */
 
 	.load_tr_desc = paravirt_nop,
===================================================================
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -130,13 +130,8 @@ ENDPATCH(xen_restore_fl_direct)
 	current stack state in whatever form its in, we keep things
 	simple by only using a single register which is pushed/popped
 	on the stack.
-
-	Non-direct iret could be done in the same way, but it would
-	require an annoying amount of code duplication. We'll assume
-	that direct mode will be the common case once the hypervisor
-	support becomes commonplace.
  */
-ENTRY(xen_iret_direct)
+ENTRY(xen_iret)
 	/* test eflags for special cases */
 	testl $(X86_EFLAGS_VM | XEN_EFLAGS_NMI), 8(%esp)
 	jnz hyper_iret
@@ -150,9 +145,9 @@ ENTRY(xen_iret_direct)
 	GET_THREAD_INFO(%eax)
 	movl TI_cpu(%eax),%eax
 	movl __per_cpu_offset(,%eax,4),%eax
-	lea per_cpu__xen_vcpu_info(%eax),%eax
+	mov per_cpu__xen_vcpu(%eax),%eax
 #else
-	movl $per_cpu__xen_vcpu_info, %eax
+	movl per_cpu__xen_vcpu, %eax
 #endif
 
 	/* check IF state we're restoring */
===================================================================
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -63,5 +63,5 @@ DECL_ASM(unsigned long, xen_save_fl_dire
 DECL_ASM(unsigned long, xen_save_fl_direct, void);
 DECL_ASM(void, xen_restore_fl_direct, unsigned long);
 
-void xen_iret_direct(void);
+void xen_iret(void);
 #endif /* XEN_OPS_H */

[-- Attachment #3: Type: text/plain, Size: 184 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization
[parent not found: <47C712EF.1060703@goop.org>]
* Re: Xen paravirt frontend block hang
[not found] ` <47C712EF.1060703@goop.org>
@ 2008-03-02 0:43 ` Christopher S. Aker
[not found] ` <47C9F818.4020200@theshore.net>
1 sibling, 0 replies; 18+ messages in thread
From: Christopher S. Aker @ 2008-03-02 0:43 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Xen-devel, Linux Kernel Mailing List, virtualization

Jeremy Fitzhardinge wrote:
> I've been running this all night without seeing any problem. I'm using
> current x86.git#testing with a few local patches, but nothing especially
> relevant-looking.

Meh .. what backend are you using? We're using LVM volumes exported
directly into the domUs like so:

disk = [ 'phy:vg1/xencaker-56392,xvda,w', ... ]

> Could you try the attached patch to see if it makes any difference?

Unfortunately we're still in the same place... pv_ops kernels are still
hanging after heavy disk IO:

works - 2.6.18.x (from xen-unstable)
hangs - 2.6.25-rc3-git3
hangs - 2.6.25-rc3-git3 + your patch

Any other suggestions or debugging I can provide that would be useful to
squash this?

-Chris
[parent not found: <47C9F818.4020200@theshore.net>]
* Re: Xen paravirt frontend block hang
[not found] ` <47C9F818.4020200@theshore.net>
@ 2008-03-02 15:35 ` Jeremy Fitzhardinge
[not found] ` <47CAC931.1000107@goop.org>
1 sibling, 0 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-02 15:35 UTC (permalink / raw)
To: Christopher S. Aker; +Cc: Xen-devel, Linux Kernel Mailing List, virtualization

Christopher S. Aker wrote:
> Jeremy Fitzhardinge wrote:
>> I've been running this all night without seeing any problem. I'm
>> using current x86.git#testing with a few local patches, but nothing
>> especially relevant-looking.
>
> Meh .. what backend are you using? We're using LVM volumes exported
> directly into the domUs like so:
>
> disk = [ 'phy:vg1/xencaker-56392,xvda,w', ... ]
>
>> Could you try the attached patch to see if it makes any difference?
>
> Unfortunately we're still in the same place... pv_ops kernels are
> still hanging after heavy disk IO:
>
> works - 2.6.18.x (from xen-unstable)
> hangs - 2.6.25-rc3-git3
> hangs - 2.6.25-rc3-git3 + your patch
>
> Any other suggestions or debugging I can provide that would be useful
> to squash this?

Are you running an SMP or UP domain? I found I could get hangs very
easily with UP (but I need to confirm it isn't a result of some other
very experimental patches).

J
[parent not found: <47CAC931.1000107@goop.org>]
* Re: Xen paravirt frontend block hang
[not found] ` <47CAC931.1000107@goop.org>
@ 2008-03-02 16:03 ` Christopher S. Aker
[not found] ` <47CACFBE.5010007@theshore.net>
1 sibling, 0 replies; 18+ messages in thread
From: Christopher S. Aker @ 2008-03-02 16:03 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Xen-devel, Linux Kernel Mailing List, virtualization

Jeremy Fitzhardinge wrote:
> Are you running an SMP or UP domain? I found I could get hangs very
> easily with UP (but I need to confirm it isn't a result of some other
> very experimental patches).

The hang occurs with both SMP and UP compiled pv_ops kernels. SMP
kernels are still slightly responsive after the hang occurs, which makes
me think only one proc gets stuck at a time, not the entire kernel.

-Chris
[parent not found: <47CACFBE.5010007@theshore.net>]
* Re: [Xen-devel] Re: Xen paravirt frontend block hang
[not found] ` <47CACFBE.5010007@theshore.net>
@ 2008-03-18 16:01 ` Jeremy Fitzhardinge
[not found] ` <47DFE75B.7080404@goop.org>
1 sibling, 0 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2008-03-18 16:01 UTC (permalink / raw)
To: Christopher S. Aker
Cc: xming, Xen-devel, Linux Kernel Mailing List, virtualization

Christopher S. Aker wrote:
> Jeremy Fitzhardinge wrote:
>> Are you running an SMP or UP domain? I found I could get hangs very
>> easily with UP (but I need to confirm it isn't a result of some other
>> very experimental patches).
>
> The hang occurs with both SMP and UP compiled pv_ops kernels. SMP
> kernels are still slightly responsive after the hang occurs, which
> makes me think only one proc gets stuck at a time, not the entire kernel.

The patch I posted yesterday - "xen: fix RMW when unmasking events" -
should definitively fix the hanging-under-load bugs (I hope).

The problem came from returning to userspace with pending events, which
would leave them hanging around on the vcpu unprocessed, and eventually
everything would deadlock. This was caused by using an unlocked
read-modify-write operation on the event pending flag - which can be set
by another (real) cpu - meaning that the pending event wasn't noticed
until too late. It would only be a problem on an SMP host.

The patch should back-apply to 2.6.24.

J
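To make the failure mode described in the message above concrete, here is a minimal user-space sketch in C11. It is not the kernel code and not the actual patch. The bit layout (a pending flag and a mask flag sharing one 16-bit word) and all names are assumptions made for illustration. The first function replays, by hand and in a single thread, the interleaving in which another CPU sets the pending flag between the load and the store of an unlocked read-modify-write. The second function performs the same unmask with an atomic read-modify-write, which is just one way to avoid the loss and may not be what the real patch does.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only (not taken from Xen headers). */
#define PENDING_BIT 0x0001u /* an event is waiting to be handled */
#define MASK_BIT    0x0100u /* event delivery is masked */

/* Unlocked read-modify-write, split into its load and store halves so
 * that the racing write from the other CPU can be replayed at a fixed
 * point. Returns the final value of the shared word. */
uint16_t racy_unmask(uint16_t initial)
{
    uint16_t shared = initial;
    uint16_t stale = shared;                 /* load                      */

    shared |= PENDING_BIT;                   /* other CPU posts an event  */
    shared = (uint16_t)(stale & ~MASK_BIT);  /* store of the stale value  */
    return shared;                           /* the pending bit is gone   */
}

/* Same unmask done with an atomic read-modify-write. The other CPU's
 * update can only land before or after it, never in the middle, so the
 * pending bit survives in either order. */
uint16_t atomic_unmask(uint16_t initial, int event_first)
{
    _Atomic uint16_t shared = initial;

    if (event_first)
        atomic_fetch_or(&shared, (uint16_t)PENDING_BIT);
    atomic_fetch_and(&shared, (uint16_t)~MASK_BIT);
    if (!event_first)
        atomic_fetch_or(&shared, (uint16_t)PENDING_BIT);
    return atomic_load(&shared);
}
```

Starting from a word with only MASK_BIT set, racy_unmask() returns 0, so the event that arrived mid-update is never seen, while atomic_unmask() returns PENDING_BIT whichever side goes first. A lost pending flag of this kind matches the symptom in the thread: the guest returns to userspace with work outstanding and block I/O waits forever.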
[parent not found: <47DFE75B.7080404@goop.org>]
* Re: [Xen-devel] Re: Xen paravirt frontend block hang
[not found] ` <47DFE75B.7080404@goop.org>
@ 2008-03-25 1:37 ` Christopher S. Aker
0 siblings, 0 replies; 18+ messages in thread
From: Christopher S. Aker @ 2008-03-25 1:37 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: xming, Xen-devel, Linux Kernel Mailing List, virtualization

Jeremy Fitzhardinge wrote:
> Christopher S. Aker wrote:
>> Jeremy Fitzhardinge wrote:
>>> Are you running an SMP or UP domain? I found I could get hangs very
>>> easily with UP (but I need to confirm it isn't a result of some other
>>> very experimental patches).
>>
>> The hang occurs with both SMP and UP compiled pv_ops kernels. SMP
>> kernels are still slightly responsive after the hang occurs, which
>> makes me think only one proc gets stuck at a time, not the entire kernel.
>
> The patch I posted yesterday - "xen: fix RMW when unmasking events" -
> should definitively fix the hanging-under-load bugs (I hope).

Confirmed-by: caker@theshore.net

Nice work!

-Chris
Thread overview: 18+ messages
2007-12-26 19:33 Xen paravirt frontend block hang Christopher S. Aker
2007-12-28 23:14 ` Jeremy Fitzhardinge
2007-12-29 1:12 ` Christopher S. Aker
2007-12-29 6:00 ` Jeremy Fitzhardinge
2008-01-29 0:22 ` Christopher S. Aker
2008-01-29 0:40 ` Jeremy Fitzhardinge
2008-01-29 1:04 ` Christopher S. Aker
2008-02-06 12:37 ` xming
2008-02-07 4:09 ` Jeremy Fitzhardinge
2008-02-07 14:12 ` xming
2008-02-28 20:03 ` Jeremy Fitzhardinge
2008-03-18 16:02 ` Jeremy Fitzhardinge
2008-02-28 20:00 ` Jeremy Fitzhardinge
[not found] ` <47C712EF.1060703@goop.org>
2008-03-02 0:43 ` Christopher S. Aker
[not found] ` <47C9F818.4020200@theshore.net>
2008-03-02 15:35 ` Jeremy Fitzhardinge
[not found] ` <47CAC931.1000107@goop.org>
2008-03-02 16:03 ` Christopher S. Aker
[not found] ` <47CACFBE.5010007@theshore.net>
2008-03-18 16:01 ` [Xen-devel] " Jeremy Fitzhardinge
[not found] ` <47DFE75B.7080404@goop.org>
2008-03-25 1:37 ` Christopher S. Aker