* Performance issue
@ 2012-11-22 19:17 George-Cristian Bîrzan
  2012-11-23  7:26 ` Stefan Hajnoczi
  2012-11-25 15:19 ` Gleb Natapov
  0 siblings, 2 replies; 23+ messages in thread
From: George-Cristian Bîrzan @ 2012-11-22 19:17 UTC (permalink / raw)
  To: kvm

I'm trying to understand a performance problem (50% degradation in the
VM) that I'm experiencing on some systems with qemu-kvm. Running Fedora
with 3.5.3-1.fc17.x86_64 or 3.6.6-1.fc17.x86_64, qemu 1.0.1 or 1.2.1
on AMD Opteron 6176 and 6174, and all of them behave identically.

A Windows guest is receiving a UDP MPEG stream that is being processed
by TSReader. The stream comes in at about 73Mbps, but the VM cannot
process more than 43Mbps. It's not a networking issue: the packets
reach the guest, and with iperf we can easily do 80Mbps. Also, with
iperf, it can receive the packets from the streamer (even though it
doesn't detect things properly, but it was just a way to see ).

However, on an identical host (a 6174 CPU, even), a bare-metal Windows
install has absolutely no problem processing the same stream.

This is the command we're using to start qemu-kvm:

/usr/bin/qemu-kvm -name b691546e-79f8-49c6-a293-81067503a6ad -S -M pc-1.2 -cpu host -enable-kvm -m 16384 -smp 16,sockets=1,cores=16,threads=1 -uuid b691546e-79f8-49c6-a293-81067503a6ad -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/b691546e-79f8-49c6-a293-81067503a6ad.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/dis-magnetics-2-223101/d8b233c6-8424-4de9-ae3c-7c9a60288514,if=none,id=drive-virtio-disk0,format=qcow2,cache=writeback,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2e:fb:a2:36:be,bus=pci.0,addr=0x3 -netdev tap,fd=32,id=hostnet1,vhost=on,vhostfd=33 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=22:94:44:5a:cb:24,bus=pci.0,addr=0x4 -vnc 127.0.0.1:4,password -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

As a sidenote, the TSReader application only uses one thread for
decoding the stream and one for network IO, although using more threads
would solve the problem.

I've tried a smaller guest with 5 cores, pinning each of them to an
individual CPU in the range 6 to 11 (all in one NUMA node), and I've
tried enabling huge pages/TLB thingy... and that's about it. I'm
completely stuck.

Is this 50% hit something that's considered 'okay', or am I doing
something wrong? And if the latter, what/how can I debug it?

--
George-Cristian Bîrzan

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: Performance issue 2012-11-22 19:17 Performance issue George-Cristian Bîrzan @ 2012-11-23 7:26 ` Stefan Hajnoczi [not found] ` <CAMxNYabWpHqmNN7mCY9mwVJjoTj4jwS_js+cZcxQVnJsTdwfBg@mail.gmail.com> 2012-11-25 15:19 ` Gleb Natapov 1 sibling, 1 reply; 23+ messages in thread From: Stefan Hajnoczi @ 2012-11-23 7:26 UTC (permalink / raw) To: George-Cristian Bîrzan; +Cc: kvm On Thu, Nov 22, 2012 at 09:17:34PM +0200, George-Cristian Bîrzan wrote: > I'm trying to understand a performance problem (50% degradation in the > VM) that I'm experiencing some systems with qemu-kvm. Running Fedora > with 3.5.3-1.fc17.x86_64 or 3.6.6-1.fc17.x86_64, qemu 1.0.1 or 1.2.1 > on AMD Opteron 6176 and 6174, and all of them behave identically. > > A Windows guest is receiving a UDP MPEG stream that is being processed > by TSReader. The stream comes in at about 73Mbps, but the VM cannot > process more than 43Mbps. It's not a networking issue, the packets > reach the guest and with iperf we can easily do 80Mbps. Also, with > iperf, it can receive the packets from the streamer (even though it > doesn't detect things properly, but it was just a way to see ). Hi George-Cristian, On IRC you mentioned you found a solution. Any updates? Are you still seeing the performance problem? Stefan ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <CAMxNYabWpHqmNN7mCY9mwVJjoTj4jwS_js+cZcxQVnJsTdwfBg@mail.gmail.com>]
* Fwd: Performance issue
       [not found]   ` <CAMxNYabWpHqmNN7mCY9mwVJjoTj4jwS_js+cZcxQVnJsTdwfBg@mail.gmail.com>
@ 2012-11-23 14:02     ` George-Cristian Bîrzan
  0 siblings, 0 replies; 23+ messages in thread
From: George-Cristian Bîrzan @ 2012-11-23 14:02 UTC (permalink / raw)
  To: kvm, Stefan Hajnoczi

On Fri, Nov 23, 2012 at 9:26 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> Hi George-Cristian,
> On IRC you mentioned you found a solution. Any updates? Are you still
> seeing the performance problem?

It wasn't a solution, I just thought I knew why. I was thinking the
73Mbps were coming in at 188 bytes per packet, which would probably
have been too many packets for the machine to handle. Turns out, the
stream is coming in at 1358 bytes per packet, which means I'm back to
square one.

Also, I just got in to work, and will try to write my own program to
read the stream. The actual workload that these VMs will have to
handle is not as simple as just decoding the streams: they have to
transcode them, but I don't have access to the source to see exactly
what it's doing (same with TSReader, but at least that's not something
in house for our customer).

--
George-Cristian Bîrzan

^ permalink raw reply	[flat|nested] 23+ messages in thread
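The packet-size reasoning in the message above can be checked with a little arithmetic. This is only a sketch: it assumes "Mbps" means 10^6 bits per second and that the quoted sizes are bytes per packet, and the names `packets_per_second` and `print_packet_rates` are made up for this illustration (they do not come from the thread).

```c
#include <stdio.h>

/* Hypothetical helper: packets per second needed to carry `mbps`
 * megabits per second (assumed to be 10^6 bit/s) when every packet
 * is `bytes_per_packet` bytes long. */
double packets_per_second(double mbps, unsigned bytes_per_packet)
{
    return (mbps * 1e6) / (8.0 * bytes_per_packet);
}

/* Print the two cases discussed in the message: 188-byte packets
 * (the first guess) and 1358-byte packets (what was observed). */
void print_packet_rates(void)
{
    printf("188-byte packets:  %.0f packets/s\n",
           packets_per_second(73.0, 188));
    printf("1358-byte packets: %.0f packets/s\n",
           packets_per_second(73.0, 1358));
}
```

Under those assumptions a 73Mbps stream works out to roughly 48,500 packets per second at 188 bytes, but only about 6,700 packets per second at 1358 bytes, which is why the packet rate stops looking like the bottleneck.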
* Re: Performance issue 2012-11-22 19:17 Performance issue George-Cristian Bîrzan 2012-11-23 7:26 ` Stefan Hajnoczi @ 2012-11-25 15:19 ` Gleb Natapov 2012-11-25 16:17 ` George-Cristian Bîrzan 1 sibling, 1 reply; 23+ messages in thread From: Gleb Natapov @ 2012-11-25 15:19 UTC (permalink / raw) To: George-Cristian Bîrzan; +Cc: kvm On Thu, Nov 22, 2012 at 09:17:34PM +0200, George-Cristian Bîrzan wrote: > I'm trying to understand a performance problem (50% degradation in the > VM) that I'm experiencing some systems with qemu-kvm. Running Fedora > with 3.5.3-1.fc17.x86_64 or 3.6.6-1.fc17.x86_64, qemu 1.0.1 or 1.2.1 > on AMD Opteron 6176 and 6174, and all of them behave identically. > > A Windows guest is receiving a UDP MPEG stream that is being processed > by TSReader. The stream comes in at about 73Mbps, but the VM cannot > process more than 43Mbps. It's not a networking issue, the packets > reach the guest and with iperf we can easily do 80Mbps. Also, with > iperf, it can receive the packets from the streamer (even though it > doesn't detect things properly, but it was just a way to see ). > > However, on an identical host (a 6174 CPU, even), a Windows install > has absolutely no problem processing the same stream. > What Windows is this? Can you try changing "-cpu host" to "-cpu host,+hv_relaxed"? -- Gleb. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance issue 2012-11-25 15:19 ` Gleb Natapov @ 2012-11-25 16:17 ` George-Cristian Bîrzan 2012-11-26 19:31 ` George-Cristian Bîrzan 0 siblings, 1 reply; 23+ messages in thread From: George-Cristian Bîrzan @ 2012-11-25 16:17 UTC (permalink / raw) To: Gleb Natapov; +Cc: kvm On Sun, Nov 25, 2012 at 5:19 PM, Gleb Natapov <gleb@redhat.com> wrote: > What Windows is this? Can you try changing "-cpu host" to "-cpu > host,+hv_relaxed"? This is on Windows Server 2008 R2 (sorry, forgot to mention that I guess), and I can try it tomorrow (US time), as getting a stream my way depends on complicated stuff. I will though, and let you know how it goes. -- George-Cristian Bîrzan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance issue 2012-11-25 16:17 ` George-Cristian Bîrzan @ 2012-11-26 19:31 ` George-Cristian Bîrzan 2012-11-27 12:20 ` Gleb Natapov 0 siblings, 1 reply; 23+ messages in thread From: George-Cristian Bîrzan @ 2012-11-26 19:31 UTC (permalink / raw) To: Gleb Natapov; +Cc: kvm On Sun, Nov 25, 2012 at 6:17 PM, George-Cristian Bîrzan <gc@birzan.org> wrote: > On Sun, Nov 25, 2012 at 5:19 PM, Gleb Natapov <gleb@redhat.com> wrote: >> What Windows is this? Can you try changing "-cpu host" to "-cpu >> host,+hv_relaxed"? > > This is on Windows Server 2008 R2 (sorry, forgot to mention that I > guess), and I can try it tomorrow (US time), as getting a stream my > way depends on complicated stuff. I will though, and let you know how > it goes. I changed that, no difference. -- George-Cristian Bîrzan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance issue 2012-11-26 19:31 ` George-Cristian Bîrzan @ 2012-11-27 12:20 ` Gleb Natapov 2012-11-27 12:29 ` George-Cristian Bîrzan 0 siblings, 1 reply; 23+ messages in thread From: Gleb Natapov @ 2012-11-27 12:20 UTC (permalink / raw) To: George-Cristian Bîrzan; +Cc: kvm On Mon, Nov 26, 2012 at 09:31:19PM +0200, George-Cristian Bîrzan wrote: > On Sun, Nov 25, 2012 at 6:17 PM, George-Cristian Bîrzan <gc@birzan.org> wrote: > > On Sun, Nov 25, 2012 at 5:19 PM, Gleb Natapov <gleb@redhat.com> wrote: > >> What Windows is this? Can you try changing "-cpu host" to "-cpu > >> host,+hv_relaxed"? > > > > This is on Windows Server 2008 R2 (sorry, forgot to mention that I > > guess), and I can try it tomorrow (US time), as getting a stream my > > way depends on complicated stuff. I will though, and let you know how > > it goes. > > I changed that, no difference. > > Heh, I forgot that the part that should make difference is not yet upstream :( -- Gleb. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance issue 2012-11-27 12:20 ` Gleb Natapov @ 2012-11-27 12:29 ` George-Cristian Bîrzan 2012-11-27 14:54 ` Gleb Natapov 0 siblings, 1 reply; 23+ messages in thread From: George-Cristian Bîrzan @ 2012-11-27 12:29 UTC (permalink / raw) To: Gleb Natapov; +Cc: kvm On Tue, Nov 27, 2012 at 2:20 PM, Gleb Natapov <gleb@redhat.com> wrote: > On Mon, Nov 26, 2012 at 09:31:19PM +0200, George-Cristian Bîrzan wrote: >> On Sun, Nov 25, 2012 at 6:17 PM, George-Cristian Bîrzan <gc@birzan.org> wrote: >> > On Sun, Nov 25, 2012 at 5:19 PM, Gleb Natapov <gleb@redhat.com> wrote: >> >> What Windows is this? Can you try changing "-cpu host" to "-cpu >> >> host,+hv_relaxed"? >> > >> > This is on Windows Server 2008 R2 (sorry, forgot to mention that I >> > guess), and I can try it tomorrow (US time), as getting a stream my >> > way depends on complicated stuff. I will though, and let you know how >> > it goes. >> >> I changed that, no difference. >> >> > Heh, I forgot that the part that should make difference is not yet > upstream :( We can try recompiling kvm/qemu with some patches, if that'd help. At this point, anything is on the table except changing Windows and the hardware :-) Also, it might be that the software doing the actual work is not well written, but even so... -- George-Cristian Bîrzan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance issue 2012-11-27 12:29 ` George-Cristian Bîrzan @ 2012-11-27 14:54 ` Gleb Natapov 2012-11-27 20:38 ` Vadim Rozenfeld 0 siblings, 1 reply; 23+ messages in thread From: Gleb Natapov @ 2012-11-27 14:54 UTC (permalink / raw) To: George-Cristian Bîrzan; +Cc: kvm, vrozenfe On Tue, Nov 27, 2012 at 02:29:20PM +0200, George-Cristian Bîrzan wrote: > On Tue, Nov 27, 2012 at 2:20 PM, Gleb Natapov <gleb@redhat.com> wrote: > > On Mon, Nov 26, 2012 at 09:31:19PM +0200, George-Cristian Bîrzan wrote: > >> On Sun, Nov 25, 2012 at 6:17 PM, George-Cristian Bîrzan <gc@birzan.org> wrote: > >> > On Sun, Nov 25, 2012 at 5:19 PM, Gleb Natapov <gleb@redhat.com> wrote: > >> >> What Windows is this? Can you try changing "-cpu host" to "-cpu > >> >> host,+hv_relaxed"? > >> > > >> > This is on Windows Server 2008 R2 (sorry, forgot to mention that I > >> > guess), and I can try it tomorrow (US time), as getting a stream my > >> > way depends on complicated stuff. I will though, and let you know how > >> > it goes. > >> > >> I changed that, no difference. > >> > >> > > Heh, I forgot that the part that should make difference is not yet > > upstream :( > > We can try recompiling kvm/qemu with some patches, if that'd help. At > this point, anything is on the table except changing Windows and the > hardware :-) Vadim do you have Hyper-v reference timer patches for KVM to try? > > Also, it might be that the software doing the actual work is not well > written, but even so... > > -- > George-Cristian Bîrzan -- Gleb. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance issue 2012-11-27 14:54 ` Gleb Natapov @ 2012-11-27 20:38 ` Vadim Rozenfeld 2012-11-27 21:13 ` George-Cristian Bîrzan 0 siblings, 1 reply; 23+ messages in thread From: Vadim Rozenfeld @ 2012-11-27 20:38 UTC (permalink / raw) To: Gleb Natapov; +Cc: George-Cristian Bîrzan, kvm On Tuesday, November 27, 2012 04:54:47 PM Gleb Natapov wrote: > On Tue, Nov 27, 2012 at 02:29:20PM +0200, George-Cristian Bîrzan wrote: > > On Tue, Nov 27, 2012 at 2:20 PM, Gleb Natapov <gleb@redhat.com> wrote: > > > On Mon, Nov 26, 2012 at 09:31:19PM +0200, George-Cristian Bîrzan wrote: > > >> On Sun, Nov 25, 2012 at 6:17 PM, George-Cristian Bîrzan <gc@birzan.org> wrote: > > >> > On Sun, Nov 25, 2012 at 5:19 PM, Gleb Natapov <gleb@redhat.com> wrote: > > >> >> What Windows is this? Can you try changing "-cpu host" to "-cpu > > >> >> host,+hv_relaxed"? > > >> > > > >> > This is on Windows Server 2008 R2 (sorry, forgot to mention that I > > >> > guess), and I can try it tomorrow (US time), as getting a stream my > > >> > way depends on complicated stuff. I will though, and let you know > > >> > how it goes. > > >> > > >> I changed that, no difference. > > > > > > Heh, I forgot that the part that should make difference is not yet > > > upstream :( > > > > We can try recompiling kvm/qemu with some patches, if that'd help. At > > this point, anything is on the table except changing Windows and the > > hardware :-) > > Vadim do you have Hyper-v reference timer patches for KVM to try? I have some code which do both reference time and invariant TSC but it will not work after migration. I will send it later today. Vadim. > > > Also, it might be that the software doing the actual work is not well > > written, but even so... > > > > -- > > George-Cristian Bîrzan > > -- > Gleb. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance issue 2012-11-27 20:38 ` Vadim Rozenfeld @ 2012-11-27 21:13 ` George-Cristian Bîrzan 2012-11-28 11:39 ` Vadim Rozenfeld 0 siblings, 1 reply; 23+ messages in thread From: George-Cristian Bîrzan @ 2012-11-27 21:13 UTC (permalink / raw) To: Vadim Rozenfeld; +Cc: Gleb Natapov, kvm On Tue, Nov 27, 2012 at 10:38 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote: > I have some code which do both reference time and invariant TSC but it > will not work after migration. I will send it later today. Do you mean migrating guests? This is not an issue for us. Also, it would be much appreciated! -- George-Cristian Bîrzan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance issue
  2012-11-27 21:13 ` George-Cristian Bîrzan
@ 2012-11-28 11:39 ` Vadim Rozenfeld
  2012-11-28 19:09 ` George-Cristian Bîrzan
  2012-11-28 19:18 ` George-Cristian Bîrzan
  0 siblings, 2 replies; 23+ messages in thread
From: Vadim Rozenfeld @ 2012-11-28 11:39 UTC (permalink / raw)
  To: George-Cristian Bîrzan; +Cc: Gleb Natapov, kvm

[-- Attachment #1: Type: Text/Plain, Size: 751 bytes --]

On Tuesday, November 27, 2012 11:13:12 PM George-Cristian Bîrzan wrote:
> On Tue, Nov 27, 2012 at 10:38 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
> > I have some code which do both reference time and invariant TSC but it
> > will not work after migration. I will send it later today.
>
> Do you mean migrating guests? This is not an issue for us.

OK, but don't say I didn't warn you :)

There are two patches, one for kvm and another one for qemu.
you will probably need to rebase them.
Add "hv_tsc" cpu parameter to activate this feature.
you will probably need to deactivate hpet by adding "-no-hpet"
parameter as well.

best regards,
Vadim.

>
> Also, it would be much appreciated!
>
> --
> George-Cristian Bîrzan

[-- Attachment #2: hv_time_kvm.diff --]
[-- Type: text/x-patch, Size: 4028 bytes --]

diff --git a/arch/x86/include/asm/hyperv.h b/arch/x86/include/asm/hyperv.h
index b80420b..9c5ffef 100644
--- a/arch/x86/include/asm/hyperv.h
+++ b/arch/x86/include/asm/hyperv.h
@@ -136,6 +136,9 @@
 /* MSR used to read the per-partition time reference counter */
 #define HV_X64_MSR_TIME_REF_COUNT 0x40000020
 
+/* A partition's reference time stamp counter (TSC) page */
+#define HV_X64_MSR_REFERENCE_TSC 0x40000021
+
 /* Define the virtual APIC registers */
 #define HV_X64_MSR_EOI 0x40000070
 #define HV_X64_MSR_ICR 0x40000071
@@ -179,6 +182,10 @@
 #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK \
 		(~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
 
+#define HV_X64_MSR_TSC_REFERENCE_ENABLE 0x00000001
+#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT 12
+
+
 #define HV_PROCESSOR_POWER_STATE_C0 0
 #define HV_PROCESSOR_POWER_STATE_C1 1
 #define HV_PROCESSOR_POWER_STATE_C2 2
@@ -191,4 +198,11 @@
 #define HV_STATUS_INVALID_ALIGNMENT 4
 #define HV_STATUS_INSUFFICIENT_BUFFERS 19
 
+typedef struct _HV_REFERENCE_TSC_PAGE {
+	uint32_t TscSequence;
+	uint32_t Rserved1;
+	uint64_t TscScale;
+	int64_t TscOffset;
+} HV_REFERENCE_TSC_PAGE, * PHV_REFERENCE_TSC_PAGE;
+
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b2e11f4..63ee09e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -565,6 +565,8 @@ struct kvm_arch {
 	/* fields used by HYPER-V emulation */
 	u64 hv_guest_os_id;
 	u64 hv_hypercall;
+	u64 hv_ref_count;
+	u64 hv_tsc_page;
 
 	#ifdef CONFIG_KVM_MMU_AUDIT
 	int audit_point;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4f76417..4538295 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -813,7 +813,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
 static u32 msrs_to_save[] = {
 	MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
 	MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
-	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
+	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_REFERENCE_TSC,
 	HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
 	MSR_KVM_PV_EOI_EN,
 	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
@@ -1428,6 +1428,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
 	switch (msr) {
 	case HV_X64_MSR_GUEST_OS_ID:
 	case HV_X64_MSR_HYPERCALL:
+	case HV_X64_MSR_TIME_REF_COUNT:
+	case HV_X64_MSR_REFERENCE_TSC:
 		r = true;
 		break;
 	}
@@ -1438,6 +1440,7 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
 static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
 	struct kvm *kvm = vcpu->kvm;
+	unsigned long addr;
 
 	switch (msr) {
 	case HV_X64_MSR_GUEST_OS_ID:
@@ -1467,6 +1470,27 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		if (__copy_to_user((void __user *)addr, instructions, 4))
 			return 1;
 		kvm->arch.hv_hypercall = data;
+		kvm->arch.hv_ref_count = get_kernel_ns();
+		break;
+	}
+	case HV_X64_MSR_REFERENCE_TSC: {
+		HV_REFERENCE_TSC_PAGE tsc_ref;
+		tsc_ref.TscSequence =
+			boot_cpu_has(X86_FEATURE_CONSTANT_TSC) ? 1 : 0;
+		tsc_ref.TscScale =
+			((10000LL << 32) /vcpu->arch.virtual_tsc_khz) << 32;
+		tsc_ref.TscOffset = 0;
+		if (!(data & HV_X64_MSR_TSC_REFERENCE_ENABLE)) {
+			kvm->arch.hv_tsc_page = data;
+			break;
+		}
+		addr = gfn_to_hva(vcpu->kvm, data >>
+			HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT);
+		if (kvm_is_error_hva(addr))
+			return 1;
+		if(__copy_to_user((void __user *)addr, &tsc_ref, sizeof(tsc_ref)))
+			return 1;
+		kvm->arch.hv_tsc_page = data;
 		break;
 	}
 	default:
@@ -1881,6 +1905,13 @@ static int get_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case HV_X64_MSR_HYPERCALL:
 		data = kvm->arch.hv_hypercall;
 		break;
+	case HV_X64_MSR_TIME_REF_COUNT:
+		data = get_kernel_ns() - kvm->arch.hv_ref_count;
+		do_div(data, 100);
+		break;
+	case HV_X64_MSR_REFERENCE_TSC:
+		data = kvm->arch.hv_tsc_page;
+		break;
 	default:
 		vcpu_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr);
 		return 1;

[-- Attachment #3: hv_time_qemu.diff --]
[-- Type: text/x-patch, Size: 4666 bytes --]

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index f3708e6..ad77b72 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1250,6 +1250,8 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char *cpu_model)
                 hyperv_enable_relaxed_timing(true);
             } else if (!strcmp(featurestr, "hv_vapic")) {
                 hyperv_enable_vapic_recommended(true);
+            } else if (!strcmp(featurestr, "hv_tsc")) {
+                hyperv_enable_tsc_recommended(true);
             } else {
                 fprintf(stderr, "feature string `%s' not in format (+feature|-feature|feature=xyz)\n", featurestr);
                 goto error;
diff --git a/target-i386/hyperv.c b/target-i386/hyperv.c
index f284e99..bd581a1 100644
--- a/target-i386/hyperv.c
+++ b/target-i386/hyperv.c
@@ -15,6 +15,12 @@
 static bool hyperv_vapic;
 static bool hyperv_relaxed_timing;
 static int hyperv_spinlock_attempts = HYPERV_SPINLOCK_NEVER_RETRY;
+static bool hyperv_tsc;
+
+void hyperv_enable_tsc_recommended(bool val)
+{
+    hyperv_tsc = val;
+}
 
 void hyperv_enable_vapic_recommended(bool val)
 {
@@ -42,12 +48,18 @@ bool hyperv_enabled(void)
 bool hyperv_hypercall_available(void)
 {
     if (hyperv_vapic ||
+        hyperv_tsc ||
         (hyperv_spinlock_attempts != HYPERV_SPINLOCK_NEVER_RETRY)) {
       return true;
     }
     return false;
 }
 
+bool hyperv_tsc_recommended(void)
+{
+    return hyperv_tsc;
+}
+
 bool hyperv_vapic_recommended(void)
 {
     return hyperv_vapic;
diff --git a/target-i386/hyperv.h b/target-i386/hyperv.h
index bacb1d4..94c2d6e 100644
--- a/target-i386/hyperv.h
+++ b/target-i386/hyperv.h
@@ -27,10 +27,12 @@
 #endif
 
 #if !defined(CONFIG_USER_ONLY) && defined(CONFIG_KVM)
+void hyperv_enable_tsc_recommended(bool val);
 void hyperv_enable_vapic_recommended(bool val);
 void hyperv_enable_relaxed_timing(bool val);
 void hyperv_set_spinlock_retries(int val);
 #else
+static inline void hyperv_enable_tsc_recommended(bool val) { }
 static inline void hyperv_enable_vapic_recommended(bool val) { }
 static inline void hyperv_enable_relaxed_timing(bool val) { }
 static inline void hyperv_set_spinlock_retries(int val) { }
@@ -38,6 +40,7 @@ static inline void hyperv_set_spinlock_retries(int val) { }
 
 bool hyperv_enabled(void);
 bool hyperv_hypercall_available(void);
+bool hyperv_tsc_recommended(void);
 bool hyperv_vapic_recommended(void);
 bool hyperv_relaxed_timing_enabled(void);
 int hyperv_get_spinlock_retries(void);
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5b18383..dc7f259 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -390,13 +390,17 @@ int kvm_arch_init_vcpu(CPUX86State *env)
     c = &cpuid_data.entries[cpuid_i++];
     memset(c, 0, sizeof(*c));
     c->function = KVM_CPUID_SIGNATURE;
-    if (!hyperv_enabled()) {
-        memcpy(signature, "KVMKVMKVM\0\0\0", 12);
-        c->eax = 0;
-    } else {
-        memcpy(signature, "Microsoft Hv", 12);
+    memcpy(signature, "KVMKVMKVM\0\0\0", 12);
+    if (hyperv_enabled()) {
         c->eax = HYPERV_CPUID_MIN;
     }
+//    if (!hyperv_enabled()) {
+//        memcpy(signature, "KVMKVMKVM\0\0\0", 12);
+//        c->eax = 0;
+//    } else {
+//        memcpy(signature, "Microsoft Hv", 12);
+//        c->eax = HYPERV_CPUID_MIN;
+//    }
     c->ebx = signature[0];
     c->ecx = signature[1];
     c->edx = signature[2];
@@ -427,7 +431,11 @@ int kvm_arch_init_vcpu(CPUX86State *env)
             c->eax |= HV_X64_MSR_HYPERCALL_AVAILABLE;
             c->eax |= HV_X64_MSR_APIC_ACCESS_AVAILABLE;
         }
-
+        if (hyperv_tsc_recommended()) {
+            c->eax |= HV_X64_MSR_HYPERCALL_AVAILABLE;
+            c->eax |= HV_X64_MSR_TIME_REF_COUNT_AVAILABLE;
+            c->eax |= 0x200;
+        }
         c = &cpuid_data.entries[cpuid_i++];
         memset(c, 0, sizeof(*c));
         c->function = HYPERV_CPUID_ENLIGHTMENT_INFO;
@@ -445,14 +453,14 @@ int kvm_arch_init_vcpu(CPUX86State *env)
         c->eax = 0x40;
         c->ebx = 0x40;
 
-        c = &cpuid_data.entries[cpuid_i++];
-        memset(c, 0, sizeof(*c));
-        c->function = KVM_CPUID_SIGNATURE_NEXT;
-        memcpy(signature, "KVMKVMKVM\0\0\0", 12);
-        c->eax = 0;
-        c->ebx = signature[0];
-        c->ecx = signature[1];
-        c->edx = signature[2];
+//        c = &cpuid_data.entries[cpuid_i++];
+//        memset(c, 0, sizeof(*c));
+//        c->function = KVM_CPUID_SIGNATURE_NEXT;
+//        memcpy(signature, "KVMKVMKVM\0\0\0", 12);
+//        c->eax = 0;
+//        c->ebx = signature[0];
+//        c->ecx = signature[1];
+//        c->edx = signature[2];
     }
 
     has_msr_async_pf_en = c->eax & (1 << KVM_FEATURE_ASYNC_PF);

^ permalink raw reply related	[flat|nested] 23+ messages in thread
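For readers trying to follow what the TSC page in the KVM patch is for, here is a rough user-space sketch of the arithmetic. It assumes the usual Hyper-V convention that the guest computes reference time as ((TSC * TscScale) >> 64) + TscOffset in 100 ns units; that convention is my reading of the Hyper-V specification and is not stated in the thread. The function names are invented for this example, `unsigned __int128` is a GCC/Clang extension, and the 2.3 GHz frequency used in the note below is only an assumed value.

```c
#include <stdint.h>

/* Scale as computed in the patch: ((10000 << 32) / tsc_khz) << 32.
 * This approximates (10^7 << 64) / tsc_hz, i.e. the number of 100 ns
 * units per TSC tick expressed as a fraction of 2^64.
 * tsc_khz must be non-zero. */
uint64_t hv_tsc_scale_from_khz(uint64_t tsc_khz)
{
    return ((10000ULL << 32) / tsc_khz) << 32;
}

/* Assumed guest-side conversion: take the high 64 bits of the 128-bit
 * product tsc * scale, then add the offset. Result is in 100 ns units. */
uint64_t hv_ref_time_100ns(uint64_t tsc, uint64_t scale, int64_t offset)
{
    unsigned __int128 product = (unsigned __int128)tsc * scale;
    return (uint64_t)(product >> 64) + (uint64_t)offset;
}
```

With an assumed 2,300,000 kHz TSC, one second's worth of ticks (2,300,000,000) converts to just under 10,000,000 units of 100 ns, so the two-step shift in the patch loses very little precision compared with a single 128-bit division. The point of the page is that the guest can do this multiplication itself instead of trapping to the hypervisor on every clock read.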
* Re: Performance issue 2012-11-28 11:39 ` Vadim Rozenfeld @ 2012-11-28 19:09 ` George-Cristian Bîrzan 2012-11-29 11:56 ` Vadim Rozenfeld 2012-11-28 19:18 ` George-Cristian Bîrzan 1 sibling, 1 reply; 23+ messages in thread From: George-Cristian Bîrzan @ 2012-11-28 19:09 UTC (permalink / raw) To: Vadim Rozenfeld; +Cc: Gleb Natapov, kvm On Wed, Nov 28, 2012 at 1:39 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote: > On Tuesday, November 27, 2012 11:13:12 PM George-Cristian Bîrzan wrote: >> On Tue, Nov 27, 2012 at 10:38 PM, Vadim Rozenfeld <vrozenfe@redhat.com> > wrote: >> > I have some code which do both reference time and invariant TSC but it >> > will not work after migration. I will send it later today. >> >> Do you mean migrating guests? This is not an issue for us. > OK, but don't say I didn't warn you :) > > There are two patches, one for kvm and another one for qemu. > you will probably need to rebase them. > Add "hv_tsc" cpu parameter to activate this feature. > you will probably need to deactivate hpet by adding "-no-hpet" > parameter as well. 
I've also added +hv_relaxed since then, but this is the command I'm using now and there's no change: /usr/bin/qemu-kvm -name b691546e-79f8-49c6-a293-81067503a6ad -S -M pc-1.2 -enable-kvm -m 16384 -smp 9,sockets=1,cores=9,threads=1 -uuid b691546e-79f8-49c6-a293-81067503a6ad -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/b691546e-79f8-49c6-a293-81067503a6ad.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/dis-magnetics-2-223101/d8b233c6-8424-4de9-ae3c-7c9a60288514,if=none,id=drive-virtio-disk0,format=qcow2,cache=writeback,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=35,id=hostnet0,vhost=on,vhostfd=36 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2e:fb:a2:36:be,bus=pci.0,addr=0x3 -netdev tap,fd=40,id=hostnet1,vhost=on,vhostfd=41 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=22:94:44:5a:cb:24,bus=pci.0,addr=0x4 -vnc 127.0.0.1:0,password -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -cpu host,hv_tsc I compiled qemu-1.2.0-24 after applying your patch, used the head for KVM, and I see no difference. I've tried setting windows' useplatformclock on and off, no change either. Other than that, was looking into a profiling trace of the software running and a lot of time (60%?) is spent calling two functions from hal.dll, HalpGetPmTimerSleepModePerfCounter when I disable HPET, and HalpHPETProgramRolloverTimer which do point at something related to the timers. Any other thing I can try? -- George-Cristian Bîrzan ^ permalink raw reply [flat|nested] 23+ messages in thread
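One way to sanity-check a "time is going into timer reads" theory like the one above is to measure how expensive a single clock read is inside the guest and compare it with bare metal. The sketch below is a POSIX analogue only: the guest in this thread runs Windows, where the corresponding call would presumably be QueryPerformanceCounter, and nothing in the thread confirms how often the application actually reads the clock. The function name `avg_clock_read_ns` is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime / CLOCK_MONOTONIC */
#endif
#include <time.h>

/* Average cost, in nanoseconds, of one clock_gettime(CLOCK_MONOTONIC)
 * call, measured over `iterations` back-to-back reads.
 * Returns 0.0 when iterations is not positive. */
double avg_clock_read_ns(long iterations)
{
    struct timespec start, end, scratch;

    if (iterations <= 0)
        return 0.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        clock_gettime(CLOCK_MONOTONIC, &scratch);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

If the clock source is an emulated device such as the PM timer or HPET, each read is likely to cost a trip out of the guest, so a per-read figure far above the bare-metal one would be consistent with the profile described in the message.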
* Re: Performance issue 2012-11-28 19:09 ` George-Cristian Bîrzan @ 2012-11-29 11:56 ` Vadim Rozenfeld 2012-11-29 13:45 ` George-Cristian Bîrzan 0 siblings, 1 reply; 23+ messages in thread From: Vadim Rozenfeld @ 2012-11-29 11:56 UTC (permalink / raw) To: George-Cristian Bîrzan; +Cc: Gleb Natapov, kvm On Wednesday, November 28, 2012 09:09:29 PM George-Cristian Bîrzan wrote: > On Wed, Nov 28, 2012 at 1:39 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote: > > On Tuesday, November 27, 2012 11:13:12 PM George-Cristian Bîrzan wrote: > >> On Tue, Nov 27, 2012 at 10:38 PM, Vadim Rozenfeld <vrozenfe@redhat.com> > > > > wrote: > >> > I have some code which do both reference time and invariant TSC but it > >> > will not work after migration. I will send it later today. > >> > >> Do you mean migrating guests? This is not an issue for us. > > > > OK, but don't say I didn't warn you :) > > > > There are two patches, one for kvm and another one for qemu. > > you will probably need to rebase them. > > Add "hv_tsc" cpu parameter to activate this feature. > > you will probably need to deactivate hpet by adding "-no-hpet" > > parameter as well. > > I've also added +hv_relaxed since then, but this is the command I'm I would suggest activating relaxed timing for all W2K8R2/Win7 guests. 
> using now and there's no change: > > /usr/bin/qemu-kvm -name b691546e-79f8-49c6-a293-81067503a6ad -S -M > pc-1.2 -enable-kvm -m 16384 -smp 9,sockets=1,cores=9,threads=1 -uuid > b691546e-79f8-49c6-a293-81067503a6ad -no-user-config -nodefaults > -chardev > socket,id=charmonitor,path=/var/lib/libvirt/qemu/b691546e-79f8-49c6-a293-8 > 1067503a6ad.monitor,server,nowait -mon > chardev=charmonitor,id=monitor,mode=control -rtc base=utc > -no-hpet -no-shutdown -device > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive > file=/var/lib/libvirt/images/dis-magnetics-2-223101/d8b233c6-8424-4de9-ae3c > -7c9a60288514,if=none,id=drive-virtio-disk0,format=qcow2,cache=writeback,ai > o=native -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=vir > tio-disk0,bootindex=1 -netdev tap,fd=35,id=hostnet0,vhost=on,vhostfd=36 > -device > virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2e:fb:a2:36:be,bus=pci.0,addr > =0x3 -netdev tap,fd=40,id=hostnet1,vhost=on,vhostfd=41 -device > virtio-net-pci,netdev=hostnet1,id=net1,mac=22:94:44:5a:cb:24,bus=pci.0,addr > =0x4 -vnc 127.0.0.1:0,password -vga cirrus -device > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -cpu host,hv_tsc > > I compiled qemu-1.2.0-24 after applying your patch, used the head for > KVM, and I see no difference. I've tried setting windows' > useplatformclock on and off, no change either. > > > Other than that, was looking into a profiling trace of the software > running and a lot of time (60%?) is spent calling two functions from > hal.dll, HalpGetPmTimerSleepModePerfCounter when I disable HPET, and > HalpHPETProgramRolloverTimer which do point at something related to > the timers. > It means that hyper-v time stamp source was not activated. > Any other thing I can try? > > > -- > George-Cristian Bîrzan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance issue
  2012-11-29 11:56 ` Vadim Rozenfeld
@ 2012-11-29 13:45 ` George-Cristian Bîrzan
  2012-11-29 13:56 ` Gleb Natapov
  0 siblings, 1 reply; 23+ messages in thread
From: George-Cristian Bîrzan @ 2012-11-29 13:45 UTC (permalink / raw)
  To: Vadim Rozenfeld; +Cc: Gleb Natapov, kvm

On Thu, Nov 29, 2012 at 1:56 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
>> I've also added +hv_relaxed since then, but this is the command I'm
>
> I would suggest activating relaxed timing for all W2K8R2/Win7 guests.

Is there any place I can read up on the downsides of this for Linux,
or is it Just Better?

>> Other than that, was looking into a profiling trace of the software
>> running and a lot of time (60%?) is spent calling two functions from
>> hal.dll, HalpGetPmTimerSleepModePerfCounter when I disable HPET, and
>> HalpHPETProgramRolloverTimer which do point at something related to
>> the timers.
>>
> It means that hyper-v time stamp source was not activated.

I recompiled the whole kernel with your patch, and while I cannot
check at 70Mbps now, a test stream of 20Mbps seems to do better. Also,
I no longer see any of those functions, which used to account for ~60%
of the time spent by the program. I'm waiting for the customer to come
back and start the 'real' stream, but from my tests, time spent in
hal.dll is now an order of magnitude smaller.

--
George-Cristian Bîrzan

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: Performance issue 2012-11-29 13:45 ` George-Cristian Bîrzan @ 2012-11-29 13:56 ` Gleb Natapov 2012-11-29 20:34 ` Vadim Rozenfeld 0 siblings, 1 reply; 23+ messages in thread From: Gleb Natapov @ 2012-11-29 13:56 UTC (permalink / raw) To: George-Cristian Bîrzan; +Cc: Vadim Rozenfeld, kvm On Thu, Nov 29, 2012 at 03:45:52PM +0200, George-Cristian Bîrzan wrote: > On Thu, Nov 29, 2012 at 1:56 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote: > >> I've also added +hv_relaxed since then, but this is the command I'm > > > > I would suggest activating relaxed timing for all W2K8R2/Win7 guests. > > Is there any place I can read up on the downsides of this for Linux, > or is Just Better? > You shouldn't use hyper-v flags for Linux guests. In theory Linux should just ignore them, in practice there may be bugs that will prevent Linux from detecting that it runs as a guest and disable optimizations. > >>>> Other than that, was looking into a profiling trace of the software > >> running and a lot of time (60%?) is spent calling two functions from > >> hal.dll, HalpGetPmTimerSleepModePerfCounter when I disable HPET, and > >> HalpHPETProgramRolloverTimer which do point at something related to > >> the timers. > >> > > It means that hyper-v time stamp source was not activated. > > I recompiled the whole kernel, with your patch, and while I cannot > check at 70Mbps now, a test stream of 20 seems to do better. Also, now > I don't see any of those functions, which used to account ~60% of the > time spent by the program. I'm waiting for the customer to come back > and start the 'real' stream, but from my tests, time spent in hal.dll > is now an order of magnitude smaller. > > -- > George-Cristian Bîrzan -- Gleb. ^ permalink raw reply [flat|nested] 23+ messages in thread
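The detection problem mentioned above comes down to the hypervisor CPUID signature: a guest reads leaf 0x40000000 and looks at the 12-byte vendor string, which is "KVMKVMKVM" for plain KVM and, per the QEMU code quoted earlier in this thread, "Microsoft Hv" once the Hyper-V flags are enabled. A small sketch of how a guest-side program might read it follows; it assumes an x86 guest and a GCC/Clang toolchain that provides `<cpuid.h>`, and the function names are invented for this example.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Turn the EBX, ECX, EDX values of a hypervisor CPUID leaf into a
 * NUL-terminated vendor string. `out` must hold at least 13 bytes. */
void decode_hv_signature(uint32_t ebx, uint32_t ecx, uint32_t edx,
                         char out[13])
{
    const uint32_t regs[3] = { ebx, ecx, edx };

    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xff);
    out[12] = '\0';
}

/* Read the signature from leaf 0x40000000. Returns 0 on success and -1
 * when not on x86 or when CPUID.1:ECX bit 31 (hypervisor present) is
 * clear, in which case `out` is set to an empty string. */
int read_hv_signature(char out[13])
{
    out[0] = '\0';
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    __cpuid(1, eax, ebx, ecx, edx);
    if (!(ecx & (1u << 31)))
        return -1;

    __cpuid(0x40000000, eax, ebx, ecx, edx);
    decode_hv_signature(ebx, ecx, edx, out);
    return 0;
#else
    return -1;
#endif
}
```

A Linux guest that only recognises the KVM string at that leaf would not see KVM when the string reads "Microsoft Hv", which may be the kind of missed detection the message above warns about.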
* Re: Performance issue
2012-11-29 13:56 ` Gleb Natapov
@ 2012-11-29 20:34 ` Vadim Rozenfeld
0 siblings, 0 replies; 23+ messages in thread
From: Vadim Rozenfeld @ 2012-11-29 20:34 UTC (permalink / raw)
To: Gleb Natapov; +Cc: George-Cristian Bîrzan, kvm
On Thursday, November 29, 2012 03:56:10 PM Gleb Natapov wrote:
> On Thu, Nov 29, 2012 at 03:45:52PM +0200, George-Cristian Bîrzan wrote:
> > On Thu, Nov 29, 2012 at 1:56 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
> > >> I've also added +hv_relaxed since then, but this is the command I'm
> > >
> > > I would suggest activating relaxed timing for all W2K8R2/Win7 guests.
> >
> > Is there any place I can read up on the downsides of this for Linux,
> > or is Just Better?
>
> You shouldn't use hyper-v flags for Linux guests. In theory Linux should
> just ignore them, in practice there may be bugs that will prevent Linux
> from detecting that it runs as a guest and disable optimizations.
>
As Gleb said, hyper-v flags are relevant to Windows guests only.
IIRC spinlocks and vapic should work for Vista and higher.
Relaxed timing and partition reference time work for Win7/W2K8R2.
> > >>>> Other than that, was looking into a profiling trace of the software
> > >>
> > >> running and a lot of time (60%?) is spent calling two functions from
> > >> hal.dll, HalpGetPmTimerSleepModePerfCounter when I disable HPET, and
> > >> HalpHPETProgramRolloverTimer which do point at something related to
> > >> the timers.
> > >
> > > It means that hyper-v time stamp source was not activated.
> >
> > I recompiled the whole kernel, with your patch, and while I cannot
> > check at 70Mbps now, a test stream of 20 seems to do better. Also, now
> > I don't see any of those functions, which used to account ~60% of the
> > time spent by the program. I'm waiting for the customer to come back
> > and start the 'real' stream, but from my tests, time spent in hal.dll
> > is now an order of magnitude smaller.
> >
> > --
> > George-Cristian Bîrzan
>
> --
> Gleb.
^ permalink raw reply [flat|nested] 23+ messages in thread
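The guest-by-guest advice in the message above can be sketched as a small shell helper. This is only an illustration under stated assumptions: hv_relaxed is the one flag named in the thread, while hv_vapic and hv_spinlocks are QEMU CPU property names assumed here, 0x1fff is an illustrative spinlock retry count, and hv_cpu_arg and its guest labels are hypothetical. The partition reference time source is left out because this part of the thread gives no flag name for it.

```shell
#!/bin/sh
# Hypothetical helper: print a QEMU -cpu argument for a given guest family.
# Only hv_relaxed appears in the thread; hv_vapic and hv_spinlocks are
# assumed QEMU property names, and 0x1fff is an illustrative retry count.
hv_cpu_arg() {
    case "$1" in
        win7|w2k8r2) echo "host,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff" ;;
        vista|w2k8)  echo "host,hv_vapic,hv_spinlocks=0x1fff" ;;
        *)           echo "host" ;;   # Linux and anything else: no hyper-v flags
    esac
}

# Print (not run) the start of a command line for a Win7 guest.
echo "qemu-kvm -cpu $(hv_cpu_arg win7)"
```

Keeping the default branch free of hyper-v flags follows Gleb's point that Linux guests should not be given them.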
* Re: Performance issue
2012-11-28 11:39 ` Vadim Rozenfeld
2012-11-28 19:09 ` George-Cristian Bîrzan
@ 2012-11-28 19:18 ` George-Cristian Bîrzan
2012-11-28 19:56 ` Gleb Natapov
1 sibling, 1 reply; 23+ messages in thread
From: George-Cristian Bîrzan @ 2012-11-28 19:18 UTC (permalink / raw)
To: Vadim Rozenfeld; +Cc: Gleb Natapov, kvm
On Wed, Nov 28, 2012 at 1:39 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
> There are two patches, one for kvm and another one for qemu.
I just realised this. I was supposed to use qemu, or qemu-kvm? I used qemu
--
George-Cristian Bîrzan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance issue
2012-11-28 19:18 ` George-Cristian Bîrzan
@ 2012-11-28 19:56 ` Gleb Natapov
2012-11-28 20:01 ` George-Cristian Bîrzan
0 siblings, 1 reply; 23+ messages in thread
From: Gleb Natapov @ 2012-11-28 19:56 UTC (permalink / raw)
To: George-Cristian Bîrzan; +Cc: Vadim Rozenfeld, kvm
On Wed, Nov 28, 2012 at 09:18:38PM +0200, George-Cristian Bîrzan wrote:
> On Wed, Nov 28, 2012 at 1:39 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
> > There are two patches, one for kvm and another one for qemu.
>
> I just realised this. I was supposed to use qemu, or qemu-kvm? I used qemu
>
Does not matter, but you need to also recompile kernel with the first patch.
--
Gleb.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance issue
2012-11-28 19:56 ` Gleb Natapov
@ 2012-11-28 20:01 ` George-Cristian Bîrzan
2012-11-28 20:12 ` Gleb Natapov
0 siblings, 1 reply; 23+ messages in thread
From: George-Cristian Bîrzan @ 2012-11-28 20:01 UTC (permalink / raw)
To: Gleb Natapov; +Cc: Vadim Rozenfeld, kvm
On Wed, Nov 28, 2012 at 9:56 PM, Gleb Natapov <gleb@redhat.com> wrote:
> On Wed, Nov 28, 2012 at 09:18:38PM +0200, George-Cristian Bîrzan wrote:
>> On Wed, Nov 28, 2012 at 1:39 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
>> > There are two patches, one for kvm and another one for qemu.
>>
>> I just realised this. I was supposed to use qemu, or qemu-kvm? I used qemu
>>
> Does not matter, but you need to also recompile kernel with the first patch.
Do I have to recompile the kernel, or just the module? I followed the
instructions at
http://www.linux-kvm.org/page/Code#building_an_external_module_with_older_kernels
but I guess I can do the whole kernel, if it might help.
--
George-Cristian Bîrzan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Performance issue
2012-11-28 20:01 ` George-Cristian Bîrzan
@ 2012-11-28 20:12 ` Gleb Natapov
0 siblings, 0 replies; 23+ messages in thread
From: Gleb Natapov @ 2012-11-28 20:12 UTC (permalink / raw)
To: George-Cristian Bîrzan; +Cc: Vadim Rozenfeld, kvm
On Wed, Nov 28, 2012 at 10:01:04PM +0200, George-Cristian Bîrzan wrote:
> On Wed, Nov 28, 2012 at 9:56 PM, Gleb Natapov <gleb@redhat.com> wrote:
> > On Wed, Nov 28, 2012 at 09:18:38PM +0200, George-Cristian Bîrzan wrote:
> >> On Wed, Nov 28, 2012 at 1:39 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
> >> > There are two patches, one for kvm and another one for qemu.
> >>
> >> I just realised this. I was supposed to use qemu, or qemu-kvm? I used qemu
> >>
> > Does not matter, but you need to also recompile kernel with the first patch.
>
> Do I have to recompile the kernel, or just the module? I followed the
> instructions at
> http://www.linux-kvm.org/page/Code#building_an_external_module_with_older_kernels
> but I guess I can do the whole kernel, if it might help.
>
Module is enough, but kvm-kmod is not what you want. Just rebuild the
whole kernel if you do not know how to rebuild only the module for your
distribution's kernel.
--
Gleb.
^ permalink raw reply [flat|nested] 23+ messages in thread
* kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614]
@ 2015-10-15 13:38 Rainer Fügenstein
2015-10-16 1:15 ` Neil Brown
0 siblings, 1 reply; 23+ messages in thread
From: Rainer Fügenstein @ 2015-10-15 13:38 UTC (permalink / raw)
To: Linux-RAID
Hi,
my NAS-like server with 5*3TB SATA drives in RAID5 configuration was
running without problems for what seems an eternity; since about 3
weeks it keeps freezing every other day with the following error:
# grep soft /var/log/messages
Oct 15 11:26:49 alfred kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614]
Oct 15 11:26:49 alfred kernel: [<ffffffff8005e298>] call_softirq+0x1c/0x28
Oct 15 11:26:49 alfred kernel: [<ffffffff80012583>] __do_softirq+0x51/0x133
Oct 15 11:26:49 alfred kernel: [<ffffffff8005e298>] call_softirq+0x1c/0x28
Oct 15 11:26:49 alfred kernel: [<ffffffff8006d63a>] do_softirq+0x2c/0x7d
Oct 15 11:27:49 alfred kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614]
Oct 15 11:27:49 alfred kernel: [<ffffffff8005e298>] call_softirq+0x1c/0x28
Oct 15 11:27:49 alfred kernel: [<ffffffff80012583>] __do_softirq+0x51/0x133
Oct 15 11:27:49 alfred kernel: [<ffffffff8005e298>] call_softirq+0x1c/0x28
Oct 15 11:27:49 alfred kernel: [<ffffffff8006d63a>] do_softirq+0x2c/0x7d
Oct 15 11:28:49 alfred kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614]
Oct 15 11:28:49 alfred kernel: [<ffffffff8005e298>] call_softirq+0x1c/0x28
Oct 15 11:28:49 alfred kernel: [<ffffffff80012583>] __do_softirq+0x51/0x133
Oct 15 11:28:49 alfred kernel: [<ffffffff8005e298>] call_softirq+0x1c/0x28
Oct 15 11:28:49 alfred kernel: [<ffffffff8006d63a>] do_softirq+0x2c/0x7d
[...]
this is only part of the story, check the end of this message for
a detailed log.
sometimes the server recovers after 60+ seconds, sometimes it requires
a hard reset (causing mdraid to re-sync the whole array).
IIRC, it started when a drive in the array failed with "SATA
connection timeouts" (kind of). this drive has been replaced by a new
one, but yet the CPU lockups keep coming.
I suspect that aging hardware slowly starts to fail, but not sure
which part (drives? SATA controller? cables? NIC? CPU? ...)
here's some info that might be useful:
# uname -a
Linux alfred 2.6.18-406.el5 #1 SMP Tue Jun 2 17:25:57 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[7] sdf1[3] sdc1[5] sde1[0] sdd1[8]
11721061376 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
[=>...................] resync = 5.2% (154579584/2930265344) finish=3347.7min speed=13816K/sec
unused devices: <none>
excerpt:
ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata9.00: ATA-8: WDC WD30EZRX-00MMMB0, 80.00A80, max UDMA/133
ata9.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata9.00: configured for UDMA/133
sdb : very big device. try to use READ CAPACITY(16).
SCSI device sdb: 5860533168 512-byte hdwr sectors (3000593 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
sdb : very big device. try to use READ CAPACITY(16).
SCSI device sdb: 5860533168 512-byte hdwr sectors (3000593 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
sdb: sdb1
sd 4:0:0:0: Attached scsi disk sdb
sd 4:0:0:0: Attached scsi generic sg1 type 0
Vendor: ATA Model: WDC WD30EZRX-00D Rev: 80.0
Type: Direct-Access ANSI SCSI revision: 05
# lspci
00:00.0 Host bridge: Intel Corporation Atom Processor D4xx/D5xx/N4xx/N5xx DMI Bridge (rev 02)
00:02.0 VGA compatible controller: Intel Corporation Atom Processor D4xx/D5xx/N4xx/N5xx Integrated Graphics Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 1 (rev 01)
00:1c.1 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 2 (rev 01)
00:1c.2 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 3 (rev 01)
00:1c.3 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 4 (rev 01)
00:1d.0 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #1 (rev 01)
00:1d.1 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #2 (rev 01)
00:1d.2 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #3 (rev 01)
00:1d.3 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #4 (rev 01)
00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation NM10 Family LPC Controller (rev 01)
00:1f.2 SATA controller: Intel Corporation NM10/ICH7 Family SATA Controller [AHCI mode] (rev 01)
00:1f.3 SMBus: Intel Corporation NM10/ICH7 Family SMBus Controller (rev 01)
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 03)
05:00.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
# cat /proc/cpuinfo
[...]
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 28
model name : Intel(R) Atom(TM) CPU D510 @ 1.66GHz
stepping : 10
cpu MHz : 1666.686
cache size : 512 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 3333.36
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
= = = detailed log:
Oct 15 11:27:49 alfred kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614]
Oct 15 11:27:49 alfred kernel: CPU 1:
Oct 15 11:27:49 alfred kernel: Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat ip_
nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge autofs4 ipv6 xfrm_nalgo crypto
_api xfs loop dm_multipath scsi_dh raid456 xor video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acp
i acpi_memhotplug ac parport_pc lp parport sg i2c_i801 i2c_core serio_raw tpm_tis pcspkr tpm sata_mv r8169 tpm_bios shpchp mii d
m_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ahci libata sd_mod scsi_
mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Oct 15 11:27:49 alfred kernel: Pid: 1614, comm: md0_raid5 Not tainted 2.6.18-406.el5 #1
Oct 15 11:27:49 alfred kernel: RIP: 0010:[<ffffffff881d35a2>] [<ffffffff881d35a2>] :r8169:rtl8169_interrupt+0x248/0x26f
Oct 15 11:27:49 alfred kernel: RSP: 0018:ffff81007eec7df8 EFLAGS: 00000206
Oct 15 11:27:49 alfred kernel: RAX: 0000000000000040 RBX: ffff81007de0a000 RCX: 0000000000000042
Oct 15 11:27:49 alfred kernel: RDX: 00000000ffe2001d RSI: ffffffff80047254 RDI: ffff81007de0a180
Oct 15 11:27:49 alfred kernel: RBP: ffff81007eec7d70 R08: 0000000000000003 R09: ffffffff8005e298
Oct 15 11:27:49 alfred kernel: R10: 0000000000000001 R11: 0000000000000060 R12: ffffffff8005dc9e
Oct 15 11:27:49 alfred kernel: R13: 0000000000000040 R14: ffffffff800796ae R15: ffff81007eec7d70
Oct 15 11:27:49 alfred kernel: FS: 0000000000000000(0000) GS:ffff81007ef179c0(0000) knlGS:0000000000000000
Oct 15 11:27:49 alfred kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Oct 15 11:27:49 alfred kernel: CR2: 00002b0a2bbba30c CR3: 00000000547e8000 CR4: 00000000000006a0
Oct 15 11:27:49 alfred kernel:
Oct 15 11:27:49 alfred kernel: Call Trace:
Oct 15 11:27:49 alfred kernel: <IRQ> [<ffffffff881d356b>] :r8169:rtl8169_interrupt+0x211/0x26f
Oct 15 11:27:49 alfred kernel: [<ffffffff80010dc0>] handle_IRQ_event+0x51/0xa6
Oct 15 11:27:49 alfred kernel: [<ffffffff800becc5>] __do_IRQ+0xfb/0x15b
Oct 15 11:27:49 alfred kernel: [<ffffffff8006d4c5>] do_IRQ+0xe9/0xf7
Oct 15 11:27:49 alfred kernel: [<ffffffff8005d625>] ret_from_intr+0x0/0xa
Oct 15 11:27:49 alfred kernel: [<ffffffff8005e298>] call_softirq+0x1c/0x28
Oct 15 11:27:49 alfred kernel: [<ffffffff80012583>] __do_softirq+0x51/0x133
Oct 15 11:27:49 alfred kernel: [<ffffffff8005e298>] call_softirq+0x1c/0x28
Oct 15 11:27:49 alfred kernel: [<ffffffff8006d63a>] do_softirq+0x2c/0x7d
Oct 15 11:27:49 alfred kernel: [<ffffffff8005dc9e>] apic_timer_interrupt+0x66/0x6c
Oct 15 11:27:49 alfred kernel: <EOI> [<ffffffff80064b30>] _spin_unlock_irqrestore+0x8/0x9
Oct 15 11:27:49 alfred kernel: [<ffffffff88075d16>] :scsi_mod:scsi_dispatch_cmd+0x207/0x2b1
Oct 15 11:27:49 alfred kernel: [<ffffffff8807b926>] :scsi_mod:scsi_request_fn+0x2c3/0x392
Oct 15 11:27:49 alfred kernel: [<ffffffff8014af49>] elv_insert+0xac/0x1c4
Oct 15 11:27:49 alfred kernel: [<ffffffff8000c21c>] __make_request+0x47f/0x4ce
Oct 15 11:27:49 alfred kernel: [<ffffffff8001c84f>] generic_make_request+0x211/0x228
Oct 15 11:27:49 alfred kernel: [<ffffffff8001b125>] bio_alloc_bioset+0x89/0xd9
Oct 15 11:27:49 alfred kernel: [<ffffffff800a3d99>] keventd_create_kthread+0x0/0xc4
Oct 15 11:27:49 alfred kernel: [<ffffffff8003368c>] submit_bio+0xe6/0xed
Oct 15 11:27:49 alfred kernel: [<ffffffff80222dfe>] md_update_sb+0x1af/0x23a
Oct 15 11:27:49 alfred kernel: [<ffffffff8022812e>] md_check_recovery+0x15d/0x454
Oct 15 11:27:49 alfred kernel: [<ffffffff8833549f>] :raid456:raid5d+0x15/0x182
Oct 15 11:27:49 alfred kernel: [<ffffffff8003b13b>] prepare_to_wait+0x34/0x61
Oct 15 11:27:49 alfred kernel: [<ffffffff80225acc>] md_thread+0xf8/0x10e
Oct 15 11:27:49 alfred kernel: [<ffffffff800a3fb1>] autoremove_wake_function+0x0/0x2e
Oct 15 11:27:49 alfred kernel: [<ffffffff802259d4>] md_thread+0x0/0x10e
Oct 15 11:27:49 alfred kernel: [<ffffffff80032c1d>] kthread+0xfe/0x132
Oct 15 11:27:49 alfred kernel: [<ffffffff8005dfc1>] child_rip+0xa/0x11
Oct 15 11:27:49 alfred kernel: [<ffffffff800a3d99>] keventd_create_kthread+0x0/0xc4
Oct 15 11:27:49 alfred kernel: [<ffffffff80032b1f>] kthread+0x0/0x132
Oct 15 11:27:49 alfred kernel: [<ffffffff8005dfb7>] child_rip+0x0/0x11
Oct 15 11:27:49 alfred kernel:
Oct 15 11:28:14 alfred kernel: INFO: task pdflush:10294 blocked for more than 120 seconds.
Oct 15 11:28:14 alfred kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 15 11:28:14 alfred kernel: pdflush D ffff810002536420 0 10294 27 10375 1706 (L-TLB)
Oct 15 11:28:14 alfred kernel: ffff81006318baa0 0000000000000046 0000000000000003 0000000082147ce0
Oct 15 11:28:14 alfred kernel: 00900000000000d8 000000000000000a ffff8100614ff040 ffffffff8031db60
Oct 15 11:28:14 alfred kernel: 00004a61ef4e2a4e 0000000000008115 ffff8100614ff228 000000006166ea40
Oct 15 11:28:14 alfred kernel: Call Trace:
Oct 15 11:28:14 alfred kernel: [<ffffffff80224647>] md_write_start+0xf2/0x108
Oct 15 11:28:14 alfred kernel: [<ffffffff800a3fb1>] autoremove_wake_function+0x0/0x2e
Oct 15 11:28:14 alfred kernel: [<ffffffff883cce08>] :xfs:xfs_page_state_convert+0x4f7/0x546
Oct 15 11:28:14 alfred kernel: [<ffffffff88335db1>] :raid456:make_request+0x4e/0x4e3
Oct 15 11:28:14 alfred kernel: [<ffffffff8001c84f>] generic_make_request+0x211/0x228
Oct 15 11:28:14 alfred kernel: [<ffffffff800238ac>] mempool_alloc+0x31/0xe7
Oct 15 11:28:14 alfred kernel: [<ffffffff8003368c>] submit_bio+0xe6/0xed
Oct 15 11:28:14 alfred kernel: [<ffffffff883ce805>] :xfs:_xfs_buf_ioapply+0x1f2/0x254
Oct 15 11:28:14 alfred kernel: [<ffffffff883ce8a0>] :xfs:xfs_buf_iorequest+0x39/0x64
Oct 15 11:28:14 alfred kernel: [<ffffffff883b89e2>] :xfs:xlog_bdstrat_cb+0x16/0x3c
Oct 15 11:28:14 alfred kernel: [<ffffffff883b99e4>] :xfs:xlog_sync+0x218/0x3ad
Oct 15 11:28:14 alfred kernel: [<ffffffff883ba744>] :xfs:xlog_state_sync_all+0xb9/0x1d9
Oct 15 11:28:14 alfred kernel: [<ffffffff883bacc7>] :xfs:_xfs_log_force+0x59/0x68
Oct 15 11:28:14 alfred kernel: [<ffffffff883bace1>] :xfs:xfs_log_force+0xb/0x3f
Oct 15 11:28:14 alfred kernel: [<ffffffff883c6587>] :xfs:xfs_syncsub+0x33/0x226
Oct 15 11:28:14 alfred kernel: [<ffffffff800a3d99>] keventd_create_kthread+0x0/0xc4
Oct 15 11:28:14 alfred kernel: [<ffffffff883d3cad>] :xfs:xfs_fs_write_super+0x1b/0x21
Oct 15 11:28:14 alfred kernel: [<ffffffff800e8c5a>] sync_supers+0x80/0xe1
Oct 15 11:28:14 alfred kernel: [<ffffffff8005697a>] pdflush+0x0/0x1fb
Oct 15 11:28:14 alfred kernel: [<ffffffff800cdca0>] wb_kupdate+0x3e/0x16a
Oct 15 11:28:14 alfred kernel: [<ffffffff8005697a>] pdflush+0x0/0x1fb
Oct 15 11:28:14 alfred kernel: [<ffffffff80056acb>] pdflush+0x151/0x1fb
Oct 15 11:28:14 alfred kernel: [<ffffffff800cdc62>] wb_kupdate+0x0/0x16a
Oct 15 11:28:14 alfred kernel: [<ffffffff80032c1d>] kthread+0xfe/0x132
Oct 15 11:28:14 alfred kernel: [<ffffffff8005dfc1>] child_rip+0xa/0x11
Oct 15 11:28:14 alfred kernel: [<ffffffff800a3d99>] keventd_create_kthread+0x0/0xc4
Oct 15 11:28:14 alfred kernel: [<ffffffff80032b1f>] kthread+0x0/0x132
Oct 15 11:28:14 alfred kernel: [<ffffffff8005dfb7>] child_rip+0x0/0x11
Oct 15 11:28:14 alfred kernel:
Oct 15 11:28:14 alfred kernel: INFO: task md0_resync:13543 blocked for more than 120 seconds.
Oct 15 11:28:14 alfred kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 15 11:28:14 alfred kernel: md0_resync D ffff810037f117f0 0 13543 27 10375 (L-TLB)
Oct 15 11:28:14 alfred kernel: ffff81004dad3c50 0000000000000046 0000000000000001 0000000000000000
Oct 15 11:28:14 alfred kernel: ffff81007eb6f5f0 000000000000000a ffff81005fe63080 ffff810037f117f0
Oct 15 11:28:14 alfred kernel: 00004a613eedffb5 00000000003a50b4 ffff81005fe63268 0000000000000003
Oct 15 11:28:14 alfred kernel: Call Trace:
Oct 15 11:28:14 alfred kernel: [<ffffffff8002e493>] __wake_up+0x38/0x4f
Oct 15 11:28:14 alfred kernel: [<ffffffff880756c0>] :scsi_mod:scsi_done+0x0/0x18
Oct 15 11:28:14 alfred kernel: [<ffffffff88330dc5>] :raid456:get_active_stripe+0x242/0x4bd
Oct 15 11:28:14 alfred kernel: [<ffffffff8008f4f9>] default_wake_function+0x0/0xe
Oct 15 11:28:14 alfred kernel: [<ffffffff88335ccc>] :raid456:sync_request+0x6c0/0x757
Oct 15 11:28:14 alfred kernel: [<ffffffff8807b9a0>] :scsi_mod:scsi_request_fn+0x33d/0x392
Oct 15 11:28:14 alfred kernel: [<ffffffff801583ef>] __next_cpu+0x19/0x28
Oct 15 11:28:14 alfred kernel: [<ffffffff80225f46>] md_do_sync+0x464/0x84b
Oct 15 11:28:14 alfred kernel: [<ffffffff800a3d99>] keventd_create_kthread+0x0/0xc4
Oct 15 11:28:14 alfred kernel: [<ffffffff80225acc>] md_thread+0xf8/0x10e
Oct 15 11:28:14 alfred kernel: [<ffffffff800a3d99>] keventd_create_kthread+0x0/0xc4
Oct 15 11:28:14 alfred kernel: [<ffffffff802259d4>] md_thread+0x0/0x10e
Oct 15 11:28:14 alfred kernel: [<ffffffff80032c1d>] kthread+0xfe/0x132
Oct 15 11:28:14 alfred kernel: [<ffffffff8005dfc1>] child_rip+0xa/0x11
Oct 15 11:28:14 alfred kernel: [<ffffffff800a3d99>] keventd_create_kthread+0x0/0xc4
Oct 15 11:28:14 alfred kernel: [<ffffffff80032b1f>] kthread+0x0/0x132
Oct 15 11:28:14 alfred kernel: [<ffffffff8005dfb7>] child_rip+0x0/0x11
Oct 15 11:28:14 alfred kernel:
tnx & cu
--
Best regards,
Rainer mailto:rfu@oudeis.org
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614]
2015-10-15 13:38 kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614] Rainer Fügenstein
@ 2015-10-16 1:15 ` Neil Brown
2015-10-24 16:15 ` performance issue (was: Re: kernel: BUG: soft lockup - CPU#1 stuck for 60s!) Rainer Fügenstein
0 siblings, 1 reply; 23+ messages in thread
From: Neil Brown @ 2015-10-16 1:15 UTC (permalink / raw)
To: Rainer Fügenstein, Linux-RAID
[-- Attachment #1: Type: text/plain, Size: 2641 bytes --]
Rainer Fügenstein <rfu@oudeis.org> writes:
> Hi,
>
> my NAS-like server with 5*3TB SATA drives in RAID5 configuration was
> running without problems for what seems an eternity; since about 3
> weeks it keeps freezing every other day with the following error:
>
> # grep soft /var/log/messages
> Oct 15 11:26:49 alfred kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614]
> Oct 15 11:26:49 alfred kernel: [<ffffffff8005e298>] call_softirq+0x1c/0x28
> Oct 15 11:26:49 alfred kernel: [<ffffffff80012583>] __do_softirq+0x51/0x133
> Oct 15 11:26:49 alfred kernel: [<ffffffff8005e298>] call_softirq+0x1c/0x28
> Oct 15 11:26:49 alfred kernel: [<ffffffff8006d63a>] do_softirq+0x2c/0x7d
> Oct 15 11:27:49 alfred kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614]
> Oct 15 11:27:49 alfred kernel: [<ffffffff8005e298>] call_softirq+0x1c/0x28
> Oct 15 11:27:49 alfred kernel: [<ffffffff80012583>] __do_softirq+0x51/0x133
> Oct 15 11:27:49 alfred kernel: [<ffffffff8005e298>] call_softirq+0x1c/0x28
> Oct 15 11:27:49 alfred kernel: [<ffffffff8006d63a>] do_softirq+0x2c/0x7d
> Oct 15 11:28:49 alfred kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614]
> Oct 15 11:28:49 alfred kernel: [<ffffffff8005e298>] call_softirq+0x1c/0x28
> Oct 15 11:28:49 alfred kernel: [<ffffffff80012583>] __do_softirq+0x51/0x133
> Oct 15 11:28:49 alfred kernel: [<ffffffff8005e298>] call_softirq+0x1c/0x28
> Oct 15 11:28:49 alfred kernel: [<ffffffff8006d63a>] do_softirq+0x2c/0x7d
> [...]
> this is only part of the story, check the end of this message for
> a detailed log.
>
> sometimes the server recovers after 60+ seconds, sometimes it requires
> a hard reset (causing mdraid to re-sync the whole array).
I strongly recommend adding a write-intent bitmap
   mdadm --grow /dev/md0 --bitmap=internal
that will speed up the resync enormously.
>
> IIRC, it started when a drive in the array failed with "SATA
> connection timeouts" (kind of). this drive has been replaced by a new
> one, but yet the CPU lockups keep coming.
>
> I suspect that aging hardware slowly starts to fail, but not sure
> which part (drives? SATA controller? cables? NIC? CPU? ...)
>
> here's some info that might be useful:
> # uname -a
> Linux alfred 2.6.18-406.el5 #1 SMP Tue Jun 2 17:25:57 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
This is a rather ancient kernel. The "el" suffix probably suggests
Redhat? If you have a Redhat support contract you should ask them. If
you don't, you should probably try a newer kernel (or buy a support
contract).
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* performance issue (was: Re: kernel: BUG: soft lockup - CPU#1 stuck for 60s!)
2015-10-16 1:15 ` Neil Brown
@ 2015-10-24 16:15 ` Rainer Fügenstein
2015-10-24 16:31 ` Roman Mamedov
0 siblings, 1 reply; 23+ messages in thread
From: Rainer Fügenstein @ 2015-10-24 16:15 UTC (permalink / raw)
To: Neil Brown, Linux-RAID
hi,
> I strongly recommend adding a write-intent bitmap
> mdadm --grow /dev/md0 --bitmap=internal
I did as suggested, but now it feels like performance has dropped to
about 1/4th of what it used to be before. since this system is already
pretty slow by design, this is quite frustrating.
no soft-lockups so far, fortunately.
may a new kernel speed things up again? or can --bitmap=internal be
undone? (need some time to prepare the upgrade to a new OS release)
tnx & cu
--
Best regards,
Rainer mailto:rfu@oudeis.org
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance issue (was: Re: kernel: BUG: soft lockup - CPU#1 stuck for 60s!)
2015-10-24 16:15 ` performance issue (was: Re: kernel: BUG: soft lockup - CPU#1 stuck for 60s!) Rainer Fügenstein
@ 2015-10-24 16:31 ` Roman Mamedov
2015-10-25 19:23 ` Rainer Fügenstein
0 siblings, 1 reply; 23+ messages in thread
From: Roman Mamedov @ 2015-10-24 16:31 UTC (permalink / raw)
To: Rainer Fügenstein; +Cc: Neil Brown, Linux-RAID
[-- Attachment #1: Type: text/plain, Size: 486 bytes --]
On Sat, 24 Oct 2015 18:15:41 +0200
Rainer Fügenstein <rfu@oudeis.org> wrote:
> hi,
>
> > I strongly recommend adding a write-intent bitmap
> > mdadm --grow /dev/md0 --bitmap=internal
>
> I did as suggested, but now it feels like performance has dropped to
> about 1/4th of what it used to be before. since this system is already
> pretty slow by design, this is quite frustrating.
Use a higher bitmap-chunk size, such as 256M or more.
--
With respect,
Roman
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance issue (was: Re: kernel: BUG: soft lockup - CPU#1 stuck for 60s!)
2015-10-24 16:31 ` Roman Mamedov
@ 2015-10-25 19:23 ` Rainer Fügenstein
2015-10-25 20:08 ` Neil Brown
0 siblings, 1 reply; 23+ messages in thread
From: Rainer Fügenstein @ 2015-10-25 19:23 UTC (permalink / raw)
To: Roman Mamedov; +Cc: Neil Brown, Linux-RAID
Hello Roman,
Saturday, October 24, 2015, 6:31:39 PM, you wrote:
> Use a higher bitmap-chunk size, such as 256M or more.
I guess that would be
mdadm --grow /dev/md0 --bitmap-chunk=256M ??
is it wise to issue this command during a re-sync?
a cron.weekly job started the re-sync (although I'm pretty sure this
job has been disabled quite some time ago)
$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[7] sdf1[3] sdc1[5] sde1[0] sdd1[8]
11721061376 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
[==>..................] resync = 11.9% (348948608/2930265344) finish=7771.1min speed=5533K/sec
bitmap: 8/350 pages [32KB], 4096KB chunk
unused devices: <none>
tnx & cu
--
Best regards,
Rainer mailto:rfu@oudeis.org
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance issue (was: Re: kernel: BUG: soft lockup - CPU#1 stuck for 60s!)
2015-10-25 19:23 ` Rainer Fügenstein
@ 2015-10-25 20:08 ` Neil Brown
2015-11-02 22:55 ` performance issue Rainer Fügenstein
0 siblings, 1 reply; 23+ messages in thread
From: Neil Brown @ 2015-10-25 20:08 UTC (permalink / raw)
To: Rainer Fügenstein, Roman Mamedov; +Cc: Linux-RAID
[-- Attachment #1: Type: text/plain, Size: 1839 bytes --]
Rainer Fügenstein <rfu@oudeis.org> writes:
> Hello Roman,
>
> Saturday, October 24, 2015, 6:31:39 PM, you wrote:
>
>> Use a higher bitmap-chunk size, such as 256M or more.
>
> I guess that would be
>
> mdadm --grow /dev/md0 --bitmap-chunk=256M ??
You would need to remove and then re-add the bitmap. So:
   mdadm --grow /dev/md0 --bitmap=none
   mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=256M
>
> is it wise to issue this command during a re-sync?
Depending on kernel version, it will either work or it won't. Either
way, it won't cause harm.
>
> a cron.weekly job started the re-sync (although I'm pretty sure this
> job has been disabled quite some time ago)
Weekly is a bit more often than I would go for, but why disable it?
Regular scanning for latent bad blocks is fairly important for
reliability.
> $ cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sdb1[7] sdf1[3] sdc1[5] sde1[0] sdd1[8]
> 11721061376 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
> [==>..................] resync = 11.9% (348948608/2930265344) finish=7771.1min speed=5533K/sec
> bitmap: 8/350 pages [32KB], 4096KB chunk
That isn't a cronjob-started resync. That would say "check" rather than
"resync". This looks a lot like a resync after an unclean restart. But
with the bitmap that should go faster...
What does "mdadm --examine-bitmap /dev/sdb1" report?
NeilBrown
>
> unused devices: <none>
>
> tnx & cu
>
> --
> Best regards,
> Rainer mailto:rfu@oudeis.org
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
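Neil's distinction between a scheduled "check" and a "resync" can be read straight off the progress line in /proc/mdstat. A minimal sketch, assuming the progress-line layout shown in the quoted output ("action = percent"); md_sync_action is a hypothetical helper name, not an mdadm tool:

```shell
#!/bin/sh
# Hypothetical helper: print the sync action (resync, check, recovery or
# reshape) found in /proc/mdstat-formatted text read from stdin.
# Prints nothing when no array is syncing.
md_sync_action() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i ~ /^(resync|check|recovery|reshape)$/ && $(i + 1) == "=")
                print $i
    }'
}

# Sample progress line copied from the mdstat output quoted in the thread.
sample='[==>..................] resync = 11.9% (348948608/2930265344) finish=7771.1min speed=5533K/sec'
printf '%s\n' "$sample" | md_sync_action
```

On a live system the same function could be fed with `md_sync_action < /proc/mdstat`. Kernels that expose the md sysfs attributes should report the same word in /sys/block/md0/md/sync_action, though whether this particular 2.6.18 kernel does is not stated in the thread.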
* Re: performance issue
2015-10-25 20:08 ` Neil Brown
@ 2015-11-02 22:55 ` Rainer Fügenstein
2015-11-03 1:34 ` Neil Brown
0 siblings, 1 reply; 23+ messages in thread
From: Rainer Fügenstein @ 2015-11-02 22:55 UTC (permalink / raw)
To: Neil Brown; +Cc: Linux-RAID
On 25.10.2015 21:08, Neil Brown wrote:
> mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=256
not sure how to specify the chunk size:
[root@alfred ~]# mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=256
mdadm: failed to create internal bitmap - chunksize problem.
[root@alfred ~]# mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=256M
mdadm: invalid bitmap chunksize: 256M
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: performance issue
2015-11-02 22:55 ` performance issue Rainer Fügenstein
@ 2015-11-03 1:34 ` Neil Brown
0 siblings, 0 replies; 23+ messages in thread
From: Neil Brown @ 2015-11-03 1:34 UTC (permalink / raw)
To: Rainer Fügenstein; +Cc: Linux-RAID
[-- Attachment #1: Type: text/plain, Size: 615 bytes --]
On Tue, Nov 03 2015, Rainer Fügenstein wrote:
> On 25.10.2015 21:08, Neil Brown wrote:
>> mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=256
>
> not sure how to specify the chunk size:
>
> [root@alfred ~]# mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=256
> mdadm: failed to create internal bitmap - chunksize problem.
> [root@alfred ~]# mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=256M
> mdadm: invalid bitmap chunksize: 256M
I guess you have an mdadm version earlier than 3.2.
Try
   --bitmap-chunk=262144
which is 256*1024. The number is in K.
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
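The unit conversion in the last reply (256M expressed as a plain number of K, i.e. 256*1024 = 262144) can be sketched as a small shell function for mdadm versions that reject size suffixes. to_kib is a hypothetical helper name, and the sketch assumes the input is an integer with an optional K, M or G suffix:

```shell
#!/bin/sh
# Hypothetical helper: convert a size such as 256M into a plain KiB count,
# the form --bitmap-chunk is said to expect on older mdadm versions.
to_kib() {
    case "$1" in
        *[Kk]) echo "${1%?}" ;;
        *[Mm]) echo $(( ${1%?} * 1024 )) ;;
        *[Gg]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *)     echo "$1" ;;            # no suffix: treated as already in KiB
    esac
}

# Print (not run) the command from the thread with the converted value.
echo "mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=$(to_kib 256M)"
```

The command is only echoed so the value can be checked before anything touches the array; per Neil's earlier message, the existing bitmap would first have to be removed with --bitmap=none.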
end of thread, other threads:[~2015-11-03 1:34 UTC | newest]
Thread overview: 23+ messages
2012-11-22 19:17 Performance issue George-Cristian Bîrzan
2012-11-23 7:26 ` Stefan Hajnoczi
[not found] ` <CAMxNYabWpHqmNN7mCY9mwVJjoTj4jwS_js+cZcxQVnJsTdwfBg@mail.gmail.com>
2012-11-23 14:02 ` Fwd: " George-Cristian Bîrzan
2012-11-25 15:19 ` Gleb Natapov
2012-11-25 16:17 ` George-Cristian Bîrzan
2012-11-26 19:31 ` George-Cristian Bîrzan
2012-11-27 12:20 ` Gleb Natapov
2012-11-27 12:29 ` George-Cristian Bîrzan
2012-11-27 14:54 ` Gleb Natapov
2012-11-27 20:38 ` Vadim Rozenfeld
2012-11-27 21:13 ` George-Cristian Bîrzan
2012-11-28 11:39 ` Vadim Rozenfeld
2012-11-28 19:09 ` George-Cristian Bîrzan
2012-11-29 11:56 ` Vadim Rozenfeld
2012-11-29 13:45 ` George-Cristian Bîrzan
2012-11-29 13:56 ` Gleb Natapov
2012-11-29 20:34 ` Vadim Rozenfeld
2012-11-28 19:18 ` George-Cristian Bîrzan
2012-11-28 19:56 ` Gleb Natapov
2012-11-28 20:01 ` George-Cristian Bîrzan
2012-11-28 20:12 ` Gleb Natapov
-- strict thread matches above, loose matches on Subject: below --
2015-10-15 13:38 kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614] Rainer Fügenstein
2015-10-16 1:15 ` Neil Brown
2015-10-24 16:15 ` performance issue (was: Re: kernel: BUG: soft lockup - CPU#1 stuck for 60s!) Rainer Fügenstein
2015-10-24 16:31 ` Roman Mamedov
2015-10-25 19:23 ` Rainer Fügenstein
2015-10-25 20:08 ` Neil Brown
2015-11-02 22:55 ` performance issue Rainer Fügenstein
2015-11-03 1:34 ` Neil Brown