Performance issue

* Performance issue
@ 2012-11-22 19:17 George-Cristian Bîrzan
  2012-11-23  7:26 ` Stefan Hajnoczi
  2012-11-25 15:19 ` Gleb Natapov
  0 siblings, 2 replies; 23+ messages in thread
From: George-Cristian Bîrzan @ 2012-11-22 19:17 UTC (permalink / raw)
  To: kvm

I'm trying to understand a performance problem (50% degradation in the
VM) that I'm experiencing on some systems with qemu-kvm. The hosts run
Fedora with kernel 3.5.3-1.fc17.x86_64 or 3.6.6-1.fc17.x86_64 and qemu
1.0.1 or 1.2.1 on AMD Opteron 6176 and 6174 CPUs, and all of them
behave identically.

A Windows guest is receiving a UDP MPEG stream that is being processed
by TSReader. The stream comes in at about 73Mbps, but the VM cannot
process more than 43Mbps. It's not a networking issue: the packets
reach the guest, and with iperf we can easily do 80Mbps. Also, with
iperf, it can receive the packets from the streamer (even though it
doesn't detect things properly, but it was just a way to check).

However, on an identical host (a 6174 CPU, even), a Windows install
has absolutely no problem processing the same stream.

This is the command we're using to start qemu-kvm:

/usr/bin/qemu-kvm -name b691546e-79f8-49c6-a293-81067503a6ad -S -M
pc-1.2 -cpu host -enable-kvm -m 16384 -smp
16,sockets=1,cores=16,threads=1 -uuid
b691546e-79f8-49c6-a293-81067503a6ad -no-user-config -nodefaults
-chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/b691546e-79f8-49c6-a293-81067503a6ad.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
-drive file=/var/lib/libvirt/images/dis-magnetics-2-223101/d8b233c6-8424-4de9-ae3c-7c9a60288514,if=none,id=drive-virtio-disk0,format=qcow2,cache=writeback,aio=native
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=31 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2e:fb:a2:36:be,bus=pci.0,addr=0x3
-netdev tap,fd=32,id=hostnet1,vhost=on,vhostfd=33 -device
virtio-net-pci,netdev=hostnet1,id=net1,mac=22:94:44:5a:cb:24,bus=pci.0,addr=0x4
-vnc 127.0.0.1:4,password -vga cirrus -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

As a side note, the TSReader application uses only one thread for
decoding the stream and one for network I/O, although using more
threads would solve the problem.

I've tried a smaller guest with 5 cores, pinning each vCPU to an
individual CPU from 6 to 11 (all in one NUMA node), and I've tried
enabling the huge pages/TLB thingy... and that's about it. I'm completely
stuck.
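
For reference, pinning on a Linux host ultimately comes down to a CPU
affinity mask. The sketch below only illustrates that mechanism; it is
not the tooling used in this report. The helper names cpu_range_set and
pin_self_to_range are made up for the example, and it assumes glibc's
sched_setaffinity(2) with _GNU_SOURCE defined.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Fill `set` with the inclusive CPU range [first, last]. */
void cpu_range_set(cpu_set_t *set, int first, int last)
{
    assert(first >= 0 && first <= last && last < CPU_SETSIZE);
    CPU_ZERO(set);
    for (int cpu = first; cpu <= last; cpu++)
        CPU_SET(cpu, set);
}

/*
 * Restrict the calling thread to CPUs [first, last].
 * Returns 0 on success, -1 on error (errno is set by the kernel call).
 */
int pin_self_to_range(int first, int last)
{
    cpu_set_t set;

    cpu_range_set(&set, first, last);
    /* pid 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Calling pin_self_to_range(6, 11) from a thread would restrict it to
CPUs 6 through 11; whether those CPUs share a NUMA node depends on the
host topology.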

Is this 50% hit something that's considered 'okay', or am I doing
something wrong? And if the latter, what/how can I debug it?

--
George-Cristian Bîrzan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Performance issue
  2012-11-22 19:17 Performance issue George-Cristian Bîrzan
@ 2012-11-23  7:26 ` Stefan Hajnoczi
       [not found]   ` <CAMxNYabWpHqmNN7mCY9mwVJjoTj4jwS_js+cZcxQVnJsTdwfBg@mail.gmail.com>
  2012-11-25 15:19 ` Gleb Natapov
  1 sibling, 1 reply; 23+ messages in thread
From: Stefan Hajnoczi @ 2012-11-23  7:26 UTC (permalink / raw)
  To: George-Cristian Bîrzan; +Cc: kvm

On Thu, Nov 22, 2012 at 09:17:34PM +0200, George-Cristian Bîrzan wrote:
> I'm trying to understand a performance problem (50% degradation in the
> VM) that I'm experiencing on some systems with qemu-kvm. Running Fedora
> with 3.5.3-1.fc17.x86_64 or 3.6.6-1.fc17.x86_64, qemu 1.0.1 or 1.2.1
> on AMD Opteron 6176 and 6174, and all of them behave identically.
> 
> A Windows guest is receiving a UDP MPEG stream that is being processed
> by TSReader. The stream comes in at about 73Mbps, but the VM cannot
> process more than 43Mbps. It's not a networking issue, the packets
> reach the guest and with iperf we can easily do 80Mbps. Also, with
> iperf, it can receive the packets from the streamer (even though it
> doesn't detect things properly, but it was just a way to check).

Hi George-Cristian,
On IRC you mentioned you found a solution.  Any updates?  Are you still
seeing the performance problem?

Stefan

[parent not found: <CAMxNYabWpHqmNN7mCY9mwVJjoTj4jwS_js+cZcxQVnJsTdwfBg@mail.gmail.com>]

* Fwd: Performance issue
       [not found]   ` <CAMxNYabWpHqmNN7mCY9mwVJjoTj4jwS_js+cZcxQVnJsTdwfBg@mail.gmail.com>
@ 2012-11-23 14:02     ` George-Cristian Bîrzan
  0 siblings, 0 replies; 23+ messages in thread
From: George-Cristian Bîrzan @ 2012-11-23 14:02 UTC (permalink / raw)
  To: kvm, Stefan Hajnoczi

On Fri, Nov 23, 2012 at 9:26 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> Hi George-Cristian,
> On IRC you mentioned you found a solution.  Any updates?  Are you still
> seeing the performance problem?

It wasn't a solution, I just thought I knew why. I was thinking the
73Mbps were coming in at 188 bytes per packet, which would probably
have been too many packets for the machine to handle. It turns out the
stream is coming in at 1358 bytes per packet, which means I'm back to
square one.
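
As a rough check of the reasoning above, the packet rate at each size
can be worked out directly. This is a minimal sketch: the function name
packets_per_second is hypothetical, and it treats the quoted sizes as
the full per-packet size on the wire. 188 bytes is the size of one MPEG
transport stream packet; 1358 bytes would be consistent with seven such
packets (1316 bytes) plus 42 bytes of Ethernet, IP and UDP headers,
though the thread does not confirm that breakdown.

```c
#include <assert.h>

/* Packets per second needed to carry `bits_per_second` when every
 * packet is `bytes_per_packet` bytes long. */
double packets_per_second(double bits_per_second, double bytes_per_packet)
{
    assert(bytes_per_packet > 0.0);
    return bits_per_second / (8.0 * bytes_per_packet);
}
```

With 73e6 bits per second, 188-byte packets work out to roughly 48,500
packets per second, while 1358-byte packets need only about 6,700.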

Also, I just got in to work, and will try to write my own program to
read the stream. The actual workload that these VMs will have to handle
is not as simple as just decoding the stream: they have to transcode
it, but I don't have access to the source to see exactly what it's
doing (same with TSReader, but at least that's not something in house
for our customer).

--
George-Cristian Bîrzan

* Re: Performance issue
  2012-11-22 19:17 Performance issue George-Cristian Bîrzan
  2012-11-23  7:26 ` Stefan Hajnoczi
@ 2012-11-25 15:19 ` Gleb Natapov
  2012-11-25 16:17   ` George-Cristian Bîrzan
  1 sibling, 1 reply; 23+ messages in thread
From: Gleb Natapov @ 2012-11-25 15:19 UTC (permalink / raw)
  To: George-Cristian Bîrzan; +Cc: kvm

On Thu, Nov 22, 2012 at 09:17:34PM +0200, George-Cristian Bîrzan wrote:
> I'm trying to understand a performance problem (50% degradation in the
> VM) that I'm experiencing on some systems with qemu-kvm. Running Fedora
> with 3.5.3-1.fc17.x86_64 or 3.6.6-1.fc17.x86_64, qemu 1.0.1 or 1.2.1
> on AMD Opteron 6176 and 6174, and all of them behave identically.
> 
> A Windows guest is receiving a UDP MPEG stream that is being processed
> by TSReader. The stream comes in at about 73Mbps, but the VM cannot
> process more than 43Mbps. It's not a networking issue, the packets
> reach the guest and with iperf we can easily do 80Mbps. Also, with
> iperf, it can receive the packets from the streamer (even though it
> doesn't detect things properly, but it was just a way to check).
> 
> However, on an identical host (a 6174 CPU, even), a Windows install
> has absolutely no problem processing the same stream.
> 
What Windows is this? Can you try changing "-cpu host" to "-cpu
host,+hv_relaxed"?

--
			Gleb.

* Re: Performance issue
  2012-11-25 15:19 ` Gleb Natapov
@ 2012-11-25 16:17   ` George-Cristian Bîrzan
  2012-11-26 19:31     ` George-Cristian Bîrzan
  0 siblings, 1 reply; 23+ messages in thread
From: George-Cristian Bîrzan @ 2012-11-25 16:17 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm

On Sun, Nov 25, 2012 at 5:19 PM, Gleb Natapov <gleb@redhat.com> wrote:
> What Windows is this? Can you try changing "-cpu host" to "-cpu
> host,+hv_relaxed"?

This is on Windows Server 2008 R2 (sorry, forgot to mention that I
guess), and I can try it tomorrow (US time), as getting a stream my
way depends on complicated stuff. I will though, and let you know how
it goes.

--
George-Cristian Bîrzan

* Re: Performance issue
  2012-11-25 16:17   ` George-Cristian Bîrzan
@ 2012-11-26 19:31     ` George-Cristian Bîrzan
  2012-11-27 12:20       ` Gleb Natapov
  0 siblings, 1 reply; 23+ messages in thread
From: George-Cristian Bîrzan @ 2012-11-26 19:31 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm

On Sun, Nov 25, 2012 at 6:17 PM, George-Cristian Bîrzan <gc@birzan.org> wrote:
> On Sun, Nov 25, 2012 at 5:19 PM, Gleb Natapov <gleb@redhat.com> wrote:
>> What Windows is this? Can you try changing "-cpu host" to "-cpu
>> host,+hv_relaxed"?
>
> This is on Windows Server 2008 R2 (sorry, forgot to mention that I
> guess), and I can try it tomorrow (US time), as getting a stream my
> way depends on complicated stuff. I will though, and let you know how
> it goes.

I changed that, no difference.


--
George-Cristian Bîrzan

* Re: Performance issue
  2012-11-26 19:31     ` George-Cristian Bîrzan
@ 2012-11-27 12:20       ` Gleb Natapov
  2012-11-27 12:29         ` George-Cristian Bîrzan
  0 siblings, 1 reply; 23+ messages in thread
From: Gleb Natapov @ 2012-11-27 12:20 UTC (permalink / raw)
  To: George-Cristian Bîrzan; +Cc: kvm

On Mon, Nov 26, 2012 at 09:31:19PM +0200, George-Cristian Bîrzan wrote:
> On Sun, Nov 25, 2012 at 6:17 PM, George-Cristian Bîrzan <gc@birzan.org> wrote:
> > On Sun, Nov 25, 2012 at 5:19 PM, Gleb Natapov <gleb@redhat.com> wrote:
> >> What Windows is this? Can you try changing "-cpu host" to "-cpu
> >> host,+hv_relaxed"?
> >
> > This is on Windows Server 2008 R2 (sorry, forgot to mention that I
> > guess), and I can try it tomorrow (US time), as getting a stream my
> > way depends on complicated stuff. I will though, and let you know how
> > it goes.
> 
> I changed that, no difference.
> 
> 
Heh, I forgot that the part that should make a difference is not yet
upstream :(

--
			Gleb.

* Re: Performance issue
  2012-11-27 12:20       ` Gleb Natapov
@ 2012-11-27 12:29         ` George-Cristian Bîrzan
  2012-11-27 14:54           ` Gleb Natapov
  0 siblings, 1 reply; 23+ messages in thread
From: George-Cristian Bîrzan @ 2012-11-27 12:29 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm

On Tue, Nov 27, 2012 at 2:20 PM, Gleb Natapov <gleb@redhat.com> wrote:
> On Mon, Nov 26, 2012 at 09:31:19PM +0200, George-Cristian Bîrzan wrote:
>> On Sun, Nov 25, 2012 at 6:17 PM, George-Cristian Bîrzan <gc@birzan.org> wrote:
>> > On Sun, Nov 25, 2012 at 5:19 PM, Gleb Natapov <gleb@redhat.com> wrote:
>> >> What Windows is this? Can you try changing "-cpu host" to "-cpu
>> >> host,+hv_relaxed"?
>> >
>> > This is on Windows Server 2008 R2 (sorry, forgot to mention that I
>> > guess), and I can try it tomorrow (US time), as getting a stream my
>> > way depends on complicated stuff. I will though, and let you know how
>> > it goes.
>>
>> I changed that, no difference.
>>
>>
> Heh, I forgot that the part that should make difference is not yet
> upstream :(

We can try recompiling kvm/qemu with some patches, if that'd help. At
this point, anything is on the table except changing Windows and the
hardware :-)

Also, it might be that the software doing the actual work is not well
written, but even so...

--
George-Cristian Bîrzan

* Re: Performance issue
  2012-11-27 12:29         ` George-Cristian Bîrzan
@ 2012-11-27 14:54           ` Gleb Natapov
  2012-11-27 20:38             ` Vadim Rozenfeld
  0 siblings, 1 reply; 23+ messages in thread
From: Gleb Natapov @ 2012-11-27 14:54 UTC (permalink / raw)
  To: George-Cristian Bîrzan; +Cc: kvm, vrozenfe

On Tue, Nov 27, 2012 at 02:29:20PM +0200, George-Cristian Bîrzan wrote:
> On Tue, Nov 27, 2012 at 2:20 PM, Gleb Natapov <gleb@redhat.com> wrote:
> > On Mon, Nov 26, 2012 at 09:31:19PM +0200, George-Cristian Bîrzan wrote:
> >> On Sun, Nov 25, 2012 at 6:17 PM, George-Cristian Bîrzan <gc@birzan.org> wrote:
> >> > On Sun, Nov 25, 2012 at 5:19 PM, Gleb Natapov <gleb@redhat.com> wrote:
> >> >> What Windows is this? Can you try changing "-cpu host" to "-cpu
> >> >> host,+hv_relaxed"?
> >> >
> >> > This is on Windows Server 2008 R2 (sorry, forgot to mention that I
> >> > guess), and I can try it tomorrow (US time), as getting a stream my
> >> > way depends on complicated stuff. I will though, and let you know how
> >> > it goes.
> >>
> >> I changed that, no difference.
> >>
> >>
> > Heh, I forgot that the part that should make difference is not yet
> > upstream :(
> 
> We can try recompiling kvm/qemu with some patches, if that'd help. At
> this point, anything is on the table except changing Windows and the
> hardware :-)

Vadim, do you have Hyper-V reference timer patches for KVM to try?

> 
> Also, it might be that the software doing the actual work is not well
> written, but even so...
> 
> --
> George-Cristian Bîrzan

--
			Gleb.

* Re: Performance issue
  2012-11-27 14:54           ` Gleb Natapov
@ 2012-11-27 20:38             ` Vadim Rozenfeld
  2012-11-27 21:13               ` George-Cristian Bîrzan
  0 siblings, 1 reply; 23+ messages in thread
From: Vadim Rozenfeld @ 2012-11-27 20:38 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: George-Cristian Bîrzan, kvm

On Tuesday, November 27, 2012 04:54:47 PM Gleb Natapov wrote:
> On Tue, Nov 27, 2012 at 02:29:20PM +0200, George-Cristian Bîrzan wrote:
> > On Tue, Nov 27, 2012 at 2:20 PM, Gleb Natapov <gleb@redhat.com> wrote:
> > > On Mon, Nov 26, 2012 at 09:31:19PM +0200, George-Cristian Bîrzan wrote:
> > >> On Sun, Nov 25, 2012 at 6:17 PM, George-Cristian Bîrzan <gc@birzan.org> wrote:
> > >> > On Sun, Nov 25, 2012 at 5:19 PM, Gleb Natapov <gleb@redhat.com> wrote:
> > >> >> What Windows is this? Can you try changing "-cpu host" to "-cpu
> > >> >> host,+hv_relaxed"?
> > >> > 
> > >> > This is on Windows Server 2008 R2 (sorry, forgot to mention that I
> > >> > guess), and I can try it tomorrow (US time), as getting a stream my
> > >> > way depends on complicated stuff. I will though, and let you know
> > >> > how it goes.
> > >> 
> > >> I changed that, no difference.
> > > 
> > > Heh, I forgot that the part that should make difference is not yet
> > > upstream :(
> > 
> > We can try recompiling kvm/qemu with some patches, if that'd help. At
> > this point, anything is on the table except changing Windows and the
> > hardware :-)
> 
> Vadim do you have Hyper-v reference timer patches for KVM to try?
I have some code which does both reference time and invariant TSC, but it
will not work after migration. I will send it later today.
Vadim.
> 
> > Also, it might be that the software doing the actual work is not well
> > written, but even so...
> > 
> > --
> > George-Cristian Bîrzan
> 
> --
> 			Gleb.

* Re: Performance issue
  2012-11-27 20:38             ` Vadim Rozenfeld
@ 2012-11-27 21:13               ` George-Cristian Bîrzan
  2012-11-28 11:39                 ` Vadim Rozenfeld
  0 siblings, 1 reply; 23+ messages in thread
From: George-Cristian Bîrzan @ 2012-11-27 21:13 UTC (permalink / raw)
  To: Vadim Rozenfeld; +Cc: Gleb Natapov, kvm

On Tue, Nov 27, 2012 at 10:38 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
> I have some code which do both reference time and invariant TSC but it
> will not work after migration. I will send it later today.

Do you mean migrating guests? This is not an issue for us.

Also, it would be much appreciated!

--
George-Cristian Bîrzan

* Re: Performance issue
  2012-11-27 21:13               ` George-Cristian Bîrzan
@ 2012-11-28 11:39                 ` Vadim Rozenfeld
  2012-11-28 19:09                   ` George-Cristian Bîrzan
  2012-11-28 19:18                   ` George-Cristian Bîrzan
  0 siblings, 2 replies; 23+ messages in thread
From: Vadim Rozenfeld @ 2012-11-28 11:39 UTC (permalink / raw)
  To: George-Cristian Bîrzan; +Cc: Gleb Natapov, kvm

[-- Attachment #1: Type: Text/Plain, Size: 751 bytes --]

On Tuesday, November 27, 2012 11:13:12 PM George-Cristian Bîrzan wrote:
> On Tue, Nov 27, 2012 at 10:38 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
> > I have some code which do both reference time and invariant TSC but it
> > will not work after migration. I will send it later today.
> 
> Do you mean migrating guests? This is not an issue for us.
OK, but don't say I didn't warn you :)

There are two patches, one for kvm and another one for qemu.
You will probably need to rebase them.
Add the "hv_tsc" cpu parameter to activate this feature.
You will probably need to deactivate the HPET by adding the "-no-hpet"
parameter as well.
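
To make the reference TSC page in the kvm patch easier to follow, here
is a small sketch of the arithmetic it encodes. hv_tsc_scale repeats the
TscScale expression from the patch; hv_ref_time_100ns is the guest-side
conversion as described in the Hyper-V specification, that is
((tsc * scale) >> 64) + offset. Both function names are made up for
illustration, and unsigned __int128 is a GCC/Clang extension on 64-bit
targets, not standard C.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression the kvm patch uses for TscScale: a 64-bit fraction
 * that converts TSC ticks into 100 ns units. */
uint64_t hv_tsc_scale(uint64_t tsc_khz)
{
    /* Below 10 MHz the intermediate quotient no longer fits in 32 bits. */
    assert(tsc_khz > 10000);
    return ((UINT64_C(10000) << 32) / tsc_khz) << 32;
}

/* Guest-side conversion of a raw TSC value into 100 ns units. */
uint64_t hv_ref_time_100ns(uint64_t tsc, uint64_t scale, int64_t offset)
{
    /* A 128-bit product is needed so the top 64 bits can be kept. */
    unsigned __int128 product = (unsigned __int128)tsc * scale;

    return (uint64_t)(product >> 64) + (uint64_t)offset;
}
```

Assuming a 2.3 GHz TSC purely as an example, one second of ticks
(2,300,000,000) converts to just under 10,000,000 units of 100 ns. The
small shortfall comes from the integer division in the scale
computation.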

best regards,
Vadim.

> 
> Also, it would be much appreciated!
> 
> --
> George-Cristian Bîrzan

[-- Attachment #2: hv_time_kvm.diff --]
[-- Type: text/x-patch, Size: 4028 bytes --]

diff --git a/arch/x86/include/asm/hyperv.h b/arch/x86/include/asm/hyperv.h
index b80420b..9c5ffef 100644
--- a/arch/x86/include/asm/hyperv.h
+++ b/arch/x86/include/asm/hyperv.h
@@ -136,6 +136,9 @@
 /* MSR used to read the per-partition time reference counter */
 #define HV_X64_MSR_TIME_REF_COUNT		0x40000020
 
+/* A partition's reference time stamp counter (TSC) page */
+#define HV_X64_MSR_REFERENCE_TSC		0x40000021
+
 /* Define the virtual APIC registers */
 #define HV_X64_MSR_EOI				0x40000070
 #define HV_X64_MSR_ICR				0x40000071
@@ -179,6 +182,10 @@
 #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_MASK	\
 		(~((1ull << HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
 
+#define HV_X64_MSR_TSC_REFERENCE_ENABLE			0x00000001
+#define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT		12
+
+
 #define HV_PROCESSOR_POWER_STATE_C0		0
 #define HV_PROCESSOR_POWER_STATE_C1		1
 #define HV_PROCESSOR_POWER_STATE_C2		2
@@ -191,4 +198,11 @@
 #define HV_STATUS_INVALID_ALIGNMENT		4
 #define HV_STATUS_INSUFFICIENT_BUFFERS		19
 
+typedef struct _HV_REFERENCE_TSC_PAGE {
+    uint32_t TscSequence;
+    uint32_t Rserved1;
+    uint64_t TscScale;
+    int64_t  TscOffset;
+} HV_REFERENCE_TSC_PAGE, * PHV_REFERENCE_TSC_PAGE;
+
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b2e11f4..63ee09e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -565,6 +565,8 @@ struct kvm_arch {
 	/* fields used by HYPER-V emulation */
 	u64 hv_guest_os_id;
 	u64 hv_hypercall;
+	u64 hv_ref_count;
+	u64 hv_tsc_page;
 
 	#ifdef CONFIG_KVM_MMU_AUDIT
 	int audit_point;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4f76417..4538295 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -813,7 +813,7 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
 static u32 msrs_to_save[] = {
 	MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
 	MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
-	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
+	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, HV_X64_MSR_REFERENCE_TSC,
 	HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
 	MSR_KVM_PV_EOI_EN,
 	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
@@ -1428,6 +1428,8 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
 	switch (msr) {
 	case HV_X64_MSR_GUEST_OS_ID:
 	case HV_X64_MSR_HYPERCALL:
+	case HV_X64_MSR_TIME_REF_COUNT:
+	case HV_X64_MSR_REFERENCE_TSC:
 		r = true;
 		break;
 	}
@@ -1438,6 +1440,7 @@ static bool kvm_hv_msr_partition_wide(u32 msr)
 static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
 	struct kvm *kvm = vcpu->kvm;
+	unsigned long addr;
 
 	switch (msr) {
 	case HV_X64_MSR_GUEST_OS_ID:
@@ -1467,6 +1470,27 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		if (__copy_to_user((void __user *)addr, instructions, 4))
 			return 1;
 		kvm->arch.hv_hypercall = data;
+		kvm->arch.hv_ref_count = get_kernel_ns();
+		break;
+	}
+	case HV_X64_MSR_REFERENCE_TSC: {
+		HV_REFERENCE_TSC_PAGE tsc_ref;
+		tsc_ref.TscSequence =
+			boot_cpu_has(X86_FEATURE_CONSTANT_TSC) ? 1 : 0;
+		tsc_ref.TscScale =
+			((10000LL << 32) /vcpu->arch.virtual_tsc_khz) << 32;
+		tsc_ref.TscOffset = 0;
+		if (!(data & HV_X64_MSR_TSC_REFERENCE_ENABLE)) {
+			kvm->arch.hv_tsc_page = data;
+			break;
+		}
+		addr = gfn_to_hva(vcpu->kvm, data >>
+			HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT);
+		if (kvm_is_error_hva(addr))
+			return 1;
+		if(__copy_to_user((void __user *)addr, &tsc_ref, sizeof(tsc_ref)))
+			return 1;
+		kvm->arch.hv_tsc_page = data;
 		break;
 	}
 	default:
@@ -1881,6 +1905,13 @@ static int get_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case HV_X64_MSR_HYPERCALL:
 		data = kvm->arch.hv_hypercall;
 		break;
+	case HV_X64_MSR_TIME_REF_COUNT:
+		data = get_kernel_ns() - kvm->arch.hv_ref_count;
+		do_div(data, 100);
+		break;
+	case HV_X64_MSR_REFERENCE_TSC:
+		data = kvm->arch.hv_tsc_page;
+		break;
 	default:
 		vcpu_unimpl(vcpu, "Hyper-V unhandled rdmsr: 0x%x\n", msr);
 		return 1;

[-- Attachment #3: hv_time_qemu.diff --]
[-- Type: text/x-patch, Size: 4666 bytes --]

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index f3708e6..ad77b72 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1250,6 +1250,8 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char *cpu_model)
             hyperv_enable_relaxed_timing(true);
         } else if (!strcmp(featurestr, "hv_vapic")) {
             hyperv_enable_vapic_recommended(true);
+        } else if (!strcmp(featurestr, "hv_tsc")) {
+            hyperv_enable_tsc_recommended(true);
         } else {
             fprintf(stderr, "feature string `%s' not in format (+feature|-feature|feature=xyz)\n", featurestr);
             goto error;
diff --git a/target-i386/hyperv.c b/target-i386/hyperv.c
index f284e99..bd581a1 100644
--- a/target-i386/hyperv.c
+++ b/target-i386/hyperv.c
@@ -15,6 +15,12 @@
 static bool hyperv_vapic;
 static bool hyperv_relaxed_timing;
 static int hyperv_spinlock_attempts = HYPERV_SPINLOCK_NEVER_RETRY;
+static bool hyperv_tsc;
+
+void hyperv_enable_tsc_recommended(bool val)
+{
+    hyperv_tsc = val;
+}
 
 void hyperv_enable_vapic_recommended(bool val)
 {
@@ -42,12 +48,18 @@ bool hyperv_enabled(void)
 bool hyperv_hypercall_available(void)
 {
     if (hyperv_vapic ||
+        hyperv_tsc ||
         (hyperv_spinlock_attempts != HYPERV_SPINLOCK_NEVER_RETRY)) {
       return true;
     }
     return false;
 }
 
+bool hyperv_tsc_recommended(void)
+{
+    return hyperv_tsc;
+}
+
 bool hyperv_vapic_recommended(void)
 {
     return hyperv_vapic;
diff --git a/target-i386/hyperv.h b/target-i386/hyperv.h
index bacb1d4..94c2d6e 100644
--- a/target-i386/hyperv.h
+++ b/target-i386/hyperv.h
@@ -27,10 +27,12 @@
 #endif
 
 #if !defined(CONFIG_USER_ONLY) && defined(CONFIG_KVM)
+void hyperv_enable_tsc_recommended(bool val);
 void hyperv_enable_vapic_recommended(bool val);
 void hyperv_enable_relaxed_timing(bool val);
 void hyperv_set_spinlock_retries(int val);
 #else
+static inline void hyperv_enable_tsc_recommended(bool val) { }
 static inline void hyperv_enable_vapic_recommended(bool val) { }
 static inline void hyperv_enable_relaxed_timing(bool val) { }
 static inline void hyperv_set_spinlock_retries(int val) { }
@@ -38,6 +40,7 @@ static inline void hyperv_set_spinlock_retries(int val) { }
 
 bool hyperv_enabled(void);
 bool hyperv_hypercall_available(void);
+bool hyperv_tsc_recommended(void);
 bool hyperv_vapic_recommended(void);
 bool hyperv_relaxed_timing_enabled(void);
 int hyperv_get_spinlock_retries(void);
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5b18383..dc7f259 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -390,13 +390,17 @@ int kvm_arch_init_vcpu(CPUX86State *env)
     c = &cpuid_data.entries[cpuid_i++];
     memset(c, 0, sizeof(*c));
     c->function = KVM_CPUID_SIGNATURE;
-    if (!hyperv_enabled()) {
-        memcpy(signature, "KVMKVMKVM\0\0\0", 12);
-        c->eax = 0;
-    } else {
-        memcpy(signature, "Microsoft Hv", 12);
+    memcpy(signature, "KVMKVMKVM\0\0\0", 12);
+    if (hyperv_enabled()) {
         c->eax = HYPERV_CPUID_MIN;
     }
+//    if (!hyperv_enabled()) {
+//        memcpy(signature, "KVMKVMKVM\0\0\0", 12);
+//        c->eax = 0;
+//    } else {
+//        memcpy(signature, "Microsoft Hv", 12);
+//        c->eax = HYPERV_CPUID_MIN;
+//    }
     c->ebx = signature[0];
     c->ecx = signature[1];
     c->edx = signature[2];
@@ -427,7 +431,11 @@ int kvm_arch_init_vcpu(CPUX86State *env)
             c->eax |= HV_X64_MSR_HYPERCALL_AVAILABLE;
             c->eax |= HV_X64_MSR_APIC_ACCESS_AVAILABLE;
         }
-
+        if (hyperv_tsc_recommended()) {
+            c->eax |= HV_X64_MSR_HYPERCALL_AVAILABLE;
+            c->eax |= HV_X64_MSR_TIME_REF_COUNT_AVAILABLE;
+            c->eax |= 0x200;
+        }
         c = &cpuid_data.entries[cpuid_i++];
         memset(c, 0, sizeof(*c));
         c->function = HYPERV_CPUID_ENLIGHTMENT_INFO;
@@ -445,14 +453,14 @@ int kvm_arch_init_vcpu(CPUX86State *env)
         c->eax = 0x40;
         c->ebx = 0x40;
 
-        c = &cpuid_data.entries[cpuid_i++];
-        memset(c, 0, sizeof(*c));
-        c->function = KVM_CPUID_SIGNATURE_NEXT;
-        memcpy(signature, "KVMKVMKVM\0\0\0", 12);
-        c->eax = 0;
-        c->ebx = signature[0];
-        c->ecx = signature[1];
-        c->edx = signature[2];
+//        c = &cpuid_data.entries[cpuid_i++];
+//        memset(c, 0, sizeof(*c));
+//        c->function = KVM_CPUID_SIGNATURE_NEXT;
+//        memcpy(signature, "KVMKVMKVM\0\0\0", 12);
+//        c->eax = 0;
+//        c->ebx = signature[0];
+//        c->ecx = signature[1];
+//        c->edx = signature[2];
     }
 
     has_msr_async_pf_en = c->eax & (1 << KVM_FEATURE_ASYNC_PF);

* Re: Performance issue
  2012-11-28 11:39                 ` Vadim Rozenfeld
@ 2012-11-28 19:09                   ` George-Cristian Bîrzan
  2012-11-29 11:56                     ` Vadim Rozenfeld
  2012-11-28 19:18                   ` George-Cristian Bîrzan
  1 sibling, 1 reply; 23+ messages in thread
From: George-Cristian Bîrzan @ 2012-11-28 19:09 UTC (permalink / raw)
  To: Vadim Rozenfeld; +Cc: Gleb Natapov, kvm

On Wed, Nov 28, 2012 at 1:39 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
> On Tuesday, November 27, 2012 11:13:12 PM George-Cristian Bîrzan wrote:
>> On Tue, Nov 27, 2012 at 10:38 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
>> > I have some code which do both reference time and invariant TSC but it
>> > will not work after migration. I will send it later today.
>>
>> Do you mean migrating guests? This is not an issue for us.
> OK, but don't say I didn't warn you :)
>
> There are two patches, one for kvm and another one for qemu.
> you will probably need to rebase them.
> Add "hv_tsc" cpu parameter to activate this feature.
> you will probably need to deactivate hpet by adding "-no-hpet"
> parameter as well.

I've also added +hv_relaxed since then, but this is the command I'm
using now and there's no change:

/usr/bin/qemu-kvm -name b691546e-79f8-49c6-a293-81067503a6ad -S -M
pc-1.2 -enable-kvm -m 16384 -smp 9,sockets=1,cores=9,threads=1 -uuid
b691546e-79f8-49c6-a293-81067503a6ad -no-user-config -nodefaults
-chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/b691546e-79f8-49c6-a293-81067503a6ad.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-hpet -no-shutdown -device
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
file=/var/lib/libvirt/images/dis-magnetics-2-223101/d8b233c6-8424-4de9-ae3c-7c9a60288514,if=none,id=drive-virtio-disk0,format=qcow2,cache=writeback,aio=native
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-netdev tap,fd=35,id=hostnet0,vhost=on,vhostfd=36 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2e:fb:a2:36:be,bus=pci.0,addr=0x3
-netdev tap,fd=40,id=hostnet1,vhost=on,vhostfd=41 -device
virtio-net-pci,netdev=hostnet1,id=net1,mac=22:94:44:5a:cb:24,bus=pci.0,addr=0x4
-vnc 127.0.0.1:0,password -vga cirrus -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -cpu host,hv_tsc

I compiled qemu-1.2.0-24 after applying your patch, used the head for
KVM, and I see no difference. I've tried setting Windows'
useplatformclock on and off, with no change either.


Other than that, I was looking at a profiling trace of the running
software, and a lot of the time (60%?) is spent in two functions from
hal.dll: HalpGetPmTimerSleepModePerfCounter when I disable HPET, and
HalpHPETProgramRolloverTimer when it is enabled. Both point at
something related to the timers.
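
One way to check whether clock reads themselves are expensive in a
guest is to time a tight loop of them. The sketch below is a POSIX
analogue only: it uses clock_gettime(CLOCK_MONOTONIC) and would have to
run in a Linux guest or on the host. The function name
avg_clock_read_ns is hypothetical. On Windows the comparable call would
be QueryPerformanceCounter, and it is an assumption here that the
hal.dll functions above are reached through that kind of query.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Average cost, in nanoseconds, of one clock_gettime(CLOCK_MONOTONIC)
 * call, measured over `calls` back-to-back reads. */
double avg_clock_read_ns(long calls)
{
    struct timespec start, end, scratch;

    assert(calls > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < calls; i++)
        clock_gettime(CLOCK_MONOTONIC, &scratch);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)calls;
}
```

Comparing the result between bare metal and a guest gives a rough idea
of how much each clock read costs under virtualization.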

Any other thing I can try?


--
George-Cristian Bîrzan

* Re: Performance issue
  2012-11-28 19:09                   ` George-Cristian Bîrzan
@ 2012-11-29 11:56                     ` Vadim Rozenfeld
  2012-11-29 13:45                       ` George-Cristian Bîrzan
  0 siblings, 1 reply; 23+ messages in thread
From: Vadim Rozenfeld @ 2012-11-29 11:56 UTC (permalink / raw)
  To: George-Cristian Bîrzan; +Cc: Gleb Natapov, kvm

On Wednesday, November 28, 2012 09:09:29 PM George-Cristian Bîrzan wrote:
> On Wed, Nov 28, 2012 at 1:39 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
> > On Tuesday, November 27, 2012 11:13:12 PM George-Cristian Bîrzan wrote:
> >> On Tue, Nov 27, 2012 at 10:38 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
> >> > I have some code which do both reference time and invariant TSC but it
> >> > will not work after migration. I will send it later today.
> >> 
> >> Do you mean migrating guests? This is not an issue for us.
> > 
> > OK, but don't say I didn't warn you :)
> > 
> > There are two patches, one for kvm and another one for qemu.
> > you will probably need to rebase them.
> > Add "hv_tsc" cpu parameter to activate this feature.
> > you will probably need to deactivate hpet by adding "-no-hpet"
> > parameter as well.
> 
> I've also added +hv_relaxed since then, but this is the command I'm

I would suggest activating relaxed timing for all W2K8R2/Win7 guests.

> using now and there's no change:
> 
> /usr/bin/qemu-kvm -name b691546e-79f8-49c6-a293-81067503a6ad -S -M
> pc-1.2 -enable-kvm -m 16384 -smp 9,sockets=1,cores=9,threads=1 -uuid
> b691546e-79f8-49c6-a293-81067503a6ad -no-user-config -nodefaults
> -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/b691546e-79f8-49c6-a293-81067503a6ad.monitor,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
> -no-hpet -no-shutdown -device
> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
> file=/var/lib/libvirt/images/dis-magnetics-2-223101/d8b233c6-8424-4de9-ae3c-7c9a60288514,if=none,id=drive-virtio-disk0,format=qcow2,cache=writeback,aio=native
> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> -netdev tap,fd=35,id=hostnet0,vhost=on,vhostfd=36 -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2e:fb:a2:36:be,bus=pci.0,addr=0x3
> -netdev tap,fd=40,id=hostnet1,vhost=on,vhostfd=41 -device
> virtio-net-pci,netdev=hostnet1,id=net1,mac=22:94:44:5a:cb:24,bus=pci.0,addr=0x4
> -vnc 127.0.0.1:0,password -vga cirrus -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -cpu host,hv_tsc
> 
> I compiled qemu-1.2.0-24 after applying your patch, used the head for
> KVM, and I see no difference. I've tried setting windows'
> useplatformclock on and off, no change either.
> 
> 
> Other than that, was looking into a profiling trace of the software
> running and a lot of time (60%?) is spent calling two functions from
> hal.dll, HalpGetPmTimerSleepModePerfCounter when I disable HPET, and
> HalpHPETProgramRolloverTimer which do point at something related to
> the timers.
> 
It means that the Hyper-V time stamp source was not activated.
> Any other thing I can try?
> 
> 
> --
> George-Cristian Bîrzan

* Re: Performance issue
  2012-11-29 11:56                     ` Vadim Rozenfeld
@ 2012-11-29 13:45                       ` George-Cristian Bîrzan
  2012-11-29 13:56                         ` Gleb Natapov
  0 siblings, 1 reply; 23+ messages in thread
From: George-Cristian Bîrzan @ 2012-11-29 13:45 UTC (permalink / raw)
  To: Vadim Rozenfeld; +Cc: Gleb Natapov, kvm

On Thu, Nov 29, 2012 at 1:56 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
>> I've also added +hv_relaxed since then, but this is the command I'm
>
> I would suggest activating relaxed timing for all W2K8R2/Win7 guests.

Is there any place I can read up on the downsides of this for Linux,
or is it Just Better?

>> Other than that, was looking into a profiling trace of the software
>> running and a lot of time (60%?) is spent calling two functions from
>> hal.dll, HalpGetPmTimerSleepModePerfCounter when I disable HPET, and
>> HalpHPETProgramRolloverTimer which do point at something related to
>> the timers.
>>
> It means that hyper-v time stamp source was not activated.

I recompiled the whole kernel with your patch, and while I cannot
check at 70Mbps now, a 20Mbps test stream seems to do better. Also, I
no longer see either of those functions, which used to account for ~60%
of the time spent by the program. I'm waiting for the customer to come
back and start the 'real' stream, but in my tests, time spent in
hal.dll is now an order of magnitude smaller.

--
George-Cristian Bîrzan

* Re: Performance issue
  2012-11-29 13:45                       ` George-Cristian Bîrzan
@ 2012-11-29 13:56                         ` Gleb Natapov
  2012-11-29 20:34                           ` Vadim Rozenfeld
  0 siblings, 1 reply; 23+ messages in thread
From: Gleb Natapov @ 2012-11-29 13:56 UTC (permalink / raw)
  To: George-Cristian Bîrzan; +Cc: Vadim Rozenfeld, kvm

On Thu, Nov 29, 2012 at 03:45:52PM +0200, George-Cristian Bîrzan wrote:
> On Thu, Nov 29, 2012 at 1:56 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
> >> I've also added +hv_relaxed since then, but this is the command I'm
> >
> > I would suggest activating relaxed timing for all W2K8R2/Win7 guests.
> 
> Is there any place I can read up on the downsides of this for Linux,
> or is it Just Better?
> 
You shouldn't use hyper-v flags for Linux guests. In theory Linux should
just ignore them; in practice there may be bugs that prevent Linux from
detecting that it runs as a guest, which would disable its guest
optimizations.

> >> Other than that, was looking into a profiling trace of the software
> >> running and a lot of time (60%?) is spent calling two functions from
> >> hal.dll, HalpGetPmTimerSleepModePerfCounter when I disable HPET, and
> >> HalpHPETProgramRolloverTimer which do point at something related to
> >> the timers.
> >>
> > It means that hyper-v time stamp source was not activated.
> 
> I recompiled the whole kernel, with your patch, and while I cannot
> check at 70Mbps now, a test stream of 20 seems to do better. Also, now
> I don't see any of those functions, which used to account ~60% of the
> time spent by the program. I'm waiting for the customer to come back
> and start the 'real' stream, but from my tests, time spent in hal.dll
> is now an order of magnitude smaller.
> 
> --
> George-Cristian Bîrzan

--
			Gleb.

* Re: Performance issue
  2012-11-29 13:56                         ` Gleb Natapov
@ 2012-11-29 20:34                           ` Vadim Rozenfeld
  0 siblings, 0 replies; 23+ messages in thread
From: Vadim Rozenfeld @ 2012-11-29 20:34 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: George-Cristian Bîrzan, kvm

On Thursday, November 29, 2012 03:56:10 PM Gleb Natapov wrote:
> On Thu, Nov 29, 2012 at 03:45:52PM +0200, George-Cristian Bîrzan wrote:
> > On Thu, Nov 29, 2012 at 1:56 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
> > >> I've also added +hv_relaxed since then, but this is the command I'm
> > > 
> > > I would suggest activating relaxed timing for all W2K8R2/Win7 guests.
> > 
> > Is there any place I can read up on the downsides of this for Linux,
> > or is it Just Better?
> 
> You shouldn't use hyper-v flags for Linux guests. In theory Linux should
> just ignore them, in practice there may be bugs that will prevent Linux
> from detecting that it runs as a guest and disable optimizations.
> 
As Gleb said, hyper-v flags are relevant to Windows guests only.
IIRC spinlocks and vapic should work for Vista and later. Relaxed timing and
partition reference time work for Win7/W2K8R2.
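
The guidance above can be sketched as a small helper that picks a -cpu
argument per guest family. This is only an illustration: the function
name is made up, hv_tsc is the flag name used with the patch under test
in this thread (it may be spelled differently in a released qemu), and
the hv_vapic / hv_spinlocks=0x1fff spellings and the retry value are
assumptions, not tested settings.

```shell
# hv_cpu_flags FAMILY: print a "-cpu" argument for a guest family.
# Hypothetical helper; flag spellings and values are assumptions.
hv_cpu_flags() {
    case "$1" in
        linux)
            # No hyper-v flags for Linux guests.
            echo "host" ;;
        vista)
            # Spinlocks and vapic: Vista and later.
            echo "host,hv_vapic,hv_spinlocks=0x1fff" ;;
        win7|w2k8r2)
            # Relaxed timing and partition reference time: Win7/W2K8R2.
            echo "host,hv_vapic,hv_spinlocks=0x1fff,hv_relaxed,hv_tsc" ;;
        *)
            echo "unknown guest family: $1" >&2
            return 1 ;;
    esac
}
```

For example, the output of "hv_cpu_flags win7" could be passed after
-cpu on the qemu command line, while a Linux guest would get plain
"host".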
> > >> Other than that, was looking into a profiling trace of the software
> > >> running and a lot of time (60%?) is spent calling two functions from
> > >> hal.dll, HalpGetPmTimerSleepModePerfCounter when I disable HPET, and
> > >> HalpHPETProgramRolloverTimer which do point at something related to
> > >> the timers.
> > > 
> > > It means that hyper-v time stamp source was not activated.
> > 
> > I recompiled the whole kernel, with your patch, and while I cannot
> > check at 70Mbps now, a test stream of 20 seems to do better. Also, now
> > I don't see any of those functions, which used to account ~60% of the
> > time spent by the program. I'm waiting for the customer to come back
> > and start the 'real' stream, but from my tests, time spent in hal.dll
> > is now an order of magnitude smaller.
> > 
> > --
> > George-Cristian Bîrzan
> 
> --
> 			Gleb.

* Re: Performance issue
  2012-11-28 11:39                 ` Vadim Rozenfeld
  2012-11-28 19:09                   ` George-Cristian Bîrzan
@ 2012-11-28 19:18                   ` George-Cristian Bîrzan
  2012-11-28 19:56                     ` Gleb Natapov
  1 sibling, 1 reply; 23+ messages in thread
From: George-Cristian Bîrzan @ 2012-11-28 19:18 UTC (permalink / raw)
  To: Vadim Rozenfeld; +Cc: Gleb Natapov, kvm

On Wed, Nov 28, 2012 at 1:39 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
> There are two patches, one for kvm and another one for qemu.

I just realised this. Was I supposed to use qemu or qemu-kvm? I used qemu.

--
George-Cristian Bîrzan

* Re: Performance issue
  2012-11-28 19:18                   ` George-Cristian Bîrzan
@ 2012-11-28 19:56                     ` Gleb Natapov
  2012-11-28 20:01                       ` George-Cristian Bîrzan
  0 siblings, 1 reply; 23+ messages in thread
From: Gleb Natapov @ 2012-11-28 19:56 UTC (permalink / raw)
  To: George-Cristian Bîrzan; +Cc: Vadim Rozenfeld, kvm

On Wed, Nov 28, 2012 at 09:18:38PM +0200, George-Cristian Bîrzan wrote:
> On Wed, Nov 28, 2012 at 1:39 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
> > There are two patches, one for kvm and another one for qemu.
> 
> I just realised this. I was supposed to use qemu, or qemu-kvm? I used qemu
> 
It does not matter, but you also need to recompile the kernel with the first patch.

--
			Gleb.

* Re: Performance issue
  2012-11-28 19:56                     ` Gleb Natapov
@ 2012-11-28 20:01                       ` George-Cristian Bîrzan
  2012-11-28 20:12                         ` Gleb Natapov
  0 siblings, 1 reply; 23+ messages in thread
From: George-Cristian Bîrzan @ 2012-11-28 20:01 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Vadim Rozenfeld, kvm

On Wed, Nov 28, 2012 at 9:56 PM, Gleb Natapov <gleb@redhat.com> wrote:
> On Wed, Nov 28, 2012 at 09:18:38PM +0200, George-Cristian Bîrzan wrote:
>> On Wed, Nov 28, 2012 at 1:39 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
>> > There are two patches, one for kvm and another one for qemu.
>>
>> I just realised this. I was supposed to use qemu, or qemu-kvm? I used qemu
>>
> Does not matter, but you need to also recompile kernel with the first patch.

Do I have to recompile the kernel, or just the module? I followed the
instructions at
http://www.linux-kvm.org/page/Code#building_an_external_module_with_older_kernels
but I guess I can do the whole kernel, if it might help.

--
George-Cristian Bîrzan

* Re: Performance issue
  2012-11-28 20:01                       ` George-Cristian Bîrzan
@ 2012-11-28 20:12                         ` Gleb Natapov
  0 siblings, 0 replies; 23+ messages in thread
From: Gleb Natapov @ 2012-11-28 20:12 UTC (permalink / raw)
  To: George-Cristian Bîrzan; +Cc: Vadim Rozenfeld, kvm

On Wed, Nov 28, 2012 at 10:01:04PM +0200, George-Cristian Bîrzan wrote:
> On Wed, Nov 28, 2012 at 9:56 PM, Gleb Natapov <gleb@redhat.com> wrote:
> > On Wed, Nov 28, 2012 at 09:18:38PM +0200, George-Cristian Bîrzan wrote:
> >> On Wed, Nov 28, 2012 at 1:39 PM, Vadim Rozenfeld <vrozenfe@redhat.com> wrote:
> >> > There are two patches, one for kvm and another one for qemu.
> >>
> >> I just realised this. I was supposed to use qemu, or qemu-kvm? I used qemu
> >>
> > Does not matter, but you need to also recompile kernel with the first patch.
> 
> Do I have to recompile the kernel, or just the module? I followed the
> instructions at
> http://www.linux-kvm.org/page/Code#building_an_external_module_with_older_kernels
> but I guess I can do the whole kernel, if it might help.
> 
Module is enough, but kvm-kmod is not what you want. Just rebuild the
whole kernel if you do not know how to rebuild only the module for your
distribution's kernel.

--
			Gleb.

* kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614]
@ 2015-10-15 13:38 Rainer Fügenstein
  2015-10-16  1:15 ` Neil Brown
  0 siblings, 1 reply; 23+ messages in thread
From: Rainer Fügenstein @ 2015-10-15 13:38 UTC (permalink / raw)
  To: Linux-RAID

Hi,

my NAS-like server with 5*3TB SATA drives in RAID5 configuration was
running without problems for what seems an eternity; for about 3
weeks now it has been freezing every other day with the following error:

# grep soft /var/log/messages
Oct 15 11:26:49 alfred kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614]
Oct 15 11:26:49 alfred kernel:  [<ffffffff8005e298>] call_softirq+0x1c/0x28
Oct 15 11:26:49 alfred kernel:  [<ffffffff80012583>] __do_softirq+0x51/0x133
Oct 15 11:26:49 alfred kernel:  [<ffffffff8005e298>] call_softirq+0x1c/0x28
Oct 15 11:26:49 alfred kernel:  [<ffffffff8006d63a>] do_softirq+0x2c/0x7d
Oct 15 11:27:49 alfred kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614]
Oct 15 11:27:49 alfred kernel:  [<ffffffff8005e298>] call_softirq+0x1c/0x28
Oct 15 11:27:49 alfred kernel:  [<ffffffff80012583>] __do_softirq+0x51/0x133
Oct 15 11:27:49 alfred kernel:  [<ffffffff8005e298>] call_softirq+0x1c/0x28
Oct 15 11:27:49 alfred kernel:  [<ffffffff8006d63a>] do_softirq+0x2c/0x7d
Oct 15 11:28:49 alfred kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614]
Oct 15 11:28:49 alfred kernel:  [<ffffffff8005e298>] call_softirq+0x1c/0x28
Oct 15 11:28:49 alfred kernel:  [<ffffffff80012583>] __do_softirq+0x51/0x133
Oct 15 11:28:49 alfred kernel:  [<ffffffff8005e298>] call_softirq+0x1c/0x28
Oct 15 11:28:49 alfred kernel:  [<ffffffff8006d63a>] do_softirq+0x2c/0x7d
[...]
this is only part of the story, check the end of this message for
a detailed log.

sometimes the server recovers after 60+ seconds, sometimes it requires
a hard reset (causing mdraid to re-sync the whole array).

IIRC, it started when a drive in the array failed with "SATA
connection timeouts" (or similar). this drive has been replaced by a new
one, yet the CPU lockups keep coming.

I suspect that aging hardware is slowly starting to fail, but I'm not
sure which part (drives? SATA controller? cables? NIC? CPU? ...)

here's some info that might be useful:
# uname -a
Linux alfred 2.6.18-406.el5 #1 SMP Tue Jun 2 17:25:57 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[7] sdf1[3] sdc1[5] sde1[0] sdd1[8]
      11721061376 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
      [=>...................]  resync =  5.2% (154579584/2930265344) finish=3347.7min speed=13816K/sec

unused devices: <none>
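
The finish estimate in the resync line can be roughly reproduced from
the other figures on that line. A minimal sketch, assuming the counters
are in 1K blocks and the speed is in K/sec (the function name is made
up for illustration):

```shell
# md_eta_minutes DONE_KIB TOTAL_KIB SPEED_KIB_PER_S
# Remaining resync time in whole minutes (integer arithmetic).
md_eta_minutes() {
    echo $(( ($2 - $1) / $3 / 60 ))
}
```

With the values above, "md_eta_minutes 154579584 2930265344 13816"
prints 3348, close to the reported finish=3347.7min; the small
difference is expected because the printed speed is rounded.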

excerpt:
ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata9.00: ATA-8: WDC WD30EZRX-00MMMB0, 80.00A80, max UDMA/133
ata9.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata9.00: configured for UDMA/133
sdb : very big device. try to use READ CAPACITY(16).
SCSI device sdb: 5860533168 512-byte hdwr sectors (3000593 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
sdb : very big device. try to use READ CAPACITY(16).
SCSI device sdb: 5860533168 512-byte hdwr sectors (3000593 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
 sdb: sdb1
sd 4:0:0:0: Attached scsi disk sdb
sd 4:0:0:0: Attached scsi generic sg1 type 0
  Vendor: ATA       Model: WDC WD30EZRX-00D  Rev: 80.0
  Type:   Direct-Access                      ANSI SCSI revision: 05

# lspci
00:00.0 Host bridge: Intel Corporation Atom Processor D4xx/D5xx/N4xx/N5xx DMI Bridge (rev 02)
00:02.0 VGA compatible controller: Intel Corporation Atom Processor D4xx/D5xx/N4xx/N5xx Integrated Graphics Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 1 (rev 01)
00:1c.1 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 2 (rev 01)
00:1c.2 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 3 (rev 01)
00:1c.3 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 4 (rev 01)
00:1d.0 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #1 (rev 01)
00:1d.1 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #2 (rev 01)
00:1d.2 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #3 (rev 01)
00:1d.3 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #4 (rev 01)
00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation NM10 Family LPC Controller (rev 01)
00:1f.2 SATA controller: Intel Corporation NM10/ICH7 Family SATA Controller [AHCI mode] (rev 01)
00:1f.3 SMBus: Intel Corporation NM10/ICH7 Family SMBus Controller (rev 01)
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 03)
05:00.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)

# cat /proc/cpuinfo
[...]
processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 28
model name      :          Intel(R) Atom(TM) CPU D510   @ 1.66GHz
stepping        : 10
cpu MHz         : 1666.686
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl tm2 ssse3 cx16 xtpr lahf_lm
bogomips        : 3333.36
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

 = = = detailed log:

Oct 15 11:27:49 alfred kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614]
Oct 15 11:27:49 alfred kernel: CPU 1:
Oct 15 11:27:49 alfred kernel: Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge autofs4 ipv6 xfrm_nalgo crypto_api xfs loop dm_multipath scsi_dh raid456 xor video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sg i2c_i801 i2c_core serio_raw tpm_tis pcspkr tpm sata_mv r8169 tpm_bios shpchp mii dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Oct 15 11:27:49 alfred kernel: Pid: 1614, comm: md0_raid5 Not tainted 2.6.18-406.el5 #1
Oct 15 11:27:49 alfred kernel: RIP: 0010:[<ffffffff881d35a2>]  [<ffffffff881d35a2>] :r8169:rtl8169_interrupt+0x248/0x26f
Oct 15 11:27:49 alfred kernel: RSP: 0018:ffff81007eec7df8  EFLAGS: 00000206
Oct 15 11:27:49 alfred kernel: RAX: 0000000000000040 RBX: ffff81007de0a000 RCX: 0000000000000042
Oct 15 11:27:49 alfred kernel: RDX: 00000000ffe2001d RSI: ffffffff80047254 RDI: ffff81007de0a180
Oct 15 11:27:49 alfred kernel: RBP: ffff81007eec7d70 R08: 0000000000000003 R09: ffffffff8005e298
Oct 15 11:27:49 alfred kernel: R10: 0000000000000001 R11: 0000000000000060 R12: ffffffff8005dc9e
Oct 15 11:27:49 alfred kernel: R13: 0000000000000040 R14: ffffffff800796ae R15: ffff81007eec7d70
Oct 15 11:27:49 alfred kernel: FS:  0000000000000000(0000) GS:ffff81007ef179c0(0000) knlGS:0000000000000000
Oct 15 11:27:49 alfred kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Oct 15 11:27:49 alfred kernel: CR2: 00002b0a2bbba30c CR3: 00000000547e8000 CR4: 00000000000006a0
Oct 15 11:27:49 alfred kernel:
Oct 15 11:27:49 alfred kernel: Call Trace:
Oct 15 11:27:49 alfred kernel:  <IRQ>  [<ffffffff881d356b>] :r8169:rtl8169_interrupt+0x211/0x26f
Oct 15 11:27:49 alfred kernel:  [<ffffffff80010dc0>] handle_IRQ_event+0x51/0xa6
Oct 15 11:27:49 alfred kernel:  [<ffffffff800becc5>] __do_IRQ+0xfb/0x15b
Oct 15 11:27:49 alfred kernel:  [<ffffffff8006d4c5>] do_IRQ+0xe9/0xf7
Oct 15 11:27:49 alfred kernel:  [<ffffffff8005d625>] ret_from_intr+0x0/0xa
Oct 15 11:27:49 alfred kernel:  [<ffffffff8005e298>] call_softirq+0x1c/0x28
Oct 15 11:27:49 alfred kernel:  [<ffffffff80012583>] __do_softirq+0x51/0x133
Oct 15 11:27:49 alfred kernel:  [<ffffffff8005e298>] call_softirq+0x1c/0x28
Oct 15 11:27:49 alfred kernel:  [<ffffffff8006d63a>] do_softirq+0x2c/0x7d
Oct 15 11:27:49 alfred kernel:  [<ffffffff8005dc9e>] apic_timer_interrupt+0x66/0x6c
Oct 15 11:27:49 alfred kernel:  <EOI>  [<ffffffff80064b30>] _spin_unlock_irqrestore+0x8/0x9
Oct 15 11:27:49 alfred kernel:  [<ffffffff88075d16>] :scsi_mod:scsi_dispatch_cmd+0x207/0x2b1
Oct 15 11:27:49 alfred kernel:  [<ffffffff8807b926>] :scsi_mod:scsi_request_fn+0x2c3/0x392
Oct 15 11:27:49 alfred kernel:  [<ffffffff8014af49>] elv_insert+0xac/0x1c4
Oct 15 11:27:49 alfred kernel:  [<ffffffff8000c21c>] __make_request+0x47f/0x4ce
Oct 15 11:27:49 alfred kernel:  [<ffffffff8001c84f>] generic_make_request+0x211/0x228
Oct 15 11:27:49 alfred kernel:  [<ffffffff8001b125>] bio_alloc_bioset+0x89/0xd9
Oct 15 11:27:49 alfred kernel:  [<ffffffff800a3d99>] keventd_create_kthread+0x0/0xc4
Oct 15 11:27:49 alfred kernel:  [<ffffffff8003368c>] submit_bio+0xe6/0xed
Oct 15 11:27:49 alfred kernel:  [<ffffffff80222dfe>] md_update_sb+0x1af/0x23a
Oct 15 11:27:49 alfred kernel:  [<ffffffff8022812e>] md_check_recovery+0x15d/0x454
Oct 15 11:27:49 alfred kernel:  [<ffffffff8833549f>] :raid456:raid5d+0x15/0x182
Oct 15 11:27:49 alfred kernel:  [<ffffffff8003b13b>] prepare_to_wait+0x34/0x61
Oct 15 11:27:49 alfred kernel:  [<ffffffff80225acc>] md_thread+0xf8/0x10e
Oct 15 11:27:49 alfred kernel:  [<ffffffff800a3fb1>] autoremove_wake_function+0x0/0x2e
Oct 15 11:27:49 alfred kernel:  [<ffffffff802259d4>] md_thread+0x0/0x10e
Oct 15 11:27:49 alfred kernel:  [<ffffffff80032c1d>] kthread+0xfe/0x132
Oct 15 11:27:49 alfred kernel:  [<ffffffff8005dfc1>] child_rip+0xa/0x11
Oct 15 11:27:49 alfred kernel:  [<ffffffff800a3d99>] keventd_create_kthread+0x0/0xc4
Oct 15 11:27:49 alfred kernel:  [<ffffffff80032b1f>] kthread+0x0/0x132
Oct 15 11:27:49 alfred kernel:  [<ffffffff8005dfb7>] child_rip+0x0/0x11
Oct 15 11:27:49 alfred kernel:

Oct 15 11:28:14 alfred kernel: INFO: task pdflush:10294 blocked for more than 120 seconds.
Oct 15 11:28:14 alfred kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 15 11:28:14 alfred kernel: pdflush       D ffff810002536420     0 10294     27         10375  1706 (L-TLB)
Oct 15 11:28:14 alfred kernel:  ffff81006318baa0 0000000000000046 0000000000000003 0000000082147ce0
Oct 15 11:28:14 alfred kernel:  00900000000000d8 000000000000000a ffff8100614ff040 ffffffff8031db60
Oct 15 11:28:14 alfred kernel:  00004a61ef4e2a4e 0000000000008115 ffff8100614ff228 000000006166ea40
Oct 15 11:28:14 alfred kernel: Call Trace:
Oct 15 11:28:14 alfred kernel:  [<ffffffff80224647>] md_write_start+0xf2/0x108
Oct 15 11:28:14 alfred kernel:  [<ffffffff800a3fb1>] autoremove_wake_function+0x0/0x2e
Oct 15 11:28:14 alfred kernel:  [<ffffffff883cce08>] :xfs:xfs_page_state_convert+0x4f7/0x546
Oct 15 11:28:14 alfred kernel:  [<ffffffff88335db1>] :raid456:make_request+0x4e/0x4e3
Oct 15 11:28:14 alfred kernel:  [<ffffffff8001c84f>] generic_make_request+0x211/0x228
Oct 15 11:28:14 alfred kernel:  [<ffffffff800238ac>] mempool_alloc+0x31/0xe7
Oct 15 11:28:14 alfred kernel:  [<ffffffff8003368c>] submit_bio+0xe6/0xed
Oct 15 11:28:14 alfred kernel:  [<ffffffff883ce805>] :xfs:_xfs_buf_ioapply+0x1f2/0x254
Oct 15 11:28:14 alfred kernel:  [<ffffffff883ce8a0>] :xfs:xfs_buf_iorequest+0x39/0x64
Oct 15 11:28:14 alfred kernel:  [<ffffffff883b89e2>] :xfs:xlog_bdstrat_cb+0x16/0x3c
Oct 15 11:28:14 alfred kernel:  [<ffffffff883b99e4>] :xfs:xlog_sync+0x218/0x3ad
Oct 15 11:28:14 alfred kernel:  [<ffffffff883ba744>] :xfs:xlog_state_sync_all+0xb9/0x1d9
Oct 15 11:28:14 alfred kernel:  [<ffffffff883bacc7>] :xfs:_xfs_log_force+0x59/0x68
Oct 15 11:28:14 alfred kernel:  [<ffffffff883bace1>] :xfs:xfs_log_force+0xb/0x3f
Oct 15 11:28:14 alfred kernel:  [<ffffffff883c6587>] :xfs:xfs_syncsub+0x33/0x226
Oct 15 11:28:14 alfred kernel:  [<ffffffff800a3d99>] keventd_create_kthread+0x0/0xc4
Oct 15 11:28:14 alfred kernel:  [<ffffffff883d3cad>] :xfs:xfs_fs_write_super+0x1b/0x21
Oct 15 11:28:14 alfred kernel:  [<ffffffff800e8c5a>] sync_supers+0x80/0xe1
Oct 15 11:28:14 alfred kernel:  [<ffffffff8005697a>] pdflush+0x0/0x1fb
Oct 15 11:28:14 alfred kernel:  [<ffffffff800cdca0>] wb_kupdate+0x3e/0x16a
Oct 15 11:28:14 alfred kernel:  [<ffffffff8005697a>] pdflush+0x0/0x1fb
Oct 15 11:28:14 alfred kernel:  [<ffffffff80056acb>] pdflush+0x151/0x1fb
Oct 15 11:28:14 alfred kernel:  [<ffffffff800cdc62>] wb_kupdate+0x0/0x16a
Oct 15 11:28:14 alfred kernel:  [<ffffffff80032c1d>] kthread+0xfe/0x132
Oct 15 11:28:14 alfred kernel:  [<ffffffff8005dfc1>] child_rip+0xa/0x11
Oct 15 11:28:14 alfred kernel:  [<ffffffff800a3d99>] keventd_create_kthread+0x0/0xc4
Oct 15 11:28:14 alfred kernel:  [<ffffffff80032b1f>] kthread+0x0/0x132
Oct 15 11:28:14 alfred kernel:  [<ffffffff8005dfb7>] child_rip+0x0/0x11
Oct 15 11:28:14 alfred kernel:
Oct 15 11:28:14 alfred kernel: INFO: task md0_resync:13543 blocked for more than 120 seconds.
Oct 15 11:28:14 alfred kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 15 11:28:14 alfred kernel: md0_resync    D ffff810037f117f0     0 13543     27               10375 (L-TLB)
Oct 15 11:28:14 alfred kernel:  ffff81004dad3c50 0000000000000046 0000000000000001 0000000000000000
Oct 15 11:28:14 alfred kernel:  ffff81007eb6f5f0 000000000000000a ffff81005fe63080 ffff810037f117f0
Oct 15 11:28:14 alfred kernel:  00004a613eedffb5 00000000003a50b4 ffff81005fe63268 0000000000000003
Oct 15 11:28:14 alfred kernel: Call Trace:
Oct 15 11:28:14 alfred kernel:  [<ffffffff8002e493>] __wake_up+0x38/0x4f
Oct 15 11:28:14 alfred kernel:  [<ffffffff880756c0>] :scsi_mod:scsi_done+0x0/0x18
Oct 15 11:28:14 alfred kernel:  [<ffffffff88330dc5>] :raid456:get_active_stripe+0x242/0x4bd
Oct 15 11:28:14 alfred kernel:  [<ffffffff8008f4f9>] default_wake_function+0x0/0xe
Oct 15 11:28:14 alfred kernel:  [<ffffffff88335ccc>] :raid456:sync_request+0x6c0/0x757
Oct 15 11:28:14 alfred kernel:  [<ffffffff8807b9a0>] :scsi_mod:scsi_request_fn+0x33d/0x392
Oct 15 11:28:14 alfred kernel:  [<ffffffff801583ef>] __next_cpu+0x19/0x28
Oct 15 11:28:14 alfred kernel:  [<ffffffff80225f46>] md_do_sync+0x464/0x84b
Oct 15 11:28:14 alfred kernel:  [<ffffffff800a3d99>] keventd_create_kthread+0x0/0xc4
Oct 15 11:28:14 alfred kernel:  [<ffffffff80225acc>] md_thread+0xf8/0x10e
Oct 15 11:28:14 alfred kernel:  [<ffffffff800a3d99>] keventd_create_kthread+0x0/0xc4
Oct 15 11:28:14 alfred kernel:  [<ffffffff802259d4>] md_thread+0x0/0x10e
Oct 15 11:28:14 alfred kernel:  [<ffffffff80032c1d>] kthread+0xfe/0x132
Oct 15 11:28:14 alfred kernel:  [<ffffffff8005dfc1>] child_rip+0xa/0x11
Oct 15 11:28:14 alfred kernel:  [<ffffffff800a3d99>] keventd_create_kthread+0x0/0xc4
Oct 15 11:28:14 alfred kernel:  [<ffffffff80032b1f>] kthread+0x0/0x132
Oct 15 11:28:14 alfred kernel:  [<ffffffff8005dfb7>] child_rip+0x0/0x11
Oct 15 11:28:14 alfred kernel:

tnx & cu

-- 
Best regards,
 Rainer                          mailto:rfu@oudeis.org


* Re: kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614]
  2015-10-15 13:38 kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614] Rainer Fügenstein
@ 2015-10-16  1:15 ` Neil Brown
  2015-10-24 16:15   ` performance issue (was: Re: kernel: BUG: soft lockup - CPU#1 stuck for 60s!) Rainer Fügenstein
  0 siblings, 1 reply; 23+ messages in thread
From: Neil Brown @ 2015-10-16  1:15 UTC (permalink / raw)
  To: Rainer Fügenstein, Linux-RAID

Rainer Fügenstein <rfu@oudeis.org> writes:

> Hi,
>
> my  NAS-like  server with 5*3TB SATA drives in RAID5 configuration was
> running  without  problems  for  what seems an eternity; since about 3
> weeks it keeps freezing every other day with the following error:
>
> # grep soft /var/log/messages
> Oct 15 11:26:49 alfred kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614]
> Oct 15 11:26:49 alfred kernel:  [<ffffffff8005e298>] call_softirq+0x1c/0x28
> Oct 15 11:26:49 alfred kernel:  [<ffffffff80012583>] __do_softirq+0x51/0x133
> Oct 15 11:26:49 alfred kernel:  [<ffffffff8005e298>] call_softirq+0x1c/0x28
> Oct 15 11:26:49 alfred kernel:  [<ffffffff8006d63a>] do_softirq+0x2c/0x7d
> Oct 15 11:27:49 alfred kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614]
> Oct 15 11:27:49 alfred kernel:  [<ffffffff8005e298>] call_softirq+0x1c/0x28
> Oct 15 11:27:49 alfred kernel:  [<ffffffff80012583>] __do_softirq+0x51/0x133
> Oct 15 11:27:49 alfred kernel:  [<ffffffff8005e298>] call_softirq+0x1c/0x28
> Oct 15 11:27:49 alfred kernel:  [<ffffffff8006d63a>] do_softirq+0x2c/0x7d
> Oct 15 11:28:49 alfred kernel: BUG: soft lockup - CPU#1 stuck for 60s! [md0_raid5:1614]
> Oct 15 11:28:49 alfred kernel:  [<ffffffff8005e298>] call_softirq+0x1c/0x28
> Oct 15 11:28:49 alfred kernel:  [<ffffffff80012583>] __do_softirq+0x51/0x133
> Oct 15 11:28:49 alfred kernel:  [<ffffffff8005e298>] call_softirq+0x1c/0x28
> Oct 15 11:28:49 alfred kernel:  [<ffffffff8006d63a>] do_softirq+0x2c/0x7d
> [...]
> this  is  only  part  of  the story, check the end of this message for
> a detailed log.
>
> sometimes the server recovers after 60+ seconds, sometimes it requires
> a hard reset (causing mdraid to re-sync the whole array).

I strongly recommend adding a write-intent bitmap
  mdadm --grow /dev/md0 --bitmap=internal

that will speed up the resync enormously.
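
Once added, the bitmap shows up as a "bitmap:" line under the array in
/proc/mdstat. A small sketch for checking that from a script (the
function name is made up; it only parses mdstat-style text on stdin):

```shell
# md_has_bitmap ARRAY: read /proc/mdstat-style text on stdin and
# succeed (exit 0) if the given array has a "bitmap:" line.
md_has_bitmap() {
    awk -v dev="$1" '
        $1 == dev && $2 == ":"    { in_dev = 1; next }  # start of the array stanza
        in_dev && NF == 0         { in_dev = 0 }        # blank line ends the stanza
        in_dev && $1 == "bitmap:" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage would be along the lines of
"md_has_bitmap md0 < /proc/mdstat && echo bitmap present".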

>
> IIRC,  it  started  when  a  drive  in  the  array  failed  with "SATA
> connection  timeouts" (kind of). this drive has been replaced by a new
> one, but yet the  CPU lockups keep coming.
>
> I  suspect  that  aging  hardware  slowly starts to fail, but not sure
> which part (drives? SATA controller? cables? NIC? CPU? ...)
>
> here's some info that might be useful:
> # uname -a
> Linux alfred 2.6.18-406.el5 #1 SMP Tue Jun 2 17:25:57 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

This is a rather ancient kernel.
The "el" suffix probably indicates Red Hat?  If you have a Red Hat support
contract you should ask them.  If you don't, you should probably try a
newer kernel (or buy a support contract).

NeilBrown

* performance issue (was: Re: kernel: BUG: soft lockup - CPU#1 stuck for 60s!)
  2015-10-16  1:15 ` Neil Brown
@ 2015-10-24 16:15   ` Rainer Fügenstein
  2015-10-24 16:31     ` Roman Mamedov
  0 siblings, 1 reply; 23+ messages in thread
From: Rainer Fügenstein @ 2015-10-24 16:15 UTC (permalink / raw)
  To: Neil Brown, Linux-RAID

hi,

> I strongly recommend adding a write-intent bitmap
>   mdadm --grow /dev/md0 --bitmap=internal

I did as suggested, but now it feels like performance has dropped to
about 1/4 of what it was before. since this system is already
pretty slow by design, this is quite frustrating.

no soft-lockups so far, fortunately.

might a new kernel speed things up again? or can --bitmap=internal be
undone?
(I need some time to prepare the upgrade to a new OS release)

tnx & cu

-- 
Best regards,
 Rainer                            mailto:rfu@oudeis.org

* Re: performance issue (was: Re: kernel: BUG: soft lockup - CPU#1 stuck for 60s!)
  2015-10-24 16:15   ` performance issue (was: Re: kernel: BUG: soft lockup - CPU#1 stuck for 60s!) Rainer Fügenstein
@ 2015-10-24 16:31     ` Roman Mamedov
  2015-10-25 19:23       ` Rainer Fügenstein
  0 siblings, 1 reply; 23+ messages in thread
From: Roman Mamedov @ 2015-10-24 16:31 UTC (permalink / raw)
  To: Rainer Fügenstein; +Cc: Neil Brown, Linux-RAID

On Sat, 24 Oct 2015 18:15:41 +0200
Rainer Fügenstein <rfu@oudeis.org> wrote:

> hi,
> 
> > I strongly recommend adding a write-intent bitmap
> >   mdadm --grow /dev/md0 --bitmap=internal
> 
> I  did  as suggested, but now it feels like performance has dropped to
> about 1/4th of what it used to be before. since this system is already
> pretty slow by design, this is quite frustrating.

Use a higher bitmap-chunk size, such as 256M or more.

-- 
With respect,
Roman

* Re: performance issue (was: Re: kernel: BUG: soft lockup - CPU#1 stuck for 60s!)
  2015-10-24 16:31     ` Roman Mamedov
@ 2015-10-25 19:23       ` Rainer Fügenstein
  2015-10-25 20:08         ` Neil Brown
  0 siblings, 1 reply; 23+ messages in thread
From: Rainer Fügenstein @ 2015-10-25 19:23 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Neil Brown, Linux-RAID

Hello Roman,

Saturday, October 24, 2015, 6:31:39 PM, you wrote:

> Use a higher bitmap-chunk size, such as 256M or more.

I guess that would be

   mdadm --grow /dev/md0 --bitmap-chunk=256M    ??

is it wise to issue this command during a re-sync?

a cron.weekly job started the re-sync (although I'm pretty sure this
job was disabled quite some time ago)

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[7] sdf1[3] sdc1[5] sde1[0] sdd1[8]
      11721061376 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
      [==>..................]  resync = 11.9% (348948608/2930265344) finish=7771.1min speed=5533K/sec
      bitmap: 8/350 pages [32KB], 4096KB chunk

unused devices: <none>

tnx & cu

-- 
Best regards,
 Rainer                            mailto:rfu@oudeis.org


* Re: performance issue (was: Re: kernel: BUG: soft lockup - CPU#1 stuck for 60s!)
  2015-10-25 19:23       ` Rainer Fügenstein
@ 2015-10-25 20:08         ` Neil Brown
  2015-11-02 22:55           ` performance issue Rainer Fügenstein
  0 siblings, 1 reply; 23+ messages in thread
From: Neil Brown @ 2015-10-25 20:08 UTC (permalink / raw)
  To: Rainer Fügenstein, Roman Mamedov; +Cc: Linux-RAID

Rainer Fügenstein <rfu@oudeis.org> writes:

> Hello Roman,
>
> Saturday, October 24, 2015, 6:31:39 PM, you wrote:
>
>> Use a higher bitmap-chunk size, such as 256M or more.
>
> I guess that would be
>
>    mdadm --grow /dev/md0 --bitmap-chunk=256M    ??

You would need to remove and then re-add the bitmap.  So:

  mdadm --grow /dev/md0 --bitmap=none
  mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=256M
  
>
> Is it wise to issue this command during a re-sync?

Depending on kernel version, it will either work or it won't.
Either way, it won't cause harm.

>
> A cron.weekly job started the re-sync (although I'm pretty sure this
> job was disabled quite some time ago).

Weekly is a bit more often than I would go for, but why disable it?
Regular scanning for latent bad blocks is fairly important for
reliability.

> $ cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sdb1[7] sdf1[3] sdc1[5] sde1[0] sdd1[8]
>       11721061376 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
>       [==>..................]  resync = 11.9% (348948608/2930265344) finish=7771.1min speed=5533K/sec
>       bitmap: 8/350 pages [32KB], 4096KB chunk

That isn't a resync started by a cron job. That would say "check" rather
than "resync".
This looks a lot like a resync after an unclean restart.  But with the
bitmap that should go faster...
What does "mdadm --examine-bitmap /dev/sdb1" report?
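
To make the "check" versus "resync" distinction concrete, here is a minimal
sketch that pulls the action keyword out of /proc/mdstat-style text. The
function name mdstat_action is hypothetical and not part of mdadm; it only
looks for the keywords resync, check, recovery and reshape followed by "=",
and assumes the progress line has the layout quoted above.

```shell
# Hypothetical helper (not an mdadm command): read mdstat-style text on
# stdin and print the sync action keyword found on the progress line.
mdstat_action() {
    grep -oE '(resync|check|recovery|reshape) *=' | sed -E 's/ *=$//'
}

# Usage with the progress line from the report above:
printf '%s\n' '      [==>..................]  resync = 11.9% (348948608/2930265344) finish=7771.1min speed=5533K/sec' | mdstat_action
# prints: resync
```

On a live system the same function could be fed with "cat /proc/mdstat";
a scrub started by a cron job would be expected to print "check" instead.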

NeilBrown


>
> unused devices: <none>
>
> tnx & cu
>
> -- 
> Best regards,
>  Rainer                            mailto:rfu@oudeis.org
>

* Re: performance issue
  2015-10-25 20:08         ` Neil Brown
@ 2015-11-02 22:55           ` Rainer Fügenstein
  2015-11-03  1:34             ` Neil Brown
  0 siblings, 1 reply; 23+ messages in thread
From: Rainer Fügenstein @ 2015-11-02 22:55 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux-RAID

On 25.10.2015 21:08, Neil Brown wrote:
> mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=256

I'm not sure how to specify the chunk size:

[root@alfred ~]# mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=256
mdadm: failed to create internal bitmap - chunksize problem.
[root@alfred ~]# mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=256M
mdadm: invalid bitmap chunksize: 256M

* Re: performance issue
  2015-11-02 22:55           ` performance issue Rainer Fügenstein
@ 2015-11-03  1:34             ` Neil Brown
  0 siblings, 0 replies; 23+ messages in thread
From: Neil Brown @ 2015-11-03  1:34 UTC (permalink / raw)
  To: Rainer Fügenstein; +Cc: Linux-RAID

On Tue, Nov 03 2015, Rainer Fügenstein wrote:

> On 25.10.2015 21:08, Neil Brown wrote:
>> mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=256
>
>   not sure how to specify the chunks size:
>
> [root@alfred ~]# mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=256
> mdadm: failed to create internal bitmap - chunksize problem.
> [root@alfred ~]# mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=256M
> mdadm: invalid bitmap chunksize: 256M

I guess you have an mdadm version earlier than 3.2.

Try
   --bitmap-chunk=262144

which is 256*1024.  The number is in K.
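
The arithmetic can be sketched as a one-line helper. The function name
mib_to_kib is hypothetical and only illustrates the conversion; it assumes
the value is given as a whole number of MiB and that the installed mdadm
wants a plain number in K, as described above.

```shell
# Hypothetical helper: convert a size in MiB to the plain number of K
# that --bitmap-chunk takes when size suffixes are not accepted.
mib_to_kib() {
    echo $(( $1 * 1024 ))
}

mib_to_kib 256
# prints: 262144
```

The printed number is what would go after --bitmap-chunk= in the
mdadm --grow command.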

NeilBrown
