raid6 badness

public inbox for linux-kernel@vger.kernel.org

* raid6 badness
@ 2004-01-30 17:15 Michael V. David
  2004-01-31  2:13 ` H. Peter Anvin
  0 siblings, 1 reply; 7+ messages in thread
From: Michael V. David @ 2004-01-30 17:15 UTC


This x86_64 system has dual Opteron CPUs on a Tyan 2880 board. Kernel
version string:

Linux version 2.6.2-bk4 (michael@sapphire) (gcc version 3.3.2 20040119 (Red Hat Linux 3.3.2-8)) #3 SMP Fri Jan 30 08:56:11 EST 2004

The same problem was produced with kernel versions 2.6.2-rc2 and
2.6.2-rc2-bk4. Output reproduced here is from -bk4.

If raid6 is compiled into the kernel, the kernel panics while
starting. In the present case, it was compiled as a module. On
loading, there is a segfault, and syslog gets what follows:

---<snip>---
raid6: int64x1   1175 MB/s
raid6: int64x2   1734 MB/s
raid6: int64x4   1773 MB/s
raid6: int64x8   1273 MB/s
general protection fault: 0000 [1]
CPU 1
Pid: 7310, comm: modprobe Not tainted
RIP: 0010:[<ffffffffa0186383>] <ffffffffa0186383>{:raid6:raid6_sse21_gen_syndrome+51}
RSP: 0018:0000010021825dd8  EFLAGS: 00010202
RAX: 000000008005003b RBX: 0000010021cc6000 RCX: 00000000c0000100
RDX: 0000010021825e88 RSI: 0000000000001000 RDI: 000000000000000f
RBP: 0000000000000001 R08: 0000010021824000 R09: 000000000000000f
R10: 00000000bfffe820 R11: 0000010021cc7000 R12: 0000010021825e88
R13: ffffffffa0186a40 R14: 0000000000000000 R15: ffffffffa0196ea0
FS:  0000002a958624c0(0000) GS:ffffffff803fe8c0(0000) knlGS:000000005577d4e0
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000421140 CR3: 000000000247f000 CR4: 00000000000006a0
Process modprobe (pid: 7310, stackpage=10037f01240)
Stack: e9e6f7f8d5dacbc4 0000000000000000 1097038436b125a2 616e7f705d52434c
       0000000000000246 00000000bfffe820 0000000000000000 0000010021824000
       00000000ffff7cf4 00000000c0000100
Call Trace:<ffffffffa019b130>{:raid6:raid6_select_algo+240} <ffffffffa019b15d>{:raid6:raid6_select_algo+285}
       <ffffffffa019b009>{:raid6:raid6_init+9} <ffffffff8014f471>{sys_init_module+353}
       <ffffffff8010ee54>{system_call+124}

Code: 66 0f 7f 04 24 66 0f 7f 4c 24 10 66 0f 7f 54 24 20 66 0f 7f
RIP <ffffffffa0186383>{:raid6:raid6_sse21_gen_syndrome+51} RSP <0000010021825dd8>
bad: scheduling while atomic!

Call Trace:<ffffffff8012f6be>{schedule+94} <ffffffff80161eb5>{unmap_vmas+485}
       <ffffffff80166929>{exit_mmap+313} <ffffffff80132d17>{mmput+135}
       <ffffffff801381f0>{do_exit+576} <ffffffff80110515>{die+69}
       <ffffffff80110ef7>{do_general_protection+263} <ffffffff8010f811>{error_exit+0}
       <ffffffffa0186383>{:raid6:raid6_sse21_gen_syndrome+51}
       <ffffffffa019b130>{:raid6:raid6_select_algo+240} <ffffffffa019b15d>{:raid6:raid6_select_algo+285}
       <ffffffffa019b009>{:raid6:raid6_init+9} <ffffffff8014f471>{sys_init_module+353}
       <ffffffff8010ee54>{system_call+124}
---<snip>---





-- 
michael@mvdavid.com


* Re: raid6 badness
  2004-01-30 17:15 raid6 badness Michael V. David
@ 2004-01-31  2:13 ` H. Peter Anvin
  0 siblings, 0 replies; 7+ messages in thread
From: H. Peter Anvin @ 2004-01-31  2:13 UTC
  To: linux-kernel

Followup to:  <Pine.LNX.4.58.0401301158340.8900@sapphire.newearth.org>
By author:    "Michael V. David" <michael@mvdavid.com>
In newsgroup: linux.dev.kernel
>
> This x86_64 system has dual Opteron CPUs on a Tyan 2880 board. Kernel
> version string:
> 
> Linux version 2.6.2-bk4 (michael@sapphire) (gcc version 3.3.2 20040119 (Red Hat Linux 3.3.2-8)) #3 SMP Fri Jan 30 08:56:11 EST 2004
> 
> The same problem was produced with kernel versions 2.6.2-rc2 and
> 2.6.2-rc2-bk4. Output reproduced here is from -bk4.
> 
> If raid6 is compiled into the kernel, the kernel panics while
> starting. In the present case, it was compiled as a module. On
> loading, there is a segfault, and syslog gets what follows:
> 
> ---<snip>---
> raid6: int64x1   1175 MB/s
> raid6: int64x2   1734 MB/s
> raid6: int64x4   1773 MB/s
> raid6: int64x8   1273 MB/s
> general protection fault: 0000 [1]
> CPU 1
> Pid: 7310, comm: modprobe Not tainted
> RIP: 0010:[<ffffffffa0186383>] <ffffffffa0186383>{:raid6:raid6_sse21_gen_syndrome+51}
> RSP: 0018:0000010021825dd8  EFLAGS: 00010202
                           ^
                           
It crashes because the stack is misaligned.  x86-64 requires that the
stack is always aligned to a 16-byte boundary, but your stack pointer
isn't.

The RAID-6 code for x86-64 specifically assumes proper stack
alignment, so a misaligned stack is fatal.

I don't know what would cause the stack to be misaligned, however.

	-hpa
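
To make the diagnosis concrete: the first "Code:" bytes at the faulting
RIP, 66 0f 7f 04 24, decode to movdqa %xmm0,(%rsp), an SSE2 store that
raises a general protection fault when its memory operand is not 16-byte
aligned. The reported RSP ends in 0x8, so it is 8-byte aligned but not
16-byte aligned. A minimal userspace sketch of that check follows; the
helper names misalignment() and is_aligned() are illustrative and are not
kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only (not kernel APIs). */

/* Number of bytes by which addr lies past the previous `align` boundary.
   `align` must be a nonzero power of two. */
static inline uint64_t misalignment(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & (align - 1);
}

/* True when addr sits exactly on an `align` boundary. */
static inline int is_aligned(uint64_t addr, uint64_t align)
{
    return misalignment(addr, align) == 0;
}

/* RSP value copied from the oops quoted above. */
static const uint64_t oops_rsp = 0x0000010021825dd8ULL;
```

With the RSP from the oops, misalignment(oops_rsp, 16) is 8, which is
exactly the case a movdqa to (%rsp) cannot tolerate.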


[parent not found: <Pine.LNX.4.58.0401301158340.8900@sapphire.newearth.org.suse.lists.linux.kernel>]

[parent not found: <bvf2vl$6pr$1@terminus.zytor.com.suse.lists.linux.kernel>]

* Re: raid6 badness
       [not found] ` <bvf2vl$6pr$1@terminus.zytor.com.suse.lists.linux.kernel>
@ 2004-01-31  3:04   ` Andi Kleen
  2004-01-31  5:50     ` H. Peter Anvin
  0 siblings, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2004-01-31  3:04 UTC
  To: H. Peter Anvin; +Cc: linux-kernel, michael

"H. Peter Anvin" <hpa@zytor.com> writes:
> 
> I don't know what would cause the stack to be misaligned, however.

The x86-64 kernel doesn't guarantee that the stack is 16-byte aligned
(although it usually is). If you need 16-byte alignment, you have
to align it yourself.

-Andi


* Re: raid6 badness
  2004-01-31  3:04   ` Andi Kleen
@ 2004-01-31  5:50     ` H. Peter Anvin
  2004-01-31  9:24       ` Andi Kleen
  2004-01-31 13:10       ` Michael V. David
  0 siblings, 2 replies; 7+ messages in thread
From: H. Peter Anvin @ 2004-01-31  5:50 UTC
  To: Andi Kleen; +Cc: linux-kernel, michael

[-- Attachment #1: Type: text/plain, Size: 822 bytes --]

Andi Kleen wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
> 
>>I don't know what would cause the stack to be misaligned, however.
> 
> x86-64 kernel doesn't guarantee the stack to be 16 byte aligned
> (although it usually is). If you need 16 byte alignment you have 
> to align yourself.
> 

OK, that's unfortunate... per our discussion I really think this is a 
bug, since the compiler still does 16-byte alignment, and thus we're 
taking the cost without the benefit.

I'll send in the attached patch for now, but at some point I'd like to 
fix this.  Unfortunately I still don't have an x86-64 machine that I can 
actually compile and install kernels on; I only have access to an x86-64 
userspace, so I'm a bit limited in what I can test.

Michael: Perhaps you could apply this patch and test it out for me?

	-hpa

[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 1411 bytes --]

===================================================================
RCS file: /home/hpa/kernel/bkcvs/linux-2.5/drivers/md/raid6x86.h,v
retrieving revision 1.3
diff -u -r1.3 raid6x86.h
--- linux-2.5/drivers/md/raid6x86.h	22 Jan 2004 16:15:09 -0000	1.3
+++ linux-2.5/drivers/md/raid6x86.h	31 Jan 2004 05:41:49 -0000
@@ -32,18 +32,20 @@
 /* N.B.: For SSE we only save %xmm0-%xmm7 even for x86-64, since
    the code doesn't know about the additional x86-64 registers */
 typedef struct {
-	unsigned int sarea[8*4];
-	unsigned int cr0;
+	unsigned int sarea[8*4+2];
+	unsigned long cr0;
 } raid6_sse_save_t __attribute__((aligned(16)));
 
 /* This is for x86-64-specific code which uses all 16 XMM registers */
 typedef struct {
-	unsigned int sarea[16*4];
+	unsigned int sarea[16*4+2];
 	unsigned long cr0;
 } raid6_sse16_save_t __attribute__((aligned(16)));
 
-/* On x86-64 the stack is 16-byte aligned */
-#define SAREA(x) (x->sarea)
+/* On x86-64 the stack *SHOULD* be 16-byte aligned, but currently this
+   is buggy in the kernel and it's only 8-byte aligned in places, so
+   we need to do this anyway.  Sigh. */
+#define SAREA(x) ((unsigned int *)((((unsigned long)&(x)->sarea)+15) & ~15))
 
 #else /* __i386__ */
 
@@ -60,6 +62,7 @@
 	unsigned long cr0;
 } raid6_sse_save_t;
 
+/* Find the 16-byte aligned save area */
 #define SAREA(x) ((unsigned int *)((((unsigned long)&(x)->sarea)+15) & ~15))
 
 #endif
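
The arithmetic behind the patch can be sketched in userspace. SAREA rounds
the address of sarea up to the next 16-byte boundary; when the structure
is only 8-byte aligned that rounding can skip up to 8 bytes, so the arrays
grow by two unsigned ints (8 bytes) and the 128-byte save area for
%xmm0-%xmm7 still fits. The names below (sse_save_sketch, align_up_16,
sarea_fits) are hypothetical stand-ins rather than the kernel's
definitions, and the sketch assumes a 4-byte unsigned int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XMM_BYTES  16                     /* size of one XMM register */
#define NUM_XMM    8                      /* %xmm0-%xmm7 */
#define SAVE_BYTES (NUM_XMM * XMM_BYTES)  /* 128 bytes of payload */

static_assert(sizeof(unsigned int) == 4, "sketch assumes 4-byte int");

/* Stand-in for the patched raid6_sse_save_t: 8*4 ints of payload plus
   2 ints (8 bytes) of alignment slack. */
struct sse_save_sketch {
    unsigned int sarea[8 * 4 + 2];
    unsigned long cr0;
};

/* Same rounding as the patched SAREA macro: add 15, clear the low 4 bits. */
static inline uintptr_t align_up_16(uintptr_t addr)
{
    return (addr + 15) & ~(uintptr_t)15;
}

/* Given a hypothetical address `base` for the sarea array, report whether
   a 16-byte aligned save area of SAVE_BYTES fits inside the array. */
static inline int sarea_fits(uintptr_t base)
{
    uintptr_t start = align_up_16(base);
    uintptr_t end = base + sizeof(((struct sse_save_sketch *)0)->sarea);
    return start + SAVE_BYTES <= end;
}
```

For any 8-byte aligned address the check passes, with an address such as
0x1008 being the tight case that uses all 8 bytes of slack. A merely
4-byte aligned address such as 0x1004 would need more padding than the
two extra ints provide. The same reasoning applies to the 16-register
variant, whose payload is 256 bytes.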


* Re: raid6 badness
  2004-01-31  5:50     ` H. Peter Anvin
@ 2004-01-31  9:24       ` Andi Kleen
  2004-01-31 13:10       ` Michael V. David
  1 sibling, 0 replies; 7+ messages in thread
From: Andi Kleen @ 2004-01-31  9:24 UTC
  To: H. Peter Anvin; +Cc: linux-kernel, michael

On Fri, 30 Jan 2004 21:50:23 -0800
"H. Peter Anvin" <hpa@zytor.com> wrote:

> Andi Kleen wrote:
> > "H. Peter Anvin" <hpa@zytor.com> writes:
> > 
> >>I don't know what would cause the stack to be misaligned, however.
> > 
> > x86-64 kernel doesn't guarantee the stack to be 16 byte aligned
> > (although it usually is). If you need 16 byte alignment you have 
> > to align yourself.
> > 
> 
> OK, that's unfortunate... per our discussion I really think this is a 
> bug, since the compiler still does 16-byte alignment, and thus we're 
> taking the cost without the benefit.

I disagree on the "bug" part. I will check with the compiler guys, but
as long as gcc doesn't rely on 16-byte alignment, I would rather just
disable it in the compiler.

I don't see much sense in enforcing this just because of some obscure SSE2
function that can align itself. Saving instructions and stack space
would be more important.

-Andi


* Re: raid6 badness
  2004-01-31  5:50     ` H. Peter Anvin
  2004-01-31  9:24       ` Andi Kleen
@ 2004-01-31 13:10       ` Michael V. David
  2004-02-01  7:33         ` H. Peter Anvin
  1 sibling, 1 reply; 7+ messages in thread
From: Michael V. David @ 2004-01-31 13:10 UTC
  To: linux-kernel; +Cc: Andi Kleen, H. Peter Anvin, michael

On Fri, 30 Jan 2004, H. Peter Anvin wrote:
> I'll send in the attached patch for now, but at some point I'd like
> to fix this.  Unfortunately I still don't have an x86-64 machine
> that I can actually compile and install kernels on; I only have
> access to an x86-64 userspace, so I'm a bit limited in what I can
> test.
>
> Michael: Perhaps you could apply this patch and test it out for me?

Done.

The patch applied, and the module raid6.ko compiled, with no problem.

The machine was rebooted because the crashed raid6.ko would not
unload.

The new raid6.ko loaded and unloaded repeatedly without a problem.

I created a raid6 device with 6 components, and a file system, and it
worked as expected, allowing failure of 1 or 2 component devices, but
not 3.

At present, I have not tried building it into the kernel, and have not
done any hard testing of raid6.

--mvd
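
For context on the result above: RAID-6 keeps two syndromes per stripe, P
and Q, which is why any two failed components can be reconstructed but a
third failure cannot. The function that faulted,
raid6_sse21_gen_syndrome, is one implementation of that computation. The
following is a byte-at-a-time sketch, assuming the usual RAID-6
construction over GF(2^8) with reduction polynomial 0x11d and generator 2;
the function names are illustrative, and this is not the kernel's SSE2
code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply by the generator {02} in GF(2^8), reducing by 0x11d. */
static inline uint8_t gf_mul2(uint8_t v)
{
    return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/* Compute P and Q for one byte position across `ndata` data disks.
   data[i] holds the byte from data disk i.
   P = D0 ^ D1 ^ ...            (plain parity)
   Q = D0 ^ 2*D1 ^ 4*D2 ^ ...   (weighted sum in GF(2^8)) */
static inline void gen_syndrome_byte(const uint8_t *data, size_t ndata,
                                     uint8_t *p, uint8_t *q)
{
    uint8_t wp = 0, wq = 0;

    assert(data != NULL && p != NULL && q != NULL);

    /* Horner's rule, starting from the highest-numbered data disk. */
    for (size_t i = ndata; i-- > 0; ) {
        wp ^= data[i];
        wq = (uint8_t)(gf_mul2(wq) ^ data[i]);
    }
    *p = wp;
    *q = wq;
}
```

In a 6-component array like the one tested above, four components' worth
of each stripe holds data and two hold P and Q, so the sketch would be
called with ndata set to 4 for each byte position.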


* Re: raid6 badness
  2004-01-31 13:10       ` Michael V. David
@ 2004-02-01  7:33         ` H. Peter Anvin
  0 siblings, 0 replies; 7+ messages in thread
From: H. Peter Anvin @ 2004-02-01  7:33 UTC
  To: Michael V. David; +Cc: linux-kernel, Andi Kleen

Michael V. David wrote:
> 
> Done.
> 
> The patch applied, and the module raid6.ko compiled, with no problem.
> 
> The machine was rebooted because the crashed raid6.ko would not
> unload.
> 
> The new raid6.ko loaded and unloaded repeatedly without a problem.
> 
> I created a raid6 device with 6 components, and a file system, and it
> worked as expected, allowing failure of 1 or 2 component devices, but
> not 3.
> 
> At present, I have not tried building it into the kernel, and have not
> done any hard testing of raid6.
> 

Very cool, thanks.

	-hpa

