Re: [Bug 45351] General protection fault in raid5, load_balance

* Re: [Bug 45351] General protection fault in raid5, load_balance
       [not found] ` <20121128103306.23F2011FB83@bugzilla.kernel.org>
@ 2012-11-29  1:24   ` NeilBrown
  2012-11-29 21:54     ` Jim Kukunas
  0 siblings, 1 reply; 2+ messages in thread
From: NeilBrown @ 2012-11-29  1:24 UTC
  To: bugzilla-daemon, Jim Kukunas; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 2357 bytes --]

On Wed, 28 Nov 2012 10:33:06 +0000 (UTC) bugzilla-daemon@bugzilla.kernel.org
wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=45351
> 
> 
> Cyril B. <cbay@excellency.fr> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>      Kernel Version|3.5.0                       |3.5.0, 3.6.8
> 
> 
> 
> 
> --- Comment #1 from Cyril B. <cbay@excellency.fr>  2012-11-28 10:33:05 ---
> I've just tested 3.6.8, I still get the same bug/trace.
> 

Hi Jim,
 could you look at this bug please?

https://bugzilla.kernel.org/show_bug.cgi?id=45351

It seems to be crashing in xor_avx_4:

[48595.135046] general protection fault: 0000 [#1] SMP
[48595.135093] CPU 0
[48595.135098] Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_multiport coretemp
hwmon i2c_i801 shpchp pci_hotplug ehci_hcd usbcore usb_common netconsole e1000e
[last unloaded: scsi_wait_scan]
[48595.135211]
[48595.135224] Pid: 2429, comm: md4_raid5 Not tainted 3.5.0 #2 /DH67BL
[48595.135263] RIP: 0010:[<ffffffff813512d8>]  [<ffffffff813512d8>] xor_avx_4+0x48/0x350
[48595.135303] RSP: 0018:ffff880213a259d0  EFLAGS: 00010282
[48595.135323] RAX: 000000008005003b RBX: 0000000000000008 RCX: ffff8802130b5000
[48595.135346] RDX: ffff880212c9f000 RSI: ffff880212c9e000 RDI: 0000000000001000
[48595.135368] RBP: ffff880213a25ac0 R08: ffff8802130b4000 R09: ffff880212c9e000
[48595.135391] R10: ffff880212c9e000 R11: 0000000000000000 R12: 000000008005003b
[48595.135413] R13: 0000000000000003 R14: ffff880213a25cd0 R15: 0000000000001000
[48595.135436] FS:  0000000000000000(0000) GS:ffff88021fa00000(0000) knlGS:0000000000000000
[48595.135471] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[48595.135492] CR2: 000000000235f570 CR3: 0000000001c0b000 CR4: 00000000000407f0
[48595.135514] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[48595.135537] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
....

[48595.136063] Code: b5 30 ff ff ff 48 89 95 28 ff ff ff 48 89 8d 20 ff ff ff 4c 89 85 18 ff ff ff e8 c4 04 ce ff 66 90 49 89 c4 0f 06 66 66 90 66 90 <c5> fc 29 85 50 ff ff ff c5 fc 29 8d 70 ff ff ff c5 fc 29 55 90


Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

* Re: [Bug 45351] General protection fault in raid5, load_balance
  2012-11-29  1:24   ` [Bug 45351] General protection fault in raid5, load_balance NeilBrown
@ 2012-11-29 21:54     ` Jim Kukunas
  0 siblings, 0 replies; 2+ messages in thread
From: Jim Kukunas @ 2012-11-29 21:54 UTC
  To: NeilBrown; +Cc: bugzilla-daemon, linux-raid

[-- Attachment #1: Type: text/plain, Size: 3505 bytes --]

On Thu, Nov 29, 2012 at 12:24:48PM +1100, Neil Brown wrote:
> On Wed, 28 Nov 2012 10:33:06 +0000 (UTC) bugzilla-daemon@bugzilla.kernel.org
> wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=45351
> > 
> > 
> > Cyril B. <cbay@excellency.fr> changed:
> > 
> >            What    |Removed                     |Added
> > ----------------------------------------------------------------------------
> >      Kernel Version|3.5.0                       |3.5.0, 3.6.8
> > 
> > 
> > 
> > 
> > --- Comment #1 from Cyril B. <cbay@excellency.fr>  2012-11-28 10:33:05 ---
> > I've just tested 3.6.8, I still get the same bug/trace.
> > 
> 
> Hi Jim,
>  could you look at this bug please?

Hi Neil,

Thank you for bringing this to my attention.

> 
> https://bugzilla.kernel.org/show_bug.cgi?id=45351
> 
> It seems to be crashing in xor_avx_4:
> 
> [48595.135046] general protection fault: 0000 [#1] SMP
> [48595.135093] CPU 0
> [48595.135098] Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_multiport coretemp
> hwmon i2c_i801 shpchp pci_hotplug ehci_hcd usbcore usb_common netconsole e1000e
> [last unloaded: scsi_wait_scan]
> [48595.135211]
> [48595.135224] Pid: 2429, comm: md4_raid5 Not tainted 3.5.0 #2 /DH67BL
> [48595.135263] RIP: 0010:[<ffffffff813512d8>]  [<ffffffff813512d8>] xor_avx_4+0x48/0x350
> [48595.135303] RSP: 0018:ffff880213a259d0  EFLAGS: 00010282
> [48595.135323] RAX: 000000008005003b RBX: 0000000000000008 RCX: ffff8802130b5000
> [48595.135346] RDX: ffff880212c9f000 RSI: ffff880212c9e000 RDI: 0000000000001000
> [48595.135368] RBP: ffff880213a25ac0 R08: ffff8802130b4000 R09: ffff880212c9e000
> [48595.135391] R10: ffff880212c9e000 R11: 0000000000000000 R12: 000000008005003b
> [48595.135413] R13: 0000000000000003 R14: ffff880213a25cd0 R15: 0000000000001000
> [48595.135436] FS:  0000000000000000(0000) GS:ffff88021fa00000(0000) knlGS:0000000000000000
> [48595.135471] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [48595.135492] CR2: 000000000235f570 CR3: 0000000001c0b000 CR4: 00000000000407f0
> [48595.135514] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [48595.135537] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> ....
> 
> [48595.136063] Code: b5 30 ff ff ff 48 89 95 28 ff ff ff 48 89 8d 20 ff ff ff 4c 89 85 18 ff ff ff e8 c4 04 ce ff 66 90 49 89 c4 0f 06 66 66 90 66 90 <c5> fc 29 85 50 ff ff ff c5 fc 29 8d 70 ff ff ff c5 fc 29 55 90

The code dump above is quite revealing. The relevant instructions are:

	clts
	vmovaps	%ymm0, -0xb0(%rbp)
	vmovaps	%ymm1, -0x90(%rbp)
	vmovaps	%ymm2, -0x70(%rbp)

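These instructions save the floating point state (ymm0-ymm2 here) to the
stack before we begin the actual xor work. vmovaps with a YMM operand
requires a 32-byte-aligned memory address. From the register dump, RBP is
ffff880213a25ac0, so -0xb0(%rbp) is ffff880213a25a10, which is only
16-byte aligned, hence the #GP.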

The question is whether the #GP still occurs after commit
841e3604d35aa70d399146abdc526d8c89a2c2f5.

Before that commit, we manually saved and restored the floating point state
on the stack with the YMMS_{SAVE,RESTORE} macros. After that commit, we
use the kernel_fpu_{begin,end} routines. If the #GP only occurs before the
commit, it would seem GCC is ignoring our request to align the stack
variable to 32 bytes, and 841e3604d35aa70d399146abdc526d8c89a2c2f5 should
resolve the problem. If it still occurs after the commit, we will need to
investigate further.

Thanks.

-- 
Jim Kukunas
Intel Open Source Technology Center

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]
