From: "Haar János" <djani22@netcenter.hu>
To: David Chinner <dgc@sgi.com>
Cc: linux-xfs@oss.sgi.com, linux-kernel@vger.kernel.org
Subject: Re: xfslogd-spinlock bug?
Date: Mon, 18 Dec 2006 00:56:41 +0100 [thread overview]
Message-ID: <026501c72237$0464f7a0$0400a8c0@dcccs> (raw)
In-Reply-To: 20061217224457.GN33919298@melbourne.sgi.com
----- Original Message -----
From: "David Chinner" <dgc@sgi.com>
To: "Haar János" <djani22@netcenter.hu>
Cc: <linux-xfs@oss.sgi.com>; <linux-kernel@vger.kernel.org>
Sent: Sunday, December 17, 2006 11:44 PM
Subject: Re: xfslogd-spinlock bug?
> On Sat, Dec 16, 2006 at 12:19:45PM +0100, Haar János wrote:
> > Hi
> >
> > I have some news.
> >
> > I dont know there is a context between 2 messages, but i can see, the
> > spinlock bug comes always on cpu #3.
> >
> > Somebody have any idea?
>
> Your disk interrupts are directed to CPU 3, and so log I/O completion
> occurs on that CPU.
CPU0 CPU1 CPU2 CPU3
0: 100 0 0 4583704 IO-APIC-edge timer
1: 0 0 0 2 IO-APIC-edge i8042
4: 0 0 0 3878668 IO-APIC-edge serial
8: 0 0 0 0 IO-APIC-edge rtc
9: 0 0 0 0 IO-APIC-fasteoi acpi
12: 0 0 0 3 IO-APIC-edge i8042
14: 3072118 0 0 181 IO-APIC-edge ide0
16: 0 0 0 0 IO-APIC-fasteoi
uhci_hcd:usb2
18: 0 0 0 0 IO-APIC-fasteoi
uhci_hcd:usb4
19: 0 0 0 0 IO-APIC-fasteoi
uhci_hcd:usb3
23: 0 0 0 0 IO-APIC-fasteoi
ehci_hcd:usb1
52: 0 0 0 213052723 IO-APIC-fasteoi eth1
53: 0 0 0 91913759 IO-APIC-fasteoi eth2
100: 0 0 0 16776910 IO-APIC-fasteoi eth0
NMI: 42271 43187 42234 43168
LOC: 4584247 4584219 4584215 4584198
ERR: 0
Maybe....
I have 3 XFS on this system, with 3 source.
1. 200G one ide hdd.
2. 2x200G mirror on 1 ide + 1 sata hdd.
3. 4x3.3TB strip on NBD.
The NBD serves through eth1, and it is on the CPU3, but the ide0 is on the
CPU0.
>
> > Dec 16 12:08:36 dy-base BUG: spinlock bad magic on CPU#3, xfslogd/3/317
> > Dec 16 12:08:36 dy-base general protection fault: 0000 [1]
> > Dec 16 12:08:36 dy-base SMP
> > Dec 16 12:08:36 dy-base
> > Dec 16 12:08:36 dy-base CPU 3
> > Dec 16 12:08:36 dy-base
> > Dec 16 12:08:36 dy-base Modules linked in:
> > Dec 16 12:08:36 dy-base nbd
>
> Are you using XFS on a NBD?
Yes, on the 3. source.
I used it about 1.5 years.
(The nbd deadlock is fixed on my system, thanks to Herbert Xu on 2.6.14.)
>
> > Dec 16 12:08:36 dy-base rd
> > Dec 16 12:08:36 dy-base netconsole
> > Dec 16 12:08:36 dy-base e1000
> > Dec 16 12:08:36 dy-base video
> > Dec 16 12:08:36 dy-base
> > Dec 16 12:08:36 dy-base Pid: 317, comm: xfslogd/3 Not tainted 2.6.19 #1
> > Dec 16 12:08:36 dy-base RIP: 0010:[<ffffffff803f3aba>]
> > Dec 16 12:08:36 dy-base [<ffffffff803f3aba>] spin_bug+0x69/0xdf
> > Dec 16 12:08:36 dy-base RSP: 0018:ffff81011fdedbc0 EFLAGS: 00010002
> > Dec 16 12:08:36 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX:
> ^^^^^^^^^^^^^^^^
> Anyone recognise that pattern?
I think i have one idea.
This issue can stops sometimes the 5sec automatic restart on crash, and this
shows possible memory corruption, and if the bug occurs in the IRQ
handling.... :-)
I have a lot of logs about this issue, and the RAX, RBX always the same.
>
> > Dec 16 12:08:36 dy-base Call Trace:
> > Dec 16 12:08:36 dy-base [<ffffffff803f3bdc>] _raw_spin_lock+0x23/0xf1
> > Dec 16 12:08:36 dy-base [<ffffffff805e7f2b>]
_spin_lock_irqsave+0x11/0x18
> > Dec 16 12:08:36 dy-base [<ffffffff80222aab>] __wake_up+0x22/0x50
> > Dec 16 12:08:36 dy-base [<ffffffff803c97f9>] xfs_buf_unpin+0x21/0x23
> > Dec 16 12:08:36 dy-base [<ffffffff803970a4>]
xfs_buf_item_unpin+0x2e/0xa6
>
> This implies a spinlock inside a wait_queue_head_t is corrupt.
>
> What are you type of system do you have, and what sort of
> workload are you running?
OS: Fedora 5, 64bit.
HW: dual xeon, with HT, ram 4GB.
(the min_free_kbytes limit is set to 128000, because sometimes the e1000
driver run out the reserved memory during irq handling.)
Workload:
I use this system for free web storage.
(2x apache 2.0.xx, 12x pure-ftpd, 2x mysql but sql only use the source #2
fs.)
The normal system load is ~20-40, but currently i have a little problem with
apache, because it sometimes starts to read a lot from the big XFS device,
and eats all memory, the load is rising to 700-800.
At this point i use httpd restart, and everithing go back to normal, but if
i offline.....
Thanks a lot!
Janos
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
WARNING: multiple messages have this Message-ID (diff)
From: "Haar János" <djani22@netcenter.hu>
To: "David Chinner" <dgc@sgi.com>
Cc: <linux-xfs@oss.sgi.com>, <linux-kernel@vger.kernel.org>
Subject: Re: xfslogd-spinlock bug?
Date: Mon, 18 Dec 2006 00:56:41 +0100 [thread overview]
Message-ID: <026501c72237$0464f7a0$0400a8c0@dcccs> (raw)
In-Reply-To: 20061217224457.GN33919298@melbourne.sgi.com
----- Original Message -----
From: "David Chinner" <dgc@sgi.com>
To: "Haar János" <djani22@netcenter.hu>
Cc: <linux-xfs@oss.sgi.com>; <linux-kernel@vger.kernel.org>
Sent: Sunday, December 17, 2006 11:44 PM
Subject: Re: xfslogd-spinlock bug?
> On Sat, Dec 16, 2006 at 12:19:45PM +0100, Haar János wrote:
> > Hi
> >
> > I have some news.
> >
> > I dont know there is a context between 2 messages, but i can see, the
> > spinlock bug comes always on cpu #3.
> >
> > Somebody have any idea?
>
> Your disk interrupts are directed to CPU 3, and so log I/O completion
> occurs on that CPU.
CPU0 CPU1 CPU2 CPU3
0: 100 0 0 4583704 IO-APIC-edge timer
1: 0 0 0 2 IO-APIC-edge i8042
4: 0 0 0 3878668 IO-APIC-edge serial
8: 0 0 0 0 IO-APIC-edge rtc
9: 0 0 0 0 IO-APIC-fasteoi acpi
12: 0 0 0 3 IO-APIC-edge i8042
14: 3072118 0 0 181 IO-APIC-edge ide0
16: 0 0 0 0 IO-APIC-fasteoi
uhci_hcd:usb2
18: 0 0 0 0 IO-APIC-fasteoi
uhci_hcd:usb4
19: 0 0 0 0 IO-APIC-fasteoi
uhci_hcd:usb3
23: 0 0 0 0 IO-APIC-fasteoi
ehci_hcd:usb1
52: 0 0 0 213052723 IO-APIC-fasteoi eth1
53: 0 0 0 91913759 IO-APIC-fasteoi eth2
100: 0 0 0 16776910 IO-APIC-fasteoi eth0
NMI: 42271 43187 42234 43168
LOC: 4584247 4584219 4584215 4584198
ERR: 0
Maybe....
I have 3 XFS on this system, with 3 source.
1. 200G one ide hdd.
2. 2x200G mirror on 1 ide + 1 sata hdd.
3. 4x3.3TB strip on NBD.
The NBD serves through eth1, and it is on the CPU3, but the ide0 is on the
CPU0.
>
> > Dec 16 12:08:36 dy-base BUG: spinlock bad magic on CPU#3, xfslogd/3/317
> > Dec 16 12:08:36 dy-base general protection fault: 0000 [1]
> > Dec 16 12:08:36 dy-base SMP
> > Dec 16 12:08:36 dy-base
> > Dec 16 12:08:36 dy-base CPU 3
> > Dec 16 12:08:36 dy-base
> > Dec 16 12:08:36 dy-base Modules linked in:
> > Dec 16 12:08:36 dy-base nbd
>
> Are you using XFS on a NBD?
Yes, on the 3. source.
I used it about 1.5 years.
(The nbd deadlock is fixed on my system, thanks to Herbert Xu on 2.6.14.)
>
> > Dec 16 12:08:36 dy-base rd
> > Dec 16 12:08:36 dy-base netconsole
> > Dec 16 12:08:36 dy-base e1000
> > Dec 16 12:08:36 dy-base video
> > Dec 16 12:08:36 dy-base
> > Dec 16 12:08:36 dy-base Pid: 317, comm: xfslogd/3 Not tainted 2.6.19 #1
> > Dec 16 12:08:36 dy-base RIP: 0010:[<ffffffff803f3aba>]
> > Dec 16 12:08:36 dy-base [<ffffffff803f3aba>] spin_bug+0x69/0xdf
> > Dec 16 12:08:36 dy-base RSP: 0018:ffff81011fdedbc0 EFLAGS: 00010002
> > Dec 16 12:08:36 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX:
> ^^^^^^^^^^^^^^^^
> Anyone recognise that pattern?
I think i have one idea.
This issue can stops sometimes the 5sec automatic restart on crash, and this
shows possible memory corruption, and if the bug occurs in the IRQ
handling.... :-)
I have a lot of logs about this issue, and the RAX, RBX always the same.
>
> > Dec 16 12:08:36 dy-base Call Trace:
> > Dec 16 12:08:36 dy-base [<ffffffff803f3bdc>] _raw_spin_lock+0x23/0xf1
> > Dec 16 12:08:36 dy-base [<ffffffff805e7f2b>]
_spin_lock_irqsave+0x11/0x18
> > Dec 16 12:08:36 dy-base [<ffffffff80222aab>] __wake_up+0x22/0x50
> > Dec 16 12:08:36 dy-base [<ffffffff803c97f9>] xfs_buf_unpin+0x21/0x23
> > Dec 16 12:08:36 dy-base [<ffffffff803970a4>]
xfs_buf_item_unpin+0x2e/0xa6
>
> This implies a spinlock inside a wait_queue_head_t is corrupt.
>
> What are you type of system do you have, and what sort of
> workload are you running?
OS: Fedora 5, 64bit.
HW: dual xeon, with HT, ram 4GB.
(the min_free_kbytes limit is set to 128000, because sometimes the e1000
driver run out the reserved memory during irq handling.)
Workload:
I use this system for free web storage.
(2x apache 2.0.xx, 12x pure-ftpd, 2x mysql but sql only use the source #2
fs.)
The normal system load is ~20-40, but currently i have a little problem with
apache, because it sometimes starts to read a lot from the big XFS device,
and eats all memory, the load is rising to 700-800.
At this point i use httpd restart, and everithing go back to normal, but if
i offline.....
Thanks a lot!
Janos
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
next prev parent reply other threads:[~2006-12-17 23:58 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-12-11 23:00 xfslogd-spinlock bug? Haar János
2006-12-11 23:00 ` Haar János
2006-12-12 14:32 ` Justin Piszcz
2006-12-13 1:11 ` Haar János
2006-12-16 11:19 ` Haar János
2006-12-16 11:19 ` Haar János
2006-12-17 22:44 ` David Chinner
2006-12-17 23:56 ` Haar János [this message]
2006-12-17 23:56 ` Haar János
2006-12-18 6:24 ` David Chinner
2006-12-18 8:17 ` Haar János
2006-12-18 8:17 ` Haar János
2006-12-18 22:36 ` David Chinner
2006-12-18 23:39 ` Haar János
2006-12-18 23:39 ` Haar János
2006-12-19 2:52 ` David Chinner
2006-12-19 4:47 ` David Chinner
2006-12-27 12:58 ` Haar János
2006-12-27 12:58 ` Haar János
2007-01-07 23:14 ` David Chinner
2007-01-10 17:18 ` Janos Haar
2007-01-10 17:18 ` Janos Haar
2007-01-11 3:34 ` David Chinner
2007-01-11 20:15 ` Janos Haar
2007-01-11 20:15 ` Janos Haar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='026501c72237$0464f7a0$0400a8c0@dcccs' \
--to=djani22@netcenter.hu \
--cc=dgc@sgi.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.