* [linux-lvm] Frozen root volume
From: Larkin Lowrey @ 2012-04-05 0:38 UTC
To: LVM general discussion and development
[-- Attachment #1: Type: text/plain, Size: 4080 bytes --]
I've been chasing a problem where my root volume freezes up and will not
process any more I/O (at least no more writes).
The iostat output below shows what is going on when it gets into this
state. The dm-1 device is stuck with an avgqu-sz of 50 and a %util of
100, but zeros for all other values. All disks and RAID devices hosting
this volume show zeros across the board.
I am unable to cleanly shut down as all writes to the root fs are
blocked so I have to do a hard reset.
I've attached the output of 'echo t > /proc/sysrq-trigger'.
This phenomenon did not occur before the upgrade from F15 to F16, so I
doubt it's a hardware issue.
Any ideas or suggestions?
--Larkin
Device:    rrqm/s   wrqm/s      r/s      w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
sda          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdb          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdc          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdd          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sde          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdf          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdg          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdh          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdi          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdj          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdk          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdl          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
md0          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
md1          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
md2          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
md10         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdm          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdn          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdo          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdq          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdp          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdr          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sds          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdt          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
dm-0         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
dm-1         0.00     0.00     0.00     0.00     0.00     0.00     0.00    50.00     0.00     0.00     0.00     0.00   100.00
dm-2         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
dm-3         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
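Spotting this condition by eye means scanning every row. A small script could scan an 'iostat -x' report for devices that have requests queued (avgqu-sz > 0) while completing no reads or writes. This is a hypothetical helper, not part of sysstat. It assumes each device sits on one line with the column set shown above. The names find_stalled and SAMPLE are illustrative.

```python
# Hypothetical helper: flag devices that look stalled in `iostat -x` output.
# Assumes one line per device and the column set shown in this thread
# (older sysstat; newer releases rename some columns, e.g. avgqu-sz -> aqu-sz).


def find_stalled(report: str) -> list[str]:
    """Return devices with requests queued but no reads or writes completing."""
    rows = [line.split() for line in report.strip().splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    col = {name: i for i, name in enumerate(header)}
    stalled = []
    for fields in data:
        if len(fields) != len(header):
            continue  # skip wrapped or malformed rows
        queued = float(fields[col["avgqu-sz"]])
        completed = float(fields[col["r/s"]]) + float(fields[col["w/s"]])
        if queued > 0 and completed == 0:
            stalled.append(fields[0])
    return stalled


# Three rows copied from the report above (other devices omitted for brevity).
SAMPLE = """
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 50.00 0.00 0.00 0.00 0.00 100.00
"""

if __name__ == "__main__":
    print(find_stalled(SAMPLE))  # ['dm-1']
```

The header is parsed by name rather than by position, so a report with a slightly different column order would still work as long as the r/s, w/s and avgqu-sz names are present.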
[-- Attachment #2: root freeze.zip --]
[-- Type: application/octet-stream, Size: 42959 bytes --]
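A sysrq-t dump is long. The tasks worth reading first are usually those in uninterruptible sleep (state D), because their stack traces show what they are waiting on (for example, an md RAID thread or a journal thread). The rough filter below is a sketch that assumes the 3.x-era line layout described in its comments. The function name and the sample lines are made up for illustration and are not taken from the attachment.

```python
import re

# Hypothetical helper: list tasks in uninterruptible sleep ("D" state) in a
# sysrq-t dump. Assumes the 3.x-era line shape: task name left-aligned in a
# 15-column field, one space, then a one-letter state, optionally preceded by
# a printk timestamp such as "[ 1234.567890] ". Other kernels may differ.
TASK_LINE = re.compile(
    r"^(?:\[\s*\d+\.\d+\]\s)?(?P<comm>.{1,15}?)\s+(?P<state>[RSDTtZXxKW])\s"
)


def blocked_tasks(dump: str) -> list[str]:
    """Return the names of tasks reported in state D, in dump order."""
    names = []
    for line in dump.splitlines():
        m = TASK_LINE.match(line)
        if m and m.group("state") == "D":
            names.append(m.group("comm").strip())
    return names


# Made-up sample lines in the assumed format (not taken from the attachment).
SAMPLE = """\
md1_raid5       D ffff880212345678     0   472      2 0x00000000
Call Trace:
jbd2/dm-1-8     D ffff880212345678     0   511      2 0x00000000
sshd            S ffff880212345678     0  1023      1 0x00000000
"""

if __name__ == "__main__":
    for name in blocked_tasks(SAMPLE):
        print(name)
```

This filter only narrows the dump. The stack trace printed under each D-state task is what actually indicates where it is stuck.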
* Re: [linux-lvm] Frozen root volume
From: Ray Morris @ 2012-04-05 16:02 UTC
To: linux-lvm
What does your storage stack look like? Something in the stack froze
up. It could be your SAN storage device, the switch connecting to the
SAN, the RAID card, or the software RAID layer, depending on which of
those you're using.
If there is nothing in the logs because they are stored on the root
device, a reasonable next step might be to use rsyslog to send the log
data to a neighboring machine. That way you can see the log messages
no matter which local storage has a problem.
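On the frozen host, rsyslog can forward everything with a rule such as '*.* @loghost:5140' in rsyslog.conf. A single @ sends over UDP and @@ sends over TCP. "loghost" and port 5140 are placeholders. The neighboring machine would normally run rsyslog itself. As a bare-bones stand-in for quick testing, a tiny UDP collector that just prints what arrives might look like the sketch below. The class and function names are hypothetical.

```python
import re
import socketserver

# Minimal stand-in for a remote syslog collector (a sketch, not a replacement
# for running rsyslog on the neighboring machine). Port 5140 is an arbitrary
# unprivileged choice; the standard syslog port 514 requires root.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
PRI = re.compile(r"<(\d{1,3})>(.*)", re.DOTALL)


def decode(datagram: bytes) -> tuple[int, str, str]:
    """Split '<PRI>message' into (facility, severity name, message text)."""
    text = datagram.decode("utf-8", errors="replace")
    m = PRI.match(text)
    if m is None:
        return -1, "unknown", text
    pri = int(m.group(1))
    # PRI encodes facility * 8 + severity (RFC 3164 / RFC 5424).
    return pri // 8, SEVERITIES[pri % 8], m.group(2)


class Collector(socketserver.BaseRequestHandler):
    def handle(self) -> None:
        data, _sock = self.request  # UDP handlers receive (bytes, socket)
        facility, severity, msg = decode(data)
        host = self.client_address[0]
        print(f"{host} facility={facility} {severity}: {msg.rstrip()}", flush=True)


if __name__ == "__main__":
    with socketserver.UDPServer(("0.0.0.0", 5140), Collector) as server:
        server.serve_forever()
```

Kernel messages arrive with facility 0 (kern). Filtering on facility 0 is a quick way to pick out md or block-layer warnings from everything else.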
--
Ray Morris
support@bettercgi.com
* Re: [linux-lvm] Frozen root volume
From: Larkin Lowrey @ 2012-04-05 17:36 UTC
To: LVM general discussion and development; +Cc: Ray Morris
I have the serial console output going to a logging terminal server so
I'm able to capture everything that is sent to the console and I've seen
no errors or any other unusual output prior to these freezes. Would
rsyslog produce different results?
My VG sits atop 4 md RAID devices: a tiny RAID6 for the boot fs, an
8-drive RAID5 for the root fs, and two 6-drive RAID5s for a data fs.
The 8 drives of the root RAID5 are connected to a 6-port AHCI controller
(AMD SB850) and a 2-port AHCI controller (Marvell 88SE9128).
Is there any way to determine which of these (md device, AHCI
controller, disk) is the culprit? I have been able to read from each of
the constituent drives so I know that I/O at that level can take place.
I can't think of a safe way to test writes non-destructively.
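One read-only way to narrow down the layer might be to compare the in-flight request counters that sysfs exposes for each block device. You would walk from the LV (dm-1 here) down through its "slaves" entries to the md array and its member partitions. If dm-1 shows writes in flight while everything beneath it shows none, the requests are stuck above the disks. This is a sketch under assumptions. The helper names are hypothetical. Bio-based drivers such as md may not maintain this counter on every kernel, so a zero at the md level is suggestive rather than conclusive.

```python
import os
import sys

# Hypothetical helper: show in-flight I/O at each layer beneath a dm device.
# /sys/class/block has an entry for every block device, partitions included;
# each has an "inflight" file ("<reads> <writes>"), and dm/md devices list
# their underlying devices in a "slaves" directory.
SYSFS_BLOCK = "/sys/class/block"


def inflight(dev: str, root: str = SYSFS_BLOCK) -> tuple[int, int]:
    """Return (reads, writes) currently in flight for one device."""
    with open(os.path.join(root, dev, "inflight")) as f:
        reads, writes = f.read().split()
    return int(reads), int(writes)


def members(dev: str, root: str = SYSFS_BLOCK) -> list[str]:
    """Underlying devices of a dm or md device; empty for disks and partitions."""
    path = os.path.join(root, dev, "slaves")
    return sorted(os.listdir(path)) if os.path.isdir(path) else []


def walk(dev: str, root: str = SYSFS_BLOCK, depth: int = 0):
    """Yield (depth, device, reads, writes) for dev and every device beneath it."""
    reads, writes = inflight(dev, root)
    yield depth, dev, reads, writes
    for child in members(dev, root):
        yield from walk(child, root, depth + 1)


if __name__ == "__main__":
    top = sys.argv[1] if len(sys.argv) > 1 else "dm-1"  # dm-1 holds the root fs here
    for depth, dev, reads, writes in walk(top):
        print(f"{'  ' * depth}{dev}: {reads} reads, {writes} writes in flight")
```

The script only reads sysfs, so it does not risk the data. Python and the script still have to be loadable on the frozen system, for example from the page cache or from a filesystem outside the affected VG.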
--Larkin
* Re: [linux-lvm] Frozen root volume
From: Ray Morris @ 2012-04-05 18:07 UTC
To: LVM general discussion and development
[-- Attachment #1: Type: text/plain, Size: 2502 bytes --]
Depending on syslog.conf, there may well be messages in the logs that never reach the console.
I have experienced similar issues with software RAID 5 that were not present with hardware RAID, and they showed up around the same kernel version as yours.
There is a fix headed to the mainline kernel for a deadlock in raid1.c which caused very similar symptoms under very similar loads. I suspect a variant of the same bug exists in raid5.c, but I'm not a kernel programmer, so I'm speculating. You may wish to have a close look at raid5.c; perhaps add some printk calls to see if you can narrow it down. Putting some of your storage on hardware RAID and seeing whether the problem persists on those devices could also be informative. That would tell you whether to move this discussion to the RAID list.
_____________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
[-- Attachment #2: Type: text/html, Size: 3119 bytes --]