* Array 'freezes' for some time after large writes?
From: Jim Duchek @ 2010-03-30 17:07 UTC
To: linux-raid

Hi all. Regularly after a large write to the disk (untarring a very
large file, etc.), my RAID5 will 'freeze' for a period of time --
perhaps around a minute. My system is completely responsive otherwise
during this time, with the exception of anything that is attempting to
read or write from the array -- it's as if any file descriptors simply
block. Nothing disk/RAID-related is written to the logs during this
time. The array is mounted as /home -- so an awful lot of things
completely freeze during this time (web browser, any video that is
running, etc.). The disks don't seem to be actually accessed during
this time (I can't hear them, and the disk access light stays off),
and it's not as if it's just reading slowly -- it's not reading at
all. Array performance is completely normal before and after the
freeze and simply non-existent during it. The root disk (which is on
a separate disk entirely from the RAID) runs fine during this time, as
does everything else (network, video card, etc. -- as long as it
doesn't touch the array) -- for example, an open terminal window is
still responsive during the freeze, and 'ls /' works fine while
'ls /home' blocks until the 'freeze' is over.

Some more detailed information on my setup is below. It's pretty
vanilla. Unfortunately this started around the time four things
happened -- a kernel upgrade to 2.6.32, upgrading my filesystems to
ext4, replacing a disk gone bad in the RAID, and a video card change.
I would assume one of these is the culprit, but you know what they say
about 'assume'. I cannot reproduce the problem reliably, but it
happens a couple of times a day. My questions are these:

1. Is there any way to turn on more detailed logging for the RAID
system in the kernel? Neither the wiki nor a Google search turns up
anything I can find, and mdadm doesn't put anything out during this
time.
2. Could this be a problem with the SATA system? My root drive is
PATA -- my RAID disks are all SATA.
3. Uh, any other ideas? :)

Thanks, all.

Jim Duchek

[jrduchek@jimbob ~]$ uname -a
Linux jimbob 2.6.32-ARCH #1 SMP PREEMPT Mon Mar 15 20:44:03 CET 2010
x86_64 Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz GenuineIntel
GNU/Linux

[jrduchek@jimbob ~]$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[0] sde1[3] sdd1[2] sdc1[1]
      1465151808 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

[jrduchek@jimbob ~]$ mount
/dev/sda3 on / type ext4 (rw,noatime,user_xattr)
udev on /dev type tmpfs (rw,nosuid,relatime,size=10240k,mode=755)
none on /proc type proc (rw,relatime)
none on /sys type sysfs (rw,relatime)
none on /dev/pts type devpts (rw)
none on /dev/shm type tmpfs (rw)
/dev/sda1 on /boot type ext2 (rw)
/dev/md0 on /home type ext4 (rw,noatime,user_xattr)

[jrduchek@jimbob ~]$ more /etc/rc.local
#!/bin/bash
#
# /etc/rc.local: Local multi-user startup script.
#

echo 8192 > /sys/block/md0/md/stripe_cache_size
blockdev --setra 32768 /dev/md0
blockdev --setfra 32768 /dev/md0

dmesg (relevant):

ata3: SATA max UDMA/133 cmd 0xc400 ctl 0xc080 bmdma 0xb880 irq 19
ata4: SATA max UDMA/133 cmd 0xc000 ctl 0xbc00 bmdma 0xb888 irq 19
ata3.00: ATA-7: WDC WD5000AAJS-22TKA0, 12.01C01, max UDMA/133
ata3.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata3.01: ATA-8: WDC WD5002ABYS-02B1B0, 02.03B03, max UDMA/133
ata3.01: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata3.00: configured for UDMA/133
ata3.01: configured for UDMA/133
ata4.00: ATA-7: WDC WD5000AAJS-22TKA0, 12.01C01, max UDMA/133
ata4.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata4.01: ATA-7: WDC WD5000AAJS-22TKA0, 12.01C01, max UDMA/133
ata4.01: 976773168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata4.00: configured for UDMA/133
ata4.01: configured for UDMA/133
ata1.00: ATA-7: MAXTOR STM3160815A, 3.AAD, max UDMA/100
ata1.00: 312581808 sectors, multi 16: LBA48
ata1.01: ATAPI: LITE-ON DVDRW LH-20A1P, KL0G, max UDMA/66
ata1.00: configured for UDMA/100
ata1.01: configured for UDMA/66
scsi 0:0:0:0: Direct-Access ATA MAXTOR STM316081 3.AA PQ: 0 ANSI: 5
scsi 0:0:1:0: CD-ROM LITE-ON DVDRW LH-20A1P KL0G PQ: 0 ANSI: 5
scsi 2:0:0:0: Direct-Access ATA WDC WD5000AAJS-2 12.0 PQ: 0 ANSI: 5
scsi 2:0:1:0: Direct-Access ATA WDC WD5002ABYS-0 02.0 PQ: 0 ANSI: 5
scsi 3:0:0:0: Direct-Access ATA WDC WD5000AAJS-2 12.0 PQ: 0 ANSI: 5
scsi 3:0:1:0: Direct-Access ATA WDC WD5000AAJS-2 12.0 PQ: 0 ANSI: 5
sd 2:0:0:0: [sdb] 976773168 512-byte logical blocks: (500 GB/465 GiB)
sd 2:0:1:0: [sdc] 976773168 512-byte logical blocks: (500 GB/465 GiB)
sd 0:0:0:0: [sda] 312581808 512-byte logical blocks: (160 GB/149 GiB)
sd 3:0:0:0: [sdd] 976773168 512-byte logical blocks: (500 GB/465 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 3:0:0:0: [sdd] Write Protect is off
sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:0:0: [sdb] Write Protect is off
sd 2:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdd:
sda:
sdb:
sd 2:0:1:0: [sdc] Write Protect is off
sd 2:0:1:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:1:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdc: sdb1
sdd1
sd 3:0:0:0: [sdd] Attached SCSI disk
sd 3:0:1:0: [sde] 976773168 512-byte logical blocks: (500 GB/465 GiB)
sd 3:0:1:0: [sde] Write Protect is off
sd 3:0:1:0: [sde] Mode Sense: 00 3a 00 00
sd 3:0:1:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sde: sde1
sd 3:0:1:0: [sde] Attached SCSI disk
sda1 sda2 sda3
sdc1
sd 0:0:0:0: [sda] Attached SCSI disk
sd 2:0:0:0: [sdb] Attached SCSI disk
sd 2:0:1:0: [sdc] Attached SCSI disk
md: md0 stopped.
md: bind<sdc1>
md: bind<sdd1>
md: bind<sde1>
md: bind<sdb1>
async_tx: api initialized (async)
xor: automatically using best checksumming function: generic_sse
generic_sse: 7597.200 MB/sec
xor: using function: generic_sse (7597.200 MB/sec)
raid6: int64x1   1567 MB/s
raid6: int64x2   1994 MB/s
raid6: int64x4   1582 MB/s
raid6: int64x8   1427 MB/s
raid6: sse2x1    3698 MB/s
raid6: sse2x2    4184 MB/s
raid6: sse2x4    5888 MB/s
raid6: using algorithm sse2x4 (5888 MB/s)
md: raid6 personality registered for level 6
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
raid5: device sdb1 operational as raid disk 0
raid5: device sde1 operational as raid disk 3
raid5: device sdd1 operational as raid disk 2
raid5: device sdc1 operational as raid disk 1
raid5: allocated 4272kB for md0
0: w=1 pa=0 pr=4 m=1 a=2 r=4 op1=0 op2=0
3: w=2 pa=0 pr=4 m=1 a=2 r=4 op1=0 op2=0
2: w=3 pa=0 pr=4 m=1 a=2 r=4 op1=0 op2=0
1: w=4 pa=0 pr=4 m=1 a=2 r=4 op1=0 op2=0
raid5: raid level 5 set md0 active with 4 out of 4 devices, algorithm 2
RAID5 conf printout:
 --- rd:4 wd:4
 disk 0, o:1, dev:sdb1
 disk 1, o:1, dev:sdc1
 disk 2, o:1, dev:sdd1
 disk 3, o:1, dev:sde1
md0: detected capacity change from 0 to 1500315451392
md0: unknown partition table
EXT4-fs (md0): mounted filesystem with ordered data mode

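Until a better answer to question 1 turns up, one low-impact way to get
visibility is to sample the kernel's writeback counters around a freeze.
A minimal sketch, assuming only standard procfs paths; the 1-second
interval and log location are arbitrary choices:

#!/bin/bash
# Sample dirty/writeback page counts and md0 state once a second, so a
# freeze can be lined up against dirty-page buildup after the fact.
while true; do
    date '+%T'
    grep -E 'Dirty|Writeback' /proc/meminfo
    grep md0 /proc/mdstat
    sleep 1
done >> /tmp/writeback.log
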
* Re: Array 'freezes' for some time after large writes?
From: Mark Knecht @ 2010-03-30 17:18 UTC
To: Jim Duchek; +Cc: linux-raid

On Tue, Mar 30, 2010 at 10:07 AM, Jim Duchek <jim.duchek@gmail.com> wrote:
> Hi all. Regularly after a large write to the disk (untarring a very
> large file, etc.), my RAID5 will 'freeze' for a period of time --
> perhaps around a minute.
<SNIP>

I'm seeing a lot of this on a new Intel-based system. I've never run
into it before.

In my case I can see the delays while looking at top. They correspond
to 100%wa, as shown here:

top - 02:27:17 up 28 min, 2 users, load average: 2.76, 1.95, 1.30
Tasks: 125 total, 1 running, 124 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id,  0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,  0.0%id,100.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id,  0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id,  0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id,  0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.0%us, 0.3%sy, 0.0%ni,  0.0%id, 99.7%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id,  0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id,  0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem:  6107024k total, 1448676k used, 4658348k free,  187492k buffers
Swap: 4200988k total,       0k used, 4200988k free,  915900k cached

Like you, nothing seems to get written anywhere when this is happening,
and in my case it happens whether I'm using RAID1 or not.

From the command line, if I do the following and wait for one of these
100%wa events to occur

echo "1" > /proc/sys/vm/block_dump
... wait a short while ...
echo "0" > /proc/sys/vm/block_dump

then grepping dmesg with this command

dmesg | egrep "READ|WRITE|dirtied"

shows the following:

flush-8:0(3365): WRITE block 33555792 on sda3
flush-8:0(3365): WRITE block 33555800 on sda3
flush-8:0(3365): WRITE block 33701984 on sda3
flush-8:0(3365): WRITE block 33720128 on sda3
flush-8:0(3365): WRITE block 33721496 on sda3
flush-8:0(3365): WRITE block 33816576 on sda3

so something ugly is going on. I have no idea what causes these blocked
writes, but they are really messing me up. Sometimes these events last
for minutes. I've not yet discovered whether it's specific to my
drives, my motherboard, the kernel, or what.

- Mark

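Mark's block_dump steps fold naturally into a single capture script so
a freeze window isn't missed. A sketch assembled from the commands
above; the 60-second window is only a guess at the freeze length, and
it must run as root:

#!/bin/bash
# Capture block-layer activity across one suspected freeze window.
echo 1 > /proc/sys/vm/block_dump
sleep 60                            # roughly one freeze; adjust to taste
echo 0 > /proc/sys/vm/block_dump
dmesg | egrep "READ|WRITE|dirtied" > /tmp/block_dump.log
wc -l /tmp/block_dump.log
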
* Re: Array 'freezes' for some time after large writes?
From: Jim Duchek @ 2010-03-30 17:47 UTC
To: Mark Knecht; +Cc: linux-raid

Well, it appears that I can absolutely reproduce it 100% of the time by
copying a large (>1 gig) video file and then immediately playing it.
It seems to hit the freeze on just about the same frame every time,
and playing it seems to be necessary (it doesn't freeze if I just do
the copy and go about my business). Possibly an issue with disk
buffers? You're having this happen even if the disk in question is not
in an array? If so, perhaps it's a SATA issue and not a RAID one, and
we should move this discussion accordingly. I reproduced your steps
and I'm seeing pretty much the same thing, though not quite hitting
100% wa (I'm guessing it would if I shut everything else down -- I've
got a full desktop running).

I'm using a Biostar T41-A7 mobo, an Intel Core 2 Quad Q8400 Yorkfield
2.66GHz 4MB L2, and 3 Western Digital Caviar Blue WD5000AAJS drives
plus 1 WD5002ABYS. I note that the 3 older drives claim ATA-7 and the
newer one says ATA-8. Any similarities?

Jim

On 30 March 2010 12:18, Mark Knecht <markknecht@gmail.com> wrote:
> I'm seeing a lot of this on a new Intel-based system. I've never run
> into it before.
>
> In my case I can see the delays while looking at top. They correspond
> to 100%wa.
<SNIP>

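Jim's copy-then-play reproduction can be approximated without a video
player: copy a large file onto the array, then immediately stream the
copy back. A sketch; the paths are placeholders, and a sequential dd
read is only a rough stand-in for a player's read pattern:

#!/bin/bash
# Large write to the array followed by an immediate sequential read.
SRC=/home/media/big-video.mkv         # placeholder: any file >1 GiB
DST=/home/media/big-video-copy.mkv    # placeholder: destination on the array
cp "$SRC" "$DST"
dd if="$DST" of=/dev/null bs=1M       # read it straight back, like the player
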
* Re: Array 'freezes' for some time after large writes?
From: Mark Knecht @ 2010-03-30 18:00 UTC
To: Jim Duchek; +Cc: linux-raid

On Tue, Mar 30, 2010 at 10:47 AM, Jim Duchek <jim.duchek@gmail.com> wrote:
> Well, it appears that I can absolutely reproduce it 100% of the time by
> copying a large (>1 gig) video file and then immediately playing it.
<SNIP>

Intel DX58SO motherboard
6GB
i7 920
3 WD10EARS green 1TB (4K/sector)
nvidia 9500GT
2.6.32-gentoo kernel

In my setup (and being Gentoo-based, where we build everything from
scratch) I see this mostly when building code. I haven't copied any
large files yet, but I'm not surprised at your report, as my dmesg
error is about writes being blocked, so that makes sense.

The closest similarity would be that both motherboards are based on
Intel chipsets. SATA support is in the south bridge.

Your motherboard chipset:
North Bridge  Intel G41
South Bridge  Intel ICH7

My motherboard chipset:
North Bridge  Intel X58
South Bridge  Intel ICH10R

You did see the same message about writes being blocked, correct?
Nothing else as of yet?

- Mark

* Re: Array 'freezes' for some time after large writes?
From: Mark Knecht @ 2010-03-30 18:05 UTC
To: Jim Duchek; +Cc: linux-raid

On Tue, Mar 30, 2010 at 10:47 AM, Jim Duchek <jim.duchek@gmail.com> wrote:
<SNIP>
> You're having this happen even if the disk in question is not in an
> array? If so, perhaps it's a SATA issue and not a RAID one, and we
> should move this discussion accordingly.

Yes. In my case the delays are so long - sometimes 2 or 3 minutes -
that when I tried to build the system using RAID1 I got this kernel
bug in dmesg. It's just info - not a real failure - but because it's
talking about long delays I gave up on RAID and tried a standard
single-drive build. It turns out that it has (I think...) nothing to
do with RAID at all. You'll note that there are instructions for
turning the message off, but I've not tried them. I intend to do a
parallel RAID1 build on this machine so I can test RAID vs. non-RAID.

- Mark

INFO: task kjournald:17466 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald     D ffff8800280bbe00     0 17466      2 0x00000000
 ffff8801adf9d890 0000000000000046 0000000000000000 0000000000000000
 ffff8801adcbde44 0000000000004000 000000000000fe00 000000000000c878
 0000000800000050 ffff88017a99aa40 ffff8801af90a150 ffff8801adf9db08
Call Trace:
 [<ffffffff812dd063>] ? md_make_request+0xb6/0xf1
 [<ffffffff8109c248>] ? sync_buffer+0x0/0x40
 [<ffffffff8137a4fc>] ? io_schedule+0x2d/0x3a
 [<ffffffff8109c283>] ? sync_buffer+0x3b/0x40
 [<ffffffff8137a879>] ? __wait_on_bit+0x41/0x70
 [<ffffffff8109c248>] ? sync_buffer+0x0/0x40
 [<ffffffff8137a913>] ? out_of_line_wait_on_bit+0x6b/0x77
 [<ffffffff810438b2>] ? wake_bit_function+0x0/0x23
 [<ffffffff8109c637>] ? sync_dirty_buffer+0x72/0xaa
 [<ffffffff81131b8e>] ? journal_commit_transaction+0xa74/0xde2
 [<ffffffff8103abcc>] ? lock_timer_base+0x26/0x4b
 [<ffffffff81043884>] ? autoremove_wake_function+0x0/0x2e
 [<ffffffff81134804>] ? kjournald+0xe3/0x206
 [<ffffffff81043884>] ? autoremove_wake_function+0x0/0x2e
 [<ffffffff81134721>] ? kjournald+0x0/0x206
 [<ffffffff81043591>] ? kthread+0x8b/0x93
 [<ffffffff8100bd3a>] ? child_rip+0xa/0x20
 [<ffffffff81043506>] ? kthread+0x0/0x93
 [<ffffffff8100bd30>] ? child_rip+0x0/0x20
livecd ~ #

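The 120-second threshold in that message is tunable, so the hung-task
detector can be made to fire on shorter stalls like the roughly
one-minute freezes reported here. A sketch using the sysctl the message
itself names; 30 seconds is an arbitrary choice, and this must run as
root:

# Read the current hung-task threshold, then lower it.
sysctl kernel.hung_task_timeout_secs
echo 30 > /proc/sys/kernel/hung_task_timeout_secs
# or equivalently:
sysctl -w kernel.hung_task_timeout_secs=30
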
* Re: Array 'freezes' for some time after large writes?
From: Jim Duchek @ 2010-03-30 20:32 UTC
To: Mark Knecht; +Cc: linux-raid

Hrm, I've never seen that kernel message. I don't think any of my
freezes have lasted as long as 120 seconds, though (my drives are half
as big -- might that matter?). It looks like we've both got WD drives
-- and we both have nvidia 9500GTs as well. Are you running the nvidia
binary drivers, or nouveau? (It seems like it shouldn't matter,
especially as, at least on my system, they don't share an interrupt or
anything, but I hate to ignore any hardware that we both have the same
of.) I did move to 2.6.33 for some time, but that didn't change the
behaviour.

Jim

On 30 March 2010 13:05, Mark Knecht <markknecht@gmail.com> wrote:
> Yes. In my case the delays are so long - sometimes 2 or 3 minutes -
> that when I tried to build the system using RAID1 I got this kernel
> bug in dmesg.
<SNIP>

* Re: Array 'freezes' for some time after large writes?
From: Mark Knecht @ 2010-03-30 20:45 UTC
To: Jim Duchek; +Cc: linux-raid

Hi,
I am running the nvidia binary drivers. I'm not doing anything with
X at this point, so I can just unload them, I think. I could even
remove the card, I suppose.

I built a machine for my dad a couple of months ago that uses the
same 1TB WD drive that I am using now. I don't remember seeing
anything like this on his machine, but I'm going to go check that.

One other similarity I suspect we have is ext3? There were problems
with ext3 priority inversion in earlier kernels. It's my understanding
that they thought they had that worked out, but possibly we're
triggering this somehow? Since I've got a lot of disk space I can set
up some other partitions -- ext4, reiser4, etc. -- and try copying
files to trigger it. However, it's difficult for me if it requires
read/write, as I'm not set up to really use the machine yet. Is that
something you have room to try?

Also, we haven't discussed what drivers are loaded or the kernel
config. Here's my current driver set:

keeper ~ # lsmod
Module                  Size  Used by
ipv6                  207757  30
usbhid                 21529  0
nvidia              10611606  22
snd_hda_codec_realtek 239530  1
snd_hda_intel          17688  0
ehci_hcd               30854  0
snd_hda_codec          45755  2 snd_hda_codec_realtek,snd_hda_intel
snd_pcm                58104  2 snd_hda_intel,snd_hda_codec
snd_timer              15030  1 snd_pcm
snd                    37476  5 snd_hda_codec_realtek,snd_hda_intel,snd_hda_codec,snd_pcm,snd_timer
soundcore                800  1 snd
snd_page_alloc          5809  2 snd_hda_intel,snd_pcm
rtc_cmos                7678  0
rtc_core               11093  1 rtc_cmos
sg                     23029  0
uhci_hcd               18047  0
usbcore               115023  4 usbhid,ehci_hcd,uhci_hcd
agpgart                24341  1 nvidia
processor              23121  0
e1000e                111701  0
firewire_ohci          20022  0
rtc_lib                 1617  1 rtc_core
firewire_core          36109  1 firewire_ohci
thermal                11650  0
keeper ~ #

- Mark

On Tue, Mar 30, 2010 at 1:32 PM, Jim Duchek <jim.duchek@gmail.com> wrote:
> Hrm, I've never seen that kernel message. I don't think any of my
> freezes have lasted as long as 120 seconds, though (my drives are half
> as big -- might that matter?).
<SNIP>

* Re: Array 'freezes' for some time after large writes?
From: Jim Duchek @ 2010-03-30 20:59 UTC
To: Mark Knecht; +Cc: linux-raid

I'm using ext4 on everything, but it's hard to judge which ext3 bugs
might affect ext4 as well. I really don't have the ability to
destructively test the array; I need all the data that's on it and I
don't have enough spare space elsewhere to back it all up. You might
see if you can trigger it with dd, writing to the drive directly with
no filesystem?

Jim

On 30 March 2010 14:45, Mark Knecht <markknecht@gmail.com> wrote:
> Hi,
> I am running the nvidia binary drivers. I'm not doing anything with
> X at this point, so I can just unload them, I think.
<SNIP>

* Re: Array 'freezes' for some time after large writes?
From: Mark Knecht @ 2010-03-30 22:21 UTC
To: Jim Duchek; +Cc: linux-raid

I just finished a long compile on my dad's i5-661/DH55HC machine, which
uses this same WD drive, and I didn't spot any sign of this happening
there. That's a very recent Intel chipset also, and probably more or
less the same SATA controller.

I'm going to turn on the kernel-message-into-dmesg thing for a while
and see if anything pops up.

I can set up some additional partitions on my local drive to test
other file systems, but since you're ext4 and I'm ext3, it's not that
unless the problem moved forward with the code over time.

I like the idea of using dd, but I want to be careful about that sort
of thing. I've not used dd before, but if I could tell it to write a
gigabyte without messing up existing stuff then that could be helpful.

Back later,
Mark

On Tue, Mar 30, 2010 at 1:59 PM, Jim Duchek <jim.duchek@gmail.com> wrote:
> I'm using ext4 on everything, but it's hard to judge which ext3 bugs
> might affect ext4 as well.
<SNIP>

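On the dd caution: only writing to a raw device (of=/dev/sdX) clobbers
data. Pointing dd at an ordinary file on the filesystem is
non-destructive, and conv=fdatasync forces the flush that seems to be
the trigger. A sketch, with an arbitrary 1 GiB size and a placeholder
path:

# Non-destructive large-write test: write 1 GiB to a scratch file,
# force it out to disk, then clean up. Reading a raw device with
# if=/dev/sdX is also safe; only writing to one is destructive.
dd if=/dev/zero of=/home/ddtest.bin bs=1M count=1024 conv=fdatasync
rm /home/ddtest.bin
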
* Re: Array 'freezes' for some time after large writes?
From: Mark Knecht @ 2010-03-30 23:50 UTC
To: Jim Duchek; +Cc: linux-raid

On Tue, Mar 30, 2010 at 3:21 PM, Mark Knecht <markknecht@gmail.com> wrote:
> I just finished a long compile on my dad's i5-661/DH55HC machine, which
> uses this same WD drive, and I didn't spot any sign of this happening
> there.
<SNIP>

I know this isn't going to survive email very well, but you might want
to look at interrupts. I'm seeing the count on CPU #5 rising much more
quickly than on the other CPUs, and in my case it's generally CPU #5
that stalls out with this 100% wait problem.

I'm looking at another 4-processor machine that's been up for a few
days. Its interrupt counts are fairly balanced, except for TLB
shootdowns, whatever those are.

I wouldn't know how to tell if it's related...

- Mark

keeper ~ # cat /proc/interrupts
        CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
  0:     232      0      0      1      0      0      0      0  IO-APIC-edge     timer
  1:       0      0      0      2      0      0      0      0  IO-APIC-edge     i8042
  3:       0      0      0      2      0      0      0      0  IO-APIC-edge
  8:       0      0      0     91      0      0      0      0  IO-APIC-edge     rtc0
  9:       0      0      0      0      0      0      0      0  IO-APIC-fasteoi  acpi
 12:       0      0      0      4      0      0      0      0  IO-APIC-edge     i8042
 14:       0      0      0      0      0      0      0      0  IO-APIC-edge     ide0
 15:       0      0      0      0      0      0      0      0  IO-APIC-edge     ide1
 16:       0      0      0      0     82      0      0      0  IO-APIC-fasteoi  ahci, uhci_hcd:usb1, nvidia
 18:       0      0      0      0      0      0      0      0  IO-APIC-fasteoi  uhci_hcd:usb6, ehci_hcd:usb7
 19:       0      0      0      0      0   3137      0      0  IO-APIC-fasteoi  ahci, firewire_ohci, uhci_hcd:usb3, uhci_hcd:usb5
 20:       0      0      0      0      0      0    265      0  IO-APIC-fasteoi  eth0
 21:       0      0      0      0      0      0      0      0  IO-APIC-fasteoi  uhci_hcd:usb2
 22:     154      0      0      0      0      0      0      0  IO-APIC-fasteoi  hda_intel
 23:       0      0      0      0      0      0      0      0  IO-APIC-fasteoi  uhci_hcd:usb4, ehci_hcd:usb8
NMI:       0      0      0      0      0      0      0      0  Non-maskable interrupts
LOC:    7048   6722   3577   3598   3491   8425   3756   3569  Local timer interrupts
SPU:       0      0      0      0      0      0      0      0  Spurious interrupts
PMI:       0      0      0      0      0      0      0      0  Performance monitoring interrupts
PND:       0      0      0      0      0      0      0      0  Performance pending work
RES:     335    332    353    259    176    173    251     82  Rescheduling interrupts
CAL:     242    233    258    180    241    160    260    260  Function call interrupts
TLB:     232    242    270    235    342    474    537    497  TLB shootdowns
TRM:       0      0      0      0      0      0      0      0  Thermal event interrupts
THR:       0      0      0      0      0      0      0      0  Threshold APIC interrupts
MCE:       0      0      0      0      0      0      0      0  Machine check exceptions
MCP:       2      2      2      2      2      2      2      2  Machine check polls
ERR:       7
MIS:       0
keeper ~ # date
Tue Mar 30 16:45:13 PDT 2010
keeper ~ # cat /proc/interrupts
        CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
  0:     232      0      0      9      0      0      0      0  IO-APIC-edge     timer
  1:       0      0      0      2      0      0      0      0  IO-APIC-edge     i8042
  3:       0      0      0      2      0      0      0      0  IO-APIC-edge
  8:       0      0      0     91      0      0      0      0  IO-APIC-edge     rtc0
  9:       0      0      0      0      0      0      0      0  IO-APIC-fasteoi  acpi
 12:       0      0      0      4      0      0      0      0  IO-APIC-edge     i8042
 14:       0      0      0      0      0      0      0      0  IO-APIC-edge     ide0
 15:       0      0      0      0      0      0      0      0  IO-APIC-edge     ide1
 16:       0      0      0      0   2660      0      0      0  IO-APIC-fasteoi  ahci, uhci_hcd:usb1, nvidia
 18:       0      0      0      0      0      0      0      0  IO-APIC-fasteoi  uhci_hcd:usb6, ehci_hcd:usb7
 19:       0      0      0      0      0  20762      0      0  IO-APIC-fasteoi  ahci, firewire_ohci, uhci_hcd:usb3, uhci_hcd:usb5
 20:       0      0      0      0      0      0   1903      0  IO-APIC-fasteoi  eth0
 21:       0      0      0      0      0      0      0      0  IO-APIC-fasteoi  uhci_hcd:usb2
 22:     154      0      0      0      0      0      0      0  IO-APIC-fasteoi  hda_intel
 23:       0      0      0      0      0      0      0      0  IO-APIC-fasteoi  uhci_hcd:usb4, ehci_hcd:usb8
NMI:       0      0      0      0      0      0      0      0  Non-maskable interrupts
LOC:   10618  11998   8756   6940   6484  22076   7456   6599  Local timer interrupts
SPU:       0      0      0      0      0      0      0      0  Spurious interrupts
PMI:       0      0      0      0      0      0      0      0  Performance monitoring interrupts
PND:       0      0      0      0      0      0      0      0  Performance pending work
RES:     335    332    353    259    176    173    251     82  Rescheduling interrupts
CAL:     242    233    258    180    241    160    260    260  Function call interrupts
TLB:     232    243    270    236    343    475    538    497  TLB shootdowns
TRM:       0      0      0      0      0      0      0      0  Thermal event interrupts
THR:       0      0      0      0      0      0      0      0  Threshold APIC interrupts
MCE:       0      0      0      0      0      0      0      0  Machine check exceptions
MCP:      10     10     10     10     10     10     10     10  Machine check polls
ERR:       7
MIS:       0
keeper ~ #

* Re: Array 'freezes' for some time after large writes?
From: Jim Duchek @ 2010-03-31 0:22 UTC
To: Mark Knecht; +Cc: linux-raid

Interesting... You're using the AHCI SATA driver... I'm using
ata_piix. I begin to think it might be a hardware issue.

Jim

On 30 March 2010 17:50, Mark Knecht <markknecht@gmail.com> wrote:
> I know this isn't going to survive email very well, but you might want
> to look at interrupts. I'm seeing the count on CPU #5 rising much more
> quickly than on the other CPUs, and in my case it's generally CPU #5
> that stalls out with this 100% wait problem.
<SNIP>

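To check which libata driver a given controller is bound to (ahci vs.
ata_piix here), lspci can report the kernel driver in use. A sketch,
assuming pciutils is installed; note ata_piix may be built into the
kernel rather than loaded as a module, in which case lsmod won't show
it:

# Show each storage controller and the kernel driver in use.
lspci -k | grep -A 2 -i 'sata\|ide'
# Cross-check against loaded modules:
lsmod | egrep 'ahci|ata_piix'
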
* Re: Array 'freezes' for some time after large writes? 2010-03-30 17:07 Array 'freezes' for some time after large writes? Jim Duchek 2010-03-30 17:18 ` Mark Knecht @ 2010-03-31 1:35 ` Roger Heflin 2010-03-31 16:12 ` Mark Knecht 2010-03-31 16:37 ` Asdo 2 siblings, 1 reply; 15+ messages in thread From: Roger Heflin @ 2010-03-31 1:35 UTC (permalink / raw) To: Jim Duchek; +Cc: linux-raid Jim Duchek wrote: > Hi all. Regularly after a large write to the disk (untarring a very > large file, etc), my RAID5 will 'freeze' for a period of time -- > perhaps around a minute. My system is completely responsive otherwise > during this time, with the exception of anything that is attempting to > read or write from the array -- it's as if any file descriptors simply > block. Nothing disk/raid-related is written to the logs during this > time. The array is mounted as /home -- so an awful lot of things > completely freeze during this time (web browser, any video that is > running, etc). The disks don't seem to be actually accessed during > this time (I can't hear them, and the disk access light stays off), > and it's not as if it's just reading slowly -- it's not reading at > all. Array performance is completely normal before and after the > freeze and simply non-existent during it. The root disk (which is on > a seperate disk entirely from the RAID) runs fine during this time, as > does everything else (network, video card, etc -- as long it doesn't > touch the array) -- for example, a Terminal window open is still > responsive during the freeze, and 'ls /' would work fine, while 'ls > /home' would block until the 'freeze' is over. > > Some more detailed information on my setup attached. It's pretty > vanilla. Unfortunately this started around the time four things > happened -- a kernel upgrade to 2.6.32, upgrading my filesystems to > ext4, replacing a disk gone bad in the RAID, and a video card change. > I would assume one of these is the culprit, but you know what they say > about 'assume'. I cannot reproduce the problem reliably, but it > happens a couple times a day. My questions are these: > > 1. Is there any way to turn on more detailed logging for the RAID > system in the kernel? The wiki or a google search makes no mention I > can find, and mdadm doesn't put anything out during this time. > 2. Possibly a problem with the SATA system? My root drive is PATA -- > my RAID disks are all SATA. > 2. Uh, any other ideas? :) > > > Thanks, all. > > Jim Duchek > > > > > > [jrduchek@jimbob ~]$ uname -a > Linux jimbob 2.6.32-ARCH #1 SMP PREEMPT Mon Mar 15 20:44:03 CET 2010 > x86_64 Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz GenuineIntel > GNU/Linux > > [jrduchek@jimbob ~]$ cat /proc/mdstat > Personalities : [raid6] [raid5] [raid4] > md0 : active raid5 sdb1[0] sde1[3] sdd1[2] sdc1[1] > 1465151808 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] > > unused devices: <none> > > > [jrduchek@jimbob ~]$ mount > /dev/sda3 on / type ext4 (rw,noatime,user_xattr) > udev on /dev type tmpfs (rw,nosuid,relatime,size=10240k,mode=755) > none on /proc type proc (rw,relatime) > none on /sys type sysfs (rw,relatime) > none on /dev/pts type devpts (rw) > none on /dev/shm type tmpfs (rw) > /dev/sda1 on /boot type ext2 (rw) > /dev/md0 on /home type ext4 (rw,noatime,user_xattr) > > [jrduchek@jimbob ~]$ more /etc/rc.local > #!/bin/bash > # > # /etc/rc.local: Local multi-user startup script. 
In /etc/sysctl.conf or with "sysctl -a | grep vm.dirty" check these two
settings:

vm.dirty_background_ratio = 5
vm.dirty_ratio = 6

The defaults will be something like 10 for the first one and 40 for the
second.

40% is how much memory the kernel lets get dirty with write data; 10%
(or whatever the bottom number is) is how low the dirty total has to
go, once the kernel starts cleaning up, before it lets anyone else
write again (i.e. it freezes all writes and massively slows down reads
until then).

I set the values to the above. In older kernels 5 is the minimum value;
newer ones may allow lower. I don't believe the limits are well
documented, and if you set a value below the minimum, older kernels
silently clamp it internally -- you won't see that in a "sysctl -a"
check. So on my machine a freeze lasts however long it takes to write
1% of memory out to disk, which with 8GB is 81MB -- at most a second or
two at 60MB/second or so. If you have 8GB and the difference between
the two settings is 10%, it can take 10+ seconds. I don't remember the
default, but the larger the gap, the longer the freeze.

And this depends on the underlying disk speed: if the underlying disk
is slower, writing out that amount of data takes longer and things get
uglier. File copies do a good job of causing this.

^ permalink raw reply	[flat|nested] 15+ messages in thread
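A minimal sketch of applying and sanity-checking this suggestion; the 60 MB/s write speed and the gap arithmetic mirror Roger's example and are assumptions, not measurements:

#!/bin/bash
# Sketch: persist the suggested dirty-ratio settings and estimate the
# worst-case flush window. Append the two lines to sysctl.conf once,
# then reload; repeated runs would duplicate the entries.
cat >> /etc/sysctl.conf <<'EOF'
vm.dirty_background_ratio = 5
vm.dirty_ratio = 6
EOF
sysctl -p

# Re-read what the kernel actually accepted -- older kernels silently
# clamp values below their internal minimum.
sysctl vm.dirty_background_ratio vm.dirty_ratio

# Rough worst-case freeze: the gap between the two ratios, as a share
# of RAM, divided by sustained write speed (assumed 60 MB/s here).
mem_mb=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
gap_pct=1      # dirty_ratio minus dirty_background_ratio
speed_mb_s=60  # assumption -- measure your own array
echo "worst-case flush: ~$(( mem_mb * gap_pct / 100 / speed_mb_s )) second(s)"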
* Re: Array 'freezes' for some time after large writes?
  2010-03-31  1:35 ` Roger Heflin
@ 2010-03-31 16:12   ` Mark Knecht
  2010-03-31 16:25     ` Jim Duchek
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Knecht @ 2010-03-31 16:12 UTC (permalink / raw)
To: Roger Heflin; +Cc: Jim Duchek, linux-raid

On Tue, Mar 30, 2010 at 6:35 PM, Roger Heflin <rogerheflin@gmail.com> wrote:
> Jim Duchek wrote:
>> Hi all.  Regularly after a large write to the disk (untarring a very
>> large file, etc), my RAID5 will 'freeze' for a period of time --
>> perhaps around a minute.
> <SNIP>
>
> In /etc/sysctl.conf or with "sysctl -a | grep vm.dirty" check these two
> settings:
>
> vm.dirty_background_ratio = 5
> vm.dirty_ratio = 6
<SNIP>

Very interesting Roger. Thanks.

I did some reading on a couple of web sites and then did some testing.
I found that for the sort of jobs I do that create and write data --
compiling and installing MythTV, for example -- these settings have a
big effect on the percentage of time my system drops into these 100%
wait, 0% CPU states. The default setting on my system was 10/20, and
that tended to create this state quite a lot. 3/40 reduced it by
probably 50-75%, while 3/70 seemed to eliminate it until the end of the
build, where the kernel/compiler is presumably forcing everything out
to disk because the job is finishing.

One page I read mentioned data centers using a very good UPS and
internal power supply and then running at 1/100. I think the basic idea
is that if power is lost there should be enough time to flush all this
data to disk before the power completely drops out, but until then the
kernel is left to manage things entirely on its own.
Experimentally, what I see is that when I cross above the lower value
it isn't that nothing gets written; rather, the kernel sort of
opportunistically starts writing data to disk without letting it get
too much in the way of running programs, and then when the higher value
gets crossed the system goes to 100% wait while it pushes the data out
and waits for the disk. I used the command

grep -A 1 dirty /proc/vmstat

to watch a compile taking place, and looked at when it was 100%
user/system and when it went to 100% wait.

Some additional reading suggests tuning things like

vm.overcommit_ratio

and possibly changing the I/O scheduler

keeper ~ # cat /sys/block/sda/queue/scheduler
noop deadline [cfq]

or changing the number of requests

keeper ~ # cat /sys/block/sda/queue/nr_requests
128

or read-ahead values

keeper ~ # blockdev --getra /dev/sda
256

I haven't played with any of those.

Based on this info I think it's worth my time trying a new RAID
install and seeing if I'm more successful.

Thanks very much for your insights and help!

Cheers,
Mark


keeper ~ # vi /etc/sysctl.conf

vm.dirty_background_ratio = 10
vm.dirty_ratio = 20

keeper ~ # sysctl -p
keeper ~ # time emerge -DuN mythtv
<SNIP>
real 8m50.667s
user 30m6.995s
sys 1m30.605s
keeper ~ #


keeper ~ # vi /etc/sysctl.conf

vm.dirty_background_ratio = 3
vm.dirty_ratio = 40

keeper ~ # sysctl -p
keeper ~ # time emerge -DuN mythtv
<SNIP>
real 8m59.401s
user 30m9.980s
sys 1m30.303s
keeper ~ #


keeper ~ # vi /etc/sysctl.conf

vm.dirty_background_ratio = 3
vm.dirty_ratio = 70

keeper ~ # time emerge -DuN mythtv
<SNIP>
real 8m52.272s
user 30m0.889s
sys 1m30.609s
keeper ~ #

^ permalink raw reply	[flat|nested] 15+ messages in thread
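A small loop in the same spirit as the grep above, sampling the nr_dirty and nr_writeback counters that /proc/vmstat exposes on 2.6.3x kernels once a second while a build or copy runs:

#!/bin/bash
# Sketch: watch dirty/writeback page counts to see when the background
# and hard dirty limits are being approached or crossed.
while sleep 1; do
    printf '%s ' "$(date +%T)"
    awk '$1=="nr_dirty" || $1=="nr_writeback" {printf "%s=%s ", $1, $2} END {print ""}' /proc/vmstat
done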
* Re: Array 'freezes' for some time after large writes?
  2010-03-31 16:12   ` Mark Knecht
@ 2010-03-31 16:25     ` Jim Duchek
  0 siblings, 0 replies; 15+ messages in thread
From: Jim Duchek @ 2010-03-31 16:25 UTC (permalink / raw)
To: Mark Knecht; +Cc: Roger Heflin, linux-raid

Agreed, playing with some of these settings appears to clear the
problem up, at least for the cases in which I tend to trigger it.
Much obliged for the help!

Jim

On 31 March 2010 10:12, Mark Knecht <markknecht@gmail.com> wrote:
> On Tue, Mar 30, 2010 at 6:35 PM, Roger Heflin <rogerheflin@gmail.com> wrote:
>> Jim Duchek wrote:
>>> Hi all.  Regularly after a large write to the disk (untarring a very
>>> large file, etc), my RAID5 will 'freeze' for a period of time --
>>> perhaps around a minute.
<SNIP>
^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: Array 'freezes' for some time after large writes?
  2010-03-30 17:07 Array 'freezes' for some time after large writes? Jim Duchek
  2010-03-30 17:18 ` Mark Knecht
  2010-03-31  1:35 ` Roger Heflin
@ 2010-03-31 16:37 ` Asdo
  2 siblings, 0 replies; 15+ messages in thread
From: Asdo @ 2010-03-31 16:37 UTC (permalink / raw)
To: Jim Duchek; +Cc: linux-raid

Jim Duchek wrote:
> Hi all.  Regularly after a large write to the disk (untarring a very
> large file, etc), my RAID5 will 'freeze' for a period of time --
> perhaps around a minute.  My system is completely responsive otherwise
> during this time,

Why don't you cat /proc/<pid of blocked process>/stack to see what the
process is doing? Do it more than once, to see whether it always hangs
on the same syscall...

I believe this is most useful when there is only one access to the
array rather than concurrent access -- you need to trigger the freeze
with, e.g., the tar alone. This might identify filesystem problems;
other problems may be harder to pin down.
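A minimal sketch of that sampling loop; the PID argument and the five-sample count are arbitrary choices, and reading /proc/<pid>/stack requires root and a kernel with stack tracing enabled:

#!/bin/bash
# Sketch: repeatedly sample the kernel stack of a blocked process,
# e.g. the PID of an 'ls /home' that is hanging during a freeze.
pid=$1
for i in 1 2 3 4 5; do
    echo "--- sample $i ---"
    cat "/proc/$pid/stack"
    sleep 2
done

Tasks blocked this way usually show a D (uninterruptible sleep) in the STAT column of ps, which is one way to find the PID to sample.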
end of thread, other threads:[~2010-03-31 16:37 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-30 17:07 Array 'freezes' for some time after large writes? Jim Duchek
2010-03-30 17:18 ` Mark Knecht
2010-03-30 17:47 ` Jim Duchek
2010-03-30 18:00 ` Mark Knecht
2010-03-30 18:05 ` Mark Knecht
2010-03-30 20:32 ` Jim Duchek
2010-03-30 20:45 ` Mark Knecht
2010-03-30 20:59 ` Jim Duchek
2010-03-30 22:21 ` Mark Knecht
2010-03-30 23:50 ` Mark Knecht
2010-03-31  0:22 ` Jim Duchek
2010-03-31  1:35 ` Roger Heflin
2010-03-31 16:12 ` Mark Knecht
2010-03-31 16:25 ` Jim Duchek
2010-03-31 16:37 ` Asdo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).