* (unknown)
@ 2009-04-02 4:16 Lelsie Rhorer
  2009-04-02 4:22 ` David Lethe
                  ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread

From: Lelsie Rhorer @ 2009-04-02 4:16 UTC (permalink / raw)
To: linux-raid

I'm having a severe problem whose root cause I cannot determine.  I have a
RAID 6 array managed by mdadm running on Debian "Lenny" with a 3.2GHz AMD
Athlon 64 X2 processor and 8G of RAM.  There are ten 1 Terabyte SATA
drives, unpartitioned, fully allocated to the /dev/md0 device.  The drives
are served by 3 Silicon Image SATA port multipliers and a Silicon Image
4-port eSATA controller.  The /dev/md0 device is also unpartitioned, and
all 8T of active space is formatted as a single Reiserfs file system.  The
entire volume is mounted to /RAID.  Various directories on the volume are
shared using both NFS and SAMBA.

Performance of the RAID system is very good.  The array can read and write
at over 450 Mbps, and I don't know if the limit is the array itself or the
network, but since the performance is more than adequate I really am not
concerned which is the case.

The issue is that the entire array will occasionally pause completely for
about 40 seconds when a file is created.  This does not always happen, but
the situation is easily reproducible.  The frequency at which the symptom
occurs seems to be related to the transfer load on the array.  If no other
transfers are in process, then the failure seems somewhat more rare,
perhaps accompanying less than 1 file creation in 10.  During heavy file
transfer activity, sometimes the system halts with every other file
creation.  Although I have observed many dozens of these events, I have
never once observed it to happen except when a file creation occurs.
Reading and writing existing files never triggers the event, although any
read or write occurring during the event is halted for the duration.
(There is one cron job which runs every half-hour that creates a tiny
file; this is the most common failure vector.)  There are other drives
formatted with other file systems on the machine, but the issue has never
been seen on any of the other drives.  When the array runs its regularly
scheduled health check, the problem is much worse.  Not only does it lock
up with almost every single file creation, but the lock-up time is much
longer - sometimes in excess of 2 minutes.

Transfers via Linux based utilities (ftp, NFS, cp, mv, rsync, etc.) all
recover after the event, but SAMBA based transfers frequently fail, both
reads and writes.

How can I troubleshoot and, more importantly, resolve this issue?

^ permalink raw reply	[flat|nested] 24+ messages in thread
* RE: 2009-04-02 4:16 (unknown), Lelsie Rhorer @ 2009-04-02 4:22 ` David Lethe 2009-04-05 0:12 ` RE: Lelsie Rhorer 2009-04-02 4:38 ` Strange filesystem slowness with 8TB RAID6 NeilBrown ` (3 subsequent siblings) 4 siblings, 1 reply; 24+ messages in thread From: David Lethe @ 2009-04-02 4:22 UTC (permalink / raw) To: lrhorer, linux-raid > -----Original Message----- > From: linux-raid-owner@vger.kernel.org [mailto:linux-raid- > owner@vger.kernel.org] On Behalf Of Lelsie Rhorer > Sent: Wednesday, April 01, 2009 11:16 PM > To: linux-raid@vger.kernel.org > Subject: > > I'm having a severe problem whose root cause I cannot determine. I > have a > RAID 6 array managed by mdadm running on Debian "Lenny" with a 3.2GHz > AMD > Athlon 64 x 2 processor and 8G of RAM. There are ten 1 Terabyte SATA > drives, unpartitioned, fully allocated to the /dev/md0 device. The > drive > are served by 3 Silicon Image SATA port multipliers and a Silicon Image > 4 > port eSATA controller. The /dev/md0 device is also unpartitioned, and > all > 8T of active space is formatted as a single Reiserfs file system. The > entire volume is mounted to /RAID. Various directories on the volume > are > shared using both NFS and SAMBA. > > Performance of the RAID system is very good. The array can read and > write > at over 450 Mbps, and I don't know if the limit is the array itself or > the > network, but since the performance is more than adequate I really am > not > concerned which is the case. > > The issue is the entire array will occasionally pause completely for > about > 40 seconds when a file is created. This does not always happen, but > the > situation is easily reproducible. The frequency at which the symptom > occurs seems to be related to the transfer load on the array. If no > other > transfers are in process, then the failure seems somewhat more rare, > perhaps accompanying less than 1 file creation in 10.. During heavy > file > transfer activity, sometimes the system halts with every other file > creation. Although I have observed many dozens of these events, I have > never once observed it to happen except when a file creation occurs. > Reading and writing existing files never triggers the event, although > any > read or write occurring during the event is halted for the duration. > (There is one cron jog which runs every half-hour that creates a tiny > file; > this is the most common failure vector.) There are other drives > formatted > with other file systems on the machine, but the issue has never been > seen > on any of the other drives. When the array runs its regularly > scheduled > health check, the problem is much worse. Not only does it lock up with > almost every single file creation, but the lock-up time is much longer > - > sometimes in excess of 2 minutes. > > Transfers via Linux based utilities (ftp, NFS, cp, mv, rsync, etc) all > recover after the event, but SAMBA based transfers frequently fail, > both > reads and writes. > > How can I troubleshoot and more importantly resolve this issue? > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" > in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html I would try to first run hardware diagnostics. Maybe you will get "lucky" and one or more disks will fail diagnostics, which at least means it will be easy to repair the problem. This could very well be situation where you have a lot of bad blocks that have to get restriped, and parity has to be regenerated. 
Are these the cheap consumer SATA disk drives, or enterprise class disks? David ^ permalink raw reply [flat|nested] 24+ messages in thread
* RE:
  2009-04-02 4:22 ` David Lethe
@ 2009-04-05 0:12 ` Lelsie Rhorer
  2009-04-05 0:38   ` Greg Freemyer
  2009-04-05 0:45   ` Re: Roger Heflin
  0 siblings, 2 replies; 24+ messages in thread

From: Lelsie Rhorer @ 2009-04-05 0:12 UTC (permalink / raw)
To: linux-raid

> I would try to first run hardware diagnostics. Maybe you will get
> "lucky" and one or more disks will fail diagnostics, which at least
> means it will be easy to repair the problem.
>
> This could very well be a situation where you have a lot of bad blocks
> that have to get restriped, and parity has to be regenerated.  Are
> these the cheap consumer SATA disk drives, or enterprise class disks?

I don't buy that for a second.  First of all, restriping parity can and
does occur in the background.  Secondly, how is it the system writes many
terabytes of data post file creation, then chokes on a 0 byte file?

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re:
  2009-04-05 0:12 ` RE: Lelsie Rhorer
@ 2009-04-05 0:38 ` Greg Freemyer
  2009-04-05 5:05   ` Lelsie Rhorer
  2009-04-05 0:45 ` Re: Roger Heflin
  1 sibling, 1 reply; 24+ messages in thread

From: Greg Freemyer @ 2009-04-05 0:38 UTC (permalink / raw)
To: lrhorer; +Cc: linux-raid

On Sat, Apr 4, 2009 at 8:12 PM, Lelsie Rhorer <lrhorer@satx.rr.com> wrote:
>> I would try to first run hardware diagnostics. Maybe you will get
>> "lucky" and one or more disks will fail diagnostics, which at least
>> means it will be easy to repair the problem.
>>
>> This could very well be a situation where you have a lot of bad blocks
>> that have to get restriped, and parity has to be regenerated.  Are
>> these the cheap consumer SATA disk drives, or enterprise class disks?
>
> I don't buy that for a second.  First of all, restriping parity can and
> does occur in the background.  Secondly, how is it the system writes
> many terabytes of data post file creation, then chokes on a 0 byte file?

Alternate theory - serious fsync performance issue.

I don't know if it's related, but there is a lot of recent discussion
related to fsync causing large delays in ext3.  Linus is saying his
high-speed SSD is seeing multisecond delays.  He is very frustrated
because the SSD should be more or less instantaneous.

The current thread is http://markmail.org/message/adiyz3lri6tlueaf

In one of the other threads I saw someone saying that in one test they
had a fsync() call take minutes to return.  Apparently no one yet fully
understands what is going on.  Seems like something that could in some
way be related to what you are seeing.

Greg
--
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com

^ permalink raw reply	[flat|nested] 24+ messages in thread
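A quick way to test this theory on the array in question is to time a tiny
synchronous write while the usual bulk transfers are running.  The sketch
below only assumes GNU dd; the file names and sizes are placeholders:

    # Put the array under a streaming write load in the background.
    dd if=/dev/zero of=/RAID/bigfile bs=1M count=4096 &

    # Time creation plus fsync of a tiny file; conv=fsync makes dd call
    # fsync() before exiting, so any stall in the flush path shows up in
    # the elapsed time.
    time dd if=/dev/zero of=/RAID/tinyfile bs=1k count=1 conv=fsync

    # Repeat on a non-array filesystem for comparison.
    time dd if=/dev/zero of=/home/tinyfile bs=1k count=1 conv=fsync

If the second command takes tens of seconds only while the background load
is present, the stall is in the write-out/fsync path rather than in file
creation as such.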
* RE:
  2009-04-05 0:38 ` Greg Freemyer
@ 2009-04-05 5:05 ` Lelsie Rhorer
  2009-04-05 11:42  ` Greg Freemyer
  0 siblings, 1 reply; 24+ messages in thread

From: Lelsie Rhorer @ 2009-04-05 5:05 UTC (permalink / raw)
To: linux-raid

> Alternate theory - serious fsync performance issue.
>
> I don't know if it's related, but there is a lot of recent discussion
> related to fsync causing large delays in ext3.  Linus is saying his
> high-speed SSD is seeing multisecond delays.  He is very frustrated
> because the SSD should be more or less instantaneous.
>
> The current thread is http://markmail.org/message/adiyz3lri6tlueaf
>
> In one of the other threads I saw someone saying that in one test they
> had a fsync() call take minutes to return.  Apparently no one yet
> fully understands what is going on.  Seems like something that could
> in some way be related to what you are seeing.

Well, it could be.  I tried flushing the caches numerous times while
testing, but I never could see it made a difference one way or the other.

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: 2009-04-05 5:05 ` Lelsie Rhorer @ 2009-04-05 11:42 ` Greg Freemyer 0 siblings, 0 replies; 24+ messages in thread From: Greg Freemyer @ 2009-04-05 11:42 UTC (permalink / raw) To: lrhorer; +Cc: linux-raid On Sun, Apr 5, 2009 at 1:05 AM, Lelsie Rhorer <lrhorer@satx.rr.com> wrote: >> Alternate theory - serious fsync performance issue >> >> I don't know if it's related, but there is a lot of recent discussion >> related to fsync causing large delays in ext3. Linus is saying his >> highspeed SDD is seeing multisecond delays. He is very frustrated >> because the SDD should be more or less instantaneous. >> >> The current thread is http://markmail.org/message/adiyz3lri6tlueaf >> >> In one of the other threads I saw someone saying that in one test they >> had a fsync() call take minutes to return. Apparently no one yet >> fully understands what is going on. Seems like something that could >> in some way be related to what you are seeing. > > Well, it could be. I tried flushing the cashes numerous times while > testing, but I never could see it made a difference one way or the other. > In a separate thread you said it was reiser and what I have seen discussed is ext3, so you may be safe from that bug. As to flushing caches, I don't think that is the same thing. This bug specifically impacts fsyncs on a small file while a heavy i/o load is underway via other processes. The elevators were being discussed as part of the problem and fsync triggers different elevator logic than sync or drop_caches does. Greg -- Greg Freemyer Head of EDD Tape Extraction and Processing team Litigation Triage Solutions Specialist http://www.linkedin.com/in/gregfreemyer First 99 Days Litigation White Paper - http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf The Norcross Group The Intersection of Evidence & Technology http://www.norcrossgroup.com -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2009-04-05 0:12 ` RE: Lelsie Rhorer 2009-04-05 0:38 ` Greg Freemyer @ 2009-04-05 0:45 ` Roger Heflin 2009-04-05 5:21 ` Lelsie Rhorer 1 sibling, 1 reply; 24+ messages in thread From: Roger Heflin @ 2009-04-05 0:45 UTC (permalink / raw) To: lrhorer; +Cc: linux-raid Lelsie Rhorer wrote: >> I would try to first run hardware diagnostics. Maybe you will get >> "lucky" and one or more disks will fail diagnostics, which at least >> means it will be easy to repair the problem. >> >> This could very well be situation where you have a lot of bad blocks >> that have to get restriped, and parity has to be regenerated. Are >> these the cheap consumer SATA disk drives, or enterprise class disks? > > > I don't buy that for a second. First of all, restriping parity can and does > occur in the background. Secondly, how is it the system writes many > terrabytes of data post file creation, then chokes on a 0 byte file? > You should note that the drive won't know a sector it just wrote is bad until it reads it....are you sure you actually successfully wrote all of that data and that it is still there? And it is not the writes that kill when you have a drive going bad, it is the reads of the bad sectors. And to create a file, a number of things will likely need to be read to finish the file creation, and if one of those is a bad sector things get ugly. ^ permalink raw reply [flat|nested] 24+ messages in thread
* RE:
  2009-04-05 0:45 ` Re: Roger Heflin
@ 2009-04-05 5:21 ` Lelsie Rhorer
  2009-04-05 5:33   ` RE: David Lethe
  0 siblings, 1 reply; 24+ messages in thread

From: Lelsie Rhorer @ 2009-04-05 5:21 UTC (permalink / raw)
To: linux-raid

> You should note that the drive won't know a sector it just wrote is
> bad until it reads it

Yes, but it also won't halt the write for 40 seconds because it was bad.
More to the point, there is no difference at the drive level between a
bad sector written for a 30GB file and a 30 byte file.

> ... are you sure you actually successfully wrote all of that data and
> that it is still there?

Pretty sure, yeah.  There are no errors in the filesystem, and every file
I have written works.  Again, however, the point is there is never a
problem once the file is created, no matter how long it takes to write it
out to disk.  The moment the file is created, however, there may be up to
a 2 minute delay in writing its data to the drive.

> And it is not the writes that kill when you have a drive going bad, it
> is the reads of the bad sectors.  And to create a file, a number of
> things will likely need to be read to finish the file creation, and if
> one of those is a bad sector things get ugly.

Well, I agree to some extent, except why would it be loosely related to
the volume of drive activity, and why is it 5 drives stop reading
altogether and 5 do not?  Furthermore, every single video file gets read,
re-written, edited, re-written again, and finally read again at least
once, sometimes several times, before being finally archived.  Why does
the kernel log never report any errors of any sort?

^ permalink raw reply	[flat|nested] 24+ messages in thread
* RE:
  2009-04-05 5:21 ` Lelsie Rhorer
@ 2009-04-05 5:33 ` David Lethe
  2009-04-05 8:14   ` RAID halting Lelsie Rhorer
  0 siblings, 1 reply; 24+ messages in thread

From: David Lethe @ 2009-04-05 5:33 UTC (permalink / raw)
To: lrhorer, linux-raid

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Lelsie Rhorer
> Sent: Sunday, April 05, 2009 12:21 AM
> To: linux-raid@vger.kernel.org
> Subject: RE:
>
> > You should note that the drive won't know a sector it just wrote is
> > bad until it reads it
>
> Yes, but it also won't halt the write for 40 seconds because it was bad.
> More to the point, there is no difference at the drive level between a
> bad sector written for a 30GB file and a 30 byte file.
>
> > ... are you sure you actually successfully wrote all of that data and
> > that it is still there?
>
> Pretty sure, yeah.  There are no errors in the filesystem, and every
> file I have written works.  Again, however, the point is there is never
> a problem once the file is created, no matter how long it takes to
> write it out to disk.  The moment the file is created, however, there
> may be up to a 2 minute delay in writing its data to the drive.
>
> > And it is not the writes that kill when you have a drive going bad, it
> > is the reads of the bad sectors.  And to create a file, a number of
> > things will likely need to be read to finish the file creation, and if
> > one of those is a bad sector things get ugly.
>
> Well, I agree to some extent, except why would it be loosely related to
> the volume of drive activity, and why is it 5 drives stop reading
> altogether and 5 do not?  Furthermore, every single video file gets
> read, re-written, edited, re-written again, and finally read again at
> least once, sometimes several times, before being finally archived.
> Why does the kernel log never report any errors of any sort?

All of what you report is still consistent with delays caused by having
to remap bad blocks.  The O/S will not report recovered errors, as this
gets done internally by the disk drive, and the O/S never learns about
it.  (Queue depth settings can account for some of the other "weirdness"
you reported.)

Really, if this was my system I would run non-destructive read tests on
all blocks, along with the embedded self-test on the disk.  It is often
a lot easier and more productive to eliminate what ISN'T the problem
rather than chase all of the potential reasons for the problem.

^ permalink raw reply	[flat|nested] 24+ messages in thread
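The checks described above can be run with smartmontools, badblocks, and
md's own scrub facility.  A sketch, assuming the ten members are /dev/sda
through /dev/sdj (adjust the device names to the actual setup; some
smartctl versions need '-d ata' or '-d sat' for SATA drives behind port
multipliers):

    # Kick off each drive's embedded (non-destructive) self-test, then
    # read back the results once the tests have finished.
    for d in /dev/sd[a-j]; do smartctl -t long "$d"; done
    smartctl -l selftest /dev/sda

    # SMART attributes most relevant to silent remapping.
    smartctl -A /dev/sda | egrep -i 'realloc|pending|uncorrect'

    # Read-only surface scan of one member; badblocks only writes if -w
    # or -n is given, so this does not touch the data.
    badblocks -sv /dev/sda

    # Have md itself read every block of every member and compare parity.
    echo check > /sys/block/md0/md/sync_action
    cat /proc/mdstat
    cat /sys/block/md0/md/mismatch_cnt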
* RE: RAID halting
  2009-04-05 5:33 ` RE: David Lethe
@ 2009-04-05 8:14 ` Lelsie Rhorer
  0 siblings, 0 replies; 24+ messages in thread

From: Lelsie Rhorer @ 2009-04-05 8:14 UTC (permalink / raw)
To: linux-raid

> All of what you report is still consistent with delays caused by having
> to remap bad blocks.

I disagree.  If it happened with some frequency during ordinary reads,
then I would agree.  If it happened without respect to the volume of
reads and writes on the system, then I would be less inclined to disagree.

> The O/S will not report recovered errors, as this gets done internally
> by the disk drive, and the O/S never learns about it.  (Queue depth

SMART is supposed to report this, and rarely the kernel log does report a
block of sectors being marked bad by the controller.  I cannot speak to
the notion that SMART's reporting of relocated sectors and failed
relocations may not be accurate, as I have no means to verify.  Actually,
I should amend the first sentence, because while the ten drives in the
array are almost never reporting any errors, there is another drive in
the chassis which is chunking out error reports like a farm boy spitting
out watermelon seeds.  I had a 320G drive in another system which was
behaving erratically, so I moved it to the array chassis on this machine
to eliminate it being a cable or the drive controller.  It's reporting
blocks being marked bad all over the place.

> Really, if this was my system I would run non-destructive read tests on
> all blocks,

How does one do this?  Or rather, isn't this what the monthly mdadm
resync does?

> along with the embedded self-test on the disk.  It is often

How does one do this?

> a lot easier and more productive to eliminate what ISN'T the problem
> rather than chase all of the potential reasons for the problem.

I agree, which is why I am asking for troubleshooting methods and
utilities.

The monthly RAID array resync started a few minutes ago, and it is
providing some interesting results.  The number of blocks read per second
is consistently 13,000 - 24,000 on all ten drives.  There were no other
drive accesses of any sort at the time, so the number of blocks written
was flat zero on all drives in the array.  I copied the /etc/hosts file
to the RAID array, and instantly the file system locked, but the array
resync *DID NOT*.  The number of blocks read and written per second
continued to range from 13,000 to 24,000 blocks/second, with no apparent
halt or slow-down at all, not even for one second.  So if it's a drive
error, why are file system reads halted almost completely, and writes
halted altogether, yet drive reads at the RAID array level continue
unabated at an aggregate of more than 130,000 - 240,000 blocks (500 - 940
megabits) per second?

I tried a second copy and again the file system accesses to the drives
halted altogether.  The block reads (which had been alternating with
writes after the new transfer processes were implemented) again jumped to
between 13,000 and 24,000.  This time I used a stopwatch, and the halt
was 18 minutes 21 seconds - I believe the longest ever.  There is
absolutely no way it would take a drive almost 20 minutes to mark a block
bad.  The dirty blocks grew to more than 78 Megabytes.  I just did a 3rd
cp of the /etc/hosts file to the array, and once again it locked the
machine for what is likely to be another 15 - 20 minutes.  I tried
forcing a sync, but it also hung.

<Sigh>  The next three days are going to be Hell, again.  It's going to
be all but impossible to edit a file until the RAID resync completes.
It's often really bad under ordinary loads, but when the resync is
underway, it's beyond absurd.

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: Strange filesystem slowness with 8TB RAID6 2009-04-02 4:16 (unknown), Lelsie Rhorer 2009-04-02 4:22 ` David Lethe @ 2009-04-02 4:38 ` NeilBrown 2009-04-04 7:12 ` RAID halting Lelsie Rhorer 2009-04-02 6:56 ` your mail Luca Berra ` (2 subsequent siblings) 4 siblings, 1 reply; 24+ messages in thread From: NeilBrown @ 2009-04-02 4:38 UTC (permalink / raw) To: lrhorer; +Cc: linux-raid On Thu, April 2, 2009 3:16 pm, Lelsie Rhorer wrote: > The issue is the entire array will occasionally pause completely for about > 40 seconds when a file is created. This does not always happen, but the > situation is easily reproducible. The frequency at which the symptom > occurs seems to be related to the transfer load on the array. If no other > transfers are in process, then the failure seems somewhat more rare, > perhaps accompanying less than 1 file creation in 10.. During heavy file > transfer activity, sometimes the system halts with every other file > creation. Although I have observed many dozens of these events, I have > never once observed it to happen except when a file creation occurs. > Reading and writing existing files never triggers the event, although any > read or write occurring during the event is halted for the duration. ... > How can I troubleshoot and more importantly resolve this issue? This sounds like a filesys problem rather than a RAID problem. One thing that can cause this sort of behaviour is if the filesystem is in the middle of a sync and has to complete it before the create can complete, and the sync is writing out many megabytes of data. You can see if this is happening by running watch 'grep Dirty /proc/meminfo' if that is large when the hang starts, and drops down to zero, and the hang lets go when it hits (close to) zero, then this is the problem. The answer then is to keep that number low by writing a suitable number into /proc/sys/vm/dirty_ratio (a percentage of system RAM) or /proc/sys/vm/dirty_bytes If that doesn't turn out to be the problem, then knowing how the "Dirty" count is behaving might still be useful, and I would probably look at what processes are in 'D' state, (ps axgu) and look at their stack (/proc/$PID/stack).. You didn't say what kernel you are running. It might make a difference. NeilBrown ^ permalink raw reply [flat|nested] 24+ messages in thread
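To make the suggestions above concrete, something like the following can
be used; the sysctl values are only illustrative starting points,
/proc/sys/vm/dirty_bytes only exists on kernels newer than the 2.6.26
mentioned later in the thread, and /proc/<pid>/stack requires a kernel
built with stack-trace support:

    # Watch how much dirty data is queued while triggering a file creation.
    watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'

    # If the hang tracks a large Dirty figure draining to zero, cap it
    # (percentages of RAM).
    echo 5 > /proc/sys/vm/dirty_ratio
    echo 2 > /proc/sys/vm/dirty_background_ratio

    # On kernels that support it, an absolute limit can be set instead:
    # echo 268435456 > /proc/sys/vm/dirty_bytes

    # During a hang, list processes stuck in D state and dump their
    # kernel stacks.
    ps -eo pid,stat,wchan:30,cmd | awk '$2 ~ /D/'
    for p in $(ps -eo pid,stat | awk '$2 ~ /D/ {print $1}'); do
        echo "== $p =="; cat /proc/$p/stack
    done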
* RE: RAID halting 2009-04-02 4:38 ` Strange filesystem slowness with 8TB RAID6 NeilBrown @ 2009-04-04 7:12 ` Lelsie Rhorer 2009-04-04 12:38 ` Roger Heflin 0 siblings, 1 reply; 24+ messages in thread From: Lelsie Rhorer @ 2009-04-04 7:12 UTC (permalink / raw) To: 'Linux RAID' >This sounds like a filesys problem rather than a RAID problem. I considered that. It may well be. >One thing that can cause this sort of behaviour is if the filesystem is in >the middle of a sync and has to complete it before the create can >complete, and the sync is writing out many megabytes of data. For between 40 seconds and 2 minutes? The drive subsystem can easily gulp down 200 megabytes to 6000 megabytes in that period of time. What synch would be that large? Secondly, the problem occurs also when there is relatively little or no data being written to the array. Finally, unless I am misunderstanding at what layers iostat and atop are reporting the traffic, the fact all drive writes invariably fall to dead zero during an event and reads on precisely half the drives (and always the same drives) drop to dead zero suggests to me this is not the case. >You can see if this is happening by running > watch 'grep Dirty /proc/meminfo' >if that is large when the hang starts, and drops down to zero, and the >hang lets go when it hits (close to) zero, then this is the problem. Thanks, I'll give it a try later today. Right now I am dead tired, plus there are some batches running I really don't want interrupted, and triggering an event might halt them. >The answer then is to keep that number low by writing a suitable >number into > /proc/sys/vm/dirty_ratio (a percentage of system RAM) >or > /proc/sys/vm/dirty_bytes Um, OK. What would constitute suitable numbers, assuming it turns out to be the issue? >If that doesn't turn out to be the problem, then knowing how the >"Dirty" count is behaving might still be useful, and I would probably >look at what processes are in 'D' state, (ps axgu) and look at their >stack (/proc/$PID/stack).. I'll surely try that, too. >You didn't say what kernel you are running. It might make a difference. >NeilBrown Oh, sorry! 2.6.26-1-amd64 4G of RAM, with typically 600 - 800M in use. The swap space is 4.7G, but the used swap space has never exceeded 200K. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: RAID halting 2009-04-04 7:12 ` RAID halting Lelsie Rhorer @ 2009-04-04 12:38 ` Roger Heflin 0 siblings, 0 replies; 24+ messages in thread From: Roger Heflin @ 2009-04-04 12:38 UTC (permalink / raw) To: lrhorer; +Cc: 'Linux RAID' Lelsie Rhorer wrote: >> This sounds like a filesys problem rather than a RAID problem. > > I considered that. It may well be. > >> One thing that can cause this sort of behaviour is if the filesystem is in >> the middle of a sync and has to complete it before the create can >> complete, and the sync is writing out many megabytes of data. > > For between 40 seconds and 2 minutes? The drive subsystem can easily gulp > down 200 megabytes to 6000 megabytes in that period of time. What synch > would be that large? Secondly, the problem occurs also when there is > relatively little or no data being written to the array. Finally, unless I > am misunderstanding at what layers iostat and atop are reporting the > traffic, the fact all drive writes invariably fall to dead zero during an > event and reads on precisely half the drives (and always the same drives) > drop to dead zero suggests to me this is not the case. > > If you have things setup such that you have lights on the disks, and can check the lights when the event is happening, often if a single drive is being slow it will be the only one with its lights on when things are going bad. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: your mail
  2009-04-02 4:16 (unknown), Lelsie Rhorer
  2009-04-02 4:22 ` David Lethe
  2009-04-02 4:38 ` Strange filesystem slowness with 8TB RAID6 NeilBrown
@ 2009-04-02 6:56 ` Luca Berra
  2009-04-04 6:44   ` RAID halting Lelsie Rhorer
  2009-04-02 7:33 ` Peter Grandi
  2009-04-02 13:35 ` Andrew Burgess
  4 siblings, 1 reply; 24+ messages in thread

From: Luca Berra @ 2009-04-02 6:56 UTC (permalink / raw)
To: linux-raid

On Wed, Apr 01, 2009 at 11:16:06PM -0500, Lelsie Rhorer wrote:
> 8T of active space is formatted as a single Reiserfs file system.  The
....
> The issue is the entire array will occasionally pause completely for about
> 40 seconds when a file is created.  This does not always happen, but the
> situation is easily reproducible.  The frequency at which the symptom

I wonder how costly b-tree operations are for an 8TB filesystem...

L.

--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \

^ permalink raw reply	[flat|nested] 24+ messages in thread
* RE: RAID halting
  2009-04-02 6:56 ` your mail Luca Berra
@ 2009-04-04 6:44 ` Lelsie Rhorer
  0 siblings, 0 replies; 24+ messages in thread

From: Lelsie Rhorer @ 2009-04-04 6:44 UTC (permalink / raw)
To: 'Linux RAID'

>> The issue is the entire array will occasionally pause completely for about
>> 40 seconds when a file is created.  This does not always happen, but the
>> situation is easily reproducible.  The frequency at which the symptom

> I wonder how costly b-tree operations are for an 8TB filesystem...

I expect somewhat costly, but a 40 second to 2 minute halt just to create
a file of between 0 and 1000 bytes is ridiculous.  That, and as I said,
it doesn't always happen, by a long shot, not even when transfers in the
400 Mbps range are underway.  OTOH, I have had it happen when only a few
Mbps transfers were underway.

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re:
  2009-04-02 4:16 (unknown), Lelsie Rhorer
                  ` (2 preceding siblings ...)
  2009-04-02 6:56 ` your mail Luca Berra
@ 2009-04-02 7:33 ` Peter Grandi
  2009-04-02 23:01  ` RAID halting Lelsie Rhorer
  2009-04-02 13:35 ` Andrew Burgess
  4 siblings, 1 reply; 24+ messages in thread

From: Peter Grandi @ 2009-04-02 7:33 UTC (permalink / raw)
To: Linux RAID

> The issue is the entire array will occasionally pause completely
> for about 40 seconds when a file is created. [ ... ] During heavy
> file transfer activity, sometimes the system halts with every
> other file creation. [ ... ] There are other drives formatted
> with other file systems on the machine, but the issue has never
> been seen on any of the other drives. When the array runs its
> regularly scheduled health check, the problem is much worse. [ ... ]

Looks like either you have hw issues (transfer errors, bad blocks) or,
more likely, the cache flusher and elevator settings have not been tuned
for a steady flow.

> How can I troubleshoot and more importantly resolve this issue?

Well, troubleshooting would require a good understanding of file system
design and storage subsystem design, and quite a bit of time.

However, for hardware errors check the kernel logs, and for cache
flusher and elevator settings check the 'bi'/'bo' numbers of 'vmstat 1'
while the pause happens.

For a deeper profile of per-drive IO run 'watch iostat 1 2' while this
is happening.  This might also help indicate drive errors (no IO is
happening) or flusher/elevator tuning issues (lots of IO is happening
suddenly).

^ permalink raw reply	[flat|nested] 24+ messages in thread
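One convenient way to apply that advice is to leave both tools logging to
files so the samples around a stall can be examined afterwards; the paths
and intervals below are arbitrary:

    # Per-drive extended statistics (including %util), timestamped.
    iostat -t -x -k 1 > /tmp/iostat.log &

    # System-wide bi/bo once a second.
    vmstat 1 > /tmp/vmstat.log &

    # Reproduce the hang, e.g. by creating a small file on the array.
    touch /RAID/newfile

    # Once the array unfreezes, stop the loggers and inspect the samples
    # taken around the stall.
    kill %1 %2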
* RE: RAID halting
  2009-04-02 7:33 ` Peter Grandi
@ 2009-04-02 23:01 ` Lelsie Rhorer
  0 siblings, 0 replies; 24+ messages in thread

From: Lelsie Rhorer @ 2009-04-02 23:01 UTC (permalink / raw)
To: 'Linux RAID'

>> The issue is the entire array will occasionally pause completely
>> for about 40 seconds when a file is created. [ ... ] During heavy
>> file transfer activity, sometimes the system halts with every
>> other file creation. [ ... ] There are other drives formatted
>> with other file systems on the machine, but the issue has never
>> been seen on any of the other drives. When the array runs its
>> regularly scheduled health check, the problem is much worse. [ ... ]

> Looks like either you have hw issues (transfer errors, bad blocks) or,
> more likely, the cache flusher and elevator settings have not been
> tuned for a steady flow.

That doesn't sound right.  I can read and write all day long at up to
450 Mbps in both directions continuously for hours at a time.  It's only
when a file is created, even a file of only a few bytes, that the issue
occurs, and then not always.  Indeed, earlier today I had transfers going
with an average throughput of more than 300 Mbps total and despite
creating more than 20 new files, not once did the transfers halt.

>> How can I troubleshoot and more importantly resolve this issue?

> Well, troubleshooting would require a good understanding of file
> system design and storage subsystem design, and quite a bit of time.
> However, for hardware errors check the kernel logs, and for cache
> flusher and elevator settings check the 'bi'/'bo' numbers of
> 'vmstat 1' while the pause happens.

I've already done that.  There are no errors of any sort in the kernel
log.  Vmstat only tells me both bi and bo are zero, which we already
knew.  I've tried ps, iostat, vmstat, and top, and nothing provides
anything of any significance I can see except that reiserfs is waiting
on md, which we already knew, and (as I recall - it's been a couple of
weeks) the number of bytes in and out of md0 falls to zero.

> For a deeper profile of per-drive IO run 'watch iostat 1 2' while
> this is happening.  This might also help indicate drive errors (no
> IO is happening) or flusher/elevator tuning issues (lots of IO is
> happening suddenly).

I'll give it a try.  I haven't been able to reproduce the issue today.
Usually it's pretty easy.

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: 2009-04-02 4:16 (unknown), Lelsie Rhorer ` (3 preceding siblings ...) 2009-04-02 7:33 ` Peter Grandi @ 2009-04-02 13:35 ` Andrew Burgess 2009-04-04 5:57 ` RAID halting Lelsie Rhorer 4 siblings, 1 reply; 24+ messages in thread From: Andrew Burgess @ 2009-04-02 13:35 UTC (permalink / raw) To: lrhorer; +Cc: linux-raid On Wed, 2009-04-01 at 23:16 -0500, Lelsie Rhorer wrote: > The issue is the entire array will occasionally pause completely for about > 40 seconds when a file is created. I had symptoms like this once. It turned out to be a defective disk. The disk would never return a read or write error but just intermittently took a really long time to respond. I found it by running atop. All the other drives would be running at low utilization and this one drive would be at 100% when the symptoms occurred (which in atop gets colored red so it jumps out at you) ^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: RAID halting 2009-04-02 13:35 ` Andrew Burgess @ 2009-04-04 5:57 ` Lelsie Rhorer 2009-04-04 13:01 ` Andrew Burgess 0 siblings, 1 reply; 24+ messages in thread From: Lelsie Rhorer @ 2009-04-04 5:57 UTC (permalink / raw) To: 'Linux RAID' >> The issue is the entire array will occasionally pause completely for about >> 40 seconds when a file is created. >I had symptoms like this once. It turned out to be a defective disk. The >disk would never return a read or write error but just intermittently >took a really long time to respond. >I found it by running atop. All the other drives would be running at low >utilization and this one drive would be at 100% when the symptoms >occurred (which in atop gets colored red so it jumps out at you) Thanks. I gave this a try, but not being at all familiar with atop, I'm not sure what, if anything, the results mean in terms of any additional diagnostic data. Depending somewhat upon the I/O load on the RAID array, atop sometimes reports the drive utilization on several or all of the drives to be well in excess of 85% - occasionally even 99%, but never flat 100% at any time. Oddly, even under relatively light loads of 20 or 30 Mbps, sometimes the RAID members would show utilization in the high 90s, usually on all the drives on a multiplier channel. I don't know if this is ordinary behavior for atop, but all the drives also periodically disappear from the status display. Additionally, while atop is running and I am using my usual video editor, Video Redo, on a Windows workstation to stream video from the server, every time atop updates, the video and audio skip when reading from a drive not on the RAID array. I did not notice the same behavior from the RAID array. Odd. Anyway, on to the diagnostics. I ran both `atop` and `watch iostat 1 2` concurrently and triggered several events while under heavy load ( >450 Mbps, total ). In atop, drives sdb, sdd, sde, sdg, and sdi consistently disappeared from atop entirely, and writes for the other drives fell to dead zero. Reads fell to a very small number. The iostat session returned information in agreement with atop: both reads and writes for sdb, sdd, sde, sdg, sdi, and md0 all fell to dead zero from nominal values frequently exceeding 20,000 reads / sec and 5000 writes / sec. Meanwhile, writes to sda, sdc, sdf, sdh, and sdj also dropped to dead zero, but reads only fell to between 230 and 256 reads/sec. ^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: RAID halting
  2009-04-04 5:57 ` RAID halting Lelsie Rhorer
@ 2009-04-04 13:01 ` Andrew Burgess
  2009-04-04 14:39  ` Lelsie Rhorer
  0 siblings, 1 reply; 24+ messages in thread

From: Andrew Burgess @ 2009-04-04 13:01 UTC (permalink / raw)
To: lrhorer; +Cc: 'Linux RAID'

On Sat, 2009-04-04 at 00:57 -0500, Lelsie Rhorer wrote:
> >> The issue is the entire array will occasionally pause completely for
> >> about 40 seconds when a file is created.
>
> > I had symptoms like this once. It turned out to be a defective disk.
> > The disk would never return a read or write error but just
> > intermittently took a really long time to respond.
>
> > I found it by running atop. All the other drives would be running at
> > low utilization and this one drive would be at 100% when the symptoms
> > occurred (which in atop gets colored red so it jumps out at you)
>
> Thanks.  I gave this a try, but not being at all familiar with atop,
> I'm not sure what, if anything, the results mean in terms of any
> additional diagnostic data.

It's the same info as iostat, just in color.

> Depending somewhat upon the I/O load on the RAID array, atop sometimes
> reports the drive utilization on several or all of the drives to be
> well in excess of 85% - occasionally even 99%, but never flat 100% at
> any time.

High 90's is what I meant by 100% :-)

> Oddly, even under relatively light loads of 20 or 30 Mbps, sometimes
> the RAID members would show utilization in the high 90s, usually on
> all the drives on a multiplier channel.

I think that's the filesystem buffering and then writing all at once.
It's normal if it's periodic; they go briefly to ~100% and then back to
~0%?

Did you watch the atop display when the problem occurred?

> I don't know if this is ordinary behavior for atop, but all the drives
> also periodically disappear from the status display.

That's a config option (and I find the default annoying).  atop also
sorts the drives by utilization every second, which can be a little
hard to watch.  But if you have the problem I had then that one drive
stays at the top of the list when the problem occurs.

> Additionally, while atop is running and I am using my usual video
> editor, Video Redo, on a Windows workstation to stream video from the
> server, every time atop updates, the video and audio skip when reading
> from a drive not on the RAID array.  I did not notice the same
> behavior from the RAID array.  Odd.

I think this is heavy /proc filesystem access, which I have noticed can
screw up even realtime processes.

> Anyway, on to the diagnostics.
>
> I ran both `atop` and `watch iostat 1 2` concurrently and triggered
> several events while under heavy load ( >450 Mbps, total ).  In atop,
> drives sdb, sdd, sde, sdg, and sdi consistently disappeared from atop
> entirely, and writes for the other drives fell to dead zero.  Reads
> fell to a very small number.  The iostat session returned information
> in agreement with atop: both reads and writes for sdb, sdd, sde, sdg,
> sdi, and md0 all fell to dead zero from nominal values frequently
> exceeding 20,000 reads / sec and 5000 writes / sec.  Meanwhile, writes
> to sda, sdc, sdf, sdh, and sdj also dropped to dead zero, but reads
> only fell to between 230 and 256 reads/sec.

I used:

iostat -t -k -x 1 | egrep -v 'sd.[0-9]'

to get percent utilization and not show each partition but just whole
drives.

For atop you want the -f option to 'fixate' the number of lines so
drives with zero utilization don't disappear.

If you didn't get constant 100% utilization while the event occurred
then I guess you don't have the problem I had.

Does the sata multiplier have its own driver and if so, is it the
latest?  Any other complaints on the net about it?  I would think a
problem there would show up as 100% utilization though...

And I think you already said the cpu usage is low when the event occurs?
No one core at near 100%?  (atop would show this too...)

^ permalink raw reply	[flat|nested] 24+ messages in thread
* RE: RAID halting 2009-04-04 13:01 ` Andrew Burgess @ 2009-04-04 14:39 ` Lelsie Rhorer 2009-04-04 15:04 ` Andrew Burgess 0 siblings, 1 reply; 24+ messages in thread From: Lelsie Rhorer @ 2009-04-04 14:39 UTC (permalink / raw) To: 'Linux RAID' > I think that's the filesystem buffering and then writing all at once. > It's normal if it's periodic; they go briefly to ~100% and then back to > ~0%? Yes. > > I don't know if this is ordinary > > behavior for atop, but all the drives also periodically disappear from > the > > status display. > > That's a config option (and I find the default annoying). Yeah, me, too. > sorts the drives by utilization every second which can be a little hard > to watch. But if you have the problem I had then that one drive stays at > the top of the list when the problem occurs. No. > I used: > > iostat -t -k -x 1 | egrep -v 'sd.[0-9]' > > to get percent utilization and not show each partition but just whole > drives. Since there are no partitions, it shouldn't make a difference. > For atop you want the -f option to 'fixate' the number of lines so > drives with zero utilization don't disappear. Well, diagnostically, I think the situation is clear. All ten drives stop writing completely. Five of the ten stop reading, and the other five slow their reads to a dribble - always the same five drives. > Does the sata multiplier have it's own driver and if so, is it the > latest? Any other complaints on the net about it? I would think a > problem there would show up as 100% utilization though... Multipliers - three of them, and no, they require no driver. The SI controller's drivers are included in the distro. > And I think you already said the cpu usage is low when the event occurs? > No one core at near 100%? (atop would show this too...) Nowhere near. Typically both cores are running below 25%, depending upon what processes are running, of course. I have the Gnome system monitor up, and the graphs don't spike when the event occurs. Of course, if there is a local drive access process which uses lots of CPU horsepower, such as ffmpeg, then when the array halt occurs, the CPU utilization falls right off. ^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: RAID halting
  2009-04-04 14:39 ` Lelsie Rhorer
@ 2009-04-04 15:04 ` Andrew Burgess
  2009-04-04 15:15  ` Lelsie Rhorer
  0 siblings, 1 reply; 24+ messages in thread

From: Andrew Burgess @ 2009-04-04 15:04 UTC (permalink / raw)
To: lrhorer; +Cc: 'Linux RAID'

On Sat, 2009-04-04 at 09:39 -0500, Lelsie Rhorer wrote:
> Well, diagnostically, I think the situation is clear.  All ten drives
> stop writing completely.  Five of the ten stop reading, and the other
> five slow their reads to a dribble - always the same five drives.

So the delay seems to be hiding in the kernel, else the userspace tools
would see it (they see some kernel stuff too, like utilization).

Oprofile is supposed to be good for user and kernel profiling but I
don't know if it can find non-cpu bound stuff.  There are also a bunch
of latency analysis tools in the kernel that were used for realtime
tuning; they might show where something is getting stuck.  Andrew Morton
did a lot of work in this area.

If the cpu was spinning somewhere it would show as system time, so it
must be waiting for a timer or some other event (wild guessing).  It's
as if the i/o completion never arrives but some timer eventually goes
off and maybe the i/o is retried and everything gets back on track?  But
that should cause utilization to go up, and you'd think some sort of
message...

Perhaps the ide list would know of some diagnostic knobs to tweak.

It's a puzzler...

One last thing: the cpu goes toward 100% idle, not wait?

^ permalink raw reply	[flat|nested] 24+ messages in thread
* RE: RAID halting 2009-04-04 15:04 ` Andrew Burgess @ 2009-04-04 15:15 ` Lelsie Rhorer 2009-04-04 16:39 ` Andrew Burgess 0 siblings, 1 reply; 24+ messages in thread From: Lelsie Rhorer @ 2009-04-04 15:15 UTC (permalink / raw) To: 'Linux RAID' > Oprofile is supposed to be good for user and kernel profiling but I > don't know if it can find non-cpu bound stuff. There are also a bunch of > latency analysis tools in the kernel that were used for realtime tuning, > they might show where something is getting stuck. Andrew Morton did alot > of work in this area. Do you know if he subscribes to this list? If not, how may I reach him? > If the cpu was spinning somewhere it would show as system time so it > must be waiting for a timer or some other event (wild guessing). It's as > if the i/o completion never arrives but some timer eventually goes off > and maybe the i/o is retried and everything gets back on track? But that > should cause utilization to go up and you'd think some sort of > message... It's also puzzling why the issue is so much worse when the array diagnostic is running. Almost every file creation triggers a halt, and the halt time extends to 2 minutes. This is one thing which makes me tend to think it is an interaction between the file system and the RAID system, rather than either one alone. > Perhaps the ide list would know of some diagnostic knobs to tweak. > One last thing, the cpu goes toward 100% idle not wait? Neither one. If the drive access is tied to a local CPU intensive user application, for example ffmpeg, then of course CPU utilization dips, but if all the drive accesses are via network utilities (ftp, SAMBA, etc), I haven't noticed any change in CPU utilization. Simultaneous reads and writes to other local drives or network drives continue without a hiccough. ^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: RAID halting 2009-04-04 15:15 ` Lelsie Rhorer @ 2009-04-04 16:39 ` Andrew Burgess 0 siblings, 0 replies; 24+ messages in thread From: Andrew Burgess @ 2009-04-04 16:39 UTC (permalink / raw) To: lrhorer; +Cc: 'Linux RAID' On Sat, 2009-04-04 at 10:15 -0500, Lelsie Rhorer wrote: > > Oprofile is supposed to be good for user and kernel profiling but I > > don't know if it can find non-cpu bound stuff. There are also a bunch of > > latency analysis tools in the kernel that were used for realtime tuning, > > they might show where something is getting stuck. Andrew Morton did alot > > of work in this area. > > Do you know if he subscribes to this list? If not, how may I reach him? He's on the kernel list which I seldom read nowadays. His email used to be akpm@something I think. He worked on ftrace, documented in the kernel source in /usr/src/linux/Documentation/ftrace.txt An excerpt: "Ftrace is an internal tracer designed to help out developers and designers of systems to find what is going on inside the kernel. It can be used for debugging or analyzing latencies and performance issues that take place outside of user-space. Although ftrace is the function tracer, it also includes an infrastructure that allows for other types of tracing. Some of the tracers that are currently in ftrace include a tracer to trace context switches, the time it takes for a high priority task to run after it was woken up, the time interrupts are disabled, and more (ftrace allows for tracer plugins, which means that the list of tracers can always grow)." ^ permalink raw reply [flat|nested] 24+ messages in thread
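On kernels new enough to include ftrace (it was merged after the 2.6.26
kernel this system is running), the latency tracers described above are
driven through debugfs; a rough sketch, assuming the default mount point
and tracer names:

    # Mount debugfs if it is not already mounted.
    mount -t debugfs none /sys/kernel/debug
    cd /sys/kernel/debug/tracing

    cat available_tracers            # e.g. wakeup irqsoff sched_switch nop

    # Trace worst-case scheduler wakeup latency while reproducing the hang.
    echo wakeup > current_tracer
    echo 1 > tracing_on              # very old versions use tracing_enabled
    touch /RAID/newfile              # trigger the stall
    echo 0 > tracing_on
    head -100 trace                  # shows the latency and the code path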