* performance problems with raid10,f2
From: Keld Jørn Simonsen @ 2008-03-14 23:11 UTC
To: linux-raid

Hi,

I have a 4-drive array with 1 TB Hitachi disks, formatted as raid10,f2.
I have made some strange observations:

1. While resyncing I could get the RAID to give me about 320 MB/s in
   sequential read, which was good. After the resync had finished, with
   all 4 drives active, I only get 115 MB/s.

2. While resyncing, the resync I/O rate on each disk is about 27 MB/s,
   while the total rate of each disk is about 82 MB/s. Why is this?

best regards
keld
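For reference, a sequential-read figure like the ones above can be obtained with a plain dd run against the array; the device name /dev/md0 and the transfer size below are assumptions, not details taken from the report. Running the same test during and after a resync should show whether the difference lies in the array itself or in the file system above it.

    # Drop the page cache first so the result reflects the disks, not RAM
    # (requires root; sync first so no dirty data is in flight).
    sync
    echo 3 > /proc/sys/vm/drop_caches

    # Read 4 GiB sequentially off the array; dd prints the achieved rate.
    dd if=/dev/md0 of=/dev/null bs=1M count=4096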
* Re: performance problems with raid10,f2
From: Keld Jørn Simonsen @ 2008-03-20 17:28 UTC
To: linux-raid

On Sat, Mar 15, 2008 at 12:11:51AM +0100, Keld Jørn Simonsen wrote:
> Hi,
>
> I have a 4-drive array with 1 TB Hitachi disks, formatted as raid10,f2.
> I have made some strange observations:
>
> 1. While resyncing I could get the RAID to give me about 320 MB/s in
>    sequential read, which was good. After the resync had finished, with
>    all 4 drives active, I only get 115 MB/s.

This was reproducible. I don't know what could be wrong.

I tried to enlarge my read-ahead, but the system did not allow me more
than a 2 MiB read-ahead - that should be OK for a 4-disk array with
256 kiB chunks, shouldn't it?

I also tried chunks of 64 kiB - but no luck.

It seems like it is something that the resync process builds up.
What could it be?

Best regards
keld
* Re: performance problems with raid10,f2
From: Neil Brown @ 2008-03-25 5:13 UTC
To: Keld Jørn Simonsen
Cc: linux-raid

On Thursday March 20, keld@dkuug.dk wrote:
> On Sat, Mar 15, 2008 at 12:11:51AM +0100, Keld Jørn Simonsen wrote:
> > Hi,
> >
> > I have a 4-drive array with 1 TB Hitachi disks, formatted as raid10,f2.
> > I have made some strange observations:
> >
> > 1. While resyncing I could get the RAID to give me about 320 MB/s in
> >    sequential read, which was good. After the resync had finished, with
> >    all 4 drives active, I only get 115 MB/s.
>
> This was reproducible. I don't know what could be wrong.

Is this with, or without, your patch to avoid "read-balancing" for
raid10/far layouts? It sounds like it is without that patch?

NeilBrown

> I tried to enlarge my read-ahead, but the system did not allow me more
> than a 2 MiB read-ahead - that should be OK for a 4-disk array with
> 256 kiB chunks, shouldn't it?
>
> I also tried chunks of 64 kiB - but no luck.
>
> It seems like it is something that the resync process builds up.
> What could it be?
>
> Best regards
> keld
* Re: performance problems with raid10,f2
From: Keld Jørn Simonsen @ 2008-03-25 10:36 UTC
To: Neil Brown
Cc: linux-raid

On Tue, Mar 25, 2008 at 04:13:28PM +1100, Neil Brown wrote:
> On Thursday March 20, keld@dkuug.dk wrote:
> > On Sat, Mar 15, 2008 at 12:11:51AM +0100, Keld Jørn Simonsen wrote:
> > > Hi,
> > >
> > > I have a 4-drive array with 1 TB Hitachi disks, formatted as raid10,f2.
> > > I have made some strange observations:
> > >
> > > 1. While resyncing I could get the RAID to give me about 320 MB/s in
> > >    sequential read, which was good. After the resync had finished,
> > >    with all 4 drives active, I only get 115 MB/s.
> >
> > This was reproducible. I don't know what could be wrong.
>
> Is this with, or without, your patch to avoid "read-balancing" for
> raid10/far layouts? It sounds like it is without that patch?
>
> NeilBrown

I tried it both without the patch and with the patch, with almost the
same result.

Is resync building some table, and could that be it? Or could it be
some kind of inode traffic?

best regards
keld
* Re: performance problems with raid10,f2
From: Peter Grandi @ 2008-03-25 13:22 UTC
To: Linux RAID

>>>> I have a 4-drive array with 1 TB Hitachi disks, formatted as
>>>> raid10,f2. I have made some strange observations: 1. While
>>>> resyncing I could get the RAID to give me about 320 MB/s in
>>>> sequential read, which was good. After the resync had finished,
>>>> with all 4 drives active, I only get 115 MB/s.

[ ... ]

>> Is this with, or without, your patch to avoid "read-balancing"
>> for raid10/far layouts? It sounds like it is without that patch?

> I tried it both without the patch and with the patch, with almost
> the same result.

That could be the usual issue with apparent pauses in the stream of
I/O requests to the array component devices, with the usual workaround
of trying 'blockdev --setra 65536 /dev/mdN' and seeing if sequential
reads improve.

> Is resync building some table, and could that be it? Or could it be
> some kind of inode traffic?

One good way to see what is actually happening is to use either
'watch iostat -k 1 2' and look at the load on the individual MD array
component devices, or 'sysctl vm/block_dump=1' and look at the
addresses being read or written.
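Spelled out as commands, the workaround and the two monitoring approaches look roughly like the sketch below; the device name /dev/md0 is a placeholder for whatever the array is called on the system in question.

    # Raise the array's read-ahead to 65536 sectors (32 MiB) and verify it.
    blockdev --setra 65536 /dev/md0
    blockdev --getra /dev/md0

    # Watch per-device throughput on the array members while reading.
    watch 'iostat -k 1 2'

    # Or log every block read/write with its sector address to the kernel log.
    sysctl -w vm.block_dump=1
    # ... run the workload, inspect dmesg, then switch the logging off again:
    sysctl -w vm.block_dump=0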
* Re: performance problems with raid10,f2
From: Keld Jørn Simonsen @ 2008-04-02 21:13 UTC
To: Peter Grandi
Cc: Linux RAID

On Tue, Mar 25, 2008 at 01:22:03PM +0000, Peter Grandi wrote:
> >>>> I have a 4-drive array with 1 TB Hitachi disks, formatted as
> >>>> raid10,f2. I have made some strange observations: 1. While
> >>>> resyncing I could get the RAID to give me about 320 MB/s in
> >>>> sequential read, which was good. After the resync had finished,
> >>>> with all 4 drives active, I only get 115 MB/s.
>
> [ ... ]
>
> >> Is this with, or without, your patch to avoid "read-balancing"
> >> for raid10/far layouts? It sounds like it is without that patch?
>
> > I tried it both without the patch and with the patch, with almost
> > the same result.
>
> That could be the usual issue with apparent pauses in the stream of
> I/O requests to the array component devices, with the usual
> workaround of trying 'blockdev --setra 65536 /dev/mdN' and seeing if
> sequential reads improve.

Yes, that did it!

> > Is resync building some table, and could that be it? Or could it
> > be some kind of inode traffic?
>
> One good way to see what is actually happening is to use either
> 'watch iostat -k 1 2' and look at the load on the individual MD
> array component devices, or 'sysctl vm/block_dump=1' and look at the
> addresses being read or written.

Good advice. I added your info to the wiki.

best regards
keld
* Re: performance problems with raid10,f2
From: Peter Grandi @ 2008-04-03 20:20 UTC
To: Linux RAID

>>> On Wed, 2 Apr 2008 23:13:15 +0200, Keld Jørn Simonsen
>>> <keld@dkuug.dk> said:

[ ... slow RAID reading ... ]

>> That could be the usual issue with apparent pauses in the stream of
>> I/O requests to the array component devices, with the usual
>> workaround of trying 'blockdev --setra 65536 /dev/mdN' and seeing
>> if sequential reads improve.

keld> Yes, that did it!

But that is, as usual, very wrong. Such a large read-ahead has negative
consequences, and is most likely the result of both some terrible
misdesign in the Linux block I/O subsystem (from some further
experiments it is most likely related to "plugging") and of the way MD
is integrated into it.

However, I have found that on relatively fast machines (I think) much
lower values of read-ahead still give reasonable speed, with some
values being much better than others. For example, with another RAID10
I get pretty decent speed with a read-ahead of 128 on '/dev/md0' (but
much worse with, say, 64 or 256). On others a read-ahead of 1000
sectors is good.

The read-ahead needed also depends a bit on the file system type, so
don't trust tests done on the block device itself.

So please experiment a bit to try to reduce it, at least until I find
the time to figure out the (surely embarrassing) reason why it is
needed and how to avoid it, or the Linux block I/O and MD maintainers
confess (they almost surely already know why) and/or fix it.
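One way to run that experiment is a small sweep over read-ahead values, timing a sequential read through the file system at each setting. In the sketch below the array device, mount point and test file are assumptions; any large file on the array would do, and values are in 512-byte sectors.

    # Time a sequential read at several read-ahead settings.
    for ra in 64 128 256 512 1024 4096 16384 65536; do
        blockdev --setra $ra /dev/md0
        sync
        echo 3 > /proc/sys/vm/drop_caches
        echo -n "read-ahead $ra sectors: "
        # dd reports its rate on stderr; keep only the summary line.
        dd if=/mnt/raid/bigfile of=/dev/null bs=1M 2>&1 | tail -1
    done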
* Re: performance problems with raid10,f2
From: Keld Jørn Simonsen @ 2008-04-04 8:03 UTC
To: Peter Grandi
Cc: Linux RAID

On Thu, Apr 03, 2008 at 09:20:37PM +0100, Peter Grandi wrote:
> >>> On Wed, 2 Apr 2008 23:13:15 +0200, Keld Jørn Simonsen
> >>> <keld@dkuug.dk> said:
>
> [ ... slow RAID reading ... ]
>
> >> That could be the usual issue with apparent pauses in the stream
> >> of I/O requests to the array component devices, with the usual
> >> workaround of trying 'blockdev --setra 65536 /dev/mdN' and seeing
> >> if sequential reads improve.
>
> keld> Yes, that did it!
>
> But that is, as usual, very wrong. Such a large read-ahead has
> negative consequences, and is most likely the result of both some
> terrible misdesign in the Linux block I/O subsystem (from some
> further experiments it is most likely related to "plugging") and of
> the way MD is integrated into it.
>
> However, I have found that on relatively fast machines (I think)
> much lower values of read-ahead still give reasonable speed, with
> some values being much better than others. For example, with another
> RAID10 I get pretty decent speed with a read-ahead of 128 on
> '/dev/md0' (but much worse with, say, 64 or 256). On others a
> read-ahead of 1000 sectors is good.
>
> The read-ahead needed also depends a bit on the file system type, so
> don't trust tests done on the block device itself.
>
> So please experiment a bit to try to reduce it, at least until I
> find the time to figure out the (surely embarrassing) reason why it
> is needed and how to avoid it, or the Linux block I/O and MD
> maintainers confess (they almost surely already know why) and/or
> fix it.

I did experiment, and I noted that a 16 MiB read-ahead was sufficient.

And then I was wondering whether this had negative consequences, e.g.
on random reads. I then ran a test reading 1000 files concurrently, and
some strange things happened. Each drive was doing about 2000
transactions per second (tps). Why? I thought a drive could only do
about 150 tps, given that it is a 7200 rpm drive. What is tps
measuring? Why is the fs not reading the chunk size for every I/O
operation?

Best regards
keld
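A concurrent-read test of that kind can be approximated with a simple shell loop; the file names, sizes and block size below are assumptions, since the message does not say how the 1000 readers were actually started.

    # Start 1000 sequential readers on 1000 different files and wait for them;
    # run 'iostat -x 1' in another terminal meanwhile to watch the per-disk tps.
    for i in $(seq 1 1000); do
        dd if=/mnt/raid/files/file$i of=/dev/null bs=64k 2>/dev/null &
    done
    wait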
* Re: performance problems with raid10,f2
From: Peter Grandi @ 2008-04-05 17:31 UTC
To: Linux RAID

>>> On Fri, 4 Apr 2008 10:03:59 +0200, Keld Jørn Simonsen
>>> <keld@dkuug.dk> said:

[ ... slow software RAID in sequential access ... ]

> I did experiment, and I noted that a 16 MiB read-ahead was
> sufficient.

That still sounds a bit high.

> And then I was wondering whether this had negative consequences,
> e.g. on random reads.

It surely has large negative consequences, but not necessarily on
random reads. After all, that depends on when an operation completes,
and I suspect that read-ahead is at least partially asynchronous, that
is, the read of a block completes when it gets to memory, not when the
whole read-ahead is done. The problem is more likely to be increased
memory contention when the system is busy and, even worse, increased
disk arm contention.

Read-ahead not only loads memory with not-yet-needed blocks, it keeps
the disk busier reading those not-yet-needed blocks.

> I then ran a test reading 1000 files concurrently, and some strange
> things happened. Each drive was doing about 2000 transactions per
> second (tps). Why? I thought a drive could only do about 150 tps,
> given that it is a 7200 rpm drive.

RPM is not that closely related to transactions/s, however defined;
arm movement time and locality of access perhaps are.

> What is tps measuring?

That's pretty mysterious to me. It could mean anything, and anyhow I
have become even more disillusioned about the whole Linux I/O
subsystem, which I now think to be as poorly misdesigned as the Linux
VM subsystem. Just the idea of putting "plugging" at the block device
level demonstrates the level of its developers (amazingly, some recent
tests I have done seem to show that at least in some cases it has no
influence on performance either way).

But then I was recently reading these wise words from a great old man
of OS design:

  http://CSG.CSAIL.MIT.edu/Users/dennis/essay.htm

  "During the 1980s things changed. Computer Science Departments had
  proliferated throughout the universities to meet the demand,
  primarily for programmers and software engineers, and the faculty
  assembled to teach the subjects was expected to do meaningful
  research. To manage the burgeoning flood of conference papers,
  program committees adopted a new strategy for papers in computer
  architecture: No more wild ideas; papers had to present quantitative
  results. The effect was to create a style of graduate research in
  computer architecture that remains the "conventional wisdom" of the
  community to the present day: Make a small, innovative, change to a
  commercially accepted design and evaluate it using standard
  benchmark programs. This style has stifled the exploration and
  publication of interesting architectural ideas that require more
  than a modicum of change from current practice. The practice of
  basing evaluations on standard benchmark codes neglects the
  potential benefits of architectural concepts that need a change in
  programming methodology to demonstrate their full benefit."

And around the same time I had a very depressing IRC conversation with
a well-known kernel developer about what I think to be some rather
stupid aspects of the Linux VM subsystem, and he was quite
unrepentant, saying that in some tests they were of benefit...

> Why is the fs not reading the chunk size for every I/O operation?

Why should it? The goal is to keep the disk busy in the cheapest way.
Keep the queue as long as you need to keep the disk busy (back-to-back
operations) and no more.

However, if you are really asking why the MD subsystem needs
read-ahead values hundreds or thousands of times larger than the
underlying devices, counterproductively, that is something that I am
trying to figure out in my not so abundant spare time. If anybody
knows, please let the rest of us know.
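That size mismatch is easy to see by comparing the read-ahead of the md device with that of its members; the device names below are assumptions, and blockdev reports values in 512-byte sectors.

    # Read-ahead of the array versus its component disks, in 512-byte sectors.
    blockdev --getra /dev/md0
    for d in /dev/sd[abcd]; do
        echo -n "$d: "
        blockdev --getra $d
    done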
* Re: performance problems with raid10,f2
From: Keld Jørn Simonsen @ 2008-04-05 18:46 UTC
To: Peter Grandi
Cc: Linux RAID

On Sat, Apr 05, 2008 at 06:31:00PM +0100, Peter Grandi wrote:
> >>> On Fri, 4 Apr 2008 10:03:59 +0200, Keld Jørn Simonsen
> >>> <keld@dkuug.dk> said:
>
> [ ... slow software RAID in sequential access ... ]
>
> > I did experiment, and I noted that a 16 MiB read-ahead was
> > sufficient.
>
> That still sounds a bit high.

Well... it was only 8 MiB...

> > And then I was wondering whether this had negative consequences,
> > e.g. on random reads.
>
> It surely has large negative consequences, but not necessarily on
> random reads. After all, that depends on when an operation completes,
> and I suspect that read-ahead is at least partially asynchronous,
> that is, the read of a block completes when it gets to memory, not
> when the whole read-ahead is done. The problem is more likely to be
> increased memory contention when the system is busy and, even worse,
> increased disk arm contention.

Well, it looks like the bigger the chunk size, the better for random
reading with 1000 processes. I need to do some more tests.

> Read-ahead not only loads memory with not-yet-needed blocks, it
> keeps the disk busier reading those not-yet-needed blocks.

But they will be needed, given that most processes read files
sequentially, which is my scenario. The trick is to keep the data in
memory until they are needed.

> > I then ran a test reading 1000 files concurrently, and some
> > strange things happened. Each drive was doing about 2000
> > transactions per second (tps). Why? I thought a drive could only
> > do about 150 tps, given that it is a 7200 rpm drive.
>
> RPM is not that closely related to transactions/s, however defined;
> arm movement time and locality of access perhaps are.

RPM is also related. Actually quite related.

> > What is tps measuring?
>
> That's pretty mysterious to me. It could mean anything, and anyhow I
> have become even more disillusioned about the whole Linux I/O
> subsystem, which I now think to be as poorly misdesigned as the
> Linux VM subsystem.

iostat -x actually gave a more plausible measurement. It has two
measures: aggregated I/O requests sent to the disk, and I/O requests
made by programs.

> > Why is the fs not reading the chunk size for every I/O operation?
>
> Why should it? The goal is to keep the disk busy in the cheapest
> way. Keep the queue as long as you need to keep the disk busy
> (back-to-back operations) and no more.

I would like the disk to produce as much real data for processes as
possible. With about 150 requests per second, 256 kiB chunks would
produce about 37 MB/s - but my system only gives me around 15 MB/s per
disk. Some room for improvement.

> However, if you are really asking why the MD subsystem needs
> read-ahead values hundreds or thousands of times larger than the
> underlying devices, counterproductively, that is something that I am
> trying to figure out in my not so abundant spare time. If anybody
> knows, please let the rest of us know.

Yes, quite strange.

Keld