* Large numbers of devices...
From: Alan D. Brunelle @ 2009-08-17 17:33 UTC
To: fio
Before I start diving into the code, has anybody else out there had
problems with 'fio' not being able to scale well with large numbers of
devices (files) being used? I have a system w/ 32-cpus, 256GB RAM, plus
11 dual-ported FC HBAs connected to 44 HP MSA1000 FC controllers. (the
44 MSAs are spread out 4 per FC HBA). I'd like to use 'fio' to gather &
produce scaling results, but I seem to run into inconsistencies once I
get above using 26 or 27 of the 44 MSAs. I have noticed similar things
in the past, but it hasn't been so bothersome. I have a locally crafted
tool (much more limited than 'fio') called 'aiod' that /is/ able to
scale up past 35 or 36 of the MSAs doing what I /believe/ is something
similar. [Once past 35 or 36 devices we run into system issues which
reduce scaling opportunities.]
In any event, an example fio job-file can be found at:
http://free.linux.hp.com/~adb/2009-08-17/044_disk_1_parts.txt
The graph showing the noise in the fio results can be found at:
http://free.linux.hp.com/~adb/2009-08-17/fio.png
And the "better" graph w/ aiod can be found at:
http://free.linux.hp.com/~adb/2009-08-17/aiod.png
The test uses between 1 and 4 partitions per LUN exported by each MSA
(each LUN is crafted from 4 physical devices striped together.) You'll
see in the latter graph the continued scaling up through almost 37
devices, and much tighter results after that (even with the tail-off at
the end above 40 devices.)
Anyways, if there is something I'm missing in the fio job-file to help
it scale better let me know, otherwise I'll go through the aiod code to
see if there are any applicable scaling improvements there that can be
applied to fio...
Alan D. Brunelle
Hewlett-Packard
* Re: Large numbers of devices...
From: Jens Axboe @ 2009-08-19 9:03 UTC
To: Alan D. Brunelle; +Cc: fio
On Mon, Aug 17 2009, Alan D. Brunelle wrote:
> Before I start diving into the code, has anybody else out there had
> problems with 'fio' not being able to scale well with large numbers of
> devices (files) being used? I have a system w/ 32-cpus, 256GB RAM, plus
> 11 dual-ported FC HBAs connected to 44 HP MSA1000 FC controllers. (the
> 44 MSAs are spread out 4 per FC HBA). I'd like to use 'fio' to gather &
> produce scaling results, but I seem to run into inconsistencies once I
> get above using 26 or 27 of the 44 MSAs. I have noticed similar things
> in the past, but it hasn't been so bothersome. I have a locally crafted
> tool (much more limited than 'fio') called 'aiod' that /is/ able to
> scale up past 35 or 36 of the MSAs doing what I /believe/ is something
> similar. [Once past 35 or 36 devices we run into system issues which
> reduce scaling opportunities.]
>
> In any event, an example fio job-file can be found at:
>
> http://free.linux.hp.com/~adb/2009-08-17/044_disk_1_parts.txt
>
> The graph showing the noise in the fio results can be found at:
>
> http://free.linux.hp.com/~adb/2009-08-17/fio.png
>
> And the "better" graph w/ aiod can be found at:
>
> http://free.linux.hp.com/~adb/2009-08-17/aiod.png
>
> The test uses between 1 and 4 partitions per LUN exported by each MSA
> (each LUN is crafted from 4 physical devices striped together.) You'll
> see in the latter graph the continued scaling up through almost 37
> devices, and much tighter results after that (even with the tail-off at
> the end above 40 devices.)
>
> Anyways, if there is something I'm missing in the fio job-file to help
> it scale better let me know, otherwise I'll go through the aiod code to
> see if there are any applicable scaling improvements there that can be
> applied to fio...
Alan, I haven't run with that many files, so it's indeed possible that
there's an inherent limitation in the file selection in fio. It's not
my best code, that part... If you have the time and inclination to jump
in there and find out what is going on, then that would be great!
--
Jens Axboe
* Re: Large numbers of devices...
From: Alan D. Brunelle @ 2009-08-19 12:47 UTC
To: Jens Axboe; +Cc: fio
On Wed, 2009-08-19 at 11:03 +0200, Jens Axboe wrote:
>
> Alan, I haven't run with that many files, so it's indeed possible that
> there's an inherent limitation in the file selection in fio. It's not
> my best code, that part... If you have the time and inclination to jump
> in there and find out what is going on, then that would be great!
>
OK, will do - I just wanted to check that the fio job-file looked OK.
Alan
* Re: Large numbers of devices...
From: Jens Axboe @ 2009-08-19 12:51 UTC
To: Alan D. Brunelle; +Cc: fio
On Wed, Aug 19 2009, Alan D. Brunelle wrote:
> On Wed, 2009-08-19 at 11:03 +0200, Jens Axboe wrote:
>
> >
> > Alan, I haven't run with that many files, so it's indeed possible that
> > there's an inherent limitation in the file selection in fio. It's not
> > my best code, that part... If you have the time and inclination to jump
> > in there and find out what is going on, then that would be great!
> >
>
> OK, will do - I just wanted to check that the fio job-file looked OK.
It looks fine, in fact if that is your setup, then you are not using the
multi file stuff. Your job file basically creates a process per device,
which should work fine as-is.
You can try and play with the iodepth batching control, iodepth_batch
and iodepth_batch_complete. They both default to 1, meaning that it'll
submit and complete 1 command at a time.
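
To illustrate why the batching default can matter at a large queue depth,
here is a minimal sketch in C. It is not fio code: max_reap_calls is a
hypothetical helper, and it assumes the worst case in which every completion
call returns exactly the minimum number of events it was asked to wait for.

```c
#include <assert.h>

/*
 * Hypothetical helper (an assumption for illustration, not part of fio):
 * worst-case number of completion calls needed to reap `depth` in-flight
 * I/Os when each call waits for `batch_complete` events and returns
 * exactly that many.
 */
unsigned int max_reap_calls(unsigned int depth, unsigned int batch_complete)
{
    assert(depth > 0 && batch_complete > 0);

    /* Ceiling division: a final partial batch still costs one call. */
    return (depth + batch_complete - 1) / batch_complete;
}
```

Under that assumption, a queue of 128 I/Os needs up to 128 completion calls
with a batch size of 1, and up to 6 calls with a batch size of 25. Real runs
may need fewer calls, since a call can return more events than the minimum.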
--
Jens Axboe
* Re: Large numbers of devices...
From: Alan D. Brunelle @ 2009-08-19 22:44 UTC
To: Jens Axboe; +Cc: fio
On Wed, 2009-08-19 at 14:51 +0200, Jens Axboe wrote:
> On Wed, Aug 19 2009, Alan D. Brunelle wrote:
> > On Wed, 2009-08-19 at 11:03 +0200, Jens Axboe wrote:
> >
> > >
> > > Alan, I haven't run with that many files, so it's indeed possible that
> > > there's an inherent limitation in the file selection in fio. It's not
> > > my best code, that part... If you have the time and inclination to jump
> > > in there and find out what is going on, then that would be great!
> > >
> >
> > OK, will do - I just wanted to check that the fio job-file looked OK.
>
> It looks fine, in fact if that is your setup, then you are not using the
> multi file stuff. Your job file basically creates a process per device,
> which should work fine as-is.
>
> You can try and play with the iodepth batching control, iodepth_batch
> and iodepth_batch_complete. They both default to 1, meaning that it'll
> submit and complete 1 command at a time.
>
Setting iodepth_batch_complete to 25 (having set iodepth to 128, as
before) seems to have helped quite a bit - I just did the second half of
the test (from about 22 MSAs to 44) and the line looks much smoother to
about 36 or 37 devices and has features very similar to those 'aiod' was
exhibiting around 40 devices (drop-off as expected).
With aiod we default to using about 20% of the "depth" for the min value
(iodepth_batch_complete in fio), and that seems to work quite well. In
any event, I think your "not best code" is looking pretty good! :-)
The only other thing I saw in the code that was strange was your timeout
for the io_getevents call - it looks to be set to 0.0 seconds when
iodepth_batch_complete is set to 0 (default being 1), is this what you
really want? In aiod we set it to 10,000,000 nanoseconds (10
milliseconds), and we use that even if there is a min.
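
As a rough sketch of the two aiod settings described above (a minimum of
about 20% of the depth, and a 10 millisecond timeout), the values could be
derived as follows. This is neither aiod nor fio code: reap_min_events,
reap_timeout and REAP_TIMEOUT_NS are hypothetical names, and rounding the
20% down with a floor of 1 is an assumption.

```c
#include <assert.h>
#include <time.h>

/* 10,000,000 ns = 10 ms, the timeout value quoted for aiod above. */
#define REAP_TIMEOUT_NS 10000000L

/*
 * Hypothetical helper: minimum number of events to wait for, taken as
 * about 20% of the queue depth (rounded down, never less than 1).
 */
long reap_min_events(long depth)
{
    assert(depth > 0);

    long min_nr = depth / 5;
    return min_nr > 0 ? min_nr : 1;
}

/* Hypothetical helper: split a nanosecond count into a struct timespec. */
struct timespec reap_timeout(long ns)
{
    assert(ns >= 0);

    struct timespec ts;
    ts.tv_sec = ns / 1000000000L;
    ts.tv_nsec = ns % 1000000000L;
    return ts;
}

/*
 * With libaio, these values would be passed roughly as follows (not
 * compiled here, since it needs an initialised io_context_t):
 *
 *   struct timespec ts = reap_timeout(REAP_TIMEOUT_NS);
 *   io_getevents(ctx, reap_min_events(depth), depth, events, &ts);
 */
```

For a depth of 128 this gives a minimum of 25 events, which matches the
iodepth_batch_complete value used in the run above.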
The new tail graph can be found at:
http://free.linux.hp.com/~adb/2009-08-17/fio-bcom=25.png
It is a little bit more noisy after 38 or 39 disks than the aiod graph -
http://free.linux.hp.com/~adb/2009-08-17/aiod.png
but nothing earth-shattering. (And I know that we're overloading the
hyperlinks at that stage, so it's going to be a bumpy ride no matter
what.)
Thanks!
Alan
PS. This was run on your loop-direct branch'ed OS, so the good news is
that it's handling a lot of traffic OK. So far it's been running quite
solid.