public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* sunit not working
@ 2007-06-11 23:55 Salmon, Rene
  2007-06-12  0:34 ` Nathan Scott
  0 siblings, 1 reply; 8+ messages in thread
From: Salmon, Rene @ 2007-06-11 23:55 UTC (permalink / raw)
  To: xfs

Hi list,

I have a HW RAID 5 array with a chunk size of 256KB and a stripe size of
2560KB.  Basically a 10+1+1 array: 10 data drives, one parity, one spare.

I am trying to create an XFS filesystem with the appropriate sunit and
swidth.
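For this geometry (256KB chunk, 10 data disks), the values in 512-byte
units work out as follows; a quick sketch of the arithmetic:

```shell
# sunit = stripe unit (one chunk) in 512-byte sectors
echo $((256 * 1024 / 512))         # 512
# swidth = sunit times the number of data disks
echo $((256 * 1024 / 512 * 10))    # 5120
```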



# mkfs.xfs -d sunit=512,swidth=5120 -f /dev/mapper/mpath9 
meta-data=/dev/mapper/mpath9     isize=256    agcount=32,
agsize=56652352 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=1812874752,
imaxpct=25
         =                       sunit=64     swidth=640 blks,
unwritten=1
naming   =version 2              bsize=4096  
log      =internal log           bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0



As you can see, sunit gets set to 64 upon creation and not 512 like I
asked.  Also, if I try to give it the same mount options it does the same
thing.

sgi210a:~ # mount -o sunit=512,swidth=5120 /dev/mapper/mpath9 /mnt/   
sgi210a:~ # xfs_info /mnt/
meta-data=/dev/mapper/mpath9     isize=256    agcount=32,
agsize=56652352 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=1812874752,
imaxpct=25
         =                       sunit=64     swidth=640 blks,
unwritten=1
naming   =version 2              bsize=4096  
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
sgi210a:~ # 




Any ideas?

Lastly, I tried to subscribe to the list by sending email to
ecartis@oss.sgi.com a couple of times but was unsuccessful.  Should I send
email elsewhere to subscribe?



thank you 
Rene

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: sunit not working
  2007-06-11 23:55 sunit not working Salmon, Rene
@ 2007-06-12  0:34 ` Nathan Scott
  2007-06-12 13:12   ` salmr0
  0 siblings, 1 reply; 8+ messages in thread
From: Nathan Scott @ 2007-06-12  0:34 UTC (permalink / raw)
  To: Salmon, Rene; +Cc: xfs

On Mon, 2007-06-11 at 18:55 -0500, Salmon, Rene wrote:
> As you can see the sunit gets set to 64 upon creation and not 512 like I
> asked.  Also if I try to give it some mount options it does the same
> thing.
> 
> sgi210a:~ # mount -o sunit=512,swidth=5120 /dev/mapper/mpath9 /mnt/

It's being reported in units of filesystem blocks, and it's specified
in 512-byte units.  Pretty dopey, but that's why it's different.

> sgi210a:~ # xfs_info /mnt/
> meta-data=/dev/mapper/mpath9     isize=256    agcount=32,
> agsize=56652352 blks
>          =                       sectsz=512   attr=0
> data     =                       bsize=4096   blocks=1812874752,
> imaxpct=25
>          =                       sunit=64     swidth=640 blks,
> unwritten=1
> naming   =version 2              bsize=4096  
> log      =internal               bsize=4096   blocks=32768, version=1
>          =                       sectsz=512   sunit=0 blks
> realtime =none                   extsz=65536  blocks=0, rtextents=0
> sgi210a:~ # 

$ gdb -q
(gdb) p 512 * 512
$1 = 262144
(gdb) p 64 * 4096
$2 = 262144
(gdb)

(thats 262144 bytes, of course)
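The same check works in any shell.  As an aside (worth verifying against
your mkfs.xfs manpage, since it's an assumption about your version), the
su=/sw= options accept byte units and a stripe count directly, which
sidesteps the 512-byte-unit confusion:

```shell
# 512 units of 512 bytes == 64 filesystem blocks of 4096 bytes
echo $((512 * 512))    # 262144
echo $((64 * 4096))    # 262144

# Hypothetical equivalent, giving the geometry in bytes and data disks:
#   mkfs.xfs -d su=256k,sw=10 /dev/mapper/mpath9
```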

> Last I tried to subscribe to the list by sending email to
> ecartis@oss.sgi.com a couple of times but was unsuccessful should I send
> email elsewhere to subscribe?

It's a frikkin' lottery. :)  Keep trying and keep whining is how I ended
up getting back on (whining on IRC on #xfs helps too).

cheers.

-- 
Nathan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: sunit not working
  2007-06-12  0:34 ` Nathan Scott
@ 2007-06-12 13:12   ` salmr0
  2007-06-12 23:21     ` Nathan Scott
  2007-06-12 23:28     ` David Chinner
  0 siblings, 2 replies; 8+ messages in thread
From: salmr0 @ 2007-06-12 13:12 UTC (permalink / raw)
  To: nscott, Salmon, Rene; +Cc: xfs


Thanks, that helps.  Now that I know I have the right sunit and swidth, I have a performance-related question.

If I do a dd to the raw device or to the LUN directly, I get speeds of around 190-200 MBytes/sec.

As soon as I add XFS on top of the LUN, my speeds drop to around 150 MBytes/sec. This is for a single-stream write using various block sizes on a 2 Gbit/sec fibre channel card.

Is this overhead more or less what you would expect from XFS? Or is there some tuning I need to do?

Thanks
Rene

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: sunit not working
  2007-06-12 13:12   ` salmr0
@ 2007-06-12 23:21     ` Nathan Scott
  2007-06-13 18:46       ` Salmon, Rene
  2007-06-12 23:28     ` David Chinner
  1 sibling, 1 reply; 8+ messages in thread
From: Nathan Scott @ 2007-06-12 23:21 UTC (permalink / raw)
  To: salmr0; +Cc: Salmon, Rene, xfs

On Tue, 2007-06-12 at 13:12 +0000, salmr0@bp.com wrote:
> Thanks, that helps.  Now that I know I have the right sunit and swidth,
> I have a performance-related question.
> 
> If I do a dd to the raw device or to the LUN directly, I get speeds of
> around 190-200 MBytes/sec.
> 
> As soon as I add XFS on top of the LUN, my speeds drop to around 150
> MBytes/sec. This is for a single-stream write using various block
> sizes on a 2 Gbit/sec fibre channel card.

Reads or writes?
What are your I/O sizes?
Buffered or direct IO?
Including fsync time in there or not?  etc, etc.

(Actual dd commands used and their output results would be best.)
xfs_io is pretty good for this kind of analysis, as it gives very
fine-grained control of the operations performed, has an integrated bmap
command, etc. (use the -F flag for the raw device comparisons).
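A sketch of the kind of comparison this suggests; the paths and sizes
here are placeholders, and the flags should be checked against your
xfs_io version:

```shell
# Single-stream direct-I/O write to a file on the XFS filesystem
# (-f create the file, -d open O_DIRECT, pwrite 1GB in 1MB buffers)
xfs_io -f -d -c "pwrite -b 1m 0 1g" /mnt/testfile

# The same I/O pattern against the raw device; -F treats the target
# as a foreign (non-XFS) file, as suggested for raw device comparisons
xfs_io -F -d -c "pwrite -b 1m 0 1g" /dev/mapper/mpath9
```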

> Is this overhead more or less what you would expect from XFS? Or is
> there some tuning I need to do? 

You should be able to get very close to raw device speeds esp. for a
single stream reader/writer, with some tuning.

cheers.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: sunit not working
  2007-06-12 13:12   ` salmr0
  2007-06-12 23:21     ` Nathan Scott
@ 2007-06-12 23:28     ` David Chinner
  1 sibling, 0 replies; 8+ messages in thread
From: David Chinner @ 2007-06-12 23:28 UTC (permalink / raw)
  To: salmr0; +Cc: nscott, Salmon, Rene, xfs

On Tue, Jun 12, 2007 at 01:12:15PM +0000, salmr0@bp.com wrote:
> 
> Thanks, that helps.  Now that I know I have the right sunit and swidth I have
> a performance-related question.
> 
> If I do a dd to the raw device or to the LUN directly, I get speeds of around
> 190-200 MBytes/sec.
> 
> As soon as I add XFS on top of the LUN, my speeds drop to around 150
> MBytes/sec. This is for a single-stream write using various block sizes on a
> 2 Gbit/sec fibre channel card.

That's for buffered I/O, right? That sounds about right - if you do two
writes, it should increase a little further. Also, direct I/O should be able
to get you to >90% of the raw device capability....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: sunit not working
  2007-06-12 23:21     ` Nathan Scott
@ 2007-06-13 18:46       ` Salmon, Rene
  2007-06-13 19:03         ` Sebastian Brings
  2007-06-13 22:31         ` David Chinner
  0 siblings, 2 replies; 8+ messages in thread
From: Salmon, Rene @ 2007-06-13 18:46 UTC (permalink / raw)
  To: nscott, David Chinner; +Cc: salmr0, xfs


Hi,

More details on this:

Using dd with various block sizes to measure write performance only for
now.

This is using two options to dd: the oflag=direct option for direct I/O
and the conv=fsync option for buffered I/O.

Using direct:
/usr/bin/time -p dd of=/mnt/testfile if=/dev/zero oflag=direct 

Using fsync:
/usr/bin/time -p dd of=/mnt/testfile if=/dev/zero conv=fsync

Using a 2Gbit/sec fibre channel card, my theoretical max is 256
MBytes/sec.  If we allow a bit of overhead for the card, driver, and
such, the manufacturer claims the card should be able to max out at
around 200 MBytes/sec.

The block sizes I used range from 128KBytes to 1024000KBytes, and all the
writes generate a 1.0GB file.
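The sweep described can be scripted; a minimal sketch (the target path
is a placeholder, and the sizes are scaled down for illustration -- the
real runs wrote a 1.0GB file):

```shell
#!/bin/sh
# Block-size sweep: write the same total amount of data with dd at
# several block sizes, once buffered+fsync and once with direct I/O.
# TARGET is a placeholder; point it at a file on the mounted XFS volume.
# Note: oflag=direct fails on filesystems without O_DIRECT support.
TARGET=/tmp/dd_testfile

for spec in "128K 64" "512K 16" "1M 8"; do
    set -- $spec
    bs=$1 count=$2                # each pair writes 8 MiB total
    echo "bs=$bs buffered+fsync:"
    dd of="$TARGET" if=/dev/zero bs="$bs" count="$count" conv=fsync 2>&1 | tail -1
    echo "bs=$bs direct:"
    dd of="$TARGET" if=/dev/zero bs="$bs" count="$count" oflag=direct 2>&1 | tail -1
done
rm -f "$TARGET"
```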

Some of the results I got:

Buffered I/O(fsync):
--------------------
Linux seems to do a good job of buffering this. Regardless of the block
size I choose, I always get write speeds of around 150MBytes/sec.

Direct I/O(direct):
-------------------
The speeds I get here of course are very dependent on the block size I
choose and how well they align with the stripe size of the storage array
underneath. For the appropriate block sizes I get really good
performance, about 200MBytes/sec.


From your feedback it sounds like these are reasonable numbers.
Most of our user apps do not use direct I/O but rather buffered I/O.  Is
150MBytes/sec as good as it gets for buffered I/O, or is there something
I can tune to get a bit more out of it?

Thanks 
Rene





^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: sunit not working
  2007-06-13 18:46       ` Salmon, Rene
@ 2007-06-13 19:03         ` Sebastian Brings
  2007-06-13 22:31         ` David Chinner
  1 sibling, 0 replies; 8+ messages in thread
From: Sebastian Brings @ 2007-06-13 19:03 UTC (permalink / raw)
  To: Salmon, Rene; +Cc: xfs

Is 1 GB a reasonable file size in your environment? Also, most user
apps don't use fsync, but maybe I missed something. Not knowing your
storage vendor, the numbers look pretty good to me, but the way you
tested this is close to a benchmarking environment. 

Cheers

Sebastian


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: sunit not working
  2007-06-13 18:46       ` Salmon, Rene
  2007-06-13 19:03         ` Sebastian Brings
@ 2007-06-13 22:31         ` David Chinner
  1 sibling, 0 replies; 8+ messages in thread
From: David Chinner @ 2007-06-13 22:31 UTC (permalink / raw)
  To: Salmon, Rene; +Cc: nscott, David Chinner, salmr0, xfs

On Wed, Jun 13, 2007 at 01:46:20PM -0500, Salmon, Rene wrote:
> 
> Hi,
> 
> More details on this:
> 
> Using dd with various block sizes to measure write performance only for
> now.
> 
> This is using two options to dd: the oflag=direct option for direct I/O
> and the conv=fsync option for buffered I/O.
> 
> Using direct:
> /usr/bin/time -p dd of=/mnt/testfile if=/dev/zero oflag=direct 
> 
> Using fsync:
> /usr/bin/time -p dd of=/mnt/testfile if=/dev/zero conv=fsync
> 
> Using a 2Gbit/sec fibre channel card, my theoretical max is 256
> MBytes/sec.  If we allow a bit of overhead for the card, driver, and
> such, the manufacturer claims the card should be able to max out at
> around 200 MBytes/sec.

Right.

> The block sizes I used range from 128KBytes to 1024000KBytes, and all the
> writes generate a 1.0GB file.
> 
> Some of the results I got:
> 
> Buffered I/O(fsync):
> --------------------
> Linux seems to do a good job at buffering this. Regardless of the block
> size I choose I always get write speeds of around 150MBytes/sec

That's because writeback is single-threaded via pdflush, so it will
always see about the same throughput.

If you wind /proc/sys/vm/dirty_ratio down to 5, it might go a bit
faster because writeback will start earlier in the write and so the
fsync will have less to do and overall speed will appear faster.

What you should be looking at is iostat throughput in the steady state,
not inferring the throughput from timing a write operation.....
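The two suggestions, sketched as commands (the sysctl write needs root,
and the exact dirty_ratio default varies by kernel):

```shell
# Start background writeback earlier: allow only 5% of memory to be
# dirty before writeback kicks in, down from the much larger default
echo 5 > /proc/sys/vm/dirty_ratio

# Watch steady-state throughput with extended per-device statistics,
# sampled every second, while the write workload is running
iostat -x 1
```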

> Direct I/O(direct):
> -------------------
> The speeds I get here of course are very dependent on the block size I
> choose and how well they align with the stripe size of the storage array
> underneath. For the appropriate block sizes I get really good
> performance about 200MBytes/sec.   

Also normal, because you're iop bound at small block sizes. At large
block sizes, you saturate the fibre. Sounds like nothing is wrong
here.

> >From your feedback is sounds like these are reasonable numbers.
> Most of our user apps do not use direct I/O but rather buffered I/O.  Is
> 150MBytes/sec as good as it gets for buffered I/O or is there something
> I can tune to get a bit more out of buffered I/O?

That's about it, I think. With some tuning of the VM parameters you
might be able to get it a bit higher, but it may be that writeback
throughput (when it occurs) is actually already higher than this....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-06-13 22:31 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-06-11 23:55 sunit not working Salmon, Rene
2007-06-12  0:34 ` Nathan Scott
2007-06-12 13:12   ` salmr0
2007-06-12 23:21     ` Nathan Scott
2007-06-13 18:46       ` Salmon, Rene
2007-06-13 19:03         ` Sebastian Brings
2007-06-13 22:31         ` David Chinner
2007-06-12 23:28     ` David Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox