public inbox for linux-xfs@vger.kernel.org
* agsize and performance
@ 2013-10-29 22:10 K T
  2013-10-30  9:59 ` Matthias Schniedermeyer
  0 siblings, 1 reply; 5+ messages in thread
From: K T @ 2013-10-29 22:10 UTC (permalink / raw)
  To: xfs



Hi,

I have a 1 TB SATA disk (WD1003FBYX) with XFS. In my tests, I preallocate a
bunch of 10 GB files and write data to the files one at a time. I have
observed that the default mkfs settings (4 AGs) give very low throughput.
When I reformat the disk with an agsize of 256 MB (agcount=3726), I see better
throughput. I thought that with a bigger agsize the files would be made of
fewer extents and hence perform better (due to fewer entries in the extent
map getting updated). But, according to my tests, the opposite seems to be
true. Can you please explain why this is the case? Am I missing something?

My test parameters:

mkfs.xfs -f /dev/sdbf1
mount  -o inode64 /dev/sdbf1 /mnt/test
fallocate -l 10G fname
dd if=/dev/zero of=fname bs=2M count=64 oflag=direct,sync conv=notrunc seek=0
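The per-file steps can be driven in a loop, sketched here with the directory
and sizes as parameters (small defaults so it can be tried on any scratch
filesystem; the actual run used /mnt/test, 10G preallocations, and count=64
with oflag=direct,sync):

```shell
#!/bin/sh
# Sketch of the test driver: preallocate NFILES files, then overwrite
# the start of each one with dd, one file at a time.  DIR, PREALLOC,
# COUNT and SYNCFLAGS are parameters; the defaults are deliberately
# small, the original run used /mnt/test, 10G, 64 and oflag=direct,sync.
DIR=${DIR:-$(mktemp -d)}
NFILES=${NFILES:-3}
PREALLOC=${PREALLOC:-16M}            # original: 10G
COUNT=${COUNT:-4}                    # original: 64
SYNCFLAGS=${SYNCFLAGS:-oflag=sync}   # original: oflag=direct,sync

i=1
while [ "$i" -le "$NFILES" ]; do
    f="$DIR/file.$i"
    # fallocate needs filesystem support; fall back to a sparse file.
    fallocate -l "$PREALLOC" "$f" 2>/dev/null || truncate -s "$PREALLOC" "$f"
    # Overwrite the start of the preallocated file; print the rate line.
    dd if=/dev/zero of="$f" bs=2M count="$COUNT" \
       $SYNCFLAGS conv=notrunc seek=0 2>&1 | tail -n 1
    i=$((i + 1))
done
```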

# uname -a
Linux gold 3.0.82-0.7-default #1 SMP Thu Jun 27 13:19:18 UTC 2013
(6efde93) x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/SuSE-release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 3
# Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz


------- Tests with agsize of 256MB -----------
# mkfs.xfs -f /dev/sdbf1 -d agsize=256m
meta-data=/dev/sdbf1             isize=256    agcount=3726, agsize=65536 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=244187136, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=65532, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

# mount  -o inode64 /dev/sdbf1 /mnt/test
# cd test
# ls
# fallocate -l 10g file.1
# xfs_bmap -p -v file.1 | wc -l
43
# dd if=/dev/zero of=file.1 bs=2M count=64 oflag=direct,sync conv=notrunc seek=0
64+0 records in
64+0 records out
134217728 bytes (134 MB) copied, 3.56155 s, 37.7 MB/s
(the first file write seems to be slow)

# fallocate -l 10g file.2
# dd if=/dev/zero of=file.2 bs=2M count=64 oflag=direct,sync conv=notrunc seek=0
64+0 records in
64+0 records out
134217728 bytes (134 MB) copied, 1.57496 s, 85.2 MB/s
# fallocate -l 10g file.3
# dd if=/dev/zero of=file.3 bs=2M count=64 oflag=direct,sync conv=notrunc seek=0
64+0 records in
64+0 records out
134217728 bytes (134 MB) copied, 1.56151 s, 86.0 MB/s

------- Tests with default mkfs parameters -----------
# cd ..
# umount test
# mkfs.xfs -f /dev/sdbf1
meta-data=/dev/sdbf1             isize=256    agcount=4, agsize=61047598 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=244190390, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=119233, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
# mount  -o inode64 /dev/sdbf1 /mnt/test
# cd test

# fallocate -l 10g file.1

# xfs_bmap -p -v file.1 | wc -l
3
# xfs_bmap -p -v file.1
file.1:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET           TOTAL FLAGS
   0: [0..20971519]:   96..20971615      0 (96..20971615)   20971520 10000

# dd if=/dev/zero of=file.1 bs=2M count=64 oflag=direct,sync conv=notrunc seek=0
64+0 records in
64+0 records out
134217728 bytes (134 MB) copied, 3.55862 s, 37.7 MB/s
# xfs_bmap -p -v file.1
file.1:
 EXT: FILE-OFFSET         BLOCK-RANGE      AG AG-OFFSET             TOTAL FLAGS
   0: [0..262143]:        96..262239        0 (96..262239)         262144 00000
   1: [262144..20971519]: 262240..20971615  0 (262240..20971615) 20709376 10000

# fallocate -l 10g file.2
# xfs_bmap -p -v file.2
file.2:
 EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET               TOTAL FLAGS
   0: [0..20971519]:   20971616..41943135  0 (20971616..41943135) 20971520 10000
# dd if=/dev/zero of=file.2 bs=2M count=64 oflag=direct,sync conv=notrunc seek=0
64+0 records in
64+0 records out
134217728 bytes (134 MB) copied, 3.56464 s, 37.7 MB/s
# xfs_bmap -p -v file.2
file.2:
 EXT: FILE-OFFSET         BLOCK-RANGE        AG AG-OFFSET               TOTAL FLAGS
   0: [0..262143]:        20971616..21233759  0 (20971616..21233759)   262144 00000
   1: [262144..20971519]: 21233760..41943135  0 (21233760..41943135) 20709376 10000

# fallocate -l 10g file.3
# xfs_bmap -p -v file.3
file.3:
 EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET               TOTAL FLAGS
   0: [0..20971519]:   41943136..62914655  0 (41943136..62914655) 20971520 10000
# dd if=/dev/zero of=file.3 bs=2M count=64 oflag=direct,sync conv=notrunc seek=0
64+0 records in
64+0 records out
134217728 bytes (134 MB) copied, 3.55932 s, 37.7 MB/s
# xfs_bmap -p -v file.3
file.3:
 EXT: FILE-OFFSET         BLOCK-RANGE        AG AG-OFFSET               TOTAL FLAGS
   0: [0..262143]:        41943136..42205279  0 (41943136..42205279)   262144 00000
   1: [262144..20971519]: 42205280..62914655  0 (42205280..62914655) 20709376 10000


Thanks,
Karthik


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: agsize and performance
  2013-10-29 22:10 agsize and performance K T
@ 2013-10-30  9:59 ` Matthias Schniedermeyer
  2013-10-30 14:46   ` K T
  0 siblings, 1 reply; 5+ messages in thread
From: Matthias Schniedermeyer @ 2013-10-30  9:59 UTC (permalink / raw)
  To: K T; +Cc: xfs

On 29.10.2013 18:10, K T wrote:
> Hi,
> 
> I have a 1 TB SATA disk (WD1003FBYX) with XFS. In my tests, I preallocate a
> bunch of 10 GB files and write data to the files one at a time. I have
> observed that the default mkfs settings (4 AGs) give very low throughput.
> When I reformat the disk with an agsize of 256 MB (agcount=3726), I see better
> throughput. I thought that with a bigger agsize the files would be made of
> fewer extents and hence perform better (due to fewer entries in the extent
> map getting updated). But, according to my tests, the opposite seems to be
> true. Can you please explain why this is the case? Am I missing something?
> 
> My test parameters:
> 
> mkfs.xfs -f /dev/sdbf1
> mount  -o inode64 /dev/sdbf1 /mnt/test
> fallocate -l 10G fname
> dd if=/dev/zero of=fname bs=2M count=64 oflag=direct,sync conv=notrunc seek=0

I get the same bad performance with your dd statement.

fallocate -l 10G fname
time dd if=/dev/zero of=fname bs=2M count=64 oflag=direct,sync conv=notrunc seek=0
64+0 records in
64+0 records out
134217728 bytes (134 MB) copied, 4,24088 s, 31,6 MB/s

After pondering the really hard-to-read dd man page: 'sync' is for
synchronized I/O, a.k.a. REALLY BAD PERFORMANCE, and I assume you don't
really want that.

I think what you meant is fsync, i.e. the file data (and metadata) have
hit stable storage before dd exits.
That is: conv=fsync

So:
time dd if=/dev/zero of=fname bs=2M count=64 oflag=direct conv=notrunc,fsync seek=0
64+0 records in
64+0 records out
134217728 bytes (134 MB) copied, 1,44088 s, 93,2 MB/s

That gets much better performance, and in my case it can't get any 
better because the HDD (and encryption) just can't go any faster.
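The two variants can be timed side by side; a sketch against a scratch file
(sizes shrunk and oflag=direct dropped so it runs on any filesystem):

```shell
#!/bin/sh
# oflag=sync opens the file with O_SYNC: every write(2) waits until
# data (and required metadata) reach stable storage.  conv=fsync
# instead issues one fsync(2) after all writes, just before dd exits.
f=$(mktemp)

echo "== oflag=sync: synchronous write per 1M block =="
dd if=/dev/zero of="$f" bs=1M count=16 oflag=sync conv=notrunc 2>&1 | tail -n 1

echo "== conv=fsync: one fsync before exit =="
dd if=/dev/zero of="$f" bs=1M count=16 conv=notrunc,fsync 2>&1 | tail -n 1

rm -f "$f"
```

On a rotating disk the per-write variant typically lands far below the
drive's streaming rate, as in the numbers above.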




-- 

Matthias



* Re: agsize and performance
  2013-10-30  9:59 ` Matthias Schniedermeyer
@ 2013-10-30 14:46   ` K T
  2013-10-30 15:27     ` Matthias Schniedermeyer
  2013-10-30 20:31     ` Stan Hoeppner
  0 siblings, 2 replies; 5+ messages in thread
From: K T @ 2013-10-30 14:46 UTC (permalink / raw)
  To: Matthias Schniedermeyer; +Cc: xfs



I meant sync, not fsync (the O_SYNC flag).

My main question is: why is there better throughput when I make the agsize
smaller?






* Re: agsize and performance
  2013-10-30 14:46   ` K T
@ 2013-10-30 15:27     ` Matthias Schniedermeyer
  2013-10-30 20:31     ` Stan Hoeppner
  1 sibling, 0 replies; 5+ messages in thread
From: Matthias Schniedermeyer @ 2013-10-30 15:27 UTC (permalink / raw)
  To: K T; +Cc: xfs

On 30.10.2013 10:46, K T wrote:
> I meant sync, not fsync (the O_SYNC flag).

What kind of workload needs sync I/O?

> My main question is: why is there better throughput when I make the agsize
> smaller?

Unfortunately I can't help you here; I'm no expert in XFS matters.
I thought you didn't really mean to use sync I/O.


-- 

Matthias



* Re: agsize and performance
  2013-10-30 14:46   ` K T
  2013-10-30 15:27     ` Matthias Schniedermeyer
@ 2013-10-30 20:31     ` Stan Hoeppner
  1 sibling, 0 replies; 5+ messages in thread
From: Stan Hoeppner @ 2013-10-30 20:31 UTC (permalink / raw)
  To: K T, Matthias Schniedermeyer; +Cc: xfs

On 10/30/2013 9:46 AM, K T wrote:
> I meant sync, not fsync (the O_SYNC flag).
> 
> My main question is: why is there better throughput when I make the agsize
> smaller?

The transfer rate of a spinning rust drive is greatest on the outer
cylinders and least on the inner cylinders.

A default mkfs.xfs with 4 AGs causes the first directory created to be
placed in AG0 on the outer cylinders.  The 4th dir will be placed in AG3
on the inner cylinders.  Thus writing to the 4th directory will be
significantly slower, 4x or more, than to the 1st directory.

With 3726 allocation groups, the first few hundred directories you
create will be on the outer cylinders of the drive and writes to these
will be about the same speed, and much greater than to AG3 in the
default case.

You omitted data showing which AGs your tests were writing to in each
case, as best I could tell.  But given the above, it's most likely that
this is the cause of the behavior you are seeing.
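For reference, which AG an extent sits in can be read off the AG column of
xfs_bmap -v, or derived by hand: xfs_bmap reports 512-byte units, while the
agsize printed by mkfs.xfs is in filesystem blocks (4096 bytes here). A
sketch using the extent start addresses from the default-mkfs run quoted
earlier in the thread:

```shell
#!/bin/sh
# Map xfs_bmap BLOCK-RANGE starts (512-byte units) to allocation
# groups, given agsize in 4096-byte filesystem blocks.  The numbers
# are taken from the default-mkfs transcript in this thread.
agsize_blocks=61047598                   # agsize from mkfs.xfs output
sectors_per_ag=$((agsize_blocks * 8))    # 4096-byte blocks -> 512-byte units

for start in 96 20971616 41943136; do    # file.1 .. file.3 extent starts
    echo "extent at sector $start lies in AG $((start / sectors_per_ag))"
done
```

This just reproduces the AG column that xfs_bmap -v prints, and is handy
when that column has been trimmed from a report.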





