public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* XFS buffer IO performance is very poor
@ 2015-02-11  7:39 yy
  2015-02-11 13:35 ` Brian Foster
  2015-02-11 16:08 ` Eric Sandeen
  0 siblings, 2 replies; 8+ messages in thread
From: yy @ 2015-02-11  7:39 UTC (permalink / raw)
  To: xfs


[-- Attachment #1.1: Type: text/plain, Size: 3865 bytes --]

Hi,
I ran some tests with fio on XFS, and I found that buffered IO performance is very poor. Here are some results:


                  read (iops)   write (iops)
direct IO + ext3      1848          1232
buffer IO + ext3      1976          1319
direct IO + XFS       1954          1304
buffer IO + XFS        307           203


I do not understand why there is such a big difference; ext3 is much better.
direct IO parameters:
fio --filename=/data1/fio.dat --direct=1 --thread --rw=randrw --rwmixread=60 --ioengine=libaio --runtime=300 --iodepth=1 --size=40G --numjobs=32 -name=test_rw --group_reporting --bs=16k --time_base


buffered IO parameters:
fio --filename=/data1/fio.dat --direct=0 --thread --rw=randrw --rwmixread=60 --ioengine=libaio --runtime=300 --iodepth=1 --size=40G --numjobs=32 -name=test_rw --group_reporting --bs=16k --time_base


the system I've used for my tests:
HW server: 4 cores (Intel), 32GB RAM, running RHEL 6.5
Kernel: 2.6.32-431.el6.x86_64
storage: 10 disks RAID 1+0, stripe size: 256KB


XFS format parameters:
#mkfs.xfs -d su=256k,sw=5 /dev/sdb1
#cat /proc/mounts
/dev/sdb1 /data1 xfs rw,noatime,attr2,delaylog,nobarrier,logbsize=256k,sunit=512,swidth=2560,noquota 0 0
#fdisk -ul
Device Boot   Start     End   Blocks  Id System
/dev/sdb1       128 2929356359 1464678116  83 Linux




# fio --filename=/data1/fio.dat --direct=0 --thread --rw=randrw --rwmixread=60 --ioengine=libaio --runtime=300 --iodepth=1 --size=40G --numjobs=32 -name=test_rw --group_reporting --bs=16k --time_base
test_rw: (g=0): rw=randrw, bs=16K-16K/16K-16K/16K-16K, ioengine=libaio, iodepth=1
...
test_rw: (g=0): rw=randrw, bs=16K-16K/16K-16K/16K-16K, ioengine=libaio, iodepth=1
fio-2.0.13
Starting 32 threads
Jobs: 32 (f=32): [mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm] [100.0% done] [5466K/3644K/0K /s] [341 /227 /0 iops] [eta 00m:00s]
test_rw: (groupid=0, jobs=32): err= 0: pid=5711: Wed Feb 11 15:26:30 2015
 read : io=1442.2MB, bw=4922.3KB/s, iops=307 , runt=300010msec
  slat (usec): min=7 , max=125345 , avg=5765.52, stdev=3741.61
  clat (usec): min=0 , max=192 , avg= 2.72, stdev= 1.12
  lat (usec): min=7 , max=125348 , avg=5770.09, stdev=3741.68
  clat percentiles (usec):
  | 1.00th=[  1], 5.00th=[  2], 10.00th=[  2], 20.00th=[  2],
  | 30.00th=[  2], 40.00th=[  3], 50.00th=[  3], 60.00th=[  3],
  | 70.00th=[  3], 80.00th=[  3], 90.00th=[  3], 95.00th=[  4],
  | 99.00th=[  4], 99.50th=[  4], 99.90th=[  14], 99.95th=[  16],
  | 99.99th=[  20]
  bw (KB/s) : min=  16, max= 699, per=3.22%, avg=158.37, stdev=85.79
 write: io=978736KB, bw=3262.4KB/s, iops=203 , runt=300010msec
  slat (usec): min=10 , max=577043 , avg=148215.93, stdev=125650.40
  clat (usec): min=0 , max=198 , avg= 2.50, stdev= 1.26
  lat (usec): min=11 , max=577048 , avg=148220.20, stdev=125650.94
  clat percentiles (usec):
  | 1.00th=[  1], 5.00th=[  1], 10.00th=[  1], 20.00th=[  2],
  | 30.00th=[  2], 40.00th=[  2], 50.00th=[  3], 60.00th=[  3],
  | 70.00th=[  3], 80.00th=[  3], 90.00th=[  3], 95.00th=[  3],
  | 99.00th=[  4], 99.50th=[  6], 99.90th=[  14], 99.95th=[  14],
  | 99.99th=[  17]
  bw (KB/s) : min=  25, max= 448, per=3.17%, avg=103.28, stdev=46.76
  lat (usec) : 2=6.40%, 4=88.39%, 10=4.93%, 20=0.27%, 50=0.01%
  lat (usec) : 100=0.01%, 250=0.01%
 cpu     : usr=0.00%, sys=0.13%, ctx=238853, majf=18446744073709551520, minf=18446744073709278371
 IO depths  : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
  submit  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  issued  : total=r=92296/w=61171/d=0, short=r=0/w=0/d=0


Run status group 0 (all jobs):
 READ: io=1442.2MB, aggrb=4922KB/s, minb=4922KB/s, maxb=4922KB/s, mint=300010msec, maxt=300010msec
 WRITE: io=978736KB, aggrb=3262KB/s, minb=3262KB/s, maxb=3262KB/s, mint=300010msec, maxt=300010msec


Disk stats (read/write):
 sdb: ios=89616/55141, merge=0/0, ticks=442611/171325, in_queue=613823, util=97.08%

[-- Attachment #1.2: Type: text/html, Size: 8492 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: XFS buffer IO performance is very poor
@ 2015-02-12  5:30 yy
  0 siblings, 0 replies; 8+ messages in thread
From: yy @ 2015-02-12  5:30 UTC (permalink / raw)
  To: xfs, Eric Sandeen, bfoster


[-- Attachment #1.1: Type: text/plain, Size: 2318 bytes --]

Brian and Eric,


Thanks very much for your reply.


I changed the partition start position to a 256K boundary, but performance is still poor; no change.
# fdisk -ul /dev/sdb          
 Device Boot   Start     End   Blocks  Id System
/dev/sdb1       512 2929356359 1464677924  83 Linux


I checked the XFS code, and I agree that the XFS_IOLOCK_EXCL lock may be the reason:
https://bitbucket.org/hustcat/kernel-2.6.32/src/786d720807052737bb17bc44da9da20554400039/fs/xfs/linux-2.6/xfs_file.c?at=master#cl-714
STATIC ssize_t
xfs_file_buffered_aio_write(
	struct kiocb		*iocb,
	const struct iovec	*iovp,
	unsigned long		nr_segs,
	loff_t			pos,
	size_t			ocount)
{
	struct file		*file = iocb->ki_filp;
	struct address_space	*mapping = file->f_mapping;
	struct inode		*inode = mapping->host;
	struct xfs_inode	*ip = XFS_I(inode);
	ssize_t			ret;
	int			enospc = 0;
	int			iolock = XFS_IOLOCK_EXCL;
	size_t			count = ocount;

	xfs_rw_ilock(ip, iolock);

	ret = xfs_file_aio_write_checks(file, &pos, &count, &iolock);
	if (ret)
	...




However, I found that ext3 also takes a mutex for buffered IO:
https://bitbucket.org/hustcat/kernel-2.6.32/src/786d720807052737bb17bc44da9da20554400039/mm/filemap.c?at=master#cl-2642
ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
		unsigned long nr_segs, loff_t pos)
{
	struct file *file = iocb->ki_filp;
	struct inode *inode = file->f_mapping->host;
	ssize_t ret;

	BUG_ON(iocb->ki_pos != pos);

	sb_start_write(inode->i_sb);
	mutex_lock(&inode->i_mutex);
	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
	mutex_unlock(&inode->i_mutex);


I still don’t understand why ext3 does not have this problem with buffered IO.


Best regards,
yy


Original Message
From: Eric Sandeen <sandeen@sandeen.net>
To: yy <yy@xspring.net>; xfs <xfs@oss.sgi.com>
Date: Thursday, February 12, 2015, 00:08
Subject: Re: XFS buffer IO performance is very poor


On 2/11/15 1:39 AM, yy wrote:

<snip>

(In addition to Brian's questions):

> XFS format parametes:
> #mkfs.xfs -d su=256k,sw=5 /dev/sdb1
> #cat /proc/mounts
> /dev/sdb1 /data1 xfs rw,noatime,attr2,delaylog,nobarrier,logbsize=256k,sunit=512,swidth=2560,noquota 0 0
> #fdisk -ul
> Device Boot   Start     End   Blocks  Id System
> /dev/sdb1       128 2929356359 1464678116  83 Linux

so 128*512 = 64k; your partition doesn't start on a 256k stripe unit
boundary, right?  Shouldn't it do so?

-Eric

[-- Attachment #1.2: Type: text/html, Size: 10210 bytes --]


^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: XFS buffer IO performance is very poor
@ 2015-02-12  6:59 yy
  2015-02-12 21:04 ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: yy @ 2015-02-12  6:59 UTC (permalink / raw)
  To: xfs, Eric Sandeen, bfoster


[-- Attachment #1.1: Type: text/plain, Size: 3233 bytes --]

In the function xfs_file_aio_read, XFS takes the XFS_IOLOCK_SHARED lock for both direct IO and buffered IO:
STATIC ssize_t
xfs_file_aio_read(
	struct kiocb		*iocb,
	const struct iovec	*iovp,
	unsigned long		nr_segs,
	loff_t			pos)
{
	...
	xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);


https://bitbucket.org/hustcat/kernel-2.6.32/src/786d720807052737bb17bc44da9da20554400039/fs/xfs/linux-2.6/xfs_file.c?at=master#cl-281


So a write will block concurrent reads in XFS.


However, generic_file_aio_read, which ext3 uses for buffered reads, does not take inode->i_mutex, so a write will not block reads in ext3.


I think this may be the reason for XFS's poor buffered IO performance. I do not know whether this is a bug or a design flaw in XFS.




Best regards,
yy


Original Message
From: yy <yy@xspring.net>
To: xfs <xfs@oss.sgi.com>; Eric Sandeen <sandeen@sandeen.net>; bfoster@redhat.com
Date: Thursday, February 12, 2015, 13:30
Subject: Re: XFS buffer IO performance is very poor


Brian and Eric,


Thanks very much for your reply.


I changed the partition start position to a 256K boundary, but performance is still poor; no change.
# fdisk -ul /dev/sdb          
 Device Boot   Start     End   Blocks  Id System
/dev/sdb1       512 2929356359 1464677924  83 Linux


I checked the XFS code, and I agree that the XFS_IOLOCK_EXCL lock may be the reason:
https://bitbucket.org/hustcat/kernel-2.6.32/src/786d720807052737bb17bc44da9da20554400039/fs/xfs/linux-2.6/xfs_file.c?at=master#cl-714
STATIC ssize_t
xfs_file_buffered_aio_write(
	struct kiocb		*iocb,
	const struct iovec	*iovp,
	unsigned long		nr_segs,
	loff_t			pos,
	size_t			ocount)
{
	struct file		*file = iocb->ki_filp;
	struct address_space	*mapping = file->f_mapping;
	struct inode		*inode = mapping->host;
	struct xfs_inode	*ip = XFS_I(inode);
	ssize_t			ret;
	int			enospc = 0;
	int			iolock = XFS_IOLOCK_EXCL;
	size_t			count = ocount;

	xfs_rw_ilock(ip, iolock);

	ret = xfs_file_aio_write_checks(file, &pos, &count, &iolock);
	if (ret)
	...




However, I found that ext3 also takes a mutex for buffered IO:
https://bitbucket.org/hustcat/kernel-2.6.32/src/786d720807052737bb17bc44da9da20554400039/mm/filemap.c?at=master#cl-2642
ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
		unsigned long nr_segs, loff_t pos)
{
	struct file *file = iocb->ki_filp;
	struct inode *inode = file->f_mapping->host;
	ssize_t ret;

	BUG_ON(iocb->ki_pos != pos);

	sb_start_write(inode->i_sb);
	mutex_lock(&inode->i_mutex);
	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
	mutex_unlock(&inode->i_mutex);


I still don’t understand why ext3 does not have this problem with buffered IO.


Best regards,
yy


Original Message
From: Eric Sandeen <sandeen@sandeen.net>
To: yy <yy@xspring.net>; xfs <xfs@oss.sgi.com>
Date: Thursday, February 12, 2015, 00:08
Subject: Re: XFS buffer IO performance is very poor


On 2/11/15 1:39 AM, yy wrote:

<snip>

(In addition to Brian's questions):

> XFS format parametes:
> #mkfs.xfs -d su=256k,sw=5 /dev/sdb1
> #cat /proc/mounts
> /dev/sdb1 /data1 xfs rw,noatime,attr2,delaylog,nobarrier,logbsize=256k,sunit=512,swidth=2560,noquota 0 0
> #fdisk -ul
> Device Boot   Start     End   Blocks  Id System
> /dev/sdb1       128 2929356359 1464678116  83 Linux

so 128*512 = 64k; your partition doesn't start on a 256k stripe unit
boundary, right?  Shouldn't it do so?

-Eric

[-- Attachment #1.2: Type: text/html, Size: 14213 bytes --]


^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: XFS buffer IO performance is very poor
@ 2015-02-13  2:20 yy
  2015-02-13 13:46 ` Carlos Maiolino
  0 siblings, 1 reply; 8+ messages in thread
From: yy @ 2015-02-13  2:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: bfoster, Eric Sandeen, xfs


[-- Attachment #1.1: Type: text/plain, Size: 2113 bytes --]

Dave,
Thank you very much for your explanation.


I hit this issue when running MySQL on XFS. Direct IO is very important for MySQL on XFS, but I can't find any documentation explaining this problem. It may cause great confusion for other MySQL users as well, so perhaps this behavior should be explained in the XFS documentation.
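For what it's worth, the usual MySQL-side workaround is to have InnoDB open its data files with direct IO itself; a minimal my.cnf fragment (the option names are standard InnoDB settings, the buffer pool size is just an example value to be tuned to the machine):

```ini
[mysqld]
# Bypass the page cache so InnoDB's own buffer pool does the caching,
# avoiding the buffered-IO exclusive locking discussed in this thread.
innodb_flush_method = O_DIRECT
# Example value only; size the buffer pool to the available RAM.
innodb_buffer_pool_size = 16G
```

With O_DIRECT, coherency between overlapping reads and writes becomes the application's responsibility, which InnoDB already handles internally.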


Best regards,
yy


Original Message
From: Dave Chinner <david@fromorbit.com>
To: yy <yy@xspring.net>
Cc: xfs <xfs@oss.sgi.com>; Eric Sandeen <sandeen@sandeen.net>; bfoster <bfoster@redhat.com>
Date: Friday, February 13, 2015, 05:04
Subject: Re: XFS buffer IO performance is very poor


On Thu, Feb 12, 2015 at 02:59:52PM +0800, yy wrote:
> In function xfs_file_aio_read, will request XFS_IOLOCK_SHARED lock
> for both direct IO and buffered IO:
> so write will prevent read in XFS.
>
> However, in function generic_file_aio_read for ext3, will not
> lock inode->i_mutex, so write will not prevent read in ext3.
>
> I think this maybe the reason of poor performance for XFS. I do
> not know if this is a bug, or design flaws of XFS.

This is a bug and design flaw in ext3, and most other Linux
filesystems. POSIX states that write() must execute atomically, and so
no concurrent operation that reads or modifies data should see a
partial write.

The Linux page cache doesn't enforce this - a read to the same range
as a write can return partially written data on page granularity, as
read/write only serialise on page locks in the page cache.

XFS is the only Linux filesystem that actually follows the POSIX
requirements here - the shared/exclusive locking guarantees that a
buffered write completes wholly before a read is allowed to access the
data.

There is a down side - you can't run concurrent buffered reads and
writes to the same file. If you need to do that, then that's what
direct IO is for, and coherency between overlapping reads and writes
is then the application's problem, not the filesystem's...

Maybe at some point in the future we might address this with ranged IO
locks, but there really aren't many multithreaded programs that hit
this issue...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

[-- Attachment #1.2: Type: text/html, Size: 3532 bytes --]


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-02-13 13:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-11  7:39 XFS buffer IO performance is very poor yy
2015-02-11 13:35 ` Brian Foster
2015-02-11 16:08 ` Eric Sandeen
  -- strict thread matches above, loose matches on Subject: below --
2015-02-12  5:30 yy
2015-02-12  6:59 yy
2015-02-12 21:04 ` Dave Chinner
2015-02-13  2:20 yy
2015-02-13 13:46 ` Carlos Maiolino

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox