From: Keyur Govande
Subject: XFS fragmentation on file append
Date: Mon, 7 Apr 2014 18:53:46 -0400
To: linux-fsdevel@vger.kernel.org

Hello,

I'm currently investigating a MySQL performance degradation on XFS due to file fragmentation.

The box has 12 cores and a 16-drive RAID 10 array with a 1GB battery-backed cache. xfs_info shows:

meta-data=/dev/sda4              isize=256    agcount=24, agsize=24024992 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=576599552, imaxpct=5
         =                       sunit=16     swidth=512 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=281552, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

The kernel version is 3.14.0-1.el6.elrepo.x86_64, and the XFS partition is mounted with rw,noatime,allocsize=128m,inode64,swalloc. The partition is 2TB in size and 40% full to simulate production.

Here's a test program that appends 512KB at a time like MySQL does (write and then fsync). To exacerbate the issue, it loops a bunch of times:
https://gist.github.com/keyurdg/961c19175b81c73fdaa3

When run, this creates ~9500 extents, most of length 1024. cat'ing the file to /dev/null after dropping the caches reads at an average of 75 MBps, way less than the hardware is capable of.

When I add a posix_fallocate before calling pwrite(), as shown here
https://gist.github.com/keyurdg/eb504864d27ebfe7b40a
the file fragments over two orders of magnitude less (~30 extents), and cat'ing to /dev/null proceeds at ~1GBps.
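For readers without access to the gists, the append pattern described above can be sketched roughly as follows. This is not the code from either gist: the helper name append_chunks, the prealloc switch, and the choice to preallocate exactly one 512KB chunk per iteration are assumptions made for illustration, and the real test program may size or place its posix_fallocate() call differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE ((size_t)512 * 1024) /* 512KB per append, as in the description */

/*
 * Hypothetical helper: append `count` chunks to `path`, calling fsync()
 * after each one. If `prealloc` is nonzero, reserve the chunk's byte range
 * with posix_fallocate() before writing it (assumed placement; the gist
 * may preallocate differently).
 * Returns the number of bytes written, or -1 on error.
 */
static off_t append_chunks(const char *path, int count, int prealloc)
{
    assert(count >= 0);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    char *buf = malloc(CHUNK_SIZE);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    memset(buf, 'x', CHUNK_SIZE);

    off_t offset = 0;
    int failed = 0;

    for (int i = 0; i < count && !failed; i++) {
        /* posix_fallocate() returns an error number directly, 0 on success. */
        if (prealloc && posix_fallocate(fd, offset, (off_t)CHUNK_SIZE) != 0) {
            failed = 1;
            break;
        }

        /* pwrite() may write fewer bytes than requested, so loop. */
        size_t done = 0;
        while (done < CHUNK_SIZE) {
            ssize_t n = pwrite(fd, buf + done, CHUNK_SIZE - done,
                               offset + (off_t)done);
            if (n <= 0) {
                failed = 1;
                break;
            }
            done += (size_t)n;
        }

        /* Flush each chunk to disk before appending the next one. */
        if (!failed && fsync(fd) != 0)
            failed = 1;
        if (!failed)
            offset += (off_t)CHUNK_SIZE;
    }

    free(buf);
    if (close(fd) != 0)
        failed = 1;
    return failed ? -1 : offset;
}
```

Calling append_chunks() once with prealloc set to 0 and once with it set to 1, on two different files on the same XFS partition, should give two files whose extent counts can then be compared, for instance with xfs_bmap -v.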
The same behavior is seen even when the allocsize option is removed and the partition remounted.

This is somewhat unexpected. I'm working on a patch to add fallocate to MySQL, but wanted to check here first in case I'm missing anything obvious.

Cheers,
Keyur.