* extremely slow file creation/deletion after xfs ran full
@ 2015-01-12 8:36 Carsten Aulbert
2015-01-12 12:44 ` Brian Foster
2015-01-12 13:30 ` Carsten Aulbert
0 siblings, 2 replies; 14+ messages in thread
From: Carsten Aulbert @ 2015-01-12 8:36 UTC (permalink / raw)
To: xfs
Hi
a couple of days ago, one of our XFS pools ran full which did not
realize due to some bogged monitoring.
After deleting files, the FS is not filled to about 80-85% and file
creation as well as deletion is extremely slow:
$ cd /srv
$ time touch test
touch test 0.00s user 0.17s system 5% cpu 2.993 total
$ time touch test
touch test 0.00s user 0.00s system 0% cpu 0.002 total
$ time rm test
rm test 0.00s user 0.00s system 0% cpu 4.954 total
As we had seen some XFS error messages like XFS: Internal error
XFS_WANT_CORRUPTED_GOTO[1] shortly before the file system was completely
filled, we umounted it and ran successfully xfs_repair on it[2].
We started to dig into fragmentation levels, but to be honest, I'm not
too sure how to interpret the numbers. All seem pretty innocent to me
except the very high directory fragmentation:
# mount
[...]
/dev/sdb1 on /srv type xfs (rw,relatime,attr2,inode64,noquota)
# uname -a
Linux einstein4 3.14.24-atlas #1 SMP Wed Nov 19 08:29:23 UTC 2014 x86_64
GNU/Linux
(after the reboot to add more RAM, before 3.14.11)
# df -h /dev/sdb1
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 15T 12T 2.7T 82% /srv
# df -i /dev/sdb1
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sdb1 3105468352 554441361 2551026991 18% /srv
# xfs_info /dev/sdb1
meta-data=/dev/sdb1 isize=256 agcount=15,
agsize=268435455 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=3881835478, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=521728, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
(underneath is a Areca backed 16 disk raid 10 and yes, possibly we
should have gone to more AGs, hindsight is great ;))
# xfs_db -r -c "frag" /dev/sdb1
actual 562488447, ideal 553668062, fragmentation factor 1.57%
# xfs_db -r -c "freesp" /dev/sdb1
from to extents blocks pct
1 1 9669664 9669664 1.36
2 3 5634086 12956862 1.83
4 7 1720988 8320694 1.17
8 15 314377 3088702 0.44
16 31 231279 4941737 0.70
32 63 10501367 372068430 52.45
64 127 2604359 204737946 28.86
128 255 377595 63623350 8.97
256 511 58744 20198673 2.85
512 1023 8843 5759074 0.81
1024 2047 1044 1418464 0.20
2048 4095 255 685540 0.10
4096 8191 81 453953 0.06
8192 16383 14 126863 0.02
16384 32767 18 373876 0.05
32768 65535 11 435041 0.06
65536 131071 6 458767 0.06
# xfs_db -r -c "frag -d" /dev/sdb1
actual 7987218, ideal 67532, fragmentation factor 99.15%
# xfs_db -r -c "frag -f" /dev/sdb1
actual 555079796, ideal 554171759, fragmentation factor 0.16%
Finally, we still xfs error messages in the kernel logs[3].
Questions:
* Would xfs_fsr be able to cure this problem?
* Would moving to xfsprogs 3.2 and kernel > 3.15 with on disk format 5
plus crc help to protect against this "wear"?
Cheers
Carsten
PS: Please Cc me, as I'm not subscribed ATM.
[1]Jan 7 20:58:47 einstein4 kernel: [14029681.124548] XFS: Internal
error XFS_WANT_CORRUPTED_GOTO at line 1419 of file fs/xfs/xfs_alloc.c.
Caller 0xffffffffa023594d
Jan 7 20:58:47 einstein4 kernel: [14029681.124657] CPU: 5 PID: 13437
Comm: kworker/5:1 Not tainted 3.14.11-atlas #1
Jan 7 20:58:47 einstein4 kernel: [14029681.124745] Hardware name:
Supermicro X7DVL/X7DVL, BIOS 6.00 11/02/2007
Jan 7 20:58:47 einstein4 kernel: [14029681.124860] Workqueue: xfsalloc
xfs_bmapi_allocate_worker [xfs]
Jan 7 20:58:47 einstein4 kernel: [14029681.124977] 0000000000000000
ffff88013b7ebc80 ffffffff814e9555 ffff88017022bd30
Jan 7 20:58:47 einstein4 kernel: [14029681.125247] ffffffffa0234aed
ffff88041877c380 0000000000000000 000000000000000c
Jan 7 20:58:47 einstein4 kernel: [14029681.125514] 0000000000000000
0000000100000000 0ff0b98800000000 0000000100000002
Jan 7 20:58:47 einstein4 kernel: [14029681.125783] Call Trace:
Jan 7 20:58:47 einstein4 kernel: [14029681.125856]
[<ffffffff814e9555>] ? dump_stack+0x41/0x51
Jan 7 20:58:47 einstein4 kernel: [14029681.125937]
[<ffffffffa0234aed>] ? xfs_alloc_ag_vextent_size+0x33d/0x650 [xfs]
Jan 7 20:58:47 einstein4 kernel: [14029681.126035]
[<ffffffffa023594d>] ? xfs_alloc_ag_vextent+0x9d/0x100 [xfs]
Jan 7 20:58:47 einstein4 kernel: [14029681.126118]
[<ffffffffa02366c5>] ? xfs_alloc_vextent+0x485/0x610 [xfs]
Jan 7 20:58:47 einstein4 kernel: [14029681.126202]
[<ffffffffa0245ff4>] ? xfs_bmap_btalloc+0x2d4/0x790 [xfs]
Jan 7 20:58:47 einstein4 kernel: [14029681.126286]
[<ffffffffa0246ab1>] ? __xfs_bmapi_allocate+0xd1/0x2e0 [xfs]
Jan 7 20:58:47 einstein4 kernel: [14029681.126366]
[<ffffffffa0216375>] ? xfs_bmapi_allocate_worker+0x35/0x70 [xfs]
Jan 7 20:58:47 einstein4 kernel: [14029681.126456]
[<ffffffff8107d5db>] ? process_one_work+0x16b/0x400
Jan 7 20:58:47 einstein4 kernel: [14029681.126529]
[<ffffffff8107e9c6>] ? worker_thread+0x116/0x380
Jan 7 20:58:47 einstein4 kernel: [14029681.126601]
[<ffffffff8107e8b0>] ? manage_workers.isra.21+0x290/0x290
Jan 7 20:58:47 einstein4 kernel: [14029681.126675]
[<ffffffff81084bcc>] ? kthread+0xbc/0xe0
Jan 7 20:58:47 einstein4 kernel: [14029681.126745]
[<ffffffff81084b10>] ? flush_kthread_worker+0x80/0x80
Jan 7 20:58:47 einstein4 kernel: [14029681.126819]
[<ffffffff814f724c>] ? ret_from_fork+0x7c/0xb0
Jan 7 20:58:47 einstein4 kernel: [14029681.126890]
[<ffffffff81084b10>] ? flush_kthread_worker+0x80/0x80
Jan 7 20:58:47 einstein4 kernel: [14029681.126973] XFS (sdb1): page
discard on page ffffea000b973650, inode 0x8e56233bd, offset 172167168.
[2] multiple times until we had enough RAM installed into the machine
(24GB+swap space finally made the difference)
[3] e.g.
[170369.029932] ------------[ cut here ]------------
[170369.029940] WARNING: CPU: 4 PID: 65 at fs/xfs/xfs_aops.c:1184
xfs_vm_releasepage+0xb5/0xc0 [xfs]()
[170369.029942] Modules linked in: dm_mod nfsv3 tun xt_nat autofs4
ipt_REJECT ipt_MASQUERADE xt_state iptable_nat nf_nat_ipv4 nf_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_filter ip_tables
xt_tcpudp x_tables nfsd auth_rpcgss oid_registry nfs_acl nfs lockd
fscache sunrpc ext4 crc16 mbcache jbd2 ipmi_poweroff ipmi_watchdog
ipmi_devintf ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support ttm
drm_kms_helper drm coretemp i5000_edac edac_core lpc_ich i5k_amb
mfd_core ioatdma sr_mod cdrom i2c_algo_bit i2c_i801 rng_core i2c_core
joydev dca psmouse ehci_pci snd_pcm snd_timer snd soundcore pcspkr
serio_raw shpchp processor parport_pc parport evdev kvm thermal_sys
button xfs crc32c libcrc32c md_mod usb_storage hid_generic usbhid hid
sd_mod sg crc_t10dif crct10dif_common floppy ahci libahci libata
uhci_hcd e1000 ehci_hcd arcmsr usbcore usb_common scsi_mod e1000e ptp
pps_core
[170369.029986] CPU: 4 PID: 65 Comm: kswapd0 Tainted: G W
3.14.24-atlas #1
[170369.029986] Hardware name: Supermicro X7DVL/X7DVL, BIOS 6.00 11/02/2007
[170369.029987] 0000000000000009 ffff88060d461a18 ffffffff81529af8
0000000000000000
[170369.029989] 0000000000000000 ffff88060d461a58 ffffffff8106a37c
ffff880610389d40
[170369.029991] ffff88060d461b28 ffffea000f39f8a0 ffff8805f722dd98
ffffea000f39f8c0
[170369.029994] Call Trace:
[170369.029994] [<ffffffff81529af8>] dump_stack+0x46/0x58
[170369.029996] [<ffffffff8106a37c>] warn_slowpath_common+0x8c/0xc0
[170369.029998] [<ffffffff8106a3ca>] warn_slowpath_null+0x1a/0x20
[170369.030000] [<ffffffffa0212225>] xfs_vm_releasepage+0xb5/0xc0 [xfs]
[170369.030009] [<ffffffff81182a10>] ? anon_vma_prepare+0x140/0x140
[170369.030011] [<ffffffff81149902>] try_to_release_page+0x32/0x50
[170369.030014] [<ffffffff8115ed7a>] shrink_active_list+0x3aa/0x3b0
[170369.030016] [<ffffffff8115f154>] shrink_lruvec+0x3d4/0x6a0
[170369.030017] [<ffffffff811aa001>] ?
__mem_cgroup_largest_soft_limit_node+0x21/0xd0
[170369.030019] [<ffffffff811aa001>] ?
__mem_cgroup_largest_soft_limit_node+0x21/0xd0
[170369.030021] [<ffffffff8115f486>] shrink_zone+0x66/0x1a0
[170369.030023] [<ffffffff8116065c>] balance_pgdat+0x3ac/0x600
[170369.030025] [<ffffffff81160a34>] kswapd+0x184/0x470
[170369.030027] [<ffffffff810ae460>] ? finish_wait+0x80/0x80
[170369.030028] [<ffffffff811608b0>] ? balance_pgdat+0x600/0x600
[170369.030030] [<ffffffff8108da79>] kthread+0xc9/0xe0
[170369.030032] [<ffffffff8108d9b0>] ? flush_kthread_worker+0x80/0x80
[170369.030034] [<ffffffff81538cfc>] ret_from_fork+0x7c/0xb0
[170369.030035] [<ffffffff8108d9b0>] ? flush_kthread_worker+0x80/0x80
[170369.030038] ---[ end trace 9f4d705dbe4e039c ]---
--
Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics
Callinstrasse 38, 30167 Hannover, Germany
phone/fax: +49 511 762-17185 / -17193
https://wiki.atlas.aei.uni-hannover.de/foswiki/bin/view/ATLAS/WebHome
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: extremely slow file creation/deletion after xfs ran full 2015-01-12 8:36 extremely slow file creation/deletion after xfs ran full Carsten Aulbert @ 2015-01-12 12:44 ` Brian Foster 2015-01-12 13:30 ` Carsten Aulbert 1 sibling, 0 replies; 14+ messages in thread From: Brian Foster @ 2015-01-12 12:44 UTC (permalink / raw) To: Carsten Aulbert; +Cc: xfs On Mon, Jan 12, 2015 at 09:36:49AM +0100, Carsten Aulbert wrote: > Hi > > a couple of days ago, one of our XFS pools ran full which did not > realize due to some bogged monitoring. > > After deleting files, the FS is not filled to about 80-85% and file > creation as well as deletion is extremely slow: > > $ cd /srv > $ time touch test > touch test 0.00s user 0.17s system 5% cpu 2.993 total > $ time touch test > touch test 0.00s user 0.00s system 0% cpu 0.002 total > $ time rm test > rm test 0.00s user 0.00s system 0% cpu 4.954 total > Based on the size and consumption of the fs, first thing that comes to mind is perhaps fragmented use of inode records. E.g., inode records are spread all over the storage with handfulls of inodes free here and there, which means individual inode allocation can take a hit searching for the record with free inodes. I don't think that explains rm performance though. It might be interesting to grab a couple perf traces of the touch and rm commands and see what cpu usage looks like. E.g., 'perf record -g touch <file>,' 'perf report -g,' etc. > As we had seen some XFS error messages like XFS: Internal error > XFS_WANT_CORRUPTED_GOTO[1] shortly before the file system was completely > filled, we umounted it and ran successfully xfs_repair on it[2]. > Interesting, that is: XFS_WANT_CORRUPTED_GOTO(rlen <= flen, error0); ... which seems strange given that rlen should have been previously trimmed. Taking a closer look at xfs_alloc_fix_len() reminded me of the following commit, which actually documents and fixes instances of this error, so that might be the most likely culprit: 30265117 xfs: Fix rounding in xfs_alloc_fix_len() That originally went into 3.16 and I don't see it in the 3.14 stable branch. Did xfs_repair actually report anything wrong? > We started to dig into fragmentation levels, but to be honest, I'm not > too sure how to interpret the numbers. All seem pretty innocent to me > except the very high directory fragmentation: > > # mount > [...] > /dev/sdb1 on /srv type xfs (rw,relatime,attr2,inode64,noquota) > > # uname -a > Linux einstein4 3.14.24-atlas #1 SMP Wed Nov 19 08:29:23 UTC 2014 x86_64 > GNU/Linux > > (after the reboot to add more RAM, before 3.14.11) > # df -h /dev/sdb1 > Filesystem Size Used Avail Use% Mounted on > /dev/sdb1 15T 12T 2.7T 82% /srv > > # df -i /dev/sdb1 > Filesystem Inodes IUsed IFree IUse% Mounted on > /dev/sdb1 3105468352 554441361 2551026991 18% /srv > > # xfs_info /dev/sdb1 > meta-data=/dev/sdb1 isize=256 agcount=15, > agsize=268435455 blks > = sectsz=512 attr=2 > data = bsize=4096 blocks=3881835478, imaxpct=5 > = sunit=0 swidth=0 blks > naming =version 2 bsize=4096 ascii-ci=0 > log =internal bsize=4096 blocks=521728, version=2 > = sectsz=512 sunit=0 blks, lazy-count=1 > realtime =none extsz=4096 blocks=0, rtextents=0 > > (underneath is a Areca backed 16 disk raid 10 and yes, possibly we > should have gone to more AGs, hindsight is great ;)) > > # xfs_db -r -c "frag" /dev/sdb1 > actual 562488447, ideal 553668062, fragmentation factor 1.57% > # xfs_db -r -c "freesp" /dev/sdb1 > from to extents blocks pct > 1 1 9669664 9669664 1.36 > 2 3 5634086 12956862 1.83 > 4 7 1720988 8320694 1.17 > 8 15 314377 3088702 0.44 > 16 31 231279 4941737 0.70 > 32 63 10501367 372068430 52.45 > 64 127 2604359 204737946 28.86 > 128 255 377595 63623350 8.97 > 256 511 58744 20198673 2.85 > 512 1023 8843 5759074 0.81 > 1024 2047 1044 1418464 0.20 > 2048 4095 255 685540 0.10 > 4096 8191 81 453953 0.06 > 8192 16383 14 126863 0.02 > 16384 32767 18 373876 0.05 > 32768 65535 11 435041 0.06 > 65536 131071 6 458767 0.06 > > # xfs_db -r -c "frag -d" /dev/sdb1 > actual 7987218, ideal 67532, fragmentation factor 99.15% > # xfs_db -r -c "frag -f" /dev/sdb1 > actual 555079796, ideal 554171759, fragmentation factor 0.16% > > Finally, we still xfs error messages in the kernel logs[3]. > > Questions: > * Would xfs_fsr be able to cure this problem? > * Would moving to xfsprogs 3.2 and kernel > 3.15 with on disk format 5 > plus crc help to protect against this "wear"? > It seems like you have sufficiently large and available free space. That said, it's fairly common for filesytems to naturally drop in performance as free space becomes more limited. E.g., I think it's common practice to avoid regular usage while over 80% used if performance is a major concern. Also, I doubt xfs_fsr will do much to affect inode allocation performance, but I could be wrong. With regard to v5, I don't think there is anything that will affect "wear" as such. Perhaps there are speculative preallocation fixes that might affect long term allocation patterns, depending on your workload. CRC support provides extra metadata validation. v5 also includes free inode btree support, which can improve inode allocation performance in the inode record fragmentation scenario described above. Brian > Cheers > > Carsten > > PS: Please Cc me, as I'm not subscribed ATM. > > > [1]Jan 7 20:58:47 einstein4 kernel: [14029681.124548] XFS: Internal > error XFS_WANT_CORRUPTED_GOTO at line 1419 of file fs/xfs/xfs_alloc.c. > Caller 0xffffffffa023594d > Jan 7 20:58:47 einstein4 kernel: [14029681.124657] CPU: 5 PID: 13437 > Comm: kworker/5:1 Not tainted 3.14.11-atlas #1 > Jan 7 20:58:47 einstein4 kernel: [14029681.124745] Hardware name: > Supermicro X7DVL/X7DVL, BIOS 6.00 11/02/2007 > Jan 7 20:58:47 einstein4 kernel: [14029681.124860] Workqueue: xfsalloc > xfs_bmapi_allocate_worker [xfs] > Jan 7 20:58:47 einstein4 kernel: [14029681.124977] 0000000000000000 > ffff88013b7ebc80 ffffffff814e9555 ffff88017022bd30 > Jan 7 20:58:47 einstein4 kernel: [14029681.125247] ffffffffa0234aed > ffff88041877c380 0000000000000000 000000000000000c > Jan 7 20:58:47 einstein4 kernel: [14029681.125514] 0000000000000000 > 0000000100000000 0ff0b98800000000 0000000100000002 > Jan 7 20:58:47 einstein4 kernel: [14029681.125783] Call Trace: > Jan 7 20:58:47 einstein4 kernel: [14029681.125856] > [<ffffffff814e9555>] ? dump_stack+0x41/0x51 > Jan 7 20:58:47 einstein4 kernel: [14029681.125937] > [<ffffffffa0234aed>] ? xfs_alloc_ag_vextent_size+0x33d/0x650 [xfs] > Jan 7 20:58:47 einstein4 kernel: [14029681.126035] > [<ffffffffa023594d>] ? xfs_alloc_ag_vextent+0x9d/0x100 [xfs] > Jan 7 20:58:47 einstein4 kernel: [14029681.126118] > [<ffffffffa02366c5>] ? xfs_alloc_vextent+0x485/0x610 [xfs] > Jan 7 20:58:47 einstein4 kernel: [14029681.126202] > [<ffffffffa0245ff4>] ? xfs_bmap_btalloc+0x2d4/0x790 [xfs] > Jan 7 20:58:47 einstein4 kernel: [14029681.126286] > [<ffffffffa0246ab1>] ? __xfs_bmapi_allocate+0xd1/0x2e0 [xfs] > Jan 7 20:58:47 einstein4 kernel: [14029681.126366] > [<ffffffffa0216375>] ? xfs_bmapi_allocate_worker+0x35/0x70 [xfs] > Jan 7 20:58:47 einstein4 kernel: [14029681.126456] > [<ffffffff8107d5db>] ? process_one_work+0x16b/0x400 > Jan 7 20:58:47 einstein4 kernel: [14029681.126529] > [<ffffffff8107e9c6>] ? worker_thread+0x116/0x380 > Jan 7 20:58:47 einstein4 kernel: [14029681.126601] > [<ffffffff8107e8b0>] ? manage_workers.isra.21+0x290/0x290 > Jan 7 20:58:47 einstein4 kernel: [14029681.126675] > [<ffffffff81084bcc>] ? kthread+0xbc/0xe0 > Jan 7 20:58:47 einstein4 kernel: [14029681.126745] > [<ffffffff81084b10>] ? flush_kthread_worker+0x80/0x80 > Jan 7 20:58:47 einstein4 kernel: [14029681.126819] > [<ffffffff814f724c>] ? ret_from_fork+0x7c/0xb0 > Jan 7 20:58:47 einstein4 kernel: [14029681.126890] > [<ffffffff81084b10>] ? flush_kthread_worker+0x80/0x80 > Jan 7 20:58:47 einstein4 kernel: [14029681.126973] XFS (sdb1): page > discard on page ffffea000b973650, inode 0x8e56233bd, offset 172167168. > > [2] multiple times until we had enough RAM installed into the machine > (24GB+swap space finally made the difference) > > [3] e.g. > [170369.029932] ------------[ cut here ]------------ > [170369.029940] WARNING: CPU: 4 PID: 65 at fs/xfs/xfs_aops.c:1184 > xfs_vm_releasepage+0xb5/0xc0 [xfs]() > [170369.029942] Modules linked in: dm_mod nfsv3 tun xt_nat autofs4 > ipt_REJECT ipt_MASQUERADE xt_state iptable_nat nf_nat_ipv4 nf_nat > nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_filter ip_tables > xt_tcpudp x_tables nfsd auth_rpcgss oid_registry nfs_acl nfs lockd > fscache sunrpc ext4 crc16 mbcache jbd2 ipmi_poweroff ipmi_watchdog > ipmi_devintf ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support ttm > drm_kms_helper drm coretemp i5000_edac edac_core lpc_ich i5k_amb > mfd_core ioatdma sr_mod cdrom i2c_algo_bit i2c_i801 rng_core i2c_core > joydev dca psmouse ehci_pci snd_pcm snd_timer snd soundcore pcspkr > serio_raw shpchp processor parport_pc parport evdev kvm thermal_sys > button xfs crc32c libcrc32c md_mod usb_storage hid_generic usbhid hid > sd_mod sg crc_t10dif crct10dif_common floppy ahci libahci libata > uhci_hcd e1000 ehci_hcd arcmsr usbcore usb_common scsi_mod e1000e ptp > pps_core > [170369.029986] CPU: 4 PID: 65 Comm: kswapd0 Tainted: G W > 3.14.24-atlas #1 > [170369.029986] Hardware name: Supermicro X7DVL/X7DVL, BIOS 6.00 11/02/2007 > [170369.029987] 0000000000000009 ffff88060d461a18 ffffffff81529af8 > 0000000000000000 > [170369.029989] 0000000000000000 ffff88060d461a58 ffffffff8106a37c > ffff880610389d40 > [170369.029991] ffff88060d461b28 ffffea000f39f8a0 ffff8805f722dd98 > ffffea000f39f8c0 > [170369.029994] Call Trace: > [170369.029994] [<ffffffff81529af8>] dump_stack+0x46/0x58 > [170369.029996] [<ffffffff8106a37c>] warn_slowpath_common+0x8c/0xc0 > [170369.029998] [<ffffffff8106a3ca>] warn_slowpath_null+0x1a/0x20 > [170369.030000] [<ffffffffa0212225>] xfs_vm_releasepage+0xb5/0xc0 [xfs] > [170369.030009] [<ffffffff81182a10>] ? anon_vma_prepare+0x140/0x140 > [170369.030011] [<ffffffff81149902>] try_to_release_page+0x32/0x50 > [170369.030014] [<ffffffff8115ed7a>] shrink_active_list+0x3aa/0x3b0 > [170369.030016] [<ffffffff8115f154>] shrink_lruvec+0x3d4/0x6a0 > [170369.030017] [<ffffffff811aa001>] ? > __mem_cgroup_largest_soft_limit_node+0x21/0xd0 > [170369.030019] [<ffffffff811aa001>] ? > __mem_cgroup_largest_soft_limit_node+0x21/0xd0 > [170369.030021] [<ffffffff8115f486>] shrink_zone+0x66/0x1a0 > [170369.030023] [<ffffffff8116065c>] balance_pgdat+0x3ac/0x600 > [170369.030025] [<ffffffff81160a34>] kswapd+0x184/0x470 > [170369.030027] [<ffffffff810ae460>] ? finish_wait+0x80/0x80 > [170369.030028] [<ffffffff811608b0>] ? balance_pgdat+0x600/0x600 > [170369.030030] [<ffffffff8108da79>] kthread+0xc9/0xe0 > [170369.030032] [<ffffffff8108d9b0>] ? flush_kthread_worker+0x80/0x80 > [170369.030034] [<ffffffff81538cfc>] ret_from_fork+0x7c/0xb0 > [170369.030035] [<ffffffff8108d9b0>] ? flush_kthread_worker+0x80/0x80 > [170369.030038] ---[ end trace 9f4d705dbe4e039c ]--- > > -- > Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics > Callinstrasse 38, 30167 Hannover, Germany > phone/fax: +49 511 762-17185 / -17193 > https://wiki.atlas.aei.uni-hannover.de/foswiki/bin/view/ATLAS/WebHome > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: extremely slow file creation/deletion after xfs ran full 2015-01-12 8:36 extremely slow file creation/deletion after xfs ran full Carsten Aulbert 2015-01-12 12:44 ` Brian Foster @ 2015-01-12 13:30 ` Carsten Aulbert 2015-01-12 15:52 ` Brian Foster 1 sibling, 1 reply; 14+ messages in thread From: Carsten Aulbert @ 2015-01-12 13:30 UTC (permalink / raw) To: xfs [-- Attachment #1: Type: text/plain, Size: 2350 bytes --] Hi again (sorry that I reply to my own email, rather than Brian's, but I've only just subscribed to the list), I'll try to address Brian's email in here with fake quotation. Sorry for breaking the threading :( Brian Foster wrote: > Based on the size and consumption of the fs, first thing that comes > to mind is perhaps fragmented use of inode records. E.g., inode > records are spread all over the storage with handfulls of inodes > free here and there, which means individual inode allocation can > take a hit searching for the record with free inodes. I don't think > that explains rm performance though. > It might be interesting to grab a couple perf traces of the touch > and rm commands and see what cpu usage looks like. E.g., 'perf > record -g touch <file>,' 'perf report -g,' etc. I've attached both perf outputs and after reviewing them briefly I think slowness is caused by different means, i.e. only the touch one is in xfs' territory. > 30265117 xfs: Fix rounding in xfs_alloc_fix_len() > > That originally went into 3.16 and I don't see it in the 3.14 stable > branch. Did xfs_repair actually report anything wrong? Nope, only displayed all the stages, but nothing was fixed. > It seems like you have sufficiently large and available free > space. That said, it's fairly common for filesytems to naturally > drop in performance as free space becomes more limited. E.g., I > think it's common practice to avoid regular usage while over 80% > used if performance is a major concern. Also, I doubt xfs_fsr will > do much to affect inode allocation performance, but I could be > wrong. Yes, we should have monitored that mount point rather than /tmp which we did when bad things happened(TM). Given that we have a high fragmentation of directories, would xfs_fsr help here at all? Regarding v5, currently we are copying data off that disk and will create it anew with -m crc=1,finobt=1 on a recent 3.18 kernel. Apart from that I don't know much we can further do to safe-guard us against this happening again (well kepp it below 80% all the time as well). Thanks a lot for the remarks! cheers Carsten -- Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics Callinstrasse 38, 30167 Hannover, Germany phone/fax: +49 511 762-17185 / -17193 https://wiki.atlas.aei.uni-hannover.de/foswiki/bin/view/ATLAS/WebHome [-- Attachment #2: rm.perf.data.gz --] [-- Type: application/gzip, Size: 2822 bytes --] [-- Attachment #3: touch.perf.data.gz --] [-- Type: application/gzip, Size: 14833 bytes --] [-- Attachment #4: Type: text/plain, Size: 121 bytes --] _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: extremely slow file creation/deletion after xfs ran full 2015-01-12 13:30 ` Carsten Aulbert @ 2015-01-12 15:52 ` Brian Foster 2015-01-12 16:09 ` Carsten Aulbert 0 siblings, 1 reply; 14+ messages in thread From: Brian Foster @ 2015-01-12 15:52 UTC (permalink / raw) To: Carsten Aulbert; +Cc: xfs On Mon, Jan 12, 2015 at 02:30:18PM +0100, Carsten Aulbert wrote: > Hi again > > (sorry that I reply to my own email, rather than Brian's, but I've only > just subscribed to the list), I'll try to address Brian's email in here > with fake quotation. Sorry for breaking the threading :( > > Brian Foster wrote: > > Based on the size and consumption of the fs, first thing that comes > > to mind is perhaps fragmented use of inode records. E.g., inode > > records are spread all over the storage with handfulls of inodes > > free here and there, which means individual inode allocation can > > take a hit searching for the record with free inodes. I don't think > > that explains rm performance though. > > > It might be interesting to grab a couple perf traces of the touch > > and rm commands and see what cpu usage looks like. E.g., 'perf > > record -g touch <file>,' 'perf report -g,' etc. > > I've attached both perf outputs and after reviewing them briefly I think > slowness is caused by different means, i.e. only the touch one is in > xfs' territory. > Ok, well if something else causes the rm slowness, the scattered free inode scenario might be more likely. I can't see any symbols associated with the perf output. I suspect because I'm not running on your kernel. It might be better to run 'perf report -g' and copy/paste the stack trace for some of the larger consumers. > > 30265117 xfs: Fix rounding in xfs_alloc_fix_len() > > > > That originally went into 3.16 and I don't see it in the 3.14 stable > > branch. Did xfs_repair actually report anything wrong? > > Nope, only displayed all the stages, but nothing was fixed. > Ok, also seems like further indication of the problem fixed by the above commit. > > It seems like you have sufficiently large and available free > > space. That said, it's fairly common for filesytems to naturally > > drop in performance as free space becomes more limited. E.g., I > > think it's common practice to avoid regular usage while over 80% > > used if performance is a major concern. Also, I doubt xfs_fsr will > > do much to affect inode allocation performance, but I could be > > wrong. > > Yes, we should have monitored that mount point rather than /tmp which we > did when bad things happened(TM). Given that we have a high > fragmentation of directories, would xfs_fsr help here at all? > I don't _think_ that fsr will mess with directories, but I don't know for sure... > Regarding v5, currently we are copying data off that disk and will > create it anew with -m crc=1,finobt=1 on a recent 3.18 kernel. Apart > from that I don't know much we can further do to safe-guard us against > this happening again (well kepp it below 80% all the time as well). > Sounds good. FWIW, something like the following should tell us how many free inodes are available in each ag, and thus whether we have to search for free inodes in existing records rather than allocate new ones: for i in $(seq 0 15); do xfs_db -c "agi $i" -c "p freecount" <dev> done Brian > Thanks a lot for the remarks! > > cheers > > Carsten > > -- > Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics > Callinstrasse 38, 30167 Hannover, Germany > phone/fax: +49 511 762-17185 / -17193 > https://wiki.atlas.aei.uni-hannover.de/foswiki/bin/view/ATLAS/WebHome > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: extremely slow file creation/deletion after xfs ran full 2015-01-12 15:52 ` Brian Foster @ 2015-01-12 16:09 ` Carsten Aulbert 2015-01-12 16:37 ` Brian Foster 0 siblings, 1 reply; 14+ messages in thread From: Carsten Aulbert @ 2015-01-12 16:09 UTC (permalink / raw) To: Brian Foster; +Cc: xfs [-- Attachment #1: Type: text/plain, Size: 1095 bytes --] Hi Brian On 01/12/2015 04:52 PM, Brian Foster wrote: > I can't see any symbols associated with the perf output. I suspect > because I'm not running on your kernel. It might be better to run 'perf > report -g' and copy/paste the stack trace for some of the larger > consumers. > Sorry, I rarely need to use perf and of course forgot that the intermediate output it tightly coupled to the running kernel. Attaching the output of perf report -g here. > > Sounds good. FWIW, something like the following should tell us how many > free inodes are available in each ag, and thus whether we have to search > for free inodes in existing records rather than allocate new ones: > > for i in $(seq 0 15); do > xfs_db -c "agi $i" -c "p freecount" <dev> > done > Another metric :) freecount = 53795884 freecount = 251 freecount = 45 freecount = 381 freecount = 11009 freecount = 6748 freecount = 663 freecount = 595 freecount = 693 freecount = 9089 freecount = 37122 freecount = 2657 freecount = 60497 freecount = 1790275 freecount = 54544 That looks... not really uniform to me. Cheers Carsten [-- Attachment #2: touch.stack.gz --] [-- Type: application/gzip, Size: 3498 bytes --] [-- Attachment #3: rm.stack.gz --] [-- Type: application/gzip, Size: 702 bytes --] [-- Attachment #4: Type: text/plain, Size: 121 bytes --] _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: extremely slow file creation/deletion after xfs ran full 2015-01-12 16:09 ` Carsten Aulbert @ 2015-01-12 16:37 ` Brian Foster 2015-01-12 17:33 ` Carsten Aulbert 0 siblings, 1 reply; 14+ messages in thread From: Brian Foster @ 2015-01-12 16:37 UTC (permalink / raw) To: Carsten Aulbert; +Cc: xfs On Mon, Jan 12, 2015 at 05:09:01PM +0100, Carsten Aulbert wrote: > Hi Brian > > On 01/12/2015 04:52 PM, Brian Foster wrote: > > I can't see any symbols associated with the perf output. I suspect > > because I'm not running on your kernel. It might be better to run 'perf > > report -g' and copy/paste the stack trace for some of the larger > > consumers. > > > > Sorry, I rarely need to use perf and of course forgot that the > intermediate output it tightly coupled to the running kernel. Attaching > the output of perf report -g here. It does look like we're spending most of the time down in xfs_diallog_ag(), which is the algorithm for finding a free inode in the btree for a particular AG when we know that existing records have some free. > > > > Sounds good. FWIW, something like the following should tell us how many > > free inodes are available in each ag, and thus whether we have to search > > for free inodes in existing records rather than allocate new ones: > > > > for i in $(seq 0 15); do > > xfs_db -c "agi $i" -c "p freecount" <dev> > > done > > > Another metric :) > > freecount = 53795884 > freecount = 251 > freecount = 45 > freecount = 381 > freecount = 11009 > freecount = 6748 > freecount = 663 > freecount = 595 > freecount = 693 > freecount = 9089 > freecount = 37122 > freecount = 2657 > freecount = 60497 > freecount = 1790275 > freecount = 54544 > > That looks... not really uniform to me. > No, but it does show that there are a bunch of free inodes scattered throughout the existing records in most of the AGs. The finobt should definitely help avoid the allocation latency when this occurs. It is interesting that you have so many more free inodes in ag 0 (~53m as opposed to several hundreds/thousands in others). What does 'p count' show for each ag? Was this fs grown to the current size over time? Brian > Cheers > > Carsten > > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: extremely slow file creation/deletion after xfs ran full 2015-01-12 16:37 ` Brian Foster @ 2015-01-12 17:33 ` Carsten Aulbert 2015-01-13 20:06 ` Stan Hoeppner 2015-01-13 20:33 ` Dave Chinner 0 siblings, 2 replies; 14+ messages in thread From: Carsten Aulbert @ 2015-01-12 17:33 UTC (permalink / raw) To: Brian Foster; +Cc: xfs [-- Attachment #1: Type: text/plain, Size: 1205 bytes --] Hi On 01/12/2015 05:37 PM, Brian Foster wrote: > No, but it does show that there are a bunch of free inodes scattered > throughout the existing records in most of the AGs. The finobt should > definitely help avoid the allocation latency when this occurs. > That is good to know/hope :) > It is interesting that you have so many more free inodes in ag 0 (~53m > as opposed to several hundreds/thousands in others). What does 'p count' > show for each ag? Was this fs grown to the current size over time? > "p count" seems to "thin out" over ag: count = 513057792 count = 16596224 count = 15387584 count = 14958528 count = 4096960 count = 4340416 count = 4987968 count = 3321792 count = 5041856 count = 5485376 count = 5233088 count = 5810432 count = 5271552 count = 5464000 count = 365440 (if the full print output is interesting, it's attached). The FS was never grown, the machine was installed on November 5th, 2012 and was (ab)using the FS ever since. On average there have been about 1-2 million file creations per day (ranging from a few kByte files to a few 100 kBytes) and also an equally large number of deletions (after some time). Thus overall, a somewhat busy server. Cheers Carsten [-- Attachment #2: ag-print.log --] [-- Type: text/plain, Size: 4175 bytes --] AG 0 magicnum = 0x58414749 versionnum = 1 seqno = 0 length = 268435455 count = 513057792 root = 1286 level = 3 freecount = 53795884 newino = 128 dirino = null unlinked[0-63] = uuid = 00000000-0000-0000-0000-000000000000 lsn = 0 crc = 0 (unchecked) free_root = 0 free_level = 0 AG 1 magicnum = 0x58414749 versionnum = 1 seqno = 1 length = 268435455 count = 16596224 root = 123653 level = 3 freecount = 251 newino = 17696 dirino = null unlinked[0-63] = uuid = 00000000-0000-0000-0000-000000000000 lsn = 0 crc = 0 (unchecked) free_root = 0 free_level = 0 AG 2 magicnum = 0x58414749 versionnum = 1 seqno = 2 length = 268435455 count = 15387584 root = 644 level = 3 freecount = 45 newino = 5081504 dirino = null unlinked[0-63] = uuid = 00000000-0000-0000-0000-000000000000 lsn = 0 crc = 0 (unchecked) free_root = 0 free_level = 0 AG 3 magicnum = 0x58414749 versionnum = 1 seqno = 3 length = 268435455 count = 14958528 root = 317207 level = 3 freecount = 381 newino = 5034112 dirino = null unlinked[0-63] = uuid = 00000000-0000-0000-0000-000000000000 lsn = 0 crc = 0 (unchecked) free_root = 0 free_level = 0 AG 4 magicnum = 0x58414749 versionnum = 1 seqno = 4 length = 268435455 count = 4096960 root = 320520 level = 2 freecount = 11009 newino = 5079872 dirino = null unlinked[0-63] = uuid = 00000000-0000-0000-0000-000000000000 lsn = 0 crc = 0 (unchecked) free_root = 0 free_level = 0 AG 5 magicnum = 0x58414749 versionnum = 1 seqno = 5 length = 268435455 count = 4340416 root = 743 level = 2 freecount = 6748 newino = 5150208 dirino = null unlinked[0-63] = uuid = 00000000-0000-0000-0000-000000000000 lsn = 0 crc = 0 (unchecked) free_root = 0 free_level = 0 AG 6 magicnum = 0x58414749 versionnum = 1 seqno = 6 length = 268435455 count = 4987968 root = 155879 level = 2 freecount = 663 newino = 4937472 dirino = null unlinked[0-63] = uuid = 00000000-0000-0000-0000-000000000000 lsn = 0 crc = 0 (unchecked) free_root = 0 free_level = 0 AG 7 magicnum = 0x58414749 versionnum = 1 seqno = 7 length = 268435455 count = 3321792 root = 543815 level = 2 freecount = 595 newino = 13695232 dirino = null unlinked[0-63] = uuid = 00000000-0000-0000-0000-000000000000 lsn = 0 crc = 0 (unchecked) free_root = 0 free_level = 0 AG 8 magicnum = 0x58414749 versionnum = 1 seqno = 8 length = 268435455 count = 5041856 root = 507410 level = 2 freecount = 693 newino = 5057856 dirino = null unlinked[0-63] = uuid = 00000000-0000-0000-0000-000000000000 lsn = 0 crc = 0 (unchecked) free_root = 0 free_level = 0 AG 9 magicnum = 0x58414749 versionnum = 1 seqno = 9 length = 268435455 count = 5485376 root = 208 level = 2 freecount = 9089 newino = 5134912 dirino = null unlinked[0-63] = uuid = 00000000-0000-0000-0000-000000000000 lsn = 0 crc = 0 (unchecked) free_root = 0 free_level = 0 AG 10 magicnum = 0x58414749 versionnum = 1 seqno = 10 length = 268435455 count = 5233088 root = 821 level = 2 freecount = 37122 newino = 5060416 dirino = null unlinked[0-63] = uuid = 00000000-0000-0000-0000-000000000000 lsn = 0 crc = 0 (unchecked) free_root = 0 free_level = 0 AG 11 magicnum = 0x58414749 versionnum = 1 seqno = 11 length = 268435455 count = 5810432 root = 349837 level = 2 freecount = 2657 newino = 5102880 dirino = null unlinked[0-63] = uuid = 00000000-0000-0000-0000-000000000000 lsn = 0 crc = 0 (unchecked) free_root = 0 free_level = 0 AG 12 magicnum = 0x58414749 versionnum = 1 seqno = 12 length = 268435455 count = 5271552 root = 314 level = 2 freecount = 60497 newino = 5050528 dirino = null unlinked[0-63] = uuid = 00000000-0000-0000-0000-000000000000 lsn = 0 crc = 0 (unchecked) free_root = 0 free_level = 0 AG 13 magicnum = 0x58414749 versionnum = 1 seqno = 13 length = 268435455 count = 5464000 root = 992 level = 2 freecount = 1790275 newino = 5067808 dirino = null unlinked[0-63] = uuid = 00000000-0000-0000-0000-000000000000 lsn = 0 crc = 0 (unchecked) free_root = 0 free_level = 0 AG 14 magicnum = 0x58414749 versionnum = 1 seqno = 14 length = 123739108 count = 365440 root = 945 level = 2 freecount = 54544 newino = 5084224 dirino = null unlinked[0-63] = uuid = 00000000-0000-0000-0000-000000000000 lsn = 0 crc = 0 (unchecked) free_root = 0 free_level = 0 [-- Attachment #3: Type: text/plain, Size: 121 bytes --] _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: extremely slow file creation/deletion after xfs ran full 2015-01-12 17:33 ` Carsten Aulbert @ 2015-01-13 20:06 ` Stan Hoeppner 2015-01-13 20:13 ` Carsten Aulbert 2015-01-13 20:33 ` Dave Chinner 1 sibling, 1 reply; 14+ messages in thread From: Stan Hoeppner @ 2015-01-13 20:06 UTC (permalink / raw) To: Carsten Aulbert, Brian Foster; +Cc: xfs On 01/12/2015 11:33 AM, Carsten Aulbert wrote: ... > The FS was never grown, the machine was installed on November 5th, 2012 > and was (ab)using the FS ever since. On average there have been about > 1-2 million file creations per day (ranging from a few kByte files to a > few 100 kBytes) and also an equally large number of deletions (after > some time). Thus overall, a somewhat busy server. This workload seems more suited to a database than a filesystem. Though surely you've already considered such, and chose not to go that route. With high fragmentation you get lots of seeking. What model disks are these? What is your RAID10 geometry? Are your partitions properly aligned to that geometry, and to the drives (512n/512e)? Stan _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: extremely slow file creation/deletion after xfs ran full 2015-01-13 20:06 ` Stan Hoeppner @ 2015-01-13 20:13 ` Carsten Aulbert 2015-01-13 20:43 ` Stan Hoeppner 0 siblings, 1 reply; 14+ messages in thread From: Carsten Aulbert @ 2015-01-13 20:13 UTC (permalink / raw) To: Stan Hoeppner, Brian Foster; +Cc: xfs Hi Stan On 01/13/2015 09:06 PM, Stan Hoeppner wrote: > This workload seems more suited to a database than a filesystem. Though > surely you've already considered such, and chose not to go that route. > Yepp, but as we do not fully control the server software and need to work further on the binary blobs arriving, a database is also not that well suited for it, but yes, we looked into it (and run mysql, marida, cassandra, mongo, postgresql, ...) > With high fragmentation you get lots of seeking. What model disks are > these? What is your RAID10 geometry? Are your partitions properly > aligned to that geometry, and to the drives (512n/512e)? Disks are 2TB Hitachi SATA drives (Ultrastar, HUA722020ALA330). As these are some yrs old, they are native 512byte ones. They are connected via an Areca 1261ML controller with a Supermicro backplane. RAID striping is not ideal (128kByte per member disk) and thus our xfs layout is not ideal as well. Things we plan to change with the next attempt ;) After the arrival of "advanced format" HDD and SSDs we usually try to align everything to full 1 MByte or larger, just to be sure any combination of 512b, 4kb, ... will eventually align :) Cheers Carsten _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: extremely slow file creation/deletion after xfs ran full 2015-01-13 20:13 ` Carsten Aulbert @ 2015-01-13 20:43 ` Stan Hoeppner 2015-01-14 6:07 ` Carsten Aulbert 0 siblings, 1 reply; 14+ messages in thread From: Stan Hoeppner @ 2015-01-13 20:43 UTC (permalink / raw) To: Carsten Aulbert, Brian Foster; +Cc: xfs On 01/13/2015 02:13 PM, Carsten Aulbert wrote: > Hi Stan > > On 01/13/2015 09:06 PM, Stan Hoeppner wrote: >> This workload seems more suited to a database than a filesystem. Though >> surely you've already considered such, and chose not to go that route. >> > > Yepp, but as we do not fully control the server software and need to > work further on the binary blobs arriving, a database is also not that > well suited for it, but yes, we looked into it (and run mysql, marida, > cassandra, mongo, postgresql, ...) > >> With high fragmentation you get lots of seeking. What model disks are >> these? What is your RAID10 geometry? Are your partitions properly >> aligned to that geometry, and to the drives (512n/512e)? > > Disks are 2TB Hitachi SATA drives (Ultrastar, HUA722020ALA330). As these > are some yrs old, they are native 512byte ones. They are connected via > an Areca 1261ML controller with a Supermicro backplane. > > RAID striping is not ideal (128kByte per member disk) and thus our xfs > layout is not ideal as well. Things we plan to change with the next > attempt ;) With your file sizes that seems a recipe for hotspots. What do the controller metrics tell you about IOs per drive, bytes per drive? Are they balanced? > After the arrival of "advanced format" HDD and SSDs we usually try to > align everything to full 1 MByte or larger, just to be sure any > combination of 512b, 4kb, ... will eventually align :) It's not that simple with striping. Partitions need to start and end on stripe boundaries, not simply multiples of 4KB or 1MB as with single disks. If you use non power of 2 drive counts in a stripe, aligning at multiples of 1MB will screw ya, e.g. 12 drives * 64KB is a 768KB stripe. Stan _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: extremely slow file creation/deletion after xfs ran full 2015-01-13 20:43 ` Stan Hoeppner @ 2015-01-14 6:07 ` Carsten Aulbert 0 siblings, 0 replies; 14+ messages in thread From: Carsten Aulbert @ 2015-01-14 6:07 UTC (permalink / raw) To: Stan Hoeppner; +Cc: xfs Hi Stan On 01/13/2015 09:43 PM, Stan Hoeppner wrote: > With your file sizes that seems a recipe for hotspots. What do the > controller metrics tell you about IOs per drive, bytes per drive? Are > they balanced? > They looked ok from 10000ft, but unfortunately, the controller is not that talkative :( >> After the arrival of "advanced format" HDD and SSDs we usually try to >> align everything to full 1 MByte or larger, just to be sure any >> combination of 512b, 4kb, ... will eventually align :) > > It's not that simple with striping. Partitions need to start and end on > stripe boundaries, not simply multiples of 4KB or 1MB as with single > disks. If you use non power of 2 drive counts in a stripe, aligning at > multiples of 1MB will screw ya, e.g. 12 drives * 64KB is a 768KB stripe. At the moment there are 16 disks (or 8 "data" disks), so here, power of 2 still prevails. But still, very valid remark! Cheers Carsten _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: extremely slow file creation/deletion after xfs ran full 2015-01-12 17:33 ` Carsten Aulbert 2015-01-13 20:06 ` Stan Hoeppner @ 2015-01-13 20:33 ` Dave Chinner 2015-01-14 6:12 ` Carsten Aulbert 1 sibling, 1 reply; 14+ messages in thread From: Dave Chinner @ 2015-01-13 20:33 UTC (permalink / raw) To: Carsten Aulbert; +Cc: Brian Foster, xfs On Mon, Jan 12, 2015 at 06:33:06PM +0100, Carsten Aulbert wrote: > Hi > > On 01/12/2015 05:37 PM, Brian Foster wrote: > > No, but it does show that there are a bunch of free inodes scattered > > throughout the existing records in most of the AGs. The finobt should > > definitely help avoid the allocation latency when this occurs. > > > > That is good to know/hope :) > > > It is interesting that you have so many more free inodes in ag 0 (~53m > > as opposed to several hundreds/thousands in others). What does 'p count' > > show for each ag? Was this fs grown to the current size over time? > > > "p count" seems to "thin out" over ag: > > count = 513057792 AG 0 has > 500 allocated million inodes, and > 50 million free inodes. Traversal of the inode btree to find free inodes is going to be costly in both CPU time and memory footprint. The other AGs have much fewer inodes, so searches are much less costly. That's why allocation will be sometimes slow and sometimes fast - it depends on what AG the inode is being allocated from. > count = 16596224 > count = 15387584 > count = 14958528 > count = 4096960 > count = 4340416 > count = 4987968 > count = 3321792 > count = 5041856 > count = 5485376 > count = 5233088 > count = 5810432 > count = 5271552 > count = 5464000 > count = 365440 This pattern seems to match a filesystem that was running under inode32 for most of it's life, and is now using inode64 and hence spreading the subdirectories and hence new inodes over all AGs instead of just limiting them to AG 0.... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: extremely slow file creation/deletion after xfs ran full 2015-01-13 20:33 ` Dave Chinner @ 2015-01-14 6:12 ` Carsten Aulbert 2015-01-16 15:35 ` Carlos Maiolino 0 siblings, 1 reply; 14+ messages in thread From: Carsten Aulbert @ 2015-01-14 6:12 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs Hi Dave On 01/13/2015 09:33 PM, Dave Chinner wrote: > > This pattern seems to match a filesystem that was running under > inode32 for most of it's life, and is now using inode64 and hence > spreading the subdirectories and hence new inodes over all AGs > instead of just limiting them to AG 0.... > I fear you are right. The install logs from 2012 show, that fstab simply contained: UUID=dd407af5-1edc-41de-8c72-5fe1de31fb11 /srv xfs rw 0 2 And our change management is not that good to track back, when we changed to inode64. OK, I think I've gathered quite a lot of things we did wrong. I'll try to summarize it in another email later today or tomorrow to invite a few more comments before we go ahead redoing the FS. Also, it may serve as summary for the list archive. For now, I want to say a very, very big thank you for remarks and suggestions! Cheers Carsten _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: extremely slow file creation/deletion after xfs ran full 2015-01-14 6:12 ` Carsten Aulbert @ 2015-01-16 15:35 ` Carlos Maiolino 0 siblings, 0 replies; 14+ messages in thread From: Carlos Maiolino @ 2015-01-16 15:35 UTC (permalink / raw) To: Carsten Aulbert; +Cc: xfs On Wed, Jan 14, 2015 at 07:12:55AM +0100, Carsten Aulbert wrote: > Hi Dave > > On 01/13/2015 09:33 PM, Dave Chinner wrote: > > > > This pattern seems to match a filesystem that was running under > > inode32 for most of it's life, and is now using inode64 and hence > > spreading the subdirectories and hence new inodes over all AGs > > instead of just limiting them to AG 0.... > > > > I fear you are right. The install logs from 2012 show, that fstab simply > contained: > > UUID=dd407af5-1edc-41de-8c72-5fe1de31fb11 /srv xfs rw 0 2 > > And our change management is not that good to track back, when we > changed to inode64. > So, looks like Dave is right, you have been running XFS with inode32 for the entire time (which probably overloaded AG0 from this FS) and then changed to inode64 when it became the default option. > OK, I think I've gathered quite a lot of things we did wrong. I'll try > to summarize it in another email later today or tomorrow to invite a few > more comments before we go ahead redoing the FS. Also, it may serve as > summary for the list archive. > > For now, I want to say a very, very big thank you for remarks and > suggestions! > > Cheers > > Carsten > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs -- Carlos _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2015-01-16 15:35 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-01-12 8:36 extremely slow file creation/deletion after xfs ran full Carsten Aulbert 2015-01-12 12:44 ` Brian Foster 2015-01-12 13:30 ` Carsten Aulbert 2015-01-12 15:52 ` Brian Foster 2015-01-12 16:09 ` Carsten Aulbert 2015-01-12 16:37 ` Brian Foster 2015-01-12 17:33 ` Carsten Aulbert 2015-01-13 20:06 ` Stan Hoeppner 2015-01-13 20:13 ` Carsten Aulbert 2015-01-13 20:43 ` Stan Hoeppner 2015-01-14 6:07 ` Carsten Aulbert 2015-01-13 20:33 ` Dave Chinner 2015-01-14 6:12 ` Carsten Aulbert 2015-01-16 15:35 ` Carlos Maiolino
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox