* "XFS: possible memory allocation deadlock in kmem_alloc" on high memory machine
From: Anders Ossowicki @ 2015-06-01 14:57 UTC (permalink / raw)
To: xfs
Hi,
We've started seeing a slew of these messages in dmesg:
XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
First question: Is this cause for alarm at all? Should we expect the
disk to blow up in our faces? Should we expect loss of performance?
This is from a machine under heavy load (database server, large dataset,
lots of I/O). It seems to happen only when we hit 15k-20k+ iops on the
disk.
We're running on 3.18.13, built from kernel.org git.
The machine has 3TB of memory and after googling the message for a
while, I guess memory fragmentation could be a likely cause. Looking at
/proc/buddyinfo when these messages show up, we see that there are
almost no fragments of order 1 and none of higher orders.
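Something like this tallies the higher-order free blocks per zone (a sketch assuming the standard /proc/buddyinfo row layout; the sample data is illustrative, not from this machine):

```python
def higher_order_free(buddyinfo_text, min_order=1):
    """Per zone, total the free blocks of order >= min_order."""
    totals = {}
    for line in buddyinfo_text.strip().splitlines():
        # Row format: "Node 0, zone   Normal  41243  5  0 ..."
        head, _, rest = line.partition("zone")
        fields = rest.split()
        zone = head.strip().rstrip(",") + " " + fields[0]
        counts = [int(n) for n in fields[1:]]
        totals[zone] = sum(counts[min_order:])
    return totals

sample = """\
Node 0, zone      DMA      1      1      1      0      2      1      1      0      1      1      3
Node 0, zone   Normal  41243      5      0      0      0      0      0      0      0      0      0
"""
print(higher_order_free(sample))
# A zone total near zero means almost no contiguous blocks above order 0.
```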
My completely uneducated guess would be that the kernel can't reap pages
fast enough, so XFS gets impatient waiting for them. That seems like an
issue for mm though but I'd like to confirm if my understanding of what
XFS does is correct.
Most of the memory is used by disk cache:
$ free -g
             total       used       free     shared    buffers     cached
Mem:          3023       3001         22          0          0       2840
Let me know if there is any more info I should provide.
--
Anders Ossowicki
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: "XFS: possible memory allocation deadlock in kmem_alloc" on high memory machine
From: Dave Chinner @ 2015-06-01 21:01 UTC (permalink / raw)
To: Anders Ossowicki; +Cc: xfs
On Mon, Jun 01, 2015 at 04:57:41PM +0200, Anders Ossowicki wrote:
> Hi,
>
> We've started seeing a slew of these messages in dmesg:
>
> XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
>
> First question: Is this cause for alarm at all? Should we expect the
> disk to blow up in our faces? Should we expect loss of performance?
Nothing should go wrong - XFS will essentially block until it gets
the memory it requires.
> This is from a machine under heavy load (database server, large dataset,
> lots of I/O). It seems to happen only when we hit 15k-20k+ iops on the
> disk.
>
> We're running on 3.18.13, built from kernel.org git.
Right around the time that I was seeing all sorts of regressions
relating to low memory behaviour and the OOM killer....
> The machine has 3TB of memory and after googling the message for a
> while, I guess memory fragmentation could be a likely cause. Looking at
> /proc/buddyinfo when these messages show up, we see that there are
> almost no fragments of order 1 and none of higher orders.
Ouch. 3TB of memory, and no higher order pages left? Do you have
memory compaction turned on? That should be reforming large pages in
this situation. What type of machine is it?
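One way to see whether compaction is even being attempted is to watch the compact_* event counters in /proc/vmstat; a sketch (the counter names are as exposed by kernels of this vintage; the sample text stands in for the real file):

```python
def compaction_stats(vmstat_text):
    """Pull the compact_* counters out of /proc/vmstat-style text."""
    stats = {}
    for line in vmstat_text.splitlines():
        name, _, value = line.partition(" ")
        if name.startswith("compact_"):
            stats[name] = int(value)
    return stats

sample = """\
nr_free_pages 4736891
compact_stall 1523
compact_fail 210
compact_success 1313
"""
print(compaction_stats(sample))
# compact_fail growing much faster than compact_success suggests
# compaction is running but unable to reform higher-order pages.
```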
> My completely uneducated guess would be that the kernel can't reap pages
> fast enough, so XFS gets impatient waiting for them. That seems like an
> issue for mm though but I'd like to confirm if my understanding of what
> XFS does is correct.
Yes, memory fragmentation tends to be an MM problem; nothing XFS can
do about it.
> Most of the memory is used by disk cache:
> $ free -g
> total used free shared buffers cached
> Mem: 3023 3001 22 0 0 2840
Especially as it appears that 2.8TB of your memory is in the page
cache and should be reclaimable.
> Let me know if there is any more info I should provide.
The info asked for here:
http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
will give us more insight into the memory usage, storage and
filesystem, and help us determine the next step...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: "XFS: possible memory allocation deadlock in kmem_alloc" on high memory machine
From: Anders Ossowicki @ 2015-06-02 12:06 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs@oss.sgi.com
On Mon, Jun 01, 2015 at 11:01:13PM +0200, Dave Chinner wrote:
> Nothing should go wrong - XFS will essentially block until it gets
> the memory it requires.
Good to know, thanks!
> > We're running on 3.18.13, built from kernel.org git.
>
> Right around the time that I was seeing all sorts of regressions
> relating to low memory behaviour and the OOM killer....
We fought with some high CPU load issues back in March, related to
memory management, and we ended up on a recent longterm kernel.
http://thread.gmane.org/gmane.linux.kernel.mm/129858
> Ouch. 3TB of memory, and no higher order pages left? Do you have
> memory compaction turned on? That should be reforming large pages in
> this situation. What type of machine is it?
Memory compaction is turned on. It's an off-the-shelf Dell server with 4
12-core Xeon processors.
> Yes, memory fragmentation tends to be a MM problem; nothing XFS can
> do about it.
Ya, knowing we're not in immediate danger of a filesystem meltdown, I
think we'll tackle the fragmentation issue next.
> Especially as it appears that 2.8TB of your memory is in the page
> cache and should be reclaimable.
Indeed. I haven't been able to catch the issue while it was ongoing
since upgrading to 3.18.13, but my guess is that we're not reclaiming
the cache fast enough for some reason, possibly because it takes too
long to find the best reclaimable regions with so many fragments to sift
through.
As for the pertinent system info:
Linux 3.18.13 (we also saw the issue with 3.18.9)
xfs_repair version 3.1.7
4x Intel Xeon E7-8857 v2
$ cat /proc/meminfo
MemTotal: 3170749444 kB
MemFree: 18947564 kB
MemAvailable: 2968870324 kB
Buffers: 270704 kB
Cached: 3008702200 kB
SwapCached: 0 kB
Active: 1617534420 kB
Inactive: 1415684856 kB
Active(anon): 156973416 kB
Inactive(anon): 4856264 kB
Active(file): 1460561004 kB
Inactive(file): 1410828592 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 25353212 kB
SwapFree: 25353212 kB
Dirty: 1228056 kB
Writeback: 348024 kB
AnonPages: 24244728 kB
Mapped: 137738148 kB
Shmem: 137578880 kB
Slab: 79729144 kB
SReclaimable: 79040008 kB
SUnreclaim: 689136 kB
KernelStack: 22976 kB
PageTables: 19203180 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1610727932 kB
Committed_AS: 178507488 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 6628972 kB
VmallocChunk: 31937036032 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 172736 kB
DirectMap2M: 13412352 kB
DirectMap1G: 3207593984 kB
We have three hardware raid'ed disks with XFS on them, one of which receives
the bulk of the load. This is a raid 50 volume on SSDs with the raid controller
running in writethrough mode.
$ xfs_info /dev/sdb
meta-data=/dev/sdb isize=256 agcount=32, agsize=97640448 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=3124494336, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=521728, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
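For reference, the data size follows from the blocks and bsize figures above:

```python
# Filesystem data size from the xfs_info output above.
blocks, bsize = 3124494336, 4096
size_bytes = blocks * bsize
print(f"{size_bytes / 10**12:.1f} TB ({size_bytes / 2**40:.1f} TiB)")
```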
--
Anders Ossowicki
* Re: "XFS: possible memory allocation deadlock in kmem_alloc" on high memory machine
From: Dave Chinner @ 2015-06-03 1:52 UTC (permalink / raw)
To: Anders Ossowicki; +Cc: xfs@oss.sgi.com
On Tue, Jun 02, 2015 at 02:06:48PM +0200, Anders Ossowicki wrote:
> On Mon, Jun 01, 2015 at 11:01:13PM +0200, Dave Chinner wrote:
> > Nothing should go wrong - XFS will essentially block until it gets
> > the memory it requires.
>
> Good to know, thanks!
>
> > > We're running on 3.18.13, built from kernel.org git.
> >
> > Right around the time that I was seeing all sorts of regressions
> > relating to low memory behaviour and the OOM killer....
>
> We fought with some high cpu load issues back in march, related to
> memory management, and we ended up on a recent longterm kernel.
> http://thread.gmane.org/gmane.linux.kernel.mm/129858
>
> > Ouch. 3TB of memory, and no higher order pages left? Do you have
> > memory compaction turned on? That should be reforming large pages in
> > this situation. What type of machine is it?
>
> Memory compaction is turned on. It's an off-the-shelf dell server with 4
> 12c Xeon processors.
>
> > Yes, memory fragmentation tends to be a MM problem; nothing XFS can
> > do about it.
>
> Ya, knowing we're not in immediate danger of a filesystem meltdown, I
> think we'll tackle the fragmentation issue next.
>
> > Especially as it appears that 2.8TB of your memory is in the page
> > cache and should be reclaimable.
>
> Indeed. I haven't been able to catch the issue while it was ongoing,
> since upgrading to 3.13.18, but my guess is that we're not reclaiming
> the cache fast enough for some reason, possibly because it takes too
> long to find the best reclaimable regions with so many fragment to sift
> through.
You can always try to drop the page cache to see if that solves the
problem...
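For reference, the knob is /proc/sys/vm/drop_caches (1 = page cache, 2 = reclaimable slab objects such as dentries and inodes, 3 = both). A minimal sketch, with the target path parameterized purely so it can be exercised outside /proc:

```python
import os

def drop_caches(level=3, path="/proc/sys/vm/drop_caches"):
    """Ask the kernel to drop clean caches (1=pagecache, 2=slab, 3=both)."""
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    os.sync()  # write out dirty data first; only clean pages are droppable
    with open(path, "w") as f:
        f.write(str(level))
```

Note this only discards clean, unmapped pages, so it's a diagnostic, not a fix.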
> As for the pertinent system info:
>
> Linux 3.18.13 (we also saw the issue with 3.18.9)
> xfs_repair version 3.1.7
>
> 4x Intel Xeon E7-8857 v2
>
> $ cat /proc/meminfo
> MemTotal: 3170749444 kB
> MemFree: 18947564 kB
> MemAvailable: 2968870324 kB
> Buffers: 270704 kB
> Cached: 3008702200 kB
> SwapCached: 0 kB
> Active: 1617534420 kB
> Inactive: 1415684856 kB
> Active(anon): 156973416 kB
> Inactive(anon): 4856264 kB
> Active(file): 1460561004 kB
> Inactive(file): 1410828592 kB
This. You've got 2.8TB of reclaimable page cache there.
> Unevictable: 0 kB
> Mlocked: 0 kB
> SwapTotal: 25353212 kB
> SwapFree: 25353212 kB
> Dirty: 1228056 kB
> Writeback: 348024 kB
And very little of it is dirty, so it should all be immediately
reclaimable or compactable.
> Slab: 79729144 kB
> SReclaimable: 79040008 kB
80GB of slab caches as well - what is the output of /proc/slabinfo?
> We have three hardware raid'ed disks with XFS on them, one of which receives
> the bulk of the load. This is a raid 50 volume on SSDs with the raid controller
> running in writethrough mode.
It doesn't seem like writeback of dirty pages is the problem; more
the case that the page cache is ridiculously huge and not being
reclaimed in a sane manner. Do you really need 2.8TB of cached file
data in memory for performance?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: "XFS: possible memory allocation deadlock in kmem_alloc" on high memory machine
From: Anders Ossowicki @ 2015-06-03 7:07 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs@oss.sgi.com
On Wed, Jun 03, 2015 at 03:52:45AM +0200, Dave Chinner wrote:
> On Tue, Jun 02, 2015 at 02:06:48PM +0200, Anders Ossowicki wrote:
>
> > Slab: 79729144 kB
> > SReclaimable: 79040008 kB
>
> 80GB of slab caches as well - what is the output of /proc/slabinfo?
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
btrfs_prelim_ref 0 0 80 51 1 : tunables 0 0 0 : slabdata 0 0 0
btrfs_delayed_data_ref 0 0 96 42 1 : tunables 0 0 0 : slabdata 0 0 0
btrfs_delayed_ref_head 0 0 160 51 2 : tunables 0 0 0 : slabdata 0 0 0
btrfs_delayed_node 0 0 304 53 4 : tunables 0 0 0 : slabdata 0 0 0
btrfs_ordered_extent 0 0 424 38 4 : tunables 0 0 0 : slabdata 0 0 0
btrfs_extent_buffer 0 0 280 58 4 : tunables 0 0 0 : slabdata 0 0 0
btrfs_extent_state 0 0 88 46 1 : tunables 0 0 0 : slabdata 0 0 0
btrfs_delalloc_work 0 0 152 53 2 : tunables 0 0 0 : slabdata 0 0 0
btrfs_trans_handle 0 0 176 46 2 : tunables 0 0 0 : slabdata 0 0 0
btrfs_inode 0 0 1000 32 8 : tunables 0 0 0 : slabdata 0 0 0
ufs_inode_cache 0 0 744 44 8 : tunables 0 0 0 : slabdata 0 0 0
qnx4_inode_cache 0 0 656 49 8 : tunables 0 0 0 : slabdata 0 0 0
hfsplus_attr_cache 0 0 3840 8 8 : tunables 0 0 0 : slabdata 0 0 0
hfsplus_icache 0 0 896 36 8 : tunables 0 0 0 : slabdata 0 0 0
hfs_inode_cache 0 0 768 42 8 : tunables 0 0 0 : slabdata 0 0 0
minix_inode_cache 0 0 648 50 8 : tunables 0 0 0 : slabdata 0 0 0
ntfs_big_inode_cache 0 0 896 36 8 : tunables 0 0 0 : slabdata 0 0 0
ntfs_inode_cache 0 0 312 52 4 : tunables 0 0 0 : slabdata 0 0 0
jfs_mp 32 32 128 32 1 : tunables 0 0 0 : slabdata 1 1 0
jfs_ip 0 0 1240 26 8 : tunables 0 0 0 : slabdata 0 0 0
reiser_inode_cache 0 0 744 44 8 : tunables 0 0 0 : slabdata 0 0 0
ext2_inode_cache 0 0 792 41 8 : tunables 0 0 0 : slabdata 0 0 0
nfsd4_openowners 13394 13838 440 37 4 : tunables 0 0 0 : slabdata 374 374 0
nfs_direct_cache 0 0 208 39 2 : tunables 0 0 0 : slabdata 0 0 0
nfs_commit_data 2167 2167 704 46 8 : tunables 0 0 0 : slabdata 48 48 0
nfs_inode_cache 174843 175088 1040 31 8 : tunables 0 0 0 : slabdata 5648 5648 0
fscache_cookie_jar 1702 1702 88 46 1 : tunables 0 0 0 : slabdata 37 37 0
rpc_inode_cache 2448 2448 640 51 8 : tunables 0 0 0 : slabdata 48 48 0
xfs_dquot 0 0 472 34 4 : tunables 0 0 0 : slabdata 0 0 0
xfs_icr 0 0 144 56 2 : tunables 0 0 0 : slabdata 0 0 0
xfs_ili 1066228 1066625 152 53 2 : tunables 0 0 0 : slabdata 20125 20125 0
xfs_inode 2522728 2523172 1024 32 8 : tunables 0 0 0 : slabdata 78857 78857 0
xfs_efd_item 8320 8920 400 40 4 : tunables 0 0 0 : slabdata 223 223 0
xfs_da_state 1632 1632 480 34 4 : tunables 0 0 0 : slabdata 48 48 0
xfs_btree_cur 1872 1872 208 39 2 : tunables 0 0 0 : slabdata 48 48 0
ext4_groupinfo_4k 896 896 144 56 2 : tunables 0 0 0 : slabdata 16 16 0
ip6-frags 0 0 216 37 2 : tunables 0 0 0 : slabdata 0 0 0
UDPLITEv6 0 0 1088 30 8 : tunables 0 0 0 : slabdata 0 0 0
UDPv6 1440 1440 1088 30 8 : tunables 0 0 0 : slabdata 48 48 0
tw_sock_TCPv6 1312 1312 256 32 2 : tunables 0 0 0 : slabdata 41 41 0
TCPv6 768 768 1984 16 8 : tunables 0 0 0 : slabdata 48 48 0
kcopyd_job 0 0 3312 9 8 : tunables 0 0 0 : slabdata 0 0 0
dm_uevent 0 0 2632 12 8 : tunables 0 0 0 : slabdata 0 0 0
dm_rq_target_io 0 0 408 40 4 : tunables 0 0 0 : slabdata 0 0 0
scsi_cmd_cache 93743 107688 384 42 4 : tunables 0 0 0 : slabdata 2564 2564 0
cfq_queue 21842 21910 232 35 2 : tunables 0 0 0 : slabdata 626 626 0
bsg_cmd 0 0 312 52 4 : tunables 0 0 0 : slabdata 0 0 0
mqueue_inode_cache 36 36 896 36 8 : tunables 0 0 0 : slabdata 1 1 0
fuse_request 0 0 416 39 4 : tunables 0 0 0 : slabdata 0 0 0
fuse_inode 0 0 768 42 8 : tunables 0 0 0 : slabdata 0 0 0
ecryptfs_key_record_cache 0 0 576 56 8 : tunables 0 0 0 : slabdata 0 0 0
ecryptfs_inode_cache 0 0 1024 32 8 : tunables 0 0 0 : slabdata 0 0 0
fat_inode_cache 0 0 720 45 8 : tunables 0 0 0 : slabdata 0 0 0
fat_cache 0 0 40 102 1 : tunables 0 0 0 : slabdata 0 0 0
hugetlbfs_inode_cache 2592 2592 600 54 8 : tunables 0 0 0 : slabdata 48 48 0
jbd2_journal_handle 4080 4080 48 85 1 : tunables 0 0 0 : slabdata 48 48 0
journal_handle 0 0 24 170 1 : tunables 0 0 0 : slabdata 0 0 0
journal_head 17208 17712 112 36 1 : tunables 0 0 0 : slabdata 492 492 0
revoke_table 256 256 16 256 1 : tunables 0 0 0 : slabdata 1 1 0
revoke_record 6400 6400 32 128 1 : tunables 0 0 0 : slabdata 50 50 0
ext4_inode_cache 642098 775776 1008 32 8 : tunables 0 0 0 : slabdata 24243 24243 0
ext4_free_data 11328 11328 64 64 1 : tunables 0 0 0 : slabdata 177 177 0
ext4_allocation_context 1536 1536 128 32 1 : tunables 0 0 0 : slabdata 48 48 0
ext4_io_end 3024 3024 72 56 1 : tunables 0 0 0 : slabdata 54 54 0
ext4_extent_status 68877 105672 40 102 1 : tunables 0 0 0 : slabdata 1036 1036 0
ext3_inode_cache 0 0 816 40 8 : tunables 0 0 0 : slabdata 0 0 0
dquot 1760 1760 256 32 2 : tunables 0 0 0 : slabdata 55 55 0
fsnotify_mark 0 0 112 36 1 : tunables 0 0 0 : slabdata 0 0 0
pid_namespace 0 0 2200 14 8 : tunables 0 0 0 : slabdata 0 0 0
posix_timers_cache 12541 12705 248 33 2 : tunables 0 0 0 : slabdata 385 385 0
UDP-Lite 0 0 960 34 8 : tunables 0 0 0 : slabdata 0 0 0
xfrm_dst_cache 0 0 448 36 4 : tunables 0 0 0 : slabdata 0 0 0
ip_fib_trie 292 292 56 73 1 : tunables 0 0 0 : slabdata 4 4 0
UDP 1632 1632 960 34 8 : tunables 0 0 0 : slabdata 48 48 0
tw_sock_TCP 2358 2624 256 32 2 : tunables 0 0 0 : slabdata 82 82 0
TCP 3539 3808 1856 17 8 : tunables 0 0 0 : slabdata 224 224 0
blkdev_queue 323 464 1928 16 8 : tunables 0 0 0 : slabdata 29 29 0
blkdev_requests 13992 14740 368 44 4 : tunables 0 0 0 : slabdata 335 335 0
blkdev_ioc 23201 24141 104 39 1 : tunables 0 0 0 : slabdata 619 619 0
dmaengine-unmap-256 15 15 2112 15 8 : tunables 0 0 0 : slabdata 1 1 0
dmaengine-unmap-128 180 180 1088 30 8 : tunables 0 0 0 : slabdata 6 6 0
sock_inode_cache 6273 6273 640 51 8 : tunables 0 0 0 : slabdata 123 123 0
net_namespace 0 0 4352 7 8 : tunables 0 0 0 : slabdata 0 0 0
shmem_inode_cache 2784 2784 672 48 8 : tunables 0 0 0 : slabdata 58 58 0
ftrace_event_file 1702 1702 88 46 1 : tunables 0 0 0 : slabdata 37 37 0
taskstats 2352 2352 328 49 4 : tunables 0 0 0 : slabdata 48 48 0
proc_inode_cache 19293 24750 648 50 8 : tunables 0 0 0 : slabdata 495 495 0
sigqueue 3111 3111 160 51 2 : tunables 0 0 0 : slabdata 61 61 0
bdev_cache 1014 1014 832 39 8 : tunables 0 0 0 : slabdata 26 26 0
kernfs_node_cache 342112 342244 120 34 1 : tunables 0 0 0 : slabdata 10066 10066 0
mnt_cache 2448 2448 320 51 4 : tunables 0 0 0 : slabdata 48 48 0
inode_cache 36637 45080 584 56 8 : tunables 0 0 0 : slabdata 805 805 0
dentry 3217866 3702384 192 42 2 : tunables 0 0 0 : slabdata 88152 88152 0
iint_cache 0 0 72 56 1 : tunables 0 0 0 : slabdata 0 0 0
buffer_head 370050715 400741536 104 39 1 : tunables 0 0 0 : slabdata 10275424 10275424 0
vm_area_struct 147654 150128 184 44 2 : tunables 0 0 0 : slabdata 3412 3412 0
mm_struct 9550 10404 896 36 8 : tunables 0 0 0 : slabdata 289 289 0
files_cache 4029 4029 640 51 8 : tunables 0 0 0 : slabdata 79 79 0
signal_cache 4383 5180 1152 28 8 : tunables 0 0 0 : slabdata 185 185 0
sighand_cache 3081 3255 2112 15 8 : tunables 0 0 0 : slabdata 217 217 0
task_xstate 10016 10881 832 39 8 : tunables 0 0 0 : slabdata 279 279 0
task_struct 1913 2070 6432 5 8 : tunables 0 0 0 : slabdata 414 414 0
Acpi-ParseExt 6048 6048 72 56 1 : tunables 0 0 0 : slabdata 108 108 0
Acpi-State 306 306 80 51 1 : tunables 0 0 0 : slabdata 6 6 0
Acpi-Namespace 2040 2040 40 102 1 : tunables 0 0 0 : slabdata 20 20 0
anon_vma 69129 72318 80 51 1 : tunables 0 0 0 : slabdata 1418 1418 0
shared_policy_node 27030 27285 48 85 1 : tunables 0 0 0 : slabdata 321 321 0
numa_policy 8160 8160 24 170 1 : tunables 0 0 0 : slabdata 48 48 0
radix_tree_node 64025078 64148728 584 56 8 : tunables 0 0 0 : slabdata 1145751 1145751 0
idr_layer_cache 3456 3709 2096 15 8 : tunables 0 0 0 : slabdata 251 251 0
dma-kmalloc-8192 0 0 8192 4 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-4096 0 0 4096 8 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-2048 0 0 2048 16 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-1024 0 0 1024 32 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-512 32 32 512 32 4 : tunables 0 0 0 : slabdata 1 1 0
dma-kmalloc-256 0 0 256 32 2 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-128 0 0 128 32 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-64 0 0 64 64 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-32 0 0 32 128 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-16 0 0 16 256 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-8 0 0 8 512 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-192 0 0 192 42 2 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-96 0 0 96 42 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-8192 779 832 8192 4 8 : tunables 0 0 0 : slabdata 208 208 0
kmalloc-4096 2918 3304 4096 8 8 : tunables 0 0 0 : slabdata 413 413 0
kmalloc-2048 3530 4224 2048 16 8 : tunables 0 0 0 : slabdata 264 264 0
kmalloc-1024 23980 27264 1024 32 8 : tunables 0 0 0 : slabdata 852 852 0
kmalloc-512 179639 222048 512 32 4 : tunables 0 0 0 : slabdata 6939 6939 0
kmalloc-256 488489 521296 256 32 2 : tunables 0 0 0 : slabdata 16291 16291 0
kmalloc-192 41454 53508 192 42 2 : tunables 0 0 0 : slabdata 1274 1274 0
kmalloc-128 54683 67168 128 32 1 : tunables 0 0 0 : slabdata 2099 2099 0
kmalloc-96 20274 37044 96 42 1 : tunables 0 0 0 : slabdata 882 882 0
kmalloc-64 329220 1136832 64 64 1 : tunables 0 0 0 : slabdata 17763 17763 0
kmalloc-32 76222 90112 32 128 1 : tunables 0 0 0 : slabdata 704 704 0
kmalloc-16 245496 247552 16 256 1 : tunables 0 0 0 : slabdata 967 967 0
kmalloc-8 570025 915456 8 512 1 : tunables 0 0 0 : slabdata 1788 1788 0
kmem_cache_node 1310 1472 64 64 1 : tunables 0 0 0 : slabdata 23 23 0
kmem_cache 512 512 256 32 2 : tunables 0 0 0 : slabdata 16 16 0
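For what it's worth, the heavy hitters in a dump like this can be ranked by approximate footprint, num_objs x objsize (a sketch assuming the slabinfo 2.1 columns shown above; true usage runs a bit higher due to per-slab overhead):

```python
def top_slab_caches(slabinfo_text, n=3):
    """Rank slab caches by approximate memory use (num_objs * objsize)."""
    rows = []
    for line in slabinfo_text.splitlines():
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        rows.append((name, num_objs * objsize))
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]

sample = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> ...
buffer_head 370050715 400741536 104 39 1 : tunables 0 0 0 : slabdata 10275424 10275424 0
radix_tree_node 64025078 64148728 584 56 8 : tunables 0 0 0 : slabdata 1145751 1145751 0
xfs_inode 2522728 2523172 1024 32 8 : tunables 0 0 0 : slabdata 78857 78857 0
"""
for name, nbytes in top_slab_caches(sample):
    print(f"{name}: {nbytes / 2**30:.1f} GiB")
```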
And just for good measure, this was meminfo at the same time:
MemTotal: 3170749444 kB
MemFree: 27109596 kB
MemAvailable: 2853545664 kB
Buffers: 442216 kB
Cached: 2882173504 kB
SwapCached: 0 kB
Active: 1636112224 kB
Inactive: 1303578276 kB
Active(anon): 188466588 kB
Inactive(anon): 6202588 kB
Active(file): 1447645636 kB
Inactive(file): 1297375688 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 25353212 kB
SwapFree: 25353212 kB
Dirty: 924980 kB
Writeback: 0 kB
AnonPages: 57090800 kB
Mapped: 137637500 kB
Shmem: 137578904 kB
Slab: 82699516 kB
SReclaimable: 81921588 kB
SUnreclaim: 777928 kB
KernelStack: 27968 kB
PageTables: 101008148 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1610727932 kB
Committed_AS: 205791344 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 6629132 kB
VmallocChunk: 31937035472 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 172736 kB
DirectMap2M: 13412352 kB
DirectMap1G: 3207593984 kB
> > We have three hardware raid'ed disks with XFS on them, one of which receives
> > the bulk of the load. This is a raid 50 volume on SSDs with the raid controller
> > running in writethrough mode.
>
> It doesn't seem like writeback of dirty pages is the problem; more
> the case that the page cache is ridiculously huge and not being
> reclaimed in a sane manner. Do you really need 2.8TB of cached file
> data in memory for performance?
Yeah, disk cache is the primary reason for stuffing memory into that machine.
--
Anders Ossowicki
* Re: "XFS: possible memory allocation deadlock in kmem_alloc" on high memory machine
From: Dave Chinner @ 2015-06-03 23:11 UTC (permalink / raw)
To: Anders Ossowicki; +Cc: xfs@oss.sgi.com
On Wed, Jun 03, 2015 at 09:07:25AM +0200, Anders Ossowicki wrote:
> On Wed, Jun 03, 2015 at 03:52:45AM +0200, Dave Chinner wrote:
> > On Tue, Jun 02, 2015 at 02:06:48PM +0200, Anders Ossowicki wrote:
> >
> > > Slab: 79729144 kB
> > > SReclaimable: 79040008 kB
> >
> > 80GB of slab caches as well - what is the output of /proc/slabinfo?
>
> slabinfo - version: 2.1
> # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
...
> xfs_ili 1066228 1066625 152 53 2 : tunables 0 0 0 : slabdata 20125 20125 0
> xfs_inode 2522728 2523172 1024 32 8 : tunables 0 0 0 : slabdata 78857 78857 0
> dentry 3217866 3702384 192 42 2 : tunables 0 0 0 : slabdata 88152 88152 0
> buffer_head 370050715 400741536 104 39 1 : tunables 0 0 0 : slabdata 10275424 10275424 0
> radix_tree_node 64025078 64148728 584 56 8 : tunables 0 0 0 : slabdata 1145751 1145751 0
.....
> Slab: 82699516 kB
> SReclaimable: 81921588 kB
....
So 400 million bufferheads (consuming ~40GB RAM) and 64 million radix
tree nodes (consuming ~35GB RAM) is where all that memory is. That's
being used to track the 2.8TB of page cache data (roughly 3% memory
overhead).
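Checking the arithmetic against the figures quoted earlier in the thread (num_objs x objsize from slabinfo, Cached from the second meminfo dump):

```python
buffer_heads = 400741536 * 104      # bytes in buffer_head objects
radix_nodes = 64148728 * 584        # bytes in radix_tree_node objects
page_cache = 2882173504 * 1024      # "Cached" from meminfo, kB -> bytes

print(f"buffer_head:     {buffer_heads / 2**30:.1f} GiB")
print(f"radix_tree_node: {radix_nodes / 2**30:.1f} GiB")
print(f"overhead:        {(buffer_heads + radix_nodes) / page_cache:.1%}")
```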
Ok, nothing unusual there, but it demonstrates why I want to get rid
of bufferheads.....
> > > We have three hardware raid'ed disks with XFS on them, one of which receives
> > > the bulk of the load. This is a raid 50 volume on SSDs with the raid controller
> > > running in writethrough mode.
> >
> > It doesn't seem like writeback of dirty pages is the problem; more
> > the case that the page cache is rediculously huge and not being
> > reclaimed in a sane manner. Do you really need 2.8TB of cached file
> > data in memory for performance?
>
> Yeah, disk cache is the primary reason for stuffing memory into that machine.
Hmmmm. I don't think anyone has considered the page cache to be used
at this scale for caching before. Normally this amount of memory is
needed by applications in their process space, not as a disk buffer
to avoid disk IO. You've only got a 12TB filesystem, so you're
keeping 25% of it in the page cache at any given time, so I'm not
surprised that the page cache reclaim algorithms are having trouble....
I don't think there's anything on the XFS side we can do here to
improve the situation you are in - it appears that it's memory
reclaim and compaction that aren't working well enough to sustain
your workload on that platform....
OTOH, have you considered using something like dm-cache with a huge
ramdisk as the cache device and running it in write-through mode so
that power failure doesn't result in data loss or filesystem
corruption?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com