* Question on the xfs inode slab memory
@ 2023-05-31 21:29 Jianan Wang
  2023-06-01  0:08 ` Dave Chinner
  0 siblings, 1 reply; 10+ messages in thread

From: Jianan Wang @ 2023-05-31 21:29 UTC (permalink / raw)
To: linux-xfs

Hi all,

I have a question regarding XFS slab memory usage when operating a
filesystem with 1-2 billion inodes (RAID 0 across 6 disks, 18TB in
total). On this partition, whenever there is heavy disk I/O, such as
removing millions of small files, kernel slab memory usage grows
substantially, leading to OOM issues for the services running on this
node. Some of the stats (only the XFS-related entries included):

#########################################################################
Active / Total Objects (% used): 281803052 / 317485764 (88.8%)
Active / Total Slabs (% used): 13033144 / 13033144 (100.0%)
Active / Total Caches (% used): 126 / 180 (70.0%)
Active / Total Size (% used): 114671057.99K / 127265108.19K (90.1%)
Minium / Average / Maximum Object : 0.01K / 0.40K / 16.75K

OBJS ACTIVE USE OBJ SIZE SLABS
OBJ/SLAB CACHE SIZE NAME
78207920 70947541 0% 1.00K 7731010
32 247392320K xfs_inode
59945928 46548798 0% 0.19K 1433102
42 11464816K dentry
25051296 25051282 0% 0.38K 599680
42 9594880K xfs_buf
#########################################################################

The peak slab memory usage could spike all the way to 100GB+.
We are using Ubuntu 18.04; the xfsprogs version is 4.9 and the kernel
version is 5.4:

#########################################################################
Linux# cat /etc/*-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.5 LTS"
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

Linux# sudo apt list | grep xfs
libguestfs-xfs/bionic-updates 1:1.36.13-1ubuntu3.3 amd64
nfs-ganesha-xfs/bionic 2.6.0-2 amd64
obexfs/bionic 0.11-2build1 amd64
x11-xfs-utils/bionic 7.7+2build1 amd64
xfsdump/bionic 3.1.6+nmu2 amd64
xfslibs-dev/bionic 4.9.0+nmu1ubuntu2 amd64
xfsprogs/bionic,now 4.9.0+nmu1ubuntu2 amd64 [installed]
xfstt/bionic 1.9.3-3 amd64
xfswitch-plugin/bionic 0.0.1-5ubuntu5 amd64

Linux# uname -a
Linux linux-host 5.4.0-45-generic #49~18.04.2-Ubuntu SMP Wed Aug 26 16:29:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
#########################################################################

Is there any way to limit the slab memory growth for the node as a
whole, or is the only option to reduce the filesystem's inode count and
IOPS usage? Thanks in advance!

Jianan Wang

^ permalink raw reply	[flat|nested] 10+ messages in thread
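[Editor's note: for readers reproducing this investigation, the per-cache memory usage can be pulled out of /proc/slabinfo with a short script. This is a sketch, not part of the thread; the sample lines are taken from the slabinfo output pasted later in this thread, and memory is estimated as active_slabs * pagesperslab * 4 KiB pages.]

```python
def parse_slabinfo(text, page_size=4096):
    """Return {cache_name: stats} parsed from slabinfo version 2.1 text."""
    caches = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue  # skip the version line and the column-header comment
        fields = line.split()
        active, num, objsize, objper, pages = (int(f) for f in fields[1:6])
        # active_slabs is the first number after the "slabdata" token
        active_slabs = int(fields[fields.index("slabdata") + 1])
        caches[fields[0]] = {
            "active_objs": active,
            "num_objs": num,
            "objsize": objsize,
            "bytes": active_slabs * pages * page_size,
        }
    return caches

# Sample lines from the /proc/slabinfo dump later in this thread.
sample = """\
slabinfo - version: 2.1
xfs_buf 2545661 3291582 384 42 4 : tunables 0 0 0 : slabdata 78371 78371 0
xfs_inode 23063278 77479540 1024 32 8 : tunables 0 0 0 : slabdata 2425069 2425069 0
"""

stats = parse_slabinfo(sample)
for name, s in sorted(stats.items(), key=lambda kv: -kv[1]["bytes"]):
    print(f"{name:12s} {s['bytes'] / 2**30:6.1f} GiB "
          f"({s['active_objs']}/{s['num_objs']} objects active)")
```

Run against a live box (as root: `parse_slabinfo(open('/proc/slabinfo').read())`), this would also surface the xfs_ili cache that Dave asks about below.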
* Re: Question on the xfs inode slab memory
  2023-05-31 21:29 Question on the xfs inode slab memory Jianan Wang
@ 2023-06-01  0:08 ` Dave Chinner
  2023-06-01  5:25   ` Jianan Wang
  2023-06-01  6:21   ` Jianan Wang
  0 siblings, 2 replies; 10+ messages in thread

From: Dave Chinner @ 2023-06-01  0:08 UTC (permalink / raw)
To: Jianan Wang; +Cc: linux-xfs

On Wed, May 31, 2023 at 02:29:52PM -0700, Jianan Wang wrote:
> Hi all,
>
> I have a question regarding the xfs slab memory usage when operating a
> filesystem with 1-2 billion inodes (raid 0 with 6 disks, totally
> 18TB). On this partition, whenever there is a high disk io operation,
> like removing millions of small files, the slab kernel memory usage
> will increase a lot, leading to many OOM issues happening for the
> services running on this node. You could check some of the stats as
> the following (only includes the xfs related):

You didn't include all the XFS related slabs. At minimum, the inode
log item slab needs to be shown (xfs_ili) because that tells us how
many of the inodes in the cache have been dirtied.

As it is, I'm betting the problem is the disk subsystem can't write
back dirty inodes fast enough to keep up with memory demand and so
reclaim is declaring OOM faster than your disks can clean inodes to
enable them to be reclaimed.

> #########################################################################
> Active / Total Objects (% used): 281803052 / 317485764 (88.8%)
> Active / Total Slabs (% used): 13033144 / 13033144 (100.0%)
> Active / Total Caches (% used): 126 / 180 (70.0%)
> Active / Total Size (% used): 114671057.99K / 127265108.19K (90.1%)
> Minium / Average / Maximum Object : 0.01K / 0.40K / 16.75K
>
> OBJS ACTIVE USE OBJ SIZE SLABS
> OBJ/SLAB CACHE SIZE NAME
> 78207920 70947541 0% 1.00K 7731010
> 32 247392320K xfs_inode
> 59945928 46548798 0% 0.19K 1433102
> 42 11464816K dentry
> 25051296 25051282 0% 0.38K 599680
> 42 9594880K xfs_buf

Ok, that's from slabtop.
Please don't autowrap stuff you've pasted in - it makes it really
hard to read. (reformatted so I can read it).

OBJS     ACTIVE   USE OBJ SIZE SLABS   OBJ/SLAB CACHE SIZE NAME
78207920 70947541  0%    1.00K 7731010       32 247392320K xfs_inode
59945928 46548798  0%    0.19K 1433102       42  11464816K dentry
25051296 25051282  0%    0.38K  599680       42   9594880K xfs_buf

So, 70 million cached inodes, with a cache size of 240GB. There are
7.7 million slabs at 32 objects per slab, and that's roughly 240GB.

But why does the slab report only 78 million objects when, at 1K per
object, a 240GB cache should hold about 240 million objects?

It looks like there's some kind of accounting problem here, likely in
the slabtop program. I have always found slabtop to be unreliable
like this....

Can you attach the output of 'cat /proc/slabinfo' and 'cat
/proc/meminfo' when you have a large slab cache in memory?

> #########################################################################
>
> The peak slab memory usage could spike all the way to 100GB+.

Is that all? :)

> We are using Ubuntu 18.04 and the xfs version is 4.9, kernel version is 5.4

Ah, I don't think there's anything upstream can do for you. We
rewrote large portions of the XFS inode reclaim in 5.9 (3 years ago)
to address the issues with memory reclaim getting stuck on dirty XFS
inodes, so inode reclaim behaviour in modern kernels is completely
different to old kernels.

I'd suggest that you need to upgrade your systems to run a more
modern kernel and see if that fixes the issues you are seeing...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
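[Editor's note: Dave's arithmetic can be checked mechanically. This small calculation, not part of the thread, just reproduces the numbers from the slabtop columns above to show the inconsistency he points out:]

```python
# Figures from the slabtop output quoted above.
slabs = 7_731_010            # SLABS column for xfs_inode
obj_per_slab = 32            # OBJ/SLAB column
obj_size_k = 1.00            # OBJ SIZE column, in KiB
reported_objs = 78_207_920   # OBJS column

capacity = slabs * obj_per_slab      # object slots the slab pages provide
cache_k = capacity * obj_size_k      # matches the 247392320K CACHE SIZE column

print(capacity)                      # ~247 million slots (~236 GiB at 1 KiB each)
print(capacity / reported_objs)      # ~3.2x more slots than reported objects
```

The slab-count arithmetic reproduces the 247392320K cache size exactly, while the OBJS column is about 3x smaller, which is the accounting discrepancy Dave attributes to slabtop.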
* Re: Question on the xfs inode slab memory
  2023-06-01  0:08 ` Dave Chinner
@ 2023-06-01  5:25   ` Jianan Wang
  2023-06-01 15:06     ` Darrick J. Wong
  0 siblings, 1 reply; 10+ messages in thread

From: Jianan Wang @ 2023-06-01  5:25 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs

Hi Dave,

Thanks for the prompt response!

On Wed, May 31, 2023 at 5:08 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, May 31, 2023 at 02:29:52PM -0700, Jianan Wang wrote:
> > Hi all,
> >
> > I have a question regarding the xfs slab memory usage when operating a
> > filesystem with 1-2 billion inodes (raid 0 with 6 disks, totally
> > 18TB). On this partition, whenever there is a high disk io operation,
> > like removing millions of small files, the slab kernel memory usage
> > will increase a lot, leading to many OOM issues happening for the
> > services running on this node. You could check some of the stats as
> > the following (only includes the xfs related):
>
> You didn't include all the XFS related slabs. At minimum, the inode
> log item slab needs to be shown (xfs_ili) because that tells us how
> many of the inodes in the cache have been dirtied.
>
> As it is, I'm betting the problem is the disk subsystem can't write
> back dirty inodes fast enough to keep up with memory demand and so
> reclaim is declaring OOM faster than your disks can clean inodes to
> enable them to be reclaimed.

We have similar feelings about this. Do you think 1-2 billion inodes
and fast walks over millions of files could become an issue over time
on the current XFS implementation? In a production environment, do you
have any suggestions for tuning XFS to fit this kind of
many-small-files workload, or should we consider spreading the data
volume and I/O workload across more nodes?
>
> > #########################################################################
> > Active / Total Objects (% used): 281803052 / 317485764 (88.8%)
> > Active / Total Slabs (% used): 13033144 / 13033144 (100.0%)
> > Active / Total Caches (% used): 126 / 180 (70.0%)
> > Active / Total Size (% used): 114671057.99K / 127265108.19K (90.1%)
> > Minium / Average / Maximum Object : 0.01K / 0.40K / 16.75K
> >
> > OBJS ACTIVE USE OBJ SIZE SLABS
> > OBJ/SLAB CACHE SIZE NAME
> > 78207920 70947541 0% 1.00K 7731010
> > 32 247392320K xfs_inode
> > 59945928 46548798 0% 0.19K 1433102
> > 42 11464816K dentry
> > 25051296 25051282 0% 0.38K 599680
> > 42 9594880K xfs_buf
>
> Ok, that's from slabtop. Please don't autowrap stuff you've pasted
> in - it makes it really hard to read. (reformatted so I can read
> it).

Got it, will pay more attention to this.

>
> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> 78207920 70947541 0% 1.00K 7731010 32 247392320K xfs_inode
> 59945928 46548798 0% 0.19K 1433102 42 11464816K dentry
> 25051296 25051282 0% 0.38K 599680 42 9594880K xfs_buf
>
> So, 70 million cached inodes, with a cache size of 240GB. There are
> 7.7 million slabs, 32 objects per slab, and that's roughly 240GB.
>
> But why does the slab report only 78 million objects in the slab
> when at 240GB there should be 240 million objects in the slab?
>
> It looks like theres some kind of accounting problem here, likely in
> the slabtop program. I have always found slabtop to be unreliable
> like this....
>
> Can you attach the output of 'cat /proc/slabinfo' and 'cat
> /proc/meminfo' when you have a large slab cache in memory?
I do not have that output for the exact situation I pasted originally,
but here is another node where xfs consumes a lot of slab memory using
the same xfs version:

Linux# cat /proc/slabinfo
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
nf_conntrack 15716 20349 320 51 4 : tunables 0 0 0 : slabdata 399 399 0
au_finfo 0 0 192 42 2 : tunables 0 0 0 : slabdata 0 0 0
au_icntnr 0 0 832 39 8 : tunables 0 0 0 : slabdata 0 0 0
au_dinfo 0 0 192 42 2 : tunables 0 0 0 : slabdata 0 0 0
ovl_inode 41792 42757 688 47 8 : tunables 0 0 0 : slabdata 941 941 0
ufs_inode_cache 0 0 808 40 8 : tunables 0 0 0 : slabdata 0 0 0
qnx4_inode_cache 0 0 680 48 8 : tunables 0 0 0 : slabdata 0 0 0
hfsplus_attr_cache 0 0 3840 8 8 : tunables 0 0 0 : slabdata 0 0 0
hfsplus_icache 0 0 896 36 8 : tunables 0 0 0 : slabdata 0 0 0
hfs_inode_cache 0 0 832 39 8 : tunables 0 0 0 : slabdata 0 0 0
minix_inode_cache 0 0 672 48 8 : tunables 0 0 0 : slabdata 0 0 0
ntfs_big_inode_cache 0 0 960 34 8 : tunables 0 0 0 : slabdata 0 0 0
ntfs_inode_cache 0 0 296 55 4 : tunables 0 0 0 : slabdata 0 0 0
jfs_ip 0 0 1280 25 8 : tunables 0 0 0 : slabdata 0 0 0
xfs_dqtrx 0 0 528 31 4 : tunables 0 0 0 : slabdata 0 0 0
xfs_dquot 0 0 496 33 4 : tunables 0 0 0 : slabdata 0 0 0
xfs_buf 2545661 3291582 384 42 4 : tunables 0 0 0 : slabdata 78371 78371 0
xfs_rui_item 0 0 696 47 8 : tunables 0 0 0 : slabdata 0 0 0
xfs_rud_item 0 0 176 46 2 : tunables 0 0 0 : slabdata 0 0 0
xfs_inode 23063278 77479540 1024 32 8 : tunables 0 0 0 : slabdata 2425069 2425069 0
xfs_efd_item 4662 4847 440 37 4 : tunables 0 0 0 : slabdata 131 131 0
xfs_buf_item 8610 8760 272 30 2 : tunables 0 0 0 : slabdata 292 292 0
xfs_trans 1925 1925 232 35 2 : tunables 0 0 0 : slabdata 55 55 0
xfs_da_state 1632 1632 480 34 4 : tunables 0 0 0 : slabdata 48 48 0
xfs_btree_cur 1728 1728 224 36 2 : tunables 0 0 0 : slabdata 48 48 0
kvm_async_pf 0 0 136 30 1 : tunables 0 0 0 : slabdata 0 0 0
kvm_vcpu 0 0 17152 1 8 : tunables 0 0 0 : slabdata 0 0 0
kvm_mmu_page_header 0 0 168 48 2 : tunables 0 0 0 : slabdata 0 0 0
x86_fpu 0 0 4160 7 8 : tunables 0 0 0 : slabdata 0 0 0
ext4_groupinfo_4k 7196 7196 144 28 1 : tunables 0 0 0 : slabdata 257 257 0
btrfs_delayed_node 0 0 312 52 4 : tunables 0 0 0 : slabdata 0 0 0
btrfs_ordered_extent 0 0 416 39 4 : tunables 0 0 0 : slabdata 0 0 0
btrfs_inode 0 0 1168 28 8 : tunables 0 0 0 : slabdata 0 0 0
mlx5_fs_ftes 560 560 584 28 4 : tunables 0 0 0 : slabdata 20 20 0
mlx5_fs_fgs 100 100 648 50 8 : tunables 0 0 0 : slabdata 2 2 0
scsi_sense_cache 16896 16896 128 32 1 : tunables 0 0 0 : slabdata 528 528 0
fsverity_info 0 0 248 33 2 : tunables 0 0 0 : slabdata 0 0 0
ip6-frags 21560 21736 184 44 2 : tunables 0 0 0 : slabdata 494 494 0
PINGv6 26 26 1216 26 8 : tunables 0 0 0 : slabdata 1 1 0
RAWv6 390 390 1216 26 8 : tunables 0 0 0 : slabdata 15 15 0
UDPv6 4032 4032 1344 24 8 : tunables 0 0 0 : slabdata 168 168 0
tw_sock_TCPv6 4785 4785 248 33 2 : tunables 0 0 0 : slabdata 145 145 0
request_sock_TCPv6 0 0 304 53 4 : tunables 0 0 0 : slabdata 0 0 0
TCPv6 3809 3874 2432 13 8 : tunables 0 0 0 : slabdata 298 298 0
kcopyd_job 0 0 3312 9 8 : tunables 0 0 0 : slabdata 0 0 0
dm_uevent 0 0 2632 12 8 : tunables 0 0 0 : slabdata 0 0 0
mqueue_inode_cache 1632 1632 960 34 8 : tunables 0 0 0 : slabdata 48 48 0
fuse_request 1344 1344 144 28 1 : tunables 0 0 0 : slabdata 48 48 0
fuse_inode 13428 13830 832 39 8 : tunables 0 0 0 : slabdata 360 360 0
ecryptfs_key_record_cache 0 0 576 28 4 : tunables 0 0 0 : slabdata 0 0 0
ecryptfs_inode_cache 0 0 1024 32 8 : tunables 0 0 0 : slabdata 0 0 0
ecryptfs_file_cache 0 0 16 256 1 : tunables 0 0 0 : slabdata 0 0 0
ecryptfs_auth_tok_list_item 0 0 832 39 8 : tunables 0 0 0 : slabdata 0 0 0
fat_inode_cache 176 176 744 44 8 : tunables 0 0 0 : slabdata 4 4 0
fat_cache 0 0 40 102 1 : tunables 0 0 0 : slabdata 0 0 0
squashfs_inode_cache 46 46 704 46 8 : tunables 0 0 0 : slabdata 1 1 0
jbd2_journal_handle 4080 4080 48 85 1 : tunables 0 0 0 : slabdata 48 48 0
jbd2_journal_head 10438 10608 120 34 1 : tunables 0 0 0 : slabdata 312 312 0
jbd2_revoke_table_s 1024 1024 16 256 1 : tunables 0 0 0 : slabdata 4 4 0
ext4_inode_cache 56239 67562 1096 29 8 : tunables 0 0 0 : slabdata 2700 2700 0
ext4_allocation_context 1536 1536 128 32 1 : tunables 0 0 0 : slabdata 48 48 0
ext4_system_zone 816 816 40 102 1 : tunables 0 0 0 : slabdata 8 8 0
ext4_io_end 24832 24896 64 64 1 : tunables 0 0 0 : slabdata 389 389 0
ext4_pending_reservation 67072 67456 32 128 1 : tunables 0 0 0 : slabdata 527 527 0
ext4_extent_status 44359 55386 40 102 1 : tunables 0 0 0 : slabdata 543 543 0
mbcache 50005 50005 56 73 1 : tunables 0 0 0 : slabdata 685 685 0
userfaultfd_ctx_cache 0 0 192 42 2 : tunables 0 0 0 : slabdata 0 0 0
dnotify_struct 0 0 32 128 1 : tunables 0 0 0 : slabdata 0 0 0
pid_namespace 1872 1872 208 39 2 : tunables 0 0 0 : slabdata 48 48 0
ip4-frags 0 0 200 40 2 : tunables 0 0 0 : slabdata 0 0 0
xfrm_state 0 0 704 46 8 : tunables 0 0 0 : slabdata 0 0 0
PING 25440 25440 1024 32 8 : tunables 0 0 0 : slabdata 795 795 0
RAW 832 832 1024 32 8 : tunables 0 0 0 : slabdata 26 26 0
tw_sock_TCP 21153 21153 248 33 2 : tunables 0 0 0 : slabdata 641 641 0
request_sock_TCP 13674 13780 304 53 4 : tunables 0 0 0 : slabdata 260 260 0
TCP 8470 8666 2240 14 8 : tunables 0 0 0 : slabdata 619 619 0
hugetlbfs_inode_cache 102 102 632 51 8 : tunables 0 0 0 : slabdata 2 2 0
dquot 1536 1536 256 32 2 : tunables 0 0 0 : slabdata 48 48 0
eventpoll_pwq 81872 81928 72 56 1 : tunables 0 0 0 : slabdata 1463 1463 0
dax_cache 42 42 768 42 8 : tunables 0 0 0 : slabdata 1 1 0
request_queue 180 255 2104 15 8 : tunables 0 0 0 : slabdata 17 17 0
biovec-max 1120 1192 4096 8 8 : tunables 0 0 0 : slabdata 149 149 0
biovec-128 2546 2642 2048 16 8 : tunables 0 0 0 : slabdata 166 166 0
biovec-64 5492 5656 1024 32 8 : tunables 0 0 0 : slabdata 182 182 0
khugepaged_mm_slot 1440 1440 112 36 1 : tunables 0 0 0 : slabdata 40 40 0
user_namespace 0 0 536 30 4 : tunables 0 0 0 : slabdata 0 0 0
uid_cache 16514 16640 128 32 1 : tunables 0 0 0 : slabdata 520 520 0
dmaengine-unmap-256 15 15 2112 15 8 : tunables 0 0 0 : slabdata 1 1 0
dmaengine-unmap-128 30 30 1088 30 8 : tunables 0 0 0 : slabdata 1 1 0
sock_inode_cache 62080 62433 832 39 8 : tunables 0 0 0 : slabdata 1617 1617 0
skbuff_ext_cache 16454495 32746392 192 42 2 : tunables 0 0 0 : slabdata 779676 779676 0
skbuff_fclone_cache 6752 7008 512 32 4 : tunables 0 0 0 : slabdata 219 219 0
skbuff_head_cache 48769 49184 256 32 2 : tunables 0 0 0 : slabdata 1537 1537 0
file_lock_cache 1776 1776 216 37 2 : tunables 0 0 0 : slabdata 48 48 0
fsnotify_mark_connector 6144 6144 32 128 1 : tunables 0 0 0 : slabdata 48 48 0
net_namespace 18 18 4928 6 8 : tunables 0 0 0 : slabdata 3 3 0
task_delay_info 79305 79407 80 51 1 : tunables 0 0 0 : slabdata 1557 1557 0
taskstats 2256 2256 344 47 4 : tunables 0 0 0 : slabdata 48 48 0
proc_dir_entry 4578 4578 192 42 2 : tunables 0 0 0 : slabdata 109 109 0
pde_opener 79050 79050 40 102 1 : tunables 0 0 0 : slabdata 775 775 0
proc_inode_cache 153717 156498 680 48 8 : tunables 0 0 0 : slabdata 3263 3263 0
bdev_cache 1092 1092 832 39 8 : tunables 0 0 0 : slabdata 28 28 0
shmem_inode_cache 28213 28800 720 45 8 : tunables 0 0 0 : slabdata 640 640 0
kernfs_node_cache 195825 200730 136 30 1 : tunables 0 0 0 : slabdata 6691 6691 0
mnt_cache 13984 14076 320 51 4 : tunables 0 0 0 : slabdata 276 276 0
filp 250898 253328 256 32 2 : tunables 0 0 0 : slabdata 7917 7917 0
inode_cache 140359 142937 608 53 8 : tunables 0 0 0 : slabdata 2712 2712 0
dentry 27263153 58131675 192 42 2 : tunables 0 0 0 : slabdata 1384093 1384093 0
names_cache 617 633 4096 8 8 : tunables 0 0 0 : slabdata 80 80 0
iint_cache 0 0 120 34 1 : tunables 0 0 0 : slabdata 0 0 0
lsm_file_cache 87405 87890 24 170 1 : tunables 0 0 0 : slabdata 517 517 0
buffer_head 3298954 3785808 104 39 1 : tunables 0 0 0 : slabdata 97072 97072 0
uts_namespace 1776 1776 440 37 4 : tunables 0 0 0 : slabdata 48 48 0
nsproxy 3504 3504 56 73 1 : tunables 0 0 0 : slabdata 48 48 0
vm_area_struct 265005 265785 208 39 2 : tunables 0 0 0 : slabdata 6815 6815 0
mm_struct 19926 19926 1088 30 8 : tunables 0 0 0 : slabdata 666 666 0
files_cache 28029 28029 704 46 8 : tunables 0 0 0 : slabdata 612 612 0
signal_cache 28910 29154 1152 28 8 : tunables 0 0 0 : slabdata 1043 1043 0
sighand_cache 11738 11795 2112 15 8 : tunables 0 0 0 : slabdata 791 791 0
task_struct 7323 7693 7616 4 8 : tunables 0 0 0 : slabdata 1924 1924 0
cred_jar 81837 81837 192 42 2 : tunables 0 0 0 : slabdata 1949 1949 0
anon_vma_chain 350482 351552 64 64 1 : tunables 0 0 0 : slabdata 5493 5493 0
anon_vma 231854 233220 88 46 1 : tunables 0 0 0 : slabdata 5070 5070 0
pid 113960 114336 128 32 1 : tunables 0 0 0 : slabdata 3573 3573 0
Acpi-Operand 189280 189280 72 56 1 : tunables 0 0 0 : slabdata 3380 3380 0
Acpi-ParseExt 18174 18174 104 39 1 : tunables 0 0 0 : slabdata 466 466 0
Acpi-State 10098 10098 80 51 1 : tunables 0 0 0 : slabdata 198 198 0
numa_policy 62 62 264 31 2 : tunables 0 0 0 : slabdata 2 2 0
trace_event_file 2622 2622 88 46 1 : tunables 0 0 0 : slabdata 57 57 0
ftrace_event_field 28220 28220 48 85 1 : tunables 0 0 0 : slabdata 332 332 0
pool_workqueue 8513 8544 256 32 2 : tunables 0 0 0 : slabdata 267 267 0
radix_tree_node 6248549 8844010 584 28 4 : tunables 0 0 0 : slabdata 315865 315865 0
task_group 2448 2448 640 51 8 : tunables 0 0 0 : slabdata 48 48 0
vmap_area 24174 64640 64 64 1 : tunables 0 0 0 : slabdata 1010 1010 0
dma-kmalloc-8k 0 0 8192 4 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-4k 0 0 4096 8 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-2k 0 0 2048 16 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-1k 0 0 1024 32 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-512 0 0 512 32 4 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-256 0 0 256 32 2 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-128 0 0 128 32 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-64 0 0 64 64 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-32 0 0 32 128 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-16 0 0 16 256 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-8 0 0 8 512 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-192 0 0 192 42 2 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-96 0 0 96 42 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-8k 0 0 8192 4 8 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-4k 0 0 4096 8 8 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-2k 0 0 2048 16 8 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-1k 0 0 1024 32 8 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-512 0 0 512 32 4 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-256 0 0 256 32 2 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-192 64441 82992 192 42 2 : tunables 0 0 0 : slabdata 1976 1976 0
kmalloc-rcl-128 723176 936960 128 32 1 : tunables 0 0 0 : slabdata 29280 29280 0
kmalloc-rcl-96 10652323 18961866 96 42 1 : tunables 0 0 0 : slabdata 451473 451473 0
kmalloc-rcl-64 6044167 11369536 64 64 1 : tunables 0 0 0 : slabdata 177649 177649 0
kmalloc-rcl-32 0 0 32 128 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-16 0 0 16 256 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-8 0 0 8 512 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-8k 3114 3172 8192 4 8 : tunables 0 0 0 : slabdata 793 793 0
kmalloc-4k 9499 9632 4096 8 8 : tunables 0 0 0 : slabdata 1204 1204 0
kmalloc-2k 12732 13312 2048 16 8 : tunables 0 0 0 : slabdata 832 832 0
kmalloc-1k 183625 539936 1024 32 8 : tunables 0 0 0 : slabdata 16873 16873 0
kmalloc-512 655588 1568608 512 32 4 : tunables 0 0 0 : slabdata 49022 49022 0
kmalloc-256 98952 342912 256 32 2 : tunables 0 0 0 : slabdata 10716 10716 0
kmalloc-192 204049 482370 192 42 2 : tunables 0 0 0 : slabdata 11485 11485 0
kmalloc-128 311838 730848 128 32 1 : tunables 0 0 0 : slabdata 22839 22839 0
kmalloc-96 1930979 3409056 96 42 1 : tunables 0 0 0 : slabdata 81168 81168 0
kmalloc-64 8181387 8266624 64 64 1 : tunables 0 0 0 : slabdata 129166 129166 0
kmalloc-32 8544206 16602368 32 128 1 : tunables 0 0 0 : slabdata 129706 129706 0
kmalloc-16 6563402 21336064 16 256 1 : tunables 0 0 0 : slabdata 83344 83344 0
kmalloc-8 119808 119808 8 512 1 : tunables 0 0 0 : slabdata 234 234 0
kmem_cache_node 8235 9920 64 64 1 : tunables 0 0 0 : slabdata 155 155 0
kmem_cache 10216 10332 448 36 4 : tunables 0 0 0 : slabdata 287 287 0

Linux# cat /proc/meminfo
MemTotal:       263782936 kB
MemFree:          5950596 kB
MemAvailable:   187604140 kB
Buffers:           590176 kB
Cached:          88517408 kB
SwapCached:             0 kB
Active:          33425084 kB
Inactive:        78773572 kB
Active(anon):    22977948 kB
Inactive(anon):      1768 kB
Active(file):    10447136 kB
Inactive(file):  78771804 kB
Unevictable:           28 kB
Mlocked:               28 kB
SwapTotal:              0 kB
SwapFree:               0 kB
Dirty:               1944 kB
Writeback:              0 kB
AnonPages:       23028212 kB
Mapped:            370632 kB
Shmem:               3352 kB
KReclaimable:    97013384 kB
Slab:           108591792 kB
SReclaimable:    97013384 kB
SUnreclaim:      11578408 kB
KernelStack:        29600 kB
PageTables:         69120 kB
NFS_Unstable:           0 kB
Bounce:                 0 kB
WritebackTmp:           0 kB
CommitLimit:    131891468 kB
Committed_AS:    33922344 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       288528 kB
VmallocChunk:           0 kB
Percpu:             79680 kB
HardwareCorrupted:      0 kB
AnonHugePages:      53248 kB
ShmemHugePages:         0 kB
ShmemPmdMapped:         0 kB
FileHugePages:          0 kB
FilePmdMapped:          0 kB
CmaTotal:               0 kB
CmaFree:                0 kB
HugePages_Total:        0
HugePages_Free:         0
HugePages_Rsvd:         0
HugePages_Surp:         0
Hugepagesize:        2048 kB
Hugetlb:                0 kB
DirectMap4k:     31415244 kB
DirectMap2M:    231421952 kB
DirectMap1G:      7340032 kB

>
> > #########################################################################
> >
> > The peak slab memory usage could spike all the way to 100GB+.
>
> Is that all? :)
>
> > We are using Ubuntu 18.04 and the xfs version is 4.9, kernel version is 5.4
>
> Ah, I don't think there's anything upstream can do for you. We
> rewrote large portions of the XFS inode reclaim in 5.9 (3 years ago)
> to address the issues with memory reclaim getting stuck on dirty XFS
> inodes, so inode reclaim behaviour in modern kernels is completely
> different to old kernels.
>
> I'd suggest that you need to upgrade your systems to run a more
> modern kernel and see if that fixes the issues you are seeing...

Will try it out, thanks for the suggestion.

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

--
Jianan Wang
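[Editor's note: the slabinfo and meminfo dumps above can be cross-checked to see why reclaim struggles here. This calculation is not part of the thread; it just reuses the pasted numbers. Only about 30% of the xfs_inode slab objects are in use, yet a slab page cannot be returned to the system until every object on it is free, so the pages stay pinned even though meminfo counts most of the slab as reclaimable.]

```python
# Numbers taken from the /proc/slabinfo and /proc/meminfo dumps above.
xfs_inode_active = 23_063_278    # <active_objs> for xfs_inode
xfs_inode_total  = 77_479_540    # <num_objs>: objects with slab pages allocated
slab_kb          = 108_591_792   # Slab: from meminfo
sreclaimable_kb  =  97_013_384   # SReclaimable: from meminfo

occupancy = xfs_inode_active / xfs_inode_total
reclaimable_frac = sreclaimable_kb / slab_kb

print(f"xfs_inode slab occupancy:  {occupancy:.0%}")        # ~30%
print(f"reclaimable share of slab: {reclaimable_frac:.0%}") # ~89%
```

In other words, nearly 90% of the slab is nominally reclaimable, but the xfs_inode pages are sparsely occupied, which is consistent with reclaim falling behind inode writeback as described above.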
* Re: Question on the xfs inode slab memory
  2023-06-01  5:25 ` Jianan Wang
@ 2023-06-01 15:06   ` Darrick J. Wong
  0 siblings, 0 replies; 10+ messages in thread

From: Darrick J. Wong @ 2023-06-01 15:06 UTC (permalink / raw)
To: Jianan Wang; +Cc: Dave Chinner, linux-xfs

On Wed, May 31, 2023 at 10:25:12PM -0700, Jianan Wang wrote:
> Hi Dave,
>
> Thanks for the prompt response!
>
> On Wed, May 31, 2023 at 5:08 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Wed, May 31, 2023 at 02:29:52PM -0700, Jianan Wang wrote:
> > > Hi all,
> > >
> > > I have a question regarding the xfs slab memory usage when operating a
> > > filesystem with 1-2 billion inodes (raid 0 with 6 disks, totally
> > > 18TB). On this partition, whenever there is a high disk io operation,
> > > like removing millions of small files, the slab kernel memory usage
> > > will increase a lot, leading to many OOM issues happening for the
> > > services running on this node. You could check some of the stats as
> > > the following (only includes the xfs related):
> >
> > You didn't include all the XFS related slabs. At minimum, the inode
> > log item slab needs to be shown (xfs_ili) because that tells us how
> > many of the inodes in the cache have been dirtied.
> >
> > As it is, I'm betting the problem is the disk subsystem can't write
> > back dirty inodes fast enough to keep up with memory demand and so
> > reclaim is declaring OOM faster than your disks can clean inodes to
> > enable them to be reclaimed.
>
> We have similar feelings about this. Do you think 1-2 billion inodes
> and fast walk of millions of files could be an issue overtime on the
> current xfs implementation. In a production environment, do you have
> any suggestion on tuning the xfs performance to fit this kind of large
> number of small files workload or we shall consider reducing the data
> volume and io workload to more nodes?
> > > > > > ######################################################################### > > > Active / Total Objects (% used): 281803052 / 317485764 (88.8%) > > > Active / Total Slabs (% used): 13033144 / 13033144 (100.0%) > > > Active / Total Caches (% used): 126 / 180 (70.0%) > > > Active / Total Size (% used): 114671057.99K / 127265108.19K (90.1%) > > > Minium / Average / Maximum Object : 0.01K / 0.40K / 16.75K > > > > > > OBJS ACTIVE USE OBJ SIZE SLABS > > > OBJ/SLAB CACHE SIZE NAME > > > 78207920 70947541 0% 1.00K 7731010 > > > 32 247392320K xfs_inode > > > 59945928 46548798 0% 0.19K 1433102 > > > 42 11464816K dentry > > > 25051296 25051282 0% 0.38K 599680 > > > 42 9594880K xfs_buf > > > > Ok, that's from slabtop. Please don't autowrap stuff you've pasted > > in - it makes it really hard to read. (reformatted so I can read > > it). > > Got it, will pay more attention to this. > > > > > OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME > > 78207920 70947541 0% 1.00K 7731010 32 247392320K xfs_inode > > 59945928 46548798 0% 0.19K 1433102 42 11464816K dentry > > 25051296 25051282 0% 0.38K 599680 42 9594880K xfs_buf > > > > So, 70 million cached inodes, with a cache size of 240GB. There are > > 7.7 million slabs, 32 objects per slab, and that's roughly 240GB. > > > > But why does the slab report only 78 million objects in the slab > > when at 240GB there should be 240 million objects in the slab? > > > > It looks like theres some kind of accounting problem here, likely in > > the slabtop program. I have always found slabtop to be unreliable > > like this.... > > > > Can you attach the output of 'cat /proc/slabinfo' and 'cat > > /proc/meminfo' when you have a large slab cache in memory? 
> > I do not have those output you requested for the exact situation I > pasted originally, but this is another node where xfs consumes a lot > of slab memory using the same xfs version: > > Linux # cat /proc/slabinfo > slabinfo - version: 2.1 > # name <active_objs> <num_objs> <objsize> <objperslab> > <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : > slabdata <active_slabs> <num_slabs> <sharedavail> > nf_conntrack 15716 20349 320 51 4 : tunables 0 0 > 0 : slabdata 399 399 0 > au_finfo 0 0 192 42 2 : tunables 0 0 > 0 : slabdata 0 0 0 > au_icntnr 0 0 832 39 8 : tunables 0 0 > 0 : slabdata 0 0 0 > au_dinfo 0 0 192 42 2 : tunables 0 0 > 0 : slabdata 0 0 0 > ovl_inode 41792 42757 688 47 8 : tunables 0 0 > 0 : slabdata 941 941 0 > ufs_inode_cache 0 0 808 40 8 : tunables 0 0 > 0 : slabdata 0 0 0 > qnx4_inode_cache 0 0 680 48 8 : tunables 0 0 > 0 : slabdata 0 0 0 > hfsplus_attr_cache 0 0 3840 8 8 : tunables 0 0 > 0 : slabdata 0 0 0 > hfsplus_icache 0 0 896 36 8 : tunables 0 0 > 0 : slabdata 0 0 0 > hfs_inode_cache 0 0 832 39 8 : tunables 0 0 > 0 : slabdata 0 0 0 > minix_inode_cache 0 0 672 48 8 : tunables 0 0 > 0 : slabdata 0 0 0 > ntfs_big_inode_cache 0 0 960 34 8 : tunables 0 > 0 0 : slabdata 0 0 0 > ntfs_inode_cache 0 0 296 55 4 : tunables 0 0 > 0 : slabdata 0 0 0 > jfs_ip 0 0 1280 25 8 : tunables 0 0 > 0 : slabdata 0 0 0 > xfs_dqtrx 0 0 528 31 4 : tunables 0 0 > 0 : slabdata 0 0 0 > xfs_dquot 0 0 496 33 4 : tunables 0 0 > 0 : slabdata 0 0 0 > xfs_buf 2545661 3291582 384 42 4 : tunables 0 > 0 0 : slabdata 78371 78371 0 > xfs_rui_item 0 0 696 47 8 : tunables 0 0 > 0 : slabdata 0 0 0 > xfs_rud_item 0 0 176 46 2 : tunables 0 0 > 0 : slabdata 0 0 0 > xfs_inode 23063278 77479540 1024 32 8 : tunables 0 > 0 0 : slabdata 2425069 2425069 0 > xfs_efd_item 4662 4847 440 37 4 : tunables 0 0 > 0 : slabdata 131 131 0 > xfs_buf_item 8610 8760 272 30 2 : tunables 0 0 > 0 : slabdata 292 292 0 > xfs_trans 1925 1925 232 35 2 : tunables 0 0 > 0 : slabdata 55 55 0 > 
> [... badly line-wrapped /proc/slabinfo and /proc/meminfo output
> trimmed; a clean copy of the same dumps is resent later in this
> thread ...]
>
> #########################################################################
>
> > > The peak slab memory usage could spike all the way to 100GB+.
> >
> > Is that all? :)
> >
> > > We are using Ubuntu 18.04 and the xfs version is 4.9, kernel version is 5.4

Your vendor EOL'd this release yesterday.  Please contact your
extended support contact for assistance.

That is, assuming they ever fixed the severe memory reclaim problems
in their 5.4 kernel...

--D

> > Ah, I don't think there's anything upstream can do for you.  We
> > rewrote large portions of the XFS inode reclaim in 5.9 (3 years ago)
> > to address the issues with memory reclaim getting stuck on dirty XFS
> > inodes, so inode reclaim behaviour in modern kernels is completely
> > different to old kernels.
> >
> > I'd suggest that you need to upgrade your systems to run a more
> > modern kernel and see if that fixes the issues you are seeing...
>
> Will try it out, thanks for the suggestion.
>
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
>
> --
> Jianan Wang

^ permalink raw reply	[flat|nested] 10+ messages in thread
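As an aside for readers working through these dumps: each /proc/slabinfo (version 2.1) line already contains enough to recompute a cache's memory footprint, as num_slabs × pagesperslab × page size. A minimal sketch, assuming 4 KiB pages and using the xfs_inode line from the cleanly resent dump later in this thread as sample input:

```python
# Recompute a slab cache's memory footprint from one /proc/slabinfo
# (version 2.1) line.  Whitespace-split field layout:
#   0:name 1:active_objs 2:num_objs 3:objsize 4:objperslab 5:pagesperslab
#   ... 13:active_slabs 14:num_slabs 15:sharedavail
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on this x86_64 host

def cache_bytes(line: str) -> int:
    f = line.split()
    pagesperslab = int(f[5])   # pages backing each slab
    num_slabs = int(f[14])     # total slabs in the cache
    return num_slabs * pagesperslab * PAGE_SIZE

# xfs_inode line from the resent dump below:
line = ("xfs_inode 23063278 77479540 1024 32 8 : tunables 0 0 0 "
        ": slabdata 2425069 2425069 0")
print(cache_bytes(line) // (1 << 30), "GiB")  # -> 74 GiB
```

Note that the 23 M active objects at 1 KiB each only account for roughly 22 GiB of that ~74 GiB footprint, so in this dump most of the memory is slab pages pinned by a minority of live objects.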
* Re: Question on the xfs inode slab memory 2023-06-01 0:08 ` Dave Chinner 2023-06-01 5:25 ` Jianan Wang @ 2023-06-01 6:21 ` Jianan Wang 2023-06-01 21:43 ` Dave Chinner 1 sibling, 1 reply; 10+ messages in thread From: Jianan Wang @ 2023-06-01 6:21 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs Seems the auto-wraping issue is on my gmail.... using thunderbird should be better... Resend the slabinfo and meminfo output here: Linux # cat /proc/slabinfo slabinfo - version: 2.1 # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail> nf_conntrack 15716 20349 320 51 4 : tunables 0 0 0 : slabdata 399 399 0 au_finfo 0 0 192 42 2 : tunables 0 0 0 : slabdata 0 0 0 au_icntnr 0 0 832 39 8 : tunables 0 0 0 : slabdata 0 0 0 au_dinfo 0 0 192 42 2 : tunables 0 0 0 : slabdata 0 0 0 ovl_inode 41792 42757 688 47 8 : tunables 0 0 0 : slabdata 941 941 0 ufs_inode_cache 0 0 808 40 8 : tunables 0 0 0 : slabdata 0 0 0 qnx4_inode_cache 0 0 680 48 8 : tunables 0 0 0 : slabdata 0 0 0 hfsplus_attr_cache 0 0 3840 8 8 : tunables 0 0 0 : slabdata 0 0 0 hfsplus_icache 0 0 896 36 8 : tunables 0 0 0 : slabdata 0 0 0 hfs_inode_cache 0 0 832 39 8 : tunables 0 0 0 : slabdata 0 0 0 minix_inode_cache 0 0 672 48 8 : tunables 0 0 0 : slabdata 0 0 0 ntfs_big_inode_cache 0 0 960 34 8 : tunables 0 0 0 : slabdata 0 0 0 ntfs_inode_cache 0 0 296 55 4 : tunables 0 0 0 : slabdata 0 0 0 jfs_ip 0 0 1280 25 8 : tunables 0 0 0 : slabdata 0 0 0 xfs_dqtrx 0 0 528 31 4 : tunables 0 0 0 : slabdata 0 0 0 xfs_dquot 0 0 496 33 4 : tunables 0 0 0 : slabdata 0 0 0 xfs_buf 2545661 3291582 384 42 4 : tunables 0 0 0 : slabdata 78371 78371 0 xfs_rui_item 0 0 696 47 8 : tunables 0 0 0 : slabdata 0 0 0 xfs_rud_item 0 0 176 46 2 : tunables 0 0 0 : slabdata 0 0 0 xfs_inode 23063278 77479540 1024 32 8 : tunables 0 0 0 : slabdata 2425069 2425069 0 xfs_efd_item 4662 4847 440 37 4 : tunables 0 0 0 : slabdata 131 131 0 
xfs_buf_item 8610 8760 272 30 2 : tunables 0 0 0 : slabdata 292 292 0 xfs_trans 1925 1925 232 35 2 : tunables 0 0 0 : slabdata 55 55 0 xfs_da_state 1632 1632 480 34 4 : tunables 0 0 0 : slabdata 48 48 0 xfs_btree_cur 1728 1728 224 36 2 : tunables 0 0 0 : slabdata 48 48 0 kvm_async_pf 0 0 136 30 1 : tunables 0 0 0 : slabdata 0 0 0 kvm_vcpu 0 0 17152 1 8 : tunables 0 0 0 : slabdata 0 0 0 kvm_mmu_page_header 0 0 168 48 2 : tunables 0 0 0 : slabdata 0 0 0 x86_fpu 0 0 4160 7 8 : tunables 0 0 0 : slabdata 0 0 0 ext4_groupinfo_4k 7196 7196 144 28 1 : tunables 0 0 0 : slabdata 257 257 0 btrfs_delayed_node 0 0 312 52 4 : tunables 0 0 0 : slabdata 0 0 0 btrfs_ordered_extent 0 0 416 39 4 : tunables 0 0 0 : slabdata 0 0 0 btrfs_inode 0 0 1168 28 8 : tunables 0 0 0 : slabdata 0 0 0 mlx5_fs_ftes 560 560 584 28 4 : tunables 0 0 0 : slabdata 20 20 0 mlx5_fs_fgs 100 100 648 50 8 : tunables 0 0 0 : slabdata 2 2 0 scsi_sense_cache 16896 16896 128 32 1 : tunables 0 0 0 : slabdata 528 528 0 fsverity_info 0 0 248 33 2 : tunables 0 0 0 : slabdata 0 0 0 ip6-frags 21560 21736 184 44 2 : tunables 0 0 0 : slabdata 494 494 0 PINGv6 26 26 1216 26 8 : tunables 0 0 0 : slabdata 1 1 0 RAWv6 390 390 1216 26 8 : tunables 0 0 0 : slabdata 15 15 0 UDPv6 4032 4032 1344 24 8 : tunables 0 0 0 : slabdata 168 168 0 tw_sock_TCPv6 4785 4785 248 33 2 : tunables 0 0 0 : slabdata 145 145 0 request_sock_TCPv6 0 0 304 53 4 : tunables 0 0 0 : slabdata 0 0 0 TCPv6 3809 3874 2432 13 8 : tunables 0 0 0 : slabdata 298 298 0 kcopyd_job 0 0 3312 9 8 : tunables 0 0 0 : slabdata 0 0 0 dm_uevent 0 0 2632 12 8 : tunables 0 0 0 : slabdata 0 0 0 mqueue_inode_cache 1632 1632 960 34 8 : tunables 0 0 0 : slabdata 48 48 0 fuse_request 1344 1344 144 28 1 : tunables 0 0 0 : slabdata 48 48 0 fuse_inode 13428 13830 832 39 8 : tunables 0 0 0 : slabdata 360 360 0 ecryptfs_key_record_cache 0 0 576 28 4 : tunables 0 0 0 : slabdata 0 0 0 ecryptfs_inode_cache 0 0 1024 32 8 : tunables 0 0 0 : slabdata 0 0 0 ecryptfs_file_cache 0 0 16 256 1 
: tunables 0 0 0 : slabdata 0 0 0 ecryptfs_auth_tok_list_item 0 0 832 39 8 : tunables 0 0 0 : slabdata 0 0 0 fat_inode_cache 176 176 744 44 8 : tunables 0 0 0 : slabdata 4 4 0 fat_cache 0 0 40 102 1 : tunables 0 0 0 : slabdata 0 0 0 squashfs_inode_cache 46 46 704 46 8 : tunables 0 0 0 : slabdata 1 1 0 jbd2_journal_handle 4080 4080 48 85 1 : tunables 0 0 0 : slabdata 48 48 0 jbd2_journal_head 10438 10608 120 34 1 : tunables 0 0 0 : slabdata 312 312 0 jbd2_revoke_table_s 1024 1024 16 256 1 : tunables 0 0 0 : slabdata 4 4 0 ext4_inode_cache 56239 67562 1096 29 8 : tunables 0 0 0 : slabdata 2700 2700 0 ext4_allocation_context 1536 1536 128 32 1 : tunables 0 0 0 : slabdata 48 48 0 ext4_system_zone 816 816 40 102 1 : tunables 0 0 0 : slabdata 8 8 0 ext4_io_end 24832 24896 64 64 1 : tunables 0 0 0 : slabdata 389 389 0 ext4_pending_reservation 67072 67456 32 128 1 : tunables 0 0 0 : slabdata 527 527 0 ext4_extent_status 44359 55386 40 102 1 : tunables 0 0 0 : slabdata 543 543 0 mbcache 50005 50005 56 73 1 : tunables 0 0 0 : slabdata 685 685 0 userfaultfd_ctx_cache 0 0 192 42 2 : tunables 0 0 0 : slabdata 0 0 0 dnotify_struct 0 0 32 128 1 : tunables 0 0 0 : slabdata 0 0 0 pid_namespace 1872 1872 208 39 2 : tunables 0 0 0 : slabdata 48 48 0 ip4-frags 0 0 200 40 2 : tunables 0 0 0 : slabdata 0 0 0 xfrm_state 0 0 704 46 8 : tunables 0 0 0 : slabdata 0 0 0 PING 25440 25440 1024 32 8 : tunables 0 0 0 : slabdata 795 795 0 RAW 832 832 1024 32 8 : tunables 0 0 0 : slabdata 26 26 0 tw_sock_TCP 21153 21153 248 33 2 : tunables 0 0 0 : slabdata 641 641 0 request_sock_TCP 13674 13780 304 53 4 : tunables 0 0 0 : slabdata 260 260 0 TCP 8470 8666 2240 14 8 : tunables 0 0 0 : slabdata 619 619 0 hugetlbfs_inode_cache 102 102 632 51 8 : tunables 0 0 0 : slabdata 2 2 0 dquot 1536 1536 256 32 2 : tunables 0 0 0 : slabdata 48 48 0 eventpoll_pwq 81872 81928 72 56 1 : tunables 0 0 0 : slabdata 1463 1463 0 dax_cache 42 42 768 42 8 : tunables 0 0 0 : slabdata 1 1 0 request_queue 180 255 2104 15 8 : 
tunables 0 0 0 : slabdata 17 17 0 biovec-max 1120 1192 4096 8 8 : tunables 0 0 0 : slabdata 149 149 0 biovec-128 2546 2642 2048 16 8 : tunables 0 0 0 : slabdata 166 166 0 biovec-64 5492 5656 1024 32 8 : tunables 0 0 0 : slabdata 182 182 0 khugepaged_mm_slot 1440 1440 112 36 1 : tunables 0 0 0 : slabdata 40 40 0 user_namespace 0 0 536 30 4 : tunables 0 0 0 : slabdata 0 0 0 uid_cache 16514 16640 128 32 1 : tunables 0 0 0 : slabdata 520 520 0 dmaengine-unmap-256 15 15 2112 15 8 : tunables 0 0 0 : slabdata 1 1 0 dmaengine-unmap-128 30 30 1088 30 8 : tunables 0 0 0 : slabdata 1 1 0 sock_inode_cache 62080 62433 832 39 8 : tunables 0 0 0 : slabdata 1617 1617 0 skbuff_ext_cache 16454495 32746392 192 42 2 : tunables 0 0 0 : slabdata 779676 779676 0 skbuff_fclone_cache 6752 7008 512 32 4 : tunables 0 0 0 : slabdata 219 219 0 skbuff_head_cache 48769 49184 256 32 2 : tunables 0 0 0 : slabdata 1537 1537 0 file_lock_cache 1776 1776 216 37 2 : tunables 0 0 0 : slabdata 48 48 0 fsnotify_mark_connector 6144 6144 32 128 1 : tunables 0 0 0 : slabdata 48 48 0 net_namespace 18 18 4928 6 8 : tunables 0 0 0 : slabdata 3 3 0 task_delay_info 79305 79407 80 51 1 : tunables 0 0 0 : slabdata 1557 1557 0 taskstats 2256 2256 344 47 4 : tunables 0 0 0 : slabdata 48 48 0 proc_dir_entry 4578 4578 192 42 2 : tunables 0 0 0 : slabdata 109 109 0 pde_opener 79050 79050 40 102 1 : tunables 0 0 0 : slabdata 775 775 0 proc_inode_cache 153717 156498 680 48 8 : tunables 0 0 0 : slabdata 3263 3263 0 bdev_cache 1092 1092 832 39 8 : tunables 0 0 0 : slabdata 28 28 0 shmem_inode_cache 28213 28800 720 45 8 : tunables 0 0 0 : slabdata 640 640 0 kernfs_node_cache 195825 200730 136 30 1 : tunables 0 0 0 : slabdata 6691 6691 0 mnt_cache 13984 14076 320 51 4 : tunables 0 0 0 : slabdata 276 276 0 filp 250898 253328 256 32 2 : tunables 0 0 0 : slabdata 7917 7917 0 inode_cache 140359 142937 608 53 8 : tunables 0 0 0 : slabdata 2712 2712 0 dentry 27263153 58131675 192 42 2 : tunables 0 0 0 : slabdata 1384093 1384093 0 
names_cache 617 633 4096 8 8 : tunables 0 0 0 : slabdata 80 80 0 iint_cache 0 0 120 34 1 : tunables 0 0 0 : slabdata 0 0 0 lsm_file_cache 87405 87890 24 170 1 : tunables 0 0 0 : slabdata 517 517 0 buffer_head 3298954 3785808 104 39 1 : tunables 0 0 0 : slabdata 97072 97072 0 uts_namespace 1776 1776 440 37 4 : tunables 0 0 0 : slabdata 48 48 0 nsproxy 3504 3504 56 73 1 : tunables 0 0 0 : slabdata 48 48 0 vm_area_struct 265005 265785 208 39 2 : tunables 0 0 0 : slabdata 6815 6815 0 mm_struct 19926 19926 1088 30 8 : tunables 0 0 0 : slabdata 666 666 0 files_cache 28029 28029 704 46 8 : tunables 0 0 0 : slabdata 612 612 0 signal_cache 28910 29154 1152 28 8 : tunables 0 0 0 : slabdata 1043 1043 0 sighand_cache 11738 11795 2112 15 8 : tunables 0 0 0 : slabdata 791 791 0 task_struct 7323 7693 7616 4 8 : tunables 0 0 0 : slabdata 1924 1924 0 cred_jar 81837 81837 192 42 2 : tunables 0 0 0 : slabdata 1949 1949 0 anon_vma_chain 350482 351552 64 64 1 : tunables 0 0 0 : slabdata 5493 5493 0 anon_vma 231854 233220 88 46 1 : tunables 0 0 0 : slabdata 5070 5070 0 pid 113960 114336 128 32 1 : tunables 0 0 0 : slabdata 3573 3573 0 Acpi-Operand 189280 189280 72 56 1 : tunables 0 0 0 : slabdata 3380 3380 0 Acpi-ParseExt 18174 18174 104 39 1 : tunables 0 0 0 : slabdata 466 466 0 Acpi-State 10098 10098 80 51 1 : tunables 0 0 0 : slabdata 198 198 0 numa_policy 62 62 264 31 2 : tunables 0 0 0 : slabdata 2 2 0 trace_event_file 2622 2622 88 46 1 : tunables 0 0 0 : slabdata 57 57 0 ftrace_event_field 28220 28220 48 85 1 : tunables 0 0 0 : slabdata 332 332 0 pool_workqueue 8513 8544 256 32 2 : tunables 0 0 0 : slabdata 267 267 0 radix_tree_node 6248549 8844010 584 28 4 : tunables 0 0 0 : slabdata 315865 315865 0 task_group 2448 2448 640 51 8 : tunables 0 0 0 : slabdata 48 48 0 vmap_area 24174 64640 64 64 1 : tunables 0 0 0 : slabdata 1010 1010 0 dma-kmalloc-8k 0 0 8192 4 8 : tunables 0 0 0 : slabdata 0 0 0 dma-kmalloc-4k 0 0 4096 8 8 : tunables 0 0 0 : slabdata 0 0 0 dma-kmalloc-2k 0 0 2048 
16 8 : tunables 0 0 0 : slabdata 0 0 0 dma-kmalloc-1k 0 0 1024 32 8 : tunables 0 0 0 : slabdata 0 0 0 dma-kmalloc-512 0 0 512 32 4 : tunables 0 0 0 : slabdata 0 0 0 dma-kmalloc-256 0 0 256 32 2 : tunables 0 0 0 : slabdata 0 0 0 dma-kmalloc-128 0 0 128 32 1 : tunables 0 0 0 : slabdata 0 0 0 dma-kmalloc-64 0 0 64 64 1 : tunables 0 0 0 : slabdata 0 0 0 dma-kmalloc-32 0 0 32 128 1 : tunables 0 0 0 : slabdata 0 0 0 dma-kmalloc-16 0 0 16 256 1 : tunables 0 0 0 : slabdata 0 0 0 dma-kmalloc-8 0 0 8 512 1 : tunables 0 0 0 : slabdata 0 0 0 dma-kmalloc-192 0 0 192 42 2 : tunables 0 0 0 : slabdata 0 0 0 dma-kmalloc-96 0 0 96 42 1 : tunables 0 0 0 : slabdata 0 0 0 kmalloc-rcl-8k 0 0 8192 4 8 : tunables 0 0 0 : slabdata 0 0 0 kmalloc-rcl-4k 0 0 4096 8 8 : tunables 0 0 0 : slabdata 0 0 0 kmalloc-rcl-2k 0 0 2048 16 8 : tunables 0 0 0 : slabdata 0 0 0 kmalloc-rcl-1k 0 0 1024 32 8 : tunables 0 0 0 : slabdata 0 0 0 kmalloc-rcl-512 0 0 512 32 4 : tunables 0 0 0 : slabdata 0 0 0 kmalloc-rcl-256 0 0 256 32 2 : tunables 0 0 0 : slabdata 0 0 0 kmalloc-rcl-192 64441 82992 192 42 2 : tunables 0 0 0 : slabdata 1976 1976 0 kmalloc-rcl-128 723176 936960 128 32 1 : tunables 0 0 0 : slabdata 29280 29280 0 kmalloc-rcl-96 10652323 18961866 96 42 1 : tunables 0 0 0 : slabdata 451473 451473 0 kmalloc-rcl-64 6044167 11369536 64 64 1 : tunables 0 0 0 : slabdata 177649 177649 0 kmalloc-rcl-32 0 0 32 128 1 : tunables 0 0 0 : slabdata 0 0 0 kmalloc-rcl-16 0 0 16 256 1 : tunables 0 0 0 : slabdata 0 0 0 kmalloc-rcl-8 0 0 8 512 1 : tunables 0 0 0 : slabdata 0 0 0 kmalloc-8k 3114 3172 8192 4 8 : tunables 0 0 0 : slabdata 793 793 0 kmalloc-4k 9499 9632 4096 8 8 : tunables 0 0 0 : slabdata 1204 1204 0 kmalloc-2k 12732 13312 2048 16 8 : tunables 0 0 0 : slabdata 832 832 0 kmalloc-1k 183625 539936 1024 32 8 : tunables 0 0 0 : slabdata 16873 16873 0 kmalloc-512 655588 1568608 512 32 4 : tunables 0 0 0 : slabdata 49022 49022 0 kmalloc-256 98952 342912 256 32 2 : tunables 0 0 0 : slabdata 10716 10716 0 kmalloc-192 
204049 482370 192 42 2 : tunables 0 0 0 : slabdata 11485 11485 0 kmalloc-128 311838 730848 128 32 1 : tunables 0 0 0 : slabdata 22839 22839 0 kmalloc-96 1930979 3409056 96 42 1 : tunables 0 0 0 : slabdata 81168 81168 0 kmalloc-64 8181387 8266624 64 64 1 : tunables 0 0 0 : slabdata 129166 129166 0 kmalloc-32 8544206 16602368 32 128 1 : tunables 0 0 0 : slabdata 129706 129706 0 kmalloc-16 6563402 21336064 16 256 1 : tunables 0 0 0 : slabdata 83344 83344 0 kmalloc-8 119808 119808 8 512 1 : tunables 0 0 0 : slabdata 234 234 0 kmem_cache_node 8235 9920 64 64 1 : tunables 0 0 0 : slabdata 155 155 0 kmem_cache 10216 10332 448 36 4 : tunables 0 0 0 : slabdata 287 287 0 Linux# cat /proc/meminfo MemTotal: 263782936 kB MemFree: 5950596 kB MemAvailable: 187604140 kB Buffers: 590176 kB Cached: 88517408 kB SwapCached: 0 kB Active: 33425084 kB Inactive: 78773572 kB Active(anon): 22977948 kB Inactive(anon): 1768 kB Active(file): 10447136 kB Inactive(file): 78771804 kB Unevictable: 28 kB Mlocked: 28 kB SwapTotal: 0 kB SwapFree: 0 kB Dirty: 1944 kB Writeback: 0 kB AnonPages: 23028212 kB Mapped: 370632 kB Shmem: 3352 kB KReclaimable: 97013384 kB Slab: 108591792 kB SReclaimable: 97013384 kB SUnreclaim: 11578408 kB KernelStack: 29600 kB PageTables: 69120 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 131891468 kB Committed_AS: 33922344 kB VmallocTotal: 34359738367 kB VmallocUsed: 288528 kB VmallocChunk: 0 kB Percpu: 79680 kB HardwareCorrupted: 0 kB AnonHugePages: 53248 kB ShmemHugePages: 0 kB ShmemPmdMapped: 0 kB FileHugePages: 0 kB FilePmdMapped: 0 kB CmaTotal: 0 kB CmaFree: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB Hugetlb: 0 kB DirectMap4k: 31415244 kB DirectMap2M: 231421952 kB DirectMap1G: 7340032 kB On 5/31/23 17:08, Dave Chinner wrote: > On Wed, May 31, 2023 at 02:29:52PM -0700, Jianan Wang wrote: >> Hi all, >> >> I have a question regarding the xfs slab memory usage when operating a >> filesystem with 
1-2 billion inodes (raid 0 with 6 disks, totally
>> 18TB). On this partition, whenever there is a high disk io operation,
>> like removing millions of small files, the slab kernel memory usage
>> will increase a lot, leading to many OOM issues happening for the
>> services running on this node. You could check some of the stats as
>> the following (only includes the xfs related):

You didn't include all the XFS related slabs. At minimum, the inode
log item slab needs to be shown (xfs_ili) because that tells us how
many of the inodes in the cache have been dirtied.

As it is, I'm betting the problem is the disk subsystem can't write
back dirty inodes fast enough to keep up with memory demand and so
reclaim is declaring OOM faster than your disks can clean inodes to
enable them to be reclaimed.

>> #########################################################################
>> Active / Total Objects (% used): 281803052 / 317485764 (88.8%)
>> Active / Total Slabs (% used): 13033144 / 13033144 (100.0%)
>> Active / Total Caches (% used): 126 / 180 (70.0%)
>> Active / Total Size (% used): 114671057.99K / 127265108.19K (90.1%)
>> Minium / Average / Maximum Object : 0.01K / 0.40K / 16.75K
>>
>> OBJS ACTIVE USE OBJ SIZE SLABS
>> OBJ/SLAB CACHE SIZE NAME
>> 78207920 70947541 0% 1.00K 7731010
>> 32 247392320K xfs_inode
>> 59945928 46548798 0% 0.19K 1433102
>> 42 11464816K dentry
>> 25051296 25051282 0% 0.38K 599680
>> 42 9594880K xfs_buf

Ok, that's from slabtop. Please don't autowrap stuff you've pasted
in - it makes it really hard to read. (reformatted so I can read
it).

OBJS     ACTIVE   USE OBJ SIZE SLABS    OBJ/SLAB CACHE SIZE  NAME
78207920 70947541  0%    1.00K 7731010        32 247392320K  xfs_inode
59945928 46548798  0%    0.19K 1433102        42  11464816K  dentry
25051296 25051282  0%    0.38K  599680        42   9594880K  xfs_buf

So, 70 million cached inodes, with a cache size of 240GB. There are
7.7 million slabs, 32 objects per slab, and that's roughly 240GB.

But why does the slab report only 78 million objects in the slab
when at 240GB there should be 240 million objects in the slab?

It looks like there's some kind of accounting problem here, likely in
the slabtop program. I have always found slabtop to be unreliable
like this....

Can you attach the output of 'cat /proc/slabinfo' and 'cat
/proc/meminfo' when you have a large slab cache in memory?

>> #########################################################################
>>
>> The peak slab memory usage could spike all the way to 100GB+.

Is that all? :)

>> We are using Ubuntu 18.04 and the xfs version is 4.9, kernel version is 5.4

Ah, I don't think there's anything upstream can do for you. We
rewrote large portions of the XFS inode reclaim in 5.9 (3 years ago)
to address the issues with memory reclaim getting stuck on dirty XFS
inodes, so inode reclaim behaviour in modern kernels is completely
different to old kernels.

I'd suggest that you need to upgrade your systems to run a more
modern kernel and see if that fixes the issues you are seeing...

Cheers,

Dave.

^ permalink raw reply	[flat|nested] 10+ messages in thread
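The sanity check Dave runs on the slabtop line can be spelled out mechanically (a sketch; all numbers are taken verbatim from the xfs_inode row above):

```python
# slabtop row under discussion:
#   OBJS     ACTIVE   USE OBJ SIZE SLABS   OBJ/SLAB CACHE SIZE NAME
#   78207920 70947541 0%  1.00K    7731010 32       247392320K xfs_inode
slabs, obj_per_slab = 7731010, 32
obj_size_kib = 1.00

capacity = slabs * obj_per_slab            # objects these slabs can hold
cache_size_kib = int(capacity * obj_size_kib)

reported_objs = 78207920                   # slabtop's OBJS column

# CACHE SIZE is consistent with SLABS * OBJ/SLAB * OBJ SIZE ...
print(capacity, cache_size_kib)            # 247392320 247392320
# ... but OBJS is ~3x smaller than the slabs' capacity, which is the
# accounting discrepancy Dave attributes to slabtop.
print(round(capacity / reported_objs, 1))  # -> 3.2
```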
* Re: Question on the xfs inode slab memory
  2023-06-01  6:21 ` Jianan Wang
@ 2023-06-01 21:43   ` Dave Chinner
  2023-06-01 23:59     ` Jianan Wang
  2023-06-06 23:00     ` Jianan Wang
  0 siblings, 2 replies; 10+ messages in thread
From: Dave Chinner @ 2023-06-01 21:43 UTC (permalink / raw)
To: Jianan Wang; +Cc: linux-xfs

On Wed, May 31, 2023 at 11:21:41PM -0700, Jianan Wang wrote:
> Seems the auto-wraping issue is on my gmail.... using thunderbird should be better...

Thanks!

> Resend the slabinfo and meminfo output here:
>
> Linux # cat /proc/slabinfo
> slabinfo - version: 2.1
> # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
.....
> xfs_dqtrx              0        0  528 31 4 : tunables 0 0 0 : slabdata       0       0 0
> xfs_dquot              0        0  496 33 4 : tunables 0 0 0 : slabdata       0       0 0
> xfs_buf          2545661  3291582  384 42 4 : tunables 0 0 0 : slabdata   78371   78371 0
> xfs_rui_item           0        0  696 47 8 : tunables 0 0 0 : slabdata       0       0 0
> xfs_rud_item           0        0  176 46 2 : tunables 0 0 0 : slabdata       0       0 0
> xfs_inode       23063278 77479540 1024 32 8 : tunables 0 0 0 : slabdata 2425069 2425069 0
> xfs_efd_item        4662     4847  440 37 4 : tunables 0 0 0 : slabdata     131     131 0
> xfs_buf_item        8610     8760  272 30 2 : tunables 0 0 0 : slabdata     292     292 0
> xfs_trans           1925     1925  232 35 2 : tunables 0 0 0 : slabdata      55      55 0
> xfs_da_state        1632     1632  480 34 4 : tunables 0 0 0 : slabdata      48      48 0
> xfs_btree_cur       1728     1728  224 36 2 : tunables 0 0 0 : slabdata      48      48 0

There's no xfs_ili slab cache - this kernel must be using merged
slabs, so I'm going to have to infer how many inodes are dirty from
other slabs. The inode log item is ~190 bytes in size, so....

> skbuff_ext_cache 16454495 32746392 192 42 2 : tunables 0 0 0 : slabdata 779676 779676 0

Yup, there were - 192 byte slab, 16 million active objects. Not all
of those inodes will be dirty right now, but ~65% of the inodes
cached in memory have been dirty at some point.
So, yes, it is highly likely that your memory reclaim/OOM problems
are caused by blocking on dirty inodes in memory reclaim, which you
can only fix by upgrading to a newer kernel.

-Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread
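Dave's merged-slab inference can be put into numbers (a sketch; the counts are taken from the resent slabinfo above, and treating every active object in the merged 192-byte cache as a potential inode log item is an upper bound, since unrelated 192-byte allocations share that cache — which is why the exact percentage depends on what you net out):

```python
# With slab merging, the ~190-byte xfs_ili (inode log item) objects land
# in the generic 192-byte slab, reported here under the skbuff_ext_cache
# name, so its active count bounds how many inodes carry a log item.
ili_upper_bound = 16_454_495   # active objects in the merged 192-byte slab
cached_inodes   = 23_063_278   # active xfs_inode objects

ratio = ili_upper_bound / cached_inodes
print(f"up to {ratio:.0%} of cached inodes may have been dirtied")
```

The point is qualitative rather than exact: a majority of the cached inodes have log items attached, i.e. they have been dirtied and cannot be reclaimed until written back.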
* Re: Question on the xfs inode slab memory
  2023-06-01 21:43 ` Dave Chinner
@ 2023-06-01 23:59   ` Jianan Wang
  0 siblings, 0 replies; 10+ messages in thread
From: Jianan Wang @ 2023-06-01 23:59 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs

Hi Dave,

On 6/1/23 14:43, Dave Chinner wrote:
> On Wed, May 31, 2023 at 11:21:41PM -0700, Jianan Wang wrote:
>> Seems the auto-wraping issue is on my gmail.... using thunderbird should be better...
> Thanks!
>
>> Resend the slabinfo and meminfo output here:
>>
>> Linux # cat /proc/slabinfo
>> slabinfo - version: 2.1
>> # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> .....
>> xfs_dqtrx              0        0  528 31 4 : tunables 0 0 0 : slabdata       0       0 0
>> xfs_dquot              0        0  496 33 4 : tunables 0 0 0 : slabdata       0       0 0
>> xfs_buf          2545661  3291582  384 42 4 : tunables 0 0 0 : slabdata   78371   78371 0
>> xfs_rui_item           0        0  696 47 8 : tunables 0 0 0 : slabdata       0       0 0
>> xfs_rud_item           0        0  176 46 2 : tunables 0 0 0 : slabdata       0       0 0
>> xfs_inode       23063278 77479540 1024 32 8 : tunables 0 0 0 : slabdata 2425069 2425069 0
>> xfs_efd_item        4662     4847  440 37 4 : tunables 0 0 0 : slabdata     131     131 0
>> xfs_buf_item        8610     8760  272 30 2 : tunables 0 0 0 : slabdata     292     292 0
>> xfs_trans           1925     1925  232 35 2 : tunables 0 0 0 : slabdata      55      55 0
>> xfs_da_state        1632     1632  480 34 4 : tunables 0 0 0 : slabdata      48      48 0
>> xfs_btree_cur       1728     1728  224 36 2 : tunables 0 0 0 : slabdata      48      48 0
> There's no xfs_ili slab cache - this kernel must be using merged
> slabs, so I'm going to have to infer how many inodes are dirty from
> other slabs. The inode log item is ~190 bytes in size, so....
>
>> skbuff_ext_cache 16454495 32746392 192 42 2 : tunables 0 0 0 : slabdata 779676 779676 0
> Yup, there were - 192 byte slab, 16 million active objects.
> Not all of those inodes will be dirty right now, but ~65% of the
> inodes cached in memory have been dirty at some point.
>
> So, yes, it is highly likely that your memory reclaim/OOM problems
> are caused by blocking on dirty inodes in memory reclaim, which you
> can only fix by upgrading to a newer kernel.

Thanks for the suggestion! Do you have any kernel version
recommendation in this case? We plan to use Ubuntu 20.04 with the 5.15
kernel, and will probably rebuild xfsprogs and install it ourselves,
bypassing the default packages, to test xfsprogs 5.9. Is this a good
plan from your perspective?

> -Dave.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: Question on the xfs inode slab memory 2023-06-01 21:43 ` Dave Chinner 2023-06-01 23:59 ` Jianan Wang @ 2023-06-06 23:00 ` Jianan Wang 2023-06-07 2:21 ` Dave Chinner 1 sibling, 1 reply; 10+ messages in thread From: Jianan Wang @ 2023-06-06 23:00 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs Hi Dave, Just to follow up on this. We have performed the testing using the Ubuntu 20.04 with 5.15 kernel as well as our custom built xfs 5.9, but we still see significant slab memory build-up during the process. Below are the information for your reference: Linux# xfs_info /dev/sdb1 meta-data=/dev/sdb1 isize=512 agcount=32, agsize=146492160 blks = sectsz=512 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=0 = reflink=1 data = bsize=4096 blocks=4687748608, imaxpct=5 = sunit=64 swidth=64 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=521728, version=2 = sectsz=512 sunit=64 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 Linux# xfs_db -r /dev/sdb1 xfs_db> version versionnum [0xb5b5+0x18a] = V5,NLINK,DIRV2,ATTR,ALIGN,DALIGN,LOGV2,EXTFLG,MOREBITS,ATTR2,LAZYSBCOUNT,PROJID32BIT,CRC,FTYPE,FINOBT,SPARSE_INODES,REFLINK Linux# xfs_info -V xfs_info version 5.9.0 Linux# cat /proc/meminfo MemTotal: 526966076 kB MemFree: 128253892 kB MemAvailable: 422280036 kB Buffers: 309532 kB Cached: 265523976 kB SwapCached: 0 kB Active: 101563884 kB Inactive: 165695060 kB Active(anon): 17320 kB Inactive(anon): 1374072 kB Active(file): 101546564 kB Inactive(file): 164320988 kB Unevictable: 18472 kB Mlocked: 18472 kB SwapTotal: 0 kB SwapFree: 0 kB Dirty: 0 kB Writeback: 0 kB AnonPages: 1281636 kB Mapped: 320712 kB Shmem: 14156 kB KReclaimable: 33278880 kB Slab: 56547064 kB SReclaimable: 33278880 kB SUnreclaim: 23268184 kB KernelStack: 41488 kB PageTables: 19824 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 263483036 kB Committed_AS: 12538508 kB VmallocTotal: 34359738367 kB VmallocUsed: 440260 kB 
VmallocChunk:   0 kB
Percpu:         141760 kB
HardwareCorrupted: 0 kB
AnonHugePages:  0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages:  0 kB
FilePmdMapped:  0 kB
HugePages_Total: 0
HugePages_Free:  0
HugePages_Rsvd:  0
HugePages_Surp:  0
Hugepagesize:   2048 kB
Hugetlb:        0 kB
DirectMap4k:    113564068 kB
DirectMap2M:    268806144 kB
DirectMap1G:    155189248 kB

root@sjc1-training-prod-104:~# cat /proc/slabinfo
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
wg_peer 0 0 1552 21 8 : tunables 0 0 0 : slabdata 0 0 0
ufs_inode_cache 0 0 840 39 8 : tunables 0 0 0 : slabdata 0 0 0
qnx4_inode_cache 0 0 712 46 8 : tunables 0 0 0 : slabdata 0 0 0
hfsplus_attr_cache 0 0 3840 8 8 : tunables 0 0 0 : slabdata 0 0 0
hfsplus_icache 0 0 960 34 8 : tunables 0 0 0 : slabdata 0 0 0
hfs_inode_cache 0 0 832 39 8 : tunables 0 0 0 : slabdata 0 0 0
minix_inode_cache 0 0 704 46 8 : tunables 0 0 0 : slabdata 0 0 0
ntfs_big_inode_cache 0 0 960 34 8 : tunables 0 0 0 : slabdata 0 0 0
ntfs_inode_cache 0 0 296 55 4 : tunables 0 0 0 : slabdata 0 0 0
jfs_ip 0 0 1312 24 8 : tunables 0 0 0 : slabdata 0 0 0
au_vdir 0 0 128 32 1 : tunables 0 0 0 : slabdata 0 0 0
au_finfo 0 0 192 42 2 : tunables 0 0 0 : slabdata 0 0 0
au_icntnr 0 0 832 39 8 : tunables 0 0 0 : slabdata 0 0 0
au_dinfo 0 0 192 42 2 : tunables 0 0 0 : slabdata 0 0 0
xfs_dqtrx 0 0 528 62 8 : tunables 0 0 0 : slabdata 0 0 0
xfs_dquot 0 0 496 33 4 : tunables 0 0 0 : slabdata 0 0 0
xfs_buf 6161830 6162282 384 42 4 : tunables 0 0 0 : slabdata 146721 146721 0
xfs_rui_item 0 0 680 48 8 : tunables 0 0 0 : slabdata 0 0 0
xfs_rud_item 8784 8784 168 48 2 : tunables 0 0 0 : slabdata 183 183 0
xfs_icr 33396 33810 176 46 2 : tunables 0 0 0 : slabdata 735 735 0
xfs_inode 20062909 24750334 960 34 8 : tunables 0 0 0 : slabdata 727951 727951 0
xfs_efd_item 10360 10656 432 37 4 : tunables 0 0 0 : slabdata 288 288 0
xfs_trans 4550 4550 232 35 2 : tunables 0 0 0 : slabdata 130 130 0
xfs_da_state 2720 2720 480 34 4 : tunables 0 0 0 : slabdata 80 80 0
xfs_btree_cur 2880 2880 224 36 2 : tunables 0 0 0 : slabdata 80 80 0
kvm_async_pf 4800 4800 136 60 2 : tunables 0 0 0 : slabdata 80 80 0
kvm_vcpu 0 0 10880 3 8 : tunables 0 0 0 : slabdata 0 0 0
kvm_mmu_page_header 0 0 184 44 2 : tunables 0 0 0 : slabdata 0 0 0
x86_emulator 0 0 2672 12 8 : tunables 0 0 0 : slabdata 0 0 0
rbd_img_request 0 0 160 51 2 : tunables 0 0 0 : slabdata 0 0 0
uvm_tools_event_tracker_t 0 0 1128 29 8 : tunables 0 0 0 : slabdata 0 0 0
migrate_vma_state_t 0 0 302152 1 128 : tunables 0 0 0 : slabdata 0 0 0
uvm_range_group_range_t 0 0 96 42 1 : tunables 0 0 0 : slabdata 0 0 0
uvm_va_block_context_t 0 0 1472 22 8 : tunables 0 0 0 : slabdata 0 0 0
uvm_va_block_t 608 608 848 38 8 : tunables 0 0 0 : slabdata 16 16 0
uvm_va_range_t 4743 4811 1896 17 8 : tunables 0 0 0 : slabdata 283 283 0
ceph_osd_request 0 0 1200 27 8 : tunables 0 0 0 : slabdata 0 0 0
ceph_msg 0 0 240 34 2 : tunables 0 0 0 : slabdata 0 0 0
ovl_inode 41832 43110 720 45 8 : tunables 0 0 0 : slabdata 958 958 0
nf_conntrack 4437 4437 320 51 4 : tunables 0 0 0 : slabdata 87 87 0
ext4_groupinfo_4k 19658572 25481736 192 42 2 : tunables 0 0 0 : slabdata 606708 606708 0
btrfs_delayed_node 0 0 312 52 4 : tunables 0 0 0 : slabdata 0 0 0
btrfs_ordered_extent 9576 9880 424 38 4 : tunables 0 0 0 : slabdata 260 260 0
btrfs_extent_map 0 0 144 56 2 : tunables 0 0 0 : slabdata 0 0 0
btrfs_trans_handle 0 0 112 36 1 : tunables 0 0 0 : slabdata 0 0 0
btrfs_inode 0 0 1208 27 8 : tunables 0 0 0 : slabdata 0 0 0
nvidia_stack_cache 844 866 12288 2 8 : tunables 0 0 0 : slabdata 433 433 0
scsi_sense_cache 114832 114848 128 32 1 : tunables 0 0 0 : slabdata 3589 3589 0
fsverity_info 0 0 256 32 2 : tunables 0 0 0 : slabdata 0 0 0
fscrypt_info 0 0 136 60 2 : tunables 0 0 0 : slabdata 0 0 0
MPTCPv6 0 0 2048 16 8 : tunables 0 0 0 : slabdata 0 0 0
ip6-frags 5148 5148 184 44 2 : tunables 0 0 0 : slabdata 117 117 0
PINGv6 0 0 1216 26 8 : tunables 0 0 0 : slabdata 0 0 0
RAWv6 4264 4446 1216 26 8 : tunables 0 0 0 : slabdata 171 171 0
UDPv6 2952 2952 1344 24 8 : tunables 0 0 0 : slabdata 123 123 0
tw_sock_TCPv6 1320 1320 248 33 2 : tunables 0 0 0 : slabdata 40 40 0
request_sock_TCPv6 0 0 304 53 4 : tunables 0 0 0 : slabdata 0 0 0
TCPv6 1040 1040 2432 13 8 : tunables 0 0 0 : slabdata 80 80 0
kcopyd_job 0 0 3240 10 8 : tunables 0 0 0 : slabdata 0 0 0
dm_uevent 0 0 2888 11 8 : tunables 0 0 0 : slabdata 0 0 0
mqueue_inode_cache 1802 1802 960 34 8 : tunables 0 0 0 : slabdata 53 53 0
fuse_request 0 0 152 53 2 : tunables 0 0 0 : slabdata 0 0 0
fuse_inode 0 0 832 39 8 : tunables 0 0 0 : slabdata 0 0 0
ecryptfs_inode_cache 0 0 1024 32 8 : tunables 0 0 0 : slabdata 0 0 0
ecryptfs_file_cache 17664 17664 16 256 1 : tunables 0 0 0 : slabdata 69 69 0
ecryptfs_auth_tok_list_item 0 0 832 39 8 : tunables 0 0 0 : slabdata 0 0 0
fat_inode_cache 0 0 776 42 8 : tunables 0 0 0 : slabdata 0 0 0
fat_cache 0 0 40 102 1 : tunables 0 0 0 : slabdata 0 0 0
squashfs_inode_cache 920 920 704 46 8 : tunables 0 0 0 : slabdata 20 20 0
jbd2_journal_head 3978 3978 120 34 1 : tunables 0 0 0 : slabdata 117 117 0
jbd2_revoke_table_s 512 512 16 256 1 : tunables 0 0 0 : slabdata 2 2 0
ext4_fc_dentry_update 0 0 80 51 1 : tunables 0 0 0 : slabdata 0 0 0
ext4_inode_cache 15687 15687 1176 27 8 : tunables 0 0 0 : slabdata 581 581 0
ext4_allocation_context 4480 4480 144 56 2 : tunables 0 0 0 : slabdata 80 80 0
ext4_io_end 5120 5120 64 64 1 : tunables 0 0 0 : slabdata 80 80 0
ext4_pending_reservation 10240 10240 32 128 1 : tunables 0 0 0 : slabdata 80 80 0
ext4_extent_status 14484 14484 40 102 1 : tunables 0 0 0 : slabdata 142 142 0
mbcache 5840 5840 56 73 1 : tunables 0 0 0 : slabdata 80 80 0
kioctx 224 224 576 56 8 : tunables 0 0 0 : slabdata 4 4 0
userfaultfd_ctx_cache 0 0 192 42 2 : tunables 0 0 0 : slabdata 0 0 0
dnotify_struct 0 0 32 128 1 : tunables 0 0 0 : slabdata 0 0 0
pid_namespace 3600 3600 136 60 2 : tunables 0 0 0 : slabdata 60 60 0
UNIX 2400 2400 1088 30 8 : tunables 0 0 0 : slabdata 80 80 0
ip4-frags 5040 5080 200 40 2 : tunables 0 0 0 : slabdata 127 127 0
MPTCP 0 0 1920 17 8 : tunables 0 0 0 : slabdata 0 0 0
request_sock_subflow 0 0 376 43 4 : tunables 0 0 0 : slabdata 0 0 0
xfrm_dst_cache 51 51 320 51 4 : tunables 0 0 0 : slabdata 1 1 0
xfrm_state 0 0 768 42 8 : tunables 0 0 0 : slabdata 0 0 0
ip_fib_trie 5865 5865 48 85 1 : tunables 0 0 0 : slabdata 69 69 0
ip_fib_alias 5037 5037 56 73 1 : tunables 0 0 0 : slabdata 69 69 0
PING 0 0 1024 32 8 : tunables 0 0 0 : slabdata 0 0 0
RAW 6528 6528 1024 32 8 : tunables 0 0 0 : slabdata 204 204 0
tw_sock_TCP 2673 2673 248 33 2 : tunables 0 0 0 : slabdata 81 81 0
request_sock_TCP 4240 4240 304 53 4 : tunables 0 0 0 : slabdata 80 80 0
TCP 1610 1610 2240 14 8 : tunables 0 0 0 : slabdata 115 115 0
hugetlbfs_inode_cache 98 98 664 49 8 : tunables 0 0 0 : slabdata 2 2 0
dquot 2560 2560 256 32 2 : tunables 0 0 0 : slabdata 80 80 0
ep_head 20480 20480 16 256 1 : tunables 0 0 0 : slabdata 80 80 0
dax_cache 39 39 832 39 8 : tunables 0 0 0 : slabdata 1 1 0
bio_crypt_ctx 5575008 12859548 40 102 1 : tunables 0 0 0 : slabdata 126074 126074 0
request_queue 167 225 2128 15 8 : tunables 0 0 0 : slabdata 15 15 0
biovec-max 1872 1928 4096 8 8 : tunables 0 0 0 : slabdata 241 241 0
biovec-128 5938 6016 2048 16 8 : tunables 0 0 0 : slabdata 376 376 0
biovec-64 5952 5952 1024 32 8 : tunables 0 0 0 : slabdata 186 186 0
khugepaged_mm_slot 1620 1620 112 36 1 : tunables 0 0 0 : slabdata 45 45 0
user_namespace 260 260 624 52 8 : tunables 0 0 0 : slabdata 5 5 0
dmaengine-unmap-256 15 15 2112 15 8 : tunables 0 0 0 : slabdata 1 1 0
dmaengine-unmap-128 30 30 1088 30 8 : tunables 0 0 0 : slabdata 1 1 0
sock_inode_cache 20562 20943 832 39 8 : tunables 0 0 0 : slabdata 537 537 0
skbuff_ext_cache 9828 9828 192 42 2 : tunables 0 0 0 : slabdata 234 234 0
skbuff_fclone_cache 9440 9440 512 32 4 : tunables 0 0 0 : slabdata 295 295 0
skbuff_head_cache 14485 14592 256 32 2 : tunables 0 0 0 : slabdata 456 456 0
file_lock_cache 2960 2960 216 37 2 : tunables 0 0 0 : slabdata 80 80 0
file_lock_ctx 72197 72197 56 73 1 : tunables 0 0 0 : slabdata 989 989 0
fsnotify_mark_connector 81895 88192 32 128 1 : tunables 0 0 0 : slabdata 689 689 0
buffer_head 163644 272571 104 39 1 : tunables 0 0 0 : slabdata 6989 6989 0
x86_lbr 0 0 800 40 8 : tunables 0 0 0 : slabdata 0 0 0
taskstats 3680 3680 352 46 4 : tunables 0 0 0 : slabdata 80 80 0
proc_dir_entry 11046 11046 192 42 2 : tunables 0 0 0 : slabdata 263 263 0
pde_opener 8160 8160 40 102 1 : tunables 0 0 0 : slabdata 80 80 0
proc_inode_cache 26161 29118 712 46 8 : tunables 0 0 0 : slabdata 633 633 0
seq_file 3264 3264 120 34 1 : tunables 0 0 0 : slabdata 96 96 0
sigqueue 16677 16677 80 51 1 : tunables 0 0 0 : slabdata 327 327 0
bdev_cache 100 100 1600 20 8 : tunables 0 0 0 : slabdata 5 5 0
shmem_inode_cache 39065 42054 760 43 8 : tunables 0 0 0 : slabdata 978 978 0
kernfs_node_cache 303712 303712 128 32 1 : tunables 0 0 0 : slabdata 9491 9491 0
mnt_cache 23154 23154 320 51 4 : tunables 0 0 0 : slabdata 454 454 0
filp 24710 25536 256 32 2 : tunables 0 0 0 : slabdata 798 798 0
inode_cache 65841 72726 640 51 8 : tunables 0 0 0 : slabdata 1426 1426 0
dentry 2281389 3225894 192 42 2 : tunables 0 0 0 : slabdata 76807 76807 0
names_cache 1640 1664 4096 8 8 : tunables 0 0 0 : slabdata 208 208 0
net_namespace 329 329 4352 7 8 : tunables 0 0 0 : slabdata 47 47 0
iint_cache 0 0 120 34 1 : tunables 0 0 0 : slabdata 0 0 0
lsm_file_cache 20349776 51397800 24 170 1 : tunables 0 0 0 : slabdata 302340 302340 0
uts_namespace 2220 2220 432 37 4 : tunables 0 0 0 : slabdata 60 60 0
nsproxy 4480 4480 72 56 1 : tunables 0 0 0 : slabdata 80 80 0
vm_area_struct 47648 47814 208 39 2 : tunables 0 0 0 : slabdata 1226 1226 0
mm_struct 2760 2760 1088 30 8 : tunables 0 0 0 : slabdata 92 92 0
files_cache 4094 4094 704 46 8 : tunables 0 0 0 : slabdata 89 89 0
signal_cache 8704 8904 1152 28 8 : tunables 0 0 0 : slabdata 318 318 0
sighand_cache 4921 4965 2112 15 8 : tunables 0 0 0 : slabdata 331 331 0
task_struct 2981 3192 8192 4 8 : tunables 0 0 0 : slabdata 798 798 0
cred_jar 48258 48258 192 42 2 : tunables 0 0 0 : slabdata 1149 1149 0
anon_vma_chain 70629 72320 64 64 1 : tunables 0 0 0 : slabdata 1130 1130 0
anon_vma 47399 47840 88 46 1 : tunables 0 0 0 : slabdata 1040 1040 0
pid 43666 44352 128 32 1 : tunables 0 0 0 : slabdata 1386 1386 0
Acpi-Operand 103376 103376 72 56 1 : tunables 0 0 0 : slabdata 1846 1846 0
Acpi-ParseExt 234 234 104 39 1 : tunables 0 0 0 : slabdata 6 6 0
Acpi-State 459 459 80 51 1 : tunables 0 0 0 : slabdata 9 9 0
numa_policy 22071 22506 264 62 4 : tunables 0 0 0 : slabdata 363 363 0
perf_event 2160 2160 1192 27 8 : tunables 0 0 0 : slabdata 80 80 0
trace_event_file 6210 6210 88 46 1 : tunables 0 0 0 : slabdata 135 135 0
ftrace_event_field 29750 29750 48 85 1 : tunables 0 0 0 : slabdata 350 350 0
pool_workqueue 31616 31776 256 32 2 : tunables 0 0 0 : slabdata 993 993 0
radix_tree_node 2745972 3417008 584 56 8 : tunables 0 0 0 : slabdata 61018 61018 0
task_group 5202 5202 640 51 8 : tunables 0 0 0 : slabdata 102 102 0
vmap_area 50712 52160 64 64 1 : tunables 0 0 0 : slabdata 815 815 0
dma-kmalloc-8k 0 0 8192 4 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-4k 0 0 4096 8 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-2k 0 0 2048 16 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-1k 0 0 1024 32 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-512 0 0 512 32 4 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-256 0 0 256 32 2 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-128 0 0 128 32 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-64 0 0 64 64 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-32 0 0 32 128 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-16 0 0 16 256 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-8 0 0 8 512 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-192 0 0 192 42 2 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-96 0 0 96 42 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-8k 0 0 8192 4 8 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-4k 280 280 4096 8 8 : tunables 0 0 0 : slabdata 35 35 0
kmalloc-rcl-2k 0 0 2048 16 8 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-1k 0 0 1024 32 8 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-512 0 0 512 32 4 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-256 0 0 256 32 2 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-192 55331 55944 192 42 2 : tunables 0 0 0 : slabdata 1332 1332 0
kmalloc-rcl-128 52841 92480 128 32 1 : tunables 0 0 0 : slabdata 2890 2890 0
kmalloc-rcl-96 75188 81732 96 42 1 : tunables 0 0 0 : slabdata 1946 1946 0
kmalloc-rcl-64 223406 330624 64 64 1 : tunables 0 0 0 : slabdata 5166 5166 0
kmalloc-rcl-32 0 0 32 128 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-16 0 0 16 256 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-8 0 0 8 512 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-cg-8k 320 320 8192 4 8 : tunables 0 0 0 : slabdata 80 80 0
kmalloc-cg-4k 1696 1736 4096 8 8 : tunables 0 0 0 : slabdata 217 217 0
kmalloc-cg-2k 2416 2416 2048 16 8 : tunables 0 0 0 : slabdata 151 151 0
kmalloc-cg-1k 4640 4640 1024 32 8 : tunables 0 0 0 : slabdata 145 145 0
kmalloc-cg-512 2976 2976 512 32 4 : tunables 0 0 0 : slabdata 93 93 0
kmalloc-cg-256 2208 2208 256 32 2 : tunables 0 0 0 : slabdata 69 69 0
kmalloc-cg-192 3402 3402 192 42 2 : tunables 0 0 0 : slabdata 81 81 0
kmalloc-cg-128 2560 2560 128 32 1 : tunables 0 0 0 : slabdata 80 80 0
kmalloc-cg-96 3360 3360 96 42 1 : tunables 0 0 0 : slabdata 80 80 0
kmalloc-cg-64 5696 5696 64 64 1 : tunables 0 0 0 : slabdata 89 89 0
kmalloc-cg-32 10240 10240 32 128 1 : tunables 0 0 0 : slabdata 80 80 0
kmalloc-cg-16 61952 61952 16 256 1 : tunables 0 0 0 : slabdata 242 242 0
kmalloc-cg-8 40960 40960 8 512 1 : tunables 0 0 0 : slabdata 80 80 0
kmalloc-8k 1870 1984 8192 4 8 : tunables 0 0 0 : slabdata 496 496 0
kmalloc-4k 4917 5408 4096 8 8 : tunables 0 0 0 : slabdata 676 676 0
kmalloc-2k 50935 52528 2048 16 8 : tunables 0 0 0 : slabdata 3283 3283 0
kmalloc-1k 1069096 2780992 1024 32 8 : tunables 0 0 0 : slabdata 86906 86906 0
kmalloc-512 11268929 22504704 512 32 4 : tunables 0 0 0 : slabdata 703272 703272 0
kmalloc-256 2255033 6208960 256 32 2 : tunables 0 0 0 : slabdata 194030 194030 0
kmalloc-192 10336133 19638192 192 42 2 : tunables 0 0 0 : slabdata 467576 467576 0
kmalloc-128 2112454 5171456 128 32 1 : tunables 0 0 0 : slabdata 161608 161608 0
kmalloc-96 1078742 2495976 96 42 1 : tunables 0 0 0 : slabdata 59428 59428 0
kmalloc-64 2065281 5393984 64 64 1 : tunables 0 0 0 : slabdata 84281 84281 0
kmalloc-32 4692190 6049792 32 128 1 : tunables 0 0 0 : slabdata 47264 47264 0
kmalloc-16 1449497 3389184 16 256 1 : tunables 0 0 0 : slabdata 13239 13239 0
kmalloc-8 53760 53760 8 512 1 : tunables 0 0 0 : slabdata 105 105 0
kmem_cache_node 1492 1536 64 64 1 : tunables 0 0 0 : slabdata 24 24 0
kmem_cache 704 704 256 32 2 : tunables 0 0 0 : slabdata 22 22 0

Linux# uname -a
Linux sjc1-training-prod-104 5.15.0-46-generic #49~20.04.1-Ubuntu SMP Thu Aug 4 19:15:44 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Linux# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu-Server 20.04.6 2023.05.30 (Cubic 2023-05-30 13:13)"

Please let us know if you can share any suggestions or recommendations
on this.

Best Regards,
Jianan.

On 6/1/23 14:43, Dave Chinner wrote:
> On Wed, May 31, 2023 at 11:21:41PM -0700, Jianan Wang wrote:
>> Seems the auto-wrapping issue is on my gmail.... using thunderbird should be better...
> Thanks!
>
>> Resend the slabinfo and meminfo output here:
>>
>> Linux # cat /proc/slabinfo
>> slabinfo - version: 2.1
>> # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> .....
>> xfs_dqtrx 0 0 528 31 4 : tunables 0 0 0 : slabdata 0 0 0
>> xfs_dquot 0 0 496 33 4 : tunables 0 0 0 : slabdata 0 0 0
>> xfs_buf 2545661 3291582 384 42 4 : tunables 0 0 0 : slabdata 78371 78371 0
>> xfs_rui_item 0 0 696 47 8 : tunables 0 0 0 : slabdata 0 0 0
>> xfs_rud_item 0 0 176 46 2 : tunables 0 0 0 : slabdata 0 0 0
>> xfs_inode 23063278 77479540 1024 32 8 : tunables 0 0 0 : slabdata 2425069 2425069 0
>> xfs_efd_item 4662 4847 440 37 4 : tunables 0 0 0 : slabdata 131 131 0
>> xfs_buf_item 8610 8760 272 30 2 : tunables 0 0 0 : slabdata 292 292 0
>> xfs_trans 1925 1925 232 35 2 : tunables 0 0 0 : slabdata 55 55 0
>> xfs_da_state 1632 1632 480 34 4 : tunables 0 0 0 : slabdata 48 48 0
>> xfs_btree_cur 1728 1728 224 36 2 : tunables 0 0 0 : slabdata 48 48 0

> There's no xfs_ili slab cache - this kernel must be using merged
> slabs, so I'm going to have to infer how many inodes are dirty from
> other slabs. The inode log item is ~190 bytes in size, so....
>
>> skbuff_ext_cache 16454495 32746392 192 42 2 : tunables 0 0 0 : slabdata 779676 779676 0
> Yup, there were - 192 byte slab, 16 million active objects. Not all
> of those inodes will be dirty right now, but ~65% of the inodes
> cached in memory have been dirty at some point.
>
> So, yes, it is highly likely that your memory reclaim/OOM problems
> are caused by blocking on dirty inodes in memory reclaim, which you
> can only fix by upgrading to a newer kernel.
>
> -Dave.
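[Editor's note: the per-cache numbers Dave reads off come straight from /proc/slabinfo; a pipeline along these lines (a sketch, assuming the v2.1 field layout quoted above, where field 3 is num_objs and field 4 is objsize) ranks caches by approximate footprint:]

```shell
# Rank the ten biggest slab caches by num_objs * objsize, in MiB.
# Reads /proc/slabinfo if available, otherwise falls back to a sample
# line so the pipeline can be demonstrated anywhere.
{ cat /proc/slabinfo 2>/dev/null || \
  printf 'xfs_inode 20062909 24750334 960 34 8 : tunables 0 0 0 : slabdata 727951 727951 0\n'; } |
awk '!/^(slabinfo|#)/ { printf "%-30s %.1f MiB\n", $1, $3*$4/1048576 }' |
sort -k2 -rn | head -10
```

Note this counts allocated object slots, not active ones, which is why a cache with many free-but-unreleased slots (like the xfs_inode cache above) still shows a large footprint.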
* Re: Question on the xfs inode slab memory
  2023-06-06 23:00 ` Jianan Wang
@ 2023-06-07  2:21 ` Dave Chinner
  2023-06-27 18:40 ` Jianan Wang
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2023-06-07 2:21 UTC (permalink / raw)
To: Jianan Wang; +Cc: linux-xfs

On Tue, Jun 06, 2023 at 04:00:56PM -0700, Jianan Wang wrote:
> Hi Dave,
>
> Just to follow up on this. We have performed the testing using the
> Ubuntu 20.04 with 5.15 kernel as well as our custom built xfs 5.9,
> but we still see significant slab memory build-up during the
> process.

That's to be expected. Nothing has changed with respect to inode
cache size management. All the changes were to how the XFS inode
cache gets reclaimed. Are you getting OOM killer reports when under
memory pressure on 5.15 like you originally reported for the 5.4
kernel you were running?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
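[Editor's note: for readers reproducing this, whether the OOM killer has fired is recorded in the kernel log; a check along these lines answers Dave's question on a running box (assuming the events have not rotated out of the logs):]

```shell
# Search the kernel ring buffer and the journal for OOM-killer
# activity. grep exits non-zero on no match, hence the || true.
dmesg -T 2>/dev/null | grep -iE 'out of memory|oom-killer|killed process' || true
journalctl -k --no-pager 2>/dev/null | grep -i 'oom' || true
```

The patterns match the messages the kernel emits when a task invokes the OOM killer and when a victim is killed; no output means no OOM kills are on record since boot (or since the log rotated).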
* Re: Question on the xfs inode slab memory
  2023-06-07  2:21 ` Dave Chinner
@ 2023-06-27 18:40 ` Jianan Wang
  0 siblings, 0 replies; 10+ messages in thread
From: Jianan Wang @ 2023-06-27 18:40 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs

Hi Dave,

Sorry for the late response. No, we did not actually hit any OOM-kill
issues in the small-scale testing phase when stressing the filesystem
I/O. We plan to roll this out to a larger cluster for scale testing.
Could you please advise whether we need to reformat the XFS volume for
this to take effect, or whether we can simply upgrade the kernel module
and expect it to work?

Best Regards,
Jianan.

On 6/6/23 19:21, Dave Chinner wrote:
> On Tue, Jun 06, 2023 at 04:00:56PM -0700, Jianan Wang wrote:
>> Hi Dave,
>>
>> Just to follow up on this. We have performed the testing using the
>> Ubuntu 20.04 with 5.15 kernel as well as our custom built xfs 5.9,
>> but we still see significant slab memory build-up during the
>> process.
> That's to be expected. Nothing has changed with respect to inode
> cache size management. All the changes were to how the XFS inode
> cache gets reclaimed. Are you getting OOM killer reports when under
> memory pressure on 5.15 like you originally reported for the 5.4
> kernel you were running?
>
> Cheers,
>
> Dave.
end of thread, other threads: [~2023-06-27 18:40 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2023-05-31 21:29 Question on the xfs inode slab memory Jianan Wang
2023-06-01  0:08 ` Dave Chinner
2023-06-01  5:25 ` Jianan Wang
2023-06-01 15:06 ` Darrick J. Wong
2023-06-01  6:21 ` Jianan Wang
2023-06-01 21:43 ` Dave Chinner
2023-06-01 23:59 ` Jianan Wang
2023-06-06 23:00 ` Jianan Wang
2023-06-07  2:21 ` Dave Chinner
2023-06-27 18:40 ` Jianan Wang