From: Dave Chinner <david@fromorbit.com>
To: Jianan Wang <wangjianan.zju@gmail.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: Question on the xfs inode slab memory
Date: Thu, 1 Jun 2023 10:08:02 +1000
Message-ID: <ZHfhYsqln68N1HyO@dread.disaster.area>
In-Reply-To: <CAMj1M42L6hH9weqroQNaWu_SG+Yg8NrAuzgNO1b8jiWPJ2M-5A@mail.gmail.com>

On Wed, May 31, 2023 at 02:29:52PM -0700, Jianan Wang wrote:
> Hi all,
> 
> I have a question regarding XFS slab memory usage when operating a
> filesystem with 1-2 billion inodes (RAID 0 across 6 disks, 18TB
> total). On this partition, whenever there is heavy disk IO, such
> as removing millions of small files, slab kernel memory usage
> increases sharply, leading to OOM issues for the services running
> on this node. Some of the stats are below (only the xfs-related
> entries are included):

You didn't include all the XFS-related slabs. At minimum, the inode
log item slab (xfs_ili) needs to be shown, because that tells us how
many of the inodes in the cache have been dirtied.
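
Something like this should grab all of them (/proc/slabinfo is
root-only on most systems, hence the sudo):

  # dump every XFS slab cache, including xfs_ili
  sudo grep '^xfs' /proc/slabinfo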

As it is, I'm betting the problem is that the disk subsystem can't
write back dirty inodes fast enough to keep up with memory demand,
so reclaim is declaring OOM faster than your disks can clean inodes
and make them reclaimable.
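
If you want to sanity-check that theory while the workload runs,
watching the dirty inode count (xfs_ili) against the cached inode
count (xfs_inode) would show it - something like:

  # sample cached vs dirty XFS inode counts every 5 seconds
  sudo watch -n 5 "grep -E '^(xfs_inode|xfs_ili) ' /proc/slabinfo"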

> #########################################################################
> Active / Total Objects (% used):  281803052 / 317485764 (88.8%)
> Active / Total Slabs (% used): 13033144 / 13033144 (100.0%)
> Active / Total Caches (% used): 126 / 180 (70.0%)
> Active / Total Size (% used): 114671057.99K / 127265108.19K (90.1%)
> Minimum / Average / Maximum Object : 0.01K / 0.40K / 16.75K
> 
> OBJS               ACTIVE      USE     OBJ SIZE     SLABS
> OBJ/SLAB    CACHE SIZE    NAME
> 78207920      70947541      0%       1.00K           7731010
>  32            247392320K     xfs_inode
> 59945928      46548798      0%       0.19K           1433102
>  42              11464816K     dentry
> 25051296      25051282      0%       0.38K           599680
>   42            9594880K         xfs_buf

Ok, that's from slabtop. Please don't autowrap stuff you've pasted
in - it makes it really hard to read. (I've reformatted it below so
I can read it.)

OBJS      ACTIVE    USE  OBJ SIZE    SLABS  OBJ/SLAB  CACHE SIZE  NAME
78207920  70947541   0%     1.00K  7731010        32  247392320K  xfs_inode
59945928  46548798   0%     0.19K  1433102        42   11464816K  dentry
25051296  25051282   0%     0.38K   599680        42    9594880K  xfs_buf

So, 70 million cached inodes, with a cache size of 240GB. There are
7.7 million slabs, 32 objects per slab, and that's roughly 240GB.

But why does slabtop report only 78 million objects when, at 240GB,
there should be around 240 million objects in the slab?
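
Doing the arithmetic on the numbers above:

  7,731,010 slabs * 32 objects/slab  = 247,392,320 objects
  247,392,320 objects * 1.00K/object = 247392320K (~236GiB)

That matches the CACHE SIZE column exactly, but it's nowhere near
the 78 million objects reported in the OBJS column.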

It looks like there's some kind of accounting problem here, likely
in the slabtop program. I have always found slabtop to be unreliable
like this....

Can you attach the output of 'cat /proc/slabinfo' and 'cat
/proc/meminfo' when you have a large slab cache in memory?
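
If the peak is hard to catch by hand, a dumb sampling loop along
these lines (untested sketch, adjust the interval to taste) would
capture it:

  # snapshot slab and memory stats once a minute until interrupted
  while sleep 60; do
      cat /proc/slabinfo /proc/meminfo > "slab-$(date +%Y%m%d-%H%M%S).txt"
  done

Run it as root so /proc/slabinfo is readable.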

> #########################################################################
> 
> The peak slab memory usage could spike all the way to 100GB+.

Is that all? :)

> We are using Ubuntu 18.04; the xfs version is 4.9 and the kernel
> version is 5.4.

Ah, I don't think there's anything upstream can do for you. We
rewrote large portions of the XFS inode reclaim code in 5.9 (3 years
ago) to address the issues with memory reclaim getting stuck on
dirty XFS inodes, so inode reclaim behaviour in modern kernels is
completely different to that of old kernels.

I'd suggest that you need to upgrade your systems to run a more
modern kernel and see if that fixes the issues you are seeing...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 10+ messages
2023-05-31 21:29 Question on the xfs inode slab memory Jianan Wang
2023-06-01  0:08 ` Dave Chinner [this message]
2023-06-01  5:25   ` Jianan Wang
2023-06-01 15:06     ` Darrick J. Wong
2023-06-01  6:21   ` Jianan Wang
2023-06-01 21:43     ` Dave Chinner
2023-06-01 23:59       ` Jianan Wang
2023-06-06 23:00       ` Jianan Wang
2023-06-07  2:21         ` Dave Chinner
2023-06-27 18:40           ` Jianan Wang
