From: Michal Hocko <mhocko@suse.com>
To: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>,
	Kent Overstreet <kent.overstreet@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Linux-MM <linux-mm@kvack.org>,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH 2/7] mm: shrinker: Add a .to_text() method for shrinkers
Date: Wed, 29 Nov 2023 10:14:54 +0100
Message-ID: <ZWcBDglmDKUJdwMv@tiehlicka>
In-Reply-To: <ZWaHG09fY2BYjyGD@P9FQF9L96D.corp.robot.car>

On Tue 28-11-23 16:34:35, Roman Gushchin wrote:
> On Tue, Nov 28, 2023 at 02:23:36PM +0800, Qi Zheng wrote:
[...]
> > Now I think adding this method might not be a good idea. If we allow
> > shrinkers to report their own private information, OOM logs may become
> > cluttered. Most people only care about some general information when
> > troubleshooting OOM problems, but not the private information of a
> > shrinker.
> 
> I agree with that.
> 
> It seems that the feature is mostly useful for kernel developers and it's easily
> achievable by attaching a bpf program to the oom handler. If it requires a bit
> of work on the bpf side, we can do that instead, but probably not. And this
> solution can potentially provide way more information in a more flexible way.
> 
> So I'm not convinced it's a good idea to make the generic oom handling code
> more complicated and fragile for everybody, as well as making oom reports differ
> more between kernel versions and configurations.

Completely agreed! From my many years of experience analysing oom
reports from production systems, I would distinguish the following
categories:
	- clear runaways (and/or memory leaks)
		- userspace consumers - either shmem or anonymous memory
		  predominantly consumes the memory, swap is either depleted
		  or not configured.
		  The OOM report is usually useful to pinpoint those as
		  we have the required counters available.
		- kernel memory consumers - if we are lucky they are
		  using the slab allocator and unreclaimable slab is a
		  huge part of the memory consumption. If this is a page
		  allocator user, the oom report only helps to deduce
		  the fact by looking at how much user + slab + page
		  table etc. add up to. But identifying the root cause
		  is close to impossible without something like
		  page_owner or a crash dump.
	- misbehaving memory reclaim
		- minority of issues and the oom report is usually
		  insufficient to drill down to the root cause. If the
		  problem is reproducible then collecting vmstat data
		  can give a much better clue.
		- a high number of reclaimable slab objects or free swap
		  are good indicators. Shrinker data could potentially
		  be helpful in the slab case, but I really have a hard
		  time remembering any such situation.
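
For the reproducible reclaim problems mentioned above, "collecting vmstat
data" usually means snapshotting /proc/vmstat twice and looking at what
moved. The following is only a minimal sketch of that idea; the helper
names and the sample values are made up for illustration, and only the
counter names are taken from /proc/vmstat.

```python
# Sketch: diff two /proc/vmstat snapshots to see which counters moved.
# Helper names and sample values are hypothetical, for illustration only.

def parse_vmstat(text):
    """Parse '<name> <value>' lines as found in /proc/vmstat."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            counters[fields[0]] = int(fields[1])
    return counters


def vmstat_delta(before, after):
    """Return only the counters that changed between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


# Made-up snapshots taken some interval apart.
BEFORE = """\
pgscan_kswapd 1000
pgsteal_kswapd 900
pgscan_direct 50
pgsteal_direct 40
allocstall_normal 3
"""

AFTER = """\
pgscan_kswapd 9000
pgsteal_kswapd 1000
pgscan_direct 4050
pgsteal_direct 60
allocstall_normal 3
"""

delta = vmstat_delta(parse_vmstat(BEFORE), parse_vmstat(AFTER))
for name, value in sorted(delta.items()):
    print(f"{name:20s} +{value}")
```

In these invented numbers many pages are scanned but few are stolen, which
is the kind of pattern that would point at reclaim rather than at a
runaway consumer. On a live system the two snapshots would come from
reading /proc/vmstat at an interval while reproducing the problem.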

On non-production systems the situation is quite different. I can see
how it could be very beneficial to add very specific debugging data
for a subsystem/shrinker which is being developed and could cause the
OOM. For that purpose the proposed scheme is rather inflexible AFAICS.
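
The "user + slab + page table etc." deduction for page allocator users
described earlier can be sketched as simple arithmetic: add up the
consumers that have counters and see how much of the total is left
unexplained. This is a rough approximation only. It assumes
/proc/meminfo-style input in kB, the choice of fields is one possible
selection, and the sample values are invented.

```python
# Sketch: estimate memory not explained by the usual counters.
# Field selection is an assumption; the sample values are made up.

ACCOUNTED_FIELDS = [
    "MemFree",
    "Active(anon)", "Inactive(anon)",   # anonymous memory and shmem
    "Active(file)", "Inactive(file)",   # file-backed page cache
    "Unevictable",
    "Slab",                             # reclaimable + unreclaimable slab
    "PageTables",
    "KernelStack",
]


def parse_meminfo(text):
    """Parse 'Name:   value kB' lines into a dict of integers (kB)."""
    result = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            result[name.strip()] = int(parts[0])
    return result


def unaccounted_kb(meminfo):
    """MemTotal minus everything we have a counter for."""
    accounted = sum(meminfo.get(field, 0) for field in ACCOUNTED_FIELDS)
    return meminfo["MemTotal"] - accounted


# Invented numbers resembling a machine where most memory is unexplained.
SAMPLE = """\
MemTotal:       16000000 kB
MemFree:          200000 kB
Active(anon):    1000000 kB
Inactive(anon):   500000 kB
Active(file):     100000 kB
Inactive(file):   100000 kB
Unevictable:           0 kB
Slab:             300000 kB
PageTables:        50000 kB
KernelStack:       10000 kB
"""

info = parse_meminfo(SAMPLE)
print(f"unaccounted: {unaccounted_kb(info)} kB of {info['MemTotal']} kB")
```

A large remainder only says that some direct page allocator user is
likely holding the memory; it does not say which one, which is why
something like page_owner or a crash dump is still needed.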

-- 
Michal Hocko
SUSE Labs



Thread overview: 48+ messages
2023-11-22 23:25 [PATCH 0/7] shrinker debugging improvements Kent Overstreet
2023-11-22 23:25 ` [PATCH 1/7] seq_buf: seq_buf_human_readable_u64() Kent Overstreet
2023-11-22 23:25 ` [PATCH 2/7] mm: shrinker: Add a .to_text() method for shrinkers Kent Overstreet
2023-11-23  3:32   ` Qi Zheng
2023-11-23 21:24     ` Kent Overstreet
2023-11-24  3:08       ` Qi Zheng
2023-11-25  0:30         ` Kent Overstreet
2023-11-28  3:27           ` Muchun Song
2023-11-28  3:53             ` Kent Overstreet
2023-11-28  6:23               ` Qi Zheng
2023-11-29  0:34                 ` Roman Gushchin
2023-11-29  9:14                   ` Michal Hocko [this message]
2023-11-29 23:11                     ` Kent Overstreet
2023-11-30  3:09                       ` Qi Zheng
2023-11-30  3:21                         ` Kent Overstreet
2023-11-30  3:42                           ` Qi Zheng
2023-11-30  4:14                             ` Kent Overstreet
2023-11-30 19:01                           ` Roman Gushchin
2023-12-01  0:00                             ` Kent Overstreet
2023-12-01  1:18                             ` Dave Chinner
2023-12-01 20:01                               ` Roman Gushchin
2023-12-01 21:51                                 ` Kent Overstreet
2023-12-06  8:16                                 ` Dave Chinner
2023-12-06 19:13                                   ` Kent Overstreet
2023-12-09  1:44                                     ` Roman Gushchin
2023-12-09  2:04                                       ` Kent Overstreet
2023-11-30  8:14                       ` Michal Hocko
2023-12-01  1:47                         ` Kent Overstreet
2023-12-01 10:04                           ` Michal Hocko
2023-12-01 21:25                             ` Kent Overstreet
2023-12-04 10:33                               ` Michal Hocko
2023-12-04 18:15                                 ` Kent Overstreet
2023-12-05  8:49                                   ` Michal Hocko
2023-12-05 23:21                                     ` Kent Overstreet
2023-11-24 11:46   ` kernel test robot
2023-11-28 10:01   ` Michal Hocko
2023-11-28 17:48     ` Kent Overstreet
2023-11-29 16:02       ` Michal Hocko
2023-11-29 22:36         ` Kent Overstreet
2023-11-22 23:25 ` [PATCH 3/7] mm: shrinker: Add new stats for .to_text() Kent Overstreet
2023-11-22 23:25 ` [PATCH 4/7] mm: Centralize & improve oom reporting in show_mem.c Kent Overstreet
2023-11-28 10:07   ` Michal Hocko
2023-11-28 17:54     ` Kent Overstreet
2023-11-29  8:59       ` Michal Hocko
2023-11-22 23:25 ` [PATCH 5/7] mm: shrinker: Add shrinker_to_text() to debugfs interface Kent Overstreet
2023-11-22 23:25 ` [PATCH 6/7] bcachefs: shrinker.to_text() methods Kent Overstreet
2023-11-22 23:25 ` [PATCH 7/7] bcachefs: add counters for failed shrinker reclaim Kent Overstreet
2023-11-28  9:59 ` [PATCH 0/7] shrinker debugging improvements Michal Hocko
