From: Tejun Heo <tj@kernel.org>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Michal Hocko <mhocko@suse.com>,
	Suren Baghdasaryan <surenb@google.com>,
	akpm@linux-foundation.org, vbabka@suse.cz, hannes@cmpxchg.org,
	roman.gushchin@linux.dev, mgorman@suse.de, dave@stgolabs.net,
	willy@infradead.org, liam.howlett@oracle.com, corbet@lwn.net,
	void@manifault.com, peterz@infradead.org, juri.lelli@redhat.com,
	ldufour@linux.ibm.com, catalin.marinas@arm.com, will@kernel.org,
	arnd@arndb.de, tglx@linutronix.de, mingo@redhat.com,
	dave.hansen@linux.intel.com, x86@kernel.org, peterx@redhat.com,
	david@redhat.com, axboe@kernel.dk, mcgrof@kernel.org,
	masahiroy@kernel.org, nathan@kernel.org, dennis@kernel.org,
	muchun.song@linux.dev, rppt@kernel.org, paulmck@kernel.org,
	pasha.tatashin@soleen.com, yosryahmed@google.com,
	yuzhao@google.com, dhowells@redhat.com, hughd@google.com,
	andreyknvl@gmail.com, keescook@chromium.org,
	ndesaulniers@google.com, gregkh@linuxfoundation.org,
	ebiggers@google.com, ytcoode@gmail.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, bristot@redhat.com,
	vschneid@redhat.com, cl@linux.com, penberg@kernel.org,
	iamjoonsoo.kim@lge.com, 42.hyeyoo@gmail.com, glider@google.com,
	elver@google.com, dvyukov@google.com, shakeelb@google.com,
	songmuchun@bytedance.com, jbaron@akamai.com, rientjes@google.com,
	minchan@google.com, kaleshsingh@google.com,
	kernel-team@android.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
	linux-arch@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-modules@vger.kernel.org,
	kasan-dev@googlegroups.com, cgroups@vger.kernel.org
Subject: Re: [PATCH 00/40] Memory allocation profiling
Date: Wed, 3 May 2023 06:35:49 -1000	[thread overview]
Message-ID: <ZFKNZZwC8EUbOLMv@slm.duckdns.org> (raw)
In-Reply-To: <ZFIVtB8JyKk0ddA5@moria.home.lan>
Hello, Kent.
On Wed, May 03, 2023 at 04:05:08AM -0400, Kent Overstreet wrote:
> No, we're still waiting on the tracing people to _demonstrate_, not
> claim, that this is at all possible in a comparable way with tracing. 
So, we (meta) happen to do stuff like this all the time in the fleet to hunt
down tricky persistent problems like memory leaks, ref leaks, what-have-you.
In recent kernels, with kprobe and BPF, our ability to debug these sorts of
problems has improved a great deal. Below, I'm attaching a bcc script I used
to hunt down, IIRC, a double vfree. It's not exactly for a leak but leaks
can follow the same pattern.
There are of course some pros and cons to this approach:
Pros:
* The framework doesn't really have any runtime overhead, so we can have it
  deployed in the entire fleet and debug wherever problem is.
* It's fully flexible and programmable which enables non-trivial filtering
  and summarizing to be done inside kernel w/ BPF as necessary, which is
  pretty handy for tracking high frequency events.
* BPF is pretty performant. Dedicated built-in kernel code can do better of
  course but BPF's jit compiled code & its data structures are fast enough.
  I don't remember any time this was a problem.
Cons:
* BPF has some learning curve. Also the fact that what it provides is a wide
  open field rather than something scoped out for a specific problem can
  make it seem a bit daunting at the beginning.
* Because tracking starts when the script starts running, it doesn't know
  anything which has happened upto that point, so you gotta pay attention to
  handling e.g. handling frees which don't match allocs. It's kinda annoying
  but not a huge problem usually. There are ways to build in BPF progs into
  the kernel and load it early but I haven't experiemnted with it yet
  personally.
I'm not necessarily against adding dedicated memory debugging mechanism but
do wonder whether the extra benefits would be enough to justify the code and
maintenance overhead.
Oh, a bit of delta but for anyone who's more interested in debugging
problems like this, while I tend to go for bcc
(https://github.com/iovisor/bcc) for this sort of problems. Others prefer to
write against libbpf directly or use bpftrace
(https://github.com/iovisor/bpftrace).
Thanks.
#!/usr/bin/env bcc-py
import bcc
import time
import datetime
import argparse
import os
import sys
import errno
description = """
Record vmalloc/vfrees and trigger on unmatched vfree
"""
bpf_source = """
#include <uapi/linux/ptrace.h>
#include <linux/vmalloc.h>
struct vmalloc_rec {
	unsigned long		ptr;
	int			last_alloc_stkid;
	int			last_free_stkid;
	int			this_stkid;
	bool			allocated;
};
BPF_STACK_TRACE(stacks, 8192);
BPF_HASH(vmallocs, unsigned long, struct vmalloc_rec, 131072);
BPF_ARRAY(dup_free, struct vmalloc_rec, 1);
int kpret_vmalloc_node_range(struct pt_regs *ctx)
{
        unsigned long ptr = PT_REGS_RC(ctx);
	uint32_t zkey = 0;
	struct vmalloc_rec rec_init = { };
	struct vmalloc_rec *rec;
	int stkid;
	if (!ptr)
		return 0;
	stkid = stacks.get_stackid(ctx, 0);
        rec_init.ptr = ptr;
        rec_init.last_alloc_stkid = -1;
        rec_init.last_free_stkid = -1;
        rec_init.this_stkid = -1;
	rec = vmallocs.lookup_or_init(&ptr, &rec_init);
	rec->allocated = true;
	rec->last_alloc_stkid = stkid;
	return 0;
}
int kp_vfree(struct pt_regs *ctx, const void *addr)
{
	unsigned long ptr = (unsigned long)addr;
	uint32_t zkey = 0;
	struct vmalloc_rec rec_init = { };
	struct vmalloc_rec *rec;
	int stkid;
	stkid = stacks.get_stackid(ctx, 0);
        rec_init.ptr = ptr;
        rec_init.last_alloc_stkid = -1;
        rec_init.last_free_stkid = -1;
        rec_init.this_stkid = -1;
	rec = vmallocs.lookup_or_init(&ptr, &rec_init);
	if (!rec->allocated && rec->last_alloc_stkid >= 0) {
		rec->this_stkid = stkid;
		dup_free.update(&zkey, rec);
	}
	rec->allocated = false;
	rec->last_free_stkid = stkid;
        return 0;
}
"""
bpf = bcc.BPF(text=bpf_source)
bpf.attach_kretprobe(event="__vmalloc_node_range", fn_name="kpret_vmalloc_node_range");
bpf.attach_kprobe(event="vfree", fn_name="kp_vfree");
bpf.attach_kprobe(event="vfree_atomic", fn_name="kp_vfree");
stacks = bpf["stacks"]
vmallocs = bpf["vmallocs"]
dup_free = bpf["dup_free"]
last_dup_free_ptr = dup_free[0].ptr
def print_stack(stkid):
    for addr in stacks.walk(stkid):
        sym = bpf.ksym(addr)
        print('  {}'.format(sym))
def print_dup(dup):
    print('allocated={} ptr={}'.format(dup.allocated, hex(dup.ptr)))
    if (dup.last_alloc_stkid >= 0):
        print('last_alloc_stack: ')
        print_stack(dup.last_alloc_stkid)
    if (dup.last_free_stkid >= 0):
        print('last_free_stack: ')
        print_stack(dup.last_free_stkid)
    if (dup.this_stkid >= 0):
        print('this_stack: ')
        print_stack(dup.this_stkid)
while True:
    time.sleep(1)
    
    if dup_free[0].ptr != last_dup_free_ptr:
        print('\nDUP_FREE:')
        print_dup(dup_free[0])
        last_dup_free_ptr = dup_free[0].ptr
next prev parent reply	other threads:[~2023-05-03 16:36 UTC|newest]
Thread overview: 160+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-01 16:54 [PATCH 00/40] Memory allocation profiling Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 01/40] lib/string_helpers: Drop space in string_get_size's output Suren Baghdasaryan
2023-05-01 18:13   ` Davidlohr Bueso
2023-05-01 19:35     ` Kent Overstreet
2023-05-01 19:57       ` Andy Shevchenko
2023-05-01 21:16         ` Kent Overstreet
2023-05-01 21:33         ` Liam R. Howlett
2023-05-02  0:11           ` Kent Overstreet
2023-05-02  0:53         ` Kent Overstreet
2023-05-02  2:22       ` James Bottomley
2023-05-02  3:17         ` Kent Overstreet
2023-05-02  5:33           ` Andy Shevchenko
2023-05-02  6:21             ` Kent Overstreet
2023-05-02 15:19               ` Andy Shevchenko
2023-05-03  2:07                 ` Kent Overstreet
2023-05-03  6:30                   ` Andy Shevchenko
2023-05-03  7:12                     ` Kent Overstreet
2023-05-03  9:12                       ` Andy Shevchenko
2023-05-03  9:16                         ` Kent Overstreet
2023-05-02 11:42           ` James Bottomley
2023-05-02 22:50             ` Dave Chinner
2023-05-03  9:28               ` Vlastimil Babka
2023-05-03  9:44                 ` Andy Shevchenko
2023-05-03 12:15               ` James Bottomley
2023-05-02  7:55   ` Jani Nikula
2023-05-01 16:54 ` [PATCH 02/40] scripts/kallysms: Always include __start and __stop symbols Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 03/40] fs: Convert alloc_inode_sb() to a macro Suren Baghdasaryan
2023-05-02 12:35   ` Petr Tesařík
2023-05-02 19:57     ` Kent Overstreet
2023-05-02 20:20       ` Petr Tesařík
2023-05-01 16:54 ` [PATCH 04/40] nodemask: Split out include/linux/nodemask_types.h Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 05/40] prandom: Remove unused include Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 06/40] lib/string.c: strsep_no_empty() Suren Baghdasaryan
2023-05-02 12:37   ` Petr Tesařík
2023-05-01 16:54 ` [PATCH 07/40] Lazy percpu counters Suren Baghdasaryan
2023-05-01 19:17   ` Randy Dunlap
2023-05-01 16:54 ` [PATCH 08/40] mm: introduce slabobj_ext to support slab object extensions Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 09/40] mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext creation Suren Baghdasaryan
2023-05-02 12:50   ` Petr Tesařík
2023-05-02 18:33     ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 10/40] mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 11/40] mm: prevent slabobj_ext allocations for slabobj_ext and kmem_cache objects Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 12/40] slab: objext: introduce objext_flags as extension to page_memcg_data_flags Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 13/40] lib: code tagging framework Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 14/40] lib: code tagging module support Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 15/40] lib: prevent module unloading if memory is not freed Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 16/40] lib: code tagging query helper functions Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 17/40] lib: add allocation tagging support for memory allocation profiling Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 18/40] lib: introduce support for page allocation tagging Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 19/40] change alloc_pages name in dma_map_ops to avoid name conflicts Suren Baghdasaryan
2023-05-02 15:50   ` Petr Tesařík
2023-05-02 18:38     ` Suren Baghdasaryan
2023-05-02 20:09       ` Petr Tesařík
2023-05-02 20:18         ` Kent Overstreet
2023-05-02 20:24         ` Suren Baghdasaryan
2023-05-02 20:39           ` Petr Tesařík
2023-05-02 20:41             ` Suren Baghdasaryan
2023-05-03 16:25   ` Steven Rostedt
2023-05-03 18:03     ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 20/40] mm: enable page allocation tagging Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 21/40] mm/page_ext: enable early_page_ext when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 22/40] mm: create new codetag references during page splitting Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 23/40] lib: add codetag reference into slabobj_ext Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 24/40] mm/slab: add allocation accounting into slab allocation and free paths Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 25/40] mm/slab: enable slab allocation tagging for kmalloc and friends Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 26/40] mm/slub: Mark slab_free_freelist_hook() __always_inline Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 27/40] mempool: Hook up to memory allocation profiling Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 28/40] timekeeping: Fix a circular include dependency Suren Baghdasaryan
2023-05-02 15:50   ` Thomas Gleixner
2023-05-01 16:54 ` [PATCH 29/40] mm: percpu: Introduce pcpuobj_ext Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 30/40] mm: percpu: Add codetag reference into pcpuobj_ext Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 31/40] mm: percpu: enable per-cpu allocation tagging Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 32/40] arm64: Fix circular header dependency Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 33/40] move stack capture functionality into a separate function for reuse Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 34/40] lib: code tagging context capture support Suren Baghdasaryan
2023-05-03  7:35   ` Michal Hocko
2023-05-03 15:18     ` Suren Baghdasaryan
2023-05-03 15:26       ` Dave Hansen
2023-05-03 19:45         ` Suren Baghdasaryan
2023-05-04  8:04       ` Michal Hocko
2023-05-04 14:31         ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 35/40] lib: implement context capture support for tagged allocations Suren Baghdasaryan
2023-05-03  7:39   ` Michal Hocko
2023-05-03 15:24     ` Suren Baghdasaryan
2023-05-04  8:09       ` Michal Hocko
2023-05-04 16:22         ` Suren Baghdasaryan
2023-05-05  8:40           ` Michal Hocko
2023-05-05 18:10             ` Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 36/40] lib: add memory allocations report in show_mem() Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 37/40] codetag: debug: skip objext checking when it's for objext itself Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 38/40] codetag: debug: mark codetags for reserved pages as empty Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 39/40] codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations Suren Baghdasaryan
2023-05-01 16:54 ` [PATCH 40/40] MAINTAINERS: Add entries for code tagging and memory allocation profiling Suren Baghdasaryan
2023-05-01 17:47 ` [PATCH 00/40] Memory " Roman Gushchin
2023-05-01 18:08   ` Suren Baghdasaryan
2023-05-01 18:14     ` Roman Gushchin
2023-05-01 19:37       ` Kent Overstreet
2023-05-01 21:18         ` Roman Gushchin
2023-05-03  7:25 ` Michal Hocko
2023-05-03  7:34   ` Kent Overstreet
2023-05-03  7:51     ` Michal Hocko
2023-05-03  8:05       ` Kent Overstreet
2023-05-03 13:21         ` Steven Rostedt
2023-05-03 16:35         ` Tejun Heo [this message]
2023-05-03 17:42           ` Suren Baghdasaryan
2023-05-03 18:06             ` Tejun Heo
2023-05-03 17:44           ` Kent Overstreet
2023-05-03 17:51           ` Kent Overstreet
2023-05-03 18:24             ` Tejun Heo
2023-05-03 18:07           ` Johannes Weiner
2023-05-03 18:19             ` Tejun Heo
2023-05-03 18:40               ` Tejun Heo
2023-05-03 18:56                 ` Kent Overstreet
2023-05-03 18:58                   ` Tejun Heo
2023-05-03 19:09                     ` Tejun Heo
2023-05-03 19:41                       ` Suren Baghdasaryan
2023-05-03 19:48                         ` Tejun Heo
2023-05-03 20:00                           ` Tejun Heo
2023-05-03 20:14                             ` Suren Baghdasaryan
2023-05-04  2:25                               ` Tejun Heo
2023-05-04  3:33                                 ` Kent Overstreet
2023-05-04  3:33                                 ` Suren Baghdasaryan
2023-05-04  8:00                               ` Petr Tesařík
2023-05-03 20:08                           ` Suren Baghdasaryan
2023-05-03 20:11                             ` Johannes Weiner
2023-05-04  2:16                             ` Tejun Heo
2023-05-03 20:04           ` Andrey Ryabinin
2023-05-03  9:50       ` Petr Tesařík
2023-05-03  9:54         ` Kent Overstreet
2023-05-03 10:24           ` Petr Tesařík
2023-05-03  9:57         ` Kent Overstreet
2023-05-03 10:26           ` Petr Tesařík
2023-05-03 15:30             ` Kent Overstreet
2023-05-03 12:33           ` James Bottomley
2023-05-03 14:31             ` Suren Baghdasaryan
2023-05-03 15:28             ` Kent Overstreet
2023-05-03 15:37               ` Lorenzo Stoakes
2023-05-03 16:03                 ` Kent Overstreet
2023-05-03 15:49               ` James Bottomley
2023-05-03 15:09   ` Suren Baghdasaryan
2023-05-03 16:28     ` Steven Rostedt
2023-05-03 17:40       ` Suren Baghdasaryan
2023-05-03 18:03         ` Steven Rostedt
2023-05-03 18:07           ` Suren Baghdasaryan
2023-05-03 18:12           ` Kent Overstreet
2023-05-04  9:07     ` Michal Hocko
2023-05-04 15:08       ` Suren Baghdasaryan
2023-05-07 10:27         ` Michal Hocko
2023-05-07 17:01           ` Kent Overstreet
2023-05-07 17:20       ` Kent Overstreet
2023-05-07 20:55         ` Steven Rostedt
2023-05-07 21:53           ` Kent Overstreet
2023-05-07 22:09             ` Steven Rostedt
2023-05-07 22:17               ` Kent Overstreet
2023-05-08 15:52         ` Petr Tesařík
2023-05-08 15:57           ` Kent Overstreet
2023-05-08 16:09             ` Petr Tesařík
2023-05-08 16:28               ` Kent Overstreet
2023-05-08 18:59                 ` Petr Tesařík
2023-05-08 20:48                   ` Kent Overstreet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox
  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):
  git send-email \
    --in-reply-to=ZFKNZZwC8EUbOLMv@slm.duckdns.org \
    --to=tj@kernel.org \
    --cc=42.hyeyoo@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=andreyknvl@gmail.com \
    --cc=arnd@arndb.de \
    --cc=axboe@kernel.dk \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=cgroups@vger.kernel.org \
    --cc=cl@linux.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=dennis@kernel.org \
    --cc=dhowells@redhat.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=dvyukov@google.com \
    --cc=ebiggers@google.com \
    --cc=elver@google.com \
    --cc=glider@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=iommu@lists.linux.dev \
    --cc=jbaron@akamai.com \
    --cc=juri.lelli@redhat.com \
    --cc=kaleshsingh@google.com \
    --cc=kasan-dev@googlegroups.com \
    --cc=keescook@chromium.org \
    --cc=kent.overstreet@linux.dev \
    --cc=kernel-team@android.com \
    --cc=ldufour@linux.ibm.com \
    --cc=liam.howlett@oracle.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-modules@vger.kernel.org \
    --cc=masahiroy@kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.com \
    --cc=minchan@google.com \
    --cc=mingo@redhat.com \
    --cc=muchun.song@linux.dev \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=paulmck@kernel.org \
    --cc=penberg@kernel.org \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rostedt@goodmis.org \
    --cc=rppt@kernel.org \
    --cc=shakeelb@google.com \
    --cc=songmuchun@bytedance.com \
    --cc=surenb@google.com \
    --cc=tglx@linutronix.de \
    --cc=vbabka@suse.cz \
    --cc=vincent.guittot@linaro.org \
    --cc=void@manifault.com \
    --cc=vschneid@redhat.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=yosryahmed@google.com \
    --cc=ytcoode@gmail.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY
  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
  Be sure your reply has a Subject: header at the top and a blank line
  before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).