From: Hao Ge <hao.ge@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>,
	Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Jonathan Corbet <corbet@lwn.net>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Sourav Panda <souravpanda@google.com>,
	Abhishek Bapat <abhishekbapat@google.com>
Subject: Re: [PATCH v2 0/6] alloc_tag: introduce IOCTL-based filtering for MAP
Date: Mon, 25 May 2026 15:32:20 +0800
Message-ID: <4ae038f0-cc33-4a60-b59b-ae86bb541735@linux.dev>
In-Reply-To: <20260522131108.f972659717367c67082f3766@linux-foundation.org>

Hi Andrew and Suren,

On 2026/5/23 04:11, Andrew Morton wrote:
> On Fri, 22 May 2026 17:45:32 +0000 Abhishek Bapat <abhishekbapat@google.com> wrote:
>
>> Currently, memory allocation profiling data is primarily exposed through
>> /proc/allocinfo. While useful for manual inspection, this text-based
>> interface poses challenges for production monitoring and large-scale
>> analysis:
>>
>> 1. Userspace must parse large amounts of text to extract specific
>> fields.
>> 2. To find specific tags, userspace must read the entire dataset,
>> requiring many context switches and high data copying.
>> 3. The kernel currently aggregates per-CPU counters for every allocation
>> size, even those the user intends to filter out immediately.
>>
>> This series introduces a new IOCTL-based binary interface for allocinfo
>> that supports kernel-side filtering. By allowing the user to specify a
>> filter mask, we significantly reduce the work performed in-kernel and
>> the amount of data transferred to userspace.
>>
>> Performance measurements were conducted on an Intel Xeon Platinum 8481C
>> (224 CPUs) with caches dropped before each run.
>>
>> The IOCTL mechanism shows a ~20x performance improvement for
>> filtered queries. The kernel avoids the expensive per-CPU counter
>> aggregation (alloc_tag_read) for any tags that fail the initial string
>> or location filters.
>>
>> Scenario 1: Specific File Filtering (arch/x86/events/rapl.c)
>> 1. Traditional (cat /proc/allocinfo | grep): 22ms (sys)
>> 2. IOCTL Interface: 1ms (sys)
>>
>> Scenario 2: Compound Filtering (Filename + Size)
>> 1. Traditional: (cat ... | grep | awk): 21ms (sys)
>> 2. IOCTL Interface: 1ms (sys)
>>
>> Scenario 3: Size-Based Filtering (min_size = 1MB)
>> 1. Traditional: (cat ... | awk): 21ms (sys)
>> 2. IOCTL Interface: 14ms (sys)
> Yup, textual interfaces aren't fast.
>
> And ioctl-based interfaces aren't popular.  One would prefer to see an
> interface which uses read()/lseek(), pread(), etc.  It would be
> appropriate for this [0/N] to have a discussion of why that approach
> was not chosen.
>
>>   .../userspace-api/ioctl/ioctl-number.rst      |   2 +
>>   MAINTAINERS                                   |   2 +
>>   include/linux/codetag.h                       |   1 +
>>   include/uapi/linux/alloc_tag.h                |  87 +++
>>   lib/alloc_tag.c                               | 303 ++++++++++-
>>   lib/codetag.c                                 |  11 +
>>   tools/testing/selftests/alloc_tag/Makefile    |   9 +
>>   .../alloc_tag/allocinfo_ioctl_test.c          | 505 ++++++++++++++++++
>>   8 files changed, 918 insertions(+), 2 deletions(-)
>>   create mode 100644 include/uapi/linux/alloc_tag.h
>>   create mode 100644 tools/testing/selftests/alloc_tag/Makefile
>>   create mode 100644 tools/testing/selftests/alloc_tag/allocinfo_ioctl_test.c
> At some point this should grow user-facing documentation, please.
>
> And the right time for that is now, because such documentation is
> useful for code review - it makes that review both easier and more
> useful.
>
> Sashiko had a few things to say:
>
> 	https://sashiko.dev/#/patchset/cover.1779471082.git.abhishekbapat@google.com

I notice that Sashiko has reported a pre-existing issue, as described below:


 >  static void *allocinfo_start(struct seq_file *m, loff_t *pos)
This is a pre-existing issue, but can resuming a sequential read on
/proc/allocinfo cause a use-after-free if a kernel module is unloaded
between read() system calls?
The seq_file read operation updates priv->iter.ct during allocinfo_next(),
stops iteration, and returns to userspace. If the module containing
priv->iter.ct is unloaded while the lock is dropped, the module's codetag
memory is freed.
On the next read() system call, allocinfo_start() with pos > 0 reacquires
the lock but returns priv without validating if priv->iter.ct still belongs
to a valid module. Does allocinfo_show() then dereference this dangling
pointer?
[ ... ]

This issue is unrelated to the current patch series and can be resolved
by reverting commit 9f44df50fee4.

Therefore, I have submitted a separate patch addressing this issue,
which is available at the link below:

https://lore.kernel.org/all/20260525072117.112779-1-hao.ge@linux.dev/
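
For anyone following the Sashiko report quoted above, here is a small
userspace model of the resume hazard. It is only a sketch: fake_module,
fake_iter and the generation counter are hypothetical names invented for
illustration, not kernel APIs, and this is not meant to describe how the
kernel does or should handle it. The point is just that a position
cached across read() calls has to be re-validated before it is used
again.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the codetag range owned by one module. */
struct fake_module {
	bool loaded;
	unsigned long generation;	/* bumped on every unload */
};

/* Hypothetical stand-in for the iterator state kept across read() calls. */
struct fake_iter {
	const struct fake_module *mod;	/* position cached by the last next() */
	unsigned long seen_generation;	/* generation observed at that time */
};

/* Models next(): remember where the walk stopped before returning to userspace. */
static void fake_iter_save(struct fake_iter *it, const struct fake_module *mod)
{
	it->mod = mod;
	it->seen_generation = mod->generation;
}

/* Models a module unload that happens while the lock is dropped. */
static void fake_module_unload(struct fake_module *mod)
{
	mod->loaded = false;
	mod->generation++;
}

/*
 * Models start() with pos > 0: report whether the cached position may still
 * be used. In this model the fake_module object itself stays alive, so the
 * check is safe; real freed memory could not be inspected this way.
 */
static bool fake_iter_can_resume(const struct fake_iter *it)
{
	return it->mod != NULL &&
	       it->mod->loaded &&
	       it->mod->generation == it->seen_generation;
}

/* Returns 1 if a resume after an unload is rejected, 0 otherwise. */
int fake_resume_after_unload_is_rejected(void)
{
	struct fake_module mod = { .loaded = true, .generation = 0 };
	struct fake_iter it = { 0 };

	fake_iter_save(&it, &mod);		/* first read() stops here */
	if (!fake_iter_can_resume(&it))		/* nothing has changed yet */
		return 0;

	fake_module_unload(&mod);		/* module goes away in between */
	return fake_iter_can_resume(&it) ? 0 : 1;
}
```

Without a check like fake_iter_can_resume(), the second "read()" in this
model would keep using the position saved before the unload, which is the
pattern the report is asking about.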

Thanks

Best Regards

Hao

