From: "David Wang" <00107082@163.com>
To: "Kent Overstreet" <kent.overstreet@linux.dev>
Cc: "Suren Baghdasaryan" <surenb@google.com>,
akpm@linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] alloc_tag: avoid mem alloc and iter reset when reading allocinfo
Date: Fri, 9 May 2025 00:24:56 +0800 (CST) [thread overview]
Message-ID: <7bf1ee37.b6a4.196b0b6dce1.Coremail.00107082@163.com> (raw)
In-Reply-To: <y6d7vzvii5wvfby5446ukpvdmulwd5lzcyki6rpxckh432d6jz@xwtlwnkhztuo>
At 2025-05-08 21:33:50, "Kent Overstreet" <kent.overstreet@linux.dev> wrote:
>On Thu, May 08, 2025 at 01:51:48PM +0800, David Wang wrote:
>> At 2025-05-08 12:07:40, "Kent Overstreet" <kent.overstreet@linux.dev> wrote:
>> >Another thing to note is that memory layout - avoiding pointer chasing -
>> >is hugely important, but it'll almost never show up as allocator calls.
>> >
>> >To give you some examples, mempools and biosets used to be separately
>> >allocated. This was mainly to make error paths in outer object
>> >constructors/destructors easier and safer: instead of keeping track of
>> >what's initialized and what's not, if you've got a pointer to a
>> >mempool/bioset you call *_free() on it.
>> >
>> >(People hadn't yet clued that you can just kzalloc() the entire outer
>> >object, and then if the inner object is zeroed it wasn't initialized).
>> >
>> >But that means you're adding a pointer chase to every mempool_alloc()
>> >call, and since bioset itself has mempools allocating bios had _two_
>> >unnecessary pointer derefs. That's death for performance when you're
>> >running cache cold, but since everyone benchmarks cache-hot...
>> >
>> >(I was the one who fixed that).
>> >
>> >Another big one was generic_file_buffered_read(). Main buffered read
>> >path, everyone wants it to be as fast as possible.
>> >
>> >But the core is (was) a loop that walks the pagecache radix tree to get
>> >the page, then copies 4k of data out to userspace (there goes l1), then
>> >repeats all that pointer chasing for the next 4k. Pre large folios, it
>> >was horrific.
>> >
>> >Solution - vectorize it. Look up all the pages we're copying from all at
>> >once, stuff them in a (dynamically allocated! for each read!) vector,
>> >and then do the copying out to userspace all at once. Massive
>> >performance gain.
>> >
>> >Of course, to do that I first had to clean up a tangled 250+ line
>> >monstrosity of half baked, poorly thought out "optimizations" (the worst
>> >spaghetti of gotos you'd ever seen) and turn it into something
>> >manageable...
>> >
>> >So - keep things simple, don't overthink the little stuff, so you can
>> >spot and tackle the big algorithmic wins :)
>> I will keep this in mind~! :)
>>
>> And thanks for the enlightening notes~!!
>>
>> Though I could not quite catch up with the first one, I think I got
>> the point: avoid unnecessary pointer chasing and keep the pointer
>> chasing as short(balanced) as possible~
>
>To illustrate - DRAM latency is 30-70ns.
>
>At 4GHz, that's 120-280 cycles, and a properly fed CPU can do multiple
>instructions per clock - so a cache miss all the way to DRAM can cost
>you hundreds of instructions.
Oh, I understand that cache misses are bad; it is the "mempools and biosets"
example that I had a hard time connecting the dots on, due to lack of knowledge.....
>
>> The second one, about copy 4k by 4k, seems quite similar to seq_file,
>> at least the "4k" part, literally. seq_file read() defaults to alloc
>> 4k buffer, and read data until EOF or the 4k buffer is full, and
>> start over again for the next read().
>>
>> One solution could be make changes to seq_file, do not stop until user
>> buffer is full for each read. kind of similar to your second note, in
>> a sequential style, I think.
>>
>> If user read with 128K buffer, and seq_file fill the buffer 4k by
>> 4k, it would only need ~3 read calls for allocinfo. (I did post a
>> patch for seq_file to fill user buffer, but start/stop still happens
>> at 4k boundary , so no help for
>> the iterator rewinding when read /proc/allocinfo yet.
>> https://lore.kernel.org/lkml/20241220140819.9887-1-00107082@163.com/ )
>> The solution in this patch is keeping the iterator alive and valid
>> cross read boundary, this can also avoid the cost for each start
>> over.
>
>The first question is - does it matter? If the optimization is just for
>/proc/allocinfo, who's reading it at a high enough rate that we care?
>
>If it's only being used interactively, it doesn't matter. If it's being
>read at a high rate by some sort of profiling program, we'd want to skip
>the text interface entirely and add an ioctl to read the data out in a
>binary format.
...^_^  Actually, I have been running tools that parse /proc/allocinfo every 5 seconds
and feed the data to a Prometheus server for quite a while now...
Every 5 seconds may not sound that frequent, but I also have all the other proc files
to read, so I would like optimization for all of them......
Ioctls and other binary interfaces are indeed more efficient, but most are
not well documented, while most proc files are self-documenting. If proc files
are efficient enough, I think I would stay with them even if a binary
alternative were tens of times faster.
>
>The idea of changing seq_file to continue until the user buffer is full
>- that'd be a good one, if you're making changes that benefit all
>seq_file users.
I did make that patch; I think I am still waiting for feedback......
Thread overview: 35+ messages
2025-05-07 17:55 [PATCH] alloc_tag: avoid mem alloc and iter reset when reading allocinfo David Wang
2025-05-07 18:19 ` David Wang
2025-05-07 23:42 ` [PATCH] " Suren Baghdasaryan
2025-05-08 0:01 ` Kent Overstreet
2025-05-08 3:06 ` David Wang
2025-05-08 3:31 ` Kent Overstreet
2025-05-08 3:35 ` David Wang
2025-05-08 4:07 ` Kent Overstreet
2025-05-08 5:51 ` David Wang
2025-05-08 13:33 ` Kent Overstreet
2025-05-08 16:24 ` David Wang [this message]
2025-05-08 16:34 ` Kent Overstreet
2025-05-08 16:58 ` David Wang
2025-05-08 17:17 ` David Wang
2025-05-08 17:26 ` Kent Overstreet
2025-05-08 2:24 ` David Wang
2025-05-07 23:36 ` Suren Baghdasaryan
2025-05-08 3:10 ` David Wang
2025-05-08 15:32 ` David Wang
2025-05-08 21:41 ` Suren Baghdasaryan
2025-05-09 5:53 ` [PATCH 2/2] alloc_tag: keep codetag iterator cross read() calls David Wang
2025-05-09 17:34 ` Suren Baghdasaryan
2025-05-09 17:45 ` David Wang
2025-05-09 17:39 ` [PATCH v2 2/2] alloc_tag: keep codetag iterator active between " David Wang
2025-05-09 18:33 ` Tim Chen
2025-05-09 19:36 ` Suren Baghdasaryan
2025-05-09 19:46 ` Tim Chen
2025-05-09 20:46 ` Suren Baghdasaryan
2025-05-09 21:15 ` Suren Baghdasaryan
2025-05-10 3:10 ` David Wang
2025-05-10 3:30 ` Suren Baghdasaryan
2025-05-10 3:58 ` David Wang
2025-05-10 4:03 ` Suren Baghdasaryan
2025-05-10 3:35 ` David Wang
2025-05-10 3:25 ` David Wang