From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
To: wuqiang <wuqiang.matt@bytedance.com>
Cc: davem@davemloft.net, anil.s.keshavamurthy@intel.com,
naveen.n.rao@linux.ibm.com, rostedt@goodmis.org,
peterz@infradead.org, akpm@linux-foundation.org,
sander@svanheule.net, ebiggers@google.com,
dan.j.williams@intel.com, jpoimboe@kernel.org,
linux-kernel@vger.kernel.org, mattwu@163.com,
kernel test robot <lkp@intel.com>
Subject: Re: [PATCH v4] kprobes,lib: kretprobe scalability improvement
Date: Fri, 4 Nov 2022 10:28:10 +0900 [thread overview]
Message-ID: <20221104102810.b2c10d3334483eb0b3a2ef42@kernel.org> (raw)
In-Reply-To: <15d382af-512f-cb06-d8c2-cd9e16d870f6@bytedance.com>
On Fri, 4 Nov 2022 00:45:19 +0800
wuqiang <wuqiang.matt@bytedance.com> wrote:
> On 2022/11/3 10:51, Masami Hiramatsu (Google) wrote:
> > Hi wuqiang (or Matt?),
> >
> > Thanks for updating the patch! I have some comments below,
>
> Thanks for your time :)
>
> > On Wed, 2 Nov 2022 10:30:12 +0800
> > wuqiang <wuqiang.matt@bytedance.com> wrote:
> >
> ...
> >> Changes since V3:
> >> 1) build warning: unused variable in fprobe_init_rethook
> >> Reported-by: kernel test robot <lkp@intel.com>
> >>
> >> Changes since V2:
> >> 1) the percpu-extended version of the freelist replaced by new percpu-
> >> ring-array. freelist has data-contention in freelist_node (refs and
> >> next) even after node is removed from freelist and the node could
> >> be polluted easily (with freelist_node defined in union)
> >> 2) routines split to objpool.h and objpool.c according to cold & hot
> >> pathes, and the latter moved to lib, as suggested by Masami
> >> 3) test module (test_objpool.ko) added to lib for functional testings
> >>
> >> Changes since V1:
> >> 1) reformat to a single patch as Masami Hiramatsu suggested
> >> 2) use __vmalloc_node to replace vmalloc_node for vmalloc
> >> 3) a few minor fixes: typo and coding-style issues
> >
> > Recording change log is very good. But if it becomes too long,
> > you can put a URL of the previous series on LKML instead of
> > appending the change logs.
> > You can get the URL (permlink) by "lkml.kernel.org/r/<your-message-id>"
>
> Got it.
>
> >>
> >> Signed-off-by: wuqiang <wuqiang.matt@bytedance.com>
> >> ---
> >> include/linux/freelist.h | 129 -----
> >> include/linux/kprobes.h | 9 +-
> >> include/linux/objpool.h | 151 ++++++
> >> include/linux/rethook.h | 15 +-
> >> kernel/kprobes.c | 95 ++--
> >> kernel/trace/fprobe.c | 17 +-
> >> kernel/trace/rethook.c | 80 +--
> >> lib/Kconfig.debug | 11 +
> >> lib/Makefile | 4 +-
> >> lib/objpool.c | 480 ++++++++++++++++++
> >> lib/test_objpool.c | 1031 ++++++++++++++++++++++++++++++++++++++
> >> 11 files changed, 1772 insertions(+), 250 deletions(-)
> >
> > Hmm, this does too much things in 1 patch.
> > Can you split this in below 5 patches? This also makes clear that who
> > needs to review which part.
>
> I was ever considering of splitting, but finally decided to mix them in a big
> one mostly because it's only for kretprobe improvement.
>
> Next version I'll split to smaller patches. Thank you for the advice.
>
> >
> > - Add generic scalable objpool
> > - Add objpool test
> > - Use objpool in kretprobe
> > - Use objpool in fprobe and rethook
> > - Remove unused freelist
> >
> >> delete mode 100644 include/linux/freelist.h
> >> create mode 100644 include/linux/objpool.h
> >> create mode 100644 lib/objpool.c
> >> create mode 100644 lib/test_objpool.c
> >>
> > [...]
> >> +
> >> +struct objpool_slot {
> >> + uint32_t os_head; /* head of ring array */
> >
> > If all fields have "os_" prefix, it is meaningless.
> >
> >> + uint32_t os_tail; /* tail of ring array */
> >> + uint32_t os_size; /* max item slots, pow of 2 */
> >> + uint32_t os_mask; /* os_size - 1 */
> >> +/*
> >> + * uint32_t os_ages[]; // ring epoch id
> >> + * void *os_ents[]; // objects array
> >
> > "entries[]"
>
> I'll refine the comments here to better explain the memory layout.
>
> >
> >> + */
> >> +};
> >> +
> >> +/* caller-specified object initial callback to setup each object, only called once */
> >> +typedef int (*objpool_init_node_cb)(void *context, void *obj);
> >> +
> >> +/* caller-specified cleanup callback for private objects/pool/context */
> >> +typedef int (*objpool_release_cb)(void *context, void *ptr, uint32_t flags);
> >> +
> >> +/* called for object releasing: ptr points to an object */
> >> +#define OBJPOOL_FLAG_NODE (0x00000001)
> >> +/* for user pool and context releasing, ptr could be NULL */
> >> +#define OBJPOOL_FLAG_POOL (0x00001000)
> >> +/* the object or pool to be released is user-managed */
> >> +#define OBJPOOL_FLAG_USER (0x00008000)
> >> +
> >> +/*
> >> + * objpool_head: object pooling metadata
> >> + */
> >> +
> >> +struct objpool_head {
> >> + uint32_t oh_objsz; /* object & element size */
> >> + uint32_t oh_nobjs; /* total objs (pre-allocated) */
> >> + uint32_t oh_nents; /* max objects per cpuslot */
> >> + uint32_t oh_ncpus; /* num of possible cpus */
> >
> > If all fields have "oh_" prefix, it is meaningless.
> > Also, if there is no reason to be 32bit (like align the structure size
> > for cache, or pack the structure for streaming etc.) use appropriate types.
> >
> > And please do not align the length of field name unnaturally. e.g.
>
> Kind of obsessive-compulsive symptom, haha :) The struct size of objpool_head
> doesn't matter. The size of objpool_slot does matter, as managed in a single
> cache-line.
Yeah, so objpool_slot is good to use uint32_t. You may also need __packed and
__cacheline_aligned for objpool_slot ;)
Thank you!
>
> >
> > size_t obj_size; /* size_t or unsigned int, I don't care. */
> > int nr_objs; /* I think just 'int' is enough because the value should be
> > checked and limited under INT_MAX */
> > int max_entries;
> > unsigned int nr_cpus;
> >
> > (BTW why we need to limit the nr_cpus here? we have num_possible_cpus())
>
> You are right that nr_cpus is unnecessary. num_possible_cpus() just keeps the
> same even with new hot-plugged cpus.
>
> >
> >> + uint32_t oh_in_user:1; /* user-specified buffer */
> >> + uint32_t oh_in_slot:1; /* objs alloced with slots */
> >> + uint32_t oh_vmalloc:1; /* alloc from vmalloc zone */
> >
> > Please use "bool" or "unsigned long flags" with flag bits.
> >
> >> + gfp_t oh_gfp; /* k/vmalloc gfp flags */
> >> + uint32_t oh_sz_pool; /* user pool size in byes */
> >
> > size_t pool_size
> >
> >> + void *oh_pool; /* user managed memory pool */
> >> + struct objpool_slot **oh_slots; /* array of percpu slots */
> >> + uint32_t *oh_sz_slots; /* size in bytes of slots */
> >
> > size_t slot_size
> >
>
> Will apply these changes in next version.
>
> >> + objpool_release_cb oh_release; /* resource cleanup callback */
> >> + void *oh_context; /* caller-provided context */
> >> +};
> >
> > Thank you,
> >
>
> Best regards,
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
next prev parent reply other threads:[~2022-11-04 1:28 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-01 1:43 [PATCH v3] kprobes,lib: kretprobe scalability improvement wuqiang
2022-11-01 12:27 ` kernel test robot
2022-11-02 2:30 ` [PATCH v4] " wuqiang
2022-11-02 21:33 ` Andrew Morton
2022-11-03 16:46 ` wuqiang
2022-11-03 2:51 ` Masami Hiramatsu
2022-11-03 16:45 ` wuqiang
2022-11-04 1:28 ` Masami Hiramatsu [this message]
2022-11-06 5:34 ` [PATCH v5 0/4] lib,kprobes: " wuqiang
2022-11-06 5:34 ` [PATCH v5 1/4] lib: objpool added: ring-array based lockless MPMC queue wuqiang
2022-11-06 5:34 ` [PATCH v5 2/4] lib: objpool test module added wuqiang
2022-11-06 5:34 ` [PATCH v5 3/4] kprobes: kretprobe scalability improvement with objpool wuqiang
2022-11-06 5:34 ` [PATCH v5 4/4] kprobes: freelist.h removed wuqiang
2022-11-08 7:14 ` [PATCH v6 0/4] lib,kprobes: kretprobe scalability improvement wuqiang
2022-11-08 7:14 ` [PATCH v6 1/4] lib: objpool added: ring-array based lockless MPMC queue wuqiang
2022-11-14 15:54 ` Masami Hiramatsu
2022-11-16 10:42 ` wuqiang
2022-11-08 7:14 ` [PATCH v6 2/4] lib: objpool test module added wuqiang
2022-11-08 7:14 ` [PATCH v6 3/4] kprobes: kretprobe scalability improvement with objpool wuqiang
2022-11-14 15:56 ` Masami Hiramatsu
2022-11-14 15:56 ` Masami Hiramatsu
2022-11-16 10:49 ` wuqiang
2022-11-08 7:14 ` [PATCH v6 4/4] kprobes: freelist.h removed wuqiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20221104102810.b2c10d3334483eb0b3a2ef42@kernel.org \
--to=mhiramat@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=anil.s.keshavamurthy@intel.com \
--cc=dan.j.williams@intel.com \
--cc=davem@davemloft.net \
--cc=ebiggers@google.com \
--cc=jpoimboe@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lkp@intel.com \
--cc=mattwu@163.com \
--cc=naveen.n.rao@linux.ibm.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=sander@svanheule.net \
--cc=wuqiang.matt@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox