public inbox for linux-kernel@vger.kernel.org
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: rostedt <rostedt@goodmis.org>
Cc: Joerg Roedel <jroedel@suse.de>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Borislav Petkov <bp@alien8.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Shile Zhang <shile.zhang@linux.alibaba.com>,
	Andy Lutomirski <luto@amacapital.net>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Subject: Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke()
Date: Fri, 1 May 2020 09:22:35 -0400 (EDT)	[thread overview]
Message-ID: <2063204938.79085.1588339355917.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20200501002018.76f1e4b6@gandalf.local.home>

----- On May 1, 2020, at 12:20 AM, rostedt rostedt@goodmis.org wrote:

> On Thu, 30 Apr 2020 22:26:55 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> The tracers just have to make sure they perform their vmalloc'd memory
>> allocations before registering the tracepoints that can touch them;
>> otherwise, they need to issue vmalloc_sync_mappings() themselves before
>> making the newly allocated memory observable to instrumentation.
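A minimal sketch of that ordering (kernel-style pseudocode, not a buildable module; my_buf, my_probe, my_tracer_init, and BUF_SIZE are illustrative names, and the sched_switch tracepoint merely stands in for whichever tracepoint the probe hooks):

```c
static void *my_buf;

static int my_tracer_init(void)
{
	my_buf = vmalloc(BUF_SIZE);	/* allocate first ... */
	if (!my_buf)
		return -ENOMEM;

	/*
	 * Make sure every task's page tables can reach the new
	 * vmalloc area before instrumentation may touch it:
	 */
	vmalloc_sync_mappings();

	/* ... and only then make it reachable from a probe
	 * (my_probe, defined elsewhere, writes into my_buf): */
	return register_trace_sched_switch(my_probe, my_buf);
}
```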
> 
> What gets me is that I added the patch below (which adds a
> vmalloc_sync_mappings() just after the alloc_percpu()), but I also recorded
> all instances of vmalloc() with a stackdump, and I get this:
> 
>          colord-1673  [002] ....    84.764804: __vmalloc_node_range+0x5/0x2c0: vmalloc called here
>          colord-1673  [002] ....    84.764807: <stack trace>
> => __ftrace_trace_stack+0x161/0x1a0
> => __vmalloc_node_range+0x4d/0x2c0
> => module_alloc+0x7e/0xd0
> => bpf_jit_binary_alloc+0x70/0x110
> => bpf_int_jit_compile+0x139/0x40a
> => bpf_prog_select_runtime+0xa3/0x120
> => bpf_prepare_filter+0x533/0x5a0
> => sk_attach_filter+0x13/0x50
> => sock_setsockopt+0xd2f/0xf90
> => __sys_setsockopt+0x18a/0x1a0
> => __x64_sys_setsockopt+0x20/0x30
> => do_syscall_64+0x60/0x230
> => entry_SYSCALL_64_after_hwframe+0x49/0xb3
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
> 
> 
> [ the above is from before the tracing started ]
> 
>       trace-cmd-1687  [002] ....   103.908850: __vmalloc_node_range+0x5/0x2c0: vmalloc called here
>       trace-cmd-1687  [002] ....   103.908856: <stack trace>
> => __ftrace_trace_stack+0x161/0x1a0
> => __vmalloc_node_range+0x4d/0x2c0
> => vzalloc+0x48/0x50
> => trace_pid_write+0x23d/0x2b0
> => pid_write.isra.62+0xd1/0x2f0
> => vfs_write+0xa8/0x1b0
> => ksys_write+0x67/0xe0
> => do_syscall_64+0x60/0x230
> => entry_SYSCALL_64_after_hwframe+0x49/0xb3
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
>       trace-cmd-1697  [003] ....   104.088950: __vmalloc_node_range+0x5/0x2c0: vmalloc called here
>       trace-cmd-1697  [003] ....   104.088954: <stack trace>
> => __ftrace_trace_stack+0x161/0x1a0
> => __vmalloc_node_range+0x4d/0x2c0
> => vzalloc+0x48/0x50
> => trace_pid_write+0x23d/0x2b0
> => pid_write.isra.62+0xd1/0x2f0
> => vfs_write+0xa8/0x1b0
> => ksys_write+0x67/0xe0
> => do_syscall_64+0x60/0x230
> => entry_SYSCALL_64_after_hwframe+0x49/0xb3
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
>       trace-cmd-1697  [003] ....   104.089666: __vmalloc_node_range+0x5/0x2c0: vmalloc called here
>       trace-cmd-1697  [003] ....   104.089669: <stack trace>
> => __ftrace_trace_stack+0x161/0x1a0
> => __vmalloc_node_range+0x4d/0x2c0
> => vzalloc+0x48/0x50
> => trace_pid_write+0xc1/0x2b0
> => pid_write.isra.62+0xd1/0x2f0
> => vfs_write+0xa8/0x1b0
> => ksys_write+0x67/0xe0
> => do_syscall_64+0x60/0x230
> => entry_SYSCALL_64_after_hwframe+0x49/0xb3
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
>       trace-cmd-1697  [003] ....   104.098920: __vmalloc_node_range+0x5/0x2c0: vmalloc called here
>       trace-cmd-1697  [003] ....   104.098924: <stack trace>
> => __ftrace_trace_stack+0x161/0x1a0
> => __vmalloc_node_range+0x4d/0x2c0
> => vzalloc+0x48/0x50
> => trace_pid_write+0xc1/0x2b0
> => pid_write.isra.62+0xd1/0x2f0
> => vfs_write+0xa8/0x1b0
> => ksys_write+0x67/0xe0
> => do_syscall_64+0x60/0x230
> => entry_SYSCALL_64_after_hwframe+0x49/0xb3
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
>       trace-cmd-1697  [003] ....   104.114518: __vmalloc_node_range+0x5/0x2c0: vmalloc called here
>       trace-cmd-1697  [003] ....   104.114520: <stack trace>
> => __ftrace_trace_stack+0x161/0x1a0
> => __vmalloc_node_range+0x4d/0x2c0
> => vzalloc+0x48/0x50
> => trace_pid_write+0xc1/0x2b0
> => pid_write.isra.62+0xd1/0x2f0
> => vfs_write+0xa8/0x1b0
> => ksys_write+0x67/0xe0
> => do_syscall_64+0x60/0x230
> => entry_SYSCALL_64_after_hwframe+0x49/0xb3
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
>       trace-cmd-1697  [003] ....   104.130705: __vmalloc_node_range+0x5/0x2c0: vmalloc called here
>       trace-cmd-1697  [003] ....   104.130707: <stack trace>
> => __ftrace_trace_stack+0x161/0x1a0
> => __vmalloc_node_range+0x4d/0x2c0
> => vzalloc+0x48/0x50
> => trace_pid_write+0x23d/0x2b0
> => event_pid_write.isra.30+0x21b/0x3b0
> => vfs_write+0xa8/0x1b0
> => ksys_write+0x67/0xe0
> => do_syscall_64+0x60/0x230
> => entry_SYSCALL_64_after_hwframe+0x49/0xb3
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
>       trace-cmd-1687  [001] ....   106.000510: __vmalloc_node_range+0x5/0x2c0: vmalloc called here
>       trace-cmd-1687  [001] ....   106.000514: <stack trace>
> => __ftrace_trace_stack+0x161/0x1a0
> => __vmalloc_node_range+0x4d/0x2c0
> => vzalloc+0x48/0x50
> => trace_pid_write+0x23d/0x2b0
> => pid_write.isra.62+0xd1/0x2f0
> => vfs_write+0xa8/0x1b0
> => ksys_write+0x67/0xe0
> => do_syscall_64+0x60/0x230
> => entry_SYSCALL_64_after_hwframe+0x49/0xb3
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
> => 0
> 
> The above are the calls from adding PIDs to set_event_pid. (I see I should
> probably make that code a bit more efficient; it calls the vmalloc code a
> bit too much.)
> 
> But what is missing is the call to vmalloc() from alloc_percpu(). In fact, I
> put printks in the vmalloc() that alloc_percpu() uses, and it doesn't
> trigger from the tracing code, though it does show up in my trace from other
> areas of the kernel:
> 
>     kworker/1:3-204   [001] ....    42.888340: __vmalloc_node_range+0x5/0x2c0: vmalloc called here
>     kworker/1:3-204   [001] ....    42.888342: <stack trace>
> => __ftrace_trace_stack+0x161/0x1a0
> => __vmalloc_node_range+0x4d/0x2c0
> => __vmalloc+0x30/0x40
> => pcpu_create_chunk+0x77/0x220
> => pcpu_balance_workfn+0x407/0x650
> => process_one_work+0x25e/0x5c0
> => worker_thread+0x30/0x380
> => kthread+0x139/0x160
> => ret_from_fork+0x3a/0x50
> 
> So I'm still not 100% sure why the percpu data is causing a problem?

I suspect that this is simply because alloc_percpu() calls __vmalloc()
to allocate a "chunk" before you even started tracing, possibly early at
boot. Your own alloc_percpu() allocation then happens to fit within an
already-vmalloc'd chunk that still has free space.

Thanks,

Mathieu

> 
> -- Steve
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 8d2b98812625..10e4970a150c 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -8486,6 +8486,7 @@ allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size
> 		return -ENOMEM;
> 
> 	buf->data = alloc_percpu(struct trace_array_cpu);
> +	vmalloc_sync_mappings();
> 	if (!buf->data) {
> 		ring_buffer_free(buf->buffer);
> 		buf->buffer = NULL;
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 9a8227afa073..489cf0620edc 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2543,6 +2543,8 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> 	void *addr;
> 	unsigned long real_size = size;
> 
> +	trace_printk("vmalloc called here\n");
> +	trace_dump_stack(0);
> 	size = PAGE_ALIGN(size);
> 	if (!size || (size >> PAGE_SHIFT) > totalram_pages())
>  		goto fail;

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Thread overview: 46+ messages
2020-04-29  9:48 [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke() Steven Rostedt
2020-04-29 10:59 ` Joerg Roedel
2020-04-29 12:28   ` Steven Rostedt
2020-04-29 14:07     ` Steven Rostedt
2020-04-29 14:10       ` Joerg Roedel
2020-04-29 14:32         ` Steven Rostedt
2020-04-29 15:44           ` Peter Zijlstra
2020-04-29 16:17       ` Joerg Roedel
2020-04-29 16:20         ` Joerg Roedel
2020-04-29 16:52           ` Steven Rostedt
2020-04-29 17:29             ` Mathieu Desnoyers
2020-04-29 18:51               ` Peter Zijlstra
2020-04-30 14:11       ` Joerg Roedel
2020-04-30 14:50         ` Joerg Roedel
2020-04-30 15:20           ` Mathieu Desnoyers
2020-04-30 16:16             ` Steven Rostedt
2020-04-30 16:18               ` Mathieu Desnoyers
2020-04-30 16:30                 ` Steven Rostedt
2020-04-30 16:35                   ` Mathieu Desnoyers
2020-04-30 15:23         ` Mathieu Desnoyers
2020-04-30 16:12           ` Steven Rostedt
2020-04-30 16:11         ` Steven Rostedt
2020-04-30 16:16           ` Mathieu Desnoyers
2020-04-30 16:25             ` Steven Rostedt
2020-04-30 19:14           ` Joerg Roedel
2020-05-01  1:13             ` Steven Rostedt
2020-05-01  2:26               ` Mathieu Desnoyers
2020-05-01  2:39                 ` Steven Rostedt
2020-05-01 10:16                   ` Joerg Roedel
2020-05-01 13:35                   ` Mathieu Desnoyers
2020-05-04 15:12                   ` [PATCH] percpu: Sync vmalloc mappings in pcpu_alloc() and free_percpu() Joerg Roedel
2020-05-04 15:28                     ` Mathieu Desnoyers
2020-05-04 15:31                       ` Joerg Roedel
2020-05-04 15:38                         ` Mathieu Desnoyers
2020-05-04 15:51                           ` Joerg Roedel
2020-05-04 17:04                           ` Steven Rostedt
2020-05-04 17:40                     ` Steven Rostedt
2020-05-04 18:38                       ` Joerg Roedel
2020-05-04 19:10                         ` Steven Rostedt
2020-05-05 12:31                           ` [PATCH] tracing: Call vmalloc_sync_mappings() after alloc_percpu() Joerg Roedel
2020-05-06 15:17                             ` Steven Rostedt
2020-05-08 14:42                               ` Joerg Roedel
2020-05-04 20:25                     ` [PATCH] percpu: Sync vmalloc mappings in pcpu_alloc() and free_percpu() Peter Zijlstra
2020-05-04 20:43                       ` Steven Rostedt
2020-05-01  4:20                 ` [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke() Steven Rostedt
2020-05-01 13:22                   ` Mathieu Desnoyers [this message]
