From: "dust.li" <dust.li@linux.alibaba.com>
To: David Miller <davem@davemloft.net>
Cc: kuba@kernel.org, edumazet@google.com, satoru.moriya@hds.com,
netdev@vger.kernel.org
Subject: Re: [PATCH v2] net: tracepoint: fix print wrong sysctl_mem value
Date: Thu, 10 Sep 2020 22:34:49 +0800 [thread overview]
Message-ID: <20200910143449.GA26740@linux.alibaba.com> (raw)
In-Reply-To: <20200908.194952.784011770473577866.davem@davemloft.net>
On Tue, Sep 08, 2020 at 07:49:52PM -0700, David Miller wrote:
>From: Dust Li <dust.li@linux.alibaba.com>
>Date: Tue, 8 Sep 2020 10:09:39 +0800
>
>> @@ -98,7 +98,7 @@ TRACE_EVENT(sock_exceed_buf_limit,
>>
>> TP_STRUCT__entry(
>> __array(char, name, 32)
>> - __field(long *, sysctl_mem)
>> + __array(long, sysctl_mem, 3)
>> __field(long, allocated)
>> __field(int, sysctl_rmem)
>> __field(int, rmem_alloc)
>> @@ -110,7 +110,9 @@ TRACE_EVENT(sock_exceed_buf_limit,
>>
>> TP_fast_assign(
>> strncpy(__entry->name, prot->name, 32);
>> - __entry->sysctl_mem = prot->sysctl_mem;
>> + __entry->sysctl_mem[0] = prot->sysctl_mem[0];
>> + __entry->sysctl_mem[1] = prot->sysctl_mem[1];
>> + __entry->sysctl_mem[2] = prot->sysctl_mem[2];
>
>I can't understand at all why the current code doesn't work.
>
>We assign a pointer to entry->sysctl_mem and then print out the
>three words pointed to by that.
>
>It's so wasteful to copy this over every tracepoint entry so
>the pointer approach is very desirable.
Thanks for your reply!
I took a close look at the code generated by tracepoint and
found the problem is not the tracepoint itself, but `perf trace`.
My previous output was got by running:
`perf trace -e sock:sock_exceed_buf_limit`
This time, I tried directly read from /sys/kernel/debug/tracing/trace,
and everything is right :)
So I checked the code of perf tool, and found the foundamatal difference
is `perf trace` did the string formatting in the userspace, but raw ftrace
did it in the kernel.
When using `perf trace`, the kernel passes the string format and the
data to perf using the perf ringbuffer, and no one in the kernel will
try to visit the pointer sysctl_mem is pointed to, so the the userspace
perf got the original pointer of sysctl_mem and tries to do the formating,
which result in the wrong data in the commit log.
The key call trace when using `perf trace` is this:
trace_sock_exceed_buf_limit()
--> perf_trace_sock_exceed_buf_limit()
{
...
perf_fetch_caller_regs(__regs);
{
strncpy(entry->name, prot->name, 32);
entry->sysctl_mem = prot->sysctl_mem;
entry->allocated = allocated;
entry->sysctl_rmem = sk_get_rmem0(sk, prot);
entry->rmem_alloc = atomic_read(&sk->sk_backlog.rmem_alloc);
entry->sysctl_wmem = sk_get_wmem0(sk, prot);
entry->wmem_alloc = refcount_read(&sk->sk_wmem_alloc);
entry->wmem_queued = sk->sk_wmem_queued;
entry->kind = kind;
}
perf_trace_run_bpf_submit(entry, __entry_size, rctx, event_call, \
__count, __regs, head, __task);
}
Here *entry* is directly passed in to perf_trace_run_bpf_submit()
as raw data, and perf_trace_run_bpf_submit() won't do string formatting
but just pass them to the userspace perf, which will finnally did the
formatting, but it's already too late to get sysctl_mem[x].
So, any pointer dereference in tracepoint entry should failed in
`perf trace`.
Thanks.
Dust.Li
prev parent reply other threads:[~2020-09-10 21:03 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-08 2:09 [PATCH v2] net: tracepoint: fix print wrong sysctl_mem value Dust Li
2020-09-09 2:49 ` David Miller
2020-09-10 14:34 ` dust.li [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200910143449.GA26740@linux.alibaba.com \
--to=dust.li@linux.alibaba.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=satoru.moriya@hds.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.