From: "dust.li" <dust.li@linux.alibaba.com>
To: David Miller <davem@davemloft.net>
Cc: kuba@kernel.org, edumazet@google.com, satoru.moriya@hds.com,
netdev@vger.kernel.org
Subject: Re: [PATCH v2] net: tracepoint: fix print wrong sysctl_mem value
Date: Thu, 10 Sep 2020 22:34:49 +0800
Message-ID: <20200910143449.GA26740@linux.alibaba.com>
In-Reply-To: <20200908.194952.784011770473577866.davem@davemloft.net>
On Tue, Sep 08, 2020 at 07:49:52PM -0700, David Miller wrote:
>From: Dust Li <dust.li@linux.alibaba.com>
>Date: Tue, 8 Sep 2020 10:09:39 +0800
>
>> @@ -98,7 +98,7 @@ TRACE_EVENT(sock_exceed_buf_limit,
>>
>> TP_STRUCT__entry(
>> __array(char, name, 32)
>> - __field(long *, sysctl_mem)
>> + __array(long, sysctl_mem, 3)
>> __field(long, allocated)
>> __field(int, sysctl_rmem)
>> __field(int, rmem_alloc)
>> @@ -110,7 +110,9 @@ TRACE_EVENT(sock_exceed_buf_limit,
>>
>> TP_fast_assign(
>> strncpy(__entry->name, prot->name, 32);
>> - __entry->sysctl_mem = prot->sysctl_mem;
>> + __entry->sysctl_mem[0] = prot->sysctl_mem[0];
>> + __entry->sysctl_mem[1] = prot->sysctl_mem[1];
>> + __entry->sysctl_mem[2] = prot->sysctl_mem[2];
>
>I can't understand at all why the current code doesn't work.
>
>We assign a pointer to entry->sysctl_mem and then print out the
>three words pointed to by that.
>
>It's so wasteful to copy this over every tracepoint entry so
>the pointer approach is very desirable.
Thanks for your reply!
I took a closer look at the code generated for the tracepoint and
found that the problem is not the tracepoint itself, but `perf trace`.
My previous output was produced by running:
`perf trace -e sock:sock_exceed_buf_limit`
This time I read directly from /sys/kernel/debug/tracing/trace,
and everything is right :)
So I checked the perf tool code and found that the fundamental difference
is that `perf trace` does the string formatting in userspace, while raw
ftrace does it in the kernel.
When using `perf trace`, the kernel passes the format string and the
data to perf through the perf ring buffer, and nothing in the kernel
dereferences the sysctl_mem pointer. Userspace perf therefore receives
the raw pointer value and tries to do the formatting with it,
which results in the wrong data shown in the commit log.
The key call trace when using `perf trace` is this:
trace_sock_exceed_buf_limit()
  --> perf_trace_sock_exceed_buf_limit()
      {
          ...
          perf_fetch_caller_regs(__regs);
          {
              strncpy(entry->name, prot->name, 32);
              entry->sysctl_mem = prot->sysctl_mem;
              entry->allocated = allocated;
              entry->sysctl_rmem = sk_get_rmem0(sk, prot);
              entry->rmem_alloc = atomic_read(&sk->sk_backlog.rmem_alloc);
              entry->sysctl_wmem = sk_get_wmem0(sk, prot);
              entry->wmem_alloc = refcount_read(&sk->sk_wmem_alloc);
              entry->wmem_queued = sk->sk_wmem_queued;
              entry->kind = kind;
          }
          perf_trace_run_bpf_submit(entry, __entry_size, rctx, event_call, \
                                    __count, __regs, head, __task);
      }
Here *entry* is passed directly to perf_trace_run_bpf_submit()
as raw data. perf_trace_run_bpf_submit() does no string formatting;
it just passes the data to userspace perf, which finally does the
formatting, but by then it is too late to read sysctl_mem[x].
So any pointer dereference in a tracepoint entry should fail under
`perf trace`.
Thanks.
Dust.Li