From: "Wangnan (F)" <wangnan0@huawei.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <acme@kernel.org>, <linux-kernel@vger.kernel.org>,
<pi3orama@163.com>, <lizefan@huawei.com>,
<netdev@vger.kernel.org>, <davem@davemloft.net>,
Adrian Hunter <adrian.hunter@intel.com>,
Arnaldo Carvalho de Melo <acme@redhat.com>,
David Ahern <dsahern@gmail.com>, Ingo Molnar <mingo@kernel.org>,
Yunlong Song <yunlong.song@huawei.com>
Subject: Re: [PATCH 27/53] perf/core: Put size of a sample at the end of it by PERF_SAMPLE_TAILSIZE
Date: Tue, 12 Jan 2016 13:33:28 +0800 [thread overview]
Message-ID: <56949028.2070208@huawei.com> (raw)
In-Reply-To: <20160111180913.GA25950@ast-mbp.thefacebook.com>
On 2016/1/12 2:09, Alexei Starovoitov wrote:
> On Mon, Jan 11, 2016 at 01:48:18PM +0000, Wang Nan wrote:
>> This patch introduces a PERF_SAMPLE_TAILSIZE flag which allows a size
>> field attached at the end of a sample. The idea comes from [1] that,
>> with tie size at tail of an event, it is possible for user program who
>> read from the ring buffer parse events backward.
>>
>> For example:
>>
>> head
>> |
>> V
>> +--+---+-------+----------+------+---+
>> |E6|...| B 8| C 11| D 7|E..|
>> +--+---+-------+----------+------+---+
>>
>> In this case, from the 'head' pointer provided by kernel, user program
>> can first see '6' by (*(head - sizeof(u64))), then it can get the start
>> pointer of record 'E', then it can read size and find start position
>> of record D, C, B in similar way.
> adding extra 8 bytes for every sample is quite unfortunate.
> How about another idea:
> . update data_tail pointer when head is about to overwrite it
>
> Ex:
> head data_tail
> | |
> V V
> +--+-------+-------+---+----+---+
> |E | ... | B | C | D | E |
> +--+-------+-------+---+----+---+
>
> if new sample F is about to overwrite B, the kernel would need
> to read the size of B from B's header and update data_tail to point C.
> Or even further.
> Comparing to TAILSIZE approach, now kernel will be doing both reads
> and writes into ring-buffer and there is a concern that reads may
> be hitting cold data, but if the records are small they may be
> actually on the same cache line brought by the previous
> read A's header, write E record cycle. So I think we shouldn't see
> cache misses.
After ring buffer rewind, we need a read before nearly
every write operations. The performance penalty depends on
configuration of write allocate. In addition, another data
dependency is required: we must wait for the size of
event B is retrived before overwrite it.
Even in the very first try at 2013 in [1], reading from the ring
buffer is avoided. I don't think Peter changes his mind now.
> Another concern is validity of records stored. If user space messes
> with ring-buffer, kernel won't be able to move data_tail properly
> and would need to indicate that to userspace somehow.
> But memory saving of 8 bytes per record could be sizable
Yes. But I have already discussed with Peter on this in [2].
Last month I suggested:
<quote>
1. If PERF_SAMPLE_SIZE is selected, we can avoid outputting the event
size in header. Which eliminate extra space cost;
</quote>
However:
<quote>
That would mandate you always parse the stream backwards. Which seems
rather unfortunate. Also, no you cannot recoup the extra space, see the
alignment and size requirement.
</quote>
> and
> user space wouldn't need to walk the whole buffer backwards and
> can just start from valid data_tail, so the dumps of overwrite
> ring-buffer will be faster too.
> Thoughts?
>
Please also refer to [3]. In that patch we introduced a userspace
ring buffer in perf and make it continously collect data from
normal ring buffers. Since we have to wake up perf to read data,
the cost is even higher.
[1]
http://lkml.kernel.org/r/20130708121557.GA17211@twins.programming.kicks-ass.net
[2]
http://lkml.kernel.org/r/20151203100801.GV3816@twins.programming.kicks-ass.net
[3]
http://lkml.kernel.org/r/1448373632-8806-1-git-send-email-yunlong.song@huawei.com
next prev parent reply other threads:[~2016-01-12 5:33 UTC|newest]
Thread overview: 90+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-11 13:47 [PATCH 00/53] perf tools: Bugfix, BPF improvement and perf record flight record mode Wang Nan
2016-01-11 13:47 ` [PATCH 01/53] perf tools: Add -lutil in python lib list for broken python-config Wang Nan
2016-01-12 9:43 ` Jiri Olsa
2016-01-11 13:47 ` [PATCH 02/53] perf tools: Fix phony build target for build-test Wang Nan
2016-01-11 13:47 ` [PATCH 03/53] perf tools: Set parallel making options build-test Wang Nan
2016-01-11 13:47 ` [PATCH 04/53] perf tools: Pass O option to Makefile.perf in build-test Wang Nan
2016-01-11 13:47 ` [PATCH 05/53] perf tools: Test correct path of perf " Wang Nan
2016-01-11 15:24 ` Arnaldo Carvalho de Melo
2016-01-11 22:06 ` Arnaldo Carvalho de Melo
2016-01-11 22:39 ` Arnaldo Carvalho de Melo
2016-01-11 22:39 ` Arnaldo Carvalho de Melo
2016-01-12 7:16 ` Wangnan (F)
2016-01-12 14:08 ` Arnaldo Carvalho de Melo
2016-01-11 13:47 ` [PATCH 06/53] perf tools: Fix PowerPC native building Wang Nan
2016-01-11 13:47 ` [PATCH 07/53] tools: Move Makefile.arch from perf/config to tools/scripts Wang Nan
2016-01-11 13:52 ` Wangnan (F)
2016-01-11 14:10 ` Arnaldo Carvalho de Melo
2016-01-11 13:47 ` [PATCH 08/53] perf tools: Add missing sources in perf's MANIFEST Wang Nan
2016-01-11 13:48 ` [PATCH 09/53] perf: bpf: Fix build breakage due to libbpf Wang Nan
2016-01-11 13:48 ` [PATCH 10/53] tools build: Add BPF feature check to test-all Wang Nan
2016-01-11 13:48 ` [PATCH 11/53] perf test: Fix false TEST_OK result for 'perf test hist' Wang Nan
2016-01-11 14:25 ` Sergei Shtylyov
2016-01-11 14:58 ` Arnaldo Carvalho de Melo
2016-01-11 15:32 ` Arnaldo Carvalho de Melo
2016-01-11 13:48 ` [PATCH 12/53] perf test: Reset err after using it hold errcode in hist testcases Wang Nan
2016-01-11 13:48 ` [PATCH 13/53] perf tools: Prevent calling machine__delete() on non-allocated machine Wang Nan
2016-01-11 15:42 ` Arnaldo Carvalho de Melo
2016-01-12 7:03 ` Wangnan (F)
2016-01-12 14:07 ` Arnaldo Carvalho de Melo
2016-01-11 13:48 ` [PATCH 14/53] perf test: Check environment before start real BPF test Wang Nan
2016-01-11 21:55 ` Arnaldo Carvalho de Melo
2016-01-12 7:40 ` Wangnan (F)
2016-01-12 14:10 ` Arnaldo Carvalho de Melo
2016-01-11 13:48 ` [PATCH 15/53] perf tools: Fix symbols searching for offline module in buildid-cache Wang Nan
2016-01-11 13:48 ` [PATCH 16/53] perf tools: Fix mmap2 event allocation in synthesize code Wang Nan
2016-01-11 13:48 ` [PATCH 17/53] perf test: Improve bp_signal Wang Nan
2016-01-11 21:37 ` Arnaldo Carvalho de Melo
2016-01-12 4:13 ` Wangnan (F)
2016-01-12 9:21 ` Jiri Olsa
2016-01-12 14:11 ` Arnaldo Carvalho de Melo
2016-01-12 14:17 ` Will Deacon
2016-01-11 13:48 ` [PATCH 18/53] perf tools: Add API to config maps in bpf object Wang Nan
2016-01-11 13:48 ` [PATCH 19/53] perf tools: Enable BPF object configure syntax Wang Nan
2016-01-11 13:48 ` [PATCH 20/53] perf record: Apply config to BPF objects before recording Wang Nan
2016-01-11 13:48 ` [PATCH 21/53] perf tools: Enable passing event to BPF object Wang Nan
2016-01-11 13:48 ` [PATCH 22/53] perf tools: Support perf event alias name Wang Nan
2016-01-11 13:48 ` [PATCH 23/53] perf tools: Support setting different slots in a BPF map separately Wang Nan
2016-01-11 13:48 ` [PATCH 24/53] perf tools: Enable indices setting syntax for BPF maps Wang Nan
2016-01-11 13:48 ` [PATCH 25/53] perf tools: Introduce bpf-output event Wang Nan
2016-01-11 13:48 ` [PATCH 26/53] perf data: Support converting data from bpf_perf_event_output() Wang Nan
2016-01-11 13:48 ` [PATCH 27/53] perf/core: Put size of a sample at the end of it by PERF_SAMPLE_TAILSIZE Wang Nan
2016-01-11 18:09 ` Alexei Starovoitov
2016-01-12 5:33 ` Wangnan (F) [this message]
2016-01-12 6:11 ` Alexei Starovoitov
2016-01-12 12:36 ` Wangnan (F)
2016-01-12 19:56 ` Alexei Starovoitov
2016-01-13 4:34 ` Wangnan (F)
2016-01-13 5:14 ` Alexei Starovoitov
2016-01-12 14:05 ` Peter Zijlstra
2016-01-12 14:14 ` Peter Zijlstra
2016-01-11 13:48 ` [PATCH 28/53] perf tools: Move timestamp creation to util Wang Nan
2016-01-11 13:48 ` [PATCH 29/53] perf tools: Make ordered_events reusable Wang Nan
2016-01-11 21:33 ` Arnaldo Carvalho de Melo
2016-01-11 13:48 ` [PATCH 30/53] perf record: Extract synthesize code to record__synthesize() Wang Nan
2016-01-11 13:48 ` [PATCH 31/53] perf tools: Add perf_data_file__switch() helper Wang Nan
2016-01-11 13:48 ` [PATCH 32/53] perf record: Turns auxtrace_snapshot_enable into 3 states Wang Nan
2016-01-11 13:48 ` [PATCH 33/53] perf record: Introduce record__finish_output() to finish a perf.data Wang Nan
2016-01-11 13:48 ` [PATCH 34/53] perf record: Use OPT_BOOLEAN_SET for buildid cache related options Wang Nan
2016-01-11 13:48 ` [PATCH 35/53] perf record: Add '--timestamp-filename' option to append timestamp to output filename Wang Nan
2016-01-11 13:48 ` [PATCH 36/53] perf record: Split output into multiple files via '--switch-output' Wang Nan
2016-01-11 13:48 ` [PATCH 37/53] perf record: Force enable --timestamp-filename when --switch-output is provided Wang Nan
2016-01-11 13:48 ` [PATCH 38/53] perf record: Disable buildid cache options by default in switch output mode Wang Nan
2016-01-11 13:48 ` [PATCH 39/53] perf record: Re-synthesize tracking events after output switching Wang Nan
2016-01-11 13:48 ` [PATCH 40/53] perf record: Generate tracking events for process forked by perf Wang Nan
2016-01-11 13:48 ` [PATCH 41/53] perf record: Ensure return non-zero rc when mmap fail Wang Nan
2016-01-11 13:48 ` [PATCH 42/53] perf record: Prevent reading invalid data in record__mmap_read Wang Nan
2016-01-11 14:21 ` Sergei Shtylyov
2016-01-11 15:00 ` Arnaldo Carvalho de Melo
2016-01-11 15:01 ` Arnaldo Carvalho de Melo
2016-01-11 13:48 ` [PATCH 43/53] perf tools: Add evlist channel helpers Wang Nan
2016-01-11 13:48 ` [PATCH 44/53] perf tools: Automatically add new channel according to evlist Wang Nan
2016-01-11 13:48 ` [PATCH 45/53] perf tools: Operate multiple channels Wang Nan
2016-01-11 13:48 ` [PATCH 46/53] perf tools: Squash overwrite setting into channel Wang Nan
2016-01-11 13:48 ` [PATCH 47/53] perf record: Don't read from and poll overwrite channel Wang Nan
2016-01-11 13:48 ` [PATCH 48/53] perf tools: Enable overwrite settings Wang Nan
2016-01-11 13:48 ` [PATCH 49/53] perf tools: Consider TAILSIZE bit when caclulate is_pos Wang Nan
2016-01-11 13:48 ` [PATCH 50/53] perf tools: Set tailsize attribut bit for overwrite events Wang Nan
2016-01-11 13:48 ` [PATCH 51/53] perf record: Read from tailsize ring buffer Wang Nan
2016-01-11 13:48 ` [PATCH 52/53] perf record: Toggle tailsize ring buffer for reading Wang Nan
2016-01-11 13:48 ` [PATCH 53/53] perf record: Allow generate tracking events at the end of output Wang Nan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56949028.2070208@huawei.com \
--to=wangnan0@huawei.com \
--cc=a.p.zijlstra@chello.nl \
--cc=acme@kernel.org \
--cc=acme@redhat.com \
--cc=adrian.hunter@intel.com \
--cc=alexei.starovoitov@gmail.com \
--cc=davem@davemloft.net \
--cc=dsahern@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lizefan@huawei.com \
--cc=mingo@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pi3orama@163.com \
--cc=yunlong.song@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).