From: Daniel Borkmann <daniel@iogearbox.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: davem@davemloft.net, alexei.starovoitov@gmail.com, tgraf@suug.ch,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next 1/3] perf, events: add non-linear data support for raw records
Date: Wed, 13 Jul 2016 15:16:37 +0200 [thread overview]
Message-ID: <57863F35.2060501@iogearbox.net> (raw)
In-Reply-To: <20160713121036.GS30154@twins.programming.kicks-ass.net>
On 07/13/2016 02:10 PM, Peter Zijlstra wrote:
> On Wed, Jul 13, 2016 at 11:24:13AM +0200, Daniel Borkmann wrote:
>> On 07/13/2016 09:52 AM, Peter Zijlstra wrote:
>>> On Wed, Jul 13, 2016 at 12:36:17AM +0200, Daniel Borkmann wrote:
>>>> This patch adds support for non-linear data on raw records. It means
>>>> that for such data, the newly introduced __output_custom() helper will
>>>> be used instead of __output_copy(). __output_custom() will invoke
>>>> whatever custom callback is passed in via struct perf_raw_record_frag
>>>> to extract the data into the ring buffer slot.
>>>>
>>>> To keep changes in perf_prepare_sample() and in perf_output_sample()
>>>> minimal, size/size_head split was added to perf_raw_record that call
>>>> sites fill out, so that two extra tests in fast-path can be avoided.
>>>>
>>>> The few users of raw records are adapted to initialize their size_head
>>>> and frag data; no change in behavior for them. Later patch will extend
>>>> BPF side with a first user and callback for this facility, future users
>>>> could be things like XDP BPF programs (that work on different context
>>>> though and would thus have a different callback), etc.
>>>
>>> Why? What problem are we solving?
>>
>> I've tried to summarize it in patch 3/3,
>
> Which is pretty useless if you're staring at this patch.
>
>> This currently has 3 issues we'd like to resolve:
>
>> i) We need two copies instead of just a single one for the skb data.
>> The data can be non-linear, see also skb_copy_bits() as an example for
>> walking/extracting it,
>
> I'm not familiar enough with the network gunk to be able to read that.
> But upto skb_walk_frags() it looks entirely linear to me.
Hm, fair enough, there are three parts, skb can have a linear part
which is taken via skb->data, either in its entirety or there can be a
non-linear part appended to that which can consist of pages that are in
shared info section (skb_shinfo(skb) -> frags[], nr_frags members), that
will be linearized, and in addition to that, appended after the frags[]
data there can be further skbs to the 'root' skb that contain fragmented
data, which is all what skb_copy_bits() copies linearized into 'to' buffer.
So depending on the origin of the skb, its structure can be quite different
and skb_copy_bits() covers all the cases generically. Maybe [1] summarizes
it better if you want to familiarize yourself with how skbs work,
although some parts are not up to date anymore.
[1] http://vger.kernel.org/~davem/skb_data.html
>> ii) for static verification reasons, the bpf_skb_load_bytes() helper
>> needs to see a constant size on the passed buffer to make sure BPF
>> verifier can do its sanity checks on it during verification time, so
>> just passing in skb->len (or any other non-constant value) wouldn't
>> work, but changing bpf_skb_load_bytes() is also not the real solution
>> since we still have two copies we'd like to avoid as well, and
>
>> iii) bpf_skb_load_bytes() is just for rather smaller buffers (e.g.
>> headers) since they need to sit on the limited eBPF stack anyway. The
>> set would improve the BPF helper to address all 3 at once.
>
> Humm, maybe. Lemme go try and reverse engineer that patch, because I'm
> not at all sure wth it's supposed to do, nor am I entirely sure this
> clarified things :/
next prev parent reply other threads:[~2016-07-13 13:17 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-07-12 22:36 [PATCH net-next 0/3] BPF event output helper improvements Daniel Borkmann
2016-07-12 22:36 ` [PATCH net-next 1/3] perf, events: add non-linear data support for raw records Daniel Borkmann
2016-07-13 7:52 ` Peter Zijlstra
2016-07-13 9:24 ` Daniel Borkmann
2016-07-13 12:10 ` Peter Zijlstra
2016-07-13 13:16 ` Daniel Borkmann [this message]
2016-07-13 13:42 ` Peter Zijlstra
2016-07-13 14:08 ` Daniel Borkmann
2016-07-13 16:40 ` Peter Zijlstra
2016-07-13 16:53 ` Daniel Borkmann
2016-07-12 22:36 ` [PATCH net-next 2/3] bpf, perf: split bpf_perf_event_output Daniel Borkmann
2016-07-12 22:36 ` [PATCH net-next 3/3] bpf: avoid stack copy and use skb ctx for event output Daniel Borkmann
2016-07-12 23:25 ` kbuild test robot
2016-07-12 23:45 ` Daniel Borkmann
2016-07-13 0:01 ` Fengguang Wu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=57863F35.2060501@iogearbox.net \
--to=daniel@iogearbox.net \
--cc=alexei.starovoitov@gmail.com \
--cc=davem@davemloft.net \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=tgraf@suug.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.