From: Jesper Dangaard Brouer <brouer@redhat.com>
To: David Ahern <dsahern@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>,
Florian Fainelli <f.fainelli@gmail.com>,
Nikolay Aleksandrov <nikolay@cumulusnetworks.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
Roopa Prabhu <roopa@cumulusnetworks.com>,
bridge@lists.linux-foundation.org,
Arnaldo Carvalho de Melo <acme@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
brouer@redhat.com, "davem@davemloft.net" <davem@davemloft.net>
Subject: Re: [Bridge] [PATCH net-next] bridge: add tracepoint in br_fdb_update
Date: Thu, 31 Aug 2017 16:20:20 -0000
Message-ID: <20170831182012.5d321c6a@redhat.com>
In-Reply-To: <a9349049-bfd7-b6c0-d1c7-2f70b0b0ab11@gmail.com>
On Thu, 31 Aug 2017 09:30:05 -0600
David Ahern <dsahern@gmail.com> wrote:
> On 8/31/17 9:21 AM, Roopa Prabhu wrote:
> > On Thu, Aug 31, 2017 at 5:38 AM, Jesper Dangaard Brouer
> > <brouer@redhat.com> wrote:
> >> On Wed, 30 Aug 2017 22:18:13 -0700
> >> Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
> >>
> >>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
> >>>
> >>> This extends bridge fdb table tracepoints to also cover
> >>> learned fdb entries in the br_fdb_update path. Note that
> >>> unlike other tracepoints I have moved this to when the fdb
> >>> is modified because this is in the datapath and can generate
> >>> a lot of noise in the trace output. br_fdb_update is also called
> >>> from added_by_user context in the NTF_USE case which is already
> >>> traced ..hence the !added_by_user check.
> >>>
> >>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> >>> ---
> >>> include/trace/events/bridge.h | 31 +++++++++++++++++++++++++++++++
> >>> net/bridge/br_fdb.c | 5 ++++-
> >>> net/core/net-traces.c | 1 +
> >>> 3 files changed, 36 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/include/trace/events/bridge.h b/include/trace/events/bridge.h
> >>> index 0f1cde0..1bee3e7 100644
> >>> --- a/include/trace/events/bridge.h
> >>> +++ b/include/trace/events/bridge.h
> >>> @@ -92,6 +92,37 @@ TRACE_EVENT(fdb_delete,
> >>> __entry->addr[4], __entry->addr[5], __entry->vid)
> >>> );
> >>>
> >>> +TRACE_EVENT(br_fdb_update,
> >>> +
> >>> + TP_PROTO(struct net_bridge *br, struct net_bridge_port *source,
> >>> + const unsigned char *addr, u16 vid, bool added_by_user),
> >>> +
> >>> + TP_ARGS(br, source, addr, vid, added_by_user),
> >>> +
> >>> + TP_STRUCT__entry(
> >>> + __string(br_dev, br->dev->name)
> >>> + __string(dev, source->dev->name)
> >>
> >> I have found that using the device string name is
> >>
> >> (1) slow as it involves strcpy+strlen
> >>
> >> See [1]+[2], where a single dev-name cost me 16 ns, and the base
> >> overhead of a bpf attached tracepoint is 25 ns (see [3]).
> >>
> >> [1] https://git.kernel.org/davem/net-next/c/e7d12ce121a
> >> [2] https://git.kernel.org/davem/net-next/c/315ec3990ef
> >> [3] https://git.kernel.org/davem/net-next/c/25d4dae1a64
> >>
> >> (2) strings are also harder to work-with/extract when attaching a bpf_prog
> >>
> >> See the trouble I have accessing a dev string in napi:napi_poll here:
> >> https://github.com/netoptimizer/prototype-kernel/blob/103b955a080/kernel/samples/bpf/napi_monitor_kern.c#L52-L58
> >>
> >> Using ifindexes in userspace is fairly easy; see man if_indextoname(3).
> >>
> >
> > Jesper, thanks for the data! GTK. Looking at include/trace/events,
> > currently almost all tracepoints use dev->name.
True, but with my recent experience and benchmarking, I consider this
generally a bad choice we have made for all these tracepoints. In your
case with 2 strings, 2x16=32ns, you have basically introduced an overhead
that is larger than the invocation cost (the 25 ns base overhead).
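To make the comparison concrete, here is a minimal userspace sketch of the per-event work in the two layouts. It is not the kernel's TRACE_EVENT code: the struct names, the helper names and DEV_NAME_MAX are hypothetical, chosen only to model "two name strings" versus "two ifindex integers".

```c
#include <stddef.h>
#include <string.h>

#define DEV_NAME_MAX 16 /* assumption: mirrors the usual 16-byte device name limit */

/* Hypothetical record with two device names, as in the proposed tracepoint. */
struct rec_by_name {
    char br_dev[DEV_NAME_MAX];
    char dev[DEV_NAME_MAX];
};

/* Hypothetical record that stores the two ifindexes instead. */
struct rec_by_ifindex {
    int br_ifindex;
    int ifindex;
};

/* Copy one name: a length scan plus a copy, i.e. the strlen+strcpy pattern. */
static size_t copy_name(char dst[DEV_NAME_MAX], const char *src)
{
    size_t n = strlen(src);

    if (n > DEV_NAME_MAX - 1)
        n = DEV_NAME_MAX - 1; /* truncate over-long names */
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n + 1; /* bytes written, including the terminator */
}

/* Fill the string-based record; returns the number of bytes copied. */
size_t fill_by_name(struct rec_by_name *r, const char *br, const char *dev)
{
    return copy_name(r->br_dev, br) + copy_name(r->dev, dev);
}

/* Fill the ifindex-based record: two plain integer stores. */
size_t fill_by_ifindex(struct rec_by_ifindex *r, int br_ifindex, int ifindex)
{
    r->br_ifindex = br_ifindex;
    r->ifindex = ifindex;
    return sizeof(*r);
}
```

The string variant has to walk each name to find its length before copying it, while the ifindex variant does two fixed-size stores. The sketch only shows the shape of the work; it says nothing about the actual nanosecond cost inside the kernel.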
> > These bridge tracepoints in context are primarily for debugging fdb
> > updates only, not for every packet and hence not in the performance
> > path.
> > In large scale deployments with thousands of bridge ports and fdb
> > entries, dev->name will definitely make it easier to trouble-shoot.
> > So, I'd like to leave these with dev->name unless there are strong objections.
>
> +1 for user friendliness for debugging tracepoints. The device name is
> also more user friendly when adding filters to the data collection.
>
> Being able to add bpf everywhere certainly changes the game a bit, but
> we should not relinquish ease of use and understanding for the potential
> that someone might want to put a bpf program on the tracepoint and want
> to maintain high performance.
(Cc. Acme and Peterz)
I wonder if we can create a special perf-tracepoint type for ifindexes,
so that the tool reading the events (e.g. perf-script) can perform the
name lookup in userspace (calling if_indextoname(3))?
I don't know the perf tools well enough to know whether this is possible.
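For reference, the userspace side of such a lookup could look roughly like the sketch below. if_indextoname(3) and IF_NAMESIZE come from <net/if.h>; the helper name ifindex_to_name and the "ifindex-N" fallback format are assumptions made for illustration, not anything perf does today.

```c
#include <net/if.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: turn an ifindex recorded in a trace event into a
 * printable name. If the index cannot be resolved (for example because the
 * device is gone by the time the trace is read), print the raw index.
 */
const char *ifindex_to_name(unsigned int ifindex, char *buf, size_t len)
{
    char name[IF_NAMESIZE];

    if (if_indextoname(ifindex, name) != NULL)
        snprintf(buf, len, "%s", name);
    else
        snprintf(buf, len, "ifindex-%u", ifindex);
    return buf;
}
```

A reader tool would call this once per event, or cache the result per ifindex. Note that the lookup happens when the trace is read, so a device that was renamed or removed after the event was recorded will resolve differently, or not at all.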
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
Thread overview: 20+ messages
2017-08-31 5:18 [Bridge] [PATCH net-next] bridge: add tracepoint in br_fdb_update Roopa Prabhu
2017-08-31 5:18 ` Roopa Prabhu
2017-08-31 12:38 ` Jesper Dangaard Brouer
2017-08-31 12:38 ` [Bridge] " Jesper Dangaard Brouer
2017-08-31 15:21 ` Roopa Prabhu
2017-08-31 15:21 ` Roopa Prabhu
2017-08-31 15:30 ` [Bridge] " David Ahern
2017-08-31 15:30 ` David Ahern
2017-08-31 16:20 ` Jesper Dangaard Brouer [this message]
2017-08-31 16:20 ` [Bridge] " Jesper Dangaard Brouer
2017-08-31 18:56 ` Arnaldo Carvalho de Melo
2017-08-31 18:57 ` [Bridge] " Arnaldo Carvalho de Melo
2017-08-31 14:19 ` Nikolay Aleksandrov
2017-08-31 14:19 ` Nikolay Aleksandrov
2017-08-31 18:43 ` [Bridge] " David Miller
2017-08-31 18:43 ` David Miller
2017-08-31 21:50 ` Jesper Dangaard Brouer
2017-08-31 21:50 ` [Bridge] " Jesper Dangaard Brouer
2017-08-31 22:43 ` Stephen Hemminger
2017-08-31 22:43 ` Stephen Hemminger