Netdev List

* Re: [PATCH bpf-next 0/2] IPv6 sk-lookup fixes
From: Daniel Borkmann @ 2018-10-15 20:39 UTC
  To: Joe Stringer, ast; +Cc: netdev
In-Reply-To: <20181015172746.6475-1-joe@wand.net.nz>

On 10/15/2018 07:27 PM, Joe Stringer wrote:
> This series includes a couple of fixups for the IPv6 socket lookup
> helper, to make the API more consistent (always supply all arguments in
> network byte-order) and to allow its use when IPv6 is compiled as a
> module.
> 
> Joe Stringer (2):
>   bpf: Allow sk_lookup with IPv6 module
>   bpf: Fix IPv6 dport byte-order in bpf_sk_lookup
> 
>  include/net/addrconf.h |  5 +++++
>  net/core/filter.c      | 15 +++++++++------
>  net/ipv6/af_inet6.c    |  1 +
>  3 files changed, 15 insertions(+), 6 deletions(-)
> 

LGTM, thanks for following up on this. Series:

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
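
The byte-order point in the cover letter can be illustrated with a minimal
userspace sketch. It assumes a simplified, hypothetical tuple layout
(struct tuple6) and a hypothetical helper (fill_tuple6); neither is the
kernel's actual structure or API. The only point shown is that both ports
are converted with htons() before being handed over, so every field is in
network byte order.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the IPv6 part of a socket lookup tuple. */
struct tuple6 {
	uint8_t  saddr[16];
	uint8_t  daddr[16];
	uint16_t sport;	/* network byte order */
	uint16_t dport;	/* network byte order */
};

/*
 * Fill the tuple from textual addresses and host-order ports.
 * Returns 0 on success, -1 if either address fails to parse.
 */
static int fill_tuple6(struct tuple6 *t, const char *src, const char *dst,
		       uint16_t sport_host, uint16_t dport_host)
{
	assert(t != NULL);
	memset(t, 0, sizeof(*t));

	/* inet_pton() already stores addresses in network byte order. */
	if (inet_pton(AF_INET6, src, t->saddr) != 1 ||
	    inet_pton(AF_INET6, dst, t->daddr) != 1)
		return -1;

	/* Convert both ports, not just the source port. */
	t->sport = htons(sport_host);
	t->dport = htons(dport_host);
	return 0;
}
```

A caller would pass host-order values such as 80 and let the helper do the
conversion once, in one place.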

* Re: [PATCH net-next 11/18] vxlan: Add netif_is_vxlan()
From: Stephen Hemminger @ 2018-10-15 20:33 UTC
  To: Jakub Kicinski; +Cc: Ido Schimmel, Ido Schimmel, netdev
In-Reply-To: <20181015133041.6118aab1@cakuba.netronome.com>

On Mon, 15 Oct 2018 13:30:41 -0700
Jakub Kicinski <jakub.kicinski@netronome.com> wrote:

> On Mon, 15 Oct 2018 23:27:41 +0300, Ido Schimmel wrote:
> > On Mon, Oct 15, 2018 at 01:16:42PM -0700, Stephen Hemminger wrote:  
> > > On Mon, 15 Oct 2018 22:57:48 +0300
> > > Ido Schimmel <idosch@mellanox.com> wrote:
> > >     
> > > > On Mon, Oct 15, 2018 at 11:57:56AM -0700, Jakub Kicinski wrote:    
> > > > > On Sat, 13 Oct 2018 17:18:38 +0000, Ido Schimmel wrote:      
> > > > > > Add the ability to determine whether a netdev is a VxLAN netdev by
> > > > > > calling the above mentioned function that checks the netdev's private
> > > > > > flags.
> > > > > > 
> > > > > > This will allow modules to identify netdev events involving a VxLAN
> > > > > > netdev and act accordingly. For example, drivers capable of VxLAN
> > > > > > offload will need to configure the underlying device when a VxLAN netdev
> > > > > > is being enslaved to an offloaded bridge.
> > > > > > 
> > > > > > Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> > > > > > Reviewed-by: Petr Machata <petrm@mellanox.com>      
> > > > > 
> > > > > Is this preferable over
> > > > > 
> > > > > !strcmp(netdev->rtnl_link_ops->kind, "vxlan")
> > > > > 
> > > > > which is what TC offloads do?      
> > > > 
> > > > Using a flag seemed like the more standard way.
> > > > 
> > > > That being said, we considered using net_device_ops instead, given we
> > > > are about to run out of available private flags, so I don't mind
> > > > adopting a technique already employed by another driver.
> > > > 
> > > > P.S. Had to Cc netdev again. I think your client somehow messed the Cc
> > > > list? I see Cc list in your reply, but with back slashes at the end of
> > > > two email addresses.    
> > > 
> > > Agree that using a global resource bit in flags is probably overkill.
> > > > If you can use kind that would be a good example for other drivers as well.
> > 
> > OK, will change.
> > 
> > Jakub, any objections if I implement netif_is_vxlan() using 'kind' and
> > convert nfp to use the helper? Having all these helpers in the same
> > location will increase the chances of others reusing them.  
> 
> Sounds very good :)

We could even do this for bridge, and other devices that are using private flags.

* Re: [PATCH net-next 11/18] vxlan: Add netif_is_vxlan()
From: Jakub Kicinski @ 2018-10-15 20:30 UTC
  To: Ido Schimmel; +Cc: Stephen Hemminger, Ido Schimmel, netdev
In-Reply-To: <20181015202741.GA27066@splinter>

On Mon, 15 Oct 2018 23:27:41 +0300, Ido Schimmel wrote:
> On Mon, Oct 15, 2018 at 01:16:42PM -0700, Stephen Hemminger wrote:
> > On Mon, 15 Oct 2018 22:57:48 +0300
> > Ido Schimmel <idosch@mellanox.com> wrote:
> >   
> > > On Mon, Oct 15, 2018 at 11:57:56AM -0700, Jakub Kicinski wrote:  
> > > > On Sat, 13 Oct 2018 17:18:38 +0000, Ido Schimmel wrote:    
> > > > > Add the ability to determine whether a netdev is a VxLAN netdev by
> > > > > calling the above mentioned function that checks the netdev's private
> > > > > flags.
> > > > > 
> > > > > This will allow modules to identify netdev events involving a VxLAN
> > > > > netdev and act accordingly. For example, drivers capable of VxLAN
> > > > > offload will need to configure the underlying device when a VxLAN netdev
> > > > > is being enslaved to an offloaded bridge.
> > > > > 
> > > > > Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> > > > > Reviewed-by: Petr Machata <petrm@mellanox.com>    
> > > > 
> > > > Is this preferable over
> > > > 
> > > > !strcmp(netdev->rtnl_link_ops->kind, "vxlan")
> > > > 
> > > > which is what TC offloads do?    
> > > 
> > > Using a flag seemed like the more standard way.
> > > 
> > > That being said, we considered using net_device_ops instead, given we
> > > are about to run out of available private flags, so I don't mind
> > > adopting a technique already employed by another driver.
> > > 
> > > P.S. Had to Cc netdev again. I think your client somehow messed the Cc
> > > list? I see Cc list in your reply, but with back slashes at the end of
> > > two email addresses.  
> > 
> > Agree that using a global resource bit in flags is probably overkill.
> > > If you can use kind that would be a good example for other drivers as well.
> 
> OK, will change.
> 
> Jakub, any objections if I implement netif_is_vxlan() using 'kind' and
> convert nfp to use the helper? Having all these helpers in the same
> location will increase the chances of others reusing them.

Sounds very good :)

* Re: [PATCH net-next 11/18] vxlan: Add netif_is_vxlan()
From: Ido Schimmel @ 2018-10-15 20:27 UTC
  To: Stephen Hemminger; +Cc: Ido Schimmel, Jakub Kicinski, netdev
In-Reply-To: <20181015131642.4bd6e564@xeon-e3>

On Mon, Oct 15, 2018 at 01:16:42PM -0700, Stephen Hemminger wrote:
> On Mon, 15 Oct 2018 22:57:48 +0300
> Ido Schimmel <idosch@mellanox.com> wrote:
> 
> > On Mon, Oct 15, 2018 at 11:57:56AM -0700, Jakub Kicinski wrote:
> > > On Sat, 13 Oct 2018 17:18:38 +0000, Ido Schimmel wrote:  
> > > > Add the ability to determine whether a netdev is a VxLAN netdev by
> > > > calling the above mentioned function that checks the netdev's private
> > > > flags.
> > > > 
> > > > This will allow modules to identify netdev events involving a VxLAN
> > > > netdev and act accordingly. For example, drivers capable of VxLAN
> > > > offload will need to configure the underlying device when a VxLAN netdev
> > > > is being enslaved to an offloaded bridge.
> > > > 
> > > > Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> > > > Reviewed-by: Petr Machata <petrm@mellanox.com>  
> > > 
> > > Is this preferable over
> > > 
> > > !strcmp(netdev->rtnl_link_ops->kind, "vxlan")
> > > 
> > > which is what TC offloads do?  
> > 
> > Using a flag seemed like the more standard way.
> > 
> > That being said, we considered using net_device_ops instead, given we
> > are about to run out of available private flags, so I don't mind
> > adopting a technique already employed by another driver.
> > 
> > P.S. Had to Cc netdev again. I think your client somehow messed the Cc
> > list? I see Cc list in your reply, but with back slashes at the end of
> > two email addresses.
> 
> Agree that using a global resource bit in flags is probably overkill.
> > If you can use kind that would be a good example for other drivers as well.

OK, will change.

Jakub, any objections if I implement netif_is_vxlan() using 'kind' and
convert nfp to use the helper? Having all these helpers in the same
location will increase the chances of others reusing them.
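
A 'kind'-based helper could look roughly like the sketch below. This is a
userspace illustration only: the two structures are cut-down stand-ins that
carry just the fields used here, and the NULL check on rtnl_link_ops is an
assumption about devices that have no link ops, not something stated in
this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins for the kernel structures; only the fields used here. */
struct rtnl_link_ops {
	const char *kind;
};

struct net_device {
	const struct rtnl_link_ops *rtnl_link_ops;
};

/* Sketch: identify a VxLAN netdev by its link kind instead of a private flag. */
static bool netif_is_vxlan(const struct net_device *dev)
{
	assert(dev != NULL);

	/* Assumption: a device may have no link ops at all, so guard the deref. */
	return dev->rtnl_link_ops &&
	       !strcmp(dev->rtnl_link_ops->kind, "vxlan");
}
```

Compared with a private flag, this costs a string comparison per call but
consumes no bit from the shared flags space.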

* [iproute PATCH] ip-addrlabel: Fix printing of label value
From: Phil Sutter @ 2018-10-15 20:20 UTC
  To: Stephen Hemminger; +Cc: netdev

Passing the return value of RTA_DATA() to rta_getattr_u32() is wrong
since that function will call RTA_DATA() by itself already.

Fixes: a7ad1c8a6845d ("ipaddrlabel: add json support")
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 ip/ipaddrlabel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/ipaddrlabel.c b/ip/ipaddrlabel.c
index 2f79c56dcead2..8abe5722bafd1 100644
--- a/ip/ipaddrlabel.c
+++ b/ip/ipaddrlabel.c
@@ -95,7 +95,7 @@ int print_addrlabel(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg
 	}
 
 	if (tb[IFAL_LABEL] && RTA_PAYLOAD(tb[IFAL_LABEL]) == sizeof(uint32_t)) {
-		uint32_t label = rta_getattr_u32(RTA_DATA(tb[IFAL_LABEL]));
+		uint32_t label = rta_getattr_u32(tb[IFAL_LABEL]);
 
 		print_uint(PRINT_ANY,
 			   "label", "label %u ", label);
-- 
2.19.0
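
The double-dereference problem described in the commit message can be
reproduced with a small userspace model. The struct and macro below are
simplified stand-ins (struct attr, ATTR_DATA, attr_get_u32 are hypothetical
names), not the real rtattr definitions; the getter only mirrors the shape
of rta_getattr_u32() in that it skips the attribute header itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for an attribute: 4-byte header followed by payload. */
struct attr {
	uint16_t len;
	uint16_t type;
};

#define ATTR_HDRLEN	sizeof(struct attr)
#define ATTR_DATA(a)	((void *)((char *)(a) + ATTR_HDRLEN))

/*
 * Takes a pointer to the attribute header and skips the header itself,
 * so callers must NOT apply ATTR_DATA() before calling it.
 */
static uint32_t attr_get_u32(const struct attr *a)
{
	uint32_t v;

	assert(a != NULL);
	memcpy(&v, ATTR_DATA(a), sizeof(v));
	return v;
}
```

Passing ATTR_DATA(a) to attr_get_u32() skips the header twice, so the value
read comes from the bytes following the payload rather than the payload.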

* Re: [PATCH net-next 11/18] vxlan: Add netif_is_vxlan()
From: Stephen Hemminger @ 2018-10-15 20:16 UTC
  To: Ido Schimmel; +Cc: Jakub Kicinski, netdev
In-Reply-To: <20181015195748.GA25940@splinter>

On Mon, 15 Oct 2018 22:57:48 +0300
Ido Schimmel <idosch@mellanox.com> wrote:

> On Mon, Oct 15, 2018 at 11:57:56AM -0700, Jakub Kicinski wrote:
> > On Sat, 13 Oct 2018 17:18:38 +0000, Ido Schimmel wrote:  
> > > Add the ability to determine whether a netdev is a VxLAN netdev by
> > > calling the above mentioned function that checks the netdev's private
> > > flags.
> > > 
> > > This will allow modules to identify netdev events involving a VxLAN
> > > netdev and act accordingly. For example, drivers capable of VxLAN
> > > offload will need to configure the underlying device when a VxLAN netdev
> > > is being enslaved to an offloaded bridge.
> > > 
> > > Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> > > Reviewed-by: Petr Machata <petrm@mellanox.com>  
> > 
> > Is this preferable over
> > 
> > !strcmp(netdev->rtnl_link_ops->kind, "vxlan")
> > 
> > which is what TC offloads do?  
> 
> Using a flag seemed like the more standard way.
> 
> That being said, we considered using net_device_ops instead, given we
> are about to run out of available private flags, so I don't mind
> adopting a technique already employed by another driver.
> 
> P.S. Had to Cc netdev again. I think your client somehow messed the Cc
> list? I see Cc list in your reply, but with back slashes at the end of
> two email addresses.

Agree that using a global resource bit in flags is probably overkill.
If you can use kind that would be a good example for other drivers as well.

* Re: [PATCH bpf-next] tools: bpftool: add map create command
From: Alexei Starovoitov @ 2018-10-15 19:58 UTC
  To: Jakub Kicinski; +Cc: daniel, netdev, oss-drivers
In-Reply-To: <20181015094908.2993a27b@cakuba.netronome.com>

On Mon, Oct 15, 2018 at 09:49:08AM -0700, Jakub Kicinski wrote:
> On Fri, 12 Oct 2018 23:16:59 -0700, Alexei Starovoitov wrote:
> > On Fri, Oct 12, 2018 at 11:06:14AM -0700, Jakub Kicinski wrote:
> > > Add a way of creating maps from user space.  The command takes
> > > as parameters most of the attributes of the map creation system
> > > > call command.  After map is created it's pinned to bpffs.  This makes
> > > it possible to easily and dynamically (without rebuilding programs)
> > > test various corner cases related to map creation.
> > > 
> > > Map type names are taken from bpftool's array used for printing.
> > > In general these days we try to make use of libbpf type names, but
> > > there are no map type names in libbpf as of today.
> > > 
> > > As with most features I add the motivation is testing (offloads) :)
> > > 
> > > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > > Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>  
> > ...
> > >  	fprintf(stderr,
> > >  		"Usage: %s %s { show | list }   [MAP]\n"
> > > +		"       %s %s create     FILE type TYPE key KEY_SIZE value VALUE_SIZE \\\n"
> > > +		"                              entries MAX_ENTRIES [name NAME] [flags FLAGS] \\\n"
> > > +		"                              [dev NAME]\n"  
> > 
> > I suspect as soon as bpftool has an ability to create standalone maps
> > some folks will start relying on such interface.
> 
> That'd be cool, do you see any real life use cases where it's useful
> outside of corner case testing?

In our XDP use case we have an odd protocol for different apps to share
common prog_array that is pinned in bpffs.
If cmdline creation of it via bpftool was available that would have been
an option to consider. Not saying that it would have been a better option.
Just another option.

> 
> > Therefore I'd like to request to make 'name' argument to be mandatory.
> 
> Will do in v2!

thx!
 
> > I think in the future we will require BTF to be mandatory too.
> > We need to move towards more transparent and debuggable infra.
> > Do you think requiring json description of key/value would be manageable to implement?
> > Then bpftool could convert it to BTF and the map would be fully defined.
> > I certainly understand that bpf prog can disregard the key/value layout today,
> > but we will make verifier to enforce that in the future too.
> 
> I was hoping that we can leave BTF support as a future extension, and
> then once we have the option for the verifier to enforce BTF (a sysctl?)
> the bpftool map create without a BTF will get rejected as one would
> expect.  

right. something like sysctl in the future.

> IOW it's fine not to make BTF required at bpftool level and
> leave it to system configuration.
> 
> I'd love to implement the BTF support right away, but I'm not sure I
> can afford that right now time-wise.  The whole map create command is
> pretty trivial, but for BTF we don't even have a way of dumping it
> AFAICT.  We can pretty print values, but what is the format in which to
> express the BTF itself?  We could do JSON, do we use an external
> library?  Should we have a separate BTF command for that?

I prefer standard C type description for both input and output :)
Anyway that wasn't a request for you to do it now. More of the feature
request for somebody to put on todo list :)

* Re: [PATCH net-next 11/18] vxlan: Add netif_is_vxlan()
From: Ido Schimmel @ 2018-10-15 19:57 UTC
  To: Jakub Kicinski; +Cc: netdev
In-Reply-To: <20181015115756.13b6c0da@cakuba.netronome.com>

On Mon, Oct 15, 2018 at 11:57:56AM -0700, Jakub Kicinski wrote:
> On Sat, 13 Oct 2018 17:18:38 +0000, Ido Schimmel wrote:
> > Add the ability to determine whether a netdev is a VxLAN netdev by
> > calling the above mentioned function that checks the netdev's private
> > flags.
> > 
> > This will allow modules to identify netdev events involving a VxLAN
> > netdev and act accordingly. For example, drivers capable of VxLAN
> > offload will need to configure the underlying device when a VxLAN netdev
> > is being enslaved to an offloaded bridge.
> > 
> > Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> > Reviewed-by: Petr Machata <petrm@mellanox.com>
> 
> Is this preferable over
> 
> !strcmp(netdev->rtnl_link_ops->kind, "vxlan")
> 
> which is what TC offloads do?

Using a flag seemed like the more standard way.

That being said, we considered using net_device_ops instead, given we
are about to run out of available private flags, so I don't mind
adopting a technique already employed by another driver.

P.S. Had to Cc netdev again. I think your client somehow messed the Cc
list? I see Cc list in your reply, but with back slashes at the end of
two email addresses.

* Re: [PATCH bpf-next v2 0/8] sockmap integration for ktls
From: Alexei Starovoitov @ 2018-10-15 19:27 UTC
  To: Daniel Borkmann; +Cc: john.fastabend, davejwatson, netdev
In-Reply-To: <20181013004603.3747-1-daniel@iogearbox.net>

On Sat, Oct 13, 2018 at 02:45:55AM +0200, Daniel Borkmann wrote:
> This work adds a generic sk_msg layer and converts both sockmap
> and later ktls over to make use of it as a common data structure
> for application data (similarly as sk_buff for network packets).
> With that in place the sk_msg framework spans across ULP layer
> in the kernel and allows for introspection or filtering of L7
> data with the help of BPF programs operating on a common input
> context.
> 
> In a second step, we enable the latter for ktls which was previously
> not possible, meaning, ktls and sk_msg verdict programs were
> mutually exclusive in the ULP layer which created challenges for
> the orchestrator when trying to apply TCP based policy, for
> example. Leveraging the prior consolidation we can finally overcome
> this limitation.
> 
> Note, there's no change in behavior when ktls is not used in
> combination with BPF, and also no change in behavior for stand
> alone sockmap. The kselftest suites for ktls, sockmap and ktls
> with sockmap combined also runs through successfully. For further
> details please see individual patches.
> 
> Thanks!
> 
> v1 -> v2:
>   - Removed leftover comment spotted by Alexei
>   - Improved commit messages, rebase

Applied, Thanks

* [PATCH net-next] net: phy: merge phy_start_aneg and phy_start_aneg_priv
From: Heiner Kallweit @ 2018-10-15 19:25 UTC
  To: David Miller, Andrew Lunn, Florian Fainelli; +Cc: netdev@vger.kernel.org

After commit 9f2959b6b52d ("net: phy: improve handling delayed work")
the sync parameter isn't needed any longer in phy_start_aneg_priv().
This allows merging phy_start_aneg() and phy_start_aneg_priv().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy.c | 21 +++------------------
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index d03bdbbd1..1d73ac330 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -482,16 +482,15 @@ static int phy_config_aneg(struct phy_device *phydev)
 }
 
 /**
- * phy_start_aneg_priv - start auto-negotiation for this PHY device
+ * phy_start_aneg - start auto-negotiation for this PHY device
  * @phydev: the phy_device struct
- * @sync: indicate whether we should wait for the workqueue cancelation
  *
  * Description: Sanitizes the settings (if we're not autonegotiating
  *   them), and then calls the driver's config_aneg function.
  *   If the PHYCONTROL Layer is operating, we change the state to
  *   reflect the beginning of Auto-negotiation or forcing.
  */
-static int phy_start_aneg_priv(struct phy_device *phydev, bool sync)
+int phy_start_aneg(struct phy_device *phydev)
 {
 	bool trigger = 0;
 	int err;
@@ -541,20 +540,6 @@ static int phy_start_aneg_priv(struct phy_device *phydev, bool sync)
 
 	return err;
 }
-
-/**
- * phy_start_aneg - start auto-negotiation for this PHY device
- * @phydev: the phy_device struct
- *
- * Description: Sanitizes the settings (if we're not autonegotiating
- *   them), and then calls the driver's config_aneg function.
- *   If the PHYCONTROL Layer is operating, we change the state to
- *   reflect the beginning of Auto-negotiation or forcing.
- */
-int phy_start_aneg(struct phy_device *phydev)
-{
-	return phy_start_aneg_priv(phydev, true);
-}
 EXPORT_SYMBOL(phy_start_aneg);
 
 static int phy_poll_aneg_done(struct phy_device *phydev)
@@ -1085,7 +1070,7 @@ void phy_state_machine(struct work_struct *work)
 	mutex_unlock(&phydev->lock);
 
 	if (needs_aneg)
-		err = phy_start_aneg_priv(phydev, false);
+		err = phy_start_aneg(phydev);
 	else if (do_suspend)
 		phy_suspend(phydev);
 
-- 
2.19.1

* Re: [PATCH net] net/sched: properly init chain in case of multiple control actions
From: Cong Wang @ 2018-10-15 18:31 UTC
  To: Davide Caratti
  Cc: Jiri Pirko, Jamal Hadi Salim, David Miller,
	Linux Kernel Network Developers
In-Reply-To: <dbb93ac2c87c14d412a18f35701890dcc87d0cdb.camel@redhat.com>

On Sat, Oct 13, 2018 at 8:23 AM Davide Caratti <dcaratti@redhat.com> wrote:
>
> On Fri, 2018-10-12 at 13:57 -0700, Cong Wang wrote:
> > Why not just validate the fallback action in each action init()?
> > For example, checking tcfg_paction in tcf_gact_init().
> >
> > I don't see the need of making it generic.
>
> hello Cong, once again thanks for looking at this.
>
> what you say is doable, and I evaluated doing it before proposing this
> patch.
>
> But I felt uncomfortable, because I needed to pass struct tcf_proto *tp in
> tcf_gact_init() to initialize a->goto_chain with the chain_idx encoded in
> the fallback action. So, I would have changed all the init() functions in
> all TC actions, just to fix two of them.
>
> A (legal?) trick  is to let tcf_action store the fallback action when it
> contains a 'goto chain' command, I just posted a proposal for gact. If you
> think it's ok, I will test and post the same for act_police.

Do we really need to support TC_ACT_GOTO_CHAIN for
gact->tcfg_paction etc.? I mean, is it useful in practice or is it just for
completeness?

If we don't need to support it, we can just make it invalid without needing
to initialize it in ->init() at all.

If we do, however, we really need to move it into each ->init(), because
we have to lock each action if we are modifying an existing one. With
your patch, tcf_action_goto_chain_init() is still called without the per-action
lock.

What's more, if we support two different actions in gact, that is, tcfg_paction
and tcf_action, how could you still only have one a->goto_chain pointer?
There should be two pointers for each of them. :)

Thanks.

* Re: [bpf-next PATCH v3 2/2] bpf: bpftool, add flag to allow non-compat map definitions
From: Jakub Kicinski @ 2018-10-15 18:26 UTC
  To: John Fastabend; +Cc: ast, daniel, netdev
In-Reply-To: <20181015181955.8673.75006.stgit@john-Precision-Tower-5810>

On Mon, 15 Oct 2018 11:19:55 -0700, John Fastabend wrote:
> Multiple map definition structures exist and user may have non-zero
> fields in their definition that are not recognized by bpftool and
> libbpf. The normal behavior is to then fail loading the map. Although
> this is a good default behavior users may still want to load the map
> for debugging or other reasons. This patch adds a --mapcompat flag
> that can be used to override the default behavior and allow loading
> the map even when it has additional non-zero fields.
> 
> For now the only user is 'bpftool prog'; we can switch over other
> subcommands as needed. The library exposes an API that consumes
> a flags field now but I kept the original API around also in case
> users of the API don't want to expose this. The flags field is an
> int in case we need more control over how the API call handles
> errors/features/etc in the future.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Thank you!
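
The strict versus relaxed behaviour described in the quoted commit message
can be sketched in isolation. The helper below (check_map_def) is
hypothetical and simplified: it treats everything past the first known_sz
bytes of a definition as unrecognized fields, and only the flag value is
taken from the patch. It is an illustration of the idea, not the libbpf
code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAPS_RELAX_COMPAT	0x01	/* value taken from the patch */

/*
 * Hypothetical helper: the first known_sz bytes of a map definition are
 * understood; any non-zero byte after that is an unrecognized option.
 * Strict mode (the default) rejects such definitions, relaxed mode
 * tolerates them.
 */
static int check_map_def(const unsigned char *def, size_t def_sz,
			 size_t known_sz, int flags)
{
	bool strict = !(flags & MAPS_RELAX_COMPAT);
	size_t i;

	assert(def != NULL);

	for (i = known_sz; i < def_sz; i++) {
		if (def[i] != 0 && strict)
			return -EINVAL;
	}
	return 0;
}
```

Keeping flags as an int bitmask, as the commit message notes, leaves room
for further loading options without another signature change.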

* Re: [PATCH] dt-bindings: can: rcar_can: Add r8a7744 support
From: Rob Herring @ 2018-10-15 18:26 UTC
  To: Biju Das
  Cc: Wolfgang Grandegger, Marc Kleine-Budde, Mark Rutland,
	David S. Miller, linux-can, netdev, devicetree, Simon Horman,
	Geert Uytterhoeven, Chris Paterson, Fabrizio Castro,
	linux-renesas-soc
In-Reply-To: <1538050128-47356-1-git-send-email-biju.das@bp.renesas.com>

On Thu, Sep 27, 2018 at 01:08:48PM +0100, Biju Das wrote:
> Document RZ/G1N (r8a7744) SoC specific bindings.
> 
> Signed-off-by: Biju Das <biju.das@bp.renesas.com>
> Reviewed-by: Chris Paterson <Chris.Paterson2@renesas.com>
> ---
> This patch is tested against linux-next next-20180927
> ---
>  Documentation/devicetree/bindings/net/can/rcar_can.txt | 1 +
>  1 file changed, 1 insertion(+)

Applied.

* [bpf-next PATCH v3 2/2] bpf: bpftool, add flag to allow non-compat map definitions
From: John Fastabend @ 2018-10-15 18:19 UTC
  To: jakub.kicinski, ast, daniel; +Cc: netdev
In-Reply-To: <20181015181857.8673.46183.stgit@john-Precision-Tower-5810>

Multiple map definition structures exist and user may have non-zero
fields in their definition that are not recognized by bpftool and
libbpf. The normal behavior is to then fail loading the map. Although
this is a good default behavior users may still want to load the map
for debugging or other reasons. This patch adds a --mapcompat flag
that can be used to override the default behavior and allow loading
the map even when it has additional non-zero fields.

For now the only user is 'bpftool prog'; we can switch over other
subcommands as needed. The library exposes an API that consumes
a flags field now but I kept the original API around also in case
users of the API don't want to expose this. The flags field is an
int in case we need more control over how the API call handles
errors/features/etc in the future.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 tools/bpf/bpftool/Documentation/bpftool.rst |    4 ++++
 tools/bpf/bpftool/bash-completion/bpftool   |    2 +-
 tools/bpf/bpftool/main.c                    |    7 ++++++-
 tools/bpf/bpftool/main.h                    |    3 ++-
 tools/bpf/bpftool/prog.c                    |    2 +-
 5 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool.rst b/tools/bpf/bpftool/Documentation/bpftool.rst
index 25c0872..6548831 100644
--- a/tools/bpf/bpftool/Documentation/bpftool.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool.rst
@@ -57,6 +57,10 @@ OPTIONS
 	-p, --pretty
 		  Generate human-readable JSON output. Implies **-j**.
 
+	-m, --mapcompat
+		  Allow loading maps with unknown map definitions.
+
+
 SEE ALSO
 ========
 	**bpftool-map**\ (8), **bpftool-prog**\ (8), **bpftool-cgroup**\ (8)
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 0826519..ac85207 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -184,7 +184,7 @@ _bpftool()
 
     # Deal with options
     if [[ ${words[cword]} == -* ]]; then
-        local c='--version --json --pretty --bpffs'
+        local c='--version --json --pretty --bpffs --mapcompat'
         COMPREPLY=( $( compgen -W "$c" -- "$cur" ) )
         return 0
     fi
diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 79dc3f1..828dde3 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -55,6 +55,7 @@
 bool pretty_output;
 bool json_output;
 bool show_pinned;
+int bpf_flags;
 struct pinned_obj_table prog_table;
 struct pinned_obj_table map_table;
 
@@ -341,6 +342,7 @@ int main(int argc, char **argv)
 		{ "pretty",	no_argument,	NULL,	'p' },
 		{ "version",	no_argument,	NULL,	'V' },
 		{ "bpffs",	no_argument,	NULL,	'f' },
+		{ "mapcompat",	no_argument,	NULL,	'm' },
 		{ 0 }
 	};
 	int opt, ret;
@@ -355,7 +357,7 @@ int main(int argc, char **argv)
 	hash_init(map_table.table);
 
 	opterr = 0;
-	while ((opt = getopt_long(argc, argv, "Vhpjf",
+	while ((opt = getopt_long(argc, argv, "Vhpjfm",
 				  options, NULL)) >= 0) {
 		switch (opt) {
 		case 'V':
@@ -379,6 +381,9 @@ int main(int argc, char **argv)
 		case 'f':
 			show_pinned = true;
 			break;
+		case 'm':
+			bpf_flags = MAPS_RELAX_COMPAT;
+			break;
 		default:
 			p_err("unrecognized option '%s'", argv[optind - 1]);
 			if (json_output)
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 40492cd..91fd697 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -74,7 +74,7 @@
 #define HELP_SPEC_PROGRAM						\
 	"PROG := { id PROG_ID | pinned FILE | tag PROG_TAG }"
 #define HELP_SPEC_OPTIONS						\
-	"OPTIONS := { {-j|--json} [{-p|--pretty}] | {-f|--bpffs} }"
+	"OPTIONS := { {-j|--json} [{-p|--pretty}] | {-f|--bpffs} | {-m|--mapcompat}"
 #define HELP_SPEC_MAP							\
 	"MAP := { id MAP_ID | pinned FILE }"
 
@@ -89,6 +89,7 @@ enum bpf_obj_type {
 extern json_writer_t *json_wtr;
 extern bool json_output;
 extern bool show_pinned;
+extern int bpf_flags;
 extern struct pinned_obj_table prog_table;
 extern struct pinned_obj_table map_table;
 
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 99ab42c..3350289 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -908,7 +908,7 @@ static int do_load(int argc, char **argv)
 		}
 	}
 
-	obj = bpf_object__open_xattr(&attr);
+	obj = __bpf_object__open_xattr(&attr, bpf_flags);
 	if (IS_ERR_OR_NULL(obj)) {
 		p_err("failed to open object file");
 		goto err_free_reuse_maps;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 87520a8..69a4d40 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -69,6 +69,9 @@ struct bpf_load_program_attr {
 	__u32 prog_ifindex;
 };
 
+/* Flags to direct loading requirements */
+#define MAPS_RELAX_COMPAT	0x01
+
 /* Recommend log buffer size */
 #define BPF_LOG_BUF_SIZE (256 * 1024)
 int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 176cf55..bd71efc 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -562,8 +562,9 @@ static int compare_bpf_map(const void *_a, const void *_b)
 }
 
 static int
-bpf_object__init_maps(struct bpf_object *obj)
+bpf_object__init_maps(struct bpf_object *obj, int flags)
 {
+	bool strict = !(flags & MAPS_RELAX_COMPAT);
 	int i, map_idx, map_def_sz, nr_maps = 0;
 	Elf_Scn *scn;
 	Elf_Data *data;
@@ -685,7 +686,8 @@ static int compare_bpf_map(const void *_a, const void *_b)
 						   "has unrecognized, non-zero "
 						   "options\n",
 						   obj->path, map_name);
-					return -EINVAL;
+					if (strict)
+						return -EINVAL;
 				}
 			}
 			memcpy(&obj->maps[map_idx].def, def,
@@ -716,7 +718,7 @@ static bool section_have_execinstr(struct bpf_object *obj, int idx)
 	return false;
 }
 
-static int bpf_object__elf_collect(struct bpf_object *obj)
+static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 {
 	Elf *elf = obj->efile.elf;
 	GElf_Ehdr *ep = &obj->efile.ehdr;
@@ -843,7 +845,7 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 		return LIBBPF_ERRNO__FORMAT;
 	}
 	if (obj->efile.maps_shndx >= 0) {
-		err = bpf_object__init_maps(obj);
+		err = bpf_object__init_maps(obj, flags);
 		if (err)
 			goto out;
 	}
@@ -1515,7 +1517,7 @@ static int bpf_object__validate(struct bpf_object *obj, bool needs_kver)
 
 static struct bpf_object *
 __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
-		   bool needs_kver)
+		   bool needs_kver, int flags)
 {
 	struct bpf_object *obj;
 	int err;
@@ -1531,7 +1533,7 @@ static int bpf_object__validate(struct bpf_object *obj, bool needs_kver)
 
 	CHECK_ERR(bpf_object__elf_init(obj), err, out);
 	CHECK_ERR(bpf_object__check_endianness(obj), err, out);
-	CHECK_ERR(bpf_object__elf_collect(obj), err, out);
+	CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out);
 	CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
 	CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
 
@@ -1542,7 +1544,8 @@ static int bpf_object__validate(struct bpf_object *obj, bool needs_kver)
 	return ERR_PTR(err);
 }
 
-struct bpf_object *bpf_object__open_xattr(struct bpf_object_open_attr *attr)
+struct bpf_object *__bpf_object__open_xattr(struct bpf_object_open_attr *attr,
+					    int flags)
 {
 	/* param validation */
 	if (!attr->file)
@@ -1551,7 +1554,13 @@ struct bpf_object *bpf_object__open_xattr(struct bpf_object_open_attr *attr)
 	pr_debug("loading %s\n", attr->file);
 
 	return __bpf_object__open(attr->file, NULL, 0,
-				  bpf_prog_type__needs_kver(attr->prog_type));
+				  bpf_prog_type__needs_kver(attr->prog_type),
+				  flags);
+}
+
+struct bpf_object *bpf_object__open_xattr(struct bpf_object_open_attr *attr)
+{
+	return __bpf_object__open_xattr(attr, 0);
 }
 
 struct bpf_object *bpf_object__open(const char *path)
@@ -1584,7 +1593,7 @@ struct bpf_object *bpf_object__open_buffer(void *obj_buf,
 	pr_debug("loading object '%s' from buffer\n",
 		 name);
 
-	return __bpf_object__open(name, obj_buf, obj_buf_sz, true);
+	return __bpf_object__open(name, obj_buf, obj_buf_sz, true, true);
 }
 
 int bpf_object__unload(struct bpf_object *obj)
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 8af8d36..7e9c801 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -61,6 +61,8 @@ struct bpf_object_open_attr {
 
 struct bpf_object *bpf_object__open(const char *path);
 struct bpf_object *bpf_object__open_xattr(struct bpf_object_open_attr *attr);
+struct bpf_object *__bpf_object__open_xattr(struct bpf_object_open_attr *attr,
+					    int flags);
 struct bpf_object *bpf_object__open_buffer(void *obj_buf,
 					   size_t obj_buf_sz,
 					   const char *name);

^ permalink raw reply related

* [bpf-next PATCH v3 1/2] bpf: bpftool, add support for attaching programs to maps
From: John Fastabend @ 2018-10-15 18:19 UTC (permalink / raw)
  To: jakub.kicinski, ast, daniel; +Cc: netdev
In-Reply-To: <20181015181857.8673.46183.stgit@john-Precision-Tower-5810>

Sock map/hash introduce support for attaching programs to maps. To
date I have been doing this with custom tooling but this is less than
ideal as we shift to using bpftool as the single CLI for our BPF uses.
This patch adds new sub commands 'attach' and 'detach' to the 'prog'
command to attach programs to maps and then detach them.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/bpf/bpftool/Documentation/bpftool-prog.rst |   11 ++
 tools/bpf/bpftool/Documentation/bpftool.rst      |    2 
 tools/bpf/bpftool/bash-completion/bpftool        |   19 ++++
 tools/bpf/bpftool/prog.c                         |   99 ++++++++++++++++++++++
 4 files changed, 128 insertions(+), 3 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index 64156a1..12c8030 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -25,6 +25,8 @@ MAP COMMANDS
 |	**bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | **opcodes**}]
 |	**bpftool** **prog pin** *PROG* *FILE*
 |	**bpftool** **prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
+|       **bpftool** **prog attach** *PROG* *ATTACH_TYPE* *MAP*
+|       **bpftool** **prog detach** *PROG* *ATTACH_TYPE* *MAP*
 |	**bpftool** **prog help**
 |
 |	*MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
@@ -37,6 +39,7 @@ MAP COMMANDS
 |		**cgroup/bind4** | **cgroup/bind6** | **cgroup/post_bind4** | **cgroup/post_bind6** |
 |		**cgroup/connect4** | **cgroup/connect6** | **cgroup/sendmsg4** | **cgroup/sendmsg6**
 |	}
+|       *ATTACH_TYPE* := { **msg_verdict** | **skb_verdict** | **skb_parse** }
 
 
 DESCRIPTION
@@ -90,6 +93,14 @@ DESCRIPTION
 
 		  Note: *FILE* must be located in *bpffs* mount.
 
+        **bpftool prog attach** *PROG* *ATTACH_TYPE* *MAP*
+                  Attach bpf program *PROG* (with type specified by *ATTACH_TYPE*)
+                  to the map *MAP*.
+
+        **bpftool prog detach** *PROG* *ATTACH_TYPE* *MAP*
+                  Detach bpf program *PROG* (with type specified by *ATTACH_TYPE*)
+                  from the map *MAP*.
+
 	**bpftool prog help**
 		  Print short help message.
 
diff --git a/tools/bpf/bpftool/Documentation/bpftool.rst b/tools/bpf/bpftool/Documentation/bpftool.rst
index 8dda77d..25c0872 100644
--- a/tools/bpf/bpftool/Documentation/bpftool.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool.rst
@@ -26,7 +26,7 @@ SYNOPSIS
 	| **pin** | **event_pipe** | **help** }
 
 	*PROG-COMMANDS* := { **show** | **list** | **dump jited** | **dump xlated** | **pin**
-	| **load** | **help** }
+	| **load** | **attach** | **detach** | **help** }
 
 	*CGROUP-COMMANDS* := { **show** | **list** | **attach** | **detach** | **help** }
 
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index df1060b..0826519 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -292,6 +292,23 @@ _bpftool()
                     fi
                     return 0
                     ;;
+                attach|detach)
+                    if [[ ${#words[@]} == 7 ]]; then
+                        COMPREPLY=( $( compgen -W "id pinned" -- "$cur" ) )
+                        return 0
+                    fi
+
+                    if [[ ${#words[@]} == 6 ]]; then
+                        COMPREPLY=( $( compgen -W "msg_verdict skb_verdict skb_parse" -- "$cur" ) )
+                        return 0
+                    fi
+
+                    if [[ $prev == "$command" ]]; then
+                        COMPREPLY=( $( compgen -W "id pinned" -- "$cur" ) )
+                        return 0
+                    fi
+                    return 0
+                    ;;
                 load)
                     local obj
 
@@ -347,7 +364,7 @@ _bpftool()
                     ;;
                 *)
                     [[ $prev == $object ]] && \
-                        COMPREPLY=( $( compgen -W 'dump help pin load \
+                        COMPREPLY=( $( compgen -W 'dump help pin attach detach load \
                             show list' -- "$cur" ) )
                     ;;
             esac
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index b1cd3bc..99ab42c 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -77,6 +77,26 @@
 	[BPF_PROG_TYPE_FLOW_DISSECTOR]	= "flow_dissector",
 };
 
+static const char * const attach_type_strings[] = {
+	[BPF_SK_SKB_STREAM_PARSER] = "stream_parser",
+	[BPF_SK_SKB_STREAM_VERDICT] = "stream_verdict",
+	[BPF_SK_MSG_VERDICT] = "msg_verdict",
+	[__MAX_BPF_ATTACH_TYPE] = NULL,
+};
+
+enum bpf_attach_type parse_attach_type(const char *str)
+{
+	enum bpf_attach_type type;
+
+	for (type = 0; type < __MAX_BPF_ATTACH_TYPE; type++) {
+		if (attach_type_strings[type] &&
+		    is_prefix(str, attach_type_strings[type]))
+			return type;
+	}
+
+	return __MAX_BPF_ATTACH_TYPE;
+}
+
 static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
 {
 	struct timespec real_time_ts, boot_time_ts;
@@ -697,6 +717,77 @@ int map_replace_compar(const void *p1, const void *p2)
 	return a->idx - b->idx;
 }
 
+static int do_attach(int argc, char **argv)
+{
+	enum bpf_attach_type attach_type;
+	int err, mapfd, progfd;
+
+	if (!REQ_ARGS(5)) {
+		p_err("too few parameters for map attach");
+		return -EINVAL;
+	}
+
+	progfd = prog_parse_fd(&argc, &argv);
+	if (progfd < 0)
+		return progfd;
+
+	attach_type = parse_attach_type(*argv);
+	if (attach_type == __MAX_BPF_ATTACH_TYPE) {
+		p_err("invalid attach type");
+		return -EINVAL;
+	}
+	NEXT_ARG();
+
+	mapfd = map_parse_fd(&argc, &argv);
+	if (mapfd < 0)
+		return mapfd;
+
+	err = bpf_prog_attach(progfd, mapfd, attach_type, 0);
+	if (err) {
+		p_err("failed prog attach to map");
+		return -EINVAL;
+	}
+
+	if (json_output)
+		jsonw_null(json_wtr);
+	return 0;
+}
+
+static int do_detach(int argc, char **argv)
+{
+	enum bpf_attach_type attach_type;
+	int err, mapfd, progfd;
+
+	if (!REQ_ARGS(5)) {
+		p_err("too few parameters for map detach");
+		return -EINVAL;
+	}
+
+	progfd = prog_parse_fd(&argc, &argv);
+	if (progfd < 0)
+		return progfd;
+
+	attach_type = parse_attach_type(*argv);
+	if (attach_type == __MAX_BPF_ATTACH_TYPE) {
+		p_err("invalid attach type");
+		return -EINVAL;
+	}
+	NEXT_ARG();
+
+	mapfd = map_parse_fd(&argc, &argv);
+	if (mapfd < 0)
+		return mapfd;
+
+	err = bpf_prog_detach2(progfd, mapfd, attach_type);
+	if (err) {
+		p_err("failed prog detach from map");
+		return -EINVAL;
+	}
+
+	if (json_output)
+		jsonw_null(json_wtr);
+	return 0;
+}
 static int do_load(int argc, char **argv)
 {
 	enum bpf_attach_type expected_attach_type;
@@ -942,6 +1033,8 @@ static int do_help(int argc, char **argv)
 		"       %s %s pin   PROG FILE\n"
 		"       %s %s load  OBJ  FILE [type TYPE] [dev NAME] \\\n"
 		"                         [map { idx IDX | name NAME } MAP]\n"
+		"       %s %s attach PROG ATTACH_TYPE MAP\n"
+		"       %s %s detach PROG ATTACH_TYPE MAP\n"
 		"       %s %s help\n"
 		"\n"
 		"       " HELP_SPEC_MAP "\n"
@@ -953,10 +1046,12 @@ static int do_help(int argc, char **argv)
 		"                 cgroup/bind4 | cgroup/bind6 | cgroup/post_bind4 |\n"
 		"                 cgroup/post_bind6 | cgroup/connect4 | cgroup/connect6 |\n"
 		"                 cgroup/sendmsg4 | cgroup/sendmsg6 }\n"
+		"       ATTACH_TYPE := { msg_verdict | skb_verdict | skb_parse }\n"
 		"       " HELP_SPEC_OPTIONS "\n"
 		"",
 		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
-		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
+		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
+		bin_name, argv[-2], bin_name, argv[-2]);
 
 	return 0;
 }
@@ -968,6 +1063,8 @@ static int do_help(int argc, char **argv)
 	{ "dump",	do_dump },
 	{ "pin",	do_pin },
 	{ "load",	do_load },
+	{ "attach",	do_attach },
+	{ "detach",	do_detach },
 	{ 0 }
 };
 

^ permalink raw reply related
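
The attach type lookup in the patch above relies on prefix matching against a string table. A minimal standalone sketch of that behaviour follows; the `demo_` names and the local enum are hypothetical stand-ins for `enum bpf_attach_type` and bpftool's `is_prefix()`, not the real definitions. With the table as posted, abbreviations such as "msg" resolve to the first matching entry, while the `skb_verdict` spelling shown in the help text would not match any entry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's enum bpf_attach_type. */
enum demo_attach_type {
	DEMO_STREAM_PARSER,
	DEMO_STREAM_VERDICT,
	DEMO_MSG_VERDICT,
	DEMO_MAX_ATTACH_TYPE,
};

/* Same strings as attach_type_strings[] in the patch. */
static const char * const demo_attach_type_strings[] = {
	[DEMO_STREAM_PARSER]   = "stream_parser",
	[DEMO_STREAM_VERDICT]  = "stream_verdict",
	[DEMO_MSG_VERDICT]     = "msg_verdict",
	[DEMO_MAX_ATTACH_TYPE] = NULL,
};

/* Assumed helper semantics: true when pfx is a leading substring of str. */
static bool demo_is_prefix(const char *pfx, const char *str)
{
	size_t n;

	if (!pfx)
		return false;
	n = strlen(pfx);
	return strlen(str) >= n && memcmp(str, pfx, n) == 0;
}

/* Return the first table entry that the user string is a prefix of,
 * or DEMO_MAX_ATTACH_TYPE when nothing matches.
 */
static enum demo_attach_type demo_parse_attach_type(const char *str)
{
	int type;

	for (type = 0; type < DEMO_MAX_ATTACH_TYPE; type++) {
		if (demo_attach_type_strings[type] &&
		    demo_is_prefix(str, demo_attach_type_strings[type]))
			return (enum demo_attach_type)type;
	}

	return DEMO_MAX_ATTACH_TYPE;
}
```

Because the scan returns on the first hit, an ambiguous prefix such as "stream" selects the entry with the lowest enum value.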

* [bpf-next PATCH v3 0/2] bpftool support for sockmap use cases
From: John Fastabend @ 2018-10-15 18:19 UTC (permalink / raw)
  To: jakub.kicinski, ast, daniel; +Cc: netdev

The first patch adds support for attaching programs to maps. This is
needed to support sock{map|hash} use from bpftool. Currently, I carry
around custom code to do this so doing it using standard bpftool will
be great.

The second patch adds a compat mode to ignore non-zero entries in
the map def. This allows using bpftool with maps that have extra
fields that the user knows can be ignored. This is needed to work
correctly with maps being loaded by other tools or directly via
syscalls.

v3: add bash completion and doc updates for --mapcompat

---

John Fastabend (2):
      bpf: bpftool, add support for attaching programs to maps
      bpf: bpftool, add flag to allow non-compat map definitions

 tools/bpf/bpftool/Documentation/bpftool-prog.rst |   11 ++
 tools/bpf/bpftool/Documentation/bpftool.rst      |    6 +
 tools/bpf/bpftool/bash-completion/bpftool        |   21 ++++-
 tools/bpf/bpftool/main.c                         |    7 +-
 tools/bpf/bpftool/main.h                         |    3 -
 tools/bpf/bpftool/prog.c                         |  101 ++++++++++++++++++++++
 6 files changed, 142 insertions(+), 7 deletions(-)

^ permalink raw reply

* [PATCH bpf-next 2/2] bpf: Fix IPv6 dport byte-order in bpf_sk_lookup
From: Joe Stringer @ 2018-10-15 17:27 UTC (permalink / raw)
  To: daniel, ast; +Cc: netdev
In-Reply-To: <20181015172746.6475-1-joe@wand.net.nz>

Commit 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF")
mistakenly passed the destination port in network byte-order to the IPv6
TCP/UDP socket lookup functions, which meant that BPF writers had to
manually swap the byte-order of this field; otherwise IPv6 sockets
could not be located via this helper.

Fix the issue by swapping the byte-order appropriately in the helper.
This also makes the API more consistent with the IPv4 version.

Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF")
Signed-off-by: Joe Stringer <joe@wand.net.nz>
---
 net/core/filter.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 21aba2a521c7..d877c4c599ce 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4846,17 +4846,18 @@ static struct sock *sk_lookup(struct net *net, struct bpf_sock_tuple *tuple,
 	} else {
 		struct in6_addr *src6 = (struct in6_addr *)&tuple->ipv6.saddr;
 		struct in6_addr *dst6 = (struct in6_addr *)&tuple->ipv6.daddr;
+		u16 hnum = ntohs(tuple->ipv6.dport);
 		int sdif = inet6_sdif(skb);
 
 		if (proto == IPPROTO_TCP)
 			sk = __inet6_lookup(net, &tcp_hashinfo, skb, 0,
 					    src6, tuple->ipv6.sport,
-					    dst6, tuple->ipv6.dport,
+					    dst6, hnum,
 					    dif, sdif, &refcounted);
 		else if (likely(ipv6_bpf_stub))
 			sk = ipv6_bpf_stub->udp6_lib_lookup(net,
 							    src6, tuple->ipv6.sport,
-							    dst6, tuple->ipv6.dport,
+							    dst6, hnum,
 							    dif, sdif,
 							    &udp_table, skb);
 #endif
-- 
2.17.1
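
For readers unfamiliar with the conversion, here is a small userspace sketch of what the added ntohs() call does to the port. The `demo_` function names are hypothetical; only ntohs() from <arpa/inet.h> is a real API. The tuple carries the port in network (big-endian) byte order, and the lookup functions in this patch are handed the host-order value.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Convert a port held in network byte order to host byte order,
 * mirroring "u16 hnum = ntohs(tuple->ipv6.dport);" in the patch.
 */
static uint16_t demo_dport_to_hnum(uint16_t dport_be)
{
	return ntohs(dport_be);
}

/* Read a port from two bytes exactly as they appear on the wire. */
static uint16_t demo_port_from_wire(const uint8_t bytes[2])
{
	uint16_t be;

	memcpy(&be, bytes, sizeof(be));
	return ntohs(be);
}
```

On a big-endian host ntohs() is a no-op, so the result is the same on either architecture.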

^ permalink raw reply related

* [PATCH bpf-next 1/2] bpf: Allow sk_lookup with IPv6 module
From: Joe Stringer @ 2018-10-15 17:27 UTC (permalink / raw)
  To: daniel, ast; +Cc: netdev
In-Reply-To: <20181015172746.6475-1-joe@wand.net.nz>

This is a more complete fix than d71019b54bff ("net: core: Fix build
with CONFIG_IPV6=m"), so that IPv6 sockets may be looked up if the IPv6
module is loaded (not just if it's compiled in).

Signed-off-by: Joe Stringer <joe@wand.net.nz>
---
 include/net/addrconf.h |  5 +++++
 net/core/filter.c      | 12 +++++++-----
 net/ipv6/af_inet6.c    |  1 +
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 6def0351bcc3..14b789a123e7 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -265,6 +265,11 @@ extern const struct ipv6_stub *ipv6_stub __read_mostly;
 struct ipv6_bpf_stub {
 	int (*inet6_bind)(struct sock *sk, struct sockaddr *uaddr, int addr_len,
 			  bool force_bind_address_no_port, bool with_lock);
+	struct sock *(*udp6_lib_lookup)(struct net *net,
+					const struct in6_addr *saddr, __be16 sport,
+					const struct in6_addr *daddr, __be16 dport,
+					int dif, int sdif, struct udp_table *tbl,
+					struct sk_buff *skb);
 };
 extern const struct ipv6_bpf_stub *ipv6_bpf_stub __read_mostly;
 
diff --git a/net/core/filter.c b/net/core/filter.c
index b844761b5d4c..21aba2a521c7 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4842,7 +4842,7 @@ static struct sock *sk_lookup(struct net *net, struct bpf_sock_tuple *tuple,
 			sk = __udp4_lib_lookup(net, src4, tuple->ipv4.sport,
 					       dst4, tuple->ipv4.dport,
 					       dif, sdif, &udp_table, skb);
-#if IS_REACHABLE(CONFIG_IPV6)
+#if IS_ENABLED(CONFIG_IPV6)
 	} else {
 		struct in6_addr *src6 = (struct in6_addr *)&tuple->ipv6.saddr;
 		struct in6_addr *dst6 = (struct in6_addr *)&tuple->ipv6.daddr;
@@ -4853,10 +4853,12 @@ static struct sock *sk_lookup(struct net *net, struct bpf_sock_tuple *tuple,
 					    src6, tuple->ipv6.sport,
 					    dst6, tuple->ipv6.dport,
 					    dif, sdif, &refcounted);
-		else
-			sk = __udp6_lib_lookup(net, src6, tuple->ipv6.sport,
-					       dst6, tuple->ipv6.dport,
-					       dif, sdif, &udp_table, skb);
+		else if (likely(ipv6_bpf_stub))
+			sk = ipv6_bpf_stub->udp6_lib_lookup(net,
+							    src6, tuple->ipv6.sport,
+							    dst6, tuple->ipv6.dport,
+							    dif, sdif,
+							    &udp_table, skb);
 #endif
 	}
 
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index e9c8cfdf4b4c..3f4d61017a69 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -901,6 +901,7 @@ static const struct ipv6_stub ipv6_stub_impl = {
 
 static const struct ipv6_bpf_stub ipv6_bpf_stub_impl = {
 	.inet6_bind = __inet6_bind,
+	.udp6_lib_lookup = __udp6_lib_lookup,
 };
 
 static int __init inet6_init(void)
-- 
2.17.1
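
The stub approach used here can be sketched in plain C as below. This is an illustration under assumptions, not the kernel code: all `demo_` names are hypothetical, and the init/exit functions stand in for whatever the IPv6 module does to publish and withdraw its stub pointer. Core code only calls through the pointer when it is non-NULL, which is what lets the lookup work when the implementation lives in a loadable module.

```c
#include <stddef.h>

/* Table of operations that the optional module provides. */
struct demo_stub {
	int (*lookup)(int key);
};

/* NULL until the (hypothetical) module registers itself. */
static const struct demo_stub *demo_stub_ptr;

/* Module-side implementation of the operation. */
static int demo_module_lookup(int key)
{
	return key * 2;
}

static const struct demo_stub demo_stub_impl = {
	.lookup = demo_module_lookup,
};

/* Stand-ins for module load and unload. */
static void demo_module_init(void)
{
	demo_stub_ptr = &demo_stub_impl;
}

static void demo_module_exit(void)
{
	demo_stub_ptr = NULL;
}

/* Core-side caller: returns -1 when the module is not loaded. */
static int demo_core_lookup(int key)
{
	if (demo_stub_ptr)
		return demo_stub_ptr->lookup(key);
	return -1;
}
```

The core never references `demo_module_lookup` by name, so it has no link-time dependency on the module's symbols.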

^ permalink raw reply related

* [PATCH bpf-next 0/2] IPv6 sk-lookup fixes
From: Joe Stringer @ 2018-10-15 17:27 UTC (permalink / raw)
  To: daniel, ast; +Cc: netdev

This series includes a couple of fixups for the IPv6 socket lookup
helper, to make the API more consistent (always supply all arguments in
network byte-order) and to allow its use when IPv6 is compiled as a
module.

Joe Stringer (2):
  bpf: Allow sk_lookup with IPv6 module
  bpf: Fix IPv6 dport byte-order in bpf_sk_lookup

 include/net/addrconf.h |  5 +++++
 net/core/filter.c      | 15 +++++++++------
 net/ipv6/af_inet6.c    |  1 +
 3 files changed, 15 insertions(+), 6 deletions(-)

-- 
2.17.1

^ permalink raw reply

* Re: [PATCH bpf-next] tools: bpftool: add map create command
From: Jakub Kicinski @ 2018-10-15 16:49 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: daniel, netdev, oss-drivers
In-Reply-To: <20181013061657.a5jxlxr7h3nvdass@ast-mbp.dhcp.thefacebook.com>

On Fri, 12 Oct 2018 23:16:59 -0700, Alexei Starovoitov wrote:
> On Fri, Oct 12, 2018 at 11:06:14AM -0700, Jakub Kicinski wrote:
> > Add a way of creating maps from user space.  The command takes
> > as parameters most of the attributes of the map creation system
> > call command.  After the map is created it's pinned to bpffs.  This makes
> > it possible to easily and dynamically (without rebuilding programs)
> > test various corner cases related to map creation.
> > 
> > Map type names are taken from bpftool's array used for printing.
> > In general these days we try to make use of libbpf type names, but
> > there are no map type names in libbpf as of today.
> > 
> > As with most features I add the motivation is testing (offloads) :)
> > 
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>  
> ...
> >  	fprintf(stderr,
> >  		"Usage: %s %s { show | list }   [MAP]\n"
> > +		"       %s %s create     FILE type TYPE key KEY_SIZE value VALUE_SIZE \\\n"
> > +		"                              entries MAX_ENTRIES [name NAME] [flags FLAGS] \\\n"
> > +		"                              [dev NAME]\n"  
> 
> I suspect as soon as bpftool has an ability to create standalone maps
> some folks will start relying on such interface.

That'd be cool, do you see any real life use cases where it's useful
outside of corner case testing?

> Therefore I'd like to request to make 'name' argument to be mandatory.

Will do in v2!

> I think in the future we will require BTF to be mandatory too.
> We need to move towards more transparent and debuggable infra.
> Do you think requiring json description of key/value would be manageable to implement?
> Then bpftool could convert it to BTF and the map would be fully defined.
> I certainly understand that bpf prog can disregard the key/value layout today,
> but we will make the verifier enforce that in the future too.

I was hoping that we can leave BTF support as a future extension, and
then once we have the option for the verifier to enforce BTF (a sysctl?)
the bpftool map create without a BTF will get rejected as one would
expect.  IOW it's fine not to make BTF required at bpftool level and
leave it to system configuration.

I'd love to implement the BTF support right away, but I'm not sure I
can afford that right now time-wise.  The whole map create command is
pretty trivial, but for BTF we don't even have a way of dumping it
AFAICT.  We can pretty print values, but what is the format in which to
express the BTF itself?  We could do JSON, do we use an external
library?  Should we have a separate BTF command for that?

^ permalink raw reply

* Re: [PATCH iproute 2/2] utils: fix get_rtnl_link_stats_rta stats parsing
From: Stephen Hemminger @ 2018-10-15 16:41 UTC (permalink / raw)
  To: Lorenzo Bianconi; +Cc: netdev
In-Reply-To: <20181011122401.GA10363@localhost.localdomain>

On Thu, 11 Oct 2018 14:24:03 +0200
Lorenzo Bianconi <lorenzo.bianconi@redhat.com> wrote:

> > > iproute2 walks through the list of available tunnels using netlink
> > > protocol in order to get device info instead of reading
> > > them from proc filesystem. However the kernel reports device statistics
> > > using IFLA_INET6_STATS/IFLA_INET6_ICMP6STATS attributes nested in
> > > IFLA_PROTINFO one but iproute2 expects this info in
> > > IFLA_STATS64/IFLA_STATS attributes.
> > > The issue can be triggered with the following reproducer:
> > > 
> > > $ip link add ip6d0 type ip6tnl mode ip6ip6 local 1111::1 remote 2222::1
> > > $ip -6 -d -s tunnel show ip6d0
> > > ip6d0: ipv6/ipv6 remote 2222::1 local 1111::1 encaplimit 4 hoplimit 64
> > > tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
> > > Dump terminated
> > > 
> > > Fix the issue by introducing IFLA_INET6_STATS attribute parsing
> > > 
> > > Fixes: 3e953938717f ("iptunnel/ip6tunnel: Use netlink to walk through
> > > tunnels list")
> > > 
> > > Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>  
> > 
> > Can't we fix the kernel to report statistics properly, rather than
> > starting iproute2 doing more /proc interfaces.
> >   
> 
> Hi Stephen,
> 
> sorry, I did not get what you mean. Current iproute implementation
> walks through tunnels list using netlink protocol and parses device
> statistics in the kernel netlink message. However it does not take
> into account the actual netlink message layout since the statistic
> attribute is nested in IFLA_PROTINFO one.
> Moreover AFAIU the related kernel code has not changed since iproute
> commit 3e953938717f, so I guess we should fix the issue in iproute code
> instead of in the kernel one. Do you agree?
> 
> Regards,
> Lorenzo

Applied to current iproute2.

^ permalink raw reply

* [PATCH net-next 7/7] tcp: cdg: use tcp high resolution clock cache
From: Eric Dumazet @ 2018-10-15 16:37 UTC (permalink / raw)
  To: David S . Miller, Neal Cardwell, Yuchung Cheng,
	Soheil Hassas Yeganeh, Gasper Zejn
  Cc: netdev, Eric Dumazet, Eric Dumazet
In-Reply-To: <20181015163758.232436-1-edumazet@google.com>

The tcp socket stores a cache of the most recent high resolution
clock, so there is no need to call local_clock() again; this
cache is good enough.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_cdg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_cdg.c b/net/ipv4/tcp_cdg.c
index 06fbe102a425f28b43294925d8d13af4a13ec776..37eebd9103961be4731323cfb4d933b51954e802 100644
--- a/net/ipv4/tcp_cdg.c
+++ b/net/ipv4/tcp_cdg.c
@@ -146,7 +146,7 @@ static void tcp_cdg_hystart_update(struct sock *sk)
 		return;
 
 	if (hystart_detect & HYSTART_ACK_TRAIN) {
-		u32 now_us = div_u64(local_clock(), NSEC_PER_USEC);
+		u32 now_us = tp->tcp_mstamp;
 
 		if (ca->last_ack == 0 || !tcp_is_cwnd_limited(sk)) {
 			ca->last_ack = now_us;
-- 
2.19.0.605.g01d371f741-goog

^ permalink raw reply related

* [PATCH net-next 6/7] tcp_bbr: fix typo in bbr_pacing_margin_percent
From: Eric Dumazet @ 2018-10-15 16:37 UTC (permalink / raw)
  To: David S . Miller, Neal Cardwell, Yuchung Cheng,
	Soheil Hassas Yeganeh, Gasper Zejn
  Cc: netdev, Eric Dumazet, Eric Dumazet
In-Reply-To: <20181015163758.232436-1-edumazet@google.com>

From: Neal Cardwell <ncardwell@google.com>

There was a typo in this parameter name.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_bbr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c
index 33f4358615e6d63b5c98a30484f12ffae66334a2..b88081285fd172444a844b6aec5d038c0f882594 100644
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -129,7 +129,7 @@ static const u32 bbr_probe_rtt_mode_ms = 200;
 static const int bbr_min_tso_rate = 1200000;
 
 /* Pace at ~1% below estimated bw, on average, to reduce queue at bottleneck. */
-static const int bbr_pacing_marging_percent = 1;
+static const int bbr_pacing_margin_percent = 1;
 
 /* We use a high_gain value of 2/ln(2) because it's the smallest pacing gain
  * that will allow a smoothly increasing pacing rate that will double each RTT
@@ -214,7 +214,7 @@ static u64 bbr_rate_bytes_per_sec(struct sock *sk, u64 rate, int gain)
 	rate *= mss;
 	rate *= gain;
 	rate >>= BBR_SCALE;
-	rate *= USEC_PER_SEC / 100 * (100 - bbr_pacing_marging_percent);
+	rate *= USEC_PER_SEC / 100 * (100 - bbr_pacing_margin_percent);
 	return rate >> BW_SCALE;
 }
 
-- 
2.19.0.605.g01d371f741-goog
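
As a worked example of the arithmetic around the renamed constant, here is a standalone sketch of the rate conversion. The `DEMO_` scale values (24 and 8) and the interpretation of `rate` as packets per microsecond shifted left by the bandwidth scale are assumptions taken from my reading of tcp_bbr.c; the explicit `mss` parameter is also an assumption, since the kernel function takes the socket instead.

```c
#include <stdint.h>

#define DEMO_BW_SCALE		24	/* assumed value of BW_SCALE */
#define DEMO_BBR_SCALE		8	/* assumed value of BBR_SCALE */
#define DEMO_USEC_PER_SEC	1000000ULL

/* Pace at ~1% below estimated bw, as in the patch. */
static const int demo_pacing_margin_percent = 1;

/* Convert a scaled packets-per-usec rate into bytes per second,
 * applying the gain and then the pacing margin.
 */
static uint64_t demo_rate_bytes_per_sec(uint64_t rate, unsigned int mss,
					int gain)
{
	rate *= mss;
	rate *= gain;
	rate >>= DEMO_BBR_SCALE;
	rate *= DEMO_USEC_PER_SEC / 100 * (100 - demo_pacing_margin_percent);
	return rate >> DEMO_BW_SCALE;
}
```

With a rate of one packet per microsecond (1 << 24), an MSS of 1448 and unit gain (256), the result is 1448 * 990000 bytes per second, which is 1% below the unpaced 1448 * 1000000.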

^ permalink raw reply related

* [PATCH net-next 5/7] tcp: optimize tcp internal pacing
From: Eric Dumazet @ 2018-10-15 16:37 UTC (permalink / raw)
  To: David S . Miller, Neal Cardwell, Yuchung Cheng,
	Soheil Hassas Yeganeh, Gasper Zejn
  Cc: netdev, Eric Dumazet, Eric Dumazet
In-Reply-To: <20181015163758.232436-1-edumazet@google.com>

When TCP implements its own pacing (when no fq packet scheduler is used),
it arms a high resolution timer after a packet is sent.

But in many cases (like TCP_RR kind of workloads), this high resolution
timer expires before the application attempts to write the following
packet. This overhead also happens when the flow is ACK clocked and
cwnd limited instead of being limited by the pacing rate.

This leads to extra overhead (a high number of IRQs).

Now that tcp_wstamp_ns is reserved for the pacing timer only
(after commit "tcp: do not change tcp_wstamp_ns in tcp_mstamp_refresh"),
we can set up the timer only when a packet is about to be sent
and tcp_wstamp_ns is in the future.

This leads to a ~10% performance increase in TCP_RR workloads.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_output.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5474c9854f252e50cdb1136435417873861d7618..d212e4cbc68902e873afb4a12b43b467ccd6069b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -975,16 +975,6 @@ enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
-static void tcp_internal_pacing(struct sock *sk)
-{
-	if (!tcp_needs_internal_pacing(sk))
-		return;
-	hrtimer_start(&tcp_sk(sk)->pacing_timer,
-		      ns_to_ktime(tcp_sk(sk)->tcp_wstamp_ns),
-		      HRTIMER_MODE_ABS_PINNED_SOFT);
-	sock_hold(sk);
-}
-
 static void tcp_update_skb_after_send(struct sock *sk, struct sk_buff *skb,
 				      u64 prior_wstamp)
 {
@@ -1005,8 +995,6 @@ static void tcp_update_skb_after_send(struct sock *sk, struct sk_buff *skb,
 			/* take into account OS jitter */
 			len_ns -= min_t(u64, len_ns / 2, credit);
 			tp->tcp_wstamp_ns += len_ns;
-
-			tcp_internal_pacing(sk);
 		}
 	}
 	list_move_tail(&skb->tcp_tsorted_anchor, &tp->tsorted_sent_queue);
@@ -2186,10 +2174,23 @@ static int tcp_mtu_probe(struct sock *sk)
 	return -1;
 }
 
-static bool tcp_pacing_check(const struct sock *sk)
+static bool tcp_pacing_check(struct sock *sk)
 {
-	return tcp_needs_internal_pacing(sk) &&
-	       hrtimer_is_queued(&tcp_sk(sk)->pacing_timer);
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (!tcp_needs_internal_pacing(sk))
+		return false;
+
+	if (tp->tcp_wstamp_ns <= tp->tcp_clock_cache)
+		return false;
+
+	if (!hrtimer_is_queued(&tp->pacing_timer)) {
+		hrtimer_start(&tp->pacing_timer,
+			      ns_to_ktime(tp->tcp_wstamp_ns),
+			      HRTIMER_MODE_ABS_PINNED_SOFT);
+		sock_hold(sk);
+	}
+	return true;
 }
 
 /* TCP Small Queues :
-- 
2.19.0.605.g01d371f741-goog
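
The new tcp_pacing_check() logic can be modelled in userspace roughly as follows. This is a simplified sketch: `struct demo_sock` and its fields are hypothetical stand-ins for the tcp_sock state, and a counter replaces the real hrtimer_start()/sock_hold() calls. It shows that the timer is armed lazily, only when the next departure time is still in the future, and at most once while it remains queued.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the socket state used by the check. */
struct demo_sock {
	bool needs_internal_pacing;	/* no fq qdisc doing the pacing */
	bool timer_queued;		/* models hrtimer_is_queued() */
	uint64_t wstamp_ns;		/* earliest departure time of next packet */
	uint64_t clock_cache_ns;	/* cached "now" */
	unsigned int timer_starts;	/* how often the timer was armed */
};

/* Return true when the caller must hold the packet back for pacing. */
static bool demo_pacing_check(struct demo_sock *sk)
{
	if (!sk->needs_internal_pacing)
		return false;

	/* Departure time already reached: send now, no timer needed. */
	if (sk->wstamp_ns <= sk->clock_cache_ns)
		return false;

	/* Arm the timer only if it is not already pending. */
	if (!sk->timer_queued) {
		sk->timer_queued = true;
		sk->timer_starts++;
	}
	return true;
}
```

In a request/response workload the next write typically arrives after `wstamp_ns` has passed, so the second branch returns early and no timer is armed at all.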

^ permalink raw reply related
