Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next 2/7] nfp: compile flower vxlan tunnel metadata match fields
From: John Hurley @ 2017-09-26 15:11 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Simon Horman, David Miller, Jakub Kicinski, Linux Netdev List,
	oss-drivers
In-Reply-To: <CAJ3xEMgmC+4V+Yr9PeO1wJ=ACBoPMTrasd7Joyx--F9PjV1yvg@mail.gmail.com>

On Tue, Sep 26, 2017 at 3:12 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Tue, Sep 26, 2017 at 4:58 PM, John Hurley <john.hurley@netronome.com> wrote:
>> On Mon, Sep 25, 2017 at 7:35 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>> On Mon, Sep 25, 2017 at 1:23 PM, Simon Horman
>>> <simon.horman@netronome.com> wrote:
>>>> From: John Hurley <john.hurley@netronome.com>
>>>>
>>>> Compile ovs-tc flower vxlan metadata match fields for offloading. Only
>>>
>>> anything in the npf kernel bits has direct relation to ovs? what?
>>>
>>
>> Sorry, this is a typo  and should refer to TC.
>>
>>>> +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>>>> @@ -52,8 +52,25 @@
>>>>          BIT(FLOW_DISSECTOR_KEY_PORTS) | \
>>>>          BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) | \
>>>>          BIT(FLOW_DISSECTOR_KEY_VLAN) | \
>>>> +        BIT(FLOW_DISSECTOR_KEY_ENC_KEYID) | \
>>>> +        BIT(FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) | \
>>>> +        BIT(FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS) | \
>>>
>>> this series takes care of IPv6 tunnels too?
>>
>> IPv6 is not included in this set.
>> The reason the IPv6 bit is included here is to account for behavior we
>> have noticed in TC flower.
>> If, for example, I add a filter with the following match fields:
>> 'protocol ip flower enc_src_ip 10.0.0.1 enc_dst_ip 10.0.0.2
>> enc_dst_port 4789 enc_key_id 123'
>> The 'used_keys' value in the dissector marks both IPv4 and IPv6 encap
>> addresses as 'used'.
>> I am not sure if this is a bug in TC or that we are expected to check
>> the enc_control fields to determine if IPv4 or v6 addresses are used.
>
> you should have your code to check enc_control->addr_type to be
> FLOW_DISSECTOR_KEY_IPV4_ADDRS or IPV6_ADDRS
>
>
>> Including the IPv6 used_keys bit in our whitelist approach allows us
>> to accept legitimate IPv4 tunnel rules in these situations.
>
> mmm can please take a look on fl_init_dissector() and tell me if you
> see why FLOW_DISSECTOR_KEY_IPV6_ADDRS is set for ipv4 tunnels,
> I am not sure.


The fl_init_dissector uses the FL_KEY_SET_IF_MASKED macro to set an
array of keys which are then translated to the used_keys values.
The FL_KEY_SET_IF_MASKED takes a 'struct fl_flow_key' as input and
checks if any mask bits are set in a particular field - if so it
eventually marks it as used.
In struct fl_flow_key, the encap ipv4 and ipv6 addresses are
represented as a union of the 2.
Therefore, if we have masked bits set for IPv4, they are also being
set for the IPv6 field.


>
>> If it is found to be IPv6 when the rule is parsed, it will be rejected here.

^ permalink raw reply

* [PATCH v2] p54: don't unregister leds when they are not initialized
From: Andrey Konovalov @ 2017-09-26 15:11 UTC (permalink / raw)
  To: Christian Lamparter, Kalle Valo, linux-wireless, netdev,
	linux-kernel
  Cc: Dmitry Vyukov, Kostya Serebryany, Andrey Konovalov

ieee80211_register_hw() in p54_register_common() may fail and leds won't
get initialized. Currently p54_unregister_common() doesn't check that and
always calls p54_unregister_leds(). The fix is to check priv->registered
flag before calling p54_unregister_leds().

Found by syzkaller.

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 1 PID: 1404 Comm: kworker/1:1 Not tainted
4.14.0-rc1-42251-gebb2c2437d80-dirty #205
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
 __dump_stack lib/dump_stack.c:16
 dump_stack+0x292/0x395 lib/dump_stack.c:52
 register_lock_class+0x6c4/0x1a00 kernel/locking/lockdep.c:769
 __lock_acquire+0x27e/0x4550 kernel/locking/lockdep.c:3385
 lock_acquire+0x259/0x620 kernel/locking/lockdep.c:4002
 flush_work+0xf0/0x8c0 kernel/workqueue.c:2886
 __cancel_work_timer+0x51d/0x870 kernel/workqueue.c:2961
 cancel_delayed_work_sync+0x1f/0x30 kernel/workqueue.c:3081
 p54_unregister_leds+0x6c/0xc0 drivers/net/wireless/intersil/p54/led.c:160
 p54_unregister_common+0x3d/0xb0 drivers/net/wireless/intersil/p54/main.c:856
 p54u_disconnect+0x86/0x120 drivers/net/wireless/intersil/p54/p54usb.c:1073
 usb_unbind_interface+0x21c/0xa90 drivers/usb/core/driver.c:423
 __device_release_driver drivers/base/dd.c:861
 device_release_driver_internal+0x4f4/0x5c0 drivers/base/dd.c:893
 device_release_driver+0x1e/0x30 drivers/base/dd.c:918
 bus_remove_device+0x2f4/0x4b0 drivers/base/bus.c:565
 device_del+0x5c4/0xab0 drivers/base/core.c:1985
 usb_disable_device+0x1e9/0x680 drivers/usb/core/message.c:1170
 usb_disconnect+0x260/0x7a0 drivers/usb/core/hub.c:2124
 hub_port_connect drivers/usb/core/hub.c:4754
 hub_port_connect_change drivers/usb/core/hub.c:5009
 port_event drivers/usb/core/hub.c:5115
 hub_event+0x1318/0x3740 drivers/usb/core/hub.c:5195
 process_one_work+0xc7f/0x1db0 kernel/workqueue.c:2119
 process_scheduled_works kernel/workqueue.c:2179
 worker_thread+0xb2b/0x1850 kernel/workqueue.c:2255
 kthread+0x3a1/0x470 kernel/kthread.c:231
 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---

changes in v2:
- fixed typo in patch subject

---
 drivers/net/wireless/intersil/p54/main.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/intersil/p54/main.c b/drivers/net/wireless/intersil/p54/main.c
index d5a3bf91a03e..ab6d39e12069 100644
--- a/drivers/net/wireless/intersil/p54/main.c
+++ b/drivers/net/wireless/intersil/p54/main.c
@@ -852,12 +852,11 @@ void p54_unregister_common(struct ieee80211_hw *dev)
 {
 	struct p54_common *priv = dev->priv;

-#ifdef CONFIG_P54_LEDS
-	p54_unregister_leds(priv);
-#endif /* CONFIG_P54_LEDS */
-
 	if (priv->registered) {
 		priv->registered = false;
+#ifdef CONFIG_P54_LEDS
+		p54_unregister_leds(priv);
+#endif /* CONFIG_P54_LEDS */
 		ieee80211_unregister_hw(dev);
 	}

-- 
2.14.1.821.g8fa685d3b7-goog

^ permalink raw reply related

* Re: [PATCH] p54: don't unregister leds when they are inited
From: Andrey Konovalov @ 2017-09-26 15:12 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Christian Lamparter, Kalle Valo, linux-wireless, netdev, LKML,
	Dmitry Vyukov, Kostya Serebryany
In-Reply-To: <1506438516.22427.21.camel@sipsolutions.net>

On Tue, Sep 26, 2017 at 5:08 PM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> Subject should say *not* initialized?

Yes, sent v2.

>
> johannes

^ permalink raw reply

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
From: Eric Dumazet @ 2017-09-26 15:13 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Dmitry Torokhov, David S . Miller, netdev, Eric W . Biederman,
	Eric Dumazet, Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha
In-Reply-To: <266d30e7-5164-48e8-b802-56bb93558823@mellanox.com>

On Tue, Sep 26, 2017 at 8:04 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>
>
> On 26/09/2017 3:51 PM, Eric Dumazet wrote:
>>
>> On Tue, Sep 26, 2017 at 4:21 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>>>
>>>
>>> Hi Eric,
>>>
>>> We see a regression introduced in this series, specifically in the
>>> patches
>>> touching lib/kobject_uevent.c.
>>> We tried to figure out what is wrong there, but couldn't point it out.
>>>
>>> Bug is that mlx4 driver restart fails, because mlx4_core is still in use.
>>> According to module dependencies, both mlx4_en and mlx4_ib should have
>>> been
>>> unloaded at this point
>>> Please see log below.
>>>
>>> This looks to be some kind of a race, as the repro is not deterministic.
>>> Probably the en/ib modules are now mistakenly reloaded.
>>>
>>> Any idea what could this be?
>>>
>>> Regards,
>>> Tariq
>>>
>>>
>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>>> Unloading HCA driver:                                      [  OK  ]
>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd start
>>> Loading HCA driver and Access Layer:                       [  OK  ]
>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>>> Unloading mlx4_core                                        [FAILED]
>>> rmmod: ERROR: Module mlx4_core is in use
>>
>> I have absolutely no idea. Please bisect.
>
> We previously saw a similar issue, that was reported in mailing list.
> Dmitry Torokhov suggested the following fix:
> https://lkml.org/lkml/2017/9/12/523
>
> And indeed, it solved the issue.
>
> We kept the suggested patch in our internal branch, and rebased.
> Issue appeared again once your series was accepted.
>
> By bisecting, we see that the issue re-appears in this patch:
> 4a336a23d619 kobject: copy env blob in one go
>
>>
>> Are you really using netns in the first place ?
>
> No. But seems like it still affects the modules load/unload.
>
> Regards,
> Tariq

Ah this makes sense now.

Dmitry Torokhov hack breaks the assumption I used in my patch.

Since it is not upstream yet, I believe that it will need more work
before being in a proper state.

Thanks.

^ permalink raw reply

* [PATCH][next] cxgb4: make function ch_flower_stats_cb, fixes warning
From: Colin King @ 2017-09-26 15:14 UTC (permalink / raw)
  To: Ganesh Goudar, netdev; +Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

The function ch_flower_stats_cb is local to the source and does not need
to be in global scope, so make it static.

Cleans up sparse warnings:
symbol 'ch_flower_stats_cb' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
index a36bd66d2834..92a311767381 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
@@ -366,7 +366,7 @@ int cxgb4_tc_flower_destroy(struct net_device *dev,
 	return ret;
 }
 
-void ch_flower_stats_cb(unsigned long data)
+static void ch_flower_stats_cb(unsigned long data)
 {
 	struct adapter *adap = (struct adapter *)data;
 	struct ch_tc_flower_entry *flower_entry;
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH v2 2/2] ip_tunnel: add mpls over gre encapsulation
From: Roopa Prabhu @ 2017-09-26 15:15 UTC (permalink / raw)
  To: Amine Kherbouche; +Cc: netdev@vger.kernel.org, xeb, David Lamparter
In-Reply-To: <c40295d24ec5207f5be695a2f888bfa840e2ef2c.1506416988.git.amine.kherbouche@6wind.com>

On Tue, Sep 26, 2017 at 2:22 AM, Amine Kherbouche
<amine.kherbouche@6wind.com> wrote:
> This commit introduces the MPLSoGRE support (RFC 4023), using ip tunnel
> API.
>
> Encap:
>   - Add a new iptunnel type mpls.
>   - Share tx path: gre type mpls loaded from skb->protocol.
>
> Decap:
>   - pull gre hdr and call mpls_forward().
>
> Signed-off-by: Amine Kherbouche <amine.kherbouche@6wind.com>
> ---
>  include/net/gre.h              |  3 +++
>  include/uapi/linux/if_tunnel.h |  1 +
>  net/ipv4/gre_demux.c           | 22 ++++++++++++++++++++++
>  net/ipv4/ip_gre.c              |  9 +++++++++
>  net/ipv6/ip6_gre.c             |  7 +++++++
>  net/mpls/af_mpls.c             | 40 ++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 82 insertions(+)
>
> diff --git a/include/net/gre.h b/include/net/gre.h
> index d25d836..88a8343 100644
> --- a/include/net/gre.h
> +++ b/include/net/gre.h
> @@ -35,6 +35,9 @@ struct net_device *gretap_fb_dev_create(struct net *net, const char *name,
>                                        u8 name_assign_type);
>  int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
>                      bool *csum_err, __be16 proto, int nhs);
> +#if IS_ENABLED(CONFIG_MPLS)
> +int mpls_gre_rcv(struct sk_buff *skb, int gre_hdr_len);
> +#endif
>
>  static inline int gre_calc_hlen(__be16 o_flags)
>  {
> diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
> index 2e52088..a2f48c0 100644
> --- a/include/uapi/linux/if_tunnel.h
> +++ b/include/uapi/linux/if_tunnel.h
> @@ -84,6 +84,7 @@ enum tunnel_encap_types {
>         TUNNEL_ENCAP_NONE,
>         TUNNEL_ENCAP_FOU,
>         TUNNEL_ENCAP_GUE,
> +       TUNNEL_ENCAP_MPLS,
>  };
>
>  #define TUNNEL_ENCAP_FLAG_CSUM         (1<<0)
> diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c
> index b798862..a6a937e 100644
> --- a/net/ipv4/gre_demux.c
> +++ b/net/ipv4/gre_demux.c
> @@ -23,6 +23,9 @@
>  #include <linux/netdevice.h>
>  #include <linux/if_tunnel.h>
>  #include <linux/spinlock.h>
> +#if IS_ENABLED(CONFIG_MPLS)
> +#include <linux/mpls.h>
> +#endif
>  #include <net/protocol.h>
>  #include <net/gre.h>
>
> @@ -122,6 +125,25 @@ int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
>  }
>  EXPORT_SYMBOL(gre_parse_header);
>
> +#if IS_ENABLED(CONFIG_MPLS)
> +int mpls_gre_rcv(struct sk_buff *skb, int gre_hdr_len)
> +{
> +       if (unlikely(!pskb_may_pull(skb, gre_hdr_len)))
> +               goto drop;
> +
> +       /* Pop GRE hdr and reset the skb */
> +       skb_pull(skb, gre_hdr_len);
> +       skb_reset_network_header(skb);
> +
> +       mpls_forward(skb, skb->dev, NULL, NULL);

pls check return value

can mpls_gre_rcv be moved to af_mpls.c ?


> +
> +       return 0;
> +drop:
> +       return NET_RX_DROP;
> +}
> +EXPORT_SYMBOL(mpls_gre_rcv);
> +#endif
> +
>  static int gre_rcv(struct sk_buff *skb)
>  {
>         const struct gre_protocol *proto;
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index 9cee986..dd4431c 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -412,10 +412,19 @@ static int gre_rcv(struct sk_buff *skb)
>                         return 0;
>         }
>
> +#if IS_ENABLED(CONFIG_MPLS)
> +       if (unlikely(tpi.proto == htons(ETH_P_MPLS_UC))) {
> +               if (mpls_gre_rcv(skb, hdr_len))
> +                       goto drop;
> +               return 0;
> +       }
> +#endif
> +
>         if (ipgre_rcv(skb, &tpi, hdr_len) == PACKET_RCVD)
>                 return 0;
>
>         icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
> +

unnecessary new line..

>  drop:
>         kfree_skb(skb);
>         return 0;
> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index c82d41e..e52396d 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -476,6 +476,13 @@ static int gre_rcv(struct sk_buff *skb)
>         if (hdr_len < 0)
>                 goto drop;
>
> +#if IS_ENABLED(CONFIG_MPLS)
> +       if (unlikely(tpi.proto == htons(ETH_P_MPLS_UC))) {
> +               if (mpls_gre_rcv(skb, hdr_len))
> +                       goto drop;
> +               return 0;
> +       }

+newline

also would be nice if the IS_ENABLED could be moved to around mpls_gre_rcv.


> +#endif
>         if (iptunnel_pull_header(skb, hdr_len, tpi.proto, false))
>                 goto drop;
>
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 36ea2ad..5505074 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -16,6 +16,7 @@
>  #include <net/arp.h>
>  #include <net/ip_fib.h>
>  #include <net/netevent.h>
> +#include <net/ip_tunnels.h>
>  #include <net/netns/generic.h>
>  #if IS_ENABLED(CONFIG_IPV6)
>  #include <net/ipv6.h>
> @@ -39,6 +40,40 @@ static int one = 1;
>  static int label_limit = (1 << 20) - 1;
>  static int ttl_max = 255;
>
> +size_t ipgre_mpls_encap_hlen(struct ip_tunnel_encap *e)
> +{
> +       return sizeof(struct mpls_shim_hdr);
> +}
> +
> +int ipgre_mpls_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
> +                           u8 *protocol, struct flowi4 *fl4)
> +{
> +       return 0;
> +}
> +
> +static const struct ip_tunnel_encap_ops mpls_iptun_ops = {
> +       .encap_hlen     = ipgre_mpls_encap_hlen,
> +       .build_header   = ipgre_mpls_build_header,

There are checks for build header before calling it in iptunnel code,
so, any reason
you can't skip declaring .build_header ?


> +};
> +
> +static int ipgre_tunnel_encap_add_mpls_ops(void)
> +{
> +       int ret = -1;
> +
> +#if IS_ENABLED(CONFIG_NET_IP_TUNNEL)
> +       ret = ip_tunnel_encap_add_ops(&mpls_iptun_ops, TUNNEL_ENCAP_MPLS);
> +#endif
> +
> +       return ret;
> +}
> +
> +static void ipgre_tunnel_encap_del_mpls_ops(void)
> +{
> +#if IS_ENABLED(CONFIG_NET_IP_TUNNEL)
> +       ip_tunnel_encap_del_ops(&mpls_iptun_ops, TUNNEL_ENCAP_MPLS);
> +#endif
> +}
> +
>  static void rtmsg_lfib(int event, u32 label, struct mpls_route *rt,
>                        struct nlmsghdr *nlh, struct net *net, u32 portid,
>                        unsigned int nlm_flags);
> @@ -2486,6 +2521,10 @@ static int __init mpls_init(void)
>                       0);
>         rtnl_register(PF_MPLS, RTM_GETNETCONF, mpls_netconf_get_devconf,
>                       mpls_netconf_dump_devconf, 0);
> +       err = ipgre_tunnel_encap_add_mpls_ops();
> +       if (err)
> +               pr_err("Can't add mpls over gre tunnel ops\n");
> +

This will throw an error  if CONFIG_NET_IP_TUNNEL is not enabled.
Can you pls put the CONFIG_NET_IP_TUNNEL around
ipgre_tunnel_encap_add_mpls_ops ?
see CONFIG_INET checks in the rest of af_mpls as example.
same for del_ops


>         err = 0;
>  out:
>         return err;
> @@ -2503,6 +2542,7 @@ static void __exit mpls_exit(void)
>         dev_remove_pack(&mpls_packet_type);
>         unregister_netdevice_notifier(&mpls_dev_notifier);
>         unregister_pernet_subsys(&mpls_net_ops);
> +       ipgre_tunnel_encap_del_mpls_ops();
>  }
>  module_exit(mpls_exit);
>
> --
> 2.1.4
>

^ permalink raw reply

* Re: [PATCH V2] r8152: add Linksys USB3GIGV1 id
From: Doug Anderson @ 2017-09-26 15:19 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Hayes Wang, linux-usb, David S . Miller, LKML, netdev,
	Greg Kroah-Hartman
In-Reply-To: <20170926010925.114436-1-grundler@chromium.org>

Hi

On Mon, Sep 25, 2017 at 6:09 PM, Grant Grundler <grundler@chromium.org> wrote:
> This linksys dongle by default comes up in cdc_ether mode.
> This patch allows r8152 to claim the device:
>    Bus 002 Device 002: ID 13b1:0041 Linksys
>
> Signed-off-by: Grant Grundler <grundler@chromium.org>
> ---
>  drivers/net/usb/cdc_ether.c | 8 ++++++++
>  drivers/net/usb/r8152.c     | 2 ++
>  2 files changed, 10 insertions(+)
>
> V2: add LINKSYS_VENDOR_ID to cdc_ether blacklist

I understand that in v1 people pointed out that if we didn't add this
to the cdc_ether blacklist that we might end up picking the wrong
driver.  ...but one thing concerns me: what happens if someone has the
CDC_ETHER driver configured in their system but _not_ the R8152
driver?  All of a sudden this USB Ethernet adapter which used to work
fine with the CDC Ethernet driver will stop working.

I know that for at least some of the adapters in the CDC Ethernet
blacklist it was claimed that the CDC Ethernet support in the adapter
was kinda broken anyway so the blacklist made sense.  ...but for the
Linksys Gigabit adapter the CDC Ethernet driver seems to work OK, it's
just not quite as full featured / efficient as the R8152 driver.

Is that not a concern?  I guess you could tell people in this
situation that they simply need to enable the R8152 driver to get
continued support for their Ethernet adapter?

-Doug

^ permalink raw reply

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
From: Dmitry Torokhov @ 2017-09-26 15:22 UTC (permalink / raw)
  To: Eric Dumazet, Tariq Toukan
  Cc: David S . Miller, netdev, Eric W . Biederman, Eric Dumazet,
	Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha
In-Reply-To: <CANn89iJ4io_gX-V8A4hE04cH=czGZYs8WGBOpaiBpWMMHOgVVw@mail.gmail.com>

On September 26, 2017 8:13:21 AM PDT, Eric Dumazet <edumazet@google.com> wrote:
>On Tue, Sep 26, 2017 at 8:04 AM, Tariq Toukan <tariqt@mellanox.com>
>wrote:
>>
>>
>> On 26/09/2017 3:51 PM, Eric Dumazet wrote:
>>>
>>> On Tue, Sep 26, 2017 at 4:21 AM, Tariq Toukan <tariqt@mellanox.com>
>wrote:
>>>>
>>>>
>>>> Hi Eric,
>>>>
>>>> We see a regression introduced in this series, specifically in the
>>>> patches
>>>> touching lib/kobject_uevent.c.
>>>> We tried to figure out what is wrong there, but couldn't point it
>out.
>>>>
>>>> Bug is that mlx4 driver restart fails, because mlx4_core is still
>in use.
>>>> According to module dependencies, both mlx4_en and mlx4_ib should
>have
>>>> been
>>>> unloaded at this point
>>>> Please see log below.
>>>>
>>>> This looks to be some kind of a race, as the repro is not
>deterministic.
>>>> Probably the en/ib modules are now mistakenly reloaded.
>>>>
>>>> Any idea what could this be?
>>>>
>>>> Regards,
>>>> Tariq
>>>>
>>>>
>>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>>>> Unloading HCA driver:                                      [  OK  ]
>>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd start
>>>> Loading HCA driver and Access Layer:                       [  OK  ]
>>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>>>> Unloading mlx4_core                                        [FAILED]
>>>> rmmod: ERROR: Module mlx4_core is in use
>>>
>>> I have absolutely no idea. Please bisect.
>>
>> We previously saw a similar issue, that was reported in mailing list.
>> Dmitry Torokhov suggested the following fix:
>> https://lkml.org/lkml/2017/9/12/523
>>
>> And indeed, it solved the issue.
>>
>> We kept the suggested patch in our internal branch, and rebased.
>> Issue appeared again once your series was accepted.
>>
>> By bisecting, we see that the issue re-appears in this patch:
>> 4a336a23d619 kobject: copy env blob in one go
>>
>>>
>>> Are you really using netns in the first place ?
>>
>> No. But seems like it still affects the modules load/unload.
>>
>> Regards,
>> Tariq
>
>Ah this makes sense now.
>
>Dmitry Torokhov hack breaks the assumption I used in my patch.
>
>Since it is not upstream yet, I believe that it will need more work
>before being in a proper state.

It is in Greg's tree where all kobject patches should go through as far as I know.


Thanks.

-- 
Dmitry

^ permalink raw reply

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
From: Tariq Toukan @ 2017-09-26 15:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Dmitry Torokhov, David S . Miller, netdev, Eric W . Biederman,
	Eric Dumazet, Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha
In-Reply-To: <CANn89iJ4io_gX-V8A4hE04cH=czGZYs8WGBOpaiBpWMMHOgVVw@mail.gmail.com>



On 26/09/2017 6:13 PM, Eric Dumazet wrote:
> On Tue, Sep 26, 2017 at 8:04 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>>
>> On 26/09/2017 3:51 PM, Eric Dumazet wrote:
>>> On Tue, Sep 26, 2017 at 4:21 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>>>>
>>>> Hi Eric,
>>>>
>>>> We see a regression introduced in this series, specifically in the
>>>> patches
>>>> touching lib/kobject_uevent.c.
>>>> We tried to figure out what is wrong there, but couldn't point it out.
>>>>
>>>> Bug is that mlx4 driver restart fails, because mlx4_core is still in use.
>>>> According to module dependencies, both mlx4_en and mlx4_ib should have
>>>> been
>>>> unloaded at this point
>>>> Please see log below.
>>>>
>>>> This looks to be some kind of a race, as the repro is not deterministic.
>>>> Probably the en/ib modules are now mistakenly reloaded.
>>>>
>>>> Any idea what could this be?
>>>>
>>>> Regards,
>>>> Tariq
>>>>
>>>>
>>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>>>> Unloading HCA driver:                                      [  OK  ]
>>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd start
>>>> Loading HCA driver and Access Layer:                       [  OK  ]
>>>> [root@reg-l-vrt-41016-009 ~]# /etc/init.d/openibd stop
>>>> Unloading mlx4_core                                        [FAILED]
>>>> rmmod: ERROR: Module mlx4_core is in use
>>> I have absolutely no idea. Please bisect.
>> We previously saw a similar issue, that was reported in mailing list.
>> Dmitry Torokhov suggested the following fix:
>> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.org%2Flkml%2F2017%2F9%2F12%2F523&data=02%7C01%7Ctariqt%40mellanox.com%7C4a275c766aeb4224376e08d504f12193%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636420356043309380&sdata=GGeDFkX277R%2BKShsUPsePoAD6p5yaO2v0CteABtCrcY%3D&reserved=0
>>
>> And indeed, it solved the issue.
>>
>> We kept the suggested patch in our internal branch, and rebased.
>> Issue appeared again once your series was accepted.
>>
>> By bisecting, we see that the issue re-appears in this patch:
>> 4a336a23d619 kobject: copy env blob in one go
>>
>>> Are you really using netns in the first place ?
>> No. But seems like it still affects the modules load/unload.
>>
>> Regards,
>> Tariq
> Ah this makes sense now.
>
> Dmitry Torokhov hack breaks the assumption I used in my patch.
>
> Since it is not upstream yet, I believe that it will need more work
> before being in a proper state.
>
> Thanks.
I see. Thanks for the clarification.
I guess we'll keep only one patch for now, until issues are resolved.

Regards.

^ permalink raw reply

* Re: [PATCH v2 net-next 0/7] net: speedup netns create/delete time
From: Eric Dumazet @ 2017-09-26 15:30 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Tariq Toukan, David S . Miller, netdev, Eric W . Biederman,
	Eric Dumazet, Majd Dibbiny, Yonatan Cohen, Eran Ben Elisha
In-Reply-To: <33E69A03-6594-423A-86E6-02029046BE7D@gmail.com>

On Tue, Sep 26, 2017 at 8:22 AM, Dmitry Torokhov
<dmitry.torokhov@gmail.com> wrote:

> It is in Greg's tree where all kobject patches should go through as far as I know.

Yes, I will fix this, adding a second memmove()

^ permalink raw reply

* [PATCH v2 net-next 0/2] bpf/verifier: disassembly improvements
From: Edward Cree @ 2017-09-26 15:32 UTC (permalink / raw)
  To: davem; +Cc: netdev, daniel, alexei.starovoitov, ys114321

Fix the output of print_bpf_insn() for ALU ops that don't look like
 compound assignment (i.e. BPF_END and BPF_NEG).

Sample output for a short test program:
0: (b4) (u32) r0 = (u32) 0
1: (dc) r0 = be32 r0
2: (84) r0 = (u32) -r0
3: (95) exit
processed 4 insns, stack depth 0

Edward Cree (2):
  bpf/verifier: improve disassembly of BPF_END instructions
  bpf/verifier: improve disassembly of BPF_NEG instructions

 kernel/bpf/verifier.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

^ permalink raw reply

* Re: [PATCH net-next 2/7] nfp: compile flower vxlan tunnel metadata match fields
From: Or Gerlitz @ 2017-09-26 15:33 UTC (permalink / raw)
  To: John Hurley
  Cc: Simon Horman, David Miller, Jakub Kicinski, Linux Netdev List,
	oss-drivers
In-Reply-To: <CAK+XE=kBG9761tN6uC3ziiqP-jrAjLJ_3J7M-cyfKvTqHVBk7A@mail.gmail.com>

On Tue, Sep 26, 2017 at 6:11 PM, John Hurley <john.hurley@netronome.com> wrote:
> On Tue, Sep 26, 2017 at 3:12 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>> On Tue, Sep 26, 2017 at 4:58 PM, John Hurley <john.hurley@netronome.com> wrote:
>>> On Mon, Sep 25, 2017 at 7:35 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>> On Mon, Sep 25, 2017 at 1:23 PM, Simon Horman
>>>> <simon.horman@netronome.com> wrote:
>>>>> From: John Hurley <john.hurley@netronome.com>
>>>>>
>>>>> Compile ovs-tc flower vxlan metadata match fields for offloading. Only
>>>>
>>>> anything in the npf kernel bits has direct relation to ovs? what?
>>>>
>>>
>>> Sorry, this is a typo  and should refer to TC.
>>>
>>>>> +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>>>>> @@ -52,8 +52,25 @@
>>>>>          BIT(FLOW_DISSECTOR_KEY_PORTS) | \
>>>>>          BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) | \
>>>>>          BIT(FLOW_DISSECTOR_KEY_VLAN) | \
>>>>> +        BIT(FLOW_DISSECTOR_KEY_ENC_KEYID) | \
>>>>> +        BIT(FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) | \
>>>>> +        BIT(FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS) | \
>>>>
>>>> this series takes care of IPv6 tunnels too?
>>>
>>> IPv6 is not included in this set.
>>> The reason the IPv6 bit is included here is to account for behavior we
>>> have noticed in TC flower.
>>> If, for example, I add a filter with the following match fields:
>>> 'protocol ip flower enc_src_ip 10.0.0.1 enc_dst_ip 10.0.0.2
>>> enc_dst_port 4789 enc_key_id 123'
>>> The 'used_keys' value in the dissector marks both IPv4 and IPv6 encap
>>> addresses as 'used'.
>>> I am not sure if this is a bug in TC or that we are expected to check
>>> the enc_control fields to determine if IPv4 or v6 addresses are used.
>>
>> you should have your code to check enc_control->addr_type to be
>> FLOW_DISSECTOR_KEY_IPV4_ADDRS or IPV6_ADDRS
>>
>>
>>> Including the IPv6 used_keys bit in our whitelist approach allows us
>>> to accept legitimate IPv4 tunnel rules in these situations.
>>
>> mmm can please take a look on fl_init_dissector() and tell me if you
>> see why FLOW_DISSECTOR_KEY_IPV6_ADDRS is set for ipv4 tunnels,
>> I am not sure.
>
>
> The fl_init_dissector uses the FL_KEY_SET_IF_MASKED macro to set an
> array of keys which are then translated to the used_keys values.
> The FL_KEY_SET_IF_MASKED takes a 'struct fl_flow_key' as input and
> checks if any mask bits are set in a particular field - if so it
> eventually marks it as used.
> In struct fl_flow_key, the encap ipv4 and ipv6 addresses are
> represented as a union of the 2.
> Therefore, if we have masked bits set for IPv4, they are also being
> set for the IPv6 field.

I see, do you consider it a bug?

^ permalink raw reply

* [PATCH v2 net-next 1/2] bpf/verifier: improve disassembly of BPF_END instructions
From: Edward Cree @ 2017-09-26 15:35 UTC (permalink / raw)
  To: davem; +Cc: netdev, daniel, alexei.starovoitov, ys114321
In-Reply-To: <52270348-67f1-4e7a-cd2f-9d611ae94064@solarflare.com>

print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a
 different structure: it has a size in insn->imm (even if it's BPF_X) and
 uses the BPF_SRC (X or K) to indicate which endianness to use.  So it
 needs different code to print it.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 kernel/bpf/verifier.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 799b245..3aaa3262 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -325,26 +325,40 @@ static const char *const bpf_jmp_string[16] = {
 	[BPF_EXIT >> 4] = "exit",
 };
 
+static void print_bpf_end_insn(const struct bpf_verifier_env *env,
+			       const struct bpf_insn *insn)
+{
+	verbose("(%02x) r%d = %s%d r%d\n", insn->code, insn->dst_reg,
+		BPF_SRC(insn->code) == BPF_TO_BE ? "be" : "le",
+		insn->imm, insn->dst_reg);
+}
+
 static void print_bpf_insn(const struct bpf_verifier_env *env,
 			   const struct bpf_insn *insn)
 {
 	u8 class = BPF_CLASS(insn->code);
 
 	if (class == BPF_ALU || class == BPF_ALU64) {
-		if (BPF_SRC(insn->code) == BPF_X)
+		if (BPF_OP(insn->code) == BPF_END) {
+			if (class == BPF_ALU64)
+				verbose("BUG_alu64_%02x\n", insn->code);
+			else
+				print_bpf_end_insn(env, insn);
+		} else if (BPF_SRC(insn->code) == BPF_X) {
 			verbose("(%02x) %sr%d %s %sr%d\n",
 				insn->code, class == BPF_ALU ? "(u32) " : "",
 				insn->dst_reg,
 				bpf_alu_string[BPF_OP(insn->code) >> 4],
 				class == BPF_ALU ? "(u32) " : "",
 				insn->src_reg);
-		else
+		} else {
 			verbose("(%02x) %sr%d %s %s%d\n",
 				insn->code, class == BPF_ALU ? "(u32) " : "",
 				insn->dst_reg,
 				bpf_alu_string[BPF_OP(insn->code) >> 4],
 				class == BPF_ALU ? "(u32) " : "",
 				insn->imm);
+		}
 	} else if (class == BPF_STX) {
 		if (BPF_MODE(insn->code) == BPF_MEM)
 			verbose("(%02x) *(%s *)(r%d %+d) = r%d\n",

^ permalink raw reply related

* [PATCH net-next 0/2] tools: add bpftool
From: Jakub Kicinski @ 2017-09-26 15:35 UTC (permalink / raw)
  To: netdev
  Cc: daniel, alexei.starovoitov, davem, hannes, dsahern, oss-drivers,
	Jakub Kicinski

Hi!

I'm looking for a home for bpftool, Daniel suggested that 
tools/net could be a good place, since there are only BPF
utilities there already.

The tool should be complete for simple use cases and we
will continue extending it as we go along.  E.g. providing
disassembly of loaded programs directly using LLVM library
and JSON output are high on the priority list.

The first patch renames tools/net to tools/bpf, while the
second one adds the new code.


Jakub Kicinski (2):
  tools: rename tools/net directory to tools/bpf
  tools: bpf: add bpftool

 MAINTAINERS                         |   3 +-
 tools/Makefile                      |  20 +-
 tools/{net => bpf}/Makefile         |  18 +-
 tools/{net => bpf}/bpf_asm.c        |   0
 tools/{net => bpf}/bpf_dbg.c        |   0
 tools/{net => bpf}/bpf_exp.l        |   0
 tools/{net => bpf}/bpf_exp.y        |   0
 tools/{net => bpf}/bpf_jit_disasm.c |   0
 tools/bpf/bpftool/Makefile          |  80 ++++
 tools/bpf/bpftool/common.c          | 214 +++++++++++
 tools/bpf/bpftool/jit_disasm.c      |  83 ++++
 tools/bpf/bpftool/main.c            | 212 +++++++++++
 tools/bpf/bpftool/main.h            |  99 +++++
 tools/bpf/bpftool/map.c             | 728 ++++++++++++++++++++++++++++++++++++
 tools/bpf/bpftool/prog.c            | 392 +++++++++++++++++++
 15 files changed, 1834 insertions(+), 15 deletions(-)
 rename tools/{net => bpf}/Makefile (73%)
 rename tools/{net => bpf}/bpf_asm.c (100%)
 rename tools/{net => bpf}/bpf_dbg.c (100%)
 rename tools/{net => bpf}/bpf_exp.l (100%)
 rename tools/{net => bpf}/bpf_exp.y (100%)
 rename tools/{net => bpf}/bpf_jit_disasm.c (100%)
 create mode 100644 tools/bpf/bpftool/Makefile
 create mode 100644 tools/bpf/bpftool/common.c
 create mode 100644 tools/bpf/bpftool/jit_disasm.c
 create mode 100644 tools/bpf/bpftool/main.c
 create mode 100644 tools/bpf/bpftool/main.h
 create mode 100644 tools/bpf/bpftool/map.c
 create mode 100644 tools/bpf/bpftool/prog.c

-- 
2.14.1

^ permalink raw reply

* [PATCH net-next 1/2] tools: rename tools/net directory to tools/bpf
From: Jakub Kicinski @ 2017-09-26 15:35 UTC (permalink / raw)
  To: netdev
  Cc: daniel, alexei.starovoitov, davem, hannes, dsahern, oss-drivers,
	Jakub Kicinski
In-Reply-To: <20170926153522.31500-1-jakub.kicinski@netronome.com>

We currently only have BPF tools in the tools/net directory.
We are about to add more BPF tools there, not necessarily
networking related, rename the directory and related Makefile
targets to bpf.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 MAINTAINERS                         |  3 +--
 tools/Makefile                      | 14 +++++++-------
 tools/{net => bpf}/Makefile         |  0
 tools/{net => bpf}/bpf_asm.c        |  0
 tools/{net => bpf}/bpf_dbg.c        |  0
 tools/{net => bpf}/bpf_exp.l        |  0
 tools/{net => bpf}/bpf_exp.y        |  0
 tools/{net => bpf}/bpf_jit_disasm.c |  0
 8 files changed, 8 insertions(+), 9 deletions(-)
 rename tools/{net => bpf}/Makefile (100%)
 rename tools/{net => bpf}/bpf_asm.c (100%)
 rename tools/{net => bpf}/bpf_dbg.c (100%)
 rename tools/{net => bpf}/bpf_exp.l (100%)
 rename tools/{net => bpf}/bpf_exp.y (100%)
 rename tools/{net => bpf}/bpf_jit_disasm.c (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6671f375f7fc..2f79b94a41ec 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2725,7 +2725,7 @@ F:	net/core/filter.c
 F:	net/sched/act_bpf.c
 F:	net/sched/cls_bpf.c
 F:	samples/bpf/
-F:	tools/net/bpf*
+F:	tools/bpf/
 F:	tools/testing/selftests/bpf/
 
 BROADCOM B44 10/100 ETHERNET DRIVER
@@ -9416,7 +9416,6 @@ F:	include/uapi/linux/in.h
 F:	include/uapi/linux/net.h
 F:	include/uapi/linux/netdevice.h
 F:	include/uapi/linux/net_namespace.h
-F:	tools/net/
 F:	tools/testing/selftests/net/
 F:	lib/random32.c
 
diff --git a/tools/Makefile b/tools/Makefile
index 9dfede37c8ff..df6fcb293fbc 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -19,7 +19,7 @@ include scripts/Makefile.include
 	@echo '  kvm_stat               - top-like utility for displaying kvm statistics'
 	@echo '  leds                   - LEDs  tools'
 	@echo '  liblockdep             - user-space wrapper for kernel locking-validator'
-	@echo '  net                    - misc networking tools'
+	@echo '  bpf                    - misc BPF tools'
 	@echo '  perf                   - Linux performance measurement and analysis tool'
 	@echo '  selftests              - various kernel selftests'
 	@echo '  spi                    - spi tools'
@@ -57,7 +57,7 @@ acpi: FORCE
 cpupower: FORCE
 	$(call descend,power/$@)
 
-cgroup firewire hv guest spi usb virtio vm net iio gpio objtool leds: FORCE
+cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds: FORCE
 	$(call descend,$@)
 
 liblockdep: FORCE
@@ -91,7 +91,7 @@ kvm_stat: FORCE
 
 all: acpi cgroup cpupower gpio hv firewire liblockdep \
 		perf selftests spi turbostat usb \
-		virtio vm net x86_energy_perf_policy \
+		virtio vm bpf x86_energy_perf_policy \
 		tmon freefall iio objtool kvm_stat
 
 acpi_install:
@@ -100,7 +100,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \
 cpupower_install:
 	$(call descend,power/$(@:_install=),install)
 
-cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install net_install objtool_install:
+cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install:
 	$(call descend,$(@:_install=),install)
 
 liblockdep_install:
@@ -124,7 +124,7 @@ all: acpi cgroup cpupower gpio hv firewire liblockdep \
 install: acpi_install cgroup_install cpupower_install gpio_install \
 		hv_install firewire_install iio_install liblockdep_install \
 		perf_install selftests_install turbostat_install usb_install \
-		virtio_install vm_install net_install x86_energy_perf_policy_install \
+		virtio_install vm_install bpf_install x86_energy_perf_policy_install \
 		tmon_install freefall_install objtool_install kvm_stat_install
 
 acpi_clean:
@@ -133,7 +133,7 @@ install: acpi_install cgroup_install cpupower_install gpio_install \
 cpupower_clean:
 	$(call descend,power/cpupower,clean)
 
-cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean net_clean iio_clean gpio_clean objtool_clean leds_clean:
+cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean:
 	$(call descend,$(@:_clean=),clean)
 
 liblockdep_clean:
@@ -169,7 +169,7 @@ install: acpi_install cgroup_install cpupower_install gpio_install \
 
 clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \
 		perf_clean selftests_clean turbostat_clean spi_clean usb_clean virtio_clean \
-		vm_clean net_clean iio_clean x86_energy_perf_policy_clean tmon_clean \
+		vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \
 		freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \
 		gpio_clean objtool_clean leds_clean
 
diff --git a/tools/net/Makefile b/tools/bpf/Makefile
similarity index 100%
rename from tools/net/Makefile
rename to tools/bpf/Makefile
diff --git a/tools/net/bpf_asm.c b/tools/bpf/bpf_asm.c
similarity index 100%
rename from tools/net/bpf_asm.c
rename to tools/bpf/bpf_asm.c
diff --git a/tools/net/bpf_dbg.c b/tools/bpf/bpf_dbg.c
similarity index 100%
rename from tools/net/bpf_dbg.c
rename to tools/bpf/bpf_dbg.c
diff --git a/tools/net/bpf_exp.l b/tools/bpf/bpf_exp.l
similarity index 100%
rename from tools/net/bpf_exp.l
rename to tools/bpf/bpf_exp.l
diff --git a/tools/net/bpf_exp.y b/tools/bpf/bpf_exp.y
similarity index 100%
rename from tools/net/bpf_exp.y
rename to tools/bpf/bpf_exp.y
diff --git a/tools/net/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c
similarity index 100%
rename from tools/net/bpf_jit_disasm.c
rename to tools/bpf/bpf_jit_disasm.c
-- 
2.14.1

^ permalink raw reply related

* [PATCH v2 net-next 2/2] bpf/verifier: improve disassembly of BPF_NEG instructions
From: Edward Cree @ 2017-09-26 15:35 UTC (permalink / raw)
  To: davem; +Cc: netdev, daniel, alexei.starovoitov, ys114321
In-Reply-To: <52270348-67f1-4e7a-cd2f-9d611ae94064@solarflare.com>

BPF_NEG takes only one operand, unlike the bulk of BPF_ALU[64] which are
 compound-assignments.  So give it its own format in print_bpf_insn().

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 kernel/bpf/verifier.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3aaa3262..04e0508 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -344,6 +344,11 @@ static void print_bpf_insn(const struct bpf_verifier_env *env,
 				verbose("BUG_alu64_%02x\n", insn->code);
 			else
 				print_bpf_end_insn(env, insn);
+		} else if (BPF_OP(insn->code) == BPF_NEG) {
+			verbose("(%02x) r%d = %s-r%d\n",
+				insn->code, insn->dst_reg,
+				class == BPF_ALU ? "(u32) " : "",
+				insn->dst_reg);
 		} else if (BPF_SRC(insn->code) == BPF_X) {
 			verbose("(%02x) %sr%d %s %sr%d\n",
 				insn->code, class == BPF_ALU ? "(u32) " : "",

^ permalink raw reply related

* [PATCH net-next 2/2] tools: bpf: add bpftool
From: Jakub Kicinski @ 2017-09-26 15:35 UTC (permalink / raw)
  To: netdev
  Cc: daniel, alexei.starovoitov, davem, hannes, dsahern, oss-drivers,
	Jakub Kicinski
In-Reply-To: <20170926153522.31500-1-jakub.kicinski@netronome.com>

Add a simple tool for querying and updating BPF objects on the system.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 tools/bpf/Makefile             |  18 +-
 tools/bpf/bpftool/Makefile     |  80 +++++
 tools/bpf/bpftool/common.c     | 214 ++++++++++++
 tools/bpf/bpftool/jit_disasm.c |  83 +++++
 tools/bpf/bpftool/main.c       | 212 ++++++++++++
 tools/bpf/bpftool/main.h       |  99 ++++++
 tools/bpf/bpftool/map.c        | 742 +++++++++++++++++++++++++++++++++++++++++
 tools/bpf/bpftool/prog.c       | 392 ++++++++++++++++++++++
 8 files changed, 1837 insertions(+), 3 deletions(-)
 create mode 100644 tools/bpf/bpftool/Makefile
 create mode 100644 tools/bpf/bpftool/common.c
 create mode 100644 tools/bpf/bpftool/jit_disasm.c
 create mode 100644 tools/bpf/bpftool/main.c
 create mode 100644 tools/bpf/bpftool/main.h
 create mode 100644 tools/bpf/bpftool/map.c
 create mode 100644 tools/bpf/bpftool/prog.c

diff --git a/tools/bpf/Makefile b/tools/bpf/Makefile
index ddf888010652..325a35e1c28e 100644
--- a/tools/bpf/Makefile
+++ b/tools/bpf/Makefile
@@ -3,6 +3,7 @@ prefix = /usr
 CC = gcc
 LEX = flex
 YACC = bison
+MAKE = make
 
 CFLAGS += -Wall -O2
 CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
@@ -13,7 +14,7 @@ CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
 %.lex.c: %.l
 	$(LEX) -o $@ $<
 
-all : bpf_jit_disasm bpf_dbg bpf_asm
+all: bpf_jit_disasm bpf_dbg bpf_asm bpftool
 
 bpf_jit_disasm : CFLAGS += -DPACKAGE='bpf_jit_disasm'
 bpf_jit_disasm : LDLIBS = -lopcodes -lbfd -ldl
@@ -26,10 +27,21 @@ bpf_asm : LDLIBS =
 bpf_asm : bpf_asm.o bpf_exp.yacc.o bpf_exp.lex.o
 bpf_exp.lex.o : bpf_exp.yacc.c
 
-clean :
+clean: bpftool_clean
 	rm -rf *.o bpf_jit_disasm bpf_dbg bpf_asm bpf_exp.yacc.* bpf_exp.lex.*
 
-install :
+install: bpftool_install
 	install bpf_jit_disasm $(prefix)/bin/bpf_jit_disasm
 	install bpf_dbg $(prefix)/bin/bpf_dbg
 	install bpf_asm $(prefix)/bin/bpf_asm
+
+bpftool:
+	$(MAKE) -C bpftool
+
+bpftool_install:
+	$(MAKE) -C bpftool install
+
+bpftool_clean:
+	$(MAKE) -C bpftool clean
+
+.PHONY: bpftool FORCE
diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
new file mode 100644
index 000000000000..a7151f47fb40
--- /dev/null
+++ b/tools/bpf/bpftool/Makefile
@@ -0,0 +1,80 @@
+include ../../scripts/Makefile.include
+
+include ../../scripts/utilities.mak
+
+ifeq ($(srctree),)
+srctree := $(patsubst %/,%,$(dir $(CURDIR)))
+srctree := $(patsubst %/,%,$(dir $(srctree)))
+srctree := $(patsubst %/,%,$(dir $(srctree)))
+#$(info Determined 'srctree' to be $(srctree))
+endif
+
+ifneq ($(objtree),)
+#$(info Determined 'objtree' to be $(objtree))
+endif
+
+ifneq ($(OUTPUT),)
+#$(info Determined 'OUTPUT' to be $(OUTPUT))
+# Adding $(OUTPUT) as a directory to look for source files,
+# because use generated output files as sources dependency
+# for flex/bison parsers.
+VPATH += $(OUTPUT)
+export VPATH
+endif
+
+ifeq ($(V),1)
+  Q =
+else
+  Q = @
+endif
+
+BPF_DIR	= $(srctree)/tools/lib/bpf/
+
+ifneq ($(OUTPUT),)
+  BPF_PATH=$(OUTPUT)
+else
+  BPF_PATH=$(BPF_DIR)
+endif
+
+LIBBPF = $(BPF_PATH)libbpf.a
+
+$(LIBBPF): FORCE
+	$(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) $(OUTPUT)libbpf.a FEATURES_DUMP=$(FEATURE_DUMP_EXPORT)
+
+$(LIBBPF)-clean:
+	$(call QUIET_CLEAN, libbpf)
+	$(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) clean >/dev/null
+
+prefix = /usr
+
+CC = gcc
+
+CFLAGS += -O2
+CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow
+CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf
+LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
+
+include $(wildcard *.d)
+
+all: $(OUTPUT)bpftool
+
+SRCS=$(wildcard *.c)
+OBJS=$(patsubst %.c,$(OUTPUT)%.o,$(SRCS))
+
+$(OUTPUT)bpftool: $(OBJS) $(LIBBPF)
+	$(QUIET_LINK)$(CC) $(CFLAGS) -o $@ $^ $(LIBS)
+
+$(OUTPUT)%.o: %.c
+	$(QUIET_CC)$(COMPILE.c) -MMD -o $@ $<
+
+clean: $(LIBBPF)-clean
+	$(call QUIET_CLEAN, bpftool)
+	$(Q)rm -rf $(OUTPUT)bpftool $(OUTPUT)*.o $(OUTPUT)*.d
+
+install:
+	install $(OUTPUT)bpftool $(prefix)/sbin/bpftool
+
+FORCE:
+
+.PHONY: all clean FORCE
+.DEFAULT_GOAL := all
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
new file mode 100644
index 000000000000..db7bb966c844
--- /dev/null
+++ b/tools/bpf/bpftool/common.c
@@ -0,0 +1,214 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      1. Redistributions of source code must retain the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer.
+ *
+ *      2. Redistributions in binary form must reproduce the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer in the documentation and/or other materials
+ *         provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/* Author: Jakub Kicinski <kubakici@wp.pl> */
+
+#include <errno.h>
+#include <libgen.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <linux/limits.h>
+#include <linux/magic.h>
+#include <sys/types.h>
+#include <sys/vfs.h>
+
+#include <bpf.h>
+
+#include "main.h"
+
+static bool is_bpffs(char *path)
+{
+	struct statfs st_fs;
+
+	if (statfs(path, &st_fs) < 0)
+		return false;
+
+	return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
+}
+
+int open_obj_pinned_any(char *path, enum bpf_obj_type exp_type)
+{
+	enum bpf_obj_type type;
+	int fd;
+
+	fd = bpf_obj_get(path);
+	if (fd < 1) {
+		err("bpf obj get (%s): %s\n", path,
+		    errno == EACCES && !is_bpffs(dirname(path)) ?
+		    "directory not in bpf file system (bpffs)" :
+		    strerror(errno));
+		return -1;
+	}
+
+	type = get_fd_type(fd);
+	if (type < 0) {
+		close(fd);
+		return type;
+	}
+	if (type != exp_type) {
+		err("incorrect object type: %s\n", get_fd_type_name(type));
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32))
+{
+	unsigned int id;
+	char *endptr;
+	int err;
+	int fd;
+
+	if (!is_prefix(*argv, "id")) {
+		err("expected 'id' got %s\n", *argv);
+		return -1;
+	}
+	NEXT_ARG();
+
+	id = strtoul(*argv, &endptr, 0);
+	if (*endptr) {
+		err("can't parse %s as ID\n", *argv);
+		return -1;
+	}
+	NEXT_ARG();
+
+	if (argc != 1)
+		usage();
+
+	fd = get_fd_by_id(id);
+	if (fd < 1) {
+		err("can't get prog by id (%u): %s\n", id, strerror(errno));
+		return -1;
+	}
+
+	err = bpf_obj_pin(fd, *argv);
+	close(fd);
+	if (err) {
+		err("can't pin the object (%s): %s\n", *argv,
+		    errno == EACCES && !is_bpffs(dirname(*argv)) ?
+		    "directory not in bpf file system (bpffs)" :
+		    strerror(errno));
+		return -1;
+	}
+
+	return 0;
+}
+
+const char *get_fd_type_name(enum bpf_obj_type type)
+{
+	static const char * const names[] = {
+		[BPF_OBJ_PROG]	= "prog",
+		[BPF_OBJ_MAP]	= "map",
+	};
+
+	if (type > 0 && type < ARRAY_SIZE(names) && names[type])
+		return names[type];
+
+	return "unknown";
+}
+
+int get_fd_type(int fd)
+{
+	char path[PATH_MAX];
+	char buf[512];
+	ssize_t n;
+
+	snprintf(path, sizeof(path), "/proc/%d/fd/%d", getpid(), fd);
+
+	n = readlink(path, buf, sizeof(buf));
+	if (n < 0) {
+		err("can't read link type: %s\n", strerror(errno));
+		return -1;
+	}
+	if (n == sizeof(path)) {
+		err("can't read link type: path too long!\n");
+		return -1;
+	}
+
+	if (strstr(buf, "bpf-map"))
+		return BPF_OBJ_MAP;
+	else if (strstr(buf, "bpf-prog"))
+		return BPF_OBJ_PROG;
+
+	return BPF_OBJ_UNKNOWN;
+}
+
+char *get_fdinfo(int fd, const char *key)
+{
+	char path[PATH_MAX];
+	char *line = NULL;
+	size_t line_n = 0;
+	ssize_t n;
+	FILE *fdi;
+
+	snprintf(path, sizeof(path), "/proc/%d/fdinfo/%d", getpid(), fd);
+
+	fdi = fopen(path, "r");
+	if (!fdi) {
+		err("can't open fdinfo: %s\n", strerror(errno));
+		return NULL;
+	}
+
+	while ((n = getline(&line, &line_n, fdi))) {
+		char *value;
+		int len;
+
+		if (!strstr(line, key))
+			continue;
+
+		fclose(fdi);
+
+		value = strchr(line, '\t');
+		if (!value) {
+			err("malformed fdinfo!?\n");
+			free(line);
+			return NULL;
+		}
+		value++;
+
+		len = strlen(value);
+		memmove(line, value, len);
+		line[len - 1] = '\0';
+
+		return line;
+	}
+
+	err("key '%s' not found in fdinfo\n", key);
+	fclose(fdi);
+	return NULL;
+}
diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
new file mode 100644
index 000000000000..e2bcfbf9b824
--- /dev/null
+++ b/tools/bpf/bpftool/jit_disasm.c
@@ -0,0 +1,83 @@
+/*
+ * Based on:
+ *
+ * Minimal BPF JIT image disassembler
+ *
+ * Disassembles BPF JIT compiler emitted opcodes back to asm insn's for
+ * debugging or verification purposes.
+ *
+ * Copyright 2013 Daniel Borkmann <daniel@iogearbox.net>
+ * Licensed under the GNU General Public License, version 2.0 (GPLv2)
+ */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <assert.h>
+#include <unistd.h>
+#include <string.h>
+#include <bfd.h>
+#include <dis-asm.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+static void get_exec_path(char *tpath, size_t size)
+{
+	ssize_t len;
+	char *path;
+
+	snprintf(tpath, size, "/proc/%d/exe", (int) getpid());
+	tpath[size - 1] = 0;
+
+	path = strdup(tpath);
+	assert(path);
+
+	len = readlink(path, tpath, size);
+	tpath[len] = 0;
+
+	free(path);
+}
+
+void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes)
+{
+	disassembler_ftype disassemble;
+	struct disassemble_info info;
+	int count, i, pc = 0;
+	char tpath[256];
+	bfd *bfdf;
+
+	memset(tpath, 0, sizeof(tpath));
+	get_exec_path(tpath, sizeof(tpath));
+
+	bfdf = bfd_openr(tpath, NULL);
+	assert(bfdf);
+	assert(bfd_check_format(bfdf, bfd_object));
+
+	init_disassemble_info(&info, stdout, (fprintf_ftype) fprintf);
+	info.arch = bfd_get_arch(bfdf);
+	info.mach = bfd_get_mach(bfdf);
+	info.buffer = image;
+	info.buffer_length = len;
+
+	disassemble_init_for_target(&info);
+
+	disassemble = disassembler(bfdf);
+	assert(disassemble);
+
+	do {
+		printf("%4x:\t", pc);
+
+		count = disassemble(pc, &info);
+
+		if (opcodes) {
+			printf("\n\t");
+			for (i = 0; i < count; ++i)
+				printf("%02x ", (uint8_t) image[pc + i]);
+		}
+		printf("\n");
+
+		pc += count;
+	} while (count > 0 && pc < len);
+
+	bfd_close(bfdf);
+}
diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
new file mode 100644
index 000000000000..622be3b3a28a
--- /dev/null
+++ b/tools/bpf/bpftool/main.c
@@ -0,0 +1,212 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      1. Redistributions of source code must retain the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer.
+ *
+ *      2. Redistributions in binary form must reproduce the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer in the documentation and/or other materials
+ *         provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/* Author: Jakub Kicinski <kubakici@wp.pl> */
+
+#include <bfd.h>
+#include <ctype.h>
+#include <errno.h>
+#include <linux/bpf.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <bpf.h>
+
+#include "main.h"
+
+const char *bin_name;
+static int last_argc;
+static char **last_argv;
+static int (*last_do_help)(int argc, char **argv);
+
+void usage(void)
+{
+	last_do_help(last_argc - 1, last_argv + 1);
+
+	exit(-1);
+}
+
+static int do_help(int argc, char **argv)
+{
+	fprintf(stderr,
+		"Usage: %s OBJECT { COMMAND | help }\n"
+		"       %s batch file FILE\n"
+		"\n"
+		"       OBJECT := { prog | map }\n",
+		bin_name, bin_name);
+
+	return 0;
+}
+
+int cmd_select(const struct cmd *cmds, int argc, char **argv,
+	       int (*help)(int argc, char **argv))
+{
+	unsigned int i;
+
+	last_argc = argc;
+	last_argv = argv;
+	last_do_help = help;
+
+	if (argc < 1 && cmds[0].func)
+		return cmds[0].func(argc, argv);
+
+	for (i = 0; cmds[i].func; i++)
+		if (is_prefix(*argv, cmds[i].cmd))
+			return cmds[i].func(argc - 1, argv + 1);
+
+	help(argc - 1, argv + 1);
+
+	return -1;
+}
+
+bool is_prefix(const char *pfx, const char *str)
+{
+	if (!pfx)
+		return false;
+	if (strlen(str) < strlen(pfx))
+		return false;
+
+	return !memcmp(str, pfx, strlen(pfx));
+}
+
+void print_hex(void *arg, unsigned int n, const char *sep)
+{
+	unsigned char *data = arg;
+	unsigned int i;
+
+	for (i = 0; i < n; i++) {
+		const char *pfx = "";
+
+		if (!i)
+			/* nothing */;
+		else if (!(i % 16))
+			printf("\n");
+		else if (!(i % 8))
+			printf("  ");
+		else
+			pfx = sep;
+
+		printf("%s%02hhx", i ? pfx : "", data[i]);
+	}
+}
+
+static int do_batch(int argc, char **argv);
+
+static const struct cmd cmds[] = {
+	{ "help",	do_help },
+	{ "batch",	do_batch },
+	{ "prog",	do_prog },
+	{ "map",	do_map },
+	{ 0 }
+};
+
+static int do_batch(int argc, char **argv)
+{
+	unsigned int lines = 0;
+	char *n_argv[4096];
+	char buf[65536];
+	int n_argc;
+	char *ptr;
+	FILE *fp;
+	int err;
+
+	if (argc < 2) {
+		err("too few parameters for batch\n");
+		return -1;
+	} else if (!is_prefix(*argv, "file")) {
+		err("expected 'file', got: %s\n", *argv);
+		return -1;
+	} else if (argc > 2) {
+		err("too many parameters for batch\n");
+		return -1;
+	}
+	NEXT_ARG();
+
+	fp = fopen(*argv, "r");
+	if (!fp) {
+		err("Can't open file (%s): %s\n", *argv, strerror(errno));
+		return -1;
+	}
+
+	while (fgets(buf, sizeof(buf), fp)) {
+		if (strlen(buf) == sizeof(buf) - 1) {
+			errno = E2BIG;
+			break;
+		}
+
+		ptr = buf;
+		n_argc = 0;
+		while (*ptr) {
+			if (isspace(*ptr)) {
+				ptr++;
+				continue;
+			}
+			n_argv[n_argc++] = ptr;
+
+			ptr += strcspn(ptr, " \t\n");
+			*ptr++ = 0;
+		}
+
+		if (!n_argc)
+			continue;
+
+		err = cmd_select(cmds, n_argc, n_argv, do_help);
+		if (err)
+			goto err_close;
+
+		lines++;
+	}
+
+	if (errno && errno != ENOENT) {
+		perror("reading batch file failed");
+		err = -1;
+	} else {
+		info("processed %d lines\n", lines);
+		err = 0;
+	}
+err_close:
+	fclose(fp);
+
+	return err;
+}
+
+int main(int argc, char **argv)
+{
+	bin_name = argv[0];
+	NEXT_ARG();
+
+	bfd_init();
+
+	return cmd_select(cmds, argc, argv, do_help);
+}
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
new file mode 100644
index 000000000000..85d2d7870a58
--- /dev/null
+++ b/tools/bpf/bpftool/main.h
@@ -0,0 +1,99 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      1. Redistributions of source code must retain the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer.
+ *
+ *      2. Redistributions in binary form must reproduce the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer in the documentation and/or other materials
+ *         provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/* Author: Jakub Kicinski <kubakici@wp.pl> */
+
+#ifndef __BPF_TOOL_H
+#define __BPF_TOOL_H
+
+#include <stdbool.h>
+#include <stdio.h>
+#include <linux/bpf.h>
+
+#define ARRAY_SIZE(a)	(sizeof(a) / sizeof(a[0]))
+
+#define err(msg...)	fprintf(stderr, "Error: " msg)
+#define warn(msg...)	fprintf(stderr, "Warning: " msg)
+#define info(msg...)	fprintf(stderr, msg)
+
+#define ptr_to_u64(ptr)	((__u64)(unsigned long)(ptr))
+
+#define min(a, b)							\
+	({ typeof(a) _a = (a); typeof(b) _b = (b); _a > _b ? _b : _a; })
+#define max(a, b)							\
+	({ typeof(a) _a = (a); typeof(b) _b = (b); _a < _b ? _b : _a; })
+
+#define NEXT_ARG()	({ argc--; argv++; if (argc < 0) usage(); })
+#define NEXT_ARGP()	({ (*argc)--; (*argv)++; if (*argc < 0) usage(); })
+#define BAD_ARG()	({ err("what is '%s'?\n", *argv); -1; })
+
+#define BPF_TAG_FMT	"%02hhx:%02hhx:%02hhx:%02hhx:"	\
+			"%02hhx:%02hhx:%02hhx:%02hhx"
+
+#define HELP_SPEC_PROGRAM						\
+	"PROG := { id PROG_ID | pinned FILE | tag PROG_TAG }"
+
+enum bpf_obj_type {
+	BPF_OBJ_UNKNOWN,
+	BPF_OBJ_PROG,
+	BPF_OBJ_MAP,
+};
+
+extern const char *bin_name;
+
+bool is_prefix(const char *pfx, const char *str);
+void print_hex(void *arg, unsigned int n, const char *sep);
+void usage(void) __attribute__((noreturn));
+
+struct cmd {
+	const char *cmd;
+	int (*func)(int argc, char **argv);
+};
+
+int cmd_select(const struct cmd *cmds, int argc, char **argv,
+	       int (*help)(int argc, char **argv));
+
+int get_fd_type(int fd);
+const char *get_fd_type_name(enum bpf_obj_type type);
+char *get_fdinfo(int fd, const char *key);
+int open_obj_pinned_any(char *path, enum bpf_obj_type exp_type);
+int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32));
+
+int do_prog(int argc, char **arg);
+int do_map(int argc, char **arg);
+
+int prog_parse_fd(int *argc, char ***argv);
+
+void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes);
+
+#endif
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
new file mode 100644
index 000000000000..db46986fef73
--- /dev/null
+++ b/tools/bpf/bpftool/map.c
@@ -0,0 +1,742 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      1. Redistributions of source code must retain the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer.
+ *
+ *      2. Redistributions in binary form must reproduce the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer in the documentation and/or other materials
+ *         provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/* Author: Jakub Kicinski <kubakici@wp.pl> */
+
+#include <ctype.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include <bpf.h>
+
+#include "main.h"
+
+static const char * const map_type_name[] = {
+	[BPF_MAP_TYPE_UNSPEC]		= "unspec",
+	[BPF_MAP_TYPE_HASH]		= "hash",
+	[BPF_MAP_TYPE_ARRAY]		= "array",
+	[BPF_MAP_TYPE_PROG_ARRAY]	= "prog_array",
+	[BPF_MAP_TYPE_PERF_EVENT_ARRAY]	= "perf_event_array",
+	[BPF_MAP_TYPE_PERCPU_HASH]	= "percpu_hash",
+	[BPF_MAP_TYPE_PERCPU_ARRAY]	= "percpu_array",
+	[BPF_MAP_TYPE_STACK_TRACE]	= "stack_trace",
+	[BPF_MAP_TYPE_CGROUP_ARRAY]	= "cgroup_array",
+	[BPF_MAP_TYPE_LRU_HASH]		= "lru_hash",
+	[BPF_MAP_TYPE_LRU_PERCPU_HASH]	= "lru_percpu_hash",
+	[BPF_MAP_TYPE_LPM_TRIE]		= "lpm_trie",
+	[BPF_MAP_TYPE_ARRAY_OF_MAPS]	= "array_of_maps",
+	[BPF_MAP_TYPE_HASH_OF_MAPS]	= "hash_of_maps",
+	[BPF_MAP_TYPE_DEVMAP]		= "devmap",
+	[BPF_MAP_TYPE_SOCKMAP]		= "sockmap",
+};
+
+static unsigned int get_possible_cpus(void)
+{
+	static unsigned int result;
+	char buf[128];
+	long int n;
+	char *ptr;
+	int fd;
+
+	if (result)
+		return result;
+
+	fd = open("/sys/devices/system/cpu/possible", O_RDONLY);
+	if (fd < 1) {
+		err("can't open sysfs possible cpus\n");
+		exit(-1);
+	}
+
+	n = read(fd, buf, sizeof(buf));
+	if (n < 2) {
+		err("can't read sysfs possible cpus\n");
+		exit(-1);
+	}
+	close(fd);
+
+	if (n == sizeof(buf)) {
+		err("read sysfs possible cpus overflow\n");
+		exit(-1);
+	}
+
+	ptr = buf;
+	n = 0;
+	while (*ptr && *ptr != '\n') {
+		unsigned int a, b;
+
+		if (sscanf(ptr, "%u-%u", &a, &b) == 2) {
+			n += b - a + 1;
+
+			ptr = strchr(ptr, '-') + 1;
+		} else if (sscanf(ptr, "%u", &a) == 1) {
+			n++;
+		}
+
+		while (isdigit(*ptr))
+			ptr++;
+		if (*ptr == ',')
+			ptr++;
+	}
+
+	result = n;
+
+	return result;
+}
+
+static bool map_is_per_cpu(__u32 type)
+{
+	return type == BPF_MAP_TYPE_PERCPU_HASH ||
+	       type == BPF_MAP_TYPE_PERCPU_ARRAY ||
+	       type == BPF_MAP_TYPE_LRU_PERCPU_HASH;
+}
+
+static bool map_is_map_of_maps(__u32 type)
+{
+	return type == BPF_MAP_TYPE_ARRAY_OF_MAPS ||
+	       type == BPF_MAP_TYPE_HASH_OF_MAPS;
+}
+
+static bool map_is_map_of_progs(__u32 type)
+{
+	return type == BPF_MAP_TYPE_PROG_ARRAY;
+}
+
+static void *alloc_value(struct bpf_map_info *info)
+{
+	if (map_is_per_cpu(info->type))
+		return malloc(info->value_size * get_possible_cpus());
+	else
+		return malloc(info->value_size);
+}
+
+static int map_parse_fd(int *argc, char ***argv)
+{
+	int fd;
+
+	if (is_prefix(**argv, "id")) {
+		unsigned int id;
+		char *endptr;
+
+		NEXT_ARGP();
+
+		id = strtoul(**argv, &endptr, 0);
+		if (*endptr) {
+			err("can't parse %s as ID\n", **argv);
+			return -1;
+		}
+		NEXT_ARGP();
+
+		fd = bpf_map_get_fd_by_id(id);
+		if (fd < 1)
+			err("get map by id (%u): %s\n", id, strerror(errno));
+		return fd;
+	} else if (is_prefix(**argv, "pinned")) {
+		char *path;
+
+		NEXT_ARGP();
+
+		path = **argv;
+		NEXT_ARGP();
+
+		return open_obj_pinned_any(path, BPF_OBJ_MAP);
+	}
+
+	err("expected 'id' or 'pinned', got: '%s'?\n", **argv);
+	return -1;
+}
+
+static int
+map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 *info_len)
+{
+	int err;
+	int fd;
+
+	fd = map_parse_fd(argc, argv);
+	if (fd < 1)
+		return -1;
+
+	err = bpf_obj_get_info_by_fd(fd, info, info_len);
+	if (err) {
+		err("can't get map info: %s\n", strerror(errno));
+		close(fd);
+		return err;
+	}
+
+	return fd;
+}
+
+static void print_entry(struct bpf_map_info *info, unsigned char *key,
+			unsigned char *value)
+{
+	unsigned int i, n;
+
+	if (!map_is_per_cpu(info->type)) {
+		/* Single line print */
+		if (info->key_size + info->value_size <= 24 &&
+		    max(info->key_size, info->value_size) <= 16) {
+			printf("key: ");
+			print_hex(key, info->key_size, " ");
+			printf("  value: ");
+			print_hex(value, info->value_size, " ");
+			printf("\n");
+			return;
+		}
+
+		printf("key: ");
+		print_hex(key, info->key_size, " ");
+		printf("\nvalue: ");
+		print_hex(value, info->value_size, " ");
+		printf("\n");
+
+		return;
+	}
+
+	n = get_possible_cpus();
+
+	printf("key:\n");
+	print_hex(key, info->key_size, " ");
+	printf("\n");
+	for (i = 0; i < n; i++) {
+		printf("value (CPU %02d):%c",
+		       i, info->value_size > 16 ? '\n' : ' ');
+		print_hex(value + i * info->value_size, info->value_size, " ");
+		printf("\n");
+	}
+}
+
+static char **parse_val(char **argv, const char *name, unsigned char *val,
+			unsigned int n)
+{
+	unsigned int i = 0;
+	char *endptr;
+
+	while (i < n && argv[i]) {
+		val[i] = strtoul(argv[i], &endptr, 0);
+		if (*endptr) {
+			err("error parsing byte: %s\n", argv[i]);
+			break;
+		}
+		i++;
+	}
+
+	if (i != n) {
+		err("%s expected %d bytes got %d\n", name, n, i);
+		return NULL;
+	}
+
+	return argv + i;
+}
+
+static int parse_elem(char **argv, struct bpf_map_info *info,
+		      void *key, void *value, __u32 key_size, __u32 value_size,
+		      __u32 *flags, __u32 **value_fd)
+{
+	if (!*argv) {
+		if (!key && !value)
+			return 0;
+		err("did not find %s\n", key ? "key" : "value");
+		return -1;
+	}
+
+	if (is_prefix(*argv, "key")) {
+		if (!key) {
+			if (key_size)
+				err("duplicate key\n");
+			else
+				err("unnecessary key\n");
+			return -1;
+		}
+
+		argv = parse_val(argv + 1, "key", key, key_size);
+		if (!argv)
+			return -1;
+
+		return parse_elem(argv, info, NULL, value, key_size, value_size,
+				  flags, value_fd);
+	} else if (is_prefix(*argv, "value")) {
+		int fd;
+
+		if (!value) {
+			if (value_size)
+				err("duplicate value\n");
+			else
+				err("unnecessary value\n");
+			return -1;
+		}
+
+		argv++;
+
+		if (map_is_map_of_maps(info->type)) {
+			int argc = 2;
+
+			if (value_size != 4) {
+				err("value smaller than 4B for map in map?\n");
+				return -1;
+			}
+			if (!argv[0] || !argv[1]) {
+				err("not enough value arguments for map in map\n");
+				return -1;
+			}
+
+			fd = map_parse_fd(&argc, &argv);
+			if (fd < 1)
+				return -1;
+
+			*value_fd = value;
+			**value_fd = fd;
+		} else if (map_is_map_of_progs(info->type)) {
+			int argc = 2;
+
+			if (value_size != 4) {
+				err("value smaller than 4B for map of progs?\n");
+				return -1;
+			}
+			if (!argv[0] || !argv[1]) {
+				err("not enough value arguments for map of progs\n");
+				return -1;
+			}
+
+			fd = prog_parse_fd(&argc, &argv);
+			if (fd < 1)
+				return -1;
+
+			*value_fd = value;
+			**value_fd = fd;
+		} else {
+			argv = parse_val(argv, "value", value, value_size);
+			if (!argv)
+				return -1;
+		}
+
+		return parse_elem(argv, info, key, NULL, key_size, value_size,
+				  flags, NULL);
+	} else if (is_prefix(*argv, "any") || is_prefix(*argv, "noexist") ||
+		   is_prefix(*argv, "exist")) {
+		if (!flags) {
+			err("flags specified multiple times: %s\n", *argv);
+			return -1;
+		}
+
+		if (is_prefix(*argv, "any"))
+			*flags = BPF_ANY;
+		else if (is_prefix(*argv, "noexist"))
+			*flags = BPF_NOEXIST;
+		else if (is_prefix(*argv, "exist"))
+			*flags = BPF_EXIST;
+
+		return parse_elem(argv + 1, info, key, value, key_size,
+				  value_size, NULL, value_fd);
+	}
+
+	err("expected key or value, got: %s\n", *argv);
+	return -1;
+}
+
+static int show_map_close(int fd, struct bpf_map_info *info)
+{
+	char *memlock;
+
+	memlock = get_fdinfo(fd, "memlock");
+	close(fd);
+
+	printf("   %u: ", info->id);
+	if (info->type < ARRAY_SIZE(map_type_name))
+		printf("%s  ", map_type_name[info->type]);
+	else
+		printf("type:%u  ", info->type);
+
+	printf("flags:0x%x  key:%uB  value:%uB  max_entries:%u",
+	       info->map_flags, info->key_size, info->value_size,
+	       info->max_entries);
+
+	if (memlock)
+		printf("  memlock:%sB", memlock);
+	free(memlock);
+
+	printf("\n");
+
+	return 0;
+}
+
+static int do_show(int argc, char **argv)
+{
+	struct bpf_map_info info = {};
+	__u32 len = sizeof(info);
+	__u32 id = 0;
+	int err;
+	int fd;
+
+	if (argc == 2) {
+		fd = map_parse_fd_and_info(&argc, &argv, &info, &len);
+		if (fd < 0)
+			return -1;
+
+		return show_map_close(fd, &info);
+	}
+
+	if (argc)
+		return BAD_ARG();
+
+	while (true) {
+		err = bpf_map_get_next_id(id, &id);
+		if (err) {
+			if (errno == ENOENT)
+				break;
+			err("can't get next map: %s\n", strerror(errno));
+			if (errno == EINVAL)
+				err("kernel too old?\n");
+			return -1;
+		}
+
+		fd = bpf_map_get_fd_by_id(id);
+		if (fd < 1) {
+			err("can't get map by id (%u): %s\n",
+			    id, strerror(errno));
+			return -1;
+		}
+
+		err = bpf_obj_get_info_by_fd(fd, &info, &len);
+		if (err) {
+			err("can't get map info: %s\n", strerror(errno));
+			close(fd);
+			return -1;
+		}
+
+		show_map_close(fd, &info);
+	}
+
+	return errno == ENOENT ? 0 : -1;
+}
+
+static int do_dump(int argc, char **argv)
+{
+	void *key, *value, *prev_key;
+	unsigned int num_elems = 0;
+	struct bpf_map_info info = {};
+	__u32 len = sizeof(info);
+	int err;
+	int fd;
+
+	if (argc != 2)
+		usage();
+
+	fd = map_parse_fd_and_info(&argc, &argv, &info, &len);
+	if (fd < 0)
+		return -1;
+
+	if (map_is_map_of_maps(info.type) || map_is_map_of_progs(info.type)) {
+		err("Dumping maps of maps and program maps not supported\n");
+		close(fd);
+		return -1;
+	}
+
+	key = malloc(info.key_size);
+	value = alloc_value(&info);
+	if (!key || !value) {
+		err("mem alloc failed\n");
+		err = -1;
+		goto exit_free;
+	}
+
+	prev_key = NULL;
+	while (true) {
+		err = bpf_map_get_next_key(fd, prev_key, key);
+		if (err) {
+			if (errno == ENOENT)
+				err = 0;
+			break;
+		}
+
+		err = bpf_map_lookup_elem(fd, key, value);
+		if (err) {
+			info("can't lookup element with key: ");
+			print_hex(key, info.key_size, " ");
+			printf("\n");
+			goto next_key;
+		}
+
+		print_entry(&info, key, value);
+next_key:
+		prev_key = key;
+		num_elems++;
+	}
+
+	printf("Found %u element%s\n", num_elems, num_elems != 1 ? "s" : "");
+
+exit_free:
+	free(key);
+	free(value);
+	close(fd);
+
+	return err;
+}
+
+static int do_update(int argc, char **argv)
+{
+	struct bpf_map_info info = {};
+	__u32 len = sizeof(info);
+	__u32 *value_fd = NULL;
+	__u32 flags = BPF_ANY;
+	void *key, *value;
+	int fd, err;
+
+	if (argc < 2)
+		usage();
+
+	fd = map_parse_fd_and_info(&argc, &argv, &info, &len);
+	if (fd < 0)
+		return -1;
+
+	key = malloc(info.key_size);
+	value = alloc_value(&info);
+	if (!key || !value) {
+		err("mem alloc failed");
+		err = -1;
+		goto exit_free;
+	}
+
+	err = parse_elem(argv, &info, key, value, info.key_size,
+			 info.value_size, &flags, &value_fd);
+	if (err)
+		goto exit_free;
+
+	err = bpf_map_update_elem(fd, key, value, flags);
+	if (err) {
+		err("update failed: %s\n", strerror(errno));
+		goto exit_free;
+	}
+
+exit_free:
+	if (value_fd)
+		close(*value_fd);
+	free(key);
+	free(value);
+	close(fd);
+
+	return err;
+}
+
+static int do_lookup(int argc, char **argv)
+{
+	struct bpf_map_info info = {};
+	__u32 len = sizeof(info);
+	void *key, *value;
+	int err;
+	int fd;
+
+	if (argc < 2)
+		usage();
+
+	fd = map_parse_fd_and_info(&argc, &argv, &info, &len);
+	if (fd < 0)
+		return -1;
+
+	key = malloc(info.key_size);
+	value = alloc_value(&info);
+	if (!key || !value) {
+		err("mem alloc failed");
+		err = -1;
+		goto exit_free;
+	}
+
+	err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL);
+	if (err)
+		goto exit_free;
+
+	err = bpf_map_lookup_elem(fd, key, value);
+	if (!err) {
+		print_entry(&info, key, value);
+	} else if (errno == ENOENT) {
+		printf("key:\n");
+		print_hex(key, info.key_size, " ");
+		printf("\n\nNot found\n");
+	} else {
+		err("lookup failed: %s\n", strerror(errno));
+	}
+
+exit_free:
+	free(key);
+	free(value);
+	close(fd);
+
+	return err;
+}
+
+static int do_getnext(int argc, char **argv)
+{
+	struct bpf_map_info info = {};
+	__u32 len = sizeof(info);
+	void *key, *nextkey;
+	int err;
+	int fd;
+
+	if (argc < 2)
+		usage();
+
+	fd = map_parse_fd_and_info(&argc, &argv, &info, &len);
+	if (fd < 0)
+		return -1;
+
+	key = malloc(info.key_size);
+	nextkey = malloc(info.key_size);
+	if (!key || !nextkey) {
+		err("mem alloc failed");
+		err = -1;
+		goto exit_free;
+	}
+
+	if (argc) {
+		err = parse_elem(argv, &info, key, NULL, info.key_size, 0,
+				 NULL, NULL);
+		if (err)
+			goto exit_free;
+	} else {
+		free(key);
+		key = NULL;
+	}
+
+	err = bpf_map_get_next_key(fd, key, nextkey);
+	if (err) {
+		err("can't get next key: %s\n", strerror(errno));
+		goto exit_free;
+	}
+
+	if (key) {
+		printf("key:\n");
+		print_hex(key, info.key_size, " ");
+		printf("\n");
+	} else {
+		printf("key: None\n");
+	}
+
+	printf("next key:\n");
+	print_hex(nextkey, info.key_size, " ");
+	printf("\n");
+
+exit_free:
+	free(nextkey);
+	free(key);
+	close(fd);
+
+	return err;
+}
+
+static int do_delete(int argc, char **argv)
+{
+	struct bpf_map_info info = {};
+	__u32 len = sizeof(info);
+	void *key;
+	int err;
+	int fd;
+
+	if (argc < 2)
+		usage();
+
+	fd = map_parse_fd_and_info(&argc, &argv, &info, &len);
+	if (fd < 0)
+		return -1;
+
+	key = malloc(info.key_size);
+	if (!key) {
+		err("mem alloc failed");
+		err = -1;
+		goto exit_free;
+	}
+
+	err = parse_elem(argv, &info, key, NULL, info.key_size, 0, NULL, NULL);
+	if (err)
+		goto exit_free;
+
+	err = bpf_map_delete_elem(fd, key);
+	if (err)
+		err("delete failed: %s\n", strerror(errno));
+
+exit_free:
+	free(key);
+	close(fd);
+
+	return err;
+}
+
+static int do_pin(int argc, char **argv)
+{
+	return do_pin_any(argc, argv, bpf_map_get_fd_by_id);
+}
+
+static int do_help(int argc, char **argv)
+{
+	fprintf(stderr,
+		"Usage: %s %s show   [MAP]\n"
+		"       %s %s dump    MAP\n"
+		"       %s %s update  MAP  key BYTES value VALUE [UPDATE_FLAGS]\n"
+		"       %s %s lookup  MAP  key BYTES\n"
+		"       %s %s getnext MAP [key BYTES]\n"
+		"       %s %s delete  MAP  key BYTES\n"
+		"       %s %s pin     MAP  FILE\n"
+		"       %s %s help\n"
+		"\n"
+		"       MAP := { id MAP_ID | pinned FILE }\n"
+		"       " HELP_SPEC_PROGRAM "\n"
+		"       VALUE := { BYTES | MAP | PROG }\n"
+		"       UPDATE_FLAGS := { any | exist | noexist }\n"
+		"",
+		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
+		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
+		bin_name, argv[-2], bin_name, argv[-2]);
+
+	return 0;
+}
+
+static const struct cmd cmds[] = {
+	{ "show",	do_show },
+	{ "help",	do_help },
+	{ "dump",	do_dump },
+	{ "update",	do_update },
+	{ "lookup",	do_lookup },
+	{ "getnext",	do_getnext },
+	{ "delete",	do_delete },
+	{ "pin",	do_pin },
+	{ 0 }
+};
+
+int do_map(int argc, char **argv)
+{
+	return cmd_select(cmds, argc, argv, do_help);
+}
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
new file mode 100644
index 000000000000..3129159c593e
--- /dev/null
+++ b/tools/bpf/bpftool/prog.c
@@ -0,0 +1,392 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      1. Redistributions of source code must retain the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer.
+ *
+ *      2. Redistributions in binary form must reproduce the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer in the documentation and/or other materials
+ *         provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/* Author: Jakub Kicinski <kubakici@wp.pl> */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include <bpf.h>
+
+#include "main.h"
+
+static const char * const prog_type_name[] = {
+	[BPF_PROG_TYPE_UNSPEC]		= "unspec",
+	[BPF_PROG_TYPE_SOCKET_FILTER]	= "socket_filter",
+	[BPF_PROG_TYPE_KPROBE]		= "kprobe",
+	[BPF_PROG_TYPE_SCHED_CLS]	= "sched_cls",
+	[BPF_PROG_TYPE_SCHED_ACT]	= "sched_act",
+	[BPF_PROG_TYPE_TRACEPOINT]	= "tracepoint",
+	[BPF_PROG_TYPE_XDP]		= "xdp",
+	[BPF_PROG_TYPE_PERF_EVENT]	= "perf_event",
+	[BPF_PROG_TYPE_CGROUP_SKB]	= "cgroup_skb",
+	[BPF_PROG_TYPE_CGROUP_SOCK]	= "cgroup_sock",
+	[BPF_PROG_TYPE_LWT_IN]		= "lwt_in",
+	[BPF_PROG_TYPE_LWT_OUT]		= "lwt_out",
+	[BPF_PROG_TYPE_LWT_XMIT]	= "lwt_xmit",
+	[BPF_PROG_TYPE_SOCK_OPS]	= "sock_ops",
+	[BPF_PROG_TYPE_SK_SKB]		= "sk_skb",
+};
+
+static int prog_fd_by_tag(unsigned char *tag)
+{
+	struct bpf_prog_info info = {};
+	__u32 len = sizeof(info);
+	unsigned int id = 0;
+	int err;
+	int fd;
+
+	while (true) {
+		err = bpf_prog_get_next_id(id, &id);
+		if (err) {
+			err("%s\n", strerror(errno));
+			return -1;
+		}
+
+		fd = bpf_prog_get_fd_by_id(id);
+		if (fd < 1) {
+			err("can't get prog by id (%u): %s\n",
+			    id, strerror(errno));
+			return -1;
+		}
+
+		err = bpf_obj_get_info_by_fd(fd, &info, &len);
+		if (err) {
+			err("can't get prog info (%u): %s\n",
+			    id, strerror(errno));
+			close(fd);
+			return -1;
+		}
+
+		if (!memcmp(tag, info.tag, BPF_TAG_SIZE))
+			return fd;
+
+		close(fd);
+	}
+}
+
+int prog_parse_fd(int *argc, char ***argv)
+{
+	int fd;
+
+	if (is_prefix(**argv, "id")) {
+		unsigned int id;
+		char *endptr;
+
+		NEXT_ARGP();
+
+		id = strtoul(**argv, &endptr, 0);
+		if (*endptr) {
+			err("can't parse %s as ID\n", **argv);
+			return -1;
+		}
+		NEXT_ARGP();
+
+		fd = bpf_prog_get_fd_by_id(id);
+		if (fd < 1)
+			err("get by id (%u): %s\n", id, strerror(errno));
+		return fd;
+	} else if (is_prefix(**argv, "tag")) {
+		unsigned char tag[BPF_TAG_SIZE];
+
+		NEXT_ARGP();
+
+		if (sscanf(**argv, BPF_TAG_FMT, tag, tag + 1, tag + 2,
+			   tag + 3, tag + 4, tag + 5, tag + 6, tag + 7)
+		    != BPF_TAG_SIZE) {
+			err("can't parse tag\n");
+			return -1;
+		}
+		NEXT_ARGP();
+
+		return prog_fd_by_tag(tag);
+	} else if (is_prefix(**argv, "pinned")) {
+		char *path;
+
+		NEXT_ARGP();
+
+		path = **argv;
+		NEXT_ARGP();
+
+		return open_obj_pinned_any(path, BPF_OBJ_PROG);
+	}
+
+	err("expected 'id', 'tag' or 'pinned', got: '%s'?\n", **argv);
+	return -1;
+}
+
+static int show_prog(int fd)
+{
+	struct bpf_prog_info info = {};
+	__u32 len = sizeof(info);
+	char *memlock;
+	int err;
+
+	err = bpf_obj_get_info_by_fd(fd, &info, &len);
+	if (err) {
+		err("can't get prog info: %s\n", strerror(errno));
+		return -1;
+	}
+
+	printf("   %u: ", info.id);
+	if (info.type < ARRAY_SIZE(prog_type_name))
+		printf("%s  ", prog_type_name[info.type]);
+	else
+		printf("type:%u  ", info.type);
+
+	printf("tag ");
+	print_hex(info.tag, BPF_TAG_SIZE, ":");
+
+	printf("  xlated:%uB", info.xlated_prog_len);
+
+	if (info.jited_prog_len)
+		printf("  jited:%uB", info.jited_prog_len);
+	else
+		printf("  not jited");
+
+	memlock = get_fdinfo(fd, "memlock");
+	if (memlock)
+		printf("  memlock:%sB", memlock);
+	free(memlock);
+
+	printf("\n");
+
+	return 0;
+}
+
+static int do_show(int argc, char **argv)
+{	__u32 id = 0;
+	int err;
+	int fd;
+
+	if (argc == 2) {
+		fd = prog_parse_fd(&argc, &argv);
+		if (fd < 1)
+			return -1;
+
+		return show_prog(fd);
+	}
+
+	if (argc)
+		return BAD_ARG();
+
+	while (true) {
+		err = bpf_prog_get_next_id(id, &id);
+		if (err) {
+			if (errno == ENOENT)
+				break;
+			err("can't get next program: %s\n", strerror(errno));
+			if (errno == EINVAL)
+				err("kernel too old?\n");
+			return -1;
+		}
+
+		fd = bpf_prog_get_fd_by_id(id);
+		if (fd < 1) {
+			err("can't get prog by id (%u): %s\n",
+			    id, strerror(errno));
+			return -1;
+		}
+
+		err = show_prog(fd);
+		close(fd);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int do_dump(int argc, char **argv)
+{
+	struct bpf_prog_info info = {};
+	__u32 len = sizeof(info);
+	bool can_disasm = false;
+	unsigned int buf_size;
+	char *filepath = NULL;
+	bool opcodes = false;
+	unsigned char *buf;
+	__u32 *member_len;
+	__u64 *member_ptr;
+	ssize_t n;
+	int err;
+	int fd;
+
+	if (is_prefix(*argv, "jited")) {
+		member_len = &info.jited_prog_len;
+		member_ptr = &info.jited_prog_insns;
+		can_disasm = true;
+	} else if (is_prefix(*argv, "xlated")) {
+		member_len = &info.xlated_prog_len;
+		member_ptr = &info.xlated_prog_insns;
+	} else {
+		err("expected 'xlated' or 'jited', got: %s\n", *argv);
+		return -1;
+	}
+	NEXT_ARG();
+
+	if (argc < 2)
+		usage();
+
+	fd = prog_parse_fd(&argc, &argv);
+	if (fd < 0)
+		return -1;
+
+	if (is_prefix(*argv, "file")) {
+		NEXT_ARG();
+		if (!argc) {
+			err("expected file path\n");
+			return -1;
+		}
+
+		filepath = *argv;
+		NEXT_ARG();
+	} else if (is_prefix(*argv, "opcodes")) {
+		opcodes = true;
+		NEXT_ARG();
+	}
+
+	if (!filepath && !can_disasm) {
+		err("expected 'file' got %s\n", *argv);
+		return -1;
+	}
+	if (argc) {
+		usage();
+		return -1;
+	}
+
+	err = bpf_obj_get_info_by_fd(fd, &info, &len);
+	if (err) {
+		err("can't get prog info: %s\n", strerror(errno));
+		return -1;
+	}
+
+	if (!*member_len) {
+		info("no instructions returned\n");
+		close(fd);
+		return 0;
+	}
+
+	buf_size = *member_len;
+
+	buf = malloc(buf_size);
+	if (!buf) {
+		err("mem alloc failed\n");
+		close(fd);
+		return -1;
+	}
+
+	memset(&info, 0, sizeof(info));
+
+	*member_ptr = ptr_to_u64(buf);
+	*member_len = buf_size;
+
+	err = bpf_obj_get_info_by_fd(fd, &info, &len);
+	close(fd);
+	if (err) {
+		err("can't get prog info: %s\n", strerror(errno));
+		goto err_free;
+	}
+
+	if (*member_len > buf_size) {
+		info("too many instructions returned\n");
+		goto err_free;
+	}
+
+	if (filepath) {
+		fd = open(filepath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
+		if (fd < 1) {
+			err("can't open file %s: %s\n", filepath,
+			    strerror(errno));
+			goto err_free;
+		}
+
+		n = write(fd, buf, *member_len);
+		close(fd);
+		if (n != *member_len) {
+			err("error writing output file: %s\n",
+			    n < 0 ? strerror(errno) : "short write");
+			goto err_free;
+		}
+	} else {
+		disasm_print_insn(buf, *member_len, opcodes);
+	}
+
+	free(buf);
+
+	return 0;
+
+err_free:
+	free(buf);
+	return -1;
+}
+
+static int do_pin(int argc, char **argv)
+{
+	return do_pin_any(argc, argv, bpf_prog_get_fd_by_id);
+}
+
+static int do_help(int argc, char **argv)
+{
+	fprintf(stderr,
+		"Usage: %s %s show [PROG]\n"
+		"       %s %s dump xlated PROG  file FILE\n"
+		"       %s %s dump jited  PROG [file FILE] [opcodes]\n"
+		"       %s %s pin   PROG FILE\n"
+		"       %s %s help\n"
+		"\n"
+		"       " HELP_SPEC_PROGRAM "\n"
+		"",
+		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
+		bin_name, argv[-2], bin_name, argv[-2]);
+
+	return 0;
+}
+
+static const struct cmd cmds[] = {
+	{ "show",	do_show },
+	{ "dump",	do_dump },
+	{ "pin",	do_pin },
+	{ 0 }
+};
+
+int do_prog(int argc, char **argv)
+{
+	return cmd_select(cmds, argc, argv, do_help);
+}
-- 
2.14.1

^ permalink raw reply related

* [PATCH v2] ebtables: fix race condition in frame_filter_net_init()
From: Artem Savkov @ 2017-09-26 15:39 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, netdev, linux-kernel, netfilter-devel,
	Artem Savkov
In-Reply-To: <20170926124211.GA14971@breakpoint.cc>

It is possible for ebt_in_hook to be triggered before ebt_table is assigned
resulting in a NULL-pointer dereference. Make sure hooks are
registered as the last step.

Fixes: aee12a0a3727 ebtables: remove nf_hook_register usage
Signed-off-by: Artem Savkov <asavkov@redhat.com>
---
 include/linux/netfilter_bridge/ebtables.h |  7 ++++---
 net/bridge/netfilter/ebtable_broute.c     |  4 ++--
 net/bridge/netfilter/ebtable_filter.c     |  4 ++--
 net/bridge/netfilter/ebtable_nat.c        |  4 ++--
 net/bridge/netfilter/ebtables.c           | 17 ++++++++---------
 5 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h
index 2c2a5514b0df..528b24c78308 100644
--- a/include/linux/netfilter_bridge/ebtables.h
+++ b/include/linux/netfilter_bridge/ebtables.h
@@ -108,9 +108,10 @@ struct ebt_table {
 
 #define EBT_ALIGN(s) (((s) + (__alignof__(struct _xt_align)-1)) & \
 		     ~(__alignof__(struct _xt_align)-1))
-extern struct ebt_table *ebt_register_table(struct net *net,
-					    const struct ebt_table *table,
-					    const struct nf_hook_ops *);
+extern int ebt_register_table(struct net *net,
+			      const struct ebt_table *table,
+			      const struct nf_hook_ops *ops,
+			      struct ebt_table **res);
 extern void ebt_unregister_table(struct net *net, struct ebt_table *table,
 				 const struct nf_hook_ops *);
 extern unsigned int ebt_do_table(struct sk_buff *skb,
diff --git a/net/bridge/netfilter/ebtable_broute.c b/net/bridge/netfilter/ebtable_broute.c
index 2585b100ebbb..276b60262981 100644
--- a/net/bridge/netfilter/ebtable_broute.c
+++ b/net/bridge/netfilter/ebtable_broute.c
@@ -65,8 +65,8 @@ static int ebt_broute(struct sk_buff *skb)
 
 static int __net_init broute_net_init(struct net *net)
 {
-	net->xt.broute_table = ebt_register_table(net, &broute_table, NULL);
-	return PTR_ERR_OR_ZERO(net->xt.broute_table);
+	return ebt_register_table(net, &broute_table, NULL,
+				  &net->xt.broute_table);
 }
 
 static void __net_exit broute_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtable_filter.c b/net/bridge/netfilter/ebtable_filter.c
index 45a00dbdbcad..c41da5fac84f 100644
--- a/net/bridge/netfilter/ebtable_filter.c
+++ b/net/bridge/netfilter/ebtable_filter.c
@@ -93,8 +93,8 @@ static const struct nf_hook_ops ebt_ops_filter[] = {
 
 static int __net_init frame_filter_net_init(struct net *net)
 {
-	net->xt.frame_filter = ebt_register_table(net, &frame_filter, ebt_ops_filter);
-	return PTR_ERR_OR_ZERO(net->xt.frame_filter);
+	return ebt_register_table(net, &frame_filter, ebt_ops_filter,
+				  &net->xt.frame_filter);
 }
 
 static void __net_exit frame_filter_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtable_nat.c b/net/bridge/netfilter/ebtable_nat.c
index 57cd5bb154e7..08df7406ecb3 100644
--- a/net/bridge/netfilter/ebtable_nat.c
+++ b/net/bridge/netfilter/ebtable_nat.c
@@ -93,8 +93,8 @@ static const struct nf_hook_ops ebt_ops_nat[] = {
 
 static int __net_init frame_nat_net_init(struct net *net)
 {
-	net->xt.frame_nat = ebt_register_table(net, &frame_nat, ebt_ops_nat);
-	return PTR_ERR_OR_ZERO(net->xt.frame_nat);
+	return ebt_register_table(net, &frame_nat, ebt_ops_nat,
+				  &net->xt.frame_nat);
 }
 
 static void __net_exit frame_nat_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 83951f978445..aa81afe81f23 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1169,9 +1169,8 @@ static void __ebt_unregister_table(struct net *net, struct ebt_table *table)
 	kfree(table);
 }
 
-struct ebt_table *
-ebt_register_table(struct net *net, const struct ebt_table *input_table,
-		   const struct nf_hook_ops *ops)
+int ebt_register_table(struct net *net, const struct ebt_table *input_table,
+		       const struct nf_hook_ops *ops, struct ebt_table **res)
 {
 	struct ebt_table_info *newinfo;
 	struct ebt_table *t, *table;
@@ -1183,7 +1182,7 @@ ebt_register_table(struct net *net, const struct ebt_table *input_table,
 	    repl->entries == NULL || repl->entries_size == 0 ||
 	    repl->counters != NULL || input_table->private != NULL) {
 		BUGPRINT("Bad table data for ebt_register_table!!!\n");
-		return ERR_PTR(-EINVAL);
+		return -EINVAL;
 	}
 
 	/* Don't add one table to multiple lists. */
@@ -1252,16 +1251,16 @@ ebt_register_table(struct net *net, const struct ebt_table *input_table,
 	list_add(&table->list, &net->xt.tables[NFPROTO_BRIDGE]);
 	mutex_unlock(&ebt_mutex);
 
-	if (!ops)
-		return table;
+	WRITE_ONCE(*res, table);
 
 	ret = nf_register_net_hooks(net, ops, hweight32(table->valid_hooks));
 	if (ret) {
 		__ebt_unregister_table(net, table);
-		return ERR_PTR(ret);
+		*res = NULL;
+		return ret;
 	}
 
-	return table;
+	return 0;
 free_unlock:
 	mutex_unlock(&ebt_mutex);
 free_chainstack:
@@ -1276,7 +1275,7 @@ ebt_register_table(struct net *net, const struct ebt_table *input_table,
 free_table:
 	kfree(table);
 out:
-	return ERR_PTR(ret);
+	return ret;
 }
 
 void ebt_unregister_table(struct net *net, struct ebt_table *table,
-- 
2.13.5

^ permalink raw reply related

* Re: [PATCH net-next 2/7] nfp: compile flower vxlan tunnel metadata match fields
From: John Hurley @ 2017-09-26 15:39 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Simon Horman, David Miller, Jakub Kicinski, Linux Netdev List,
	oss-drivers
In-Reply-To: <CAJ3xEMhio5+5BK3p5mq=TvA8SzAvjSJm75mN0XiChdWkW3C89g@mail.gmail.com>

On Tue, Sep 26, 2017 at 4:33 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Tue, Sep 26, 2017 at 6:11 PM, John Hurley <john.hurley@netronome.com> wrote:
>> On Tue, Sep 26, 2017 at 3:12 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>> On Tue, Sep 26, 2017 at 4:58 PM, John Hurley <john.hurley@netronome.com> wrote:
>>>> On Mon, Sep 25, 2017 at 7:35 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>> On Mon, Sep 25, 2017 at 1:23 PM, Simon Horman
>>>>> <simon.horman@netronome.com> wrote:
>>>>>> From: John Hurley <john.hurley@netronome.com>
>>>>>>
>>>>>> Compile ovs-tc flower vxlan metadata match fields for offloading. Only
>>>>>
>>>>> anything in the npf kernel bits has direct relation to ovs? what?
>>>>>
>>>>
>>>> Sorry, this is a typo  and should refer to TC.
>>>>
>>>>>> +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>>>>>> @@ -52,8 +52,25 @@
>>>>>>          BIT(FLOW_DISSECTOR_KEY_PORTS) | \
>>>>>>          BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) | \
>>>>>>          BIT(FLOW_DISSECTOR_KEY_VLAN) | \
>>>>>> +        BIT(FLOW_DISSECTOR_KEY_ENC_KEYID) | \
>>>>>> +        BIT(FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) | \
>>>>>> +        BIT(FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS) | \
>>>>>
>>>>> this series takes care of IPv6 tunnels too?
>>>>
>>>> IPv6 is not included in this set.
>>>> The reason the IPv6 bit is included here is to account for behavior we
>>>> have noticed in TC flower.
>>>> If, for example, I add a filter with the following match fields:
>>>> 'protocol ip flower enc_src_ip 10.0.0.1 enc_dst_ip 10.0.0.2
>>>> enc_dst_port 4789 enc_key_id 123'
>>>> The 'used_keys' value in the dissector marks both IPv4 and IPv6 encap
>>>> addresses as 'used'.
>>>> I am not sure if this is a bug in TC or that we are expected to check
>>>> the enc_control fields to determine if IPv4 or v6 addresses are used.
>>>
>>> you should have your code to check enc_control->addr_type to be
>>> FLOW_DISSECTOR_KEY_IPV4_ADDRS or IPV6_ADDRS
>>>
>>>
>>>> Including the IPv6 used_keys bit in our whitelist approach allows us
>>>> to accept legitimate IPv4 tunnel rules in these situations.
>>>
>>> mmm can please take a look on fl_init_dissector() and tell me if you
>>> see why FLOW_DISSECTOR_KEY_IPV6_ADDRS is set for ipv4 tunnels,
>>> I am not sure.
>>
>>
>> The fl_init_dissector uses the FL_KEY_SET_IF_MASKED macro to set an
>> array of keys which are then translated to the used_keys values.
>> The FL_KEY_SET_IF_MASKED takes a 'struct fl_flow_key' as input and
>> checks if any mask bits are set in a particular field - if so it
>> eventually marks it as used.
>> In struct fl_flow_key, the encap ipv4 and ipv6 addresses are
>> represented as a union of the 2.
>> Therefore, if we have masked bits set for IPv4, they are also being
>> set for the IPv6 field.
>
> I see, do you consider it a bug?

The code seems to insist that, if either IPv4 or IPv6 is in use then a
control encap key is also used:

if (FL_KEY_IS_MASKED(&mask->key, enc_ipv4) ||
   FL_KEY_IS_MASKED(&mask->key, enc_ipv6))
FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_ENC_CONTROL,
  enc_control);

Therefore, I think it should be ok to use this to determine the IP
type in use by the tunnel.

^ permalink raw reply

* Re: [PATCH] lib: fix multiple strlcpy definition
From: Phil Sutter @ 2017-09-26 15:55 UTC (permalink / raw)
  To: Baruch Siach; +Cc: Stephen Hemminger, netdev
In-Reply-To: <7f5fee59e46bfd3b4acf8ed9a8fbd8c7b4f1cd70.1506424129.git.baruch@tkos.co.il>

On Tue, Sep 26, 2017 at 02:08:49PM +0300, Baruch Siach wrote:
[...]
> diff --git a/configure b/configure
> index 7be8fb113cc9..787b2e061af9 100755
> --- a/configure
> +++ b/configure
> @@ -326,6 +326,27 @@ EOF
>      rm -f $TMPDIR/dbtest.c $TMPDIR/dbtest
>  }
>  
> +check_strlcpy()
> +{
> +    cat >$TMPDIR/strtest.c <<EOF
> +#include <string.h>
> +int main(int argc, char **argv) {
> +	char dst[10];
> +	strlcpy("test", dst, sizeof(dst));

You swapped source and destination here. It's not important for the
given use-case, but the resulting binary should segfault.

Apart from that, LGTM!

Cheers, Phil

^ permalink raw reply

* [PATCH net] packet: in packet_do_bind, test fanout with bind_lock held
From: Willem de Bruijn @ 2017-09-26 16:19 UTC (permalink / raw)
  To: netdev; +Cc: davem, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Once a socket has po->fanout set, it remains a member of the group
until it is destroyed. The prot_hook must be constant and identical
across sockets in the group.

If fanout_add races with packet_do_bind between the test of po->fanout
and taking the lock, the bind call may make type or dev inconsistent
with that of the fanout group.

Hold po->bind_lock when testing po->fanout to avoid this race.

I had to introduce artificial delay (local_bh_enable) to actually
observe the race.

Fixes: dc99f600698d ("packet: Add fanout support.")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
 net/packet/af_packet.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 1da0851f51f2..bec01a3daf5b 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3071,13 +3071,15 @@ static int packet_do_bind(struct sock *sk, const char *name, int ifindex,
 	int ret = 0;
 	bool unlisted = false;

-	if (po->fanout)
-		return -EINVAL;
-
 	lock_sock(sk);
 	spin_lock(&po->bind_lock);
 	rcu_read_lock();

+	if (po->fanout) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
 	if (name) {
 		dev = dev_get_by_name_rcu(sock_net(sk), name);
 		if (!dev) {
-- 
2.14.1.821.g8fa685d3b7-goog

^ permalink raw reply related

* [PATCH net] packet: only test po->has_vnet_hdr once in packet_snd
From: Willem de Bruijn @ 2017-09-26 16:20 UTC (permalink / raw)
  To: netdev; +Cc: davem, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Packet socket option po->has_vnet_hdr can be updated concurrently with
other operations if no ring is attached.

Do not test the option twice in packet_snd, as the value may change in
between calls. A race on setsockopt disable may cause a packet > mtu
to be sent without having GSO options set.

Fixes: bfd5f4a3d605 ("packet: Add GSO/csum offload support.")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
 net/packet/af_packet.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index d288f52c53f7..1da0851f51f2 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2840,6 +2840,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
 	struct virtio_net_hdr vnet_hdr = { 0 };
 	int offset = 0;
 	struct packet_sock *po = pkt_sk(sk);
+	bool has_vnet_hdr = false;
 	int hlen, tlen, linear;
 	int extra_len = 0;
 
@@ -2883,6 +2884,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
 		err = packet_snd_vnet_parse(msg, &len, &vnet_hdr);
 		if (err)
 			goto out_unlock;
+		has_vnet_hdr = true;
 	}
 
 	if (unlikely(sock_flag(sk, SOCK_NOFCS))) {
@@ -2941,7 +2943,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
 	skb->priority = sk->sk_priority;
 	skb->mark = sockc.mark;
 
-	if (po->has_vnet_hdr) {
+	if (has_vnet_hdr) {
 		err = virtio_net_hdr_to_skb(skb, &vnet_hdr, vio_le());
 		if (err)
 			goto out_free;
-- 
2.14.1.821.g8fa685d3b7-goog

^ permalink raw reply related

* Re: [PATCH] lib: fix multiple strlcpy definition
From: Baruch Siach @ 2017-09-26 16:27 UTC (permalink / raw)
  To: Phil Sutter, Stephen Hemminger, netdev
In-Reply-To: <20170926155524.GA26610@orbyte.nwl.cc>

Hi Phil,

On Tue, Sep 26, 2017 at 05:55:24PM +0200, Phil Sutter wrote:
> On Tue, Sep 26, 2017 at 02:08:49PM +0300, Baruch Siach wrote:
> [...]
> > diff --git a/configure b/configure
> > index 7be8fb113cc9..787b2e061af9 100755
> > --- a/configure
> > +++ b/configure
> > @@ -326,6 +326,27 @@ EOF
> >      rm -f $TMPDIR/dbtest.c $TMPDIR/dbtest
> >  }
> >  
> > +check_strlcpy()
> > +{
> > +    cat >$TMPDIR/strtest.c <<EOF
> > +#include <string.h>
> > +int main(int argc, char **argv) {
> > +	char dst[10];
> > +	strlcpy("test", dst, sizeof(dst));
> 
> You swapped source and destination here. It's not important for the
> given use-case, but the resulting binary should segfault.

Will fix that in v2.

> Apart from that, LGTM!

Thanks. I'll take this as an ack.

baruch

-- 
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -

^ permalink raw reply

* [PATCH v3] ebtables: fix race condition in frame_filter_net_init()
From: Artem Savkov @ 2017-09-26 16:35 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, netdev, linux-kernel, netfilter-devel,
	Artem Savkov
In-Reply-To: <20170926153923.30094-1-asavkov@redhat.com>

It is possible for ebt_in_hook to be triggered before ebt_table is assigned
resulting in a NULL-pointer dereference. Make sure hooks are
registered as the last step.

v3: restore errorneously removed ops == NULL case check

Fixes: aee12a0a3727 ebtables: remove nf_hook_register usage
Signed-off-by: Artem Savkov <asavkov@redhat.com>
---
 include/linux/netfilter_bridge/ebtables.h |  7 ++++---
 net/bridge/netfilter/ebtable_broute.c     |  4 ++--
 net/bridge/netfilter/ebtable_filter.c     |  4 ++--
 net/bridge/netfilter/ebtable_nat.c        |  4 ++--
 net/bridge/netfilter/ebtables.c           | 17 +++++++++--------
 5 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h
index 2c2a5514b0df..528b24c78308 100644
--- a/include/linux/netfilter_bridge/ebtables.h
+++ b/include/linux/netfilter_bridge/ebtables.h
@@ -108,9 +108,10 @@ struct ebt_table {
 
 #define EBT_ALIGN(s) (((s) + (__alignof__(struct _xt_align)-1)) & \
 		     ~(__alignof__(struct _xt_align)-1))
-extern struct ebt_table *ebt_register_table(struct net *net,
-					    const struct ebt_table *table,
-					    const struct nf_hook_ops *);
+extern int ebt_register_table(struct net *net,
+			      const struct ebt_table *table,
+			      const struct nf_hook_ops *ops,
+			      struct ebt_table **res);
 extern void ebt_unregister_table(struct net *net, struct ebt_table *table,
 				 const struct nf_hook_ops *);
 extern unsigned int ebt_do_table(struct sk_buff *skb,
diff --git a/net/bridge/netfilter/ebtable_broute.c b/net/bridge/netfilter/ebtable_broute.c
index 2585b100ebbb..276b60262981 100644
--- a/net/bridge/netfilter/ebtable_broute.c
+++ b/net/bridge/netfilter/ebtable_broute.c
@@ -65,8 +65,8 @@ static int ebt_broute(struct sk_buff *skb)
 
 static int __net_init broute_net_init(struct net *net)
 {
-	net->xt.broute_table = ebt_register_table(net, &broute_table, NULL);
-	return PTR_ERR_OR_ZERO(net->xt.broute_table);
+	return ebt_register_table(net, &broute_table, NULL,
+				  &net->xt.broute_table);
 }
 
 static void __net_exit broute_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtable_filter.c b/net/bridge/netfilter/ebtable_filter.c
index 45a00dbdbcad..c41da5fac84f 100644
--- a/net/bridge/netfilter/ebtable_filter.c
+++ b/net/bridge/netfilter/ebtable_filter.c
@@ -93,8 +93,8 @@ static const struct nf_hook_ops ebt_ops_filter[] = {
 
 static int __net_init frame_filter_net_init(struct net *net)
 {
-	net->xt.frame_filter = ebt_register_table(net, &frame_filter, ebt_ops_filter);
-	return PTR_ERR_OR_ZERO(net->xt.frame_filter);
+	return ebt_register_table(net, &frame_filter, ebt_ops_filter,
+				  &net->xt.frame_filter);
 }
 
 static void __net_exit frame_filter_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtable_nat.c b/net/bridge/netfilter/ebtable_nat.c
index 57cd5bb154e7..08df7406ecb3 100644
--- a/net/bridge/netfilter/ebtable_nat.c
+++ b/net/bridge/netfilter/ebtable_nat.c
@@ -93,8 +93,8 @@ static const struct nf_hook_ops ebt_ops_nat[] = {
 
 static int __net_init frame_nat_net_init(struct net *net)
 {
-	net->xt.frame_nat = ebt_register_table(net, &frame_nat, ebt_ops_nat);
-	return PTR_ERR_OR_ZERO(net->xt.frame_nat);
+	return ebt_register_table(net, &frame_nat, ebt_ops_nat,
+				  &net->xt.frame_nat);
 }
 
 static void __net_exit frame_nat_net_exit(struct net *net)
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 83951f978445..3b3dcf719e07 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1169,9 +1169,8 @@ static void __ebt_unregister_table(struct net *net, struct ebt_table *table)
 	kfree(table);
 }
 
-struct ebt_table *
-ebt_register_table(struct net *net, const struct ebt_table *input_table,
-		   const struct nf_hook_ops *ops)
+int ebt_register_table(struct net *net, const struct ebt_table *input_table,
+		       const struct nf_hook_ops *ops, struct ebt_table **res)
 {
 	struct ebt_table_info *newinfo;
 	struct ebt_table *t, *table;
@@ -1183,7 +1182,7 @@ ebt_register_table(struct net *net, const struct ebt_table *input_table,
 	    repl->entries == NULL || repl->entries_size == 0 ||
 	    repl->counters != NULL || input_table->private != NULL) {
 		BUGPRINT("Bad table data for ebt_register_table!!!\n");
-		return ERR_PTR(-EINVAL);
+		return -EINVAL;
 	}
 
 	/* Don't add one table to multiple lists. */
@@ -1252,16 +1251,18 @@ ebt_register_table(struct net *net, const struct ebt_table *input_table,
 	list_add(&table->list, &net->xt.tables[NFPROTO_BRIDGE]);
 	mutex_unlock(&ebt_mutex);
 
+	WRITE_ONCE(*res, table);
+
 	if (!ops)
-		return table;
+		return 0;
 
 	ret = nf_register_net_hooks(net, ops, hweight32(table->valid_hooks));
 	if (ret) {
 		__ebt_unregister_table(net, table);
-		return ERR_PTR(ret);
+		*res = NULL;
 	}
 
-	return table;
+	return ret;
 free_unlock:
 	mutex_unlock(&ebt_mutex);
 free_chainstack:
@@ -1276,7 +1277,7 @@ ebt_register_table(struct net *net, const struct ebt_table *input_table,
 free_table:
 	kfree(table);
 out:
-	return ERR_PTR(ret);
+	return ret;
 }
 
 void ebt_unregister_table(struct net *net, struct ebt_table *table,
-- 
2.13.5

^ permalink raw reply related

* [iproute PATCH v2 1/3] ip{6,}tunnel: Avoid copying user-supplied interface name around
From: Phil Sutter @ 2017-09-26 16:35 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20170926163548.24347-1-phil@nwl.cc>

In both files' parse_args() functions as well as in iptunnel's do_prl()
and do_6rd() functions, a user-supplied 'dev' parameter is uselessly
copied into a temporary buffer before passing it to ll_name_to_index()
or copying into a struct ifreq.  Avoid this by just caching the argv
pointer value until the later lookup/strcpy.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 ip/ip6tunnel.c |  6 +++---
 ip/iptunnel.c  | 22 +++++++++-------------
 2 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/ip/ip6tunnel.c b/ip/ip6tunnel.c
index b4a7def144226..c12d700e74189 100644
--- a/ip/ip6tunnel.c
+++ b/ip/ip6tunnel.c
@@ -136,7 +136,7 @@ static void print_tunnel(struct ip6_tnl_parm2 *p)
 static int parse_args(int argc, char **argv, int cmd, struct ip6_tnl_parm2 *p)
 {
 	int count = 0;
-	char medium[IFNAMSIZ] = {};
+	const char *medium = NULL;
 
 	while (argc > 0) {
 		if (strcmp(*argv, "mode") == 0) {
@@ -180,7 +180,7 @@ static int parse_args(int argc, char **argv, int cmd, struct ip6_tnl_parm2 *p)
 			memcpy(&p->laddr, &laddr.data, sizeof(p->laddr));
 		} else if (strcmp(*argv, "dev") == 0) {
 			NEXT_ARG();
-			strncpy(medium, *argv, IFNAMSIZ - 1);
+			medium = *argv;
 		} else if (strcmp(*argv, "encaplimit") == 0) {
 			NEXT_ARG();
 			if (strcmp(*argv, "none") == 0) {
@@ -285,7 +285,7 @@ static int parse_args(int argc, char **argv, int cmd, struct ip6_tnl_parm2 *p)
 		count++;
 		argc--; argv++;
 	}
-	if (medium[0]) {
+	if (medium) {
 		p->link = ll_name_to_index(medium);
 		if (p->link == 0) {
 			fprintf(stderr, "Cannot find device \"%s\"\n", medium);
diff --git a/ip/iptunnel.c b/ip/iptunnel.c
index 105d0f5576f1a..0acfd0793d3cd 100644
--- a/ip/iptunnel.c
+++ b/ip/iptunnel.c
@@ -60,7 +60,7 @@ static void set_tunnel_proto(struct ip_tunnel_parm *p, int proto)
 static int parse_args(int argc, char **argv, int cmd, struct ip_tunnel_parm *p)
 {
 	int count = 0;
-	char medium[IFNAMSIZ] = {};
+	const char *medium = NULL;
 	int isatap = 0;
 
 	memset(p, 0, sizeof(*p));
@@ -139,7 +139,7 @@ static int parse_args(int argc, char **argv, int cmd, struct ip_tunnel_parm *p)
 				p->iph.saddr = htonl(INADDR_ANY);
 		} else if (strcmp(*argv, "dev") == 0) {
 			NEXT_ARG();
-			strncpy(medium, *argv, IFNAMSIZ - 1);
+			medium = *argv;
 		} else if (strcmp(*argv, "ttl") == 0 ||
 			   strcmp(*argv, "hoplimit") == 0 ||
 			   strcmp(*argv, "hlim") == 0) {
@@ -216,7 +216,7 @@ static int parse_args(int argc, char **argv, int cmd, struct ip_tunnel_parm *p)
 		}
 	}
 
-	if (medium[0]) {
+	if (medium) {
 		p->link = ll_name_to_index(medium);
 		if (p->link == 0) {
 			fprintf(stderr, "Cannot find device \"%s\"\n", medium);
@@ -465,9 +465,8 @@ static int do_prl(int argc, char **argv)
 {
 	struct ip_tunnel_prl p = {};
 	int count = 0;
-	int devname = 0;
 	int cmd = 0;
-	char medium[IFNAMSIZ] = {};
+	const char *medium = NULL;
 
 	while (argc > 0) {
 		if (strcmp(*argv, "prl-default") == 0) {
@@ -488,8 +487,7 @@ static int do_prl(int argc, char **argv)
 			count++;
 		} else if (strcmp(*argv, "dev") == 0) {
 			NEXT_ARG();
-			strncpy(medium, *argv, IFNAMSIZ-1);
-			devname++;
+			medium = *argv;
 		} else {
 			fprintf(stderr,
 				"Invalid PRL parameter \"%s\"\n", *argv);
@@ -502,7 +500,7 @@ static int do_prl(int argc, char **argv)
 		}
 		argc--; argv++;
 	}
-	if (devname == 0) {
+	if (!medium) {
 		fprintf(stderr, "Must specify device\n");
 		exit(-1);
 	}
@@ -513,9 +511,8 @@ static int do_prl(int argc, char **argv)
 static int do_6rd(int argc, char **argv)
 {
 	struct ip_tunnel_6rd ip6rd = {};
-	int devname = 0;
 	int cmd = 0;
-	char medium[IFNAMSIZ] = {};
+	const char *medium = NULL;
 	inet_prefix prefix;
 
 	while (argc > 0) {
@@ -537,8 +534,7 @@ static int do_6rd(int argc, char **argv)
 			cmd = SIOCDEL6RD;
 		} else if (strcmp(*argv, "dev") == 0) {
 			NEXT_ARG();
-			strncpy(medium, *argv, IFNAMSIZ-1);
-			devname++;
+			medium = *argv;
 		} else {
 			fprintf(stderr,
 				"Invalid 6RD parameter \"%s\"\n", *argv);
@@ -546,7 +542,7 @@ static int do_6rd(int argc, char **argv)
 		}
 		argc--; argv++;
 	}
-	if (devname == 0) {
+	if (!medium) {
 		fprintf(stderr, "Must specify device\n");
 		exit(-1);
 	}
-- 
2.13.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox