Netdev List

* Re: [PATCH net-next v3 2/5] net: dsa: b53: Make SRAB driver manage port interrupts
From: Andrew Lunn @ 2018-09-05 22:36 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, vivien.didelot, davem
In-Reply-To: <20180905194215.29301-3-f.fainelli@gmail.com>

On Wed, Sep 05, 2018 at 12:42:12PM -0700, Florian Fainelli wrote:
> Update the SRAB driver to manage per-port interrupts. Since we cannot
> sleep during b53_io_ops, schedule a workqueue whenever we get a port
> specific interrupt. We will later make use of this to call back into
> PHYLINK when there is e.g: a link state change.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next v3 3/5] net: dsa: b53: Add helper to set link parameters
From: Andrew Lunn @ 2018-09-05 22:39 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, vivien.didelot, davem
In-Reply-To: <20180905194215.29301-4-f.fainelli@gmail.com>

On Wed, Sep 05, 2018 at 12:42:13PM -0700, Florian Fainelli wrote:
> Extract the logic from b53_adjust_link() responsible for overriding a
> given port's link, speed, duplex and pause settings and make two helper
> functions to set the port's configuration and the port's link settings.
> We will make use of both, as separate functions while adding PHYLINK
> support next.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* [PATCH] iwlwifi: Use kmemdup instead of duplicating it in iwl_parse_nvm_mcc_info
From: YueHaibing @ 2018-09-06  3:13 UTC (permalink / raw)
  To: davem, johannes.berg, emmanuel.grumbach, luciano.coelho,
	linuxwifi, kvalo
  Cc: linux-kernel, netdev, linux-wireless, YueHaibing

Replace calls to kmalloc followed by a memcpy with a direct call to
kmemdup.

Patch found using coccinelle.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
index 27db4a3..bd91657 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
@@ -1113,14 +1113,12 @@ iwl_parse_nvm_mcc_info(struct device *dev, const struct iwl_cfg *cfg,
 	regd_to_copy = sizeof(struct ieee80211_regdomain) +
 		valid_rules * sizeof(struct ieee80211_reg_rule);
 
-	copy_rd = kzalloc(regd_to_copy, GFP_KERNEL);
+	copy_rd = kmemdup(reg, regd_to_copy, GFP_KERNEL);
 	if (!copy_rd) {
 		copy_rd = ERR_PTR(-ENOMEM);
 		goto out;
 	}
 
-	memcpy(copy_rd, regd, regd_to_copy);
-
 out:
 	kfree(regdb_ptrs);
 	kfree(regd);
-- 
2.7.0

^ permalink raw reply related

* Re: [PATCH net-next v3 4/5] net: dsa: b53: Add PHYLINK support
From: Andrew Lunn @ 2018-09-05 22:42 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, vivien.didelot, davem
In-Reply-To: <20180905194215.29301-5-f.fainelli@gmail.com>

On Wed, Sep 05, 2018 at 12:42:14PM -0700, Florian Fainelli wrote:
> Add support for PHYLINK, things are reasonably straight forward since we
> do not yet support SerDes interfaces, that leaves us with just
> MLO_AN_PHY and MLO_AN_FIXED to deal with.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: KASAN: slab-out-of-bounds Read in _decode_session6
From: syzbot @ 2018-09-06  3:17 UTC (permalink / raw)
  To: ast, daniel, davem, dvyukov, herbert, kuznet, linux-kernel,
	netdev, steffen.klassert, syzkaller-bugs, yoshfuji
In-Reply-To: <0000000000003658b00574dc08cc@google.com>

syzbot has found a reproducer for the following crash on:

HEAD commit:    b36fdc6853a3 Merge tag 'gpio-v4.19-2' of git://git.kernel...
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=164938d1400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=4c7e83258d6e0156
dashboard link: https://syzkaller.appspot.com/bug?extid=acffccec848dc13fe459
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=115f172e400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16399be1400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+acffccec848dc13fe459@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
8021q: adding VLAN 0 to HW filter on device team0
==================================================================
BUG: KASAN: slab-out-of-bounds in _decode_session6+0x1331/0x14e0 net/ipv6/xfrm6_policy.c:161
Read of size 1 at addr ffff8801d4a67f07 by task syz-executor092/4673

CPU: 1 PID: 4673 Comm: syz-executor092 Not tainted 4.19.0-rc2+ #223
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412
  __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
  _decode_session6+0x1331/0x14e0 net/ipv6/xfrm6_policy.c:161
  __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:2299
  xfrm_decode_session include/net/xfrm.h:1232 [inline]
  vti6_tnl_xmit+0x3fc/0x1bb1 net/ipv6/ip6_vti.c:542
  __netdev_start_xmit include/linux/netdevice.h:4287 [inline]
  netdev_start_xmit include/linux/netdevice.h:4296 [inline]
  xmit_one net/core/dev.c:3216 [inline]
  dev_hard_start_xmit+0x272/0xc10 net/core/dev.c:3232
  __dev_queue_xmit+0x2ab2/0x3870 net/core/dev.c:3802
  dev_queue_xmit+0x17/0x20 net/core/dev.c:3835
  __bpf_tx_skb net/core/filter.c:2012 [inline]
  __bpf_redirect_common net/core/filter.c:2050 [inline]
  __bpf_redirect+0x5b7/0xae0 net/core/filter.c:2057
  ____bpf_clone_redirect net/core/filter.c:2090 [inline]
  bpf_clone_redirect+0x2f6/0x490 net/core/filter.c:2062
  bpf_prog_c39d1ba309a769f7+0xe9e/0x1000

Allocated by task 4673:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
  __do_kmalloc_node mm/slab.c:3682 [inline]
  __kmalloc_node_track_caller+0x47/0x70 mm/slab.c:3696
  __kmalloc_reserve.isra.41+0x3a/0xe0 net/core/skbuff.c:137
  pskb_expand_head+0x230/0x10e0 net/core/skbuff.c:1463
  skb_ensure_writable+0x3dd/0x640 net/core/skbuff.c:5129
  __bpf_try_make_writable net/core/filter.c:1633 [inline]
  bpf_try_make_writable net/core/filter.c:1639 [inline]
  bpf_try_make_head_writable net/core/filter.c:1647 [inline]
  ____bpf_clone_redirect net/core/filter.c:2084 [inline]
  bpf_clone_redirect+0x14a/0x490 net/core/filter.c:2062
  bpf_prog_c39d1ba309a769f7+0xe9e/0x1000

Freed by task 3286:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kfree+0xd9/0x210 mm/slab.c:3813
  load_elf_binary+0x2569/0x5610 fs/binfmt_elf.c:1118
  search_binary_handler+0x17d/0x570 fs/exec.c:1653
  exec_binprm fs/exec.c:1695 [inline]
  __do_execve_file.isra.35+0x15ff/0x2460 fs/exec.c:1819
  do_execveat_common fs/exec.c:1866 [inline]
  do_execve fs/exec.c:1883 [inline]
  __do_sys_execve fs/exec.c:1964 [inline]
  __se_sys_execve fs/exec.c:1959 [inline]
  __x64_sys_execve+0x8f/0xc0 fs/exec.c:1959
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801d4a67d00
  which belongs to the cache kmalloc-512 of size 512
The buggy address is located 7 bytes to the right of
  512-byte region [ffff8801d4a67d00, ffff8801d4a67f00)
The buggy address belongs to the page:
page:ffffea00075299c0 count:1 mapcount:0 mapping:ffff8801dac00940 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffffea0007529988 ffffea0007529a48 ffff8801dac00940
raw: 0000000000000000 ffff8801d4a67080 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801d4a67e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8801d4a67e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ffff8801d4a67f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                    ^
  ffff8801d4a67f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff8801d4a68000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

^ permalink raw reply
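
The "7 bytes to the right" figure in the report above can be checked by
hand from the addresses it prints. A small sketch of that arithmetic
follows; the helper name offset_from_end is made up for illustration and
is not part of KASAN:

```c
#include <stdint.h>

/* Signed distance of addr from the end of the object [base, base + size).
 * A non-negative result means addr lies that many bytes to the right of
 * the object; a negative result means addr is before the end.
 * Illustrative helper only. */
static int64_t offset_from_end(uint64_t base, uint64_t size, uint64_t addr)
{
	uint64_t end = base + size;	/* first byte after the object */

	return (int64_t)(addr - end);
}
```

With base 0xffff8801d4a67d00 and size 512, the end of the object is
0xffff8801d4a67f00, so the faulting address 0xffff8801d4a67f07 sits
7 bytes past it.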

* Re: [PATCH] iwlwifi: Use kmemdup instead of duplicating it in iwl_parse_nvm_mcc_info
From: YueHaibing @ 2018-09-06  3:17 UTC (permalink / raw)
  To: davem, johannes.berg, emmanuel.grumbach, luciano.coelho,
	linuxwifi, kvalo
  Cc: linux-kernel, netdev, linux-wireless
In-Reply-To: <20180906031322.17748-1-yuehaibing@huawei.com>

Sorry, there is a typo. Please ignore this.

On 2018/9/6 11:13, YueHaibing wrote:
> Replace calls to kmalloc followed by a memcpy with a direct call to
> kmemdup.
> 
> Patch found using coccinelle.
> 
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---
>  drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
> index 27db4a3..bd91657 100644
> --- a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
> +++ b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
> @@ -1113,14 +1113,12 @@ iwl_parse_nvm_mcc_info(struct device *dev, const struct iwl_cfg *cfg,
>  	regd_to_copy = sizeof(struct ieee80211_regdomain) +
>  		valid_rules * sizeof(struct ieee80211_reg_rule);
>  
> -	copy_rd = kzalloc(regd_to_copy, GFP_KERNEL);
> +	copy_rd = kmemdup(reg, regd_to_copy, GFP_KERNEL);
>  	if (!copy_rd) {
>  		copy_rd = ERR_PTR(-ENOMEM);
>  		goto out;
>  	}
>  
> -	memcpy(copy_rd, regd, regd_to_copy);
> -
>  out:
>  	kfree(regdb_ptrs);
>  	kfree(regd);
> 

^ permalink raw reply

* [PATCH v2] iwlwifi: Use kmemdup instead of duplicating it in iwl_parse_nvm_mcc_info
From: YueHaibing @ 2018-09-06  3:20 UTC (permalink / raw)
  To: davem, johannes.berg, emmanuel.grumbach, luciano.coelho,
	linuxwifi, kvalo
  Cc: linux-kernel, netdev, linux-wireless, YueHaibing

Replace calls to kzalloc followed by a memcpy with a direct call to
kmemdup.

Patch found using coccinelle.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
v2: fix a typo 'reg' --> 'regd'
---
 drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
index 27db4a3..bd91657 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
@@ -1113,14 +1113,12 @@ iwl_parse_nvm_mcc_info(struct device *dev, const struct iwl_cfg *cfg,
 	regd_to_copy = sizeof(struct ieee80211_regdomain) +
 		valid_rules * sizeof(struct ieee80211_reg_rule);
 
-	copy_rd = kzalloc(regd_to_copy, GFP_KERNEL);
+	copy_rd = kmemdup(regd, regd_to_copy, GFP_KERNEL);
 	if (!copy_rd) {
 		copy_rd = ERR_PTR(-ENOMEM);
 		goto out;
 	}
 
-	memcpy(copy_rd, regd, regd_to_copy);
-
 out:
 	kfree(regdb_ptrs);
 	kfree(regd);
-- 
2.7.0

^ permalink raw reply related
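
The v2 patch above replaces an allocate-then-copy pair with a single
duplicating call. A minimal userspace sketch of that pattern follows,
assuming malloc() stands in for the kernel allocator; the name
memdup_sketch is hypothetical and this is not the kernel's kmemdup():

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace analogue of the allocate-and-copy pattern.
 * Returns a freshly allocated copy of the first len bytes of src,
 * or NULL if the allocation fails. The caller owns the result. */
static void *memdup_sketch(const void *src, size_t len)
{
	void *dst = malloc(len);

	if (!dst)
		return NULL;
	memcpy(dst, src, len);
	return dst;
}
```

Because the copy overwrites every byte of the new buffer, zeroing it
first (as the removed kzalloc() did) was redundant work.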

* Re: [PATCH] cxgb4: fix abort_req_rss6 struct
From: Jason Gunthorpe @ 2018-09-05 22:55 UTC (permalink / raw)
  To: Steve Wise; +Cc: netdev, dledford, davem, linux-rdma
In-Reply-To: <1e9f55943699dcc2bc921000ee7ee5353cbf7480.1535742195.git.swise@opengridcomputing.com>

On Fri, Aug 31, 2018 at 11:52:00AM -0700, Steve Wise wrote:
> Remove the incorrect WR_HDR field which can cause a misinterpretation
> of this CPL by ULDs.

What does that mean?

Is this an -rc patch?

Jason

^ permalink raw reply

* [PATCH net-next] ieee802154: ca8210: Use kmemdup instead of duplicating it in ca8210_test_int_driver_write
From: YueHaibing @ 2018-09-06  3:42 UTC (permalink / raw)
  To: davem, h.morris, alex.aring, stefan
  Cc: linux-kernel, netdev, linux-wpan, YueHaibing

Replace calls to kmalloc followed by a memcpy with a direct call to
kmemdup.

Patch found using coccinelle.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/ieee802154/ca8210.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ieee802154/ca8210.c b/drivers/net/ieee802154/ca8210.c
index 58299fb..e21279d 100644
--- a/drivers/net/ieee802154/ca8210.c
+++ b/drivers/net/ieee802154/ca8210.c
@@ -634,10 +634,9 @@ static int ca8210_test_int_driver_write(
 	for (i = 0; i < len; i++)
 		dev_dbg(&priv->spi->dev, "%#03x\n", buf[i]);
 
-	fifo_buffer = kmalloc(len, GFP_KERNEL);
+	fifo_buffer = kmemdup(buf, len, GFP_KERNEL);
 	if (!fifo_buffer)
 		return -ENOMEM;
-	memcpy(fifo_buffer, buf, len);
 	kfifo_in(&test->up_fifo, &fifo_buffer, 4);
 	wake_up_interruptible(&priv->test.readq);
 
-- 
2.7.0

^ permalink raw reply related

* [PATCH net-next 00/11] Vhost_net TX batching
From: Jason Wang @ 2018-09-06  4:05 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: kvm, virtualization, mst, jasowang

Hi all:

This series batches the submission of packets to the underlying socket
through msg_control during sendmsg(). This is done by:

1) Doing the userspace copy inside vhost_net
2) Building an XDP buff
3) Batching at most 64 (VHOST_NET_BATCH) XDP buffs and submitting them
   at once through msg_control during sendmsg().
4) Letting the underlying socket use the XDP buffs directly when XDP is
   enabled, or build an skb based on the XDP buff.

For packets that cannot easily be built with XDP, or when batch
submission is hard (e.g. sndbuf is limited), we fall back to the
previous slow path, passing an iov iterator to the underlying socket
through sendmsg() once per packet.

This can help to improve cache utilization and avoid lots of indirect
calls with sendmsg(). It can also cooperate with the batching support
of the underlying sockets (e.g. the case of XDP redirection through
maps).

Testpmd(txonly) in guest shows obvious improvements:

Test                /+pps%
XDP_DROP on TAP     /+44.8%
XDP_REDIRECT on TAP /+29%
macvtap (skb)       /+26%

Netperf TCP_STREAM TX from guest shows obvious improvements on small
packets:

    size/session/+thu%/+normalize%
       64/     1/   +2%/    0%
       64/     2/   +3%/   +1%
       64/     4/   +7%/   +5%
       64/     8/   +8%/   +6%
      256/     1/   +3%/    0%
      256/     2/  +10%/   +7%
      256/     4/  +26%/  +22%
      256/     8/  +27%/  +23%
      512/     1/   +3%/   +2%
      512/     2/  +19%/  +14%
      512/     4/  +43%/  +40%
      512/     8/  +45%/  +41%
     1024/     1/   +4%/    0%
     1024/     2/  +27%/  +21%
     1024/     4/  +38%/  +73%
     1024/     8/  +15%/  +24%
     2048/     1/  +10%/   +7%
     2048/     2/  +16%/  +12%
     2048/     4/    0%/   +2%
     2048/     8/    0%/   +2%
     4096/     1/  +36%/  +60%
     4096/     2/  -11%/  -26%
     4096/     4/    0%/  +14%
     4096/     8/    0%/   +4%
    16384/     1/   -1%/   +5%
    16384/     2/    0%/   +2%
    16384/     4/    0%/   -3%
    16384/     8/    0%/   +4%
    65535/     1/    0%/  +10%
    65535/     2/    0%/   +8%
    65535/     4/    0%/   +1%
    65535/     8/    0%/   +3%

Please review.

Thanks

Jason Wang (11):
  net: sock: introduce SOCK_XDP
  tuntap: switch to use XDP_PACKET_HEADROOM
  tuntap: enable bh early during processing XDP
  tuntap: simplify error handling in tun_build_skb()
  tuntap: tweak on the path of non-xdp case in tun_build_skb()
  tuntap: split out XDP logic
  tuntap: move XDP flushing out of tun_do_xdp()
  tun: switch to new type of msg_control
  tuntap: accept an array of XDP buffs through sendmsg()
  tap: accept an array of XDP buffs through sendmsg()
  vhost_net: batch submitting XDP buffers to underlayer sockets

 drivers/net/tap.c      |  87 +++++++++++++-
 drivers/net/tun.c      | 251 +++++++++++++++++++++++++++++++----------
 drivers/vhost/net.c    | 171 +++++++++++++++++++++++++---
 include/linux/if_tun.h |   7 ++
 include/net/sock.h     |   1 +
 5 files changed, 437 insertions(+), 80 deletions(-)

-- 
2.17.1

^ permalink raw reply
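
The batching scheme in the cover letter above collects buffers and hands
them to the lower layer in a single call. A minimal userspace sketch of
that idea follows, assuming the fixed batch size of 64 stated there; the
types and function names are hypothetical and do not correspond to the
vhost_net code:

```c
#include <stddef.h>

#define BATCH_MAX 64	/* mirrors the 64-buffer limit from the cover letter */

struct batch {
	const void *bufs[BATCH_MAX];	/* queued buffers, not yet submitted */
	size_t count;			/* number of valid entries in bufs */
	size_t submits;			/* how many times a batch was submitted */
	size_t submitted;		/* total buffers handed over so far */
};

/* Stand-in for one sendmsg() call that carries the whole array. */
static void batch_flush(struct batch *b)
{
	if (b->count == 0)
		return;
	b->submits++;
	b->submitted += b->count;
	b->count = 0;
}

/* Queue one buffer; flush automatically once the batch is full. */
static void batch_add(struct batch *b, const void *buf)
{
	b->bufs[b->count++] = buf;
	if (b->count == BATCH_MAX)
		batch_flush(b);
}
```

A caller would finish with an explicit batch_flush() so that a partially
filled batch is not left behind.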

* [PATCH net-next 02/11] tuntap: switch to use XDP_PACKET_HEADROOM
From: Jason Wang @ 2018-09-06  4:05 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: kvm, virtualization, mst, jasowang
In-Reply-To: <20180906040526.22518-1-jasowang@redhat.com>

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 2c548bd20393..d3677a544b56 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -113,7 +113,6 @@ do {								\
 } while (0)
 #endif
 
-#define TUN_HEADROOM 256
 #define TUN_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD)
 
 /* TUN device flags */
@@ -1654,7 +1653,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(tun->xdp_prog);
 	if (xdp_prog)
-		pad += TUN_HEADROOM;
+		pad += XDP_PACKET_HEADROOM;
 	buflen += SKB_DATA_ALIGN(len + pad);
 	rcu_read_unlock();
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 04/11] tuntap: simplify error handling in tun_build_skb()
From: Jason Wang @ 2018-09-06  4:05 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: kvm, virtualization, mst, jasowang
In-Reply-To: <20180906040526.22518-1-jasowang@redhat.com>

There's no need to duplicate the page get logic in each action. So this
patch gets the page and calculates the offset before processing XDP
actions, and undoes both on error (we don't care about performance on
the error path). This will be used for factoring out the XDP logic.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c | 37 ++++++++++++++++---------------------
 1 file changed, 16 insertions(+), 21 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 372caf7d67d9..f8cdcfa392c3 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1642,7 +1642,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 				     int len, int *skb_xdp)
 {
 	struct page_frag *alloc_frag = &current->task_frag;
-	struct sk_buff *skb;
+	struct sk_buff *skb = NULL;
 	struct bpf_prog *xdp_prog;
 	int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	unsigned int delta = 0;
@@ -1668,6 +1668,9 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	if (copied != len)
 		return ERR_PTR(-EFAULT);
 
+	get_page(alloc_frag->page);
+	alloc_frag->offset += buflen;
+
 	/* There's a small window that XDP may be set after the check
 	 * of xdp_prog above, this should be rare and for simplicity
 	 * we do XDP on skb in case the headroom is not enough.
@@ -1695,23 +1698,15 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 
 		switch (act) {
 		case XDP_REDIRECT:
-			get_page(alloc_frag->page);
-			alloc_frag->offset += buflen;
 			err = xdp_do_redirect(tun->dev, &xdp, xdp_prog);
 			xdp_do_flush_map();
 			if (err)
-				goto err_redirect;
-			rcu_read_unlock();
-			local_bh_enable();
-			return NULL;
+				goto err_xdp;
+			goto out;
 		case XDP_TX:
-			get_page(alloc_frag->page);
-			alloc_frag->offset += buflen;
 			if (tun_xdp_tx(tun->dev, &xdp) < 0)
-				goto err_redirect;
-			rcu_read_unlock();
-			local_bh_enable();
-			return NULL;
+				goto err_xdp;
+			goto out;
 		case XDP_PASS:
 			delta = orig_data - xdp.data;
 			len = xdp.data_end - xdp.data;
@@ -1730,23 +1725,23 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	local_bh_enable();
 
 	skb = build_skb(buf, buflen);
-	if (!skb)
-		return ERR_PTR(-ENOMEM);
+	if (!skb) {
+		skb = ERR_PTR(-ENOMEM);
+		goto out;
+	}
 
 	skb_reserve(skb, pad - delta);
 	skb_put(skb, len);
-	get_page(alloc_frag->page);
-	alloc_frag->offset += buflen;
 
 	return skb;
 
-err_redirect:
-	put_page(alloc_frag->page);
 err_xdp:
+	alloc_frag->offset -= buflen;
+	put_page(alloc_frag->page);
+out:
 	rcu_read_unlock();
 	local_bh_enable();
-	this_cpu_inc(tun->pcpu_stats->rx_dropped);
-	return NULL;
+	return skb;
 }
 
 /* Get packet from user space buffer */
-- 
2.17.1

^ permalink raw reply related
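
The restructuring described in the patch above, taking the reference
once up front and undoing it on a single error path, follows a common
goto-based cleanup idiom. A self-contained sketch under that assumption
follows; struct frag, process() and the fail flag are hypothetical
stand-ins, not the tun.c code:

```c
struct frag {
	int refcount;	/* stands in for the page reference count */
	int offset;	/* stands in for alloc_frag->offset */
};

/* Reserve buflen bytes from the frag, then run a step that may fail.
 * On failure the reservation is rolled back in one place.
 * Returns 0 on success, -1 on failure. */
static int process(struct frag *f, int buflen, int fail)
{
	/* Acquire once, before any branch that needs it. */
	f->refcount++;
	f->offset += buflen;

	if (fail)
		goto err;

	return 0;

err:
	/* Single undo path: restore the offset and drop the reference. */
	f->offset -= buflen;
	f->refcount--;
	return -1;
}
```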

* [PATCH net-next 05/11] tuntap: tweak on the path of non-xdp case in tun_build_skb()
From: Jason Wang @ 2018-09-06  4:05 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: kvm, virtualization, mst, jasowang
In-Reply-To: <20180906040526.22518-1-jasowang@redhat.com>

If we're sure not to go native XDP, there's no need for several things
such as disabling bh and taking the RCU read lock.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index f8cdcfa392c3..389aa0727cc6 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1675,10 +1675,12 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	 * of xdp_prog above, this should be rare and for simplicity
 	 * we do XDP on skb in case the headroom is not enough.
 	 */
-	if (hdr->gso_type || !xdp_prog)
+	if (hdr->gso_type || !xdp_prog) {
 		*skb_xdp = 1;
-	else
-		*skb_xdp = 0;
+		goto build;
+	}
+
+	*skb_xdp = 0;
 
 	local_bh_disable();
 	rcu_read_lock();
@@ -1724,6 +1726,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	rcu_read_unlock();
 	local_bh_enable();
 
+build:
 	skb = build_skb(buf, buflen);
 	if (!skb) {
 		skb = ERR_PTR(-ENOMEM);
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 06/11] tuntap: split out XDP logic
From: Jason Wang @ 2018-09-06  4:05 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: kvm, virtualization, mst, jasowang
In-Reply-To: <20180906040526.22518-1-jasowang@redhat.com>

This patch splits the XDP logic out into a single function. This allows
it to be reused by the XDP batching path in a following patch.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c | 84 ++++++++++++++++++++++++++++-------------------
 1 file changed, 51 insertions(+), 33 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 389aa0727cc6..21b125020b3b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1635,6 +1635,44 @@ static bool tun_can_build_skb(struct tun_struct *tun, struct tun_file *tfile,
 	return true;
 }
 
+static u32 tun_do_xdp(struct tun_struct *tun,
+		      struct tun_file *tfile,
+		      struct bpf_prog *xdp_prog,
+		      struct xdp_buff *xdp,
+		      int *err)
+{
+	u32 act = bpf_prog_run_xdp(xdp_prog, xdp);
+
+	switch (act) {
+	case XDP_REDIRECT:
+		*err = xdp_do_redirect(tun->dev, xdp, xdp_prog);
+		xdp_do_flush_map();
+		if (*err)
+			break;
+		goto out;
+	case XDP_TX:
+		*err = tun_xdp_tx(tun->dev, xdp);
+		if (*err < 0)
+			break;
+		*err = 0;
+		goto out;
+	case XDP_PASS:
+		goto out;
+	default:
+		bpf_warn_invalid_xdp_action(act);
+		/* fall through */
+	case XDP_ABORTED:
+		trace_xdp_exception(tun->dev, xdp_prog, act);
+		/* fall through */
+	case XDP_DROP:
+		break;
+	}
+
+	put_page(virt_to_head_page(xdp->data_hard_start));
+out:
+	return act;
+}
+
 static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 				     struct tun_file *tfile,
 				     struct iov_iter *from,
@@ -1645,10 +1683,10 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	struct sk_buff *skb = NULL;
 	struct bpf_prog *xdp_prog;
 	int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-	unsigned int delta = 0;
 	char *buf;
 	size_t copied;
-	int err, pad = TUN_RX_PAD;
+	int pad = TUN_RX_PAD;
+	int err = 0;
 
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(tun->xdp_prog);
@@ -1685,9 +1723,8 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	local_bh_disable();
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(tun->xdp_prog);
-	if (xdp_prog && !*skb_xdp) {
+	if (xdp_prog) {
 		struct xdp_buff xdp;
-		void *orig_data;
 		u32 act;
 
 		xdp.data_hard_start = buf;
@@ -1695,33 +1732,14 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 		xdp_set_data_meta_invalid(&xdp);
 		xdp.data_end = xdp.data + len;
 		xdp.rxq = &tfile->xdp_rxq;
-		orig_data = xdp.data;
-		act = bpf_prog_run_xdp(xdp_prog, &xdp);
-
-		switch (act) {
-		case XDP_REDIRECT:
-			err = xdp_do_redirect(tun->dev, &xdp, xdp_prog);
-			xdp_do_flush_map();
-			if (err)
-				goto err_xdp;
-			goto out;
-		case XDP_TX:
-			if (tun_xdp_tx(tun->dev, &xdp) < 0)
-				goto err_xdp;
-			goto out;
-		case XDP_PASS:
-			delta = orig_data - xdp.data;
-			len = xdp.data_end - xdp.data;
-			break;
-		default:
-			bpf_warn_invalid_xdp_action(act);
-			/* fall through */
-		case XDP_ABORTED:
-			trace_xdp_exception(tun->dev, xdp_prog, act);
-			/* fall through */
-		case XDP_DROP:
+		act = tun_do_xdp(tun, tfile, xdp_prog, &xdp, &err);
+		if (err)
 			goto err_xdp;
-		}
+		if (act != XDP_PASS)
+			goto out;
+
+		pad = xdp.data - xdp.data_hard_start;
+		len = xdp.data_end - xdp.data;
 	}
 	rcu_read_unlock();
 	local_bh_enable();
@@ -1729,18 +1747,18 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 build:
 	skb = build_skb(buf, buflen);
 	if (!skb) {
+		put_page(alloc_frag->page);
 		skb = ERR_PTR(-ENOMEM);
 		goto out;
 	}
 
-	skb_reserve(skb, pad - delta);
+	skb_reserve(skb, pad);
 	skb_put(skb, len);
 
 	return skb;
 
 err_xdp:
-	alloc_frag->offset -= buflen;
-	put_page(alloc_frag->page);
+	this_cpu_inc(tun->pcpu_stats->rx_dropped);
 out:
 	rcu_read_unlock();
 	local_bh_enable();
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 07/11] tuntap: move XDP flushing out of tun_do_xdp()
From: Jason Wang @ 2018-09-06  4:05 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: kvm, virtualization, mst, jasowang
In-Reply-To: <20180906040526.22518-1-jasowang@redhat.com>

This will allow adding batch flushing on top.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 21b125020b3b..ff1cbf3ebd50 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1646,7 +1646,6 @@ static u32 tun_do_xdp(struct tun_struct *tun,
 	switch (act) {
 	case XDP_REDIRECT:
 		*err = xdp_do_redirect(tun->dev, xdp, xdp_prog);
-		xdp_do_flush_map();
 		if (*err)
 			break;
 		goto out;
@@ -1735,6 +1734,9 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 		act = tun_do_xdp(tun, tfile, xdp_prog, &xdp, &err);
 		if (err)
 			goto err_xdp;
+
+		if (act == XDP_REDIRECT)
+			xdp_do_flush_map();
 		if (act != XDP_PASS)
 			goto out;
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 08/11] tun: switch to new type of msg_control
From: Jason Wang @ 2018-09-06  4:05 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: kvm, virtualization, mst, jasowang
In-Reply-To: <20180906040526.22518-1-jasowang@redhat.com>

This patch introduces a new tun/tap specific msg_control:

#define TUN_MSG_UBUF 1
#define TUN_MSG_PTR  2
struct tun_msg_ctl {
       int type;
       void *ptr;
};

This allows us to pass different kinds of msg_control through
sendmsg(). The first supported type is ubuf (TUN_MSG_UBUF), which will
be used by the existing vhost_net zerocopy code. The second is XDP
buff (TUN_MSG_PTR), which allows vhost_net to pass an XDP buff to TUN.
This could be used to implement accepting an array of XDP buffs from
vhost_net in the following patches.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tap.c      | 18 ++++++++++++------
 drivers/net/tun.c      |  6 +++++-
 drivers/vhost/net.c    |  7 +++++--
 include/linux/if_tun.h |  7 +++++++
 4 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index f0f7cd977667..7996ed7cbf18 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -619,7 +619,7 @@ static inline struct sk_buff *tap_alloc_skb(struct sock *sk, size_t prepad,
 #define TAP_RESERVE HH_DATA_OFF(ETH_HLEN)
 
 /* Get packet from user space buffer */
-static ssize_t tap_get_user(struct tap_queue *q, struct msghdr *m,
+static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 			    struct iov_iter *from, int noblock)
 {
 	int good_linear = SKB_MAX_HEAD(TAP_RESERVE);
@@ -663,7 +663,7 @@ static ssize_t tap_get_user(struct tap_queue *q, struct msghdr *m,
 	if (unlikely(len < ETH_HLEN))
 		goto err;
 
-	if (m && m->msg_control && sock_flag(&q->sk, SOCK_ZEROCOPY)) {
+	if (msg_control && sock_flag(&q->sk, SOCK_ZEROCOPY)) {
 		struct iov_iter i;
 
 		copylen = vnet_hdr.hdr_len ?
@@ -724,11 +724,11 @@ static ssize_t tap_get_user(struct tap_queue *q, struct msghdr *m,
 	tap = rcu_dereference(q->tap);
 	/* copy skb_ubuf_info for callback when skb has no error */
 	if (zerocopy) {
-		skb_shinfo(skb)->destructor_arg = m->msg_control;
+		skb_shinfo(skb)->destructor_arg = msg_control;
 		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
 		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
-	} else if (m && m->msg_control) {
-		struct ubuf_info *uarg = m->msg_control;
+	} else if (msg_control) {
+		struct ubuf_info *uarg = msg_control;
 		uarg->callback(uarg, false);
 	}
 
@@ -1150,7 +1150,13 @@ static int tap_sendmsg(struct socket *sock, struct msghdr *m,
 		       size_t total_len)
 {
 	struct tap_queue *q = container_of(sock, struct tap_queue, sock);
-	return tap_get_user(q, m, &m->msg_iter, m->msg_flags & MSG_DONTWAIT);
+	struct tun_msg_ctl *ctl = m->msg_control;
+
+	if (ctl && ctl->type != TUN_MSG_UBUF)
+		return -EINVAL;
+
+	return tap_get_user(q, ctl ? ctl->ptr : NULL, &m->msg_iter,
+			    m->msg_flags & MSG_DONTWAIT);
 }
 
 static int tap_recvmsg(struct socket *sock, struct msghdr *m,
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index ff1cbf3ebd50..c839a4bdcbd9 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2429,11 +2429,15 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	int ret;
 	struct tun_file *tfile = container_of(sock, struct tun_file, socket);
 	struct tun_struct *tun = tun_get(tfile);
+	struct tun_msg_ctl *ctl = m->msg_control;
 
 	if (!tun)
 		return -EBADFD;
 
-	ret = tun_get_user(tun, tfile, m->msg_control, &m->msg_iter,
+	if (ctl && ctl->type != TUN_MSG_UBUF)
+		return -EINVAL;
+
+	ret = tun_get_user(tun, tfile, ctl ? ctl->ptr : NULL, &m->msg_iter,
 			   m->msg_flags & MSG_DONTWAIT,
 			   m->msg_flags & MSG_MORE);
 	tun_put(tun);
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 4e656f89cb22..fb01ce6d981c 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -620,6 +620,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 		.msg_controllen = 0,
 		.msg_flags = MSG_DONTWAIT,
 	};
+	struct tun_msg_ctl ctl;
 	size_t len, total_len = 0;
 	int err;
 	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
@@ -664,8 +665,10 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 			ubuf->ctx = nvq->ubufs;
 			ubuf->desc = nvq->upend_idx;
 			refcount_set(&ubuf->refcnt, 1);
-			msg.msg_control = ubuf;
-			msg.msg_controllen = sizeof(ubuf);
+			msg.msg_control = &ctl;
+			ctl.type = TUN_MSG_UBUF;
+			ctl.ptr = ubuf;
+			msg.msg_controllen = sizeof(ctl);
 			ubufs = nvq->ubufs;
 			atomic_inc(&ubufs->refcount);
 			nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index 3d2996dc7d85..ba46dced1f38 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -19,6 +19,13 @@
 
 #define TUN_XDP_FLAG 0x1UL
 
+#define TUN_MSG_UBUF 1
+#define TUN_MSG_PTR  2
+struct tun_msg_ctl {
+	int type;
+	void *ptr;
+};
+
 #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
 struct socket *tun_get_socket(struct file *);
 struct ptr_ring *tun_get_tx_ring(struct file *file);
-- 
2.17.1

* [PATCH net-next 09/11] tuntap: accept an array of XDP buffs through sendmsg()
From: Jason Wang @ 2018-09-06  4:05 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: kvm, virtualization, mst, jasowang
In-Reply-To: <20180906040526.22518-1-jasowang@redhat.com>

This patch implements the TUN_MSG_PTR msg_control type. This type allows
the caller to pass an array of XDP buffs to tuntap through the ptr field
of struct tun_msg_ctl. If an XDP program is attached, tuntap can run
the XDP program directly. If not, tuntap will build an skb and do a fast
receive, since part of the work has already been done by vhost_net.

This avoids lots of indirect calls, which improves icache utilization,
and allows batched XDP flushing when doing XDP redirection.
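
As a rough illustration of how the batch size travels with the message
type, here is a userspace sketch of the encoding that the diffs in this
series appear to use: the low bits of ctl->type carry the message kind
(masked with 0xF in tun_sendmsg()) and the bits from 16 upwards carry the
number of XDP buffs. The helper names tun_msg_pack(), tun_msg_kind() and
tun_msg_count() are hypothetical and are not part of the patch.

```c
#include <assert.h>

#define TUN_MSG_UBUF 1
#define TUN_MSG_PTR  2

/* Hypothetical helper: pack a batch count and a message kind into the
 * single int that struct tun_msg_ctl calls "type".
 */
static int tun_msg_pack(int count, int kind)
{
	/* Keep the count small enough that the shift cannot reach the sign bit. */
	assert(count >= 0 && count <= 0x7fff);
	assert(kind >= 0 && kind <= 0xF);
	return count << 16 | kind;
}

/* Hypothetical helper: recover the message kind from the low four bits. */
static int tun_msg_kind(int type)
{
	return type & 0xF;
}

/* Hypothetical helper: recover the batch count from the upper bits. */
static int tun_msg_count(int type)
{
	return type >> 16;
}
```

A plain TUN_MSG_UBUF message has a count of zero under this scheme, so
only TUN_MSG_PTR callers need to fill in the upper bits.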

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c | 103 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 100 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index c839a4bdcbd9..069db2e5dd08 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2424,22 +2424,119 @@ static void tun_sock_write_space(struct sock *sk)
 	kill_fasync(&tfile->fasync, SIGIO, POLL_OUT);
 }
 
+static int tun_xdp_one(struct tun_struct *tun,
+		       struct tun_file *tfile,
+		       struct xdp_buff *xdp, int *flush)
+{
+	struct virtio_net_hdr *gso = xdp->data_hard_start + sizeof(int);
+	struct tun_pcpu_stats *stats;
+	struct bpf_prog *xdp_prog;
+	struct sk_buff *skb = NULL;
+	u32 rxhash = 0, act;
+	int buflen = *(int *)xdp->data_hard_start;
+	int err = 0;
+	bool skb_xdp = false;
+
+	xdp_prog = rcu_dereference(tun->xdp_prog);
+	if (xdp_prog) {
+		if (gso->gso_type) {
+			skb_xdp = true;
+			goto build;
+		}
+		xdp_set_data_meta_invalid(xdp);
+		xdp->rxq = &tfile->xdp_rxq;
+		act = tun_do_xdp(tun, tfile, xdp_prog, xdp, &err);
+		if (err)
+			goto out;
+		if (act == XDP_REDIRECT)
+			*flush = true;
+		if (act != XDP_PASS)
+			goto out;
+	}
+
+build:
+	skb = build_skb(xdp->data_hard_start, buflen);
+	if (!skb) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	skb_reserve(skb, xdp->data - xdp->data_hard_start);
+	skb_put(skb, xdp->data_end - xdp->data);
+
+	if (virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) {
+		this_cpu_inc(tun->pcpu_stats->rx_frame_errors);
+		kfree_skb(skb);
+		err = -EINVAL;
+		goto out;
+	}
+
+	skb->protocol = eth_type_trans(skb, tun->dev);
+	skb_reset_network_header(skb);
+	skb_probe_transport_header(skb, 0);
+
+	if (skb_xdp) {
+		err = do_xdp_generic(xdp_prog, skb);
+		if (err != XDP_PASS)
+			goto out;
+	}
+
+	if (!rcu_dereference(tun->steering_prog))
+		rxhash = __skb_get_hash_symmetric(skb);
+
+	netif_receive_skb(skb);
+
+	stats = get_cpu_ptr(tun->pcpu_stats);
+	u64_stats_update_begin(&stats->syncp);
+	stats->rx_packets++;
+	stats->rx_bytes += skb->len;
+	u64_stats_update_end(&stats->syncp);
+	put_cpu_ptr(stats);
+
+	if (rxhash)
+		tun_flow_update(tun, rxhash, tfile);
+
+out:
+	return err;
+}
+
 static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 {
-	int ret;
+	int ret, i;
 	struct tun_file *tfile = container_of(sock, struct tun_file, socket);
 	struct tun_struct *tun = tun_get(tfile);
 	struct tun_msg_ctl *ctl = m->msg_control;
+	struct xdp_buff *xdp;
 
 	if (!tun)
 		return -EBADFD;
 
-	if (ctl && ctl->type != TUN_MSG_UBUF)
-		return -EINVAL;
+	if (ctl && ((ctl->type & 0xF) == TUN_MSG_PTR)) {
+		int n = ctl->type >> 16;
+		int flush = 0;
+
+		local_bh_disable();
+		rcu_read_lock();
+
+		for (i = 0; i < n; i++) {
+			xdp = &((struct xdp_buff *)ctl->ptr)[i];
+			tun_xdp_one(tun, tfile, xdp, &flush);
+		}
+
+		if (flush)
+			xdp_do_flush_map();
+
+		rcu_read_unlock();
+		local_bh_enable();
+
+		ret = total_len;
+		goto out;
+	}
 
 	ret = tun_get_user(tun, tfile, ctl ? ctl->ptr : NULL, &m->msg_iter,
 			   m->msg_flags & MSG_DONTWAIT,
 			   m->msg_flags & MSG_MORE);
+out:
 	tun_put(tun);
 	return ret;
 }
-- 
2.17.1

* [PATCH net-next 11/11] vhost_net: batch submitting XDP buffers to underlayer sockets
From: Jason Wang @ 2018-09-06  4:05 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: kvm, virtualization, mst, jasowang
In-Reply-To: <20180906040526.22518-1-jasowang@redhat.com>

This patch implements XDP batching for vhost_net. The idea is to first
try to do the userspace copy and build the XDP buff directly in vhost.
Instead of submitting each packet immediately, vhost_net batches them in
an array and submits every 64 (VHOST_NET_BATCH) packets to the underlying
socket through msg_control of sendmsg().
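
The accumulate-then-submit pattern described above can be sketched in
plain userspace C as follows. This is only a model under stated
assumptions: struct tx_batch, tx_batch_add(), tx_batch_flush() and
tx_count_submit() are hypothetical names, and the submit callback stands
in for the sendmsg() call on the underlying socket.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BATCH 64 /* mirrors VHOST_NET_BATCH from the commit message */

/* Stand-in for handing a whole array to the lower socket in one call. */
typedef void (*submit_fn)(void *const *pkts, int n, void *priv);

struct tx_batch {
	void *pkts[SKETCH_BATCH];
	int count;
	submit_fn submit;
	void *priv;
};

/* Submit whatever has been queued so far; a no-op on an empty batch. */
static void tx_batch_flush(struct tx_batch *b)
{
	if (b->count == 0)
		return;
	b->submit(b->pkts, b->count, b->priv);
	b->count = 0;
}

/* Queue one packet, flushing first if the array is already full. */
static void tx_batch_add(struct tx_batch *b, void *pkt)
{
	if (b->count == SKETCH_BATCH)
		tx_batch_flush(b);
	assert(b->count < SKETCH_BATCH);
	b->pkts[b->count++] = pkt;
}

/* Example callback: only counts submissions and packets. */
struct tx_stats {
	int calls;
	int pkts;
};

static void tx_count_submit(void *const *pkts, int n, void *priv)
{
	struct tx_stats *st = priv;

	(void)pkts;
	st->calls++;
	st->pkts += n;
}
```

The caller is expected to call tx_batch_flush() once more when it leaves
its loop, so that a partially filled batch is not left behind.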

When XDP is enabled on the TUN/TAP, TUN/TAP can process XDP inside a
loop without caring about GUP, thus it can do batched map flushing. When
XDP is not enabled or not supported, the underlying socket needs to build
an skb and pass it to the network core. The batched packet submission
allows us to do batching like netif_receive_skb_list() in the future.

This saves lots of indirect calls for better cache utilization. When
we can't do batching, e.g. when sndbuf is limited or the packet size
is too large, we fall back to the usual one packet per sendmsg() path.
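
A minimal sketch of the fallback decision, modelling only the two
conditions named above (limited sndbuf, packet too large for one page).
The constants are assumed values chosen for illustration, since the
kernel derives the real ones per architecture, and the names
sketch_data_align() and sketch_choose_path() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration only. */
#define SKETCH_PAGE_SIZE   4096u
#define SKETCH_CACHE_LINE  64u
#define SKETCH_SHINFO_SIZE 320u /* placeholder for sizeof(struct skb_shared_info) */

enum tx_path { TX_PATH_BATCH, TX_PATH_SINGLE };

/* Round x up to a multiple of the cache line, in the spirit of SKB_DATA_ALIGN(). */
static size_t sketch_data_align(size_t x)
{
	assert(x <= (size_t)-1 - SKETCH_CACHE_LINE); /* guard against wrap-around */
	return (x + SKETCH_CACHE_LINE - 1) & ~(size_t)(SKETCH_CACHE_LINE - 1);
}

/* Batch only when sndbuf is unlimited and data plus padding plus the
 * shared info area fit into a single page.
 */
static enum tx_path sketch_choose_path(bool sndbuf_unlimited, size_t len,
				       size_t pad)
{
	if (!sndbuf_unlimited)
		return TX_PATH_SINGLE;
	if (sketch_data_align(len + pad) +
	    sketch_data_align(SKETCH_SHINFO_SIZE) > SKETCH_PAGE_SIZE)
		return TX_PATH_SINGLE;
	return TX_PATH_BATCH;
}
```

In the diff the size check reports -ENOSPC, and the caller flushes any
already batched packets before taking the single packet path.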

Doing testpmd on various setups gives us:

Test                /+pps%
XDP_DROP on TAP     /+44.8%
XDP_REDIRECT on TAP /+29%
macvtap (skb)       /+26%

Netperf tests show clear improvements for small packet transmission:

size/session/+thu%/+normalize%
   64/     1/   +2%/    0%
   64/     2/   +3%/   +1%
   64/     4/   +7%/   +5%
   64/     8/   +8%/   +6%
  256/     1/   +3%/    0%
  256/     2/  +10%/   +7%
  256/     4/  +26%/  +22%
  256/     8/  +27%/  +23%
  512/     1/   +3%/   +2%
  512/     2/  +19%/  +14%
  512/     4/  +43%/  +40%
  512/     8/  +45%/  +41%
 1024/     1/   +4%/    0%
 1024/     2/  +27%/  +21%
 1024/     4/  +38%/  +73%
 1024/     8/  +15%/  +24%
 2048/     1/  +10%/   +7%
 2048/     2/  +16%/  +12%
 2048/     4/    0%/   +2%
 2048/     8/    0%/   +2%
 4096/     1/  +36%/  +60%
 4096/     2/  -11%/  -26%
 4096/     4/    0%/  +14%
 4096/     8/    0%/   +4%
16384/     1/   -1%/   +5%
16384/     2/    0%/   +2%
16384/     4/    0%/   -3%
16384/     8/    0%/   +4%
65535/     1/    0%/  +10%
65535/     2/    0%/   +8%
65535/     4/    0%/   +1%
65535/     8/    0%/   +3%

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 164 ++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 151 insertions(+), 13 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index fb01ce6d981c..1dd4239cbff8 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -116,6 +116,7 @@ struct vhost_net_virtqueue {
 	 * For RX, number of batched heads
 	 */
 	int done_idx;
+	int batched_xdp;
 	/* an array of userspace buffers info */
 	struct ubuf_info *ubuf_info;
 	/* Reference counting for outstanding ubufs.
@@ -123,6 +124,7 @@ struct vhost_net_virtqueue {
 	struct vhost_net_ubuf_ref *ubufs;
 	struct ptr_ring *rx_ring;
 	struct vhost_net_buf rxq;
+	struct xdp_buff xdp[VHOST_NET_BATCH];
 };
 
 struct vhost_net {
@@ -338,6 +340,11 @@ static bool vhost_sock_zcopy(struct socket *sock)
 		sock_flag(sock->sk, SOCK_ZEROCOPY);
 }
 
+static bool vhost_sock_xdp(struct socket *sock)
+{
+	return sock_flag(sock->sk, SOCK_XDP);
+}
+
 /* In case of DMA done not in order in lower device driver for some reason.
  * upend_idx is used to track end of used idx, done_idx is used to track head
  * of used idx. Once lower device DMA done contiguously, we will signal KVM
@@ -444,10 +451,36 @@ static void vhost_net_signal_used(struct vhost_net_virtqueue *nvq)
 	nvq->done_idx = 0;
 }
 
+static void vhost_tx_batch(struct vhost_net *net,
+			   struct vhost_net_virtqueue *nvq,
+			   struct socket *sock,
+			   struct msghdr *msghdr)
+{
+	struct tun_msg_ctl ctl = {
+		.type = nvq->batched_xdp << 16 | TUN_MSG_PTR,
+		.ptr = nvq->xdp,
+	};
+	int err;
+
+	if (nvq->batched_xdp == 0)
+		goto signal_used;
+
+	msghdr->msg_control = &ctl;
+	err = sock->ops->sendmsg(sock, msghdr, 0);
+	if (unlikely(err < 0)) {
+		vq_err(&nvq->vq, "Fail to batch sending packets\n");
+		return;
+	}
+
+signal_used:
+	vhost_net_signal_used(nvq);
+	nvq->batched_xdp = 0;
+}
+
 static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 				    struct vhost_net_virtqueue *nvq,
 				    unsigned int *out_num, unsigned int *in_num,
-				    bool *busyloop_intr)
+				    struct msghdr *msghdr, bool *busyloop_intr)
 {
 	struct vhost_virtqueue *vq = &nvq->vq;
 	unsigned long uninitialized_var(endtime);
@@ -455,8 +488,9 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 				  out_num, in_num, NULL, NULL);
 
 	if (r == vq->num && vq->busyloop_timeout) {
+		/* Flush batched packets first */
 		if (!vhost_sock_zcopy(vq->private_data))
-			vhost_net_signal_used(nvq);
+			vhost_tx_batch(net, nvq, vq->private_data, msghdr);
 		preempt_disable();
 		endtime = busy_clock() + vq->busyloop_timeout;
 		while (vhost_can_busy_poll(endtime)) {
@@ -512,7 +546,7 @@ static int get_tx_bufs(struct vhost_net *net,
 	struct vhost_virtqueue *vq = &nvq->vq;
 	int ret;
 
-	ret = vhost_net_tx_get_vq_desc(net, nvq, out, in, busyloop_intr);
+	ret = vhost_net_tx_get_vq_desc(net, nvq, out, in, msg, busyloop_intr);
 
 	if (ret < 0 || ret == vq->num)
 		return ret;
@@ -540,6 +574,83 @@ static bool tx_can_batch(struct vhost_virtqueue *vq, size_t total_len)
 	       !vhost_vq_avail_empty(vq->dev, vq);
 }
 
+#define VHOST_NET_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD)
+
+static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
+			       struct iov_iter *from)
+{
+	struct vhost_virtqueue *vq = &nvq->vq;
+	struct socket *sock = vq->private_data;
+	struct page_frag *alloc_frag = &current->task_frag;
+	struct virtio_net_hdr *gso;
+	struct xdp_buff *xdp = &nvq->xdp[nvq->batched_xdp];
+	size_t len = iov_iter_count(from);
+	int headroom = vhost_sock_xdp(sock) ? XDP_PACKET_HEADROOM : 0;
+	int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	int pad = SKB_DATA_ALIGN(VHOST_NET_RX_PAD + headroom + nvq->sock_hlen);
+	int sock_hlen = nvq->sock_hlen;
+	void *buf;
+	int copied;
+
+	if (unlikely(len < nvq->sock_hlen))
+		return -EFAULT;
+
+	if (SKB_DATA_ALIGN(len + pad) +
+	    SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE)
+		return -ENOSPC;
+
+	buflen += SKB_DATA_ALIGN(len + pad);
+	alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES);
+	if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL)))
+		return -ENOMEM;
+
+	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
+
+	/* We store two kinds of metadata in the header which will be
+	 * used for XDP_PASS to do build_skb():
+	 * offset 0: buflen
+	 * offset sizeof(int): vnet header
+	 */
+	copied = copy_page_from_iter(alloc_frag->page,
+				     alloc_frag->offset + sizeof(int),
+				     sock_hlen, from);
+	if (copied != sock_hlen)
+		return -EFAULT;
+
+	gso = (struct virtio_net_hdr *)(buf + sizeof(int));
+
+	if ((gso->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
+	    vhost16_to_cpu(vq, gso->csum_start) +
+	    vhost16_to_cpu(vq, gso->csum_offset) + 2 >
+	    vhost16_to_cpu(vq, gso->hdr_len)) {
+		gso->hdr_len = cpu_to_vhost16(vq,
+			       vhost16_to_cpu(vq, gso->csum_start) +
+			       vhost16_to_cpu(vq, gso->csum_offset) + 2);
+
+		if (vhost16_to_cpu(vq, gso->hdr_len) > len)
+			return -EINVAL;
+	}
+
+	len -= sock_hlen;
+	copied = copy_page_from_iter(alloc_frag->page,
+				     alloc_frag->offset + pad,
+				     len, from);
+	if (copied != len)
+		return -EFAULT;
+
+	xdp->data_hard_start = buf;
+	xdp->data = buf + pad;
+	xdp->data_end = xdp->data + len;
+	*(int *)(xdp->data_hard_start) = buflen;
+
+	get_page(alloc_frag->page);
+	alloc_frag->offset += buflen;
+
+	++nvq->batched_xdp;
+
+	return 0;
+}
+
 static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 {
 	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
@@ -556,10 +667,14 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 	size_t len, total_len = 0;
 	int err;
 	int sent_pkts = 0;
+	bool bulking = (sock->sk->sk_sndbuf == INT_MAX);
 
 	for (;;) {
 		bool busyloop_intr = false;
 
+		if (nvq->done_idx == VHOST_NET_BATCH)
+			vhost_tx_batch(net, nvq, sock, &msg);
+
 		head = get_tx_bufs(net, nvq, &msg, &out, &in, &len,
 				   &busyloop_intr);
 		/* On error, stop handling until the next kick. */
@@ -577,14 +692,34 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 			break;
 		}
 
-		vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
-		vq->heads[nvq->done_idx].len = 0;
-
 		total_len += len;
-		if (tx_can_batch(vq, total_len))
-			msg.msg_flags |= MSG_MORE;
-		else
-			msg.msg_flags &= ~MSG_MORE;
+
+		/* For simplicity, TX batching is only enabled if
+		 * sndbuf is unlimited.
+		 */
+		if (bulking) {
+			err = vhost_net_build_xdp(nvq, &msg.msg_iter);
+			if (!err) {
+				goto done;
+			} else if (unlikely(err != -ENOSPC)) {
+				vhost_tx_batch(net, nvq, sock, &msg);
+				vhost_discard_vq_desc(vq, 1);
+				vhost_net_enable_vq(net, vq);
+				break;
+			}
+
+			/* We can't build XDP buff, go for single
+			 * packet path but let's flush batched
+			 * packets.
+			 */
+			vhost_tx_batch(net, nvq, sock, &msg);
+			msg.msg_control = NULL;
+		} else {
+			if (tx_can_batch(vq, total_len))
+				msg.msg_flags |= MSG_MORE;
+			else
+				msg.msg_flags &= ~MSG_MORE;
+		}
 
 		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(sock, &msg, len);
@@ -596,15 +731,17 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 		if (err != len)
 			pr_debug("Truncated TX packet: len %d != %zd\n",
 				 err, len);
-		if (++nvq->done_idx >= VHOST_NET_BATCH)
-			vhost_net_signal_used(nvq);
+done:
+		vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
+		vq->heads[nvq->done_idx].len = 0;
+		++nvq->done_idx;
 		if (vhost_exceeds_weight(++sent_pkts, total_len)) {
 			vhost_poll_queue(&vq->poll);
 			break;
 		}
 	}
 
-	vhost_net_signal_used(nvq);
+	vhost_tx_batch(net, nvq, sock, &msg);
 }
 
 static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
@@ -1111,6 +1248,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 		n->vqs[i].ubuf_info = NULL;
 		n->vqs[i].upend_idx = 0;
 		n->vqs[i].done_idx = 0;
+		n->vqs[i].batched_xdp = 0;
 		n->vqs[i].vhost_hlen = 0;
 		n->vqs[i].sock_hlen = 0;
 		n->vqs[i].rx_ring = NULL;
-- 
2.17.1

* [PATCH bpf-next 2/4] tools/bpf: move bpf/lib netlink related functions into a new file
From: Yonghong Song @ 2018-09-05 23:58 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20180905235806.1536396-1-yhs@fb.com>

There is no functional change in this patch.

In subsequent patches, more netlink-related library functions
will be added, and a separate file is better than cluttering bpf.c.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/lib/bpf/Build     |   2 +-
 tools/lib/bpf/bpf.c     | 129 -------------------------------
 tools/lib/bpf/netlink.c | 165 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 166 insertions(+), 130 deletions(-)
 create mode 100644 tools/lib/bpf/netlink.c

diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index 13a861135127..512b2c0ba0d2 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1 +1 @@
-libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o
+libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o netlink.o
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 60aa4ca8b2c5..3878a26a2071 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -28,16 +28,8 @@
 #include <linux/bpf.h>
 #include "bpf.h"
 #include "libbpf.h"
-#include "nlattr.h"
-#include <linux/rtnetlink.h>
-#include <linux/if_link.h>
-#include <sys/socket.h>
 #include <errno.h>
 
-#ifndef SOL_NETLINK
-#define SOL_NETLINK 270
-#endif
-
 /*
  * When building perf, unistd.h is overridden. __NR_bpf is
  * required to be defined explicitly.
@@ -499,127 +491,6 @@ int bpf_raw_tracepoint_open(const char *name, int prog_fd)
 	return sys_bpf(BPF_RAW_TRACEPOINT_OPEN, &attr, sizeof(attr));
 }
 
-int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
-{
-	struct sockaddr_nl sa;
-	int sock, seq = 0, len, ret = -1;
-	char buf[4096];
-	struct nlattr *nla, *nla_xdp;
-	struct {
-		struct nlmsghdr  nh;
-		struct ifinfomsg ifinfo;
-		char             attrbuf[64];
-	} req;
-	struct nlmsghdr *nh;
-	struct nlmsgerr *err;
-	socklen_t addrlen;
-	int one = 1;
-
-	memset(&sa, 0, sizeof(sa));
-	sa.nl_family = AF_NETLINK;
-
-	sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
-	if (sock < 0) {
-		return -errno;
-	}
-
-	if (setsockopt(sock, SOL_NETLINK, NETLINK_EXT_ACK,
-		       &one, sizeof(one)) < 0) {
-		fprintf(stderr, "Netlink error reporting not supported\n");
-	}
-
-	if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
-		ret = -errno;
-		goto cleanup;
-	}
-
-	addrlen = sizeof(sa);
-	if (getsockname(sock, (struct sockaddr *)&sa, &addrlen) < 0) {
-		ret = -errno;
-		goto cleanup;
-	}
-
-	if (addrlen != sizeof(sa)) {
-		ret = -LIBBPF_ERRNO__INTERNAL;
-		goto cleanup;
-	}
-
-	memset(&req, 0, sizeof(req));
-	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
-	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
-	req.nh.nlmsg_type = RTM_SETLINK;
-	req.nh.nlmsg_pid = 0;
-	req.nh.nlmsg_seq = ++seq;
-	req.ifinfo.ifi_family = AF_UNSPEC;
-	req.ifinfo.ifi_index = ifindex;
-
-	/* started nested attribute for XDP */
-	nla = (struct nlattr *)(((char *)&req)
-				+ NLMSG_ALIGN(req.nh.nlmsg_len));
-	nla->nla_type = NLA_F_NESTED | IFLA_XDP;
-	nla->nla_len = NLA_HDRLEN;
-
-	/* add XDP fd */
-	nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
-	nla_xdp->nla_type = IFLA_XDP_FD;
-	nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
-	memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
-	nla->nla_len += nla_xdp->nla_len;
-
-	/* if user passed in any flags, add those too */
-	if (flags) {
-		nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
-		nla_xdp->nla_type = IFLA_XDP_FLAGS;
-		nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
-		memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
-		nla->nla_len += nla_xdp->nla_len;
-	}
-
-	req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
-
-	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
-		ret = -errno;
-		goto cleanup;
-	}
-
-	len = recv(sock, buf, sizeof(buf), 0);
-	if (len < 0) {
-		ret = -errno;
-		goto cleanup;
-	}
-
-	for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
-	     nh = NLMSG_NEXT(nh, len)) {
-		if (nh->nlmsg_pid != sa.nl_pid) {
-			ret = -LIBBPF_ERRNO__WRNGPID;
-			goto cleanup;
-		}
-		if (nh->nlmsg_seq != seq) {
-			ret = -LIBBPF_ERRNO__INVSEQ;
-			goto cleanup;
-		}
-		switch (nh->nlmsg_type) {
-		case NLMSG_ERROR:
-			err = (struct nlmsgerr *)NLMSG_DATA(nh);
-			if (!err->error)
-				continue;
-			ret = err->error;
-			nla_dump_errormsg(nh);
-			goto cleanup;
-		case NLMSG_DONE:
-			break;
-		default:
-			break;
-		}
-	}
-
-	ret = 0;
-
-cleanup:
-	close(sock);
-	return ret;
-}
-
 int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size,
 		 bool do_log)
 {
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
new file mode 100644
index 000000000000..ccaa991fe9d8
--- /dev/null
+++ b/tools/lib/bpf/netlink.c
@@ -0,0 +1,165 @@
+// SPDX-License-Identifier: LGPL-2.1
+/* Copyright (c) 2018 Facebook */
+
+#include <stdlib.h>
+#include <memory.h>
+#include <unistd.h>
+#include <linux/bpf.h>
+#include <linux/rtnetlink.h>
+#include <sys/socket.h>
+#include <errno.h>
+#include <time.h>
+
+#include "bpf.h"
+#include "libbpf.h"
+#include "nlattr.h"
+
+#ifndef SOL_NETLINK
+#define SOL_NETLINK 270
+#endif
+
+static int bpf_netlink_open(__u32 *nl_pid)
+{
+	struct sockaddr_nl sa;
+	socklen_t addrlen;
+	int one = 1, ret;
+	int sock;
+
+	memset(&sa, 0, sizeof(sa));
+	sa.nl_family = AF_NETLINK;
+
+	sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
+	if (sock < 0)
+		return -errno;
+
+	if (setsockopt(sock, SOL_NETLINK, NETLINK_EXT_ACK,
+		       &one, sizeof(one)) < 0) {
+		fprintf(stderr, "Netlink error reporting not supported\n");
+	}
+
+	if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
+		ret = -errno;
+		goto cleanup;
+	}
+
+	addrlen = sizeof(sa);
+	if (getsockname(sock, (struct sockaddr *)&sa, &addrlen) < 0) {
+		ret = -errno;
+		goto cleanup;
+	}
+
+	if (addrlen != sizeof(sa)) {
+		ret = -LIBBPF_ERRNO__INTERNAL;
+		goto cleanup;
+	}
+
+	*nl_pid = sa.nl_pid;
+	return sock;
+
+cleanup:
+	close(sock);
+	return ret;
+}
+
+static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq)
+{
+	struct nlmsgerr *err;
+	struct nlmsghdr *nh;
+	char buf[4096];
+	int len, ret;
+
+	while (1) {
+		len = recv(sock, buf, sizeof(buf), 0);
+		if (len < 0) {
+			ret = -errno;
+			goto done;
+		}
+
+		for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
+		     nh = NLMSG_NEXT(nh, len)) {
+			if (nh->nlmsg_pid != nl_pid) {
+				ret = -LIBBPF_ERRNO__WRNGPID;
+				goto done;
+			}
+			if (nh->nlmsg_seq != seq) {
+				ret = -LIBBPF_ERRNO__INVSEQ;
+				goto done;
+			}
+			switch (nh->nlmsg_type) {
+			case NLMSG_ERROR:
+				err = (struct nlmsgerr *)NLMSG_DATA(nh);
+				if (!err->error)
+					continue;
+				ret = err->error;
+				nla_dump_errormsg(nh);
+				goto done;
+			case NLMSG_DONE:
+				return 0;
+			default:
+				break;
+			}
+		}
+	}
+	ret = 0;
+done:
+	return ret;
+}
+
+int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
+{
+	int sock, seq = 0, ret;
+	struct nlattr *nla, *nla_xdp;
+	struct {
+		struct nlmsghdr  nh;
+		struct ifinfomsg ifinfo;
+		char             attrbuf[64];
+	} req;
+	__u32 nl_pid;
+
+	sock = bpf_netlink_open(&nl_pid);
+	if (sock < 0)
+		return sock;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
+	req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_type = RTM_SETLINK;
+	req.nh.nlmsg_pid = 0;
+	req.nh.nlmsg_seq = ++seq;
+	req.ifinfo.ifi_family = AF_UNSPEC;
+	req.ifinfo.ifi_index = ifindex;
+
+	/* started nested attribute for XDP */
+	nla = (struct nlattr *)(((char *)&req)
+				+ NLMSG_ALIGN(req.nh.nlmsg_len));
+	nla->nla_type = NLA_F_NESTED | IFLA_XDP;
+	nla->nla_len = NLA_HDRLEN;
+
+	/* add XDP fd */
+	nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
+	nla_xdp->nla_type = IFLA_XDP_FD;
+	nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
+	memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
+	nla->nla_len += nla_xdp->nla_len;
+
+	/* if user passed in any flags, add those too */
+	if (flags) {
+		nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
+		nla_xdp->nla_type = IFLA_XDP_FLAGS;
+		nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
+		memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
+		nla->nla_len += nla_xdp->nla_len;
+	}
+
+	req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
+
+	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		ret = -errno;
+		goto cleanup;
+	}
+	ret = bpf_netlink_recv(sock, nl_pid, seq);
+
+cleanup:
+	close(sock);
+	return ret;
+}
-- 
2.17.1

* [PATCH bpf-next 1/4] tools/bpf: sync kernel uapi header if_link.h to tools
From: Yonghong Song @ 2018-09-05 23:58 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20180905235806.1536396-1-yhs@fb.com>

Among others, this header will be used later for
bpftool net support.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/include/uapi/linux/if_link.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index cf01b6824244..43391e2d1153 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -164,6 +164,8 @@ enum {
 	IFLA_CARRIER_UP_COUNT,
 	IFLA_CARRIER_DOWN_COUNT,
 	IFLA_NEW_IFINDEX,
+	IFLA_MIN_MTU,
+	IFLA_MAX_MTU,
 	__IFLA_MAX
 };
 
@@ -334,6 +336,7 @@ enum {
 	IFLA_BRPORT_GROUP_FWD_MASK,
 	IFLA_BRPORT_NEIGH_SUPPRESS,
 	IFLA_BRPORT_ISOLATED,
+	IFLA_BRPORT_BACKUP_PORT,
 	__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
@@ -459,6 +462,16 @@ enum {
 
 #define IFLA_MACSEC_MAX (__IFLA_MACSEC_MAX - 1)
 
+/* XFRM section */
+enum {
+	IFLA_XFRM_UNSPEC,
+	IFLA_XFRM_LINK,
+	IFLA_XFRM_IF_ID,
+	__IFLA_XFRM_MAX
+};
+
+#define IFLA_XFRM_MAX (__IFLA_XFRM_MAX - 1)
+
 enum macsec_validation_type {
 	MACSEC_VALIDATE_DISABLED = 0,
 	MACSEC_VALIDATE_CHECK = 1,
@@ -920,6 +933,7 @@ enum {
 	XDP_ATTACHED_DRV,
 	XDP_ATTACHED_SKB,
 	XDP_ATTACHED_HW,
+	XDP_ATTACHED_MULTI,
 };
 
 enum {
@@ -928,6 +942,9 @@ enum {
 	IFLA_XDP_ATTACHED,
 	IFLA_XDP_FLAGS,
 	IFLA_XDP_PROG_ID,
+	IFLA_XDP_DRV_PROG_ID,
+	IFLA_XDP_SKB_PROG_ID,
+	IFLA_XDP_HW_PROG_ID,
 	__IFLA_XDP_MAX,
 };
 
-- 
2.17.1

* [PATCH bpf-next 3/4] tools/bpf: add more netlink functionalities in lib/bpf
From: Yonghong Song @ 2018-09-05 23:58 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20180905235806.1536396-1-yhs@fb.com>

This patch adds a few netlink attribute parsing functions
and netlink API functions to query networking links, tc classes,
tc qdiscs and tc filters. For example, the following API
gets networking links:
  int nl_get_link(int sock, unsigned int nl_pid,
                  dump_nlmsg_t dump_link_nlmsg,
                  void *cookie);

Note that when the API is called, the user also provides a
callback function with the following signature:
  int (*dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb);

The "cookie" is the parameter the user passed to the API; it is
handed back unchanged to the callback function.
The "msg" is the information about the result, e.g., ifinfomsg or
tcmsg. The "tb" is the array of parsed netlink attributes.
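
To make the callback contract concrete, here is a small self-contained
sketch. It does not open a netlink socket: sketch_dump_links() is a
hypothetical stand-in for nl_get_link() that feeds fake messages to the
callback, and struct sketch_link is a placeholder for struct ifinfomsg.
Only the dump_nlmsg_t signature is taken from the patch. As in the diff,
a non-zero return from the callback stops the dump and is handed back to
the caller.

```c
#include <assert.h>
#include <stddef.h>

struct nlattr; /* opaque in this sketch */

/* Callback signature as given in the commit message. */
typedef int (*dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb);

/* Placeholder for struct ifinfomsg; only the index is modelled. */
struct sketch_link {
	int ifindex;
};

/* User state reached through the cookie pointer. */
struct link_stats {
	int seen;
	int stop_at; /* ifindex at which to abort the dump, or -1 for never */
};

/* Example callback: count links and optionally abort at a given ifindex. */
static int count_links(void *cookie, void *msg, struct nlattr **tb)
{
	struct link_stats *st = cookie;
	struct sketch_link *link = msg;

	(void)tb; /* parsed attributes are not used in this sketch */
	st->seen++;
	return link->ifindex == st->stop_at ? -1 : 0;
}

/* Hypothetical stand-in for nl_get_link(): invoke fn once per message. */
static int sketch_dump_links(struct sketch_link *links, size_t n,
			     dump_nlmsg_t fn, void *cookie)
{
	assert(fn != NULL);
	for (size_t i = 0; i < n; i++) {
		int ret = fn(cookie, &links[i], NULL);

		if (ret)
			return ret;
	}
	return 0;
}
```

Passing per-call state through the cookie keeps the callback free of
globals, which is presumably why the API takes a void pointer here.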

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/lib/bpf/libbpf.h       |  16 ++++
 tools/lib/bpf/libbpf_errno.c |   1 +
 tools/lib/bpf/netlink.c      | 165 ++++++++++++++++++++++++++++++++++-
 tools/lib/bpf/nlattr.c       |  33 ++++---
 tools/lib/bpf/nlattr.h       |  38 ++++++++
 5 files changed, 238 insertions(+), 15 deletions(-)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 96c55fac54c3..e3b00e23e181 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -46,6 +46,7 @@ enum libbpf_errno {
 	LIBBPF_ERRNO__PROGTYPE,	/* Kernel doesn't support this program type */
 	LIBBPF_ERRNO__WRNGPID,	/* Wrong pid in netlink message */
 	LIBBPF_ERRNO__INVSEQ,	/* Invalid netlink sequence */
+	LIBBPF_ERRNO__NLPARSE,	/* netlink parsing error */
 	__LIBBPF_ERRNO__END,
 };
 
@@ -297,4 +298,19 @@ int bpf_perf_event_read_simple(void *mem, unsigned long size,
 			       unsigned long page_size,
 			       void **buf, size_t *buf_len,
 			       bpf_perf_event_print_t fn, void *priv);
+
+struct nlmsghdr;
+struct nlattr;
+typedef int (*dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb);
+typedef int (*__dump_nlmsg_t)(struct nlmsghdr *nlmsg, dump_nlmsg_t,
+			      void *cookie);
+int bpf_netlink_open(unsigned int *nl_pid);
+int nl_get_link(int sock, unsigned int nl_pid, dump_nlmsg_t dump_link_nlmsg,
+		void *cookie);
+int nl_get_class(int sock, unsigned int nl_pid, int ifindex,
+		 dump_nlmsg_t dump_class_nlmsg, void *cookie);
+int nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex,
+		 dump_nlmsg_t dump_qdisc_nlmsg, void *cookie);
+int nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle,
+		  dump_nlmsg_t dump_filter_nlmsg, void *cookie);
 #endif
diff --git a/tools/lib/bpf/libbpf_errno.c b/tools/lib/bpf/libbpf_errno.c
index d9ba851bd7f9..2464ade3b326 100644
--- a/tools/lib/bpf/libbpf_errno.c
+++ b/tools/lib/bpf/libbpf_errno.c
@@ -42,6 +42,7 @@ static const char *libbpf_strerror_table[NR_ERRNO] = {
 	[ERRCODE_OFFSET(PROGTYPE)]	= "Kernel doesn't support this program type",
 	[ERRCODE_OFFSET(WRNGPID)]	= "Wrong pid in netlink message",
 	[ERRCODE_OFFSET(INVSEQ)]	= "Invalid netlink sequence",
+	[ERRCODE_OFFSET(NLPARSE)]	= "Incorrect netlink message parsing",
 };
 
 int libbpf_strerror(int err, char *buf, size_t size)
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index ccaa991fe9d8..469e068dd0c5 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -18,7 +18,7 @@
 #define SOL_NETLINK 270
 #endif
 
-static int bpf_netlink_open(__u32 *nl_pid)
+int bpf_netlink_open(__u32 *nl_pid)
 {
 	struct sockaddr_nl sa;
 	socklen_t addrlen;
@@ -61,7 +61,9 @@ static int bpf_netlink_open(__u32 *nl_pid)
 	return ret;
 }
 
-static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq)
+static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
+			    __dump_nlmsg_t _fn, dump_nlmsg_t fn,
+			    void *cookie)
 {
 	struct nlmsgerr *err;
 	struct nlmsghdr *nh;
@@ -98,6 +100,11 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq)
 			default:
 				break;
 			}
+			if (_fn) {
+				ret = _fn(nh, fn, cookie);
+				if (ret)
+					return ret;
+			}
 		}
 	}
 	ret = 0;
@@ -157,9 +164,161 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
 		ret = -errno;
 		goto cleanup;
 	}
-	ret = bpf_netlink_recv(sock, nl_pid, seq);
+	ret = bpf_netlink_recv(sock, nl_pid, seq, NULL, NULL, NULL);
 
 cleanup:
 	close(sock);
 	return ret;
 }
+
+static int __dump_link_nlmsg(struct nlmsghdr *nlh, dump_nlmsg_t dump_link_nlmsg,
+			     void *cookie)
+{
+	struct nlattr *tb[IFLA_MAX + 1], *attr;
+	struct ifinfomsg *ifi = NLMSG_DATA(nlh);
+	int len;
+
+	len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*ifi));
+	attr = (struct nlattr *) ((void *) ifi + NLMSG_ALIGN(sizeof(*ifi)));
+	if (nla_parse(tb, IFLA_MAX, attr, len, NULL) != 0)
+		return -LIBBPF_ERRNO__NLPARSE;
+
+	return dump_link_nlmsg(cookie, ifi, tb);
+}
+
+int nl_get_link(int sock, unsigned int nl_pid, dump_nlmsg_t dump_link_nlmsg,
+		void *cookie)
+{
+	struct {
+		struct nlmsghdr nlh;
+		struct ifinfomsg ifm;
+	} req = {
+		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+		.nlh.nlmsg_type = RTM_GETLINK,
+		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
+		.ifm.ifi_family = AF_PACKET,
+	};
+	int seq = time(NULL);
+
+	req.nlh.nlmsg_seq = seq;
+	if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0)
+		return -errno;
+
+	return bpf_netlink_recv(sock, nl_pid, seq, __dump_link_nlmsg,
+				dump_link_nlmsg, cookie);
+}
+
+static int __dump_class_nlmsg(struct nlmsghdr *nlh,
+			      dump_nlmsg_t dump_class_nlmsg, void *cookie)
+{
+	struct nlattr *tb[TCA_MAX + 1], *attr;
+	struct tcmsg *t = NLMSG_DATA(nlh);
+	int len;
+
+	len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t));
+	attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t)));
+	if (nla_parse(tb, TCA_MAX, attr, len, NULL) != 0)
+		return -LIBBPF_ERRNO__NLPARSE;
+
+	return dump_class_nlmsg(cookie, t, tb);
+}
+
+int nl_get_class(int sock, unsigned int nl_pid, int ifindex,
+		 dump_nlmsg_t dump_class_nlmsg, void *cookie)
+{
+	struct {
+		struct nlmsghdr nlh;
+		struct tcmsg t;
+	} req = {
+		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
+		.nlh.nlmsg_type = RTM_GETTCLASS,
+		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
+		.t.tcm_family = AF_UNSPEC,
+		.t.tcm_ifindex = ifindex,
+	};
+	int seq = time(NULL);
+
+	req.nlh.nlmsg_seq = seq;
+	if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0)
+		return -errno;
+
+	return bpf_netlink_recv(sock, nl_pid, seq, __dump_class_nlmsg,
+				dump_class_nlmsg, cookie);
+}
+
+static int __dump_qdisc_nlmsg(struct nlmsghdr *nlh,
+			      dump_nlmsg_t dump_qdisc_nlmsg, void *cookie)
+{
+	struct nlattr *tb[TCA_MAX + 1], *attr;
+	struct tcmsg *t = NLMSG_DATA(nlh);
+	int len;
+
+	len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t));
+	attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t)));
+	if (nla_parse(tb, TCA_MAX, attr, len, NULL) != 0)
+		return -LIBBPF_ERRNO__NLPARSE;
+
+	return dump_qdisc_nlmsg(cookie, t, tb);
+}
+
+int nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex,
+		 dump_nlmsg_t dump_qdisc_nlmsg, void *cookie)
+{
+	struct {
+		struct nlmsghdr nlh;
+		struct tcmsg t;
+	} req = {
+		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
+		.nlh.nlmsg_type = RTM_GETQDISC,
+		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
+		.t.tcm_family = AF_UNSPEC,
+		.t.tcm_ifindex = ifindex,
+	};
+	int seq = time(NULL);
+
+	req.nlh.nlmsg_seq = seq;
+	if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0)
+		return -errno;
+
+	return bpf_netlink_recv(sock, nl_pid, seq, __dump_qdisc_nlmsg,
+				dump_qdisc_nlmsg, cookie);
+}
+
+static int __dump_filter_nlmsg(struct nlmsghdr *nlh,
+			       dump_nlmsg_t dump_filter_nlmsg, void *cookie)
+{
+	struct nlattr *tb[TCA_MAX + 1], *attr;
+	struct tcmsg *t = NLMSG_DATA(nlh);
+	int len;
+
+	len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*t));
+	attr = (struct nlattr *) ((void *) t + NLMSG_ALIGN(sizeof(*t)));
+	if (nla_parse(tb, TCA_MAX, attr, len, NULL) != 0)
+		return -LIBBPF_ERRNO__NLPARSE;
+
+	return dump_filter_nlmsg(cookie, t, tb);
+}
+
+int nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle,
+		  dump_nlmsg_t dump_filter_nlmsg, void *cookie)
+{
+	struct {
+		struct nlmsghdr nlh;
+		struct tcmsg t;
+	} req = {
+		.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
+		.nlh.nlmsg_type = RTM_GETTFILTER,
+		.nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST,
+		.t.tcm_family = AF_UNSPEC,
+		.t.tcm_ifindex = ifindex,
+		.t.tcm_parent = handle,
+	};
+	int seq = time(NULL);
+
+	req.nlh.nlmsg_seq = seq;
+	if (send(sock, &req, req.nlh.nlmsg_len, 0) < 0)
+		return -errno;
+
+	return bpf_netlink_recv(sock, nl_pid, seq, __dump_filter_nlmsg,
+				dump_filter_nlmsg, cookie);
+}
diff --git a/tools/lib/bpf/nlattr.c b/tools/lib/bpf/nlattr.c
index 4719434278b2..49f514119bdb 100644
--- a/tools/lib/bpf/nlattr.c
+++ b/tools/lib/bpf/nlattr.c
@@ -26,11 +26,6 @@ static uint16_t nla_attr_minlen[NLA_TYPE_MAX+1] = {
 	[NLA_FLAG]	= 0,
 };
 
-static int nla_len(const struct nlattr *nla)
-{
-	return nla->nla_len - NLA_HDRLEN;
-}
-
 static struct nlattr *nla_next(const struct nlattr *nla, int *remaining)
 {
 	int totlen = NLA_ALIGN(nla->nla_len);
@@ -46,11 +41,6 @@ static int nla_ok(const struct nlattr *nla, int remaining)
 	       nla->nla_len <= remaining;
 }
 
-static void *nla_data(const struct nlattr *nla)
-{
-	return (char *) nla + NLA_HDRLEN;
-}
-
 static int nla_type(const struct nlattr *nla)
 {
 	return nla->nla_type & NLA_TYPE_MASK;
@@ -114,8 +104,8 @@ static inline int nlmsg_len(const struct nlmsghdr *nlh)
  * @see nla_validate
  * @return 0 on success or a negative error code.
  */
-static int nla_parse(struct nlattr *tb[], int maxtype, struct nlattr *head, int len,
-		     struct nla_policy *policy)
+int nla_parse(struct nlattr *tb[], int maxtype, struct nlattr *head, int len,
+	      struct nla_policy *policy)
 {
 	struct nlattr *nla;
 	int rem, err;
@@ -146,6 +136,25 @@ static int nla_parse(struct nlattr *tb[], int maxtype, struct nlattr *head, int
 	return err;
 }
 
+/**
+ * Create attribute index based on nested attribute
+ * @arg tb              Index array to be filled (maxtype+1 elements).
+ * @arg maxtype         Maximum attribute type expected and accepted.
+ * @arg nla             Nested Attribute.
+ * @arg policy          Attribute validation policy.
+ *
+ * Feeds the stream of attributes nested into the specified attribute
+ * to nla_parse().
+ *
+ * @see nla_parse
+ * @return 0 on success or a negative error code.
+ */
+int nla_parse_nested(struct nlattr *tb[], int maxtype, struct nlattr *nla,
+		     struct nla_policy *policy)
+{
+	return nla_parse(tb, maxtype, nla_data(nla), nla_len(nla), policy);
+}
+
 /* dump netlink extended ack error message */
 int nla_dump_errormsg(struct nlmsghdr *nlh)
 {
diff --git a/tools/lib/bpf/nlattr.h b/tools/lib/bpf/nlattr.h
index 931a71f68f93..a6e2396bce7c 100644
--- a/tools/lib/bpf/nlattr.h
+++ b/tools/lib/bpf/nlattr.h
@@ -67,6 +67,44 @@ struct nla_policy {
 	     nla_ok(pos, rem); \
 	     pos = nla_next(pos, &(rem)))
 
+/**
+ * nla_data - head of payload
+ * @nla: netlink attribute
+ */
+static inline void *nla_data(const struct nlattr *nla)
+{
+	return (char *) nla + NLA_HDRLEN;
+}
+
+static inline uint8_t nla_getattr_u8(const struct nlattr *nla)
+{
+	return *(uint8_t *)nla_data(nla);
+}
+
+static inline uint32_t nla_getattr_u32(const struct nlattr *nla)
+{
+	return *(uint32_t *)nla_data(nla);
+}
+
+static inline const char *nla_getattr_str(const struct nlattr *nla)
+{
+	return (const char *)nla_data(nla);
+}
+
+/**
+ * nla_len - length of payload
+ * @nla: netlink attribute
+ */
+static inline int nla_len(const struct nlattr *nla)
+{
+	return nla->nla_len - NLA_HDRLEN;
+}
+
+int nla_parse(struct nlattr *tb[], int maxtype, struct nlattr *head, int len,
+	      struct nla_policy *policy);
+int nla_parse_nested(struct nlattr *tb[], int maxtype, struct nlattr *nla,
+		     struct nla_policy *policy);
+
 int nla_dump_errormsg(struct nlmsghdr *nlh);
 
 #endif /* __NLATTR_H */
-- 
2.17.1

^ permalink raw reply related

* [PATCH bpf-next 0/4] tools/bpf: add bpftool net support
From: Yonghong Song @ 2018-09-05 23:58 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team

As bpf usage becomes more pervasive, people start to worry
about its cpu and memory cost. On a particular host,
people often want to know all running bpf programs
and their attachment context, so that they can quickly relate
a performance/memory anomaly to a particular bpf
program or application.

bpftool already provides pretty good coverage for perf
and cgroup related attachments. This patch set enables
dumping attachment info for xdp and tc bpf programs.

Currently, users can already use "ip link show <dev>" and
"tc filter show dev <dev> ..." to dump bpf program attachment
information for xdp and tc bpf programs. The main reason
to implement such functionality in bpftool as well is a
better user experience. We want bpftool to be the
ultimate tool for bpf introspection. The bpftool net
implementation presents only the necessary bpf attachment
information to the user, ignoring most other ip/tc
specific information.

For example, below is the pretty-printed json output for xdp
and tc_filters.

  $ ./bpftool -jp net
  [{
        "xdp": [{
                "ifindex": 2,
                "devname": "eth0",
                "prog_id": 198
            }
        ],
        "tc_filters": [{
                "ifindex": 2,
                "kind": "qdisc_htb",
                "name": "prefix_matcher.o:[cls_prefix_matcher_htb]",
                "prog_id": 111727,
                "tag": "d08fe3b4319bc2fd",
                "act": []
            },{
                "ifindex": 2,
                "kind": "qdisc_clsact_ingress",
                "name": "fbflow_icmp",
                "prog_id": 130246,
                "tag": "3f265c7f26db62c9",
                "act": []
            },{
                "ifindex": 2,
                "kind": "qdisc_clsact_egress",
                "name": "prefix_matcher.o:[cls_prefix_matcher_clsact]",
                "prog_id": 111726,
                "tag": "99a197826974c876"
            },{
                "ifindex": 2,
                "kind": "qdisc_clsact_egress",
                "name": "cls_fg_dscp",
                "prog_id": 108619,
                "tag": "dc4630674fd72dcc",
                "act": []
            },{
                "ifindex": 2,
                "kind": "qdisc_clsact_egress",
                "name": "fbflow_egress",
                "prog_id": 130245,
                "tag": "72d2d830d6888d2c"
            }
        ]
    }
  ]

Patch #1 synced the kernel uapi header if_link.h to the tools directory.
Patch #2 moved the netlink related functions in tools/lib/bpf/bpf.c to
a new file. Patch #3 implemented additional functions
in libbpf which will be used in Patch #4.
Patch #4 implemented bpftool net support to dump
xdp and tc bpf program attachments.
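
As a rough illustration of the kind of request the libbpf helpers
from Patch #3 send, the sketch below fills in an RTM_GETLINK dump
request with the same fields that nl_get_link() in this series sets.
The struct name link_dump_req and the helper build_link_dump_req()
are hypothetical and exist only for this example; the real code goes
on to send the request on a netlink socket and parse the replies.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: netlink header followed by ifinfomsg. */
struct link_dump_req {
	struct nlmsghdr nlh;
	struct ifinfomsg ifm;
};

/* Hypothetical helper: fill in a "dump all links" request. */
static void build_link_dump_req(struct link_dump_req *req, unsigned int seq)
{
	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	req->nlh.nlmsg_type = RTM_GETLINK;
	/* NLM_F_DUMP asks the kernel for every link, not a single one. */
	req->nlh.nlmsg_flags = NLM_F_DUMP | NLM_F_REQUEST;
	/* The sequence number is used to match replies to this request. */
	req->nlh.nlmsg_seq = seq;
	req->ifm.ifi_family = AF_PACKET;
}
```

The class, qdisc and filter dumps in the series follow the same
pattern, with struct tcmsg as the payload and RTM_GETTCLASS,
RTM_GETQDISC or RTM_GETTFILTER as the message type.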

Yonghong Song (4):
  tools/bpf: sync kernel uapi header if_link.h to tools
  tools/bpf: move bpf/lib netlink related functions into a new file
  tools/bpf: add more netlink functionalities in lib/bpf
  tools/bpf: bpftool: add net support

 .../bpf/bpftool/Documentation/bpftool-net.rst | 133 +++++++
 tools/bpf/bpftool/Documentation/bpftool.rst   |   6 +-
 tools/bpf/bpftool/bash-completion/bpftool     |  17 +-
 tools/bpf/bpftool/main.c                      |   3 +-
 tools/bpf/bpftool/main.h                      |   7 +
 tools/bpf/bpftool/net.c                       | 233 +++++++++++++
 tools/bpf/bpftool/netlink_dumper.c            | 181 ++++++++++
 tools/bpf/bpftool/netlink_dumper.h            | 103 ++++++
 tools/include/uapi/linux/if_link.h            |  17 +
 tools/lib/bpf/Build                           |   2 +-
 tools/lib/bpf/bpf.c                           | 129 -------
 tools/lib/bpf/libbpf.h                        |  16 +
 tools/lib/bpf/libbpf_errno.c                  |   1 +
 tools/lib/bpf/netlink.c                       | 324 ++++++++++++++++++
 tools/lib/bpf/nlattr.c                        |  33 +-
 tools/lib/bpf/nlattr.h                        |  38 ++
 16 files changed, 1094 insertions(+), 149 deletions(-)
 create mode 100644 tools/bpf/bpftool/Documentation/bpftool-net.rst
 create mode 100644 tools/bpf/bpftool/net.c
 create mode 100644 tools/bpf/bpftool/netlink_dumper.c
 create mode 100644 tools/bpf/bpftool/netlink_dumper.h
 create mode 100644 tools/lib/bpf/netlink.c

-- 
2.17.1

^ permalink raw reply

* [PATCH bpf-next 4/4] tools/bpf: bpftool: add net support
From: Yonghong Song @ 2018-09-05 23:58 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20180905235806.1536396-1-yhs@fb.com>

Add "bpftool net" support. Networking devices are enumerated
to dump device index/name associated with xdp progs.

For each networking device, tc classes and qdiscs are enumerated
in order to check their bpf filters.
In addition, root handle and clsact ingress/egress are also checked for
bpf filters.  Not all filter information is printed out. Only ifindex,
kind, filter name, prog_id and tag are printed out, which are good
enough to show attachment information. If the filter action
is a bpf action, its bpf program id, bpf name and tag will be
printed out as well.
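
To make the clsact ingress/egress check more concrete, here is a
minimal sketch of how the parent handles for those two hooks can be
built from the uapi macros in <linux/pkt_sched.h>, which is what
net.c in this patch passes to nl_get_filter(). The helper name
clsact_parent() is hypothetical and used only for illustration.

```c
#include <linux/pkt_sched.h>

/*
 * Hypothetical helper: parent handle to query for filters attached to
 * the clsact ingress (egress == 0) or egress (egress != 0) hook.
 */
static unsigned int clsact_parent(int egress)
{
	return TC_H_MAKE(TC_H_CLSACT,
			 egress ? TC_H_MIN_EGRESS : TC_H_MIN_INGRESS);
}
```

The root handle needs no computation: TC_H_ROOT is used directly as
the parent when querying filters attached at the root.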

For example,
  $ ./bpftool net
  xdp [
  ifindex 2 devname eth0 prog_id 198
  ]
  tc_filters [
  ifindex 2 kind qdisc_htb name prefix_matcher.o:[cls_prefix_matcher_htb]
            prog_id 111727 tag d08fe3b4319bc2fd act []
  ifindex 2 kind qdisc_clsact_ingress name fbflow_icmp
            prog_id 130246 tag 3f265c7f26db62c9 act []
  ifindex 2 kind qdisc_clsact_egress name prefix_matcher.o:[cls_prefix_matcher_clsact]
            prog_id 111726 tag 99a197826974c876
  ifindex 2 kind qdisc_clsact_egress name cls_fg_dscp
            prog_id 108619 tag dc4630674fd72dcc act []
  ifindex 2 kind qdisc_clsact_egress name fbflow_egress
            prog_id 130245 tag 72d2d830d6888d2c
  ]
  $ ./bpftool -jp net
  [{
        "xdp": [{
                "ifindex": 2,
                "devname": "eth0",
                "prog_id": 198
            }
        ],
        "tc_filters": [{
                "ifindex": 2,
                "kind": "qdisc_htb",
                "name": "prefix_matcher.o:[cls_prefix_matcher_htb]",
                "prog_id": 111727,
                "tag": "d08fe3b4319bc2fd",
                "act": []
            },{
                "ifindex": 2,
                "kind": "qdisc_clsact_ingress",
                "name": "fbflow_icmp",
                "prog_id": 130246,
                "tag": "3f265c7f26db62c9",
                "act": []
            },{
                "ifindex": 2,
                "kind": "qdisc_clsact_egress",
                "name": "prefix_matcher.o:[cls_prefix_matcher_clsact]",
                "prog_id": 111726,
                "tag": "99a197826974c876"
            },{
                "ifindex": 2,
                "kind": "qdisc_clsact_egress",
                "name": "cls_fg_dscp",
                "prog_id": 108619,
                "tag": "dc4630674fd72dcc",
                "act": []
            },{
                "ifindex": 2,
                "kind": "qdisc_clsact_egress",
                "name": "fbflow_egress",
                "prog_id": 130245,
                "tag": "72d2d830d6888d2c"
            }
        ]
    }
  ]
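
The "tag" values in the output above are hex renderings of the raw
tag bytes carried in the netlink attribute. The sketch below shows
one way to do that conversion; it is not the hexstring_n2a() helper
from this patch. The name tag_to_hex() is hypothetical, and the
8-byte tag length in the usage note is an assumption based on the
16 hex digits shown in the example.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: render len raw bytes as lowercase hex into buf.
 * Stops early if buf is too small and always NUL-terminates the result
 * when blen > 0.
 */
static char *tag_to_hex(const unsigned char *tag, size_t len,
			char *buf, size_t blen)
{
	size_t i, off = 0;

	if (blen == 0)
		return buf;
	buf[0] = '\0';
	/* Each byte needs two hex digits plus room for the trailing NUL. */
	for (i = 0; i < len && blen - off >= 3; i++) {
		snprintf(buf + off, blen - off, "%02x", tag[i]);
		off += 2;
	}
	return buf;
}
```

With an 8-byte tag and a buffer of at least 17 bytes, the result is
a 16-character string such as the ones printed in the "tag" fields.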

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 .../bpf/bpftool/Documentation/bpftool-net.rst | 133 ++++++++++
 tools/bpf/bpftool/Documentation/bpftool.rst   |   6 +-
 tools/bpf/bpftool/bash-completion/bpftool     |  17 +-
 tools/bpf/bpftool/main.c                      |   3 +-
 tools/bpf/bpftool/main.h                      |   7 +
 tools/bpf/bpftool/net.c                       | 233 ++++++++++++++++++
 tools/bpf/bpftool/netlink_dumper.c            | 181 ++++++++++++++
 tools/bpf/bpftool/netlink_dumper.h            | 103 ++++++++
 8 files changed, 676 insertions(+), 7 deletions(-)
 create mode 100644 tools/bpf/bpftool/Documentation/bpftool-net.rst
 create mode 100644 tools/bpf/bpftool/net.c
 create mode 100644 tools/bpf/bpftool/netlink_dumper.c
 create mode 100644 tools/bpf/bpftool/netlink_dumper.h

diff --git a/tools/bpf/bpftool/Documentation/bpftool-net.rst b/tools/bpf/bpftool/Documentation/bpftool-net.rst
new file mode 100644
index 000000000000..48a61837a264
--- /dev/null
+++ b/tools/bpf/bpftool/Documentation/bpftool-net.rst
@@ -0,0 +1,133 @@
+================
+bpftool-net
+================
+-------------------------------------------------------------------------------
+tool for inspection of netdev/tc related bpf prog attachments
+-------------------------------------------------------------------------------
+
+:Manual section: 8
+
+SYNOPSIS
+========
+
+	**bpftool** [*OPTIONS*] **net** *COMMAND*
+
+	*OPTIONS* := { [{ **-j** | **--json** }] [{ **-p** | **--pretty** }] }
+
+	*COMMANDS* :=
+	{ **show** | **list** } [ **dev** name ] | **help**
+
+NET COMMANDS
+============
+
+|	**bpftool** **net { show | list } [ dev name ]**
+|	**bpftool** **net help**
+
+DESCRIPTION
+===========
+	**bpftool net { show | list } [ dev name ]**
+		  List all networking device driver and tc attachments in the system.
+
+                  Output will start with all xdp program attachments, followed by
+                  all tc class/qdisc bpf program attachments. Both xdp programs and
+                  tc programs are ordered based on ifindex number. If multiple bpf
+                  programs are attached to the same networking device through **tc filter**,
+                  the order will be first all bpf programs attached to tc classes, then
+                  all bpf programs attached to non clsact qdiscs, and finally all
+                  bpf programs attached to root and clsact qdiscs.
+
+	**bpftool net help**
+		  Print short help message.
+
+OPTIONS
+=======
+	-h, --help
+		  Print short generic help message (similar to **bpftool help**).
+
+	-v, --version
+		  Print version number (similar to **bpftool version**).
+
+	-j, --json
+		  Generate JSON output. For commands that cannot produce JSON, this
+		  option has no effect.
+
+	-p, --pretty
+		  Generate human-readable JSON output. Implies **-j**.
+
+EXAMPLES
+========
+
+| **# bpftool net**
+
+::
+
+      xdp [
+      ifindex 2 devname eth0 prog_id 198
+      ]
+      tc_filters [
+      ifindex 2 kind qdisc_htb name prefix_matcher.o:[cls_prefix_matcher_htb]
+                prog_id 111727 tag d08fe3b4319bc2fd act []
+      ifindex 2 kind qdisc_clsact_ingress name fbflow_icmp
+                prog_id 130246 tag 3f265c7f26db62c9 act []
+      ifindex 2 kind qdisc_clsact_egress name prefix_matcher.o:[cls_prefix_matcher_clsact]
+                prog_id 111726 tag 99a197826974c876
+      ifindex 2 kind qdisc_clsact_egress name cls_fg_dscp
+                prog_id 108619 tag dc4630674fd72dcc act []
+      ifindex 2 kind qdisc_clsact_egress name fbflow_egress
+                prog_id 130245 tag 72d2d830d6888d2c
+      ]
+
+|
+| **# bpftool -jp net**
+
+::
+
+    [{
+            "xdp": [{
+                    "ifindex": 2,
+                    "devname": "eth0",
+                    "prog_id": 198
+                }
+            ],
+            "tc_filters": [{
+                    "ifindex": 2,
+                    "kind": "qdisc_htb",
+                    "name": "prefix_matcher.o:[cls_prefix_matcher_htb]",
+                    "prog_id": 111727,
+                    "tag": "d08fe3b4319bc2fd",
+                    "act": []
+                },{
+                    "ifindex": 2,
+                    "kind": "qdisc_clsact_ingress",
+                    "name": "fbflow_icmp",
+                    "prog_id": 130246,
+                    "tag": "3f265c7f26db62c9",
+                    "act": []
+                },{
+                    "ifindex": 2,
+                    "kind": "qdisc_clsact_egress",
+                    "name": "prefix_matcher.o:[cls_prefix_matcher_clsact]",
+                    "prog_id": 111726,
+                    "tag": "99a197826974c876"
+                },{
+                    "ifindex": 2,
+                    "kind": "qdisc_clsact_egress",
+                    "name": "cls_fg_dscp",
+                    "prog_id": 108619,
+                    "tag": "dc4630674fd72dcc",
+                    "act": []
+                },{
+                    "ifindex": 2,
+                    "kind": "qdisc_clsact_egress",
+                    "name": "fbflow_egress",
+                    "prog_id": 130245,
+                    "tag": "72d2d830d6888d2c"
+                }
+            ]
+        }
+    ]
+
+
+SEE ALSO
+========
+	**bpftool**\ (8), **bpftool-prog**\ (8), **bpftool-map**\ (8)
diff --git a/tools/bpf/bpftool/Documentation/bpftool.rst b/tools/bpf/bpftool/Documentation/bpftool.rst
index b6f5d560460d..8dda77daeda9 100644
--- a/tools/bpf/bpftool/Documentation/bpftool.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool.rst
@@ -16,7 +16,7 @@ SYNOPSIS
 
 	**bpftool** **version**
 
-	*OBJECT* := { **map** | **program** | **cgroup** | **perf** }
+	*OBJECT* := { **map** | **program** | **cgroup** | **perf** | **net** }
 
 	*OPTIONS* := { { **-V** | **--version** } | { **-h** | **--help** }
 	| { **-j** | **--json** } [{ **-p** | **--pretty** }] }
@@ -32,6 +32,8 @@ SYNOPSIS
 
 	*PERF-COMMANDS* := { **show** | **list** | **help** }
 
+	*NET-COMMANDS* := { **show** | **list** | **help** }
+
 DESCRIPTION
 ===========
 	*bpftool* allows for inspection and simple modification of BPF objects
@@ -58,4 +60,4 @@ OPTIONS
 SEE ALSO
 ========
 	**bpftool-map**\ (8), **bpftool-prog**\ (8), **bpftool-cgroup**\ (8)
-        **bpftool-perf**\ (8)
+        **bpftool-perf**\ (8), **bpftool-net**\ (8)
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 598066c40191..df1060b852c1 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -494,10 +494,10 @@ _bpftool()
                     _filedir
                     return 0
                     ;;
-		tree)
-		    _filedir
-		    return 0
-		    ;;
+                tree)
+                    _filedir
+                    return 0
+                    ;;
                 attach|detach)
                     local ATTACH_TYPES='ingress egress sock_create sock_ops \
                         device bind4 bind6 post_bind4 post_bind6 connect4 \
@@ -552,6 +552,15 @@ _bpftool()
                     ;;
             esac
             ;;
+        net)
+            case $command in
+                *)
+                    [[ $prev == $object ]] && \
+                        COMPREPLY=( $( compgen -W 'help \
+                            show list' -- "$cur" ) )
+                    ;;
+            esac
+            ;;
     esac
 } &&
 complete -F _bpftool bpftool
diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index d15a62be6cf0..79dc3f193547 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -85,7 +85,7 @@ static int do_help(int argc, char **argv)
 		"       %s batch file FILE\n"
 		"       %s version\n"
 		"\n"
-		"       OBJECT := { prog | map | cgroup | perf }\n"
+		"       OBJECT := { prog | map | cgroup | perf | net }\n"
 		"       " HELP_SPEC_OPTIONS "\n"
 		"",
 		bin_name, bin_name, bin_name);
@@ -215,6 +215,7 @@ static const struct cmd cmds[] = {
 	{ "map",	do_map },
 	{ "cgroup",	do_cgroup },
 	{ "perf",	do_perf },
+	{ "net",	do_net },
 	{ "version",	do_version },
 	{ 0 }
 };
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 238e734d75b3..02dfbcb92a23 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -136,6 +136,7 @@ int do_map(int argc, char **arg);
 int do_event_pipe(int argc, char **argv);
 int do_cgroup(int argc, char **arg);
 int do_perf(int argc, char **arg);
+int do_net(int argc, char **arg);
 
 int prog_parse_fd(int *argc, char ***argv);
 int map_parse_fd(int *argc, char ***argv);
@@ -165,4 +166,10 @@ struct btf_dumper {
  */
 int btf_dumper_type(const struct btf_dumper *d, __u32 type_id,
 		    const void *data);
+
+struct nlattr;
+struct ifinfomsg;
+struct tcmsg;
+int do_xdp_dump(struct ifinfomsg *ifinfo, struct nlattr **tb);
+int do_filter_dump(struct tcmsg *info, struct nlattr **tb, const char *kind);
 #endif
diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
new file mode 100644
index 000000000000..77dd73dd9ade
--- /dev/null
+++ b/tools/bpf/bpftool/net.c
@@ -0,0 +1,233 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright (C) 2018 Facebook
+
+#define _GNU_SOURCE
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <libbpf.h>
+#include <net/if.h>
+#include <linux/if.h>
+#include <linux/rtnetlink.h>
+#include <linux/tc_act/tc_bpf.h>
+#include <sys/socket.h>
+
+#include <bpf.h>
+#include <nlattr.h>
+#include "main.h"
+#include "netlink_dumper.h"
+
+struct bpf_netdev_t {
+	int	*ifindex_array;
+	int	used_len;
+	int	array_len;
+	int	filter_idx;
+};
+
+struct tc_kind_handle {
+	char	kind[64];
+	int	handle;
+};
+
+struct bpf_tcinfo_t {
+	struct tc_kind_handle	*handle_array;
+	int			used_len;
+	int			array_len;
+	bool			is_qdisc;
+};
+
+static int dump_link_nlmsg(void *cookie, void *msg, struct nlattr **tb)
+{
+	struct bpf_netdev_t *netinfo = cookie;
+	struct ifinfomsg *ifinfo = msg;
+
+	if (netinfo->filter_idx > 0 && netinfo->filter_idx != ifinfo->ifi_index)
+		return 0;
+
+	if (netinfo->used_len == netinfo->array_len) {
+		netinfo->ifindex_array = realloc(netinfo->ifindex_array,
+			(netinfo->array_len + 16) * sizeof(int));
+		netinfo->array_len += 16;
+	}
+	netinfo->ifindex_array[netinfo->used_len++] = ifinfo->ifi_index;
+
+	return do_xdp_dump(ifinfo, tb);
+}
+
+static int dump_class_qdisc_nlmsg(void *cookie, void *msg, struct nlattr **tb)
+{
+	struct bpf_tcinfo_t *tcinfo = cookie;
+	struct tcmsg *info = msg;
+
+	if (tcinfo->is_qdisc) {
+		/* skip clsact qdisc */
+		if (tb[TCA_KIND] &&
+		    strcmp(nla_data(tb[TCA_KIND]), "clsact") == 0)
+			return 0;
+		if (info->tcm_handle == 0)
+			return 0;
+	}
+
+	if (tcinfo->used_len == tcinfo->array_len) {
+		tcinfo->handle_array = realloc(tcinfo->handle_array,
+			(tcinfo->array_len + 16) * sizeof(struct tc_kind_handle));
+		tcinfo->array_len += 16;
+	}
+	tcinfo->handle_array[tcinfo->used_len].handle = info->tcm_handle;
+	snprintf(tcinfo->handle_array[tcinfo->used_len].kind,
+		 sizeof(tcinfo->handle_array[tcinfo->used_len].kind),
+		 "%s_%s",
+		 tcinfo->is_qdisc ? "qdisc" : "class",
+		 tb[TCA_KIND] ? nla_getattr_str(tb[TCA_KIND]) : "unknown");
+	tcinfo->used_len++;
+
+	return 0;
+}
+
+static int dump_filter_nlmsg(void *cookie, void *msg, struct nlattr **tb)
+{
+	const char *kind = cookie;
+
+	return do_filter_dump((struct tcmsg *)msg, tb, kind);
+}
+
+static int show_dev_tc_bpf(int sock, unsigned int nl_pid, int ifindex)
+{
+	struct bpf_tcinfo_t tcinfo;
+	int i, handle, ret;
+
+	tcinfo.handle_array = NULL;
+	tcinfo.used_len = 0;
+	tcinfo.array_len = 0;
+
+	tcinfo.is_qdisc = false;
+	ret = nl_get_class(sock, nl_pid, ifindex, dump_class_qdisc_nlmsg,
+			   &tcinfo);
+	if (ret)
+		return ret;
+
+	tcinfo.is_qdisc = true;
+	ret = nl_get_qdisc(sock, nl_pid, ifindex, dump_class_qdisc_nlmsg,
+			   &tcinfo);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < tcinfo.used_len; i++) {
+		ret = nl_get_filter(sock, nl_pid, ifindex,
+				    tcinfo.handle_array[i].handle,
+				    dump_filter_nlmsg,
+				    tcinfo.handle_array[i].kind);
+		if (ret)
+			return ret;
+	}
+
+	/* root, ingress and egress handle */
+	handle = TC_H_ROOT;
+	ret = nl_get_filter(sock, nl_pid, ifindex, handle, dump_filter_nlmsg,
+			    "root");
+	if (ret)
+		return ret;
+
+	handle = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS);
+	ret = nl_get_filter(sock, nl_pid, ifindex, handle, dump_filter_nlmsg,
+			    "qdisc_clsact_ingress");
+	if (ret)
+		return ret;
+
+	handle = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS);
+	ret = nl_get_filter(sock, nl_pid, ifindex, handle, dump_filter_nlmsg,
+			    "qdisc_clsact_egress");
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static int do_show(int argc, char **argv)
+{
+	int i, sock, ret, filter_idx = -1;
+	struct bpf_netdev_t dev_array;
+	unsigned int nl_pid;
+	char err_buf[256];
+
+	if (argc == 2) {
+		if (strcmp(argv[0], "dev") != 0)
+			usage();
+		filter_idx = if_nametoindex(argv[1]);
+		if (filter_idx == 0) {
+			fprintf(stderr, "invalid dev name %s\n", argv[1]);
+			return -1;
+		}
+	} else if (argc != 0) {
+		usage();
+	}
+
+	sock = bpf_netlink_open(&nl_pid);
+	if (sock < 0) {
+		fprintf(stderr, "failed to open netlink sock\n");
+		return -1;
+	}
+
+	dev_array.ifindex_array = NULL;
+	dev_array.used_len = 0;
+	dev_array.array_len = 0;
+	dev_array.filter_idx = filter_idx;
+
+	if (json_output)
+		jsonw_start_array(json_wtr);
+	NET_START_OBJECT;
+	NET_START_ARRAY("xdp", "\n");
+	ret = nl_get_link(sock, nl_pid, dump_link_nlmsg, &dev_array);
+	NET_END_ARRAY("\n");
+
+	if (!ret) {
+		NET_START_ARRAY("tc_filters", "\n");
+		for (i = 0; i < dev_array.used_len; i++) {
+			ret = show_dev_tc_bpf(sock, nl_pid,
+					      dev_array.ifindex_array[i]);
+			if (ret)
+				break;
+		}
+		NET_END_ARRAY("\n");
+	}
+	NET_END_OBJECT;
+	if (json_output)
+		jsonw_end_array(json_wtr);
+
+	if (ret) {
+		if (json_output)
+			jsonw_null(json_wtr);
+		libbpf_strerror(ret, err_buf, sizeof(err_buf));
+		fprintf(stderr, "Error: %s\n", err_buf);
+	}
+	free(dev_array.ifindex_array);
+	close(sock);
+	return ret;
+}
+
+static int do_help(int argc, char **argv)
+{
+	if (json_output) {
+		jsonw_null(json_wtr);
+		return 0;
+	}
+
+	fprintf(stderr,
+		"Usage: %s %s { show | list } [dev <devname>]\n"
+		"       %s %s help\n",
+		bin_name, argv[-2], bin_name, argv[-2]);
+
+	return 0;
+}
+
+static const struct cmd cmds[] = {
+	{ "show",	do_show },
+	{ "list",	do_show },
+	{ "help",	do_help },
+	{ 0 }
+};
+
+int do_net(int argc, char **argv)
+{
+	return cmd_select(cmds, argc, argv, do_help);
+}
diff --git a/tools/bpf/bpftool/netlink_dumper.c b/tools/bpf/bpftool/netlink_dumper.c
new file mode 100644
index 000000000000..e12494fd1d2e
--- /dev/null
+++ b/tools/bpf/bpftool/netlink_dumper.c
@@ -0,0 +1,181 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright (C) 2018 Facebook
+
+#include <stdlib.h>
+#include <string.h>
+#include <libbpf.h>
+#include <linux/rtnetlink.h>
+#include <linux/tc_act/tc_bpf.h>
+
+#include <nlattr.h>
+#include "main.h"
+#include "netlink_dumper.h"
+
+static void xdp_dump_prog_id(struct nlattr **tb, int attr,
+			     const char *type)
+{
+	if (!tb[attr])
+		return;
+
+	NET_DUMP_UINT(type, nla_getattr_u32(tb[attr]))
+}
+
+static int do_xdp_dump_one(struct nlattr *attr, unsigned int ifindex,
+			   const char *name)
+{
+	struct nlattr *tb[IFLA_XDP_MAX + 1];
+	unsigned char mode;
+
+	if (nla_parse_nested(tb, IFLA_XDP_MAX, attr, NULL) < 0)
+		return -1;
+
+	if (!tb[IFLA_XDP_ATTACHED])
+		return 0;
+
+	mode = nla_getattr_u8(tb[IFLA_XDP_ATTACHED]);
+	if (mode == XDP_ATTACHED_NONE)
+		return 0;
+
+	NET_START_OBJECT;
+	NET_DUMP_UINT("ifindex", ifindex);
+
+	if (name)
+		NET_DUMP_STR("devname", name);
+
+	if (tb[IFLA_XDP_PROG_ID])
+		NET_DUMP_UINT("prog_id", nla_getattr_u32(tb[IFLA_XDP_PROG_ID]));
+
+	if (mode == XDP_ATTACHED_MULTI) {
+		xdp_dump_prog_id(tb, IFLA_XDP_SKB_PROG_ID, "generic_prog_id");
+		xdp_dump_prog_id(tb, IFLA_XDP_DRV_PROG_ID, "drv_prog_id");
+		xdp_dump_prog_id(tb, IFLA_XDP_HW_PROG_ID, "offload_prog_id");
+	}
+
+	NET_END_OBJECT_FINAL;
+	return 0;
+}
+
+int do_xdp_dump(struct ifinfomsg *ifinfo, struct nlattr **tb)
+{
+	if (!tb[IFLA_XDP])
+		return 0;
+
+	return do_xdp_dump_one(tb[IFLA_XDP], ifinfo->ifi_index,
+			       tb[IFLA_IFNAME] ? nla_getattr_str(tb[IFLA_IFNAME]) : NULL);
+}
+
+static char *hexstring_n2a(const unsigned char *str, int len,
+			   char *buf, int blen)
+{
+	char *ptr = buf;
+	int i;
+
+	for (i = 0; i < len; i++) {
+		if (blen < 3)
+			break;
+		sprintf(ptr, "%02x", str[i]);
+		ptr += 2;
+		blen -= 2;
+	}
+	return buf;
+}
+
+static int do_bpf_dump_one_act(struct nlattr *attr)
+{
+	struct nlattr *tb[TCA_ACT_BPF_MAX + 1];
+	char buf[256];
+
+	if (nla_parse_nested(tb, TCA_ACT_BPF_MAX, attr, NULL) < 0)
+		return -LIBBPF_ERRNO__NLPARSE;
+
+	if (!tb[TCA_ACT_BPF_PARMS])
+		return -LIBBPF_ERRNO__NLPARSE;
+
+	NET_START_OBJECT_NESTED2;
+	if (tb[TCA_ACT_BPF_NAME])
+		NET_DUMP_STR("name", nla_getattr_str(tb[TCA_ACT_BPF_NAME]));
+	if (tb[TCA_ACT_BPF_ID])
+		NET_DUMP_UINT("bpf_id", nla_getattr_u32(tb[TCA_ACT_BPF_ID]));
+	if (tb[TCA_ACT_BPF_TAG])
+		NET_DUMP_STR("tag", hexstring_n2a(nla_data(tb[TCA_ACT_BPF_TAG]),
+						  nla_len(tb[TCA_ACT_BPF_TAG]),
+						  buf, sizeof(buf)));
+	NET_END_OBJECT_NESTED;
+	return 0;
+}
+
+static int do_dump_one_act(struct nlattr *attr)
+{
+	struct nlattr *tb[TCA_ACT_MAX + 1];
+
+	if (!attr)
+		return 0;
+
+	if (nla_parse_nested(tb, TCA_ACT_MAX, attr, NULL) < 0)
+		return -LIBBPF_ERRNO__NLPARSE;
+
+	if (tb[TCA_ACT_KIND] && strcmp(nla_data(tb[TCA_ACT_KIND]), "bpf") == 0)
+		return do_bpf_dump_one_act(tb[TCA_ACT_OPTIONS]);
+
+	return 0;
+}
+
+static int do_bpf_act_dump(struct nlattr *attr)
+{
+	struct nlattr *tb[TCA_ACT_MAX_PRIO + 1];
+	int act, ret;
+
+	if (nla_parse_nested(tb, TCA_ACT_MAX_PRIO, attr, NULL) < 0)
+		return -LIBBPF_ERRNO__NLPARSE;
+
+	NET_START_ARRAY("act", "");
+	for (act = 0; act <= TCA_ACT_MAX_PRIO; act++) {
+		ret = do_dump_one_act(tb[act]);
+		if (ret)
+			break;
+	}
+	NET_END_ARRAY(" ");
+
+	return ret;
+}
+
+static int do_bpf_filter_dump(struct nlattr *attr)
+{
+	struct nlattr *tb[TCA_BPF_MAX + 1];
+	char buf[256];
+	int ret;
+
+	if (nla_parse_nested(tb, TCA_BPF_MAX, attr, NULL) < 0)
+		return -LIBBPF_ERRNO__NLPARSE;
+
+	if (tb[TCA_BPF_NAME])
+		NET_DUMP_STR("name", nla_getattr_str(tb[TCA_BPF_NAME]));
+	if (tb[TCA_BPF_ID])
+		NET_DUMP_UINT("prog_id", nla_getattr_u32(tb[TCA_BPF_ID]));
+	if (tb[TCA_BPF_TAG])
+		NET_DUMP_STR("tag", hexstring_n2a(nla_data(tb[TCA_BPF_TAG]),
+						  nla_len(tb[TCA_BPF_TAG]),
+						  buf, sizeof(buf)));
+	if (tb[TCA_BPF_ACT]) {
+		ret = do_bpf_act_dump(tb[TCA_BPF_ACT]);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+int do_filter_dump(struct tcmsg *info, struct nlattr **tb, const char *kind)
+{
+	int ret = 0;
+
+	if (tb[TCA_OPTIONS] && strcmp(nla_data(tb[TCA_KIND]), "bpf") == 0) {
+		NET_START_OBJECT;
+		NET_DUMP_UINT("ifindex", info->tcm_ifindex);
+		NET_DUMP_STR("kind", kind);
+		ret = do_bpf_filter_dump(tb[TCA_OPTIONS]);
+		NET_END_OBJECT_FINAL;
+	}
+
+	return ret;
+}
diff --git a/tools/bpf/bpftool/netlink_dumper.h b/tools/bpf/bpftool/netlink_dumper.h
new file mode 100644
index 000000000000..552d8851ac06
--- /dev/null
+++ b/tools/bpf/bpftool/netlink_dumper.h
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright (C) 2018 Facebook
+
+#ifndef _NETLINK_DUMPER_H_
+#define _NETLINK_DUMPER_H_
+
+#define NET_START_OBJECT				\
+{							\
+	if (json_output)				\
+		jsonw_start_object(json_wtr);		\
+}
+
+#define NET_START_OBJECT_NESTED(name)			\
+{							\
+	if (json_output) {				\
+		jsonw_name(json_wtr, name);		\
+		jsonw_start_object(json_wtr);		\
+	} else {					\
+		fprintf(stderr, "%s {", name);		\
+	}						\
+}
+
+#define NET_START_OBJECT_NESTED2			\
+{							\
+	if (json_output)				\
+		jsonw_start_object(json_wtr);		\
+	else						\
+		fprintf(stderr, "{");			\
+}
+
+#define NET_END_OBJECT_NESTED				\
+{							\
+	if (json_output)				\
+		jsonw_end_object(json_wtr);		\
+	else						\
+		fprintf(stderr, "}");			\
+}
+
+#define NET_END_OBJECT					\
+{							\
+	if (json_output)				\
+		jsonw_end_object(json_wtr);		\
+}
+
+#define NET_END_OBJECT_FINAL				\
+{							\
+	if (json_output)				\
+		jsonw_end_object(json_wtr);		\
+	else						\
+		fprintf(stderr, "\n");			\
+}
+
+#define NET_START_ARRAY(name, newline)			\
+{							\
+	if (json_output) {				\
+		jsonw_name(json_wtr, name);		\
+		jsonw_start_array(json_wtr);		\
+	} else {					\
+		fprintf(stderr, "%s [%s", name, newline);\
+	}						\
+}
+
+#define NET_END_ARRAY(endstr)				\
+{							\
+	if (json_output)				\
+		jsonw_end_array(json_wtr);		\
+	else						\
+		fprintf(stderr, "]%s", endstr);		\
+}
+
+#define NET_DUMP_UINT(name, val)			\
+{							\
+	if (json_output)				\
+		jsonw_uint_field(json_wtr, name, val);	\
+	else						\
+		fprintf(stderr, "%s %u ", name, val);	\
+}
+
+#define NET_DUMP_LLUINT(name, val)			\
+{							\
+	if (json_output)				\
+		jsonw_lluint_field(json_wtr, name, val);\
+	else						\
+		fprintf(stderr, "%s %llu ", name, val);	\
+}
+
+#define NET_DUMP_STR(name, str)				\
+{							\
+	if (json_output)				\
+		jsonw_string_field(json_wtr, name, str);\
+	else						\
+		fprintf(stderr, "%s %s ", name, str);	\
+}
+
+#define NET_DUMP_STR_ONLY(str)				\
+{							\
+	if (json_output)				\
+		jsonw_string(json_wtr, str);		\
+	else						\
+		fprintf(stderr, "%s ", str);		\
+}
+
+#endif
-- 
2.17.1

^ permalink raw reply related

* lening
From: Funding BVBA Finances @ 2018-09-05 23:55 UTC (permalink / raw)





Good day,

We are Funding BVBA, financing loans by mail advertisement. Do you need financial help? Do you need legitimate, interest-bearing credit? Do you need a business loan? Do you need a loan to buy a house or a car, or to pay your bills and debts? Do you need money to solve problems? (Short- and long-term loans, personal loans, business loans, etc.) at an interest rate of 3%. Our loans range from 5,000.00 to 20,000,000.00 in US dollars, euros, or pounds, with a maximum term of 15 years.

INFORMATION REQUIRED

Your name(s):
Address: ...........
Telephone: ...........
Amount required: .......
Loan term: ...............
Occupation: ...........
Monthly income level: ......
Gender: ..............
Date of birth: .......
State: ...............
Country: .........
Purpose: .........

"We show you a better way to increase your financial freedom"

Kind regards,

Sigurd Pedersen

+448709453058

info@fundingbvbafinances.com

Website: http://www.fundingbvbafinance.com

^ permalink raw reply
