Netdev List
 help / color / mirror / Atom feed
* Re: WARNING in zd_mac_clear
From: Kalle Valo @ 2019-08-07 14:24 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Andrey Konovalov, syzbot, USB list, David S. Miller,
	Ulrich Kunitz, Daniel Drake, syzkaller-bugs, Alan Stern, LKML,
	linux-wireless, netdev
In-Reply-To: <1565187539.15973.6.camel@suse.com>

Oliver Neukum <oneukum@suse.com> writes:

> Am Mittwoch, den 07.08.2019, 16:07 +0200 schrieb Andrey Konovalov:
>> On Fri, Apr 12, 2019 at 1:46 PM syzbot
>> <syzbot+74c65761783d66a9c97c@syzkaller.appspotmail.com> wrote:
>> > 
>> > Hello,
>> > 
>> > syzbot found the following crash on:
>> > 
>> > HEAD commit:    9a33b369 usb-fuzzer: main usb gadget fuzzer driver
>> > git tree:       https://github.com/google/kasan/tree/usb-fuzzer
>> > console output: https://syzkaller.appspot.com/x/log.txt?x=101a06dd200000
>> > kernel config:  https://syzkaller.appspot.com/x/.config?x=23e37f59d94ddd15
>> > dashboard link: https://syzkaller.appspot.com/bug?extid=74c65761783d66a9c97c
>> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
>> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1170c22d200000
>> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1496adbb200000
>> > 
>> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> > Reported-by: syzbot+74c65761783d66a9c97c@syzkaller.appspotmail.com
>> > 
>> > usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
>> > usb 1-1: config 0 descriptor??
>> > usb 1-1: reset low-speed USB device number 2 using dummy_hcd
>> > usb 1-1: read over firmware interface failed: -71
>> > usb 1-1: reset low-speed USB device number 2 using dummy_hcd
>> > WARNING: CPU: 1 PID: 21 at drivers/net/wireless/zydas/zd1211rw/zd_mac.c:238
>> > zd_mac_clear+0xb0/0xe0 drivers/net/wireless/zydas/zd1211rw/zd_mac.c:238
>> > Kernel panic - not syncing: panic_on_warn set ...
>> > CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 5.1.0-rc4-319354-g9a33b36 #3
>> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> > Google 01/01/2011
>> > Workqueue: usb_hub_wq hub_event
>> > Call Trace:
>> >   __dump_stack lib/dump_stack.c:77 [inline]
>> >   dump_stack+0xe8/0x16e lib/dump_stack.c:113
>> >   panic+0x29d/0x5f2 kernel/panic.c:214
>> >   __warn.cold+0x20/0x48 kernel/panic.c:571
>> >   report_bug+0x262/0x2a0 lib/bug.c:186
>> >   fixup_bug arch/x86/kernel/traps.c:179 [inline]
>> >   fixup_bug arch/x86/kernel/traps.c:174 [inline]
>> >   do_error_trap+0x130/0x1f0 arch/x86/kernel/traps.c:272
>> >   do_invalid_op+0x37/0x40 arch/x86/kernel/traps.c:291
>> >   invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
>> > RIP: 0010:zd_mac_clear+0xb0/0xe0
>> > drivers/net/wireless/zydas/zd1211rw/zd_mac.c:238
>> > Code: e8 85 d0 60 f8 48 8d bb f8 2b 00 00 be ff ff ff ff e8 54 5a 46 f8 31
>> > ff 89 c3 89 c6 e8 d9 d1 60 f8 85 db 75 d4 e8 60 d0 60 f8 <0f> 0b 5b 5d e9
>> > 57 d0 60 f8 48 c7 c7 58 05 cb 93 e8 fb e0 97 f8 eb
>> > RSP: 0018:ffff8880a85c7310 EFLAGS: 00010293
>> > RAX: ffff8880a84de200 RBX: 0000000000000000 RCX: ffffffff8910f507
>> > RDX: 0000000000000000 RSI: ffffffff8910f510 RDI: 0000000000000005
>> > RBP: 0000000000000001 R08: ffff8880a84de200 R09: ffffed1012f83a0b
>> > R10: ffffed1012f83a0a R11: ffff888097c1d057 R12: 00000000ffffffb9
>> > R13: ffff888097c18b20 R14: ffff888099456630 R15: ffffffff8f979398
>> >   probe+0x259/0x590 drivers/net/wireless/zydas/zd1211rw/zd_usb.c:1421
>> >   usb_probe_interface+0x31d/0x820 drivers/usb/core/driver.c:361
>> >   really_probe+0x2da/0xb10 drivers/base/dd.c:509
>> >   driver_probe_device+0x21d/0x350 drivers/base/dd.c:671
>> >   __device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778
>> >   bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454
>> >   __device_attach+0x223/0x3a0 drivers/base/dd.c:844
>> >   bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514
>> >   device_add+0xad2/0x16e0 drivers/base/core.c:2106
>> >   usb_set_configuration+0xdf7/0x1740 drivers/usb/core/message.c:2021
>> >   generic_probe+0xa2/0xda drivers/usb/core/generic.c:210
>> >   usb_probe_device+0xc0/0x150 drivers/usb/core/driver.c:266
>> >   really_probe+0x2da/0xb10 drivers/base/dd.c:509
>> >   driver_probe_device+0x21d/0x350 drivers/base/dd.c:671
>> >   __device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778
>> >   bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454
>> >   __device_attach+0x223/0x3a0 drivers/base/dd.c:844
>> >   bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514
>> >   device_add+0xad2/0x16e0 drivers/base/core.c:2106
>> >   usb_new_device.cold+0x537/0xccf drivers/usb/core/hub.c:2534
>> >   hub_port_connect drivers/usb/core/hub.c:5089 [inline]
>> >   hub_port_connect_change drivers/usb/core/hub.c:5204 [inline]
>> >   port_event drivers/usb/core/hub.c:5350 [inline]
>> >   hub_event+0x138e/0x3b00 drivers/usb/core/hub.c:5432
>> >   process_one_work+0x90f/0x1580 kernel/workqueue.c:2269
>> >   worker_thread+0x9b/0xe20 kernel/workqueue.c:2415
>> >   kthread+0x313/0x420 kernel/kthread.c:253
>> >   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
>> > Kernel Offset: disabled
>> > Rebooting in 86400 seconds..
>> 
>> This USB bug is the second most frequently triggered one with over 10k
>> kernel crashes.
>
> As far as I can tell this is a false positive. Didn't I submit this
> upstream?
>
> #syz test: https://github.com/google/kasan.git 9a33b369
>
> From ae999d5a437850b65497df7dcca3ffc10f75e697 Mon Sep 17 00:00:00 2001
> From: Oliver Neukum <oneukum@suse.com>
> Date: Tue, 30 Jul 2019 15:59:03 +0200
> Subject: [PATCH] zdnet: remove false assertion from zd_mac_clear()
>
> The function is called before the lock which is asserted was ever used.
> Just remove it.
>
> Reported-by: syzbot+74c65761783d66a9c97c@syzkaller.appspotmail.com
> Signed-off-by: Oliver Neukum <oneukum@suse.com>

Neither google nor patchwork[1] can find the patch. If you think the
patch should be applied, please resubmit it to linux-wireless so
patchwork sees it. Also the title prefix should be "zd1211rw:", not
"zdnet:".

[1] https://patchwork.kernel.org/project/linux-wireless/list/

-- 
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: WARNING in zd_mac_clear
From: syzbot @ 2019-08-07 14:58 UTC (permalink / raw)
  To: andreyknvl, davem, dsd, kune, kvalo, linux-kernel, linux-usb,
	linux-wireless, netdev, oneukum, stern, syzkaller-bugs
In-Reply-To: <1565187539.15973.6.camel@suse.com>

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger  
crash:

Reported-and-tested-by:  
syzbot+74c65761783d66a9c97c@syzkaller.appspotmail.com

Tested on:

commit:         9a33b369 usb-fuzzer: main usb gadget fuzzer driver
git tree:       https://github.com/google/kasan.git
kernel config:  https://syzkaller.appspot.com/x/.config?x=23e37f59d94ddd15
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
patch:          https://syzkaller.appspot.com/x/patch.diff?x=10d8c03c600000

Note: testing is done by a robot and is best-effort only.

^ permalink raw reply

* Re: [PATCH net-next] net: openvswitch: Set OvS recirc_id from tc chain index
From: Marcelo Ricardo Leitner @ 2019-08-07 15:00 UTC (permalink / raw)
  To: Paul Blakey
  Cc: netdev, David S. Miller, Justin Pettit, Pravin B Shelar,
	Simon Horman, Vlad Buslov, Jiri Pirko, Roi Dayan, Yossi Kuperman,
	Rony Efraim, Oz Shlomo
In-Reply-To: <1565179722-22488-1-git-send-email-paulb@mellanox.com>

On Wed, Aug 07, 2019 at 03:08:42PM +0300, Paul Blakey wrote:
> Offloaded OvS datapath rules are translated one to one to tc rules,
> for example the following simplified OvS rule:
> 
> recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)
> 
> Will be translated to the following tc rule:
> 
> $ tc filter add dev dev1 ingress \
> 	    prio 1 chain 0 proto ip \
> 		flower tcp ct_state -trk \
> 		action ct pipe \
> 		action goto chain 2
> 
> Received packets will first travel though tc, and if they aren't stolen
> by it, like in the above rule, they will continue to OvS datapath.
> Since we already did some actions (action ct in this case) which might
> modify the packets, and updated action stats, we would like to continue
> the proccessing with the correct recirc_id in OvS (here recirc_id(2))
> where we left off.
> 
> To support this, introduce a new skb extension for tc, which
> will be used for translating tc chain to ovs recirc_id to
> handle these miss cases. Last tc chain index will be set
> by tc goto chain action and read by OvS datapath.
> 
> Signed-off-by: Paul Blakey <paulb@mellanox.com>
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

> ---
>  include/linux/skbuff.h    | 13 +++++++++++++
>  include/net/sch_generic.h |  5 ++++-
>  net/core/skbuff.c         |  6 ++++++
>  net/openvswitch/flow.c    |  9 +++++++++
>  net/sched/Kconfig         | 13 +++++++++++++
>  net/sched/act_api.c       |  1 +
>  net/sched/cls_api.c       | 12 ++++++++++++
>  7 files changed, 58 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 3aef8d8..fb2a792 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -279,6 +279,16 @@ struct nf_bridge_info {
>  };
>  #endif
>  
> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> +/* Chain in tc_skb_ext will be used to share the tc chain with
> + * ovs recirc_id. It will be set to the current chain by tc
> + * and read by ovs to recirc_id.
> + */
> +struct tc_skb_ext {
> +	__u32 chain;
> +};
> +#endif
> +
>  struct sk_buff_head {
>  	/* These two members must be first. */
>  	struct sk_buff	*next;
> @@ -4050,6 +4060,9 @@ enum skb_ext_id {
>  #ifdef CONFIG_XFRM
>  	SKB_EXT_SEC_PATH,
>  #endif
> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> +	TC_SKB_EXT,
> +#endif
>  	SKB_EXT_NUM, /* must be last */
>  };
>  
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 6b6b012..871feea 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -275,7 +275,10 @@ struct tcf_result {
>  			unsigned long	class;
>  			u32		classid;
>  		};
> -		const struct tcf_proto *goto_tp;
> +		struct {
> +			const struct tcf_proto *goto_tp;
> +			u32 goto_index;
> +		};
>  
>  		/* used in the skb_tc_reinsert function */
>  		struct {
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index ea8e8d3..2b40b5a 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4087,6 +4087,9 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
>  #ifdef CONFIG_XFRM
>  	[SKB_EXT_SEC_PATH] = SKB_EXT_CHUNKSIZEOF(struct sec_path),
>  #endif
> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> +	[TC_SKB_EXT] = SKB_EXT_CHUNKSIZEOF(struct tc_skb_ext),
> +#endif
>  };
>  
>  static __always_inline unsigned int skb_ext_total_length(void)
> @@ -4098,6 +4101,9 @@ static __always_inline unsigned int skb_ext_total_length(void)
>  #ifdef CONFIG_XFRM
>  		skb_ext_type_len[SKB_EXT_SEC_PATH] +
>  #endif
> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> +		skb_ext_type_len[TC_SKB_EXT] +
> +#endif
>  		0;
>  }
>  
> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index bc89e16..0287ead 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -816,6 +816,9 @@ static int key_extract_mac_proto(struct sk_buff *skb)
>  int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>  			 struct sk_buff *skb, struct sw_flow_key *key)
>  {
> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> +	struct tc_skb_ext *tc_ext;
> +#endif
>  	int res, err;
>  
>  	/* Extract metadata from packet. */
> @@ -848,7 +851,13 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
>  	if (res < 0)
>  		return res;
>  	key->mac_proto = res;
> +
> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> +	tc_ext = skb_ext_find(skb, TC_SKB_EXT);
> +	key->recirc_id = tc_ext ? tc_ext->chain : 0;
> +#else
>  	key->recirc_id = 0;
> +#endif
>  
>  	err = key_extract(skb, key);
>  	if (!err)
> diff --git a/net/sched/Kconfig b/net/sched/Kconfig
> index afd2ba1..b3faafe 100644
> --- a/net/sched/Kconfig
> +++ b/net/sched/Kconfig
> @@ -963,6 +963,19 @@ config NET_IFE_SKBTCINDEX
>          tristate "Support to encoding decoding skb tcindex on IFE action"
>          depends on NET_ACT_IFE
>  
> +config NET_TC_SKB_EXT
> +	bool "TC recirculation support"
> +	depends on NET_CLS_ACT
> +	default y if NET_CLS_ACT
> +	select SKB_EXTENSIONS
> +
> +	help
> +	  Say Y here to allow tc chain misses to continue in OvS datapath in
> +	  the correct recirc_id, and hardware chain misses to continue in
> +	  the correct chain in tc software datapath.
> +
> +	  Say N here if you won't be using tc<->ovs offload or tc chains offload.
> +
>  endif # NET_SCHED
>  
>  config NET_SCH_FIFO
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index 3397122..c393604 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -27,6 +27,7 @@ static void tcf_action_goto_chain_exec(const struct tc_action *a,
>  {
>  	const struct tcf_chain *chain = rcu_dereference_bh(a->goto_chain);
>  
> +	res->goto_index = chain->index;
>  	res->goto_tp = rcu_dereference_bh(chain->filter_chain);
>  }
>  
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 3565d9a..b0b829a 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -1660,6 +1660,18 @@ int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>  			goto reset;
>  		} else if (unlikely(TC_ACT_EXT_CMP(err, TC_ACT_GOTO_CHAIN))) {
>  			first_tp = res->goto_tp;
> +
> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
> +			{
> +				struct tc_skb_ext *ext;
> +
> +				ext = skb_ext_add(skb, TC_SKB_EXT);
> +				if (WARN_ON_ONCE(!ext))
> +					return TC_ACT_SHOT;
> +
> +				ext->chain = res->goto_index;
> +			}
> +#endif
>  			goto reset;
>  		}
>  #endif
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* Re: linux-next: Tree for Aug 7 (net/bridge/netfilter/nf_conntrack_bridge.c)
From: Randy Dunlap @ 2019-08-07 15:29 UTC (permalink / raw)
  To: Stephen Rothwell, Linux Next Mailing List
  Cc: Linux Kernel Mailing List, netdev@vger.kernel.org, bridge
In-Reply-To: <20190807183606.372ca1a4@canb.auug.org.au>

On 8/7/19 1:36 AM, Stephen Rothwell wrote:
> Hi all,
> 
> Changes since 20190806:
> 

on i386:
when CONFIG_NF_TABLES is not set/enabled:

  CC      net/bridge/netfilter/nf_conntrack_bridge.o
In file included from ../net/bridge/netfilter/nf_conntrack_bridge.c:21:0:
../include/net/netfilter/nf_tables.h: In function ‘nft_gencursor_next’:
../include/net/netfilter/nf_tables.h:1224:14: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
  return net->nft.gencursor + 1 == 1 ? 1 : 0;
              ^~~
              nf
In file included from ../include/linux/export.h:45:0,
                 from ../include/linux/linkage.h:7,
                 from ../include/linux/kernel.h:8,
                 from ../include/linux/skbuff.h:13,
                 from ../include/linux/ip.h:16,
                 from ../net/bridge/netfilter/nf_conntrack_bridge.c:3:
../include/net/netfilter/nf_tables.h: In function ‘nft_genmask_cur’:
../include/net/netfilter/nf_tables.h:1235:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
  return 1 << READ_ONCE(net->nft.gencursor);
                             ^
../include/linux/compiler.h:261:17: note: in definition of macro ‘__READ_ONCE’
  union { typeof(x) __val; char __c[1]; } __u;   \
                 ^
../include/net/netfilter/nf_tables.h:1235:14: note: in expansion of macro ‘READ_ONCE’
  return 1 << READ_ONCE(net->nft.gencursor);
              ^~~~~~~~~
../include/net/netfilter/nf_tables.h:1235:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
  return 1 << READ_ONCE(net->nft.gencursor);
                             ^
../include/linux/compiler.h:263:22: note: in definition of macro ‘__READ_ONCE’
   __read_once_size(&(x), __u.__c, sizeof(x));  \
                      ^
../include/net/netfilter/nf_tables.h:1235:14: note: in expansion of macro ‘READ_ONCE’
  return 1 << READ_ONCE(net->nft.gencursor);
              ^~~~~~~~~
../include/net/netfilter/nf_tables.h:1235:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
  return 1 << READ_ONCE(net->nft.gencursor);
                             ^
../include/linux/compiler.h:263:42: note: in definition of macro ‘__READ_ONCE’
   __read_once_size(&(x), __u.__c, sizeof(x));  \
                                          ^
../include/net/netfilter/nf_tables.h:1235:14: note: in expansion of macro ‘READ_ONCE’
  return 1 << READ_ONCE(net->nft.gencursor);
              ^~~~~~~~~
../include/net/netfilter/nf_tables.h:1235:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
  return 1 << READ_ONCE(net->nft.gencursor);
                             ^
../include/linux/compiler.h:265:30: note: in definition of macro ‘__READ_ONCE’
   __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
                              ^
../include/net/netfilter/nf_tables.h:1235:14: note: in expansion of macro ‘READ_ONCE’
  return 1 << READ_ONCE(net->nft.gencursor);
              ^~~~~~~~~
../include/net/netfilter/nf_tables.h:1235:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
  return 1 << READ_ONCE(net->nft.gencursor);
                             ^
../include/linux/compiler.h:265:50: note: in definition of macro ‘__READ_ONCE’
   __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
                                                  ^
../include/net/netfilter/nf_tables.h:1235:14: note: in expansion of macro ‘READ_ONCE’
  return 1 << READ_ONCE(net->nft.gencursor);
              ^~~~~~~~~
make[4]: *** [../scripts/Makefile.build:273: net/bridge/netfilter/nf_conntrack_bridge.o] Error 1



-- 
~Randy

^ permalink raw reply

* Clause 73 and USXGMII
From: Jose Abreu @ 2019-08-07 15:46 UTC (permalink / raw)
  To: netdev@vger.kernel.org

Hello,

I've some sample code for Clause 73 support using Synopsys based XPCS 
but I would like to clarify some things that I noticed.

I'm using USXGMII as interface and a single SERDES that operates at 10G 
rate but MAC side is working at 2.5G. Maximum available bandwidth is 
therefore 2.5Gbps.

So, I configure USXGMII for 2.5G mode and it works but if I try to limit 
the autoneg abilities to 2.5G max then it never finishes:
# ethtool enp4s0
Settings for enp4s0:
	Supported ports: [ ]
	Supported link modes:   1000baseKX/Full 
	                        2500baseX/Full 
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Supported FEC modes: Not reported
	Advertised link modes:  1000baseKX/Full 
	                        2500baseX/Full 
	Advertised pause frame use: Symmetric Receive-only
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Speed: Unknown!
	Duplex: Unknown! (255)
	Port: MII
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: ug
	Wake-on: d
	Current message level: 0x0000003f (63)
			       drv probe link timer ifdown ifup
	Link detected: no

When I do not limit autoneg and I say that maximum limit is 10G then I 
get Link Up and autoneg finishes with this outcome:
# ethtool enp4s0
Settings for enp4s0:
	Supported ports: [ ]
	Supported link modes:   1000baseKX/Full 
	                        2500baseX/Full 
	                        10000baseKX4/Full 
	                        10000baseKR/Full 
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Supported FEC modes: Not reported
	Advertised link modes:  1000baseKX/Full 
	                        2500baseX/Full 
	                        10000baseKX4/Full 
	                        10000baseKR/Full 
	Advertised pause frame use: Symmetric Receive-only
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Link partner advertised link modes:  1000baseKX/Full 
	                                     2500baseX/Full 
	                                     10000baseKX4/Full 
	                                     10000baseKR/Full 
	Link partner advertised pause frame use: Symmetric Receive-only
	Link partner advertised auto-negotiation: Yes
	Link partner advertised FEC modes: Not reported
	Speed: 2500Mb/s
	Duplex: Full
	Port: MII <- Never mind this, it's a SW issue
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: ug
	Wake-on: d
	Current message level: 0x0000003f (63)
			       drv probe link timer ifdown ifup
	Link detected: yes

I was expecting that, as MAC side is limited to 2.5G, I should set in 
phylink the correct capabilities and then outcome of autoneg would only 
have up to 2.5G modes. Am I wrong ?

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* [PATCH bpf-next 0/3] bpf: support cloning sk storage on accept()
From: Stanislav Fomichev @ 2019-08-07 15:47 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Martin KaFai Lau

Currently there is no way to propagate sk storage from the listener
socket to a newly accepted one. Consider the following use case:

        fd = socket();
        setsockopt(fd, SOL_IP, IP_TOS,...);
        /* ^^^ setsockopt BPF program triggers here and saves something
         * into sk storage of the listener.
         */
        listen(fd, ...);
        while (client = accept(fd)) {
                /* At this point all association between listener
                 * socket and newly accepted one is gone. New
                 * socket will not have any sk storage attached.
                 */
        }

Let's add new BPF_SK_STORAGE_GET_F_CLONE flag that can be passed to
bpf_sk_storage_get. This new flag indicates that that particular
bpf_sk_storage_elem should be cloned when the socket is cloned.

Cc: Martin KaFai Lau <kafai@fb.com>

Stanislav Fomichev (3):
  bpf: support cloning sk storage on accept()
  bpf: sync bpf.h to tools/
  selftests/bpf: add sockopt clone/inheritance test

 include/net/bpf_sk_storage.h                  |  10 +
 include/uapi/linux/bpf.h                      |   1 +
 net/core/bpf_sk_storage.c                     | 102 ++++++-
 net/core/sock.c                               |   9 +-
 tools/include/uapi/linux/bpf.h                |   1 +
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   3 +-
 .../selftests/bpf/progs/sockopt_inherit.c     | 102 +++++++
 .../selftests/bpf/test_sockopt_inherit.c      | 252 ++++++++++++++++++
 9 files changed, 473 insertions(+), 8 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/sockopt_inherit.c
 create mode 100644 tools/testing/selftests/bpf/test_sockopt_inherit.c

-- 
2.22.0.770.g0f2c4a37fd-goog

^ permalink raw reply

* [PATCH bpf-next 1/3] bpf: support cloning sk storage on accept()
From: Stanislav Fomichev @ 2019-08-07 15:47 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Martin KaFai Lau
In-Reply-To: <20190807154720.260577-1-sdf@google.com>

Add new helper bpf_sk_storage_clone which optionally clones sk storage
and call it from bpf_sk_storage_clone. Reuse the gap in
bpf_sk_storage_elem to store clone/non-clone flag.

Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/net/bpf_sk_storage.h |  10 ++++
 include/uapi/linux/bpf.h     |   1 +
 net/core/bpf_sk_storage.c    | 102 +++++++++++++++++++++++++++++++++--
 net/core/sock.c              |   9 ++--
 4 files changed, 115 insertions(+), 7 deletions(-)

diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h
index b9dcb02e756b..8e4f831d2e52 100644
--- a/include/net/bpf_sk_storage.h
+++ b/include/net/bpf_sk_storage.h
@@ -10,4 +10,14 @@ void bpf_sk_storage_free(struct sock *sk);
 extern const struct bpf_func_proto bpf_sk_storage_get_proto;
 extern const struct bpf_func_proto bpf_sk_storage_delete_proto;
 
+#ifdef CONFIG_BPF_SYSCALL
+int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk);
+#else
+static inline int bpf_sk_storage_clone(const struct sock *sk,
+				       struct sock *newsk)
+{
+	return 0;
+}
+#endif
+
 #endif /* _BPF_SK_STORAGE_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4393bd4b2419..00459ca4c8cf 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2931,6 +2931,7 @@ enum bpf_func_id {
 
 /* BPF_FUNC_sk_storage_get flags */
 #define BPF_SK_STORAGE_GET_F_CREATE	(1ULL << 0)
+#define BPF_SK_STORAGE_GET_F_CLONE	(1ULL << 1)
 
 /* Mode for BPF_FUNC_skb_adjust_room helper. */
 enum bpf_adj_room_mode {
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 94c7f77ecb6b..b6dea67965bc 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -12,6 +12,9 @@
 
 static atomic_t cache_idx;
 
+#define BPF_SK_STORAGE_GET_F_MASK	(BPF_SK_STORAGE_GET_F_CREATE | \
+					 BPF_SK_STORAGE_GET_F_CLONE)
+
 struct bucket {
 	struct hlist_head list;
 	raw_spinlock_t lock;
@@ -66,7 +69,8 @@ struct bpf_sk_storage_elem {
 	struct hlist_node snode;	/* Linked to bpf_sk_storage */
 	struct bpf_sk_storage __rcu *sk_storage;
 	struct rcu_head rcu;
-	/* 8 bytes hole */
+	u8 clone:1;
+	/* 7 bytes hole */
 	/* The data is stored in aother cacheline to minimize
 	 * the number of cachelines access during a cache hit.
 	 */
@@ -509,7 +513,7 @@ static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
 	return 0;
 }
 
-/* Called by __sk_destruct() */
+/* Called by __sk_destruct() & bpf_sk_storage_clone() */
 void bpf_sk_storage_free(struct sock *sk)
 {
 	struct bpf_sk_storage_elem *selem;
@@ -739,19 +743,106 @@ static int bpf_fd_sk_storage_delete_elem(struct bpf_map *map, void *key)
 	return err;
 }
 
+static struct bpf_sk_storage_elem *
+bpf_sk_storage_clone_elem(struct sock *newsk,
+			  struct bpf_sk_storage_map *smap,
+			  struct bpf_sk_storage_elem *selem)
+{
+	struct bpf_sk_storage_elem *copy_selem;
+
+	copy_selem = selem_alloc(smap, newsk, NULL, true);
+	if (!copy_selem)
+		return ERR_PTR(-ENOMEM);
+
+	if (map_value_has_spin_lock(&smap->map))
+		copy_map_value_locked(&smap->map, SDATA(copy_selem)->data,
+				      SDATA(selem)->data, true);
+	else
+		copy_map_value(&smap->map, SDATA(copy_selem)->data,
+			       SDATA(selem)->data);
+
+	return copy_selem;
+}
+
+int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
+{
+	struct bpf_sk_storage *new_sk_storage = NULL;
+	struct bpf_sk_storage *sk_storage;
+	struct bpf_sk_storage_elem *selem;
+	int ret;
+
+	RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
+
+	rcu_read_lock();
+	sk_storage = rcu_dereference(sk->sk_bpf_storage);
+
+	if (!sk_storage || hlist_empty(&sk_storage->list))
+		goto out;
+
+	hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
+		struct bpf_sk_storage_map *smap;
+		struct bpf_sk_storage_elem *copy_selem;
+
+		if (!selem->clone)
+			continue;
+
+		smap = rcu_dereference(SDATA(selem)->smap);
+		if (!smap)
+			continue;
+
+		copy_selem = bpf_sk_storage_clone_elem(newsk, smap, selem);
+		if (IS_ERR(copy_selem)) {
+			ret = PTR_ERR(copy_selem);
+			goto err;
+		}
+
+		if (!new_sk_storage) {
+			ret = sk_storage_alloc(newsk, smap, copy_selem);
+			if (ret) {
+				kfree(copy_selem);
+				atomic_sub(smap->elem_size,
+					   &newsk->sk_omem_alloc);
+				goto err;
+			}
+
+			new_sk_storage = rcu_dereference(copy_selem->sk_storage);
+			continue;
+		}
+
+		raw_spin_lock_bh(&new_sk_storage->lock);
+		selem_link_map(smap, copy_selem);
+		__selem_link_sk(new_sk_storage, copy_selem);
+		raw_spin_unlock_bh(&new_sk_storage->lock);
+	}
+
+out:
+	rcu_read_unlock();
+	return 0;
+
+err:
+	rcu_read_unlock();
+
+	bpf_sk_storage_free(newsk);
+	return ret;
+}
+
 BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
 	   void *, value, u64, flags)
 {
 	struct bpf_sk_storage_data *sdata;
 
-	if (flags > BPF_SK_STORAGE_GET_F_CREATE)
+	if (flags & ~BPF_SK_STORAGE_GET_F_MASK)
+		return (unsigned long)NULL;
+
+	if ((flags & BPF_SK_STORAGE_GET_F_CLONE) &&
+	    !(flags & BPF_SK_STORAGE_GET_F_CREATE))
 		return (unsigned long)NULL;
 
 	sdata = sk_storage_lookup(sk, map, true);
 	if (sdata)
 		return (unsigned long)sdata->data;
 
-	if (flags == BPF_SK_STORAGE_GET_F_CREATE &&
+	if ((flags & BPF_SK_STORAGE_GET_F_CREATE) &&
 	    /* Cannot add new elem to a going away sk.
 	     * Otherwise, the new elem may become a leak
 	     * (and also other memory issues during map
@@ -762,6 +853,9 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
 		/* sk must be a fullsock (guaranteed by verifier),
 		 * so sock_gen_put() is unnecessary.
 		 */
+		if (!IS_ERR(sdata))
+			SELEM(sdata)->clone =
+				!!(flags & BPF_SK_STORAGE_GET_F_CLONE);
 		sock_put(sk);
 		return IS_ERR(sdata) ?
 			(unsigned long)NULL : (unsigned long)sdata->data;
diff --git a/net/core/sock.c b/net/core/sock.c
index d57b0cc995a0..f5e801a9cea4 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1851,9 +1851,12 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 			goto out;
 		}
 		RCU_INIT_POINTER(newsk->sk_reuseport_cb, NULL);
-#ifdef CONFIG_BPF_SYSCALL
-		RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
-#endif
+
+		if (bpf_sk_storage_clone(sk, newsk)) {
+			sk_free_unlock_clone(newsk);
+			newsk = NULL;
+			goto out;
+		}
 
 		newsk->sk_err	   = 0;
 		newsk->sk_err_soft = 0;
-- 
2.22.0.770.g0f2c4a37fd-goog


^ permalink raw reply related

* [PATCH bpf-next 2/3] bpf: sync bpf.h to tools/
From: Stanislav Fomichev @ 2019-08-07 15:47 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Martin KaFai Lau
In-Reply-To: <20190807154720.260577-1-sdf@google.com>

Sync new sk storage clone flag.

Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/include/uapi/linux/bpf.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4393bd4b2419..00459ca4c8cf 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2931,6 +2931,7 @@ enum bpf_func_id {
 
 /* BPF_FUNC_sk_storage_get flags */
 #define BPF_SK_STORAGE_GET_F_CREATE	(1ULL << 0)
+#define BPF_SK_STORAGE_GET_F_CLONE	(1ULL << 1)
 
 /* Mode for BPF_FUNC_skb_adjust_room helper. */
 enum bpf_adj_room_mode {
-- 
2.22.0.770.g0f2c4a37fd-goog


^ permalink raw reply related

* [PATCH bpf-next 3/3] selftests/bpf: add sockopt clone/inheritance test
From: Stanislav Fomichev @ 2019-08-07 15:47 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Martin KaFai Lau
In-Reply-To: <20190807154720.260577-1-sdf@google.com>

Add a test that calls setsockopt on the listener socket which triggers
BPF program. This BPF program writes to the sk storage and sets
clone flag. Make sure that sk storage is cloned for a newly
accepted connection.

We have two cloned maps in the tests to make sure we hit both cases
in bpf_sk_storage_clone: first element (sk_storage_alloc) and
non-first element(s) (selem_link_map).

Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   3 +-
 .../selftests/bpf/progs/sockopt_inherit.c     | 102 +++++++
 .../selftests/bpf/test_sockopt_inherit.c      | 252 ++++++++++++++++++
 4 files changed, 357 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/sockopt_inherit.c
 create mode 100644 tools/testing/selftests/bpf/test_sockopt_inherit.c

diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index 90f70d2c7c22..60c9338cd9b4 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -42,4 +42,5 @@ xdping
 test_sockopt
 test_sockopt_sk
 test_sockopt_multi
+test_sockopt_inherit
 test_tcp_rtt
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 3bd0f4a0336a..c875763a851a 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -29,7 +29,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
 	test_cgroup_storage test_select_reuseport test_section_names \
 	test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
 	test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
-	test_sockopt_multi test_tcp_rtt
+	test_sockopt_multi test_sockopt_inherit test_tcp_rtt
 
 BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
 TEST_GEN_FILES = $(BPF_OBJ_FILES)
@@ -110,6 +110,7 @@ $(OUTPUT)/test_cgroup_attach: cgroup_helpers.c
 $(OUTPUT)/test_sockopt: cgroup_helpers.c
 $(OUTPUT)/test_sockopt_sk: cgroup_helpers.c
 $(OUTPUT)/test_sockopt_multi: cgroup_helpers.c
+$(OUTPUT)/test_sockopt_inherit: cgroup_helpers.c
 $(OUTPUT)/test_tcp_rtt: cgroup_helpers.c
 
 .PHONY: force
diff --git a/tools/testing/selftests/bpf/progs/sockopt_inherit.c b/tools/testing/selftests/bpf/progs/sockopt_inherit.c
new file mode 100644
index 000000000000..357fc9db5874
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/sockopt_inherit.c
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1;
+
+#define SOL_CUSTOM			0xdeadbeef
+#define CUSTOM_INHERIT1			0
+#define CUSTOM_INHERIT2			1
+#define CUSTOM_LISTENER			2
+
+struct sockopt_inherit {
+	__u8 val;
+};
+
+struct bpf_map_def SEC("maps") cloned1_map = {
+	.type = BPF_MAP_TYPE_SK_STORAGE,
+	.key_size = sizeof(int),
+	.value_size = sizeof(struct sockopt_inherit),
+	.map_flags = BPF_F_NO_PREALLOC,
+};
+BPF_ANNOTATE_KV_PAIR(cloned1_map, int, struct sockopt_inherit);
+
+struct bpf_map_def SEC("maps") cloned2_map = {
+	.type = BPF_MAP_TYPE_SK_STORAGE,
+	.key_size = sizeof(int),
+	.value_size = sizeof(struct sockopt_inherit),
+	.map_flags = BPF_F_NO_PREALLOC,
+};
+BPF_ANNOTATE_KV_PAIR(cloned2_map, int, struct sockopt_inherit);
+
+struct bpf_map_def SEC("maps") listener_map = {
+	.type = BPF_MAP_TYPE_SK_STORAGE,
+	.key_size = sizeof(int),
+	.value_size = sizeof(struct sockopt_inherit),
+	.map_flags = BPF_F_NO_PREALLOC,
+};
+BPF_ANNOTATE_KV_PAIR(listener_map, int, struct sockopt_inherit);
+
+static __inline struct sockopt_inherit *get_storage(struct bpf_sockopt *ctx)
+{
+	if (ctx->optname == CUSTOM_INHERIT1)
+		return bpf_sk_storage_get(&cloned1_map, ctx->sk, 0,
+					  BPF_SK_STORAGE_GET_F_CREATE |
+					  BPF_SK_STORAGE_GET_F_CLONE);
+	else if (ctx->optname == CUSTOM_INHERIT2)
+		return bpf_sk_storage_get(&cloned2_map, ctx->sk, 0,
+					  BPF_SK_STORAGE_GET_F_CREATE |
+					  BPF_SK_STORAGE_GET_F_CLONE);
+	else
+		return bpf_sk_storage_get(&listener_map, ctx->sk, 0,
+					  BPF_SK_STORAGE_GET_F_CREATE);
+}
+
+SEC("cgroup/getsockopt")
+int _getsockopt(struct bpf_sockopt *ctx)
+{
+	__u8 *optval_end = ctx->optval_end;
+	struct sockopt_inherit *storage;
+	__u8 *optval = ctx->optval;
+
+	if (ctx->level != SOL_CUSTOM)
+		return 1; /* only interested in SOL_CUSTOM */
+
+	if (optval + 1 > optval_end)
+		return 0; /* EPERM, bounds check */
+
+	storage = get_storage(ctx);
+	if (!storage)
+		return 0; /* EPERM, couldn't get sk storage */
+
+	ctx->retval = 0; /* Reset system call return value to zero */
+
+	optval[0] = storage->val;
+	ctx->optlen = 1;
+
+	return 1;
+}
+
+SEC("cgroup/setsockopt")
+int _setsockopt(struct bpf_sockopt *ctx)
+{
+	__u8 *optval_end = ctx->optval_end;
+	struct sockopt_inherit *storage;
+	__u8 *optval = ctx->optval;
+
+	if (ctx->level != SOL_CUSTOM)
+		return 1; /* only interested in SOL_CUSTOM */
+
+	if (optval + 1 > optval_end)
+		return 0; /* EPERM, bounds check */
+
+	storage = get_storage(ctx);
+	if (!storage)
+		return 0; /* EPERM, couldn't get sk storage */
+
+	storage->val = optval[0];
+	ctx->optlen = -1;
+
+	return 1;
+}
diff --git a/tools/testing/selftests/bpf/test_sockopt_inherit.c b/tools/testing/selftests/bpf/test_sockopt_inherit.c
new file mode 100644
index 000000000000..e47b9c28d743
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_sockopt_inherit.c
@@ -0,0 +1,252 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <error.h>
+#include <errno.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <pthread.h>
+
+#include <linux/filter.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "bpf_rlimit.h"
+#include "bpf_util.h"
+#include "cgroup_helpers.h"
+
+#define CG_PATH				"/sockopt_inherit"
+#define SOL_CUSTOM			0xdeadbeef
+#define CUSTOM_INHERIT1			0
+#define CUSTOM_INHERIT2			1
+#define CUSTOM_LISTENER			2
+
+static int connect_to_server(int server_fd)
+{
+	struct sockaddr_storage addr;
+	socklen_t len = sizeof(addr);
+	int fd;
+
+	fd = socket(AF_INET, SOCK_STREAM, 0);
+	if (fd < 0) {
+		log_err("Failed to create client socket");
+		return -1;
+	}
+
+	if (getsockname(server_fd, (struct sockaddr *)&addr, &len)) {
+		log_err("Failed to get server addr");
+		goto out;
+	}
+
+	if (connect(fd, (const struct sockaddr *)&addr, len) < 0) {
+		log_err("Fail to connect to server");
+		goto out;
+	}
+
+	return fd;
+
+out:
+	close(fd);
+	return -1;
+}
+
+static int verify_sockopt(int fd, int optname, const char *msg, char expected)
+{
+	socklen_t optlen = 1;
+	char buf = 0;
+	int err;
+
+	err = getsockopt(fd, SOL_CUSTOM, optname, &buf, &optlen);
+	if (err) {
+		log_err("%s: failed to call getsockopt", msg);
+		return 1;
+	}
+
+	log_err("%s %d: got=0x%x ? expected=0x%x", msg, optname, buf, expected);
+
+	if (buf != expected) {
+		log_err("%s: unexpected getsockopt value %d != %d", msg,
+			buf, expected);
+		return 1;
+	}
+
+	return 0;
+}
+
+static void *server_thread(void *arg)
+{
+	struct sockaddr_storage addr;
+	socklen_t len = sizeof(addr);
+	int fd = *(int *)arg;
+	int client_fd;
+	int err = 0;
+
+	if (listen(fd, 1) < 0)
+		error(1, errno, "Failed to listed on socket");
+
+	err += verify_sockopt(fd, CUSTOM_INHERIT1, "listen", 1);
+	err += verify_sockopt(fd, CUSTOM_INHERIT2, "listen", 1);
+	err += verify_sockopt(fd, CUSTOM_LISTENER, "listen", 1);
+
+	client_fd = accept(fd, (struct sockaddr *)&addr, &len);
+	if (client_fd < 0)
+		error(1, errno, "Failed to accept client");
+
+	err += verify_sockopt(client_fd, CUSTOM_INHERIT1, "accept", 1);
+	err += verify_sockopt(client_fd, CUSTOM_INHERIT2, "accept", 1);
+	err += verify_sockopt(client_fd, CUSTOM_LISTENER, "accept", 0);
+
+	close(client_fd);
+
+	return (void *)(long)err;
+}
+
+static int start_server(void)
+{
+	struct sockaddr_in addr = {
+		.sin_family = AF_INET,
+		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
+	};
+	char buf;
+	int err;
+	int fd;
+	int i;
+
+	fd = socket(AF_INET, SOCK_STREAM, 0);
+	if (fd < 0) {
+		log_err("Failed to create server socket");
+		return -1;
+	}
+
+	for (i = CUSTOM_INHERIT1; i <= CUSTOM_LISTENER; i++) {
+		buf = 0x01;
+		err = setsockopt(fd, SOL_CUSTOM, i, &buf, 1);
+		if (err) {
+			log_err("Failed to call setsockopt(%d)", i);
+			close(fd);
+			return -1;
+		}
+	}
+
+	if (bind(fd, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
+		log_err("Failed to bind socket");
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+static int prog_attach(struct bpf_object *obj, int cgroup_fd, const char *title)
+{
+	enum bpf_attach_type attach_type;
+	enum bpf_prog_type prog_type;
+	struct bpf_program *prog;
+	int err;
+
+	err = libbpf_prog_type_by_name(title, &prog_type, &attach_type);
+	if (err) {
+		log_err("Failed to deduct types for %s BPF program", title);
+		return -1;
+	}
+
+	prog = bpf_object__find_program_by_title(obj, title);
+	if (!prog) {
+		log_err("Failed to find %s BPF program", title);
+		return -1;
+	}
+
+	err = bpf_prog_attach(bpf_program__fd(prog), cgroup_fd,
+			      attach_type, 0);
+	if (err) {
+		log_err("Failed to attach %s BPF program", title);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int run_test(int cgroup_fd)
+{
+	struct bpf_prog_load_attr attr = {
+		.file = "./sockopt_inherit.o",
+	};
+	int server_fd = -1, client_fd;
+	struct bpf_object *obj;
+	void *server_err;
+	pthread_t tid;
+	int ignored;
+	int err;
+
+	err = bpf_prog_load_xattr(&attr, &obj, &ignored);
+	if (err) {
+		log_err("Failed to load BPF object");
+		return -1;
+	}
+
+	err = prog_attach(obj, cgroup_fd, "cgroup/getsockopt");
+	if (err)
+		goto close_bpf_object;
+
+	err = prog_attach(obj, cgroup_fd, "cgroup/setsockopt");
+	if (err)
+		goto close_bpf_object;
+
+	server_fd = start_server();
+	if (server_fd < 0) {
+		err = -1;
+		goto close_bpf_object;
+	}
+
+	pthread_create(&tid, NULL, server_thread, (void *)&server_fd);
+
+	client_fd = connect_to_server(server_fd);
+	if (client_fd < 0) {
+		err = -1;
+		goto close_bpf_object;
+	}
+
+	err += verify_sockopt(client_fd, CUSTOM_INHERIT1, "connect", 0);
+	err += verify_sockopt(client_fd, CUSTOM_INHERIT2, "connect", 0);
+	err += verify_sockopt(client_fd, CUSTOM_LISTENER, "connect", 0);
+
+	pthread_join(tid, &server_err);
+
+	err += (int)(long)server_err;
+
+	close(client_fd);
+
+close_bpf_object:
+	bpf_object__close(obj);
+	close(server_fd);
+	return err;
+}
+
+int main(int args, char **argv)
+{
+	int cgroup_fd;
+	int err = EXIT_SUCCESS;
+
+	if (setup_cgroup_environment())
+		return err;
+
+	cgroup_fd = create_and_get_cgroup(CG_PATH);
+	if (cgroup_fd < 0)
+		goto cleanup_cgroup_env;
+
+	if (join_cgroup(CG_PATH))
+		goto cleanup_cgroup;
+
+	if (run_test(cgroup_fd))
+		err = EXIT_FAILURE;
+
+	printf("test_sockopt_inherit: %s\n",
+	       err == EXIT_SUCCESS ? "PASSED" : "FAILED");
+
+cleanup_cgroup:
+	close(cgroup_fd);
+cleanup_cgroup_env:
+	cleanup_cgroup_environment();
+	return err;
+}
-- 
2.22.0.770.g0f2c4a37fd-goog


^ permalink raw reply related

* Re: linux-next: Tree for Aug 7 (net/bridge/netfilter/nf_conntrack_bridge.c)
From: Jeremy Sowden @ 2019-08-07 15:57 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Stephen Rothwell, Linux Next Mailing List,
	Linux Kernel Mailing List, netdev@vger.kernel.org, bridge
In-Reply-To: <f54391d9-6259-d08b-8b5f-c844093071d8@infradead.org>

[-- Attachment #1: Type: text/plain, Size: 846 bytes --]

On 2019-08-07, at 08:29:44 -0700, Randy Dunlap wrote:
> On 8/7/19 1:36 AM, Stephen Rothwell wrote:
> > Hi all,
> >
> > Changes since 20190806:
>
> on i386:
> when CONFIG_NF_TABLES is not set/enabled:
>
>   CC      net/bridge/netfilter/nf_conntrack_bridge.o
> In file included from
> ../net/bridge/netfilter/nf_conntrack_bridge.c:21:0:
> ../include/net/netfilter/nf_tables.h: In function
> ‘nft_gencursor_next’:
> ../include/net/netfilter/nf_tables.h:1224:14: error: ‘const struct
> net’ has no member named ‘nft’; did you mean ‘nf’?
>   return net->nft.gencursor + 1 == 1 ? 1 : 0;
>               ^~~

I've just posted a series of fixes for netfilter header compilation
failures, and I think it includes the fix for that:

  https://lore.kernel.org/netdev/20190807141705.4864-5-jeremy@azazel.net/T/#u

J.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH v2 1/1] ixgbe: sync the first fragment unconditionally
From: Alexander Duyck @ 2019-08-07 16:06 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Firo Yang, Netdev, LKML, intel-wired-lan, Jacob Wen,
	davem@davemloft.net, Alexander Duyck
In-Reply-To: <20190807160853.00001d71@gmail.com>

On Wed, Aug 7, 2019 at 7:09 AM Maciej Fijalkowski
<maciejromanfijalkowski@gmail.com> wrote:
>
> On Wed, 7 Aug 2019 08:38:43 +0000
> Firo Yang <firo.yang@suse.com> wrote:
>
> > The 08/07/2019 15:56, Jacob Wen wrote:
> > > I think the description is not correct. Consider using something like below.
> > Thank you for comments.
> >
> > >
> > > In Xen environment, due to memory fragmentation ixgbe may allocate a 'DMA'
> > > buffer with pages that are not physically contiguous.
> > Actually, I didn't look into the reason why ixgbe got a DMA buffer which
> > was mapped to Xen-swiotlb area.
>
> I think that neither of these descriptions are telling us what was truly
> broken. They lack what Alexander wrote on v1 thread of this patch.
>
> ixgbe_dma_sync_frag is called only on case when the current descriptor has EOP
> bit set, skb was already allocated and you'll be adding a current buffer as a
> frag. DMA unmapping for the first frag was intentionally skipped and we will be
> unmapping it here, in ixgbe_dma_sync_frag. As Alex said, we're using the
> DMA_ATTR_SKIP_CPU_SYNC attribute which obliges us to perform a sync manually
> and it was missing.
>
> So IMHO the commit description should include descriptions from both xen and
> ixgbe worlds, the v2 lacks info about ixgbe case.
>
> BTW Alex, what was the initial reason for holding off with unmapping the first
> buffer? Asking because IIRC the i40e and ice behave a bit different here. We
> don't look there for EOP at all when building/constructing skb and not delaying
> the unmap of non-eop buffers.
>
> Thanks,
> Maciej

The reason why we have to hold off on unmapping the first buffer is
because in the case of Receive Side Coalescing (RSC), also known as Large
Receive Offload (LRO), the header of the packet is updated for each
additional frame that is added. As such you can end up with the device
writing data, header, data, header, data, header where each data write
would update a new descriptor, but the header will only ever update the
first descriptor in the chain. As such if we unmapped it earlier it would
result in an IOMMU fault because the device would be rewriting the header
after it had been unmapped.

The devices supported by the ixgbe driver are the only ones that have
RSC/LRO support. As such this behavior is present for ixgbe, but not for
other Intel NIC drivers including igb, igbvf, ixgbevf, i40e, etc.

- Alex

^ permalink raw reply

* Re: [PATCH v2 0/3] arm/arm64: Add support for function error injection
From: Will Deacon @ 2019-08-07 16:07 UTC (permalink / raw)
  To: Leo Yan
  Cc: Russell King, Oleg Nesterov, Catalin Marinas,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Arnd Bergmann, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Naveen N. Rao,
	linux-arm-kernel, linux-kernel, linuxppc-dev, linux-arch, netdev,
	bpf, clang-built-linux, Masami Hiramatsu
In-Reply-To: <20190806100015.11256-1-leo.yan@linaro.org>

On Tue, Aug 06, 2019 at 06:00:12PM +0800, Leo Yan wrote:
> This small patch set is to add support for function error injection;
> this can be used to eanble more advanced debugging feature, e.g.
> CONFIG_BPF_KPROBE_OVERRIDE.
> 
> The patch 01/03 is to consolidate the function definition which can be
> suared cross architectures, patches 02,03/03 are used for enabling
> function error injection on arm64 and arm architecture respectively.
> 
> I tested on arm64 platform Juno-r2 and one of my laptop with x86
> architecture with below steps; I don't test for Arm architecture so
> only pass compilation.

Thanks. I've queued the first two patches up here:

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/error-injection

Will

^ permalink raw reply

* Re: [PATCH net] net: phy: rtl8211f: do a double read to get real time link status
From: Heiner Kallweit @ 2019-08-07 16:47 UTC (permalink / raw)
  To: Yonglong Liu, davem, andrew
  Cc: netdev, linux-kernel, linuxarm, salil.mehta, yisen.zhuang,
	shiju.jose
In-Reply-To: <1565183772-44268-1-git-send-email-liuyonglong@huawei.com>

On 07.08.2019 15:16, Yonglong Liu wrote:
> [   27.232781] hns3 0000:bd:00.3 eth7: net open
> [   27.237303] 8021q: adding VLAN 0 to HW filter on device eth7
> [   27.242972] IPv6: ADDRCONF(NETDEV_CHANGE): eth7: link becomes ready
> [   27.244449] hns3 0000:bd:00.3: invalid speed (-1)
> [   27.253904] hns3 0000:bd:00.3 eth7: failed to adjust link.
> [   27.259379] RTL8211F Gigabit Ethernet mii-0000:bd:00.3:07: PHY state change UP -> RUNNING
> [   27.924903] hns3 0000:bd:00.3 eth7: link up
> [   28.280479] RTL8211F Gigabit Ethernet mii-0000:bd:00.3:07: PHY state change RUNNING -> NOLINK
> [   29.208452] hns3 0000:bd:00.3 eth7: link down
> [   32.376745] RTL8211F Gigabit Ethernet mii-0000:bd:00.3:07: PHY state change NOLINK -> RUNNING
> [   33.208448] hns3 0000:bd:00.3 eth7: link up
> [   35.253821] hns3 0000:bd:00.3 eth7: net stop
> [   35.258270] hns3 0000:bd:00.3 eth7: link down
> 
> When using rtl8211f in polling mode, may get a invalid speed,
> because of reading a fake link up and autoneg complete status
> immediately after starting autoneg:
> 
>         ifconfig-1176  [007] ....    27.232763: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x00 val:0x1040
>   kworker/u257:1-670   [015] ....    27.232805: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x04 val:0x01e1
>   kworker/u257:1-670   [015] ....    27.232815: mdio_access: mii-0000:bd:00.3 write phy:0x07 reg:0x04 val:0x05e1
>   kworker/u257:1-670   [015] ....    27.232869: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x01 val:0x79ad
>   kworker/u257:1-670   [015] ....    27.232904: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x09 val:0x0200
>   kworker/u257:1-670   [015] ....    27.232940: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x00 val:0x1040
>   kworker/u257:1-670   [015] ....    27.232949: mdio_access: mii-0000:bd:00.3 write phy:0x07 reg:0x00 val:0x1240
>   kworker/u257:1-670   [015] ....    27.233003: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x01 val:0x79ad
>   kworker/u257:1-670   [015] ....    27.233039: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x0a val:0x3002
>   kworker/u257:1-670   [015] ....    27.233074: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x09 val:0x0200
>   kworker/u257:1-670   [015] ....    27.233110: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x05 val:0x0000
>   kworker/u257:1-670   [000] ....    28.280475: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x01 val:0x7989
>   kworker/u257:1-670   [000] ....    29.304471: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x01 val:0x7989
> 
> According to the datasheet of rtl8211f, to get the real time
> link status, need to read MII_BMSR twice.
> 
> This patch add a read_status hook for rtl8211f, and do a fake
> phy_read before genphy_read_status(), so that can get real link
> status in genphy_read_status().
> 
> Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
> ---
>  drivers/net/phy/realtek.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
Is this an accidental resubmit? Because we discussed this in
https://marc.info/?t=156413509900003&r=1&w=2 and a fix has
been applied already.

Heiner

^ permalink raw reply

* Re: [v3,1/4] tools: bpftool: add net attach command to attach XDP on interface
From: Y Song @ 2019-08-07 16:56 UTC (permalink / raw)
  To: Daniel T. Lee; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <20190807022509.4214-2-danieltimlee@gmail.com>

On Tue, Aug 6, 2019 at 7:25 PM Daniel T. Lee <danieltimlee@gmail.com> wrote:
>
> By this commit, using `bpftool net attach`, user can attach XDP prog on
> interface. New type of enum 'net_attach_type' has been made, as stated at
> cover-letter, the meaning of 'attach' is, prog will be attached on interface.
>
> With 'overwrite' option at argument, attached XDP program could be replaced.
> Added new helper 'net_parse_dev' to parse the network device at argument.
>
> BPF prog will be attached through libbpf 'bpf_set_link_xdp_fd'.
>
> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>

Acked-by: Yonghong Song <yhs@fb.com>

^ permalink raw reply

* Re: [PATCH net v2] net/tls: prevent skb_orphan() from leaking TLS plain text with offload
From: Willem de Bruijn @ 2019-08-07 16:59 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Miller, Network Development, davejwatson, borisp, aviadye,
	John Fastabend, Daniel Borkmann, Eric Dumazet, Alexei Starovoitov,
	oss-drivers
In-Reply-To: <20190807060612.19397-1-jakub.kicinski@netronome.com>

On Wed, Aug 7, 2019 at 2:06 AM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> sk_validate_xmit_skb() and drivers depend on the sk member of
> struct sk_buff to identify segments requiring encryption.
> Any operation which removes or does not preserve the original TLS
> socket such as skb_orphan() or skb_clone() will cause clear text
> leaks.
>
> Make the TCP socket underlying an offloaded TLS connection
> mark all skbs as decrypted, if TLS TX is in offload mode.
> Then in sk_validate_xmit_skb() catch skbs which have no socket
> (or a socket with no validation) and decrypted flag set.
>
> Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and
> sk->sk_validate_xmit_skb are slightly interchangeable right now,
> they all imply TLS offload. The new checks are guarded by
> CONFIG_TLS_DEVICE because that's the option guarding the
> sk_buff->decrypted member.
>
> Second, smaller issue with orphaning is that it breaks
> the guarantee that packets will be delivered to device
> queues in-order. All TLS offload drivers depend on that
> scheduling property. This means skb_orphan_partial()'s
> trick of preserving partial socket references will cause
> issues in the drivers. We need a full orphan, and as a
> result netem delay/throttling will cause all TLS offload
> skbs to be dropped.
>
> Reusing the sk_buff->decrypted flag also protects from
> leaking clear text when incoming, decrypted skb is redirected
> (e.g. by TC).
>
> See commit 0608c69c9a80 ("bpf: sk_msg, sock{map|hash} redirect
> through ULP") for justification why the internal flag is safe.
>
> v2:
>  - remove superfluous decrypted mark copy (Willem);
>  - remove the stale doc entry (Boris);
>  - rely entirely on EOR marking to prevent coalescing (Boris);
>  - use an internal sendpages flag instead of marking the socket
>    (Boris).
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
>  Documentation/networking/tls-offload.rst | 18 ------------------
>  include/linux/skbuff.h                   |  8 ++++++++
>  include/linux/socket.h                   |  3 +++
>  include/net/sock.h                       | 10 +++++++++-
>  net/core/sock.c                          | 20 +++++++++++++++-----
>  net/ipv4/tcp.c                           |  3 +++
>  net/ipv4/tcp_output.c                    |  3 +++
>  net/tls/tls_device.c                     |  9 +++++++--
>  8 files changed, 48 insertions(+), 26 deletions(-)

> diff --git a/net/core/sock.c b/net/core/sock.c
> index d57b0cc995a0..0f9619b0892f 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1992,6 +1992,20 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
>  }
>  EXPORT_SYMBOL(skb_set_owner_w);
>
> +static bool can_skb_orphan_partial(const struct sk_buff *skb)
> +{
> +#ifdef CONFIG_TLS_DEVICE
> +       /* Drivers depend on in-order delivery for crypto offload,
> +        * partial orphan breaks out-of-order-OK logic.
> +        */
> +       if (skb->decrypted)
> +               return false;
> +#endif
> +       return (IS_ENABLED(CONFIG_INET) &&
> +               skb->destructor == tcp_wfree) ||

Please add parentheses around IS_ENABLED(CONFIG_INET) &&
skb->destructor == tcp_wfree

I was also surprised that this works when tcp_wfree is not defined if
!CONFIG_INET. But apparently it does (at -O2?) :)

> @@ -984,6 +984,9 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>                         if (!skb)
>                                 goto wait_for_memory;
>
> +#ifdef CONFIG_TLS_DEVICE
> +                       skb->decrypted = !!(flags & MSG_SENDPAGE_DECRYPTED);
> +#endif

Nothing is stopping userspace from passing this new flag. In send
(tcp_sendmsg_locked) it is ignored. But can it reach do_tcp_sendpages
through tcp_bpf_sendmsg?

>                         skb_entail(sk, skb);
>                         copy = size_goal;
>                 }
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 6e4afc48d7bb..979520e46e33 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1320,6 +1320,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
>         buff = sk_stream_alloc_skb(sk, nsize, gfp, true);
>         if (!buff)
>                 return -ENOMEM; /* We'll just try again later. */
> +       skb_copy_decrypted(buff, skb);

This code has to copy timestamps, tx_flags, zerocopy state and now
this in three locations. Eventually we'll want a single helper for all
of them..

^ permalink raw reply

* Re: [v3,2/4] tools: bpftool: add net detach command to detach XDP on interface
From: Y Song @ 2019-08-07 17:02 UTC (permalink / raw)
  To: Daniel T. Lee; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <20190807022509.4214-3-danieltimlee@gmail.com>

On Tue, Aug 6, 2019 at 7:25 PM Daniel T. Lee <danieltimlee@gmail.com> wrote:
>
> By this commit, using `bpftool net detach`, the attached XDP prog can
> be detached. Detaching the BPF prog will be done through libbpf
> 'bpf_set_link_xdp_fd' with the progfd set to -1.
>
> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> ---
>  tools/bpf/bpftool/net.c | 42 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
> index c05a3fac5cac..7be96acb08e0 100644
> --- a/tools/bpf/bpftool/net.c
> +++ b/tools/bpf/bpftool/net.c
> @@ -343,6 +343,43 @@ static int do_attach(int argc, char **argv)
>         return 0;
>  }
>
> +static int do_detach(int argc, char **argv)
> +{
> +       enum net_attach_type attach_type;
> +       int progfd, ifindex, err = 0;
> +
> +       /* parse detach args */
> +       if (!REQ_ARGS(3))
> +               return -EINVAL;
> +
> +       attach_type = parse_attach_type(*argv);
> +       if (attach_type == max_net_attach_type) {
> +               p_err("invalid net attach/detach type");
> +               return -EINVAL;
> +       }
> +
> +       NEXT_ARG();
> +       ifindex = net_parse_dev(&argc, &argv);
> +       if (ifindex < 1)
> +               return -EINVAL;
> +
> +       /* detach xdp prog */
> +       progfd = -1;
> +       if (is_prefix("xdp", attach_type_strings[attach_type]))
> +               err = do_attach_detach_xdp(progfd, attach_type, ifindex, NULL);

I found an issue here. This is probably related to do_attach_detach_xdp.

-bash-4.4$ sudo ./bpftool net attach x pinned /sys/fs/bpf/xdp_example dev v1
-bash-4.4$ sudo ./bpftool net
xdp:
v1(4) driver id 1172

tc:
eth0(2) clsact/ingress fbflow_icmp id 29 act []
eth0(2) clsact/egress cls_fg_dscp_section id 27 act []
eth0(2) clsact/egress fbflow_egress id 28
eth0(2) clsact/egress fbflow_sslwall_egress id 35

flow_dissector:

-bash-4.4$ sudo ./bpftool net detach x dev v2
-bash-4.4$ sudo ./bpftool net
xdp:
v1(4) driver id 1172

tc:
eth0(2) clsact/ingress fbflow_icmp id 29 act []
eth0(2) clsact/egress cls_fg_dscp_section id 27 act []
eth0(2) clsact/egress fbflow_egress id 28
eth0(2) clsact/egress fbflow_sslwall_egress id 35

flow_dissector:

-bash-4.4$

Basically detaching may fail due to wrong dev name or wrong type, etc.
But the tool did not return an error. Is this expected?
This may be related to this funciton "bpf_set_link_xdp_fd()".
So this patch itself should be okay.

> +
> +       if (err < 0) {
> +               p_err("interface %s detach failed",
> +                     attach_type_strings[attach_type]);
> +               return err;
> +       }
> +
> +       if (json_output)
> +               jsonw_null(json_wtr);
> +
> +       return 0;
> +}
> +
>  static int do_show(int argc, char **argv)
>  {
>         struct bpf_attach_info attach_info = {};
> @@ -419,6 +456,7 @@ static int do_help(int argc, char **argv)
>         fprintf(stderr,
>                 "Usage: %s %s { show | list } [dev <devname>]\n"
>                 "       %s %s attach ATTACH_TYPE PROG dev <devname> [ overwrite ]\n"
> +               "       %s %s detach ATTACH_TYPE dev <devname>\n"
>                 "       %s %s help\n"
>                 "\n"
>                 "       " HELP_SPEC_PROGRAM "\n"
> @@ -429,7 +467,8 @@ static int do_help(int argc, char **argv)
>                 "      to dump program attachments. For program types\n"
>                 "      sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
>                 "      consult iproute2.\n",
> -               bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
> +               bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
> +               bin_name, argv[-2]);
>
>         return 0;
>  }
> @@ -438,6 +477,7 @@ static const struct cmd cmds[] = {
>         { "show",       do_show },
>         { "list",       do_show },
>         { "attach",     do_attach },
> +       { "detach",     do_detach },
>         { "help",       do_help },
>         { 0 }
>  };
> --
> 2.20.1
>

^ permalink raw reply

* [PATCH v5 0/4] net: phy: Add support for DT configuration of PHY LEDs and use it for RTL8211E
From: Matthias Kaehlcke @ 2019-08-07 17:04 UTC (permalink / raw)
  To: David S . Miller, Rob Herring, Mark Rutland, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit
  Cc: netdev, devicetree, linux-kernel, Douglas Anderson,
	Matthias Kaehlcke

This series adds a generic binding to configure PHY LEDs through
the device tree, and phylib support for reading the information
from the DT. PHY drivers that support the generic binding should
implement the new hook .config_led.

Enable DT configuration of the RTL8211E LEDs by implementing the
.config_led hook of the driver. Certain registers of the RTL8211E
can only be accessed through a vendor specific extended page
mechanism. Extended pages need to be accessed for the LED
configuration. This series adds helpers to facilitate accessing
extended pages.

(subject updated, was "net: phy: realtek: Enable configuration
of RTL8211E LEDs")

Matthias Kaehlcke (4):
  dt-bindings: net: phy: Add subnode for LED configuration
  net: phy: Add support for generic LED configuration through the DT
  net: phy: realtek: Add helpers for accessing RTL8211x extension pages
  net: phy: realtek: Add LED configuration support for RTL8211E

 .../devicetree/bindings/net/ethernet-phy.yaml |  59 +++++++
 drivers/net/phy/phy_device.c                  |  72 +++++++++
 drivers/net/phy/realtek.c                     | 148 ++++++++++++++++--
 include/linux/phy.h                           |  22 +++
 4 files changed, 286 insertions(+), 15 deletions(-)

-- 
2.22.0.770.g0f2c4a37fd-goog


^ permalink raw reply

* [PATCH v5 1/4] dt-bindings: net: phy: Add subnode for LED configuration
From: Matthias Kaehlcke @ 2019-08-07 17:04 UTC (permalink / raw)
  To: David S . Miller, Rob Herring, Mark Rutland, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit
  Cc: netdev, devicetree, linux-kernel, Douglas Anderson,
	Matthias Kaehlcke
In-Reply-To: <20190807170449.205378-1-mka@chromium.org>

The LED behavior of some Ethernet PHYs is configurable. Add an
optional 'leds' subnode with a child node for each LED to be
configured. The binding aims to be compatible with the common
LED binding (see devicetree/bindings/leds/common.txt).

A LED can be configured to be:

- 'on' when a link is active, some PHYs allow configuration for
  certain link speeds
  speeds
- 'off'
- blink on RX/TX activity, some PHYs allow configuration for
  certain link speeds

For the configuration to be effective it needs to be supported by
the hardware and the corresponding PHY driver.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---
Changes in v5:
- renamed triggers from 'phy_link_<speed>_active' to 'phy-link-<speed>'
- added entries for 'phy-link-<speed>-activity'
- added 'phy-link' and 'phy-link-activity' for triggers with any link
  speed
- added entry for trigger 'none'

Changes in v4:
- patch added to the series
---
 .../devicetree/bindings/net/ethernet-phy.yaml | 59 +++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/ethernet-phy.yaml b/Documentation/devicetree/bindings/net/ethernet-phy.yaml
index f70f18ff821f..98ba320f828b 100644
--- a/Documentation/devicetree/bindings/net/ethernet-phy.yaml
+++ b/Documentation/devicetree/bindings/net/ethernet-phy.yaml
@@ -153,6 +153,50 @@ properties:
       Delay after the reset was deasserted in microseconds. If
       this property is missing the delay will be skipped.
 
+patternProperties:
+  "^leds$":
+    type: object
+    description:
+      Subnode with configuration of the PHY LEDs.
+
+    patternProperties:
+      "^led@[0-9]+$":
+        type: object
+        description:
+          Subnode with the configuration of a single PHY LED.
+
+    properties:
+      reg:
+        description:
+          The ID number of the LED, typically corresponds to a hardware ID.
+        $ref: "/schemas/types.yaml#/definitions/uint32"
+
+      linux,default-trigger:
+        description:
+          This parameter, if present, is a string specifying the trigger
+          assigned to the LED. Supported triggers are:
+            "none" - LED will be solid off
+            "phy-link" - LED will be solid on when a link is active
+            "phy-link-10m" - LED will be solid on when a 10Mb/s link is active
+            "phy-link-100m" - LED will be solid on when a 100Mb/s link is active
+            "phy-link-1g" - LED will be solid on when a 1Gb/s link is active
+            "phy-link-10g" - LED will be solid on when a 10Gb/s link is active
+            "phy-link-activity" - LED will be on when link is active and blink
+                                  off with activity.
+            "phy-link-10m-activity" - LED will be on when 10Mb/s link is active
+                                      and blink off with activity.
+            "phy-link-100m-activity" - LED will be on when 100Mb/s link is
+                                       active and blink off with activity.
+            "phy-link-1g-activity" - LED will be on when 1Gb/s link is active
+                                     and blink off with activity.
+            "phy-link-10g-activity" - LED will be on when 10Gb/s link is active
+                                      and blink off with activity.
+
+        $ref: "/schemas/types.yaml#/definitions/string"
+
+    required:
+      - reg
+
 required:
   - reg
 
@@ -173,5 +217,20 @@ examples:
             reset-gpios = <&gpio1 4 1>;
             reset-assert-us = <1000>;
             reset-deassert-us = <2000>;
+
+            leds {
+                #address-cells = <1>;
+                #size-cells = <0>;
+
+                led@0 {
+                    reg = <0>;
+                    linux,default-trigger = "phy-link-1g";
+                };
+
+                led@1 {
+                    reg = <1>;
+                    linux,default-trigger = "phy-link-100m-activity";
+                };
+            };
         };
     };
-- 
2.22.0.770.g0f2c4a37fd-goog


^ permalink raw reply related

* [PATCH v5 4/4] net: phy: realtek: Add LED configuration support for RTL8211E
From: Matthias Kaehlcke @ 2019-08-07 17:04 UTC (permalink / raw)
  To: David S . Miller, Rob Herring, Mark Rutland, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit
  Cc: netdev, devicetree, linux-kernel, Douglas Anderson,
	Matthias Kaehlcke
In-Reply-To: <20190807170449.205378-1-mka@chromium.org>

Add a .config_led hook which is called by the PHY core when
configuration data for a PHY LED is available. Each LED can be
configured to be solid 'off, solid 'on' for certain (or all)
link speeds or to blink on RX/TX activity.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---
Changes in v5:
- use 'config_leds' driver callback instead of requesting the DT
  configuration
- added support for trigger 'none'
- always disable EEE LED mode when a LED is configured. We have no
  device data struct to keep track of its state, the number of LEDs
  is limited, so the overhead of disabling it multiple times (once for
  each LED that is configured) during initialization is negligible
- print warning when disabling EEE LED mode fails
- updated commit message (previous subject was 'net: phy: realtek:
  configure RTL8211E LEDs')

Changes in v4:
- use the generic PHY LED binding
- keep default/current configuration if none is specified
- added rtl8211e_disable_eee_led_mode()
  - was previously in separate patch, however since we always want to
    disable EEE LED mode when a LED configuration is specified it makes
    sense to just add the function here.
- don't call phy_restore_page() in rtl8211e_config_leds() if
  selection of the extended page failed.
- use phydev_warn() instead of phydev_err() if LED configuration
  fails since we don't bail out
- use hex number to specify page for consistency
- add hex number to comment about ext page 44 to facilitate searching

Changes in v3:
- sanity check led-modes values
- set LACR bits in a more readable way
- use phydev_err() instead of dev_err()
- log an error if LED configuration fails

Changes in v2:
- patch added to the series
---
 drivers/net/phy/realtek.c | 101 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 100 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index a5b3708dc4d8..5064ad732443 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -9,8 +9,9 @@
  * Copyright (c) 2004 Freescale Semiconductor, Inc.
  */
 #include <linux/bitops.h>
-#include <linux/phy.h>
+#include <linux/bits.h>
 #include <linux/module.h>
+#include <linux/phy.h>
 
 #define RTL821x_PHYSR				0x11
 #define RTL821x_PHYSR_DUPLEX			BIT(13)
@@ -26,6 +27,18 @@
 #define RTL821x_EXT_PAGE_SELECT			0x1e
 #define RTL821x_PAGE_SELECT			0x1f
 
+/* RTL8211E page 5 */
+#define RTL8211E_EEE_LED_MODE1			0x05
+#define RTL8211E_EEE_LED_MODE2			0x06
+
+/* RTL8211E extension page 44 (0x2c) */
+#define RTL8211E_LACR				0x1a
+#define RLT8211E_LACR_LEDACTCTRL_SHIFT		4
+#define RTL8211E_LCR				0x1c
+
+#define LACR_MASK(led)				BIT(4 + led)
+#define LCR_MASK(led)				GENMASK((led * 4) + 2, led * 4)
+
 #define RTL8211F_INSR				0x1d
 
 #define RTL8211F_TX_DELAY			BIT(8)
@@ -83,6 +96,91 @@ static int rtl8211x_modify_ext_paged(struct phy_device *phydev, int page,
 	return phy_restore_page(phydev, oldpage, ret);
 }
 
+static void rtl8211e_disable_eee_led_mode(struct phy_device *phydev)
+{
+	int oldpage;
+	int err = 0;
+
+	oldpage = phy_select_page(phydev, 5);
+	if (oldpage < 0)
+		goto out;
+
+	/* write magic values to disable EEE LED mode */
+	err = __phy_write(phydev, RTL8211E_EEE_LED_MODE1, 0x8b82);
+	if (err)
+		goto out;
+
+	err = __phy_write(phydev, RTL8211E_EEE_LED_MODE2, 0x052b);
+
+out:
+	if (err)
+		phydev_warn(phydev, "failed to disable EEE LED mode: %d\n", err);
+
+	phy_restore_page(phydev, oldpage, err);
+}
+
+static int rtl8211e_config_led(struct phy_device *phydev, int led,
+			       struct phy_led_config *cfg)
+{
+	u16 lacr_bits = 0, lcr_bits = 0;
+	int oldpage, ret;
+
+	switch (cfg->trigger.t) {
+	case PHY_LED_TRIGGER_LINK:
+		lcr_bits = 7 << (led * 4);
+		break;
+
+	case PHY_LED_TRIGGER_LINK_10M:
+		lcr_bits = 1 << (led * 4);
+		break;
+
+	case PHY_LED_TRIGGER_LINK_100M:
+		lcr_bits = 2 << (led * 4);
+		break;
+
+	case PHY_LED_TRIGGER_LINK_1G:
+		lcr_bits |= 4 << (led * 4);
+		break;
+
+	case PHY_LED_TRIGGER_NONE:
+		break;
+
+	default:
+		phydev_warn(phydev,
+			    "unknown trigger for LED%d: %d\n",
+			    led, cfg->trigger.t);
+		return -EINVAL;
+	}
+
+	if (cfg->trigger.activity)
+		lacr_bits = BIT(RLT8211E_LACR_LEDACTCTRL_SHIFT + led);
+
+	rtl8211e_disable_eee_led_mode(phydev);
+
+	oldpage = rtl8211x_select_ext_page(phydev, 0x2c);
+	if (oldpage < 0) {
+		phydev_err(phydev, "failed to select extended page: %d\n", oldpage);
+		return oldpage;
+	}
+
+	ret = __phy_modify(phydev, RTL8211E_LACR,
+			   LACR_MASK(led), lacr_bits);
+	if (ret) {
+		phydev_err(phydev, "failed to write LACR reg: %d\n",
+			   ret);
+		goto err;
+	}
+
+	ret = __phy_modify(phydev, RTL8211E_LCR,
+			   LCR_MASK(led), lcr_bits);
+	if (ret)
+		phydev_err(phydev, "failed to write LCR reg: %d\n",
+			   ret);
+
+err:
+	return phy_restore_page(phydev, oldpage, ret);
+}
+
 static int rtl8201_ack_interrupt(struct phy_device *phydev)
 {
 	int err;
@@ -330,6 +428,7 @@ static struct phy_driver realtek_drvs[] = {
 		.config_init	= &rtl8211e_config_init,
 		.ack_interrupt	= &rtl821x_ack_interrupt,
 		.config_intr	= &rtl8211e_config_intr,
+		.config_led	= &rtl8211e_config_led,
 		.suspend	= genphy_suspend,
 		.resume		= genphy_resume,
 		.read_page	= rtl821x_read_page,
-- 
2.22.0.770.g0f2c4a37fd-goog


^ permalink raw reply related

* [PATCH v5 3/4] net: phy: realtek: Add helpers for accessing RTL8211x extension pages
From: Matthias Kaehlcke @ 2019-08-07 17:04 UTC (permalink / raw)
  To: David S . Miller, Rob Herring, Mark Rutland, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit
  Cc: netdev, devicetree, linux-kernel, Douglas Anderson,
	Matthias Kaehlcke
In-Reply-To: <20190807170449.205378-1-mka@chromium.org>

Some RTL8211x PHYs have extension pages, which can be accessed
after selecting a page through a custom method. Add a function to
modify bits in a register of an extension page and a helper for
selecting an ext page. Use rtl8211x_modify_ext_paged() in
rtl8211e_config_init() instead of doing things 'manually'.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---
Changes in v5:
- renamed 'rtl8211e_<action>_ext_page' to 'rtl8211x_<action>_ext_page'
- updated commit message

Changes in v4:
- don't add constant RTL8211E_EXT_PAGE, it's only used once,
  use a literal instead
- pass 'oldpage' to phy_restore_page() in rtl8211e_select_ext_page(),
  not 'page'
- return 'oldpage' in rtl8211e_select_ext_page()
- use __phy_modify() in rtl8211e_modify_ext_paged() instead of
  reimplementing __phy_modify_changed()
- in rtl8211e_modify_ext_paged() return directly when
  rtl8211e_select_ext_page() fails

Changes in v3:
- use the new function in rtl8211e_config_init() instead of
  doing things 'manually'
- use existing RTL8211E_EXT_PAGE instead of adding a new define
- updated commit message

Changes in v2:
- use phy_select_page() and phy_restore_page(), get rid of
  rtl8211e_restore_page()
- s/rtl821e_select_ext_page/rtl8211e_select_ext_page/
- updated commit message
---
 drivers/net/phy/realtek.c | 47 +++++++++++++++++++++++++++------------
 1 file changed, 33 insertions(+), 14 deletions(-)

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index a669945eb829..a5b3708dc4d8 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -53,6 +53,36 @@ static int rtl821x_write_page(struct phy_device *phydev, int page)
 	return __phy_write(phydev, RTL821x_PAGE_SELECT, page);
 }
 
+static int rtl8211x_select_ext_page(struct phy_device *phydev, int page)
+{
+	int ret, oldpage;
+
+	oldpage = phy_select_page(phydev, 7);
+	if (oldpage < 0)
+		return oldpage;
+
+	ret = __phy_write(phydev, RTL821x_EXT_PAGE_SELECT, page);
+	if (ret)
+		return phy_restore_page(phydev, oldpage, ret);
+
+	return oldpage;
+}
+
+static int rtl8211x_modify_ext_paged(struct phy_device *phydev, int page,
+				     u32 regnum, u16 mask, u16 set)
+{
+	int ret = 0;
+	int oldpage;
+
+	oldpage = rtl8211x_select_ext_page(phydev, page);
+	if (oldpage < 0)
+		return oldpage;
+
+	ret = __phy_modify(phydev, regnum, mask, set);
+
+	return phy_restore_page(phydev, oldpage, ret);
+}
+
 static int rtl8201_ack_interrupt(struct phy_device *phydev)
 {
 	int err;
@@ -184,7 +214,6 @@ static int rtl8211f_config_init(struct phy_device *phydev)
 
 static int rtl8211e_config_init(struct phy_device *phydev)
 {
-	int ret = 0, oldpage;
 	u16 val;
 
 	/* enable TX/RX delay for rgmii-* modes, and disable them for rgmii. */
@@ -213,19 +242,9 @@ static int rtl8211e_config_init(struct phy_device *phydev)
 	 * 2 = RX Delay, 1 = TX Delay, 0 = SELRGV (see original PHY datasheet
 	 * for details).
 	 */
-	oldpage = phy_select_page(phydev, 0x7);
-	if (oldpage < 0)
-		goto err_restore_page;
-
-	ret = __phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4);
-	if (ret)
-		goto err_restore_page;
-
-	ret = __phy_modify(phydev, 0x1c, RTL8211E_TX_DELAY | RTL8211E_RX_DELAY,
-			   val);
-
-err_restore_page:
-	return phy_restore_page(phydev, oldpage, ret);
+	return rtl8211x_modify_ext_paged(phydev, 0xa4, 0x1c,
+					 RTL8211E_TX_DELAY | RTL8211E_RX_DELAY,
+					 val);
 }
 
 static int rtl8211b_suspend(struct phy_device *phydev)
-- 
2.22.0.770.g0f2c4a37fd-goog


^ permalink raw reply related

* [PATCH v5 2/4] net: phy: Add support for generic LED configuration through the DT
From: Matthias Kaehlcke @ 2019-08-07 17:04 UTC (permalink / raw)
  To: David S . Miller, Rob Herring, Mark Rutland, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit
  Cc: netdev, devicetree, linux-kernel, Douglas Anderson,
	Matthias Kaehlcke
In-Reply-To: <20190807170449.205378-1-mka@chromium.org>

For PHYs with a device tree node look for LED trigger configuration
using the generic binding, if it exists try to apply it via the new
driver hook .config_led.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---
Changes in v5:
- add callback to configure a LED to the PHY driver, instead of
  having the driver retrieve the DT data
- use new trigger names
- added support for trigger 'none'
- release DT nodes after use
- renamed 'PHY_LED_LINK_*' to 'PHY_LED_TRIGGER_LINK_*'
- added anonymous struct to struct phy_led_config to track
  'activity' in a separate flag. this could be changed to 'flags' if
  needed/desired.
- updated commit message (previous subject was 'net: phy: Add
  function to retrieve LED configuration from the DT')

Changes in v4:
- patch added to the series
---
 drivers/net/phy/phy_device.c | 72 ++++++++++++++++++++++++++++++++++++
 include/linux/phy.h          | 22 +++++++++++
 2 files changed, 94 insertions(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 6b5cb87f3866..6f85fdf72af0 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -29,6 +29,7 @@
 #include <linux/phy_led_triggers.h>
 #include <linux/mdio.h>
 #include <linux/io.h>
+#include <linux/of.h>
 #include <linux/uaccess.h>
 
 MODULE_DESCRIPTION("PHY library");
@@ -1064,6 +1065,75 @@ static int phy_poll_reset(struct phy_device *phydev)
 	return 0;
 }
 
+static void of_phy_config_leds(struct phy_device *phydev)
+{
+	struct device_node *np, *child;
+	struct phy_led_config cfg;
+	const char *trigger;
+	int ret;
+
+	if (!IS_ENABLED(CONFIG_OF_MDIO) || !phydev->drv->config_led)
+		return;
+
+	np = of_find_node_by_name(phydev->mdio.dev.of_node, "leds");
+	if (!np)
+		return;
+
+	for_each_child_of_node(np, child) {
+		u32 led;
+
+		if (of_property_read_u32(child, "reg", &led))
+			goto skip_config;
+
+		ret = of_property_read_string(child, "linux,default-trigger",
+					      &trigger);
+		if (ret)
+			trigger = "none";
+
+		memset(&cfg, 0, sizeof(cfg));
+
+		if (!strcmp(trigger, "none")) {
+			cfg.trigger.t = PHY_LED_TRIGGER_NONE;
+		} else if (!strcmp(trigger, "phy-link")) {
+			cfg.trigger.t = PHY_LED_TRIGGER_LINK;
+		} else if (!strcmp(trigger, "phy-link-10m")) {
+			cfg.trigger.t = PHY_LED_TRIGGER_LINK_10M;
+		} else if (!strcmp(trigger, "phy-link-100m")) {
+			cfg.trigger.t = PHY_LED_TRIGGER_LINK_100M;
+		} else if (!strcmp(trigger, "phy-link-1g")) {
+			cfg.trigger.t = PHY_LED_TRIGGER_LINK_1G;
+		} else if (!strcmp(trigger, "phy-link-10g")) {
+			cfg.trigger.t = PHY_LED_TRIGGER_LINK_10G;
+		} else if (!strcmp(trigger, "phy-link-activity")) {
+			cfg.trigger.t = PHY_LED_TRIGGER_LINK;
+			cfg.trigger.activity = true;
+		} else if (!strcmp(trigger, "phy-link-10m-activity")) {
+			cfg.trigger.t = PHY_LED_TRIGGER_LINK_10M;
+			cfg.trigger.activity = true;
+		} else if (!strcmp(trigger, "phy-link-100m-activity")) {
+			cfg.trigger.t = PHY_LED_TRIGGER_LINK_100M;
+			cfg.trigger.activity = true;
+		} else if (!strcmp(trigger, "phy-link-1g-activity")) {
+			cfg.trigger.t = PHY_LED_TRIGGER_LINK_1G;
+			cfg.trigger.activity = true;
+		} else if (!strcmp(trigger, "phy-link-10g-activity")) {
+			cfg.trigger.t = PHY_LED_TRIGGER_LINK_10G;
+			cfg.trigger.activity = true;
+		} else {
+			phydev_warn(phydev, "trigger '%s' for LED%d is invalid\n",
+				    trigger, led);
+			goto skip_config;
+		}
+
+		phydev->drv->config_led(phydev, led, &cfg);
+
+	skip_config:
+		of_node_put(child);
+	}
+
+	of_node_put(np);
+}
+
 int phy_init_hw(struct phy_device *phydev)
 {
 	int ret = 0;
@@ -1087,6 +1157,8 @@ int phy_init_hw(struct phy_device *phydev)
 	if (phydev->drv->config_init)
 		ret = phydev->drv->config_init(phydev);
 
+	of_phy_config_leds(phydev);
+
 	return ret;
 }
 EXPORT_SYMBOL(phy_init_hw);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 462b90b73f93..3a07390fc5e9 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -325,6 +325,24 @@ struct phy_c45_device_ids {
 	u32 device_ids[8];
 };
 
+/* Triggers for PHY LEDs */
+enum phy_led_trigger {
+	PHY_LED_TRIGGER_NONE,
+	PHY_LED_TRIGGER_LINK,
+	PHY_LED_TRIGGER_LINK_10M,
+	PHY_LED_TRIGGER_LINK_100M,
+	PHY_LED_TRIGGER_LINK_1G,
+	PHY_LED_TRIGGER_LINK_10G,
+};
+
+/* Configuration of a single PHY LED */
+struct phy_led_config {
+	struct {
+		enum phy_led_trigger t;
+		bool activity;
+	} trigger;
+};
+
 /* phy_device: An instance of a PHY
  *
  * drv: Pointer to the driver for this PHY instance
@@ -626,6 +644,10 @@ struct phy_driver {
 			    struct ethtool_tunable *tuna,
 			    const void *data);
 	int (*set_loopback)(struct phy_device *dev, bool enable);
+
+	/* Configure a PHY LED */
+	int (*config_led)(struct phy_device *dev, int led,
+			  struct phy_led_config *cfg);
 };
 #define to_phy_driver(d) container_of(to_mdio_common_driver(d),		\
 				      struct phy_driver, mdiodrv)
-- 
2.22.0.770.g0f2c4a37fd-goog


^ permalink raw reply related

* Re:[PATCH net-next] net/ncsi: allow to customize BMC MAC Address offset
From: Vijay Khemka @ 2019-08-07 17:36 UTC (permalink / raw)
  To: Tao Ren, Samuel Mendoza-Jonas, David S . Miller,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	openbmc@lists.ozlabs.org, William Kennington, Joel Stanley

Lgtm except one small comment below.

On 8/6/19, 5:22 PM, "openbmc on behalf of Tao Ren" <openbmc-bounces+vijaykhemka=fb.com@lists.ozlabs.org on behalf of taoren@fb.com> wrote:

    Currently BMC's MAC address is calculated by adding 1 to NCSI NIC's base
    MAC address when CONFIG_NCSI_OEM_CMD_GET_MAC option is enabled. The logic
    doesn't work for platforms with different BMC MAC offset: for example,
    Facebook Yamp BMC's MAC address is calculated by adding 2 to NIC's base
    MAC address ("BaseMAC + 1" is reserved for Host use).
    
    This patch adds NET_NCSI_MC_MAC_OFFSET config option to customize offset
    between NIC's Base MAC address and BMC's MAC address. Its default value is
    set to 1 to avoid breaking existing users.
    
    Signed-off-by: Tao Ren <taoren@fb.com>
    ---
     net/ncsi/Kconfig    |  8 ++++++++
     net/ncsi/ncsi-rsp.c | 15 +++++++++++++--
     2 files changed, 21 insertions(+), 2 deletions(-)
    
    diff --git a/net/ncsi/Kconfig b/net/ncsi/Kconfig
    index 2f1e5756c03a..be8efe1ed99e 100644
    --- a/net/ncsi/Kconfig
    +++ b/net/ncsi/Kconfig
    @@ -17,3 +17,11 @@ config NCSI_OEM_CMD_GET_MAC
     	---help---
     	  This allows to get MAC address from NCSI firmware and set them back to
     		controller.
    +config NET_NCSI_MC_MAC_OFFSET
    +	int
    +	prompt "Offset of Management Controller's MAC Address"
    +	depends on NCSI_OEM_CMD_GET_MAC
    +	default 1
    +	help
    +	  This defines the offset between Network Controller's (base) MAC
    +	  address and Management Controller's MAC address.
    diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
    index 7581bf919885..24a791f9ebf5 100644
    --- a/net/ncsi/ncsi-rsp.c
    +++ b/net/ncsi/ncsi-rsp.c
    @@ -656,6 +656,11 @@ static int ncsi_rsp_handler_oem_bcm_gma(struct ncsi_request *nr)
     	struct ncsi_rsp_oem_pkt *rsp;
     	struct sockaddr saddr;
     	int ret = 0;
    +#ifdef CONFIG_NET_NCSI_MC_MAC_OFFSET
    +	int mac_offset = CONFIG_NET_NCSI_MC_MAC_OFFSET;
    +#else
    +	int mac_offset = 1;
    +#endif
     
     	/* Get the response header */
     	rsp = (struct ncsi_rsp_oem_pkt *)skb_network_header(nr->rsp);
    @@ -663,8 +668,14 @@ static int ncsi_rsp_handler_oem_bcm_gma(struct ncsi_request *nr)
     	saddr.sa_family = ndev->type;
     	ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
     	memcpy(saddr.sa_data, &rsp->data[BCM_MAC_ADDR_OFFSET], ETH_ALEN);
    -	/* Increase mac address by 1 for BMC's address */
    -	eth_addr_inc((u8 *)saddr.sa_data);
    +
    +	/* Management Controller's MAC address is calculated by adding
    +	 * the offset to Network Controller's (base) MAC address.
    +	 * Note: negative offset is "ignored", and BMC will use the Base
Just mention negative and zero offset is ignored. As you are ignoring 0 as well.

    +	 * MAC address in this case.
    +	 */
    +	while (mac_offset-- > 0)
    +		eth_addr_inc((u8 *)saddr.sa_data);
     	if (!is_valid_ether_addr((const u8 *)saddr.sa_data))
     		return -ENXIO;
     
    -- 
    2.17.1
    
    


^ permalink raw reply

* Re: [bpf-next PATCH 1/3] samples/bpf: xdp_fwd rename devmap name to be xdp_tx_ports
From: David Ahern @ 2019-08-07 17:52 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev, Daniel Borkmann,
	Alexei Starovoitov
  Cc: xdp-newbies, a.s.protopopov
In-Reply-To: <156518137803.5636.11766023213864836956.stgit@firesoul>

On 8/7/19 6:36 AM, Jesper Dangaard Brouer wrote:
> The devmap name 'tx_port' came from a copy-paste from xdp_redirect_map
> which only have a single TX port. Change name to xdp_tx_ports
> to make it more descriptive.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  samples/bpf/xdp_fwd_kern.c |    5 +++--
>  samples/bpf/xdp_fwd_user.c |    2 +-
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
Reviewed-by: David Ahern <dsahern@gmail.com>



^ permalink raw reply

* Re: [bpf-next PATCH 2/3] samples/bpf: make xdp_fwd more practically usable via devmap lookup
From: David Ahern @ 2019-08-07 17:52 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev, Daniel Borkmann,
	Alexei Starovoitov
  Cc: xdp-newbies, a.s.protopopov
In-Reply-To: <156518138310.5636.13064696265479533742.stgit@firesoul>

On 8/7/19 6:36 AM, Jesper Dangaard Brouer wrote:
> This address the TODO in samples/bpf/xdp_fwd_kern.c, which points out
> that the chosen egress index should be checked for existence in the
> devmap. This can now be done via taking advantage of Toke's work in
> commit 0cdbb4b09a06 ("devmap: Allow map lookups from eBPF").
> 
> This change makes xdp_fwd more practically usable, as this allows for
> a mixed environment, where IP-forwarding fallback to network stack, if
> the egress device isn't configured to use XDP.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  samples/bpf/xdp_fwd_kern.c |   20 ++++++++++++++------
>  samples/bpf/xdp_fwd_user.c |   36 +++++++++++++++++++++++++-----------
>  2 files changed, 39 insertions(+), 17 deletions(-)
> 
Reviewed-by: David Ahern <dsahern@gmail.com>



^ permalink raw reply

* Re: [bpf-next PATCH 1/3] samples/bpf: xdp_fwd rename devmap name to be xdp_tx_ports
From: Y Song @ 2019-08-07 17:52 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Daniel Borkmann, Alexei Starovoitov, Xdp, a.s.protopopov,
	David Ahern
In-Reply-To: <156518137803.5636.11766023213864836956.stgit@firesoul>

On Wed, Aug 7, 2019 at 5:37 AM Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
> The devmap name 'tx_port' came from a copy-paste from xdp_redirect_map
> which only have a single TX port. Change name to xdp_tx_ports
> to make it more descriptive.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Acked-by: Yonghong Song <yhs@fb.com>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox