Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next 2/2] net sched act_vlan: VLAN action rewrite to use RCU lock/unlock and update
From: Jamal Hadi Salim @ 2017-10-11 12:28 UTC (permalink / raw)
  To: Manish Kurup, xiyou.wangcong, jiri, davem, netdev, linux-kernel
  Cc: aring, mrv, manish.kurup
In-Reply-To: <1507689219-22993-1-git-send-email-manish.kurup@verizon.com>

On 17-10-10 10:33 PM, Manish Kurup wrote:
> Using a spinlock in the VLAN action causes performance issues when the VLAN
> action is used on multiple cores. Rewrote the VLAN action to use RCU read
> locking for reads and updates instead.
> 
> Signed-off-by: Manish Kurup <manish.kurup@verizon.com>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* Re: [PATCH net v2] net: enable interface alias removal via rtnl
From: Nicolas Dichtel @ 2017-10-11 12:29 UTC (permalink / raw)
  To: David Ahern, davem; +Cc: netdev, oliver, Stephen Hemminger
In-Reply-To: <c81359e7-feb8-475e-69d7-3eacc3406caf@gmail.com>

Le 10/10/2017 à 16:50, David Ahern a écrit :
> On 10/10/17 6:41 AM, Nicolas Dichtel wrote:
>> IFLA_IFALIAS is defined as NLA_STRING. It means that the minimal length of
>> the attribute is 1 ("\0"). However, to remove an alias, the attribute
>> length must be 0 (see dev_set_alias()).
>>
>> Let's define the type to NLA_BINARY, so that the alias can be removed.
> 
> not to be pedantic, but we need to be clear that the type is changed
> only for policy validation.
With the comment in the code, it is clear, isn't it?


Regards,
Nicolas

^ permalink raw reply

* pull-request: mac80211-next 2017-10-11
From: Johannes Berg @ 2017-10-11 12:36 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA

Hi Dave,

Here's a -next pull request. The only bigger thing here is the
addition of the regulatory database as firmware, which will
allow us to - over time - get rid of CRDA, as well as having
the option of adding more fields to the database where needed,
this would've been extremely complex with CRDA because it had
not been built with extensibility in mind.

Please pull and let me know if there's any problem.

Thanks,
johannes



The following changes since commit cc71b7b071192ac1c288e272fdc3f3877eb96663:

  net/ipv6: remove unused err variable on icmpv6_push_pending_frames (2017-10-05 21:56:26 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git tags/mac80211-next-for-davem-2017-10-11

for you to fetch changes up to 90a53e4432b12288316efaa5f308adafb8d304b0:

  cfg80211: implement regdb signature checking (2017-10-11 14:24:24 +0200)

----------------------------------------------------------------
Work continues in various areas:
 * port authorized event for 4-way-HS offload (Avi)
 * enable MFP optional for such devices (Emmanuel)
 * Kees's timer setup patch for mac80211 mesh
   (the part that isn't trivially scripted)
 * improve VLAN vs. TXQ handling (myself)
 * load regulatory database as firmware file (myself)
 * with various other small improvements and cleanups

I merged net-next once in the meantime to allow Kees's
timer setup patch to go in.

----------------------------------------------------------------
Avraham Stern (2):
      ieee80211: Add WFA TPC report element OUI type
      cfg80211/nl80211: add a port authorized event

Emmanuel Grumbach (1):
      nl80211: add an option to allow MFP without requiring it

Gregory Greenman (1):
      mac80211: recalculate some sta parameters after insertion

Ilan peer (1):
      mac80211: Simplify locking in ieee80211_sta_tear_down_BA_sessions()

Johannes Berg (12):
      mac80211: avoid allocating TXQs that won't be used
      mac80211: simplify and clarify IE splitting
      mac80211: use offsetofend()
      cfg80211: remove unused function ieee80211_data_from_8023()
      Merge remote-tracking branch 'net-next/master' into mac80211-next
      MAINTAINERS: update Johannes Berg's entries
      fq: support filtering a given tin
      mac80211: only remove AP VLAN frames from TXQ
      cfg80211: support loading regulatory database as firmware file
      cfg80211: support reloading regulatory database
      cfg80211: reg: remove support for built-in regdb
      cfg80211: implement regdb signature checking

Kees Cook (1):
      net/mac80211/mesh_plink: Convert timers to use timer_setup()

Liad Kaufman (1):
      mac80211: extend ieee80211_ie_split to support EXTENSION

Lubomir Rintel (1):
      mac80211_hwsim: use dyndbg for debug messages

Luca Coelho (1):
      mac80211: add documentation to ieee80211_rx_ba_offl()

Manikanta Pubbisetty (1):
      mac80211: fix bandwidth computation for TDLS peers

Richard Schütz (1):
      wireless: set correct mandatory rate flags

Roee Zamir (2):
      nl80211: add OCE scan and capability flags
      mac80211: oce: enable receiving of bcast probe resp

Stanislaw Gruszka (1):
      mac80211: fix STA_SLOW_THRESHOLD htmldocs failure

Tova Mussai (1):
      nl80211: return error for invalid center_freq in 40 MHz

Xiang Gao (1):
      mac80211: aead api to reduce redundancy

 Documentation/driver-api/80211/cfg80211.rst |   3 -
 Documentation/networking/regulatory.txt     |  30 +-
 MAINTAINERS                                 |  13 +-
 drivers/net/wireless/mac80211_hwsim.c       | 192 ++++++------
 include/linux/ieee80211.h                   |   1 +
 include/net/cfg80211.h                      |  40 +--
 include/net/fq.h                            |   7 +
 include/net/fq_impl.h                       |  72 ++++-
 include/net/mac80211.h                      |   8 +-
 include/uapi/linux/nl80211.h                |  82 +++--
 net/mac80211/Makefile                       |   3 +-
 net/mac80211/{aes_ccm.c => aead_api.c}      |  40 +--
 net/mac80211/aead_api.h                     |  27 ++
 net/mac80211/aes_ccm.h                      |  42 ++-
 net/mac80211/aes_gcm.c                      | 109 -------
 net/mac80211/aes_gcm.h                      |  38 ++-
 net/mac80211/agg-rx.c                       |   4 +-
 net/mac80211/ht.c                           |  12 +-
 net/mac80211/ieee80211_i.h                  |   2 +
 net/mac80211/iface.c                        |  29 +-
 net/mac80211/mesh.c                         |   3 +-
 net/mac80211/mesh.h                         |   1 +
 net/mac80211/mesh_hwmp.c                    |   8 +-
 net/mac80211/mesh_plink.c                   |  13 +-
 net/mac80211/mlme.c                         |  19 +-
 net/mac80211/scan.c                         |  37 ++-
 net/mac80211/sta_info.c                     |  61 ++--
 net/mac80211/sta_info.h                     |   4 +-
 net/mac80211/tx.c                           |  34 +++
 net/mac80211/util.c                         |  25 +-
 net/mac80211/vht.c                          |  10 +
 net/mac80211/wpa.c                          |   4 +-
 net/wireless/.gitignore                     |   3 +-
 net/wireless/Kconfig                        |  58 ++--
 net/wireless/Makefile                       |  24 +-
 net/wireless/certs/sforshee.x509            | Bin 0 -> 680 bytes
 net/wireless/core.c                         |   2 +-
 net/wireless/core.h                         |   5 +
 net/wireless/db.txt                         |  17 --
 net/wireless/genregdb.awk                   | 158 ----------
 net/wireless/nl80211.c                      | 199 ++++++++----
 net/wireless/nl80211.h                      |   2 +
 net/wireless/reg.c                          | 452 +++++++++++++++++++++++++---
 net/wireless/reg.h                          |  14 +
 net/wireless/regdb.h                        |  23 --
 net/wireless/sme.c                          |  45 ++-
 net/wireless/util.c                         | 202 ++++---------
 47 files changed, 1278 insertions(+), 899 deletions(-)
 rename net/mac80211/{aes_ccm.c => aead_api.c} (67%)
 create mode 100644 net/mac80211/aead_api.h
 delete mode 100644 net/mac80211/aes_gcm.c
 create mode 100644 net/wireless/certs/sforshee.x509
 delete mode 100644 net/wireless/db.txt
 delete mode 100644 net/wireless/genregdb.awk
 delete mode 100644 net/wireless/regdb.h

^ permalink raw reply

* Re: [jkirsher/next-queue PATCH v4 0/6] tc-flower based cloud filters in i40e
From: Jamal Hadi Salim @ 2017-10-11 12:42 UTC (permalink / raw)
  To: Amritha Nambiar, intel-wired-lan, jeffrey.t.kirsher
  Cc: alexander.h.duyck, jiri, netdev, alexander.duyck, xiyou.wangcong
In-Reply-To: <150768099999.5320.1633617713417675266.stgit@anamdev.jf.intel.com>

On 17-10-10 08:24 PM, Amritha Nambiar wrote:
> This patch series enables configuring cloud filters in i40e
> using the tc-flower classifier. The classification function
> of the filter is to match a packet to a class. cls_flower is
> extended to offload classid to hardware. The offloaded classid
> is used direct matched packets to a traffic class on the device.
> The approach here is similar to the tc 'prio' qdisc which uses
> the classid for band selection. The ingress qdisc is called ffff:0,
> so traffic classes are ffff:1 to ffff:8 (i40e has max of 8 TCs).
> TC0 is minor number 1, TC1 is minor number 2 etc.
> 
> The cloud filters are added for a VSI and are cleaned up when
> the VSI is deleted. The filters that match on L4 ports needs
> enhanced admin queue functions with big buffer support for
> extended fields in cloud filter commands.
> 
> Example:
> # tc qdisc add dev eth0 ingress
> # ethtool -K eth0 hw-tc-offload on
> 
> Match Dst IPv4,Dst Port and route to TC1:
> # tc filter add dev eth0 protocol ip parent ffff: prio 1 flower\
>    dst_ip 192.168.1.1/32 ip_proto udp dst_port 22\
>    skip_sw classid ffff:2
> 
> # tc filter show dev eth0 parent ffff:
> filter pref 1 flower chain 0
> filter pref 1 flower chain 0 handle 0x1 classid ffff:2
>    eth_type ipv4
>    ip_proto udp
>    dst_ip 192.168.1.1
>    dst_port 22
>    skip_sw
>    in_hw
> 

Much much better semantic. Thank you.
Have you tested many filter mapping to the same classid?

cheers,
jamal

^ permalink raw reply

* RE: [E] Re: [PATCH net-next 1/2] net sched act_vlan: Change stats update to use per-core stats
From: Kurup, Manish B @ 2017-10-11 12:35 UTC (permalink / raw)
  To: Jamal Hadi Salim, Manish Kurup, xiyou.wangcong@gmail.com,
	jiri@resnulli.us, davem@davemloft.net, netdev@vger.kernel.org
  Cc: aring@mojatatu.com, mrv@mojatatu.com
In-Reply-To: <d79cf8b2-4c67-1101-0405-afec1e9f89c7@mojatatu.com>

Hi Jamal, Yes, that's an indentation mismatch - I put in one tab too many. Shall fix before commit.

Thanks,

-Manish

-----Original Message-----
From: Jamal Hadi Salim [mailto:jhs@mojatatu.com] 
Sent: Wednesday, October 11, 2017 8:28 AM
To: Manish Kurup; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; netdev@vger.kernel.org
Cc: aring@mojatatu.com; mrv@mojatatu.com; Kurup, Manish B
Subject: [E] Re: [PATCH net-next 1/2] net sched act_vlan: Change stats update to use per-core stats

minus lk

On 17-10-10 10:32 PM, Manish Kurup wrote:
> The VLAN action maintains one set of stats across all cores, and uses 
> a spinlock to synchronize updates to it from the same. Changed this to 
> use a per-CPU stats context instead.
> This change will result in better performance.
> 
> Signed-off-by: Manish Kurup <manish.kurup@verizon.com>
> ---
>   net/sched/act_vlan.c | 10 ++++++----
>   1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c index 
> 16eb067..14c262c 100644
> --- a/net/sched/act_vlan.c
> +++ b/net/sched/act_vlan.c
> @@ -30,9 +30,10 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
>   	int err;
>   	u16 tci;
>   
> -	spin_lock(&v->tcf_lock);
>   	tcf_lastuse_update(&v->tcf_tm);
> -	bstats_update(&v->tcf_bstats, skb);
> +	bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
> +
> +	spin_lock(&v->tcf_lock);
>   	action = v->tcf_action;
>   
>   	/* Ensure 'data' points at mac_header prior calling vlan 
> manipulating @@ -85,7 +86,8 @@ static int tcf_vlan(struct sk_buff 
> *skb, const struct tc_action *a,
>   
>   drop:
>   	action = TC_ACT_SHOT;
> -	v->tcf_qstats.drops++;
> +	qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
> +
>   unlock:
>   	if (skb_at_tc_ingress(skb))
>   		skb_pull_rcsum(skb, skb->mac_len); @@ -172,7 +174,7 @@ static int 
> tcf_vlan_init(struct net *net, struct nlattr *nla,
>   
>   	if (!exists) {
>   		ret = tcf_idr_create(tn, parm->index, est, a,
> -				     &act_vlan_ops, bind, false);
> +						&act_vlan_ops, bind, true);
>

Indentation mismatch here?

Otherwise looks good to me.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* RE: [PATCH] rtl8xxxu: mark expected switch fall-throughs
From: David Laight @ 2017-10-11 12:54 UTC (permalink / raw)
  To: 'Joe Perches', Gustavo A. R. Silva, Jes Sorensen,
	Kalle Valo
  Cc: linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Arnaldo Carvalho de Melo
In-Reply-To: <1507717237.3552.48.camel@perches.com>

From: Joe Perches
> Sent: 11 October 2017 11:21
> On Tue, 2017-10-10 at 14:30 -0500, Gustavo A. R. Silva wrote:
> > In preparation to enabling -Wimplicit-fallthrough, mark switch cases
> > where we are expecting to fall through.
> 
> perhaps use Arnaldo's idea:
> 
> https://lkml.org/lkml/2017/2/9/845
> https://lkml.org/lkml/2017/2/10/485

gah, that is even uglier and requires a chase through
headers to find out what it means.

	David

^ permalink raw reply

* Re: [PATCH net-next v3 5/5] selinux: bpf: Add addtional check for bpf object file receive
From: Stephen Smalley @ 2017-10-11 12:54 UTC (permalink / raw)
  To: Chenbo Feng, linux-security-module, netdev, SELinux
  Cc: Daniel Borkmann, Chenbo Feng, Alexei Starovoitov, lorenzo
In-Reply-To: <20171011000930.133308-6-chenbofeng.kernel@gmail.com>

On Tue, 2017-10-10 at 17:09 -0700, Chenbo Feng wrote:
> From: Chenbo Feng <fengc@google.com>
> 
> Introduce a bpf object related check when sending and receiving files
> through unix domain socket as well as binder. It checks if the
> receiving
> process have privilege to read/write the bpf map or use the bpf
> program.
> This check is necessary because the bpf maps and programs are using a
> anonymous inode as their shared inode so the normal way of checking
> the
> files and sockets when passing between processes cannot work properly
> on
> eBPF object. This check only works when the BPF_SYSCALL is
> configured.
> The information stored inside the file security struct is the same as
> the information in bpf object security struct.
> 
> Signed-off-by: Chenbo Feng <fengc@google.com>
> ---
>  include/linux/lsm_hooks.h         | 17 ++++++++++
>  include/linux/security.h          |  9 ++++++
>  kernel/bpf/syscall.c              | 27 ++++++++++++++--
>  security/security.c               |  8 +++++
>  security/selinux/hooks.c          | 67
> +++++++++++++++++++++++++++++++++++++++
>  security/selinux/include/objsec.h |  9 ++++++
>  6 files changed, 135 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index 7161d8e7ee79..517dea60b87b 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -1385,6 +1385,19 @@
>   * @bpf_prog_free_security:
>   *	Clean up the security information stored inside bpf prog.
>   *
> + * @bpf_map_file:
> + *	When creating a bpf map fd, set up the file security
> information with
> + *	the bpf security information stored in the map struct. So
> when the map
> + *	fd is passed between processes, the security module can
> directly read
> + *	the security information from file security struct rather
> than the bpf
> + *	security struct.
> + *
> + * @bpf_prog_file:
> + *	When creating a bpf prog fd, set up the file security
> information with
> + *	the bpf security information stored in the prog struct. So
> when the prog
> + *	fd is passed between processes, the security module can
> directly read
> + *	the security information from file security struct rather
> than the bpf
> + *	security struct.
>   */
>  union security_list_options {
>  	int (*binder_set_context_mgr)(struct task_struct *mgr);
> @@ -1726,6 +1739,8 @@ union security_list_options {
>  	void (*bpf_map_free_security)(struct bpf_map *map);
>  	int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux);
>  	void (*bpf_prog_free_security)(struct bpf_prog_aux *aux);
> +	void (*bpf_map_file)(struct bpf_map *map, struct file
> *file);
> +	void (*bpf_prog_file)(struct bpf_prog_aux *aux, struct file
> *file);
>  #endif /* CONFIG_BPF_SYSCALL */
>  };
>  
> @@ -1954,6 +1969,8 @@ struct security_hook_heads {
>  	struct list_head bpf_map_free_security;
>  	struct list_head bpf_prog_alloc_security;
>  	struct list_head bpf_prog_free_security;
> +	struct list_head bpf_map_file;
> +	struct list_head bpf_prog_file;
>  #endif /* CONFIG_BPF_SYSCALL */
>  } __randomize_layout;
>  
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 18800b0911e5..57573b794e2d 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -1740,6 +1740,8 @@ extern int security_bpf_map_alloc(struct
> bpf_map *map);
>  extern void security_bpf_map_free(struct bpf_map *map);
>  extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux);
>  extern void security_bpf_prog_free(struct bpf_prog_aux *aux);
> +extern void security_bpf_map_file(struct bpf_map *map, struct file
> *file);
> +extern void security_bpf_prog_file(struct bpf_prog_aux *aux, struct
> file *file);
>  #else
>  static inline int security_bpf(int cmd, union bpf_attr *attr,
>  					     unsigned int size)
> @@ -1772,6 +1774,13 @@ static inline int
> security_bpf_prog_alloc(struct bpf_prog_aux *aux)
>  
>  static inline void security_bpf_prog_free(struct bpf_prog_aux *aux)
>  { }
> +
> +static inline void security_bpf_map_file(struct bpf_map *map, struct
> file *file)
> +{ }
> +
> +static inline void security_bpf_prog_file(struct bpf_prog_aux *aux,
> +					  struct file *file)
> +{ }
>  #endif /* CONFIG_SECURITY */
>  #endif /* CONFIG_BPF_SYSCALL */
>  
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 1cf31ddd7616..aee69e564c50 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -324,11 +324,22 @@ static const struct file_operations
> bpf_map_fops = {
>  
>  int bpf_map_new_fd(struct bpf_map *map, int flags)
>  {
> +	int fd;
> +	struct fd f;
>  	if (security_bpf_map(map, OPEN_FMODE(flags)))
>  		return -EPERM;
>  
> -	return anon_inode_getfd("bpf-map", &bpf_map_fops, map,
> +	fd = anon_inode_getfd("bpf-map", &bpf_map_fops, map,
>  				flags | O_CLOEXEC);
> +	if (fd < 0)
> +		return fd;
> +
> +	f = fdget(fd);
> +	if (!f.file)
> +		return -EBADF;

This seems convoluted and unnecessarily inefficient, since
anon_inode_getfd() has the struct file and could have directly returned
it instead of having to go through fdget() on a fd we just installed. 
Also, couldn't the fd->file mapping have changed underneath us between
fd_install() and fdget()?
I would think it would be safer and more efficient to create an
anon_inode_getfdandfile() or similar interface and use that, so that we
can just pass the file it set up to the hook.  Obviously that would
need to be reviewed by the vfs folks.

> +	security_bpf_map_file(map, f.file);
> +	fdput(f);
> +	return fd;
>  }
>  
>  int bpf_get_file_flag(int flags)
> @@ -975,11 +986,23 @@ static const struct file_operations
> bpf_prog_fops = {
>  
>  int bpf_prog_new_fd(struct bpf_prog *prog)
>  {
> +	int fd;
> +	struct fd f;
> +
>  	if (security_bpf_prog(prog))
>  		return -EPERM;
>  
> -	return anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog,
> +	fd =  anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog,
>  				O_RDWR | O_CLOEXEC);
> +	if (fd < 0)
> +		return fd;
> +
> +	f = fdget(fd);
> +	if (!f.file)
> +		return -EBADF;
> +	security_bpf_prog_file(prog->aux, f.file);
> +	fdput(f);
> +	return fd;
>  }
>  
>  static struct bpf_prog *____bpf_prog_get(struct fd f)
> diff --git a/security/security.c b/security/security.c
> index 1cd8526cb0b7..dacf649b8cfa 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -1734,4 +1734,12 @@ void security_bpf_prog_free(struct
> bpf_prog_aux *aux)
>  {
>  	call_void_hook(bpf_prog_free_security, aux);
>  }
> +void security_bpf_map_file(struct bpf_map *map, struct file *file)
> +{
> +	call_void_hook(bpf_map_file, map, file);
> +}
> +void security_bpf_prog_file(struct bpf_prog_aux *aux, struct file
> *file)
> +{
> +	call_void_hook(bpf_prog_file, aux, file);
> +}
>  #endif /* CONFIG_BPF_SYSCALL */
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 94e473b9c884..0a6ef20513b0 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -1815,6 +1815,10 @@ static inline int file_path_has_perm(const
> struct cred *cred,
>  	return inode_has_perm(cred, file_inode(file), av, &ad);
>  }
>  
> +#ifdef CONFIG_BPF_SYSCALL
> +static int bpf_file_check(struct file *file, u32 sid);
> +#endif
> +
>  /* Check whether a task can use an open file descriptor to
>     access an inode in a given way.  Check access to the
>     descriptor itself, and then use dentry_has_perm to
> @@ -1845,6 +1849,14 @@ static int file_has_perm(const struct cred
> *cred,
>  			goto out;
>  	}
>  
> +#ifdef CONFIG_BPF_SYSCALL
> +	if (fsec->bpf_type) {
> +		rc = bpf_file_check(file, cred_sid(cred));
> +		if (rc)
> +			goto out;
> +	}
> +#endif
> +
>  	/* av is zero if only checking access to the descriptor. */
>  	rc = 0;
>  	if (av)
> @@ -2165,6 +2177,14 @@ static int selinux_binder_transfer_file(struct
> task_struct *from,
>  			return rc;
>  	}
>  
> +#ifdef CONFIG_BPF_SYSCALL
> +	if (fsec->bpf_type) {
> +		rc = bpf_file_check(file, sid);
> +		if (rc)
> +			return rc;
> +	}
> +#endif
> +
>  	if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
>  		return 0;
>  
> @@ -6288,6 +6308,33 @@ static u32 bpf_map_fmode_to_av(fmode_t fmode)
>  	return av;
>  }
>  
> +/* This function will check the file pass through unix socket or
> binder to see
> + * if it is a bpf related object. And apply correspinding checks on
> the bpf
> + * object based on the type. The bpf maps and programs, not like
> other files and
> + * socket, are using a shared anonymous inode inside the kernel as
> their inode.
> + * So checking that inode cannot identify if the process have
> privilege to
> + * access the bpf object and that's why we have to add this
> additional check in
> + * selinux_file_receive and selinux_binder_transfer_files.
> + */
> +static int bpf_file_check(struct file *file, u32 sid)
> +{
> +	struct file_security_struct *fsec = file->f_security;
> +	int ret;
> +
> +	if (fsec->bpf_type == BPF_MAP) {
> +		ret = avc_has_perm(sid, fsec->bpf_sid, SECCLASS_BPF,
> +				   bpf_map_fmode_to_av(file-
> >f_mode), NULL);
> +		if (ret)
> +			return ret;
> +	} else if (fsec->bpf_type == BPF_PROG) {
> +		ret = avc_has_perm(sid, fsec->bpf_sid, SECCLASS_BPF,
> +				   BPF__PROG_USE, NULL);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
>  static int selinux_bpf_map(struct bpf_map *map, fmode_t fmode)
>  {
>  	u32 sid = current_sid();
> @@ -6351,6 +6398,24 @@ static void selinux_bpf_prog_free(struct
> bpf_prog_aux *aux)
>  	aux->security = NULL;
>  	kfree(bpfsec);
>  }
> +
> +static void selinux_bpf_map_file(struct bpf_map *map, struct file
> *file)
> +{
> +	struct bpf_security_struct *bpfsec = map->security;
> +	struct file_security_struct *fsec = file->f_security;
> +
> +	fsec->bpf_type = BPF_MAP;
> +	fsec->bpf_sid = bpfsec->sid;
> +}
> +
> +static void selinux_bpf_prog_file(struct bpf_prog_aux *aux, struct
> file *file)
> +{
> +	struct bpf_security_struct *bpfsec = aux->security;
> +	struct file_security_struct *fsec = file->f_security;
> +
> +	fsec->bpf_type = BPF_PROG;
> +	fsec->bpf_sid = bpfsec->sid;
> +}
>  #endif
>  
>  static struct security_hook_list selinux_hooks[] __lsm_ro_after_init
> = {
> @@ -6581,6 +6646,8 @@ static struct security_hook_list
> selinux_hooks[] __lsm_ro_after_init = {
>  	LSM_HOOK_INIT(bpf_prog_alloc_security,
> selinux_bpf_prog_alloc),
>  	LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
>  	LSM_HOOK_INIT(bpf_prog_free_security,
> selinux_bpf_prog_free),
> +	LSM_HOOK_INIT(bpf_map_file, selinux_bpf_map_file),
> +	LSM_HOOK_INIT(bpf_prog_file, selinux_bpf_prog_file),
>  #endif
>  };
>  
> diff --git a/security/selinux/include/objsec.h
> b/security/selinux/include/objsec.h
> index 3d54468ce334..0162648761f9 100644
> --- a/security/selinux/include/objsec.h
> +++ b/security/selinux/include/objsec.h
> @@ -67,11 +67,20 @@ struct inode_security_struct {
>  	spinlock_t lock;
>  };
>  
> +enum bpf_obj_type {
> +	BPF_MAP = 1,
> +	BPF_PROG,
> +};
> +
>  struct file_security_struct {
>  	u32 sid;		/* SID of open file description */
>  	u32 fown_sid;		/* SID of file owner (for
> SIGIO) */
>  	u32 isid;		/* SID of inode at the time of file
> open */
>  	u32 pseqno;		/* Policy seqno at the time of
> file open */
> +#ifdef CONFIG_BPF_SYSCALL
> +	unsigned char bpf_type;
> +	u32 bpf_sid;
> +#endif
>  };

Can you check how this impacts the size of the file_security_cache
objects, and thus the memory overhead imposed on all open files?

If it is significant, do we need to cache the bpf_sid here or could we
just store the bpf_type and then dereference the bpfsec if it is a map
or prog?

>  
>  struct superblock_security_struct {

^ permalink raw reply

* Re: [jkirsher/next-queue PATCH v4 0/6] tc-flower based cloud filters in i40e
From: Jiri Pirko @ 2017-10-11 12:56 UTC (permalink / raw)
  To: Amritha Nambiar
  Cc: intel-wired-lan, jeffrey.t.kirsher, alexander.h.duyck, netdev,
	jhs, alexander.duyck, xiyou.wangcong
In-Reply-To: <150768099999.5320.1633617713417675266.stgit@anamdev.jf.intel.com>

Wed, Oct 11, 2017 at 02:24:12AM CEST, amritha.nambiar@intel.com wrote:
>This patch series enables configuring cloud filters in i40e
>using the tc-flower classifier. The classification function
>of the filter is to match a packet to a class. cls_flower is
>extended to offload classid to hardware. The offloaded classid
>is used direct matched packets to a traffic class on the device. 
>The approach here is similar to the tc 'prio' qdisc which uses
>the classid for band selection. The ingress qdisc is called ffff:0,
>so traffic classes are ffff:1 to ffff:8 (i40e has max of 8 TCs).


NACK. This clearly looks like abuse of classid to something
else. Classid is here to identify qdisc instance. However, you use it
for hw tclass identification. This is mixing of apples and oranges.

Why?

Please don't try to abuse things! This is not nice.




>TC0 is minor number 1, TC1 is minor number 2 etc.
>
>The cloud filters are added for a VSI and are cleaned up when
>the VSI is deleted. The filters that match on L4 ports needs
>enhanced admin queue functions with big buffer support for
>extended fields in cloud filter commands.
>
>Example:
># tc qdisc add dev eth0 ingress
># ethtool -K eth0 hw-tc-offload on
>
>Match Dst IPv4,Dst Port and route to TC1:
># tc filter add dev eth0 protocol ip parent ffff: prio 1 flower\
>  dst_ip 192.168.1.1/32 ip_proto udp dst_port 22\
>  skip_sw classid ffff:2
>
># tc filter show dev eth0 parent ffff:
>filter pref 1 flower chain 0
>filter pref 1 flower chain 0 handle 0x1 classid ffff:2
>  eth_type ipv4
>  ip_proto udp
>  dst_ip 192.168.1.1
>  dst_port 22
>  skip_sw
>  in_hw
>
>v4: classid based approach to set traffic class for matched packets.
>
>Authors:
>Amritha Nambiar <amritha.nambiar@intel.com>
>Kiran Patil <kiran.patil@intel.com>
>Anjali Singhai Jain <anjali.singhai@intel.com>
>Jingjing Wu <jingjing.wu@intel.com>
>---
>
>Amritha Nambiar (6):
>      cls_flower: Offload classid to hardware
>      i40e: Map TCs with the VSI seids
>      i40e: Cloud filter mode for set_switch_config command
>      i40e: Admin queue definitions for cloud filters
>      i40e: Clean up of cloud filters
>      i40e: Enable cloud filters via tc-flower
>
>
> drivers/net/ethernet/intel/i40e/i40e.h             |   55 +
> drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h  |  143 +++
> drivers/net/ethernet/intel/i40e/i40e_common.c      |  193 ++++
> drivers/net/ethernet/intel/i40e/i40e_ethtool.c     |    2 
> drivers/net/ethernet/intel/i40e/i40e_main.c        |  941 +++++++++++++++++++-
> drivers/net/ethernet/intel/i40e/i40e_prototype.h   |   18 
> drivers/net/ethernet/intel/i40e/i40e_type.h        |   10 
> .../net/ethernet/intel/i40evf/i40e_adminq_cmd.h    |  113 ++
> include/net/pkt_cls.h                              |    1 
> net/sched/cls_flower.c                             |    2 
> 10 files changed, 1439 insertions(+), 39 deletions(-)
>
>--

^ permalink raw reply

* Re: [E] Re: [PATCH net-next 1/2] net sched act_vlan: Change stats update to use per-core stats
From: Jiri Pirko @ 2017-10-11 12:58 UTC (permalink / raw)
  To: Kurup, Manish B
  Cc: Jamal Hadi Salim, Manish Kurup, xiyou.wangcong@gmail.com,
	davem@davemloft.net, netdev@vger.kernel.org, aring@mojatatu.com,
	mrv@mojatatu.com
In-Reply-To: <10f80e225811433d89d58f139fba8026@OMZP1LUMXCA01.uswin.ad.vzwcorp.com>

Wed, Oct 11, 2017 at 02:35:50PM CEST, manish.kurup@verizon.com wrote:
>Hi Jamal, Yes, that's an indentation mismatch - I put in one tab too many. Shall fix before commit.

No top-posting please!


>
>Thanks,
>
>-Manish
>
>-----Original Message-----
>From: Jamal Hadi Salim [mailto:jhs@mojatatu.com] 
>Sent: Wednesday, October 11, 2017 8:28 AM
>To: Manish Kurup; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; netdev@vger.kernel.org
>Cc: aring@mojatatu.com; mrv@mojatatu.com; Kurup, Manish B
>Subject: [E] Re: [PATCH net-next 1/2] net sched act_vlan: Change stats update to use per-core stats
>
>minus lk
>
>On 17-10-10 10:32 PM, Manish Kurup wrote:
>> The VLAN action maintains one set of stats across all cores, and uses 
>> a spinlock to synchronize updates to it from the same. Changed this to 
>> use a per-CPU stats context instead.
>> This change will result in better performance.
>> 
>> Signed-off-by: Manish Kurup <manish.kurup@verizon.com>
>> ---
>>   net/sched/act_vlan.c | 10 ++++++----
>>   1 file changed, 6 insertions(+), 4 deletions(-)
>> 
>> diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c index 
>> 16eb067..14c262c 100644
>> --- a/net/sched/act_vlan.c
>> +++ b/net/sched/act_vlan.c
>> @@ -30,9 +30,10 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
>>   	int err;
>>   	u16 tci;
>>   
>> -	spin_lock(&v->tcf_lock);
>>   	tcf_lastuse_update(&v->tcf_tm);
>> -	bstats_update(&v->tcf_bstats, skb);
>> +	bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
>> +
>> +	spin_lock(&v->tcf_lock);
>>   	action = v->tcf_action;
>>   
>>   	/* Ensure 'data' points at mac_header prior calling vlan 
>> manipulating @@ -85,7 +86,8 @@ static int tcf_vlan(struct sk_buff 
>> *skb, const struct tc_action *a,
>>   
>>   drop:
>>   	action = TC_ACT_SHOT;
>> -	v->tcf_qstats.drops++;
>> +	qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
>> +
>>   unlock:
>>   	if (skb_at_tc_ingress(skb))
>>   		skb_pull_rcsum(skb, skb->mac_len); @@ -172,7 +174,7 @@ static int 
>> tcf_vlan_init(struct net *net, struct nlattr *nla,
>>   
>>   	if (!exists) {
>>   		ret = tcf_idr_create(tn, parm->index, est, a,
>> -				     &act_vlan_ops, bind, false);
>> +						&act_vlan_ops, bind, true);
>>
>
>Indentation mismatch here?
>
>Otherwise looks good to me.
>
>Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
>
>cheers,
>jamal

^ permalink raw reply

* Re: [PATCH net-next v2 4/5] selinux: bpf: Add selinux check for eBPF syscall operations
From: Stephen Smalley @ 2017-10-11 13:00 UTC (permalink / raw)
  To: Chenbo Feng
  Cc: Daniel Borkmann, netdev, Chenbo Feng, linux-security-module,
	SELinux, Alexei Starovoitov, Lorenzo Colitti
In-Reply-To: <CAMOXUJmKtpB6LU2V3AyBvMFizYDCrZGQ5ZG3+z=5FOyooQGy=w@mail.gmail.com>

On Tue, 2017-10-10 at 10:54 -0700, Chenbo Feng via Selinux wrote:
> On Tue, Oct 10, 2017 at 7:52 AM, Stephen Smalley <sds@tycho.nsa.gov>
> wrote:
> > On Tue, 2017-10-10 at 10:18 -0400, Stephen Smalley wrote:
> > > On Mon, 2017-10-09 at 15:20 -0700, Chenbo Feng wrote:
> > > > From: Chenbo Feng <fengc@google.com>
> > > > 
> > > > Implement the actual checks introduced to eBPF related
> > > > syscalls.
> > > > This
> > > > implementation use the security field inside bpf object to
> > > > store a
> > > > sid that
> > > > identify the bpf object. And when processes try to access the
> > > > object,
> > > > selinux will check if processes have the right privileges. The
> > > > creation
> > > > of eBPF object are also checked at the general bpf check hook
> > > > and
> > > > new
> > > > cmd introduced to eBPF domain can also be checked there.
> > > > 
> > > > Signed-off-by: Chenbo Feng <fengc@google.com>
> > > > Acked-by: Alexei Starovoitov <ast@kernel.org>
> > > > ---
> > > >  security/selinux/hooks.c            | 111
> > > > ++++++++++++++++++++++++++++++++++++
> > > >  security/selinux/include/classmap.h |   2 +
> > > >  security/selinux/include/objsec.h   |   4 ++
> > > >  3 files changed, 117 insertions(+)
> > > > 
> > > > diff --git a/security/selinux/hooks.c
> > > > b/security/selinux/hooks.c
> > > > index f5d304736852..41aba4e3d57c 100644
> > > > --- a/security/selinux/hooks.c
> > > > +++ b/security/selinux/hooks.c
> > > > @@ -85,6 +85,7 @@
> > > >  #include <linux/export.h>
> > > >  #include <linux/msg.h>
> > > >  #include <linux/shm.h>
> > > > +#include <linux/bpf.h>
> > > > 
> > > >  #include "avc.h"
> > > >  #include "objsec.h"
> > > > @@ -6252,6 +6253,106 @@ static void
> > > > selinux_ib_free_security(void
> > > > *ib_sec)
> > > >  }
> > > >  #endif
> > > > 
> > > > +#ifdef CONFIG_BPF_SYSCALL
> > > > +static int selinux_bpf(int cmd, union bpf_attr *attr,
> > > > +                                unsigned int size)
> > > > +{
> > > > +   u32 sid = current_sid();
> > > > +   int ret;
> > > > +
> > > > +   switch (cmd) {
> > > > +   case BPF_MAP_CREATE:
> > > > +           ret = avc_has_perm(sid, sid, SECCLASS_BPF_MAP,
> > > > BPF_MAP__CREATE,
> > > > +                              NULL);
> > > > +           break;
> > > > +   case BPF_PROG_LOAD:
> > > > +           ret = avc_has_perm(sid, sid, SECCLASS_BPF_PROG,
> > > > BPF_PROG__LOAD,
> > > > +                              NULL);
> > > > +           break;
> > > > +   default:
> > > > +           ret = 0;
> > > > +           break;
> > > > +   }
> > > > +
> > > > +   return ret;
> > > > +}
> > > > +
> > > > +static u32 bpf_map_fmode_to_av(fmode_t fmode)
> > > > +{
> > > > +   u32 av = 0;
> > > > +
> > > > +   if (f_mode & FMODE_READ)
> > > > +           av |= BPF_MAP__READ;
> > > > +   if (f_mode & FMODE_WRITE)
> > > > +           av |= BPF_MAP__WRITE;
> > > > +   return av;
> > > > +}
> > > > +
> > > > +static int selinux_bpf_map(struct bpf_map *map, fmode_t fmode)
> > > > +{
> > > > +   u32 sid = current_sid();
> > > > +   struct bpf_security_struct *bpfsec;
> > > > +
> > > > +   bpfsec = map->security;
> > > > +   return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_MAP,
> > > > +                       bpf_map_fmode_to_av(fmode), NULL);
> > > > +}
> > > > +
> > > > +static int selinux_bpf_prog(struct bpf_prog *prog)
> > > > +{
> > > > +   u32 sid = current_sid();
> > > > +   struct bpf_security_struct *bpfsec;
> > > > +
> > > > +   bpfsec = prog->aux->security;
> > > > +   return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_PROG,
> > > > +                       BPF_PROG__USE, NULL);
> > > > +}
> > > > +
> > > > +static int selinux_bpf_map_alloc(struct bpf_map *map)
> > > > +{
> > > > +   struct bpf_security_struct *bpfsec;
> > > > +
> > > > +   bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
> > > > +   if (!bpfsec)
> > > > +           return -ENOMEM;
> > > > +
> > > > +   bpfsec->sid = current_sid();
> > > > +   map->security = bpfsec;
> > > > +
> > > > +   return 0;
> > > > +}
> > > > +
> > > > +static void selinux_bpf_map_free(struct bpf_map *map)
> > > > +{
> > > > +   struct bpf_security_struct *bpfsec = map->security;
> > > > +
> > > > +   map->security = NULL;
> > > > +   kfree(bpfsec);
> > > > +}
> > > > +
> > > > +static int selinux_bpf_prog_alloc(struct bpf_prog_aux *aux)
> > > > +{
> > > > +   struct bpf_security_struct *bpfsec;
> > > > +
> > > > +   bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
> > > > +   if (!bpfsec)
> > > > +           return -ENOMEM;
> > > > +
> > > > +   bpfsec->sid = current_sid();
> > > > +   aux->security = bpfsec;
> > > > +
> > > > +   return 0;
> > > > +}
> > > > +
> > > > +static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
> > > > +{
> > > > +   struct bpf_security_struct *bpfsec = aux->security;
> > > > +
> > > > +   aux->security = NULL;
> > > > +   kfree(bpfsec);
> > > > +}
> > > > +#endif
> > > > +
> > > >  static struct security_hook_list selinux_hooks[]
> > > > __lsm_ro_after_init
> > > > = {
> > > >     LSM_HOOK_INIT(binder_set_context_mgr,
> > > > selinux_binder_set_context_mgr),
> > > >     LSM_HOOK_INIT(binder_transaction,
> > > > selinux_binder_transaction),
> > > > @@ -6471,6 +6572,16 @@ static struct security_hook_list
> > > > selinux_hooks[] __lsm_ro_after_init = {
> > > >     LSM_HOOK_INIT(audit_rule_match, selinux_audit_rule_match),
> > > >     LSM_HOOK_INIT(audit_rule_free, selinux_audit_rule_free),
> > > >  #endif
> > > > +
> > > > +#ifdef CONFIG_BPF_SYSCALL
> > > > +   LSM_HOOK_INIT(bpf, selinux_bpf),
> > > > +   LSM_HOOK_INIT(bpf_map, selinux_bpf_map),
> > > > +   LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
> > > > +   LSM_HOOK_INIT(bpf_map_alloc_security,
> > > > selinux_bpf_map_alloc),
> > > > +   LSM_HOOK_INIT(bpf_prog_alloc_security,
> > > > selinux_bpf_prog_alloc),
> > > > +   LSM_HOOK_INIT(bpf_map_free_security,
> > > > selinux_bpf_map_free),
> > > > +   LSM_HOOK_INIT(bpf_prog_free_security,
> > > > selinux_bpf_prog_free),
> > > > +#endif
> > > >  };
> > > > 
> > > >  static __init int selinux_init(void)
> > > > diff --git a/security/selinux/include/classmap.h
> > > > b/security/selinux/include/classmap.h
> > > > index 35ffb29a69cb..7253c5eea59c 100644
> > > > --- a/security/selinux/include/classmap.h
> > > > +++ b/security/selinux/include/classmap.h
> > > > @@ -237,6 +237,8 @@ struct security_class_mapping
> > > > secclass_map[] =
> > > > {
> > > >       { "access", NULL } },
> > > >     { "infiniband_endport",
> > > >       { "manage_subnet", NULL } },
> > > > +   { "bpf_map", {"create", "read", "write"} },
> > > > +   { "bpf_prog", {"load", "use"} },
> > > 
> > > Again I have to ask: do you truly need/want two separate classes,
> > > or
> > > would a single class with distinct permissions suffice, ala:
> > >         { "bpf", { "create_map", "read_map", "write_map",
> > > "prog_load",
> > > "prog_use" } },
> > > 
> > > and then allow A self:bpf { create_map read_map write_map
> > > prog_load
> > > prog_use }; would be stored in a single policy avtab rule, and be
> > > cached in a single AVC entry.
> 
> Sorry I missed to reply on this, I keep it that way because sometimes
> we need to grant the permission of accessing eBPF maps and programs
> separately. But keep them in a single class definitely works for me.

If we anticipated a large number of permissions being defined for
either bpf maps or programs, or if we were labeling them differently
(e.g. inheriting from creator for one, while using type transitions for
another), then it might make more sense to split the class (so that we
can support up to 32 distinct permissions for each one, or because we'd
never end up allowing both map and prog permissions to the same target
SID/context).  But in this case I can't see any benefit to using two
classes, and it would consume more memory to do so.

BTW, please be sure to cc Paul Moore on these patches if you aren't
already since he is the selinux kernel tree maintainer.

> > I guess for consistency in naming it should be either:
> >         { "bpf", { "map_create", "map_read", "map_write",
> > "prog_load",
> > "prog_use" } },
> > 
> > or:
> > 
> >         { "bpf", { "create_map", "read_map", "write_map",
> > "load_prog",
> > "use_prog" } },
> > 
> > 
> > > >     { NULL }
> > > >    };
> > > > 
> > > > diff --git a/security/selinux/include/objsec.h
> > > > b/security/selinux/include/objsec.h
> > > > index 1649cd18eb0b..3d54468ce334 100644
> > > > --- a/security/selinux/include/objsec.h
> > > > +++ b/security/selinux/include/objsec.h
> > > > @@ -150,6 +150,10 @@ struct pkey_security_struct {
> > > >     u32     sid;    /* SID of pkey */
> > > >  };
> > > > 
> > > > +struct bpf_security_struct {
> > > > +   u32 sid;  /*SID of bpf obj creater*/
> > > > +};
> > > > +
> > > >  extern unsigned int selinux_checkreqprot;
> > > > 
> > > >  #endif /* _SELINUX_OBJSEC_H_ */

^ permalink raw reply

* Re: [RFC net-next 1/4] net: ipv6: Make inet6addr_validator a blocking notifier
From: Ido Schimmel @ 2017-10-11 13:08 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, jiri, idosch, kjlx
In-Reply-To: <1507653665-20540-2-git-send-email-dsahern@gmail.com>

On Tue, Oct 10, 2017 at 09:41:02AM -0700, David Ahern wrote:
> inet6addr_validator chain was added by commit 3ad7d2468f79f ("Ipvlan
> should return an error when an address is already in use") to allow
> address validation before changes are committed and to be able to
> fail the address change with an error back to the user. The address
> validation is not done for addresses received from router
> advertisements.
> 
> Handling RAs in softirq context is the only reason for the notifier
> chain to be atomic versus blocking. Since the only current user, ipvlan,
> of the validator chain ignores softirq context, the notifier can be made
> blocking and simply not invoked for softirq path.
> 
> The blocking option is needed by spectrum for example to validate
> resources for an adding an address to an interface.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>

With the fixup posted later:

Reviewed-by: Ido Schimmel <idosch@mellanox.com>

BTW, the in_softirq() check in ipvlan_addr6_validator_event() can be
removed after this patch.

^ permalink raw reply

* Re: [PATCH net-next 2/2] net sched act_vlan: VLAN action rewrite to use RCU lock/unlock and update
From: Eric Dumazet @ 2017-10-11 13:10 UTC (permalink / raw)
  To: Manish Kurup
  Cc: jhs, xiyou.wangcong, jiri, davem, netdev, linux-kernel, aring,
	mrv, manish.kurup
In-Reply-To: <1507689219-22993-1-git-send-email-manish.kurup@verizon.com>

On Tue, 2017-10-10 at 22:33 -0400, Manish Kurup wrote:
> Using a spinlock in the VLAN action causes performance issues when the VLAN
> action is used on multiple cores. Rewrote the VLAN action to use RCU read
> locking for reads and updates instead.
> 
> Signed-off-by: Manish Kurup <manish.kurup@verizon.com>
> ---
>  include/net/tc_act/tc_vlan.h | 21 ++++++++-----
>  net/sched/act_vlan.c         | 73 ++++++++++++++++++++++++++++++--------------
>  2 files changed, 63 insertions(+), 31 deletions(-)
> 
> diff --git a/include/net/tc_act/tc_vlan.h b/include/net/tc_act/tc_vlan.h
> index c2090df..67fd355 100644
> --- a/include/net/tc_act/tc_vlan.h
> +++ b/include/net/tc_act/tc_vlan.h
> @@ -13,12 +13,17 @@
>  #include <net/act_api.h>
>  #include <linux/tc_act/tc_vlan.h>
>  
> +struct tcf_vlan_params {
> +	struct rcu_head   rcu;
> +	int               tcfv_action;
> +	u16               tcfv_push_vid;
> +	__be16            tcfv_push_proto;
> +	u8                tcfv_push_prio;
> +};
> +
>  struct tcf_vlan {
>  	struct tc_action	common;
> -	int			tcfv_action;
> -	u16			tcfv_push_vid;
> -	__be16			tcfv_push_proto;
> -	u8			tcfv_push_prio;
> +	struct tcf_vlan_params __rcu *vlan_p;
>  };
>  #define to_vlan(a) ((struct tcf_vlan *)a)
>  
> @@ -33,22 +38,22 @@ static inline bool is_tcf_vlan(const struct tc_action *a)
>  
>  static inline u32 tcf_vlan_action(const struct tc_action *a)
>  {
> -	return to_vlan(a)->tcfv_action;
> +	return to_vlan(a)->vlan_p->tcfv_action;

This is not proper RCU : ->vlan_p should be read once, and using
rcu_dereference()

So the caller should provide to this helper the 'p' pointer instead of
'a'


CONFIG_SPARSE_RCU_POINTER=y

and make C=2 would probably give you warnings about that.

^ permalink raw reply

* Re: [PATCH net-next 2/2] net sched act_vlan: VLAN action rewrite to use RCU lock/unlock and update
From: Jiri Pirko @ 2017-10-11 13:13 UTC (permalink / raw)
  To: Manish Kurup
  Cc: jhs, xiyou.wangcong, davem, netdev, linux-kernel, aring, mrv,
	manish.kurup
In-Reply-To: <1507689219-22993-1-git-send-email-manish.kurup@verizon.com>

Wed, Oct 11, 2017 at 04:33:39AM CEST, kurup.manish@gmail.com wrote:
>Using a spinlock in the VLAN action causes performance issues when the VLAN
>action is used on multiple cores. Rewrote the VLAN action to use RCU read
>locking for reads and updates instead.
>
>Signed-off-by: Manish Kurup <manish.kurup@verizon.com>
>---
> include/net/tc_act/tc_vlan.h | 21 ++++++++-----
> net/sched/act_vlan.c         | 73 ++++++++++++++++++++++++++++++--------------
> 2 files changed, 63 insertions(+), 31 deletions(-)
>
>diff --git a/include/net/tc_act/tc_vlan.h b/include/net/tc_act/tc_vlan.h
>index c2090df..67fd355 100644
>--- a/include/net/tc_act/tc_vlan.h
>+++ b/include/net/tc_act/tc_vlan.h
>@@ -13,12 +13,17 @@
> #include <net/act_api.h>
> #include <linux/tc_act/tc_vlan.h>
> 
>+struct tcf_vlan_params {
>+	struct rcu_head   rcu;

Usually rcu_head is put at the very end of struct.


>+	int               tcfv_action;
>+	u16               tcfv_push_vid;
>+	__be16            tcfv_push_proto;
>+	u8                tcfv_push_prio;
>+};
>+
> struct tcf_vlan {
> 	struct tc_action	common;
>-	int			tcfv_action;
>-	u16			tcfv_push_vid;
>-	__be16			tcfv_push_proto;
>-	u8			tcfv_push_prio;
>+	struct tcf_vlan_params __rcu *vlan_p;
> };
> #define to_vlan(a) ((struct tcf_vlan *)a)
> 
>@@ -33,22 +38,22 @@ static inline bool is_tcf_vlan(const struct tc_action *a)
> 
> static inline u32 tcf_vlan_action(const struct tc_action *a)
> {
>-	return to_vlan(a)->tcfv_action;
>+	return to_vlan(a)->vlan_p->tcfv_action;
> }
> 
> static inline u16 tcf_vlan_push_vid(const struct tc_action *a)
> {
>-	return to_vlan(a)->tcfv_push_vid;
>+	return to_vlan(a)->vlan_p->tcfv_push_vid;
> }
> 
> static inline __be16 tcf_vlan_push_proto(const struct tc_action *a)
> {
>-	return to_vlan(a)->tcfv_push_proto;
>+	return to_vlan(a)->vlan_p->tcfv_push_proto;
> }
> 
> static inline u8 tcf_vlan_push_prio(const struct tc_action *a)
> {
>-	return to_vlan(a)->tcfv_push_prio;
>+	return to_vlan(a)->vlan_p->tcfv_push_prio;

All these helpers should use rtnl_dereference() as well


> }
> 
> #endif /* __NET_TC_VLAN_H */
>diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
>index 14c262c..9bb0236 100644
>--- a/net/sched/act_vlan.c
>+++ b/net/sched/act_vlan.c
>@@ -29,31 +29,37 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
> 	int action;
> 	int err;
> 	u16 tci;
>+	struct tcf_vlan_params *p;
> 
> 	tcf_lastuse_update(&v->tcf_tm);
> 	bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
> 
>-	spin_lock(&v->tcf_lock);
>-	action = v->tcf_action;
>-
> 	/* Ensure 'data' points at mac_header prior calling vlan manipulating
> 	 * functions.
> 	 */
> 	if (skb_at_tc_ingress(skb))
> 		skb_push_rcsum(skb, skb->mac_len);
> 
>-	switch (v->tcfv_action) {
>+	rcu_read_lock();
>+
>+	action = READ_ONCE(v->tcf_action);
>+

Too many empty lines. At least remove the one above this ^


>+	p = rcu_dereference(v->vlan_p);
>+
>+	switch (p->tcfv_action) {
> 	case TCA_VLAN_ACT_POP:
> 		err = skb_vlan_pop(skb);
> 		if (err)
> 			goto drop;
> 		break;
>+
> 	case TCA_VLAN_ACT_PUSH:
>-		err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid |
>-				    (v->tcfv_push_prio << VLAN_PRIO_SHIFT));
>+		err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
>+				(p->tcfv_push_prio << VLAN_PRIO_SHIFT));

Again, indentation is odd. Please fix.



> 		if (err)
> 			goto drop;
> 		break;
>+

Avoid adding this unrelated empty line.


> 	case TCA_VLAN_ACT_MODIFY:
> 		/* No-op if no vlan tag (either hw-accel or in-payload) */
> 		if (!skb_vlan_tagged(skb))
>@@ -69,15 +75,16 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
> 				goto drop;
> 		}
> 		/* replace the vid */
>-		tci = (tci & ~VLAN_VID_MASK) | v->tcfv_push_vid;
>+		tci = (tci & ~VLAN_VID_MASK) | p->tcfv_push_vid;
> 		/* replace prio bits, if tcfv_push_prio specified */
>-		if (v->tcfv_push_prio) {
>+		if (p->tcfv_push_prio) {
> 			tci &= ~VLAN_PRIO_MASK;
>-			tci |= v->tcfv_push_prio << VLAN_PRIO_SHIFT;
>+			tci |= p->tcfv_push_prio << VLAN_PRIO_SHIFT;
> 		}
> 		/* put updated tci as hwaccel tag */
>-		__vlan_hwaccel_put_tag(skb, v->tcfv_push_proto, tci);
>+		__vlan_hwaccel_put_tag(skb, p->tcfv_push_proto, tci);
> 		break;
>+

Again, avoid adding this unrelated empty line.


> 	default:
> 		BUG();
> 	}
>@@ -89,6 +96,7 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
> 	qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
> 
> unlock:
>+	rcu_read_unlock();
> 	if (skb_at_tc_ingress(skb))
> 		skb_pull_rcsum(skb, skb->mac_len);
> 
>@@ -111,6 +119,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
> 	struct nlattr *tb[TCA_VLAN_MAX + 1];
> 	struct tc_vlan *parm;
> 	struct tcf_vlan *v;
>+	struct tcf_vlan_params *p, *p_old;

The famous reverse-christmas-tree rule dictates this to be put right
above "struct tc_vlan *parm" :)


> 	int action;
> 	__be16 push_vid = 0;
> 	__be16 push_proto = 0;
>@@ -187,16 +196,33 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
> 
> 	v = to_vlan(*a);
> 
>-	spin_lock_bh(&v->tcf_lock);
>-
>-	v->tcfv_action = action;
>-	v->tcfv_push_vid = push_vid;
>-	v->tcfv_push_prio = push_prio;
>-	v->tcfv_push_proto = push_proto;
>+	ASSERT_RTNL();
>+	p = kzalloc(sizeof(*p), GFP_KERNEL);
>+	if (unlikely(!p)) {
>+		if (ovr)
>+			tcf_idr_release(*a, bind);
>+		return -ENOMEM;
>+	}
> 
> 	v->tcf_action = parm->action;
> 
>-	spin_unlock_bh(&v->tcf_lock);
>+	p_old = rtnl_dereference(v->vlan_p);
>+
>+	if (ovr)
>+		spin_lock_bh(&v->tcf_lock);
>+
>+	p->tcfv_action = action;
>+	p->tcfv_push_vid = push_vid;
>+	p->tcfv_push_prio = push_prio;
>+	p->tcfv_push_proto = push_proto;
>+
>+	rcu_assign_pointer(v->vlan_p, p);
>+
>+	if (ovr)
>+		spin_unlock_bh(&v->tcf_lock);
>+
>+	if (p_old)
>+		kfree_rcu(p_old, rcu);
> 
> 	if (ret == ACT_P_CREATED)
> 		tcf_idr_insert(tn, *a);
>@@ -208,25 +234,26 @@ static int tcf_vlan_dump(struct sk_buff *skb, struct tc_action *a,
> {
> 	unsigned char *b = skb_tail_pointer(skb);
> 	struct tcf_vlan *v = to_vlan(a);
>+	struct tcf_vlan_params *p = rtnl_dereference(v->vlan_p);
> 	struct tc_vlan opt = {
> 		.index    = v->tcf_index,
> 		.refcnt   = v->tcf_refcnt - ref,
> 		.bindcnt  = v->tcf_bindcnt - bind,
> 		.action   = v->tcf_action,
>-		.v_action = v->tcfv_action,
>+		.v_action = p->tcfv_action,
> 	};
> 	struct tcf_t t;
> 
> 	if (nla_put(skb, TCA_VLAN_PARMS, sizeof(opt), &opt))
> 		goto nla_put_failure;
> 
>-	if ((v->tcfv_action == TCA_VLAN_ACT_PUSH ||
>-	     v->tcfv_action == TCA_VLAN_ACT_MODIFY) &&
>-	    (nla_put_u16(skb, TCA_VLAN_PUSH_VLAN_ID, v->tcfv_push_vid) ||
>+	if ((p->tcfv_action == TCA_VLAN_ACT_PUSH ||
>+	     p->tcfv_action == TCA_VLAN_ACT_MODIFY) &&
>+	    (nla_put_u16(skb, TCA_VLAN_PUSH_VLAN_ID, p->tcfv_push_vid) ||
> 	     nla_put_be16(skb, TCA_VLAN_PUSH_VLAN_PROTOCOL,
>-			  v->tcfv_push_proto) ||
>+			  p->tcfv_push_proto) ||
> 	     (nla_put_u8(skb, TCA_VLAN_PUSH_VLAN_PRIORITY,
>-					      v->tcfv_push_prio))))
>+					      p->tcfv_push_prio))))
> 		goto nla_put_failure;
> 
> 	tcf_tm_dump(&t, &v->tcf_tm);
>-- 
>2.7.4
>

Besides the couple of nits I pointed out, this looks nice. Thanks!

^ permalink raw reply

* Re: [PATCH net-next 1/2] net sched act_vlan: Change stats update to use per-core stats
From: Jiri Pirko @ 2017-10-11 13:15 UTC (permalink / raw)
  To: Manish Kurup
  Cc: jhs, xiyou.wangcong, davem, netdev, linux-kernel, aring, mrv,
	manish.kurup
In-Reply-To: <1507689147-22941-1-git-send-email-manish.kurup@verizon.com>

Wed, Oct 11, 2017 at 04:32:27AM CEST, kurup.manish@gmail.com wrote:
>The VLAN action maintains one set of stats across all cores, and uses a
>spinlock to synchronize updates to it from the same. Changed this to use a
>per-CPU stats context instead.
>This change will result in better performance.
>
>Signed-off-by: Manish Kurup <manish.kurup@verizon.com>
>---
> net/sched/act_vlan.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
>diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
>index 16eb067..14c262c 100644
>--- a/net/sched/act_vlan.c
>+++ b/net/sched/act_vlan.c
>@@ -30,9 +30,10 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
> 	int err;
> 	u16 tci;
> 
>-	spin_lock(&v->tcf_lock);
> 	tcf_lastuse_update(&v->tcf_tm);
>-	bstats_update(&v->tcf_bstats, skb);
>+	bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
>+
>+	spin_lock(&v->tcf_lock);
> 	action = v->tcf_action;
> 
> 	/* Ensure 'data' points at mac_header prior calling vlan manipulating
>@@ -85,7 +86,8 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
> 
> drop:
> 	action = TC_ACT_SHOT;
>-	v->tcf_qstats.drops++;
>+	qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
>+
> unlock:
> 	if (skb_at_tc_ingress(skb))
> 		skb_pull_rcsum(skb, skb->mac_len);
>@@ -172,7 +174,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
> 
> 	if (!exists) {
> 		ret = tcf_idr_create(tn, parm->index, est, a,
>-				     &act_vlan_ops, bind, false);
>+						&act_vlan_ops, bind, true);

Please fix this indent nit as pointed out by Jamal.

Feel free to add my tag:
Acked-by: Jiri Pirko <jiri@mellanox.com>


Thanks!

^ permalink raw reply

* Re: [PATCH v2 nf-next 1/2] netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore
From: Eric Dumazet @ 2017-10-11 13:23 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, Dan Williams
In-Reply-To: <20171010213945.19074-2-fw@strlen.de>

On Tue, Oct 10, 2017 at 2:39 PM, Florian Westphal <fw@strlen.de> wrote:
> xt_replace_table relies on table replacement counter retrieval (which
> uses xt_recseq to synchronize pcpu counters).
>
> This is fine, however with large rule set get_counters() can take
> a very long time -- it needs to synchronize all counters because
> it has to assume concurrent modifications can occur.
>
> Make xt_replace_table synchronize by itself by waiting until all cpus
> had an even seqcount.
>
> This allows a followup patch to copy the counters of the old ruleset
> without any synchonization after xt_replace_table has completed.
>
> Cc: Dan Williams <dcbw@redhat.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  v2: fix Erics email address
>
>  net/netfilter/x_tables.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index c83a3b5e1c6c..f2d4a365768f 100644
> --- a/net/netfilter/x_tables.c
> +++ b/net/netfilter/x_tables.c
> @@ -1153,6 +1153,7 @@ xt_replace_table(struct xt_table *table,
>               int *error)
>  {
>         struct xt_table_info *private;
> +       unsigned int cpu;
>         int ret;
>
>         ret = xt_jumpstack_alloc(newinfo);
> @@ -1184,12 +1185,20 @@ xt_replace_table(struct xt_table *table,
>
>         /*
>          * Even though table entries have now been swapped, other CPU's
> -        * may still be using the old entries. This is okay, because
> -        * resynchronization happens because of the locking done
> -        * during the get_counters() routine.
> +        * may still be using the old entries...
>          */
>         local_bh_enable();
>
> +       /* ... so wait for even xt_recseq on all cpus */
> +       for_each_possible_cpu(cpu) {
> +               seqcount_t *s = &per_cpu(xt_recseq, cpu);
> +
> +               while (raw_read_seqcount(s) & 1)
> +                       cpu_relax();
> +
> +               cond_resched();
> +       }

It seems that we could also check :

1) If low order bit of sequence is 0

Or

2) the value has changed

       for_each_possible_cpu(cpu) {
                seqcount_t *s = &per_cpu(xt_recseq, cpu);
                 u32 seq = raw_read_seqcount(s) ;

                 if (seq & 1) {
                        do {
                            cpu_relax();
                          } while (seq == raw_read_seqcount(s));

Thanks !

^ permalink raw reply

* Re: [PATCH net-next 2/2] net sched act_vlan: VLAN action rewrite to use RCU lock/unlock and update
From: Eric Dumazet @ 2017-10-11 13:30 UTC (permalink / raw)
  To: Manish Kurup
  Cc: jhs, xiyou.wangcong, jiri, davem, netdev, linux-kernel, aring,
	mrv, manish.kurup
In-Reply-To: <1507727429.31614.22.camel@edumazet-glaptop3.roam.corp.google.com>

On Wed, 2017-10-11 at 06:10 -0700, Eric Dumazet wrote:
> On Tue, 2017-10-10 at 22:33 -0400, Manish Kurup wrote:
> > Using a spinlock in the VLAN action causes performance issues when the VLAN
> > action is used on multiple cores. Rewrote the VLAN action to use RCU read
> > locking for reads and updates instead.
> > 
> > Signed-off-by: Manish Kurup <manish.kurup@verizon.com>
> > ---
> >  include/net/tc_act/tc_vlan.h | 21 ++++++++-----
> >  net/sched/act_vlan.c         | 73 ++++++++++++++++++++++++++++++--------------
> >  2 files changed, 63 insertions(+), 31 deletions(-)
> > 
> > diff --git a/include/net/tc_act/tc_vlan.h b/include/net/tc_act/tc_vlan.h
> > index c2090df..67fd355 100644
> > --- a/include/net/tc_act/tc_vlan.h
> > +++ b/include/net/tc_act/tc_vlan.h
> > @@ -13,12 +13,17 @@
> >  #include <net/act_api.h>
> >  #include <linux/tc_act/tc_vlan.h>
> >  
> > +struct tcf_vlan_params {
> > +	struct rcu_head   rcu;
> > +	int               tcfv_action;
> > +	u16               tcfv_push_vid;
> > +	__be16            tcfv_push_proto;
> > +	u8                tcfv_push_prio;
> > +};
> > +
> >  struct tcf_vlan {
> >  	struct tc_action	common;
> > -	int			tcfv_action;
> > -	u16			tcfv_push_vid;
> > -	__be16			tcfv_push_proto;
> > -	u8			tcfv_push_prio;
> > +	struct tcf_vlan_params __rcu *vlan_p;
> >  };
> >  #define to_vlan(a) ((struct tcf_vlan *)a)
> >  
> > @@ -33,22 +38,22 @@ static inline bool is_tcf_vlan(const struct tc_action *a)
> >  
> >  static inline u32 tcf_vlan_action(const struct tc_action *a)
> >  {
> > -	return to_vlan(a)->tcfv_action;
> > +	return to_vlan(a)->vlan_p->tcfv_action;
> 
> This is not proper RCU : ->vlan_p should be read once, and using
> rcu_dereference()
> 
> So the caller should provide to this helper the 'p' pointer instead of
> 'a'
> 
> 
> CONFIG_SPARSE_RCU_POINTER=y
> 
> and make C=2 would probably give you warnings about that.
> 


BTW no need to change your .config after :


commit 41a2901e7d220875752a8c870e0b53288a578c20
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Fri May 12 15:56:35 2017 -0700

    rcu: Remove SPARSE_RCU_POINTER Kconfig option
    
    The sparse-based checking for non-RCU accesses to RCU-protected pointers
    has been around for a very long time, and it is now the only type of
    sparse-based checking that is optional.  This commit therefore makes
    it unconditional.
    
    Reported-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Fengguang Wu <fengguang.wu@intel.com>

^ permalink raw reply

* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb()
From: Eric Dumazet @ 2017-10-11 13:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Hannes Frederic Sowa, Cong Wang, netdev, Eric Dumazet,
	Yuchung Cheng, Neal Cardwell, Martin KaFai Lau, Brendan Gregg,
	Song Liu
In-Reply-To: <20171011025606.tzts5wgys5hn52p2@ast-mbp>

On Tue, 2017-10-10 at 19:56 -0700, Alexei Starovoitov wrote:

> actually we hit that too for completely different tracing use case.
> Indeed would be good to generate socket cookie unconditionally
> for all sockets. I don't think there is any harm.
> 

Simply call sock_gen_cookie() when needed.

If a tracepoint needs the cookie and the cookie was not yet generated,
it will be generated at this point.

^ permalink raw reply

* Re: [PATCH] rtl8xxxu: mark expected switch fall-throughs
From: Jes Sorensen @ 2017-10-11 13:34 UTC (permalink / raw)
  To: Kalle Valo; +Cc: Gustavo A. R. Silva, linux-wireless, netdev, linux-kernel
In-Reply-To: <87o9peqdo2.fsf@kamboji.qca.qualcomm.com>

On 10/11/2017 04:41 AM, Kalle Valo wrote:
> Jes Sorensen <jes.sorensen@gmail.com> writes:
> 
>> On 10/10/2017 03:30 PM, Gustavo A. R. Silva wrote:
>>> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
>>> where we are expecting to fall through.
>>
>> While this isn't harmful, to me this looks like pointless patch churn
>> for zero gain and it's just ugly.
> 
> In general I find it useful to mark fall through cases. And it's just a
> comment with two words, so they cannot hurt your eyes that much.

I don't see them being harmful in the code, but I don't see them of much 
use either. If it happened as part of natural code development, fine. My 
objection is to people running around doing this systematically causing 
patch churn for little to zero gain.

Jes

^ permalink raw reply

* Re: [RFC net-next 2/4] net: Add extack to validator_info structs used for address notifier
From: Ido Schimmel @ 2017-10-11 13:37 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, jiri, idosch, kjlx
In-Reply-To: <1507653665-20540-3-git-send-email-dsahern@gmail.com>

On Tue, Oct 10, 2017 at 09:41:03AM -0700, David Ahern wrote:
> Add extack to in_validator_info and in6_validator_info. Update the one
> user of each, ipvlan, to return an error message for failures.
> 
> Only manual configuration of an address is plumbed in the IPv6 code path.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>

See two nits below. Otherwise:

Reviewed-by: Ido Schimmel <idosch@mellanox.com>

> ---
>  drivers/net/ipvlan/ipvlan_main.c | 10 ++++++++--
>  include/linux/inetdevice.h       |  1 +
>  include/net/addrconf.h           |  1 +
>  net/ipv4/devinet.c               |  8 +++++---
>  net/ipv6/addrconf.c              | 23 +++++++++++++----------
>  5 files changed, 28 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> index 57c3856bab05..56a868415ba2 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -846,8 +846,11 @@ static int ipvlan_addr6_validator_event(struct notifier_block *unused,
>  
>  	switch (event) {
>  	case NETDEV_UP:
> -		if (ipvlan_addr_busy(ipvlan->port, &i6vi->i6vi_addr, true))
> +		if (ipvlan_addr_busy(ipvlan->port, &i6vi->i6vi_addr, true)) {
> +			NL_SET_ERR_MSG(i6vi->extack,
> +				       "Address already assigned to an ipvlan device");
>  			return notifier_from_errno(-EADDRINUSE);
> +		}
>  		break;
>  	}
>  
> @@ -916,8 +919,11 @@ static int ipvlan_addr4_validator_event(struct notifier_block *unused,
>  
>  	switch (event) {
>  	case NETDEV_UP:
> -		if (ipvlan_addr_busy(ipvlan->port, &ivi->ivi_addr, false))
> +		if (ipvlan_addr_busy(ipvlan->port, &ivi->ivi_addr, false)) {
> +			NL_SET_ERR_MSG(ivi->extack,
> +				       "Address already assigned to an ipvlan device");
>  			return notifier_from_errno(-EADDRINUSE);
> +		}
>  		break;
>  	}
>  
> diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
> index 751d051f0bc7..681dff30940b 100644
> --- a/include/linux/inetdevice.h
> +++ b/include/linux/inetdevice.h
> @@ -154,6 +154,7 @@ struct in_ifaddr {
>  struct in_validator_info {
>  	__be32			ivi_addr;
>  	struct in_device	*ivi_dev;
> +	struct netlink_ext_ack	*extack;
>  };
>  
>  int register_inetaddr_notifier(struct notifier_block *nb);
> diff --git a/include/net/addrconf.h b/include/net/addrconf.h
> index 87981cd63180..b8b16437c6d5 100644
> --- a/include/net/addrconf.h
> +++ b/include/net/addrconf.h
> @@ -55,6 +55,7 @@ struct prefix_info {
>  struct in6_validator_info {
>  	struct in6_addr		i6vi_addr;
>  	struct inet6_dev	*i6vi_dev;
> +	struct netlink_ext_ack	*extack;
>  };
>  
>  #define IN6_ADDR_HSIZE_SHIFT	4
> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> index 7ce22a2c07ce..0118698cd623 100644
> --- a/net/ipv4/devinet.c
> +++ b/net/ipv4/devinet.c
> @@ -444,7 +444,7 @@ static void check_lifetime(struct work_struct *work);
>  static DECLARE_DELAYED_WORK(check_lifetime_work, check_lifetime);
>  
>  static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh,
> -			     u32 portid)
> +			     u32 portid, struct netlink_ext_ack  *extack)

Double space before `*extack`.

>  {
>  	struct in_device *in_dev = ifa->ifa_dev;
>  	struct in_ifaddr *ifa1, **ifap, **last_primary;
> @@ -489,6 +489,7 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh,
>  	 */
>  	ivi.ivi_addr = ifa->ifa_address;
>  	ivi.ivi_dev = ifa->ifa_dev;
> +	ivi.extack = extack;
>  	ret = blocking_notifier_call_chain(&inetaddr_validator_chain,
>  					   NETDEV_UP, &ivi);
>  	ret = notifier_to_errno(ret);
> @@ -521,7 +522,7 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh,
>  
>  static int inet_insert_ifa(struct in_ifaddr *ifa)
>  {
> -	return __inet_insert_ifa(ifa, NULL, 0);
> +	return __inet_insert_ifa(ifa, NULL, 0, NULL);
>  }
>  
>  static int inet_set_ifa(struct net_device *dev, struct in_ifaddr *ifa)
> @@ -902,7 +903,8 @@ static int inet_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh,
>  				return ret;
>  			}
>  		}
> -		return __inet_insert_ifa(ifa, nlh, NETLINK_CB(skb).portid);
> +		return __inet_insert_ifa(ifa, nlh, NETLINK_CB(skb).portid,
> +					 extack);
>  	} else {
>  		inet_free_ifa(ifa);
>  
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 632cf4b26277..0bad4a800f73 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -958,7 +958,8 @@ static u32 inet6_addr_hash(const struct in6_addr *addr)
>  static struct inet6_ifaddr *
>  ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr,
>  	      const struct in6_addr *peer_addr, int pfxlen,
> -	      int scope, u32 flags, u32 valid_lft, u32 prefered_lft)
> +	      int scope, u32 flags, u32 valid_lft, u32 prefered_lft,
> +	      struct netlink_ext_ack  *extack)

Double space before `*extack`.

>  {
>  	struct net *net = dev_net(idev->dev);
>  	struct inet6_ifaddr *ifa = NULL;
> @@ -994,6 +995,7 @@ ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr,
>  		struct in6_validator_info i6vi = {
>  			.i6vi_addr = *addr,
>  			.i6vi_dev = idev,
> +			.extack = extack,
>  		};
>  
>  		rcu_read_unlock_bh();
> @@ -1336,7 +1338,7 @@ static int ipv6_create_tempaddr(struct inet6_ifaddr *ifp, struct inet6_ifaddr *i
>  
>  	ift = ipv6_add_addr(idev, &addr, NULL, tmp_plen,
>  			    ipv6_addr_scope(&addr), addr_flags,
> -			    tmp_valid_lft, tmp_prefered_lft);
> +			    tmp_valid_lft, tmp_prefered_lft, NULL);
>  	if (IS_ERR(ift)) {
>  		in6_ifa_put(ifp);
>  		in6_dev_put(idev);
> @@ -2020,7 +2022,7 @@ void addrconf_dad_failure(struct inet6_ifaddr *ifp)
>  
>  		ifp2 = ipv6_add_addr(idev, &new_addr, NULL, pfxlen,
>  				     scope, flags, valid_lft,
> -				     preferred_lft);
> +				     preferred_lft, NULL);
>  		if (IS_ERR(ifp2))
>  			goto lock_errdad;
>  
> @@ -2478,7 +2480,7 @@ int addrconf_prefix_rcv_add_addr(struct net *net, struct net_device *dev,
>  					    pinfo->prefix_len,
>  					    addr_type&IPV6_ADDR_SCOPE_MASK,
>  					    addr_flags, valid_lft,
> -					    prefered_lft);
> +					    prefered_lft, NULL);
>  
>  		if (IS_ERR_OR_NULL(ifp))
>  			return -1;
> @@ -2788,7 +2790,8 @@ static int inet6_addr_add(struct net *net, int ifindex,
>  			  const struct in6_addr *pfx,
>  			  const struct in6_addr *peer_pfx,
>  			  unsigned int plen, __u32 ifa_flags,
> -			  __u32 prefered_lft, __u32 valid_lft)
> +			  __u32 prefered_lft, __u32 valid_lft,
> +			  struct netlink_ext_ack *extack)
>  {
>  	struct inet6_ifaddr *ifp;
>  	struct inet6_dev *idev;
> @@ -2847,7 +2850,7 @@ static int inet6_addr_add(struct net *net, int ifindex,
>  	}
>  
>  	ifp = ipv6_add_addr(idev, pfx, peer_pfx, plen, scope, ifa_flags,
> -			    valid_lft, prefered_lft);
> +			    valid_lft, prefered_lft, extack);
>  
>  	if (!IS_ERR(ifp)) {
>  		if (!(ifa_flags & IFA_F_NOPREFIXROUTE)) {
> @@ -2932,7 +2935,7 @@ int addrconf_add_ifaddr(struct net *net, void __user *arg)
>  	rtnl_lock();
>  	err = inet6_addr_add(net, ireq.ifr6_ifindex, &ireq.ifr6_addr, NULL,
>  			     ireq.ifr6_prefixlen, IFA_F_PERMANENT,
> -			     INFINITY_LIFE_TIME, INFINITY_LIFE_TIME);
> +			     INFINITY_LIFE_TIME, INFINITY_LIFE_TIME, NULL);
>  	rtnl_unlock();
>  	return err;
>  }
> @@ -2962,7 +2965,7 @@ static void add_addr(struct inet6_dev *idev, const struct in6_addr *addr,
>  
>  	ifp = ipv6_add_addr(idev, addr, NULL, plen,
>  			    scope, IFA_F_PERMANENT,
> -			    INFINITY_LIFE_TIME, INFINITY_LIFE_TIME);
> +			    INFINITY_LIFE_TIME, INFINITY_LIFE_TIME, NULL);
>  	if (!IS_ERR(ifp)) {
>  		spin_lock_bh(&ifp->lock);
>  		ifp->flags &= ~IFA_F_TENTATIVE;
> @@ -3062,7 +3065,7 @@ void addrconf_add_linklocal(struct inet6_dev *idev,
>  #endif
>  
>  	ifp = ipv6_add_addr(idev, addr, NULL, 64, IFA_LINK, addr_flags,
> -			    INFINITY_LIFE_TIME, INFINITY_LIFE_TIME);
> +			    INFINITY_LIFE_TIME, INFINITY_LIFE_TIME, NULL);
>  	if (!IS_ERR(ifp)) {
>  		addrconf_prefix_route(&ifp->addr, ifp->prefix_len, idev->dev, 0, 0);
>  		addrconf_dad_start(ifp);
> @@ -4565,7 +4568,7 @@ inet6_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh,
>  		 */
>  		return inet6_addr_add(net, ifm->ifa_index, pfx, peer_pfx,
>  				      ifm->ifa_prefixlen, ifa_flags,
> -				      preferred_lft, valid_lft);
> +				      preferred_lft, valid_lft, extack);
>  	}
>  
>  	if (nlh->nlmsg_flags & NLM_F_EXCL ||
> -- 
> 2.1.4
> 

^ permalink raw reply

* Re: BUG:af_packet fails to TX TSO frames
From: Willem de Bruijn @ 2017-10-11 13:39 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: Network Development, David Miller
In-Reply-To: <c3002d4f-62ff-1d0b-65fb-b366be45a6d8@cambridgegreys.com>

On Wed, Oct 11, 2017 at 4:39 AM, Anton Ivanov
<anton.ivanov@cambridgegreys.com> wrote:
> Hi all,
>
> I am having an issue with af_packet.c
>
> It fails to transmit any TSO frame submitted via raw socket + vnet headers.
> An identical frame is considered valid for tap.

This may be due to validation. As of commit 104ba78c9880 ("packet: on
direct_xmit, limit tso and csum to supported devices") the packet socket
code validates TSO packets and HW support in the direct_xmit path.

Do you have a test program or packet (incl. vnet hdr) to reproduce this
with? I usually test this path with

https://github.com/wdebruij/kerneltools/blob/master/tests/psock_txring_vnet.c

> The frames are generated out of legit linux skbufs (in UML) and vnet headers
> work for checksumming on raw, so I should have the raw initialization right.
>
> The header is supposedly parsed correctly and the newly formed skbuf is sent
> to the device transmit routine (or enqueued) . I have debugged it as far as
> it reaching the following line in packet_snd() (line 2592 in 4.13):
>
> err = po->xmit(skb);

That maps either on to packet_direct_xmit or dev_queue_xmit.

>
> This returns NET_XMIT_DROP for any TSO capable device I tested.

You can also try

  perf record -a -g -e skb:kfree_skb sleep 10
  perf report

to see where these packets are dropped.

> They dislike
> the frame. Same frame is accepted by tap. I have went through the header
> parsing and skb allocation code in both af_packet and tap several times and
> I do not see any material difference (except the new zerocopy stuff). So,
> frankly, I am stuck.
>
> Can someone help me to debug this. I do not see an easy way to debug it, but
> this is not a part of the kernel I am familiar with. Is there a suitable
> helper function to try to segment the frame and see exactly what is wrong
> with it?
>
> Cc-ing DaveM as this has no specific maintainer so it falls under his
> umbrella remit.
>
> --
> Anton R. Ivanov
>
> Cambridge Greys Limited, England and Wales company No 10273661
> http://www.cambridgegreys.com/
>

^ permalink raw reply

* Re: [PATCH v2 nf-next 1/2] netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore
From: Florian Westphal @ 2017-10-11 13:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Florian Westphal, netdev, Dan Williams
In-Reply-To: <CANn89i+Y9CtNq6qTGAWi7fNfk9RRyxsuY1e2+SBCd1xRYx77zw@mail.gmail.com>

Eric Dumazet <edumazet@google.com> wrote:
> > +       /* ... so wait for even xt_recseq on all cpus */
> > +       for_each_possible_cpu(cpu) {
> > +               seqcount_t *s = &per_cpu(xt_recseq, cpu);
> > +
> > +               while (raw_read_seqcount(s) & 1)
> > +                       cpu_relax();
> > +
> > +               cond_resched();
> > +       }
> 
> It seems that we could also check :
> 
> 1) If low order bit of sequence is 0
>
> Or
> 
> 2) the value has changed
>        for_each_possible_cpu(cpu) {
>                 seqcount_t *s = &per_cpu(xt_recseq, cpu);
>                  u32 seq = raw_read_seqcount(s) ;
> 
>                  if (seq & 1) {
>                         do {
>                             cpu_relax();
>                           } while (seq == raw_read_seqcount(s));
> 

Actually I first used

for_each_possible_cpu(cpu) {
  seqcount_t *s = &per_cpu(xt_recseq, cpu);

  (void)__read_seqcount_begin(s);
}

but it looked confusing (not paired with a _retry function).
I'll respin with a loop like your suggestion above.  Thanks!

^ permalink raw reply

* Re: BUG:af_packet fails to TX TSO frames
From: Anton Ivanov @ 2017-10-11 13:50 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Network Development, David Miller
In-Reply-To: <CAF=yD-+SxQTHxiU5Yi2f8dm6X1s7SA6PdkbeJP-K_9ZuTMAaTA@mail.gmail.com>



On 10/11/17 14:39, Willem de Bruijn wrote:
> On Wed, Oct 11, 2017 at 4:39 AM, Anton Ivanov
> <anton.ivanov@cambridgegreys.com> wrote:
>> Hi all,
>>
>> I am having an issue with af_packet.c
>>
>> It fails to transmit any TSO frame submitted via raw socket + vnet headers.
>> An identical frame is considered valid for tap.
> This may be due to validation. As of commit 104ba78c9880 ("packet: on
> direct_xmit, limit tso and csum to supported devices") the packet socket
> code validates TSO packets and HW support in the direct_xmit path.

I will look at it. I have tried with bridge+vEth (raw socket on the 
vEth) and directly on a tg3 and e1000e. All of these should be tso capable.

CSUM definitely works in both cases, it is only TSO and only via raw 
socket on TX which is refusing to work.

>
> Do you have a test program or packet (incl. vnet hdr) to reproduce this
> with? I usually test this path with

My test program at present is UML instance with my vector IO patches. 
Latest version just went to the uml mailing list and can be pulled from

http://foswiki.kot-begemot.co.uk/Main/EatYourOwnDogFoodOnUML

I am going to dump one of the frames being produced and isolate it for a 
standalone test case.

>
> https://github.com/wdebruij/kerneltools/blob/master/tests/psock_txring_vnet.c
>
>> The frames are generated out of legit linux skbufs (in UML) and vnet headers
>> work for checksumming on raw, so I should have the raw initialization right.
>>
>> The header is supposedly parsed correctly and the newly formed skbuf is sent
>> to the device transmit routine (or enqueued) . I have debugged it as far as
>> it reaching the following line in packet_snd() (line 2592 in 4.13):
>>
>> err = po->xmit(skb);
> That maps either on to packet_direct_xmit or dev_queue_xmit.

I know. In my case it is direct_xmit as I have asked for QDISC bypass.

>
>> This returns NET_XMIT_DROP for any TSO capable device I tested.
> You can also try
>
>    perf record -a -g -e skb:kfree_skb sleep 10
>    perf report
>
> to see where these packets are dropped.

Thanks, will try that.

>
>> They dislike
>> the frame. Same frame is accepted by tap. I have went through the header
>> parsing and skb allocation code in both af_packet and tap several times and
>> I do not see any material difference (except the new zerocopy stuff). So,
>> frankly, I am stuck.
>>
>> Can someone help me to debug this. I do not see an easy way to debug it, but
>> this is not a part of the kernel I am familiar with. Is there a suitable
>> helper function to try to segment the frame and see exactly what is wrong
>> with it?
>>
>> Cc-ing DaveM as this has no specific maintainer so it falls under his
>> umbrella remit.
>>
>> --
>> Anton R. Ivanov
>>
>> Cambridge Greys Limited, England and Wales company No 10273661
>> http://www.cambridgegreys.com/
>>

-- 
Anton R. Ivanov

Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/

^ permalink raw reply

* Re: [RFC net-next 3/4] mlxsw: spectrum: router: Add support for address validator notifier
From: Ido Schimmel @ 2017-10-11 13:54 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, jiri, idosch, kjlx
In-Reply-To: <1507653665-20540-4-git-send-email-dsahern@gmail.com>

On Tue, Oct 10, 2017 at 09:41:04AM -0700, David Ahern wrote:
> Add support for inetaddr_validator and inet6addr_validator. The
> notifiers provide a means for validating ipv4 and ipv6 addresses
> before the addresses are installed and on failure the error
> is propagated back to the user.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>
> ---
>  drivers/net/ethernet/mellanox/mlxsw/spectrum.c     | 10 ++++
>  drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |  4 ++
>  .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 53 ++++++++++++++++++++++
>  3 files changed, 67 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
> index 321988ac57cc..da4ee91235be 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
> @@ -4505,11 +4505,19 @@ static struct notifier_block mlxsw_sp_netdevice_nb __read_mostly = {
>  	.notifier_call = mlxsw_sp_netdevice_event,
>  };
>  
> +static struct notifier_block mlxsw_sp_inetaddr_valid_nb __read_mostly = {
> +	.notifier_call = mlxsw_sp_inetaddr_valid_event,
> +};
> +
>  static struct notifier_block mlxsw_sp_inetaddr_nb __read_mostly = {
>  	.notifier_call = mlxsw_sp_inetaddr_event,
>  	.priority = 10,	/* Must be called before FIB notifier block */

This line can now be removed since the validator notifier is called
before the inetaddr notifier.

It's used to create a RIF before fib_add_ifaddr() is invoked to create a
prefix route. IPv6 doesn't need that since routes aren't created based
on a NETDEV_UP event in inet6addr.

>  };
>  
> +static struct notifier_block mlxsw_sp_inet6addr_valid_nb __read_mostly = {
> +	.notifier_call = mlxsw_sp_inet6addr_valid_event,
> +};
> +
>  static struct notifier_block mlxsw_sp_inet6addr_nb __read_mostly = {
>  	.notifier_call = mlxsw_sp_inet6addr_event,
>  };
> @@ -4533,7 +4541,9 @@ static int __init mlxsw_sp_module_init(void)
>  	int err;
>  
>  	register_netdevice_notifier(&mlxsw_sp_netdevice_nb);
> +	register_inetaddr_validator_notifier(&mlxsw_sp_inetaddr_valid_nb);
>  	register_inetaddr_notifier(&mlxsw_sp_inetaddr_nb);
> +	register_inet6addr_validator_notifier(&mlxsw_sp_inet6addr_valid_nb);
>  	register_inet6addr_notifier(&mlxsw_sp_inet6addr_nb);
>  	register_netevent_notifier(&mlxsw_sp_router_netevent_nb);

Need unregister in rollback and __exit function.

>  
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
> index 8e45183dc9bb..4865a6f58c83 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
> @@ -390,8 +390,12 @@ int mlxsw_sp_router_netevent_event(struct notifier_block *unused,
>  int mlxsw_sp_netdevice_router_port_event(struct net_device *dev);
>  int mlxsw_sp_inetaddr_event(struct notifier_block *unused,
>  			    unsigned long event, void *ptr);
> +int mlxsw_sp_inetaddr_valid_event(struct notifier_block *unused,
> +				  unsigned long event, void *ptr);
>  int mlxsw_sp_inet6addr_event(struct notifier_block *unused,
>  			     unsigned long event, void *ptr);
> +int mlxsw_sp_inet6addr_valid_event(struct notifier_block *unused,
> +				   unsigned long event, void *ptr);
>  int mlxsw_sp_netdevice_vrf_event(struct net_device *l3_dev, unsigned long event,
>  				 struct netdev_notifier_changeupper_info *info);
>  void
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> index 6a356f4b99a3..7d53fdf2c0a8 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> @@ -5656,6 +5656,33 @@ int mlxsw_sp_inetaddr_event(struct notifier_block *unused,
>  	struct mlxsw_sp_rif *rif;
>  	int err = 0;
>  
> +	/* NETDEV_UP event is handled by mlxsw_sp_inetaddr_valid_event */
> +	if (event == NETDEV_UP)
> +		goto out;
> +
> +	mlxsw_sp = mlxsw_sp_lower_get(dev);
> +	if (!mlxsw_sp)
> +		goto out;
> +
> +	rif = mlxsw_sp_rif_find_by_dev(mlxsw_sp, dev);
> +	if (!mlxsw_sp_rif_should_config(rif, dev, event))
> +		goto out;
> +
> +	err = __mlxsw_sp_inetaddr_event(dev, event);
> +out:
> +	return notifier_from_errno(err);
> +}
> +
> +/* only expected to be called for event == NETDEV_UP */

I don't really mind, but I don't think the comment is necessary.

> +int mlxsw_sp_inetaddr_valid_event(struct notifier_block *unused,
> +				  unsigned long event, void *ptr)
> +{
> +	struct in_validator_info *ivi = (struct in_validator_info *)ptr;

We usually put a space after the cast in mlxsw (I'm aware of the
checkpatch CHECK).

> +	struct net_device *dev = ivi->ivi_dev->dev;
> +	struct mlxsw_sp *mlxsw_sp;
> +	struct mlxsw_sp_rif *rif;
> +	int err = 0;
> +
>  	mlxsw_sp = mlxsw_sp_lower_get(dev);
>  	if (!mlxsw_sp)
>  		goto out;
> @@ -5708,6 +5735,10 @@ int mlxsw_sp_inet6addr_event(struct notifier_block *unused,
>  	struct mlxsw_sp_inet6addr_event_work *inet6addr_work;
>  	struct net_device *dev = if6->idev->dev;
>  
> +	/* NETDEV_UP event is handled by mlxsw_sp_inet6addr_valid_event */
> +	if (event == NETDEV_UP)
> +		return NOTIFY_DONE;
> +
>  	if (!mlxsw_sp_port_dev_lower_find_rcu(dev))
>  		return NOTIFY_DONE;
>  
> @@ -5724,6 +5755,28 @@ int mlxsw_sp_inet6addr_event(struct notifier_block *unused,
>  	return NOTIFY_DONE;
>  }
>  
> +int mlxsw_sp_inet6addr_valid_event(struct notifier_block *unused,
> +				   unsigned long event, void *ptr)
> +{
> +	struct in6_validator_info *i6vi = (struct in6_validator_info *)ptr;

Same as above.

Looks good otherwise, thanks!

> +	struct net_device *dev = i6vi->i6vi_dev->dev;
> +	struct mlxsw_sp *mlxsw_sp;
> +	struct mlxsw_sp_rif *rif;
> +	int err = 0;
> +
> +	mlxsw_sp = mlxsw_sp_lower_get(dev);
> +	if (!mlxsw_sp)
> +		goto out;
> +
> +	rif = mlxsw_sp_rif_find_by_dev(mlxsw_sp, dev);
> +	if (!mlxsw_sp_rif_should_config(rif, dev, event))
> +		goto out;
> +
> +	err = __mlxsw_sp_inetaddr_event(dev, event);
> +out:
> +	return notifier_from_errno(err);
> +}
> +
>  static int mlxsw_sp_rif_edit(struct mlxsw_sp *mlxsw_sp, u16 rif_index,
>  			     const char *mac, int mtu)
>  {
> -- 
> 2.1.4
> 

^ permalink raw reply

* [PATCH] [net-next] ip_tunnel: fix building with NET_IP_TUNNEL=m
From: Arnd Bergmann @ 2017-10-11 13:55 UTC (permalink / raw)
  To: David S. Miller; +Cc: Arnd Bergmann, netdev, linux-kernel

When af_mpls is built-in but the tunnel support is a module,
we get a link failure:

net/mpls/af_mpls.o: In function `mpls_init':
af_mpls.c:(.init.text+0xdc): undefined reference to `ip_tunnel_encap_add_ops'

This adds a Kconfig statement to prevent the broken
configuration and force mpls to be a module as well in
this case.

Fixes: bdc476413dcd ("ip_tunnel: add mpls over gre support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 net/mpls/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/mpls/Kconfig b/net/mpls/Kconfig
index 5c467ef97311..801ea9098387 100644
--- a/net/mpls/Kconfig
+++ b/net/mpls/Kconfig
@@ -24,6 +24,7 @@ config NET_MPLS_GSO

 config MPLS_ROUTING
 	tristate "MPLS: routing support"
+	depends on NET_IP_TUNNEL || NET_IP_TUNNEL=n
 	---help---
 	 Add support for forwarding of mpls packets.

-- 
2.9.0

^ permalink raw reply related

* Re: [PATCH] [net-next] ip_tunnel: fix building with NET_IP_TUNNEL=m
From: Arnd Bergmann @ 2017-10-11 13:57 UTC (permalink / raw)
  To: David S. Miller
  Cc: Arnd Bergmann, Networking, Linux Kernel Mailing List,
	Amine Kherbouche
In-Reply-To: <20171011135546.3536829-1-arnd@arndb.de>

I forgot to Cc Amine, sorry.

On Wed, Oct 11, 2017 at 3:55 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> When af_mpls is built-in but the tunnel support is a module,
> we get a link failure:
>
> net/mpls/af_mpls.o: In function `mpls_init':
> af_mpls.c:(.init.text+0xdc): undefined reference to `ip_tunnel_encap_add_ops'
>
> This adds a Kconfig statement to prevent the broken
> configuration and force mpls to be a module as well in
> this case.
>
> Fixes: bdc476413dcd ("ip_tunnel: add mpls over gre support")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  net/mpls/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/mpls/Kconfig b/net/mpls/Kconfig
> index 5c467ef97311..801ea9098387 100644
> --- a/net/mpls/Kconfig
> +++ b/net/mpls/Kconfig
> @@ -24,6 +24,7 @@ config NET_MPLS_GSO
>
>  config MPLS_ROUTING
>         tristate "MPLS: routing support"
> +       depends on NET_IP_TUNNEL || NET_IP_TUNNEL=n
>         ---help---
>          Add support for forwarding of mpls packets.
>
> --
> 2.9.0
>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox