Netdev List


* Re: [PATCH] net: ethernet: lpc_eth: remove unused local variable
From: David Miller @ 2018-10-20  0:05 UTC
  To: vz; +Cc: slemieux.tyco, netdev
In-Reply-To: <20181018230653.10637-1-vz@mleia.com>

From: Vladimir Zapolskiy <vz@mleia.com>
Date: Fri, 19 Oct 2018 02:06:53 +0300

> A trivial change which removes an unused local variable; the issue
> is reported as a compile-time warning:
> 
>   drivers/net/ethernet/nxp/lpc_eth.c: In function 'lpc_eth_drv_probe':
>   drivers/net/ethernet/nxp/lpc_eth.c:1250:21: warning: variable 'phydev' set but not used [-Wunused-but-set-variable]
>     struct phy_device *phydev;
>                        ^~~~~~
> 
> Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>

Applied.

* Re: [PATCH] net: ethernet: lpc_eth: remove CONFIG_OF guard from the driver
From: David Miller @ 2018-10-20  0:05 UTC
  To: vz; +Cc: slemieux.tyco, netdev
In-Reply-To: <20181018225841.17835-1-vz@mleia.com>

From: Vladimir Zapolskiy <vz@mleia.com>
Date: Fri, 19 Oct 2018 01:58:41 +0300

> The MAC controller device is available on NXP LPC32xx platform only,
> and the LPC32xx platform supports OF builds only, so additional
> checks in the device driver are not needed.
> 
> Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>

Applied.

* Re: [PATCH] net: ethernet: lpc_eth: clean up the list of included headers
From: David Miller @ 2018-10-20  0:05 UTC
  To: vz; +Cc: slemieux.tyco, netdev
In-Reply-To: <20181018225325.4959-1-vz@mleia.com>

From: Vladimir Zapolskiy <vz@mleia.com>
Date: Fri, 19 Oct 2018 01:53:25 +0300

> The change removes all unnecessary included headers from the driver
> source code; the remaining list is sorted in alphabetical order.
> 
> Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>

Applied.

* Re: [PATCH net-next v2] netpoll: allow cleanup to be synchronous
From: David Miller @ 2018-10-19 23:58 UTC
  To: dbanerje; +Cc: nhorman, netdev
In-Reply-To: <20181018151826.8373-1-dbanerje@akamai.com>

From: Debabrata Banerjee <dbanerje@akamai.com>
Date: Thu, 18 Oct 2018 11:18:26 -0400

> This fixes a problem introduced by:
> commit 2cde6acd49da ("netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock")
> 
> When using netconsole on a bond, __netpoll_cleanup can asynchronously
> recurse multiple times: each __netpoll_free_async call can result in
> more __netpoll_free_async calls. This means there is now a race between
> cleanup_work queues on multiple netpoll_info's on multiple devices and
> the configuration of a new netpoll. For example, if a netconsole is set
> to enable 0, reconfigured, and set to enable 1 immediately, this
> netconsole will likely not work.
> 
> __netpoll_free_async exists because it can be called when rtnl is not
> locked; if rtnl is locked, we should be able to execute synchronously.
> It appears to be locked everywhere it's called from.
> 
> Generalize the design pattern from the teaming driver for current
> callers of __netpoll_free_async.
> 
> CC: Neil Horman <nhorman@tuxdriver.com>
> CC: "David S. Miller" <davem@davemloft.net>
> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>

Applied, thank you.
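
A minimal userspace sketch of the difference described above: with always-deferred cleanup the object is still alive when the call returns, which is the window in which a quick disable/reconfigure/enable can race; with synchronous cleanup by a caller that already holds rtnl, it is gone before the caller continues. Everything here is hypothetical: a boolean stands in for "rtnl is held", a linked list stands in for the cleanup workqueue, and none of the names are netpoll's actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for one device's netpoll_info. */
struct np_info {
	int id;
	struct np_info *next_pending;	/* link used only by the deferred path */
};

/* Hypothetical stand-ins: "rtnl is held" and the cleanup work queue. */
static bool rtnl_held;
static struct np_info *pending;	/* objects queued for deferred cleanup */
static int live_objects;	/* allocated and not yet freed */

static struct np_info *np_alloc(int id)
{
	struct np_info *npi = calloc(1, sizeof(*npi));

	if (npi) {
		npi->id = id;
		live_objects++;
	}
	return npi;
}

static void np_free_now(struct np_info *npi)
{
	free(npi);
	live_objects--;
}

/* Always-deferred cleanup: the object outlives this call, so a new
 * configuration can race with the queued work. */
static void np_free_async(struct np_info *npi)
{
	npi->next_pending = pending;
	pending = npi;
}

/* Synchronous cleanup for callers that already hold rtnl: the object is
 * gone before this returns. Refuses to run without the lock. */
static int np_free_sync(struct np_info *npi)
{
	if (!rtnl_held)
		return -1;
	np_free_now(npi);
	return 0;
}

/* What the work queue would eventually run. */
static void np_run_deferred(void)
{
	while (pending) {
		struct np_info *npi = pending;

		pending = npi->next_pending;
		np_free_now(npi);
	}
}
```

The point of the sketch is only the ordering guarantee: after np_free_sync() returns, nothing is left in flight that a subsequent reconfiguration could collide with.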

* Re: [bpf-next v3 0/2] Fix kcm + sockmap by checking psock type
From: John Fastabend @ 2018-10-19 23:38 UTC
  To: Daniel Borkmann, ast, eric.dumazet; +Cc: netdev
In-Reply-To: <81286597-9886-84e0-dcb4-00c15c85fed0@iogearbox.net>

On 10/19/2018 03:57 PM, Daniel Borkmann wrote:
> On 10/20/2018 12:51 AM, Daniel Borkmann wrote:
>> On 10/18/2018 10:58 PM, John Fastabend wrote:
>>> We check if the sk_user_data (the psock in skmsg) is in fact a sockmap
>>> type too late, after we read the refcnt, which is an error. This
>>> series moves the check up before reading refcnt and also adds a test
>>> to test_maps to test trying to add a KCM socket into a sockmap.
>>>
>>> While reviewing this code I also found an issue with KCM and kTLS
>>> where each uses sk_data_ready hooks and associated stream parser
>>> breaking expectations in kcm, ktls or both. But that fix will need
>>> to go to net.
>>>
>>> Thanks to Eric for reporting.
>>>
>>> v2: Fix up file +/- my scripts lost track of them
>>> v3: return EBUSY if refcnt is zero
>>>
>>> John Fastabend (2):
>>>   bpf: skmsg, fix psock create on existing kcm/tls port
>>>   bpf: test_maps add a test to catch kcm + sockmap
>>>
>>>  include/linux/skmsg.h                     | 25 +++++++++---
>>>  net/core/sock_map.c                       | 11 +++---
>>>  tools/testing/selftests/bpf/Makefile      |  2 +-
>>>  tools/testing/selftests/bpf/sockmap_kcm.c | 14 +++++++
>>>  tools/testing/selftests/bpf/test_maps.c   | 64 ++++++++++++++++++++++++++++++-
>>>  5 files changed, 103 insertions(+), 13 deletions(-)
>>>  create mode 100644 tools/testing/selftests/bpf/sockmap_kcm.c
>>
>> Applied, thanks!
> 
> Fyi, I've only applied patch 1/2 for now to get the bug fixed. The patch 2/2 throws
> a bunch of warnings that look like the below. Also, I think we leak kcm socket in
> error paths and once we're done with testing, so would be good to close it once
> unneeded. Please respin the test as a stand-alone commit, thanks:
> 

Thanks, I didn't see the warnings below locally but will look
into spinning a good version tonight with the closing sock fix
as well.

John

> [...]
> bpf-next/tools/testing/selftests/bpf/libbpf.a -lcap -lelf -lrt -lpthread -o /home/darkstar/trees/bpf-next-ok/tools/testing/selftests/bpf/test_maps
> test_maps.c: In function ‘test_sockmap’:
> test_maps.c:869:0: warning: "AF_KCM" redefined
>  #define AF_KCM 41
> 
> In file included from /usr/include/sys/socket.h:38:0,
>                  from test_maps.c:21:
> /usr/include/bits/socket.h:133:0: note: this is the location of the previous definition
>  #define AF_KCM  PF_KCM
> 
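
The fix in patch 1/2 is about ordering: establish that sk_user_data really is a sockmap psock before reading its refcnt, and (per v3) return EBUSY when the refcnt is already zero. A minimal userspace model of that ordering, assuming the owner of sk_user_data can be determined from the socket itself without dereferencing the pointer; the owner tag, structs and psock_get_checked() below are hypothetical stand-ins, not the skmsg.h code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical: which subsystem installed sk_user_data on this socket. */
enum ud_owner { UD_NONE, UD_SOCKMAP, UD_KCM };

/* Hypothetical psock: only the refcount matters here. */
struct psock_sketch {
	int refcnt;
};

struct sock_sketch {
	enum ud_owner owner;	/* known without dereferencing user_data */
	void *user_data;	/* layout depends on owner */
};

/*
 * Return a referenced psock, NULL with *err == 0 if none exists yet,
 * or NULL with *err == -EBUSY if the socket is unusable for sockmap.
 */
static struct psock_sketch *psock_get_checked(struct sock_sketch *sk, int *err)
{
	struct psock_sketch *psock;

	*err = 0;
	if (!sk->user_data)
		return NULL;		/* caller may attach a new psock */
	if (sk->owner != UD_SOCKMAP) {
		*err = -EBUSY;		/* KCM/kTLS data: never read "refcnt" */
		return NULL;
	}
	psock = sk->user_data;
	if (psock->refcnt == 0) {
		*err = -EBUSY;		/* psock is being torn down */
		return NULL;
	}
	psock->refcnt++;		/* kernel code would need an atomic inc-not-zero */
	return psock;
}
```

Checking the owner first means the refcnt field is only ever read from memory that is known to have the psock layout.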

* Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO
From: Martin Lau @ 2018-10-19 23:27 UTC
  To: Edward Cree
  Cc: Yonghong Song, Alexei Starovoitov, daniel@iogearbox.net,
	netdev@vger.kernel.org, Kernel Team
In-Reply-To: <7d251102-2073-98b7-94a6-4dfcc21a3071@solarflare.com>

On Fri, Oct 19, 2018 at 10:26:53PM +0100, Edward Cree wrote:
> On 19/10/18 20:36, Martin Lau wrote:
> > On Fri, Oct 19, 2018 at 06:04:11PM +0100, Edward Cree wrote:
> >> But you *do* have such a new section.
> >> The patch comment talks about a 'FuncInfo Table' which appears to
> > Note that the new section, which contains the FuncInfo Table,
> > is in a new ELF section ".BTF.ext" instead of the ".BTF".
> > It is not in the ".BTF" section because it is only useful during
> > bpf_prog_load().
> I thought it was because it needed to be munged by the loader/linker?
> 
> > IIUC, I think what you are suggesting here is to use (type_id, name)
> > to describe DW_TAG_subprogram "int foo1(int) {}", "int foo2(int) {}",
> > "int foo3(int) {}" where type_id here is referring to the same
> > DW_TAG_subroutine_type, and only define that _one_
> > DW_TAG_subroutine_type in the BTF "type" section.
> Yes, something like that.
> 
> > If the concern is having both FUNC and FUNC_PROTO is confusing,
> The concern is that you're conflating different entities (types
>  and instances); FUNC_PROTO is just a symptom/canary of that.
> 
> > we could go back to the CTF way which adds a new function section
> > in ".BTF" and it is only for DW_TAG_subprogram.
> > BTF_KIND_FUNC_PROTO is then no longer necessary.
> > Some of new BTF verifier checkings may actually go away also.
> > The down side is there will be two id spaces.
> Two id spaces... one for types and the other for subprograms.
> These are different things, so why would you _want_ them to share
>  an id space?  I don't, for instance, see any situation in which
>  you'd want some other record to have a field that could reference
>  either.
> And the 'subprogram id' doesn't have to be just for subprograms;
>  it could be for instances generally — like I've been saying, a
>  variable declaration is to an object type what a subprogram is to
>  a function type, just with a few complications like "subprograms
>  can only appear at file scope, not nested in other functions" and
>  "variables of function type are immutable".
> (I'm assuming that at some point we're going to want to be able to
>  have BTF information for e.g. variables stored on a subprogram's
>  stack, if only for stuff like single-stepping in a debugger in
>  userspace with some sort of mock.  At that point, the variable
>  has to have its own record — you can't just have some sort of
>  magic type record because e.g. "struct foo bar;" has two names,
>  one for the type and one for the variable.)
> 
btf_type is not exactly a C type.

btf_type is debug-info.  Each btf_type carries specific
debug information, and the name is part of that debug-info.
If something carries different debug-info, it is a different btf_type.
For a struct, the member names are part of the btf_type, so a
struct with the same member types but different member names
is a different btf_type.

The same goes for functions.  A function with a different function
name or different argument names is a different btf_type.

> > Discussed a bit offline with folks about the two id spaces
> > situation and it is not good for debugging purpose.
> Could you unpack this a bit more?
Having two id spaces for debug-info is confusing.  They are
all debug-info at the end.
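
One way to picture the single-id-space layout discussed above: each named subprogram gets its own record whose type field points at an anonymous prototype record, so foo1, foo2 and foo3 from Edward's example can share one prototype while still being distinct named records. This is a simplified, hypothetical layout (parameter lists omitted), not the kernel's struct btf_type; under Martin's view, differing argument names would require separate prototype records.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified record kinds; not the real struct btf_type. */
enum rec_kind { K_VOID, K_INT, K_FUNC_PROTO, K_FUNC };

struct rec {
	enum rec_kind kind;
	const char *name;	/* NULL for anonymous records */
	unsigned int type;	/* FUNC: id of its prototype; FUNC_PROTO: return type id */
};

/* A single id space: a record's id is its index in this table.
 * Parameter lists are omitted to keep the sketch short. */
static const struct rec table[] = {
	{ K_VOID, NULL, 0 },		/* id 0 */
	{ K_INT, "int", 0 },		/* id 1 */
	{ K_FUNC_PROTO, NULL, 1 },	/* id 2: int (int), shared */
	{ K_FUNC, "foo1", 2 },		/* id 3 */
	{ K_FUNC, "foo2", 2 },		/* id 4 */
	{ K_FUNC, "foo3", 2 },		/* id 5 */
};

/* Look up a subprogram by name and return the id of its prototype. */
static int func_proto_id(const char *name)
{
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (table[i].kind == K_FUNC && strcmp(table[i].name, name) == 0)
			return (int)table[i].type;
	return -1;
}
```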

* Re: [PATCH ghak90 (was ghak32) V4 08/10] audit: add support for containerid to network namespaces
From: Paul Moore @ 2018-10-19 23:18 UTC
  To: rgb
  Cc: containers, linux-api, linux-audit, linux-fsdevel, linux-kernel,
	netdev, netfilter-devel, ebiederm, luto, carlos, dhowells, viro,
	simo, Eric Paris, Serge Hallyn
In-Reply-To: <5a2b4aadf6994f622bc1ad27a8a6889c7e61edff.1533065887.git.rgb@redhat.com>

On Tue, Jul 31, 2018 at 4:12 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> Audit events could happen in a network namespace outside of a task
> context due to packets received from the net that trigger an auditing
> rule prior to being associated with a running task.  The network
> namespace could be in use by multiple containers by association to the
> tasks in that network namespace.  We still want a way to attribute
> these events to any potential containers.  Keep a list per network
> namespace to track these audit container identifiers.
>
> Add/increment the audit container identifier on:
> - initial setting of the audit container identifier via /proc
> - clone/fork call that inherits an audit container identifier
> - unshare call that inherits an audit container identifier
> - setns call that inherits an audit container identifier
> Delete/decrement the audit container identifier on:
> - an inherited audit container identifier dropped when child set
> - process exit
> - unshare call that drops a net namespace
> - setns call that drops a net namespace
>
> See: https://github.com/linux-audit/audit-kernel/issues/92
> See: https://github.com/linux-audit/audit-testsuite/issues/64
> See: https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  include/linux/audit.h | 17 ++++++++++
>  kernel/audit.c        | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/auditsc.c      |  8 ++++-
>  kernel/nsproxy.c      |  4 +++
>  4 files changed, 114 insertions(+), 1 deletion(-)

...

> @@ -308,6 +312,86 @@ static struct sock *audit_get_sk(const struct net *net)
>         return aunet->sk;
>  }
>
> +/**
> + * audit_get_netns_contid_list - Return the audit container ID list for the given network namespace
> + * @net: the destination network namespace
> + *
> + * Description:
> + * Returns the list pointer if valid, NULL otherwise.  The caller must ensure
> + * that a reference is held for the network namespace while the sock is in use.
> + */
> +struct list_head *audit_get_netns_contid_list(const struct net *net)
> +{
> +       struct audit_net *aunet = net_generic(net, audit_net_id);
> +
> +       return &aunet->contid_list;
> +}
> +
> +spinlock_t *audit_get_netns_contid_list_lock(const struct net *net)
> +{
> +       struct audit_net *aunet = net_generic(net, audit_net_id);
> +
> +       return &aunet->contid_list_lock;
> +}

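Instead of returning the spinlock, just do away with the
audit_get_netns_contid_list_lock() function and create two separate lock
and unlock functions that basically do the net_generic() and spinlock
operations together, for example:

static int audit_netns_contid_lock(const struct net *net)
{
	struct audit_net *aunet = net_generic(net, audit_net_id);

	if (!aunet)
		return -ENOENT;	/* placeholder errno, exact value left open */

	spin_lock(&aunet->contid_list_lock);
	return 0;
}

static void audit_netns_contid_unlock(const struct net *net)
{
	struct audit_net *aunet = net_generic(net, audit_net_id);

	spin_unlock(&aunet->contid_list_lock);
}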

> +void audit_netns_contid_add(struct net *net, u64 contid)
> +{
> +       spinlock_t *lock = audit_get_netns_contid_list_lock(net);
> +       struct list_head *contid_list = audit_get_netns_contid_list(net);
> +       struct audit_contid *cont;
> +
> +       if (!audit_contid_valid(contid))
> +               return;
> +       spin_lock(lock);
> +       if (!list_empty(contid_list))
> +               list_for_each_entry(cont, contid_list, list)
> +                       if (cont->id == contid) {
> +                               refcount_inc(&cont->refcount);
> +                               goto out;
> +                       }
> +       cont = kmalloc(sizeof(struct audit_contid), GFP_KERNEL);
> +       if (cont) {
> +               INIT_LIST_HEAD(&cont->list);
> +               cont->id = contid;
> +               refcount_set(&cont->refcount, 1);
> +               list_add(&cont->list, contid_list);
> +       }
> +out:
> +       spin_unlock(lock);
> +}
> +
> +void audit_netns_contid_del(struct net *net, u64 contid)
> +{
> +       spinlock_t *lock = audit_get_netns_contid_list_lock(net);
> +       struct list_head *contid_list = audit_get_netns_contid_list(net);
> +       struct audit_contid *cont = NULL;
> +
> +       if (!audit_contid_valid(contid))
> +               return;
> +       spin_lock(lock);
> +       if (!list_empty(contid_list))
> +               list_for_each_entry(cont, contid_list, list)
> +                       if (cont->id == contid) {
> +                               list_del(&cont->list);
> +                               if (refcount_dec_and_test(&cont->refcount))
> +                                       kfree(cont);
> +                               break;
> +                       }
> +       spin_unlock(lock);
> +}
> +
> +void audit_switch_task_namespaces(struct nsproxy *ns, struct task_struct *p)
> +{
> +       u64 contid = audit_get_contid(p);
> +       struct nsproxy *new = p->nsproxy;
> +
> +       if (!audit_contid_valid(contid))
> +               return;
> +       audit_netns_contid_del(ns->net_ns, contid);
> +       if (new)
> +               audit_netns_contid_add(new->net_ns, contid);
> +}
> +
>  void audit_panic(const char *message)
>  {
>         switch (audit_failure) {
> @@ -1547,6 +1631,8 @@ static int __net_init audit_net_init(struct net *net)
>                 return -ENOMEM;
>         }
>         aunet->sk->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
> +       INIT_LIST_HEAD(&aunet->contid_list);
> +       spin_lock_init(&aunet->contid_list_lock);
>
>         return 0;
>  }
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 610c6869..fdf3f68 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -75,6 +75,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/fsnotify_backend.h>
>  #include <uapi/linux/limits.h>
> +#include <net/net_namespace.h>
>
>  #include "audit.h"
>
> @@ -2165,6 +2166,7 @@ int audit_set_contid(struct task_struct *task, u64 contid)
>         uid_t uid;
>         struct tty_struct *tty;
>         char comm[sizeof(current->comm)];
> +       struct net *net = task->nsproxy->net_ns;
>
>         task_lock(task);
>         /* Can't set if audit disabled */
> @@ -2186,8 +2188,12 @@ int audit_set_contid(struct task_struct *task, u64 contid)
>         else if (!(thread_group_leader(task) && thread_group_empty(task)))
>                 rc = -EALREADY;
>         read_unlock(&tasklist_lock);
> -       if (!rc)
> +       if (!rc) {
> +               if (audit_contid_valid(oldcontid))
> +                       audit_netns_contid_del(net, oldcontid);
>                 task->audit->contid = contid;
> +               audit_netns_contid_add(net, contid);
> +       }
>         task_unlock(task);
>
>         if (!audit_enabled)
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index f6c5d33..718b120 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -27,6 +27,7 @@
>  #include <linux/syscalls.h>
>  #include <linux/cgroup.h>
>  #include <linux/perf_event.h>
> +#include <linux/audit.h>
>
>  static struct kmem_cache *nsproxy_cachep;
>
> @@ -140,6 +141,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
>         struct nsproxy *old_ns = tsk->nsproxy;
>         struct user_namespace *user_ns = task_cred_xxx(tsk, user_ns);
>         struct nsproxy *new_ns;
> +       u64 contid = audit_get_contid(tsk);
>
>         if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
>                               CLONE_NEWPID | CLONE_NEWNET |
> @@ -167,6 +169,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
>                 return  PTR_ERR(new_ns);
>
>         tsk->nsproxy = new_ns;
> +       audit_netns_contid_add(new_ns->net_ns, contid);
>         return 0;
>  }
>
> @@ -224,6 +227,7 @@ void switch_task_namespaces(struct task_struct *p, struct nsproxy *new)
>         ns = p->nsproxy;
>         p->nsproxy = new;
>         task_unlock(p);
> +       audit_switch_task_namespaces(ns, p);
>
>         if (ns && atomic_dec_and_test(&ns->count))
>                 free_nsproxy(ns);
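
As quoted, audit_netns_contid_del() appears to call list_del() before checking whether other references remain, and audit_netns_contid_add() calls kmalloc(..., GFP_KERNEL), which may sleep, while holding the spinlock. A minimal userspace sketch of a refcounted per-namespace id list that avoids both, assuming a pthread mutex in place of the spinlock; the struct and function names are hypothetical and not the patch's API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-namespace list entry: one per container id. */
struct contid_entry {
	uint64_t id;
	unsigned int refcount;
	struct contid_entry *next;
};

struct netns_sketch {
	pthread_mutex_t lock;		/* stands in for contid_list_lock */
	struct contid_entry *head;
};

/* Allocate before taking the lock; in the kernel a GFP_KERNEL
 * allocation may sleep and must not happen under a spinlock. */
static int contid_add(struct netns_sketch *ns, uint64_t id)
{
	struct contid_entry *fresh = malloc(sizeof(*fresh));
	struct contid_entry *e;

	pthread_mutex_lock(&ns->lock);
	for (e = ns->head; e; e = e->next) {
		if (e->id == id) {
			e->refcount++;
			pthread_mutex_unlock(&ns->lock);
			free(fresh);	/* existing entry reused */
			return 0;
		}
	}
	if (!fresh) {
		pthread_mutex_unlock(&ns->lock);
		return -1;
	}
	fresh->id = id;
	fresh->refcount = 1;
	fresh->next = ns->head;
	ns->head = fresh;
	pthread_mutex_unlock(&ns->lock);
	return 0;
}

/* Drop one reference; unlink and free only when the last one goes. */
static void contid_del(struct netns_sketch *ns, uint64_t id)
{
	struct contid_entry **pp, *e;

	pthread_mutex_lock(&ns->lock);
	for (pp = &ns->head; (e = *pp) != NULL; pp = &e->next) {
		if (e->id == id) {
			if (--e->refcount == 0) {
				*pp = e->next;
				free(e);
			}
			break;
		}
	}
	pthread_mutex_unlock(&ns->lock);
}

/* Current reference count for an id (0 if absent). */
static unsigned int contid_refs(struct netns_sketch *ns, uint64_t id)
{
	struct contid_entry *e;
	unsigned int refs = 0;

	pthread_mutex_lock(&ns->lock);
	for (e = ns->head; e; e = e->next)
		if (e->id == id)
			refs = e->refcount;
	pthread_mutex_unlock(&ns->lock);
	return refs;
}
```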

* Re: [PATCH ghak90 (was ghak32) V4 01/10] audit: collect audit task parameters
From: Paul Moore @ 2018-10-19 23:15 UTC
  To: rgb
  Cc: containers, linux-api, linux-audit, linux-fsdevel, linux-kernel,
	netdev, netfilter-devel, ebiederm, luto, carlos, dhowells, viro,
	simo, Eric Paris, Serge Hallyn
In-Reply-To: <8e617ab568df28a66dfbe3284452de186b42fb0f.1533065887.git.rgb@redhat.com>

On Sun, Aug 5, 2018 at 4:32 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> The audit-related parameters in struct task_struct should ideally be
> collected together and accessed through a standard audit API.
>
> Collect the existing loginuid, sessionid and audit_context together in a
> new struct audit_task_info called "audit" in struct task_struct.
>
> Use kmem_cache to manage this pool of memory.
> Un-inline audit_free() to be able to always recover that memory.
>
> See: https://github.com/linux-audit/audit-kernel/issues/81
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  include/linux/audit.h | 34 ++++++++++++++++++++++++----------
>  include/linux/sched.h |  5 +----
>  init/init_task.c      |  3 +--
>  init/main.c           |  2 ++
>  kernel/auditsc.c      | 51 ++++++++++++++++++++++++++++++++++++++++++---------
>  kernel/fork.c         |  4 +++-
>  6 files changed, 73 insertions(+), 26 deletions(-)

...

> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 9334fbe..8964332 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -219,8 +219,15 @@ static inline void audit_log_task_info(struct audit_buffer *ab,
>
>  /* These are defined in auditsc.c */
>                                 /* Public API */
> +struct audit_task_info {
> +       kuid_t                  loginuid;
> +       unsigned int            sessionid;
> +       struct audit_context    *ctx;
> +};

...

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 87bf02d..e117272 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -873,10 +872,8 @@ struct task_struct {
>
>         struct callback_head            *task_works;
>
> -       struct audit_context            *audit_context;
>  #ifdef CONFIG_AUDITSYSCALL
> -       kuid_t                          loginuid;
> -       unsigned int                    sessionid;
> +       struct audit_task_info          *audit;
>  #endif
>         struct seccomp                  seccomp;

Prior to this patch audit_context was available regardless of
CONFIG_AUDITSYSCALL, after this patch the corresponding audit_context
is only available when CONFIG_AUDITSYSCALL is defined.

> diff --git a/init/main.c b/init/main.c
> index 3b4ada1..6aba171 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -92,6 +92,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/audit.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -721,6 +722,7 @@ asmlinkage __visible void __init start_kernel(void)
>         nsfs_init();
>         cpuset_init();
>         cgroup_init();
> +       audit_task_init();
>         taskstats_init_early();
>         delayacct_init();

It seems like we would need either init_struct_audit or
audit_task_init(), but not both, yes?

> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index fb20746..88779a7 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -841,7 +841,7 @@ static inline struct audit_context *audit_take_context(struct task_struct *tsk,
>                                                       int return_valid,
>                                                       long return_code)
>  {
> -       struct audit_context *context = tsk->audit_context;
> +       struct audit_context *context = tsk->audit->ctx;
>
>         if (!context)
>                 return NULL;
> @@ -926,6 +926,15 @@ static inline struct audit_context *audit_alloc_context(enum audit_state state)
>         return context;
>  }
>
> +static struct kmem_cache *audit_task_cache;
> +
> +void __init audit_task_init(void)
> +{
> +       audit_task_cache = kmem_cache_create("audit_task",
> +                                            sizeof(struct audit_task_info),
> +                                            0, SLAB_PANIC, NULL);
> +}

This is somewhat related to the CONFIG_AUDITSYSCALL comment above, but
since the audit_task_info contains generic audit state (not just
syscall related state), it seems like this, and the audit_task_info
accessors/helpers, should live in kernel/audit.c.

There are probably a few other things that should move to
kernel/audit.c too, e.g. audit_alloc().  Have you verified that this
builds/runs correctly on architectures that define CONFIG_AUDIT but
not CONFIG_AUDITSYSCALL?

>  /**
>   * audit_alloc - allocate an audit context block for a task
>   * @tsk: task
> @@ -940,17 +949,28 @@ int audit_alloc(struct task_struct *tsk)
>         struct audit_context *context;
>         enum audit_state     state;
>         char *key = NULL;
> +       struct audit_task_info *info;
> +
> +       info = kmem_cache_zalloc(audit_task_cache, GFP_KERNEL);
> +       if (!info)
> +               return -ENOMEM;
> +       info->loginuid = audit_get_loginuid(current);
> +       info->sessionid = audit_get_sessionid(current);
> +       tsk->audit = info;
>
>         if (likely(!audit_ever_enabled))
>                 return 0; /* Return if not auditing. */

I don't view this as necessary for initial acceptance, and
synchronization/locking might render this undesirable, but it would be
curious to see if we could do something clever with refcnts and
copy-on-write to minimize the number of kmem_cache objects in use in
the !audit_ever_enabled (and possibly the AUDIT_DISABLED) case.

>         state = audit_filter_task(tsk, &key);
>         if (state == AUDIT_DISABLED) {
> +               audit_set_context(tsk, NULL);

It's already NULL, isn't it?

>                 clear_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
>                 return 0;
>         }
>
>         if (!(context = audit_alloc_context(state))) {
> +               tsk->audit = NULL;
> +               kmem_cache_free(audit_task_cache, info);
>                 kfree(key);
>                 audit_log_lost("out of memory in audit_alloc");
>                 return -ENOMEM;
> @@ -962,6 +982,12 @@ int audit_alloc(struct task_struct *tsk)
>         return 0;
>  }
>
> +struct audit_task_info init_struct_audit = {
> +       .loginuid = INVALID_UID,
> +       .sessionid = AUDIT_SID_UNSET,
> +       .ctx = NULL,
> +};
> +
>  static inline void audit_free_context(struct audit_context *context)
>  {
>         audit_free_names(context);

* Re: [PATCH v8 bpf-next 0/2] bpf: add cg_skb_is_valid_access
From: Alexei Starovoitov @ 2018-10-19 23:09 UTC
  To: Song Liu; +Cc: netdev, ast, daniel, kernel-team, edumazet
In-Reply-To: <20181019165758.1410213-1-songliubraving@fb.com>

On Fri, Oct 19, 2018 at 09:57:56AM -0700, Song Liu wrote:
> Changes v7 -> v8:
> 1. Dynamically allocate the dummy sk to avoid race conditions.
> 
> Changes v6 -> v7:
> 1. Make dummy sk a global variable (test_run_sk).
> 
> Changes v5 -> v6:
> 1. Fixed dummy sk in bpf_prog_test_run_skb() as suggested by Eric Dumazet.
> 
> Changes v4 -> v5:
> 1. Replaced bpf_compute_and_save_data_pointers() with
>    bpf_compute_and_save_data_end();
>    Replaced bpf_restore_data_pointers() with bpf_restore_data_end().
> 2. Fixed indentation in test_verifier.c
> 
> Changes v3 -> v4:
> 1. Fixed crash issue reported by Alexei.
> 
> Changes v2 -> v3:
> 1. Added helper function bpf_compute_and_save_data_pointers() and
>    bpf_restore_data_pointers().
> 
> Changes v1 -> v2:
> 1. Updated the list of read-only fields, and read-write fields.
> 2. Added dummy sk to bpf_prog_test_run_skb().
> 
> This set enables BPF programs of type BPF_PROG_TYPE_CGROUP_SKB to access
> some __sk_buff data directly.

Applied, Thanks
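
The v4 -> v5 note names a save/restore pair (bpf_compute_and_save_data_end() / bpf_restore_data_end()) around program execution. A minimal userspace sketch of that pattern, assuming a mock skb whose data_end slot lives in a control buffer the caller also relies on; the struct and function names are hypothetical stand-ins, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock of the parts of an skb the pattern touches. */
struct skb_sketch {
	unsigned char *data;
	size_t headlen;		/* length of the linear data */
	void *cb_data_end;	/* stands in for the data_end slot in skb->cb */
};

/* Save the caller's value, then point data_end at the end of the
 * linear data, which is what the program is allowed to read up to. */
static void compute_and_save_data_end(struct skb_sketch *skb, void **saved)
{
	*saved = skb->cb_data_end;
	skb->cb_data_end = skb->data + skb->headlen;
}

static void restore_data_end(struct skb_sketch *skb, void *saved)
{
	skb->cb_data_end = saved;
}

/* Run a "program" with a valid data_end, leaving cb as the caller had it. */
static int run_with_data_end(struct skb_sketch *skb,
			     int (*prog)(const struct skb_sketch *))
{
	void *saved;
	int ret;

	compute_and_save_data_end(skb, &saved);
	ret = prog(skb);
	restore_data_end(skb, saved);
	return ret;
}

/* Example program: report how many bytes of direct access it gets. */
static int prog_len(const struct skb_sketch *skb)
{
	return (int)((unsigned char *)skb->cb_data_end - skb->data);
}
```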

* Re: [PATCH bpf-next] bpf: Extend the sk_lookup() helper to XDP hookpoint.
From: Nitin Hande @ 2018-10-19 23:04 UTC
  To: Daniel Borkmann
  Cc: Joe Stringer, Martin KaFai Lau, netdev, ast, Jesper Brouer,
	john fastabend
In-Reply-To: <6c530eaa-c4dd-bcf9-fce5-1f9d66b8efe3@iogearbox.net>

On Fri, 19 Oct 2018 22:32:28 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 10/19/2018 06:47 PM, Joe Stringer wrote:
> > On Thu, 18 Oct 2018 at 22:07, Martin Lau <kafai@fb.com> wrote:  
> >> On Thu, Oct 18, 2018 at 04:52:40PM -0700, Joe Stringer wrote:  
> >>> On Thu, 18 Oct 2018 at 14:20, Daniel Borkmann
> >>> <daniel@iogearbox.net> wrote:  
> >>>> On 10/18/2018 11:06 PM, Joe Stringer wrote:  
> >>>>> On Thu, 18 Oct 2018 at 11:54, Nitin Hande
> >>>>> <nitin.hande@gmail.com> wrote:  
> >>>> [...]  
> >>>>>> Open Issue
> >>>>>> * The underlying code relies on presence of an skb to find out
> >>>>>> the right sk for the case of REUSEPORT socket option. Since
> >>>>>> there is no skb available at XDP hookpoint, the helper
> >>>>>> function will return the first available sk based off the 5
> >>>>>> tuple hash. If the desire is to return a particular sk
> >>>>>> matching reuseport_cb function, please suggest a way to tackle
> >>>>>> it, which can be addressed in a future commit.  
> >>>>  
> >>>>>> Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>  
> >>>>>
> >>>>> Thanks Nitin, LGTM overall.
> >>>>>
> >>>>> The REUSEPORT thing suggests that the usage of this helper from
> >>>>> XDP layer may lead to a different socket being selected vs. the
> >>>>> equivalent call at TC hook, or other places where the selection
> >>>>> may occur. This could be a bit counter-intuitive.
> >>>>>
> >>>>> One thought I had to work around this was to introduce a flag,
> >>>>> something like BPF_F_FIND_REUSEPORT_SK_BY_HASH. This flag would
> >>>>> effectively communicate in the API that the bpf_sk_lookup_xxx()
> >>>>> functions will only select a REUSEPORT socket based on the hash
> >>>>> and not by, for example BPF_PROG_TYPE_SK_REUSEPORT programs.
> >>>>> The absence of the flag would support finding REUSEPORT sockets
> >>>>> by other mechanisms (which would be allowed for now from TC
> >>>>> hooks but would be disallowed from XDP, since there's no
> >>>>> specific plan to support this).  
> >>>>
> >>>> Hmm, given skb is NULL here the only way to lookup the socket in
> >>>> such scenario is based on hash, that is, inet_ehashfn() /
> >>>> inet6_ehashfn(), perhaps alternative is to pass this hash in
> >>>> from XDP itself to the helper so it could be custom selector. Do
> >>>> you have a specific use case on this for XDP (just curious)?  
> >>>
> >>> I don't have a use case for SO_REUSEPORT introspection from XDP,
> >>> so I'm primarily thinking from the perspective of making the
> >>> behaviour clear in the API in a way that leaves open the
> >>> possibility for a reasonable implementation in future. From that
> >>> perspective, my main concern is that it may surprise some BPF
> >>> writers that the same "bpf_sk_lookup_tcp()" call (with identical
> >>> parameters) may have different behaviour at TC vs. XDP layers, as
> >>> the BPF selection of sockets is respected at TC but not at XDP.
> >>>
> >>> FWIW we're already out of parameters for the actual call, so if we
> >>> wanted to allow passing a hash in, we'd need to either dedicate
> >>> half the 'flags' field for this configurable hash, or consider
> >>> adding the new hash parameter to 'struct bpf_sock_tuple'.
> >>>
> >>> +Martin for any thoughts on SO_REUSEPORT and XDP here.  
> >> The XDP/TC prog has read access to the sk fields through
> >> 'struct bpf_sock'?
> >>
> >> A quick thought...
> >> Considering all sk in the same reuse->socks[] share
> >> many things (e.g. family,type,protocol,ip,port..etc are the same),
> >> I wonder returning which particular sk from reuse->socks[] will
> >> matter too much since most of the fields from 'struct bpf_sock'
> >> will be the same.  Some of fields in 'struct bpf_sock' could be
> >> different though, like priority?  Hence, another possibility is to
> >> limit the accessible fields for the XDP prog.  Only allow
> >> accessing the fields that must be the same among the sk in the
> >> same reuse->socks[].  
> > 
> > This sounds pretty reasonable to me.  
> 
> Agree, and in any case this difference in returned sk selection should
> probably also be documented in the uapi helper description.

Okay, will do in a v2.

Thanks
Nitin
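
Martin's suggestion can be modelled as an access check: at XDP the helper may return any member of the reuseport group, so the program should only be allowed to read fields that are identical across reuse->socks[]. A hypothetical sketch; the enum is an illustrative subset named after struct bpf_sock fields, and the hook switch stands in for whatever per-program-type access check a real implementation would use.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of struct bpf_sock fields. */
enum sock_field {
	F_FAMILY, F_TYPE, F_PROTOCOL, F_SRC_IP4, F_SRC_PORT,
	F_PRIORITY, F_MARK,
};

enum hook { HOOK_TC, HOOK_XDP };

/* Fields that are the same for every socket in one reuseport group. */
static bool field_shared_in_reuseport_group(enum sock_field f)
{
	switch (f) {
	case F_FAMILY:
	case F_TYPE:
	case F_PROTOCOL:
	case F_SRC_IP4:
	case F_SRC_PORT:
		return true;
	default:
		return false;	/* e.g. priority and mark may differ per socket */
	}
}

/* At XDP the returned sk may be any group member, so only expose fields
 * whose value cannot depend on which member was picked. */
static bool sock_field_readable(enum hook h, enum sock_field f)
{
	if (h == HOOK_XDP)
		return field_shared_in_reuseport_group(f);
	return true;
}
```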

* Re: [bpf-next v3 0/2] Fix kcm + sockmap by checking psock type
From: Daniel Borkmann @ 2018-10-19 22:57 UTC
  To: John Fastabend, ast, eric.dumazet; +Cc: netdev
In-Reply-To: <76735602-6af6-bc03-ee66-294987e768e5@iogearbox.net>

On 10/20/2018 12:51 AM, Daniel Borkmann wrote:
> On 10/18/2018 10:58 PM, John Fastabend wrote:
>> We check if the sk_user_data (the psock in skmsg) is in fact a sockmap
>> type too late, after we read the refcnt, which is an error. This
>> series moves the check up before reading refcnt and also adds a test
>> to test_maps to test trying to add a KCM socket into a sockmap.
>>
>> While reviewing this code I also found an issue with KCM and kTLS
>> where each uses sk_data_ready hooks and associated stream parser
>> breaking expectations in kcm, ktls or both. But that fix will need
>> to go to net.
>>
>> Thanks to Eric for reporting.
>>
>> v2: Fix up file +/- my scripts lost track of them
>> v3: return EBUSY if refcnt is zero
>>
>> John Fastabend (2):
>>   bpf: skmsg, fix psock create on existing kcm/tls port
>>   bpf: test_maps add a test to catch kcm + sockmap
>>
>>  include/linux/skmsg.h                     | 25 +++++++++---
>>  net/core/sock_map.c                       | 11 +++---
>>  tools/testing/selftests/bpf/Makefile      |  2 +-
>>  tools/testing/selftests/bpf/sockmap_kcm.c | 14 +++++++
>>  tools/testing/selftests/bpf/test_maps.c   | 64 ++++++++++++++++++++++++++++++-
>>  5 files changed, 103 insertions(+), 13 deletions(-)
>>  create mode 100644 tools/testing/selftests/bpf/sockmap_kcm.c
> 
> Applied, thanks!

Fyi, I've only applied patch 1/2 for now to get the bug fixed. Patch 2/2 throws
a bunch of warnings like the one below. Also, I think we leak the kcm socket in
error paths and once we're done with testing, so it would be good to close it once
it is no longer needed. Please respin the test as a stand-alone commit, thanks:

[...]
bpf-next/tools/testing/selftests/bpf/libbpf.a -lcap -lelf -lrt -lpthread -o /home/darkstar/trees/bpf-next-ok/tools/testing/selftests/bpf/test_maps
test_maps.c: In function ‘test_sockmap’:
test_maps.c:869:0: warning: "AF_KCM" redefined
 #define AF_KCM 41

In file included from /usr/include/sys/socket.h:38:0,
                 from test_maps.c:21:
/usr/include/bits/socket.h:133:0: note: this is the location of the previous definition
 #define AF_KCM  PF_KCM

^ permalink raw reply

* Re: [bpf-next v3 0/2] Fix kcm + sockmap by checking psock type
From: Daniel Borkmann @ 2018-10-19 22:51 UTC (permalink / raw)
  To: John Fastabend, ast, eric.dumazet; +Cc: netdev
In-Reply-To: <1539896316-13403-1-git-send-email-john.fastabend@gmail.com>

On 10/18/2018 10:58 PM, John Fastabend wrote:
> We check if the sk_user_data (the psock in skmsg) is in fact a sockmap
> type too late, after we read the refcnt, which is an error. This
> series moves the check up before reading refcnt and also adds a test
> to test_maps to test trying to add a KCM socket into a sockmap.
> 
> While reviewing this code I also found an issue with KCM and kTLS,
> where each uses the sk_data_ready hooks and an associated stream parser,
> breaking expectations in kcm, ktls or both. But that fix will need
> to go to net.
> 
> Thanks to Eric for reporting.
> 
> v2: Fix up file +/- my scripts lost track of them
> v3: return EBUSY if refcnt is zero
> 
> John Fastabend (2):
>   bpf: skmsg, fix psock create on existing kcm/tls port
>   bpf: test_maps add a test to catch kcm + sockmap
> 
>  include/linux/skmsg.h                     | 25 +++++++++---
>  net/core/sock_map.c                       | 11 +++---
>  tools/testing/selftests/bpf/Makefile      |  2 +-
>  tools/testing/selftests/bpf/sockmap_kcm.c | 14 +++++++
>  tools/testing/selftests/bpf/test_maps.c   | 64 ++++++++++++++++++++++++++++++-
>  5 files changed, 103 insertions(+), 13 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/sockmap_kcm.c

Applied, thanks!

^ permalink raw reply

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
From: Eric Dumazet @ 2018-10-19 22:25 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet, andre
  Cc: Stephen Hemminger, netdev, rossi.f, Dimitris Michailidis
In-Reply-To: <4693819f-4a76-532f-9b24-d4328183c807@gmail.com>



On 10/19/2018 02:58 PM, Eric Dumazet wrote:
> 
> 
> On 10/16/2018 06:00 AM, Eric Dumazet wrote:
>> On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt <andre@tomt.net> wrote:
>>>
>>> On 15.10.2018 17:41, Eric Dumazet wrote:
>>>> On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
>>>>> Something changed between 4.17.12 and 4.18; after bisecting the problem I
>>>>> got the following first bad commit:
>>>>>
>>>>> commit 88078d98d1bb085d72af8437707279e203524fa5
>>>>> Author: Eric Dumazet <edumazet@google.com>
>>>>> Date:   Wed Apr 18 11:43:15 2018 -0700
>>>>>
>>>>>      net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
>>>>>
>>>>>      After working on IP defragmentation lately, I found that some large
>>>>>      packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
>>>>>      zero paddings on the last (small) fragment.
>>>>>
>>>>>      While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
>>>>>      to CHECKSUM_NONE, forcing a full csum validation, even if all prior
>>>>>      fragments had CHECKSUM_COMPLETE set.
>>>>>
>>>>>      We can instead compute the checksum of the part we are trimming,
>>>>>      usually smaller than the part we keep.
>>>>>
>>>>>      Signed-off-by: Eric Dumazet <edumazet@google.com>
>>>>>      Signed-off-by: David S. Miller <davem@davemloft.net>
>>>>>
>>>>
>>>> Thanks for bisecting !
>>>>
>>>> This commit is known to expose some NIC/driver bugs.
>>>>
>>>> Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
>>>> ("net: sungem: fix rx checksum support")  for one driver needing a fix.
>>>>
>>>> I assume SKY2_HW_NEW_LE is not set on your NIC ?
>>>>
>>>
>>> I've seen something similar on several systems with mlx4 cards when using 4.18.x -
>>> that is, hw csum failure followed by some backtrace.
>>>
>>> It only seems to happen on systems dealing with quite a bit of UDP.
>>>
>>
>> Strange, because mlx4 on IPv6+UDP should not use CHECKSUM_COMPLETE
>> but CHECKSUM_UNNECESSARY.
>>
>> It would be nice to track this a bit further, maybe by providing the
>> full packet content.
>>
>>> Example from 4.18.10:
>>>> [635607.740574] p0xe0: hw csum failure
>>>> [635607.740598] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1
>>>> [635607.740599] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
>>>> [635607.740599] Call Trace:
>>>> [635607.740602]  <IRQ>
>>>> [635607.740611]  dump_stack+0x5c/0x7b
>>>> [635607.740617]  __skb_gro_checksum_complete+0x9a/0xa0
>>>> [635607.740621]  udp6_gro_receive+0x211/0x290
>>>> [635607.740624]  ipv6_gro_receive+0x1a8/0x390
>>>> [635607.740627]  dev_gro_receive+0x33e/0x550
>>>> [635607.740628]  napi_gro_frags+0xa2/0x210
>>>> [635607.740635]  mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
>>>> [635607.740648]  ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
>>>> [635607.740654]  ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
>>>> [635607.740657]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
>>>> [635607.740658]  net_rx_action+0xe0/0x2e0
>>>> [635607.740662]  __do_softirq+0xd8/0x2e5
>>>> [635607.740666]  irq_exit+0xb4/0xc0
>>>> [635607.740667]  do_IRQ+0x85/0xd0
>>>> [635607.740670]  common_interrupt+0xf/0xf
>>>> [635607.740671]  </IRQ>
>>>> [635607.740675] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0
>>>> [635607.740675] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
>>>> [635607.740701] RSP: 0018:ffffa5c206353ea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
>>>> [635607.740703] RAX: ffff8d72ffd20f00 RBX: 00024214f597c5b0 RCX: 000000000000001f
>>>> [635607.740703] RDX: 00024214f597c5b0 RSI: 0000000000020780 RDI: 0000000000000000
>>>> [635607.740704] RBP: 0000000000000004 R08: 002542bfbefa99fa R09: 00000000ffffffff
>>>> [635607.740705] R10: ffffa5c206353e88 R11: 00000000000000c5 R12: ffffffffaf0aaf78
>>>> [635607.740706] R13: ffff8d72ffd297d8 R14: 0000000000000000 R15: 00024214f58c2ed5
>>>> [635607.740709]  ? cpuidle_enter_state+0x91/0x2a0
>>>> [635607.740712]  do_idle+0x1d0/0x240
>>>> [635607.740715]  cpu_startup_entry+0x5f/0x70
>>>> [635607.740719]  start_secondary+0x185/0x1a0
>>>> [635607.740722]  secondary_startup_64+0xa5/0xb0
>>>> [635607.740731] p0xe0: hw csum failure
>>>> [635607.740745] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1
>>>> [635607.740746] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
>>>> [635607.740746] Call Trace:
>>>> [635607.740747]  <IRQ>
>>>> [635607.740750]  dump_stack+0x5c/0x7b
>>>> [635607.740755]  __skb_checksum_complete+0xb8/0xd0
>>>> [635607.740760]  __udp6_lib_rcv+0xa6b/0xa70
>>>> [635607.740767]  ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
>>>> [635607.740770]  ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
>>>> [635607.740774]  ip6_input_finish+0xc0/0x460
>>>> [635607.740776]  ip6_input+0x2b/0x90
>>>> [635607.740778]  ? ip6_rcv_finish+0x110/0x110
>>>> [635607.740780]  ipv6_rcv+0x2cd/0x4b0
>>>> [635607.740783]  ? udp6_lib_lookup_skb+0x59/0x80
>>>> [635607.740785]  __netif_receive_skb_core+0x455/0xb30
>>>> [635607.740788]  ? ipv6_gro_receive+0x1a8/0x390
>>>> [635607.740790]  ? netif_receive_skb_internal+0x24/0xb0
>>>> [635607.740792]  netif_receive_skb_internal+0x24/0xb0
>>>> [635607.740793]  napi_gro_frags+0x165/0x210
>>>> [635607.740796]  mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
>>>> [635607.740802]  ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
>>>> [635607.740807]  ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
>>>> [635607.740810]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
>>>> [635607.740811]  net_rx_action+0xe0/0x2e0
>>>> [635607.740813]  __do_softirq+0xd8/0x2e5
>>>> [635607.740816]  irq_exit+0xb4/0xc0
>>>> [635607.740817]  do_IRQ+0x85/0xd0
>>>> [635607.740820]  common_interrupt+0xf/0xf
>>>> [635607.740821]  </IRQ>
>>>> [635607.740823] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0
>>>> [635607.740823] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
>>>> [635607.740848] RSP: 0018:ffffa5c206353ea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
>>>> [635607.740849] RAX: ffff8d72ffd20f00 RBX: 00024214f597c5b0 RCX: 000000000000001f
>>>> [635607.740850] RDX: 00024214f597c5b0 RSI: 0000000000020780 RDI: 0000000000000000
>>>> [635607.740851] RBP: 0000000000000004 R08: 002542bfbefa99fa R09: 00000000ffffffff
>>>> [635607.740852] R10: ffffa5c206353e88 R11: 00000000000000c5 R12: ffffffffaf0aaf78
>>>> [635607.740853] R13: ffff8d72ffd297d8 R14: 0000000000000000 R15: 00024214f58c2ed5
>>>> [635607.740855]  ? cpuidle_enter_state+0x91/0x2a0
>>>> [635607.740857]  do_idle+0x1d0/0x240
>>>> [635607.740859]  cpu_startup_entry+0x5f/0x70
>>>> [635607.740861]  start_secondary+0x185/0x1a0
>>>> [635607.740863]  secondary_startup_64+0xa5/0xb0
> 
> As a matter of fact, Dimitris found the issue in the patch and is working on a fix involving csum_block_sub().
> 
> The problem comes from trimming an odd number of bytes.

More exactly, trimming bytes starting at an odd offset.

^ permalink raw reply

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
From: Eric Dumazet @ 2018-10-19 21:58 UTC (permalink / raw)
  To: Eric Dumazet, andre
  Cc: Stephen Hemminger, netdev, rossi.f, Dimitris Michailidis
In-Reply-To: <CANn89i+Q2z6DZx3q4KR7sU2hY6td2B61GMtyroLmTXFqJH4XRw@mail.gmail.com>



On 10/16/2018 06:00 AM, Eric Dumazet wrote:
> On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt <andre@tomt.net> wrote:
>>
>> On 15.10.2018 17:41, Eric Dumazet wrote:
>>> On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
>>>> Something changed between 4.17.12 and 4.18; after bisecting the problem I
>>>> got the following first bad commit:
>>>>
>>>> commit 88078d98d1bb085d72af8437707279e203524fa5
>>>> Author: Eric Dumazet <edumazet@google.com>
>>>> Date:   Wed Apr 18 11:43:15 2018 -0700
>>>>
>>>>      net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
>>>>
>>>>      After working on IP defragmentation lately, I found that some large
>>>>      packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
>>>>      zero paddings on the last (small) fragment.
>>>>
>>>>      While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
>>>>      to CHECKSUM_NONE, forcing a full csum validation, even if all prior
>>>>      fragments had CHECKSUM_COMPLETE set.
>>>>
>>>>      We can instead compute the checksum of the part we are trimming,
>>>>      usually smaller than the part we keep.
>>>>
>>>>      Signed-off-by: Eric Dumazet <edumazet@google.com>
>>>>      Signed-off-by: David S. Miller <davem@davemloft.net>
>>>>
>>>
>>> Thanks for bisecting !
>>>
>>> This commit is known to expose some NIC/driver bugs.
>>>
>>> Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
>>> ("net: sungem: fix rx checksum support")  for one driver needing a fix.
>>>
>>> I assume SKY2_HW_NEW_LE is not set on your NIC ?
>>>
>>
>> I've seen something similar on several systems with mlx4 cards when using 4.18.x -
>> that is, hw csum failure followed by some backtrace.
>>
>> It only seems to happen on systems dealing with quite a bit of UDP.
>>
> 
> Strange, because mlx4 on IPv6+UDP should not use CHECKSUM_COMPLETE
> but CHECKSUM_UNNECESSARY.
> 
> It would be nice to track this a bit further, maybe by providing the
> full packet content.
> 
>> Example from 4.18.10:
>>> [...]

As a matter of fact, Dimitris found the issue in the patch and is working on a fix involving csum_block_sub().

The problem comes from trimming an odd number of bytes.

^ permalink raw reply

* Re: [PATCH v2 net] r8169: fix NAPI handling under high load
From: Francois Romieu @ 2018-10-19 21:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Heiner Kallweit, David Miller, Realtek linux nic maintainers,
	netdev@vger.kernel.org
In-Reply-To: <bcb1b9e9-150d-38d1-d61e-2e0bf440d310@gmail.com>

Eric Dumazet <eric.dumazet@gmail.com> :
> On 10/18/2018 03:59 PM, Francois Romieu wrote:
> > Eric Dumazet <eric.dumazet@gmail.com> :
> > [...]
> >> One has to wonder why rtl8169_poll(), which might be called in a loop under DOS,
> >> has to call rtl_ack_events() ?
> > 
> > So as to cover a wider temporal range before any event can trigger an
> > extra irq. I was more worried about irq cost than about IO cost (and
> > I still am).
> > 
> Normally the IRQ would not be enabled under DOS.

Yes.

My concern was not the DOS situation when NAPI runs at full speed.
As far as I was able to experiment with it, the driver did not seem
too bad here.

The location of the ack targets the interim situation where the IRQ
rate can increase before NAPI kicks in. By increasing the time range
whose events can be acked, the maximum irq rate should be lowered.

-- 
Ueimor

^ permalink raw reply

* Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO
From: Edward Cree @ 2018-10-19 21:26 UTC (permalink / raw)
  To: Martin Lau
  Cc: Yonghong Song, Alexei Starovoitov, daniel@iogearbox.net,
	netdev@vger.kernel.org, Kernel Team
In-Reply-To: <20181019193548.g4neaudunoan5gvn@kafai-mbp.dhcp.thefacebook.com>

On 19/10/18 20:36, Martin Lau wrote:
> On Fri, Oct 19, 2018 at 06:04:11PM +0100, Edward Cree wrote:
>> But you *do* have such a new section.
>> The patch comment talks about a 'FuncInfo Table' which appears to
> Note that the new section, which contains the FuncInfo Table,
> is in a new ELF section ".BTF.ext" instead of the ".BTF".
> It is not in the ".BTF" section because it is only useful during
> bpf_prog_load().
I thought it was because it needed to be munged by the loader/linker?

> IIUC, I think what you are suggesting here is to use (type_id, name)
> to describe DW_TAG_subprogram "int foo1(int) {}", "int foo2(int) {}",
> "int foo3(int) {}" where type_id here is referring to the same
> DW_TAG_subroutine_type, and only define that _one_
> DW_TAG_subroutine_type in the BTF "type" section.
Yes, something like that.

> If the concern is having both FUNC and FUNC_PROTO is confusing,
The concern is that you're conflating different entities (types
 and instances); FUNC_PROTO is just a symptom/canary of that.

> we could go back to the CTF way which adds a new function section
> in ".BTF" and it is only for DW_TAG_subprogram.
> BTF_KIND_FUNC_PROTO is then no longer necessary.
> Some of new BTF verifier checkings may actually go away also.
> The down side is there will be two id spaces.
Two id spaces... one for types and the other for subprograms.
These are different things, so why would you _want_ them to share
 an id space?  I don't, for instance, see any situation in which
 you'd want some other record to have a field that could reference
 either.
And the 'subprogram id' doesn't have to be just for subprograms;
 it could be for instances generally — like I've been saying, a
 variable declaration is to an object type what a subprogram is to
 a function type, just with a few complications like "subprograms
 can only appear at file scope, not nested in other functions" and
 "variables of function type are immutable".
(I'm assuming that at some point we're going to want to be able to
 have BTF information for e.g. variables stored on a subprogram's
 stack, if only for stuff like single-stepping in a debugger in
 userspace with some sort of mock.  At that point, the variable
 has to have its own record — you can't just have some sort of
 magic type record because e.g. "struct foo bar;" has two names,
 one for the type and one for the variable.)

> Discussed a bit offline with folks about the two id spaces
> situation and it is not good for debugging purpose.
Could you unpack this a bit more?

-Ed

^ permalink raw reply

* (unknown)
From: David Miller @ 2018-10-19 20:58 UTC (permalink / raw)
  To: dhowells; +Cc: netdev, linux-afs
In-Reply-To: <6068.1539982295@warthog.procyon.org.uk>

From: David Howells <dhowells@redhat.com>
Date: Fri, 19 Oct 2018 21:51:35 +0100

> David Miller <davem@davemloft.net> wrote:
> 
>> > Is there going to be a merge of net into net-next before the merge
>> > window opens?  Or do you have a sample merge that I can rebase my
>> > afs-next branch on?
>> 
>> I'll be doing a net to net-next merge some time today.
> 
> Excellent, thanks!

And this is now complete.

^ permalink raw reply

* (unknown)
From: David Howells @ 2018-10-19 20:51 UTC (permalink / raw)
  To: David Miller; +Cc: dhowells, netdev, linux-afs
In-Reply-To: <20181019.104625.891840017846483385.davem@davemloft.net>

David Miller <davem@davemloft.net> wrote:

> > Is there going to be a merge of net into net-next before the merge
> > window opens?  Or do you have a sample merge that I can rebase my
> > afs-next branch on?
> 
> I'll be doing a net to net-next merge some time today.

Excellent, thanks!

David

^ permalink raw reply

* RE: [PATCH net-next v2] netpoll: allow cleanup to be synchronous
From: Banerjee, Debabrata @ 2018-10-19 20:46 UTC (permalink / raw)
  To: 'Neil Horman'; +Cc: David S . Miller, netdev@vger.kernel.org
In-Reply-To: <20181019203458.GC30254@hmswarspite.think-freely.org>

> From: Neil Horman <nhorman@tuxdriver.com>

> I presume you've tested this with some of the stacked devices?  I think I'm
> ok with this change, but I'd like confirmation that it works.
> 
> Neil

Yes, I've tested this on a bond device with a vlan stacked on top.

-Deb

> 
> > CC: Neil Horman <nhorman@tuxdriver.com>
> > CC: "David S. Miller" <davem@davemloft.net>
> > Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
> > ---
> >  drivers/net/bonding/bond_main.c |  3 ++-
> >  drivers/net/macvlan.c           |  2 +-
> >  drivers/net/team/team.c         |  5 +----
> >  include/linux/netpoll.h         |  4 +---
> >  net/8021q/vlan_dev.c            |  3 +--
> >  net/bridge/br_device.c          |  2 +-
> >  net/core/netpoll.c              | 20 +++++---------------
> >  net/dsa/slave.c                 |  2 +-
> >  8 files changed, 13 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > index ee28ec9e0aba..ffa37adb7681 100644
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -963,7 +963,8 @@ static inline void slave_disable_netpoll(struct slave *slave)
> >  		return;
> >
> >  	slave->np = NULL;
> > -	__netpoll_free_async(np);
> > +
> > +	__netpoll_free(np);
> >  }
> >
> >  static void bond_poll_controller(struct net_device *bond_dev)
> > diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
> > index cfda146f3b3b..fc8d5f1ee1ad 100644
> > --- a/drivers/net/macvlan.c
> > +++ b/drivers/net/macvlan.c
> > @@ -1077,7 +1077,7 @@ static void macvlan_dev_netpoll_cleanup(struct net_device *dev)
> >
> >  	vlan->netpoll = NULL;
> >
> > -	__netpoll_free_async(netpoll);
> > +	__netpoll_free(netpoll);
> >  }
> >  #endif	/* CONFIG_NET_POLL_CONTROLLER */
> >
> > diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> > index d887016e54b6..db633ae9f784 100644
> > --- a/drivers/net/team/team.c
> > +++ b/drivers/net/team/team.c
> > @@ -1104,10 +1104,7 @@ static void team_port_disable_netpoll(struct team_port *port)
> >  		return;
> >  	port->np = NULL;
> >
> > -	/* Wait for transmitting packets to finish before freeing. */
> > -	synchronize_rcu_bh();
> > -	__netpoll_cleanup(np);
> > -	kfree(np);
> > +	__netpoll_free(np);
> >  }
> >  #else
> >  static int team_port_enable_netpoll(struct team_port *port)
> > diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
> > index 3ef82d3a78db..676f1ff161a9 100644
> > --- a/include/linux/netpoll.h
> > +++ b/include/linux/netpoll.h
> > @@ -31,8 +31,6 @@ struct netpoll {
> >  	bool ipv6;
> >  	u16 local_port, remote_port;
> >  	u8 remote_mac[ETH_ALEN];
> > -
> > -	struct work_struct cleanup_work;
> >  };
> >
> >  struct netpoll_info {
> > @@ -63,7 +61,7 @@ int netpoll_parse_options(struct netpoll *np, char *opt);
> >  int __netpoll_setup(struct netpoll *np, struct net_device *ndev);
> >  int netpoll_setup(struct netpoll *np);
> >  void __netpoll_cleanup(struct netpoll *np);
> > -void __netpoll_free_async(struct netpoll *np);
> > +void __netpoll_free(struct netpoll *np);
> >  void netpoll_cleanup(struct netpoll *np);
> >  void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
> >  			     struct net_device *dev);
> > diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> > index 546af0e73ac3..ff720f1ebf73 100644
> > --- a/net/8021q/vlan_dev.c
> > +++ b/net/8021q/vlan_dev.c
> > @@ -756,8 +756,7 @@ static void vlan_dev_netpoll_cleanup(struct net_device *dev)
> >  		return;
> >
> >  	vlan->netpoll = NULL;
> > -
> > -	__netpoll_free_async(netpoll);
> > +	__netpoll_free(netpoll);
> >  }
> >  #endif /* CONFIG_NET_POLL_CONTROLLER */
> >
> > diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
> > index e053a4e43758..c6abf927f0c9 100644
> > --- a/net/bridge/br_device.c
> > +++ b/net/bridge/br_device.c
> > @@ -344,7 +344,7 @@ void br_netpoll_disable(struct net_bridge_port *p)
> >
> >  	p->np = NULL;
> >
> > -	__netpoll_free_async(np);
> > +	__netpoll_free(np);
> >  }
> >
> >  #endif
> > diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> > index de1d1ba92f2d..6ac71624ead4 100644
> > --- a/net/core/netpoll.c
> > +++ b/net/core/netpoll.c
> > @@ -591,7 +591,6 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
> >
> >  	np->dev = ndev;
> >  	strlcpy(np->dev_name, ndev->name, IFNAMSIZ);
> > -	INIT_WORK(&np->cleanup_work, netpoll_async_cleanup);
> >
> >  	if (ndev->priv_flags & IFF_DISABLE_NETPOLL) {
> >  		np_err(np, "%s doesn't support polling, aborting\n",
> > @@ -790,10 +789,6 @@ void __netpoll_cleanup(struct netpoll *np)
> >  {
> >  	struct netpoll_info *npinfo;
> >
> > -	/* rtnl_dereference would be preferable here but
> > -	 * rcu_cleanup_netpoll path can put us in here safely without
> > -	 * holding the rtnl, so plain rcu_dereference it is
> > -	 */
> >  	npinfo = rtnl_dereference(np->dev->npinfo);
> >  	if (!npinfo)
> >  		return;
> > @@ -814,21 +809,16 @@ void __netpoll_cleanup(struct netpoll *np)
> >  }
> >  EXPORT_SYMBOL_GPL(__netpoll_cleanup);
> >
> > -static void netpoll_async_cleanup(struct work_struct *work)
> > +void __netpoll_free(struct netpoll *np)
> >  {
> > -	struct netpoll *np = container_of(work, struct netpoll, cleanup_work);
> > +	ASSERT_RTNL();
> >
> > -	rtnl_lock();
> > +	/* Wait for transmitting packets to finish before freeing. */
> > +	synchronize_rcu_bh();
> >  	__netpoll_cleanup(np);
> > -	rtnl_unlock();
> >  	kfree(np);
> >  }
> > -
> > -void __netpoll_free_async(struct netpoll *np)
> > -{
> > -	schedule_work(&np->cleanup_work);
> > -}
> > -EXPORT_SYMBOL_GPL(__netpoll_free_async);
> > +EXPORT_SYMBOL_GPL(__netpoll_free);
> >
> >  void netpoll_cleanup(struct netpoll *np)
> >  {
> > diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> > index 3f840b6eea69..3679e13b2ead 100644
> > --- a/net/dsa/slave.c
> > +++ b/net/dsa/slave.c
> > @@ -722,7 +722,7 @@ static void dsa_slave_netpoll_cleanup(struct net_device *dev)
> >
> >  	p->netpoll = NULL;
> >
> > -	__netpoll_free_async(netpoll);
> > +	__netpoll_free(netpoll);
> >  }
> >
> >  static void dsa_slave_poll_controller(struct net_device *dev)
> > --
> > 2.19.1
> >
> >

^ permalink raw reply

* Re: [PATCH bpf-next v2 0/2] improve and fix barriers for walking perf ring buffer
From: Alexei Starovoitov @ 2018-10-19 20:45 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: peterz, paulmck, will.deacon, acme, yhs, john.fastabend, netdev
In-Reply-To: <20181019135103.3602-1-daniel@iogearbox.net>

On Fri, Oct 19, 2018 at 03:51:01PM +0200, Daniel Borkmann wrote:
> This set first adds smp_* barrier variants to tools infrastructure
> and updates perf and libbpf to make use of them. For details, please
> see individual patches, thanks!
> 
> Arnaldo, if there are no objections, could this be routed via bpf-next
> with Acked-by's due to later dependencies in libbpf? Alternatively,
> I could also get the 2nd patch out during merge window, but perhaps
> it's okay to do in one go as there shouldn't be much conflict in perf
> itself.
> 
> Thanks!
> 
> v1 -> v2:
>   - add common helper and switch to acquire/release variants
>     when possible, thanks Peter!

Applied, Thanks

^ permalink raw reply

* [PATCH iproute2-next] Tree wide: Drop sockaddr_nl arg
From: David Ahern @ 2018-10-19 20:44 UTC (permalink / raw)
  To: netdev, stephen; +Cc: David Ahern

From: David Ahern <dsahern@gmail.com>

No command, filter, or print function uses the sockaddr_nl arg,
so just drop it.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 bridge/br_common.h   |  9 +++------
 bridge/fdb.c         |  2 +-
 bridge/link.c        |  3 +--
 bridge/mdb.c         |  2 +-
 bridge/monitor.c     |  9 ++++-----
 bridge/vlan.c        | 12 +++---------
 genl/ctrl.c          | 10 ++++------
 genl/genl.c          |  3 +--
 genl/genl_utils.h    |  3 +--
 include/libnetlink.h |  6 ++----
 include/ll_map.h     |  3 +--
 ip/ip_common.h       | 36 ++++++++++++------------------------
 ip/ipaddress.c       | 26 ++++++++++----------------
 ip/ipaddrlabel.c     |  4 ++--
 ip/ipfou.c           |  3 +--
 ip/ipila.c           |  3 +--
 ip/ipl2tp.c          |  6 ++----
 ip/iplink.c          |  9 +++------
 ip/iplink_bridge.c   |  3 +--
 ip/ipmacsec.c        |  3 +--
 ip/ipmonitor.c       | 25 ++++++++++++-------------
 ip/ipmroute.c        |  2 +-
 ip/ipneigh.c         |  2 +-
 ip/ipnetconf.c       |  8 +++-----
 ip/ipnetns.c         |  5 ++---
 ip/ipntable.c        |  3 +--
 ip/ipprefix.c        |  2 +-
 ip/iproute.c         | 17 +++++++----------
 ip/iprule.c          | 11 ++++-------
 ip/ipseg6.c          |  5 ++---
 ip/iptoken.c         |  2 +-
 ip/iptuntap.c        |  3 +--
 ip/rtmon.c           |  7 +++----
 ip/tcp_metrics.c     |  5 ++---
 ip/tunnel.c          |  3 +--
 ip/xfrm.h            |  6 ++----
 ip/xfrm_monitor.c    | 37 +++++++++++++++----------------------
 ip/xfrm_policy.c     |  9 +++------
 ip/xfrm_state.c      | 11 ++++-------
 lib/libnetlink.c     |  7 +++----
 lib/ll_map.c         |  3 +--
 misc/ifstat.c        |  6 ++----
 misc/ss.c            | 31 ++++++++++++-------------------
 tc/m_action.c        |  6 ++----
 tc/tc_class.c        |  3 +--
 tc/tc_common.h       |  8 ++++----
 tc/tc_filter.c       |  4 ++--
 tc/tc_monitor.c      | 11 +++++------
 tc/tc_qdisc.c        |  6 ++----
 49 files changed, 155 insertions(+), 248 deletions(-)

diff --git a/bridge/br_common.h b/bridge/br_common.h
index 00a4e9ea125d..23d653df931d 100644
--- a/bridge/br_common.h
+++ b/bridge/br_common.h
@@ -7,12 +7,9 @@
 		((struct rtattr *)(((char *)(r)) + RTA_ALIGN(sizeof(__u32))))
 
 void print_vlan_info(struct rtattr *tb, int ifindex);
-int print_linkinfo(const struct sockaddr_nl *who,
-		   struct nlmsghdr *n, void *arg);
-int print_fdb(const struct sockaddr_nl *who,
-		     struct nlmsghdr *n, void *arg);
-int print_mdb(const struct sockaddr_nl *who,
-		     struct nlmsghdr *n, void *arg);
+int print_linkinfo(struct nlmsghdr *n, void *arg);
+int print_fdb(struct nlmsghdr *n, void *arg);
+int print_mdb(struct nlmsghdr *n, void *arg);
 
 int do_fdb(int argc, char **argv);
 int do_mdb(int argc, char **argv);
diff --git a/bridge/fdb.c b/bridge/fdb.c
index 828fdab264cb..d759f7ec12e2 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -126,7 +126,7 @@ static void fdb_print_stats(FILE *fp, const struct nda_cacheinfo *ci)
 	}
 }
 
-int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+int print_fdb(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = arg;
 	struct ndmsg *r = NLMSG_DATA(n);
diff --git a/bridge/link.c b/bridge/link.c
index 4a14845da591..3290c16f0951 100644
--- a/bridge/link.c
+++ b/bridge/link.c
@@ -190,8 +190,7 @@ static void print_af_spec(struct rtattr *attr, int ifindex)
 		print_vlan_info(aftb[IFLA_BRIDGE_VLAN_INFO], ifindex);
 }
 
-int print_linkinfo(const struct sockaddr_nl *who,
-		   struct nlmsghdr *n, void *arg)
+int print_linkinfo(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = arg;
 	struct ifinfomsg *ifi = NLMSG_DATA(n);
diff --git a/bridge/mdb.c b/bridge/mdb.c
index 03fcc91f0219..855a6a4552c7 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -225,7 +225,7 @@ static void print_router_entries(FILE *fp, struct nlmsghdr *n,
 	close_json_array(PRINT_JSON, NULL);
 }
 
-int print_mdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+int print_mdb(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = arg;
 	struct br_port_msg *r = NLMSG_DATA(n);
diff --git a/bridge/monitor.c b/bridge/monitor.c
index d294269e1092..82bc6b407a06 100644
--- a/bridge/monitor.c
+++ b/bridge/monitor.c
@@ -35,8 +35,7 @@ static void usage(void)
 	exit(-1);
 }
 
-static int accept_msg(const struct sockaddr_nl *who,
-		      struct rtnl_ctrl_data *ctrl,
+static int accept_msg(struct rtnl_ctrl_data *ctrl,
 		      struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = arg;
@@ -50,19 +49,19 @@ static int accept_msg(const struct sockaddr_nl *who,
 		if (prefix_banner)
 			fprintf(fp, "[LINK]");
 
-		return print_linkinfo(who, n, arg);
+		return print_linkinfo(n, arg);
 
 	case RTM_NEWNEIGH:
 	case RTM_DELNEIGH:
 		if (prefix_banner)
 			fprintf(fp, "[NEIGH]");
-		return print_fdb(who, n, arg);
+		return print_fdb(n, arg);
 
 	case RTM_NEWMDB:
 	case RTM_DELMDB:
 		if (prefix_banner)
 			fprintf(fp, "[MDB]");
-		return print_mdb(who, n, arg);
+		return print_mdb(n, arg);
 
 	case NLMSG_TSTAMP:
 		print_nlmsg_timestamp(fp, n);
diff --git a/bridge/vlan.c b/bridge/vlan.c
index 239907bdad89..a111d5e66439 100644
--- a/bridge/vlan.c
+++ b/bridge/vlan.c
@@ -347,9 +347,7 @@ static void print_vlan_tunnel_info(FILE *fp, struct rtattr *tb, int ifindex)
 		close_vlan_port();
 }
 
-static int print_vlan_tunnel(const struct sockaddr_nl *who,
-			     struct nlmsghdr *n,
-			     void *arg)
+static int print_vlan_tunnel(struct nlmsghdr *n, void *arg)
 {
 	struct ifinfomsg *ifm = NLMSG_DATA(n);
 	struct rtattr *tb[IFLA_MAX+1];
@@ -392,9 +390,7 @@ static int print_vlan_tunnel(const struct sockaddr_nl *who,
 	return 0;
 }
 
-static int print_vlan(const struct sockaddr_nl *who,
-		      struct nlmsghdr *n,
-		      void *arg)
+static int print_vlan(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = arg;
 	struct ifinfomsg *ifm = NLMSG_DATA(n);
@@ -513,9 +509,7 @@ static void print_vlan_stats_attr(struct rtattr *attr, int ifindex)
 
 }
 
-static int print_vlan_stats(const struct sockaddr_nl *who,
-			    struct nlmsghdr *n,
-			    void *arg)
+static int print_vlan_stats(struct nlmsghdr *n, void *arg)
 {
 	struct if_stats_msg *ifsm = NLMSG_DATA(n);
 	struct rtattr *tb[IFLA_STATS_MAX+1];
diff --git a/genl/ctrl.c b/genl/ctrl.c
index 0d9c5f2517b7..6133336ab435 100644
--- a/genl/ctrl.c
+++ b/genl/ctrl.c
@@ -174,8 +174,7 @@ static int print_ctrl_grp(FILE *fp, struct rtattr *arg, __u32 ctrl_ver)
 /*
  * The controller sends one nlmsg per family
 */
-static int print_ctrl(const struct sockaddr_nl *who,
-		      struct rtnl_ctrl_data *ctrl,
+static int print_ctrl(struct rtnl_ctrl_data *ctrl,
 		      struct nlmsghdr *n, void *arg)
 {
 	struct rtattr *tb[CTRL_ATTR_MAX + 1];
@@ -279,10 +278,9 @@ static int print_ctrl(const struct sockaddr_nl *who,
 	return 0;
 }
 
-static int print_ctrl2(const struct sockaddr_nl *who,
-		      struct nlmsghdr *n, void *arg)
+static int print_ctrl2(struct nlmsghdr *n, void *arg)
 {
-	return print_ctrl(who, NULL, n, arg);
+	return print_ctrl(NULL, n, arg);
 }
 
 static int ctrl_list(int cmd, int argc, char **argv)
@@ -339,7 +337,7 @@ static int ctrl_list(int cmd, int argc, char **argv)
 			goto ctrl_done;
 		}
 
-		if (print_ctrl2(NULL, answer, (void *) stdout) < 0) {
+		if (print_ctrl2(answer, (void *) stdout) < 0) {
 			fprintf(stderr, "Dump terminated\n");
 			goto ctrl_done;
 		}
diff --git a/genl/genl.c b/genl/genl.c
index 253c4450a3b9..aba3c13afd34 100644
--- a/genl/genl.c
+++ b/genl/genl.c
@@ -34,8 +34,7 @@ static void *BODY;
 static struct genl_util *genl_list;
 
 
-static int print_nofopt(const struct sockaddr_nl *who, struct nlmsghdr *n,
-			void *arg)
+static int print_nofopt(struct nlmsghdr *n, void *arg)
 {
 	fprintf((FILE *) arg, "unknown genl type ..\n");
 	return 0;
diff --git a/genl/genl_utils.h b/genl/genl_utils.h
index 3de0da34bdba..cc1f3fb76596 100644
--- a/genl/genl_utils.h
+++ b/genl/genl_utils.h
@@ -10,8 +10,7 @@ struct genl_util
 	struct  genl_util *next;
 	char	name[16];
 	int	(*parse_genlopt)(struct genl_util *fu, int argc, char **argv);
-	int	(*print_genlopt)(const struct sockaddr_nl *who,
-				 struct nlmsghdr *n, void *arg);
+	int	(*print_genlopt)(struct nlmsghdr *n, void *arg);
 };
 
 int genl_ctrl_resolve_family(const char *family);
diff --git a/include/libnetlink.h b/include/libnetlink.h
index 04264b871ce4..fa8de093d484 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -88,11 +88,9 @@ struct rtnl_ctrl_data {
 	int	nsid;
 };
 
-typedef int (*rtnl_filter_t)(const struct sockaddr_nl *,
-			     struct nlmsghdr *n, void *);
+typedef int (*rtnl_filter_t)(struct nlmsghdr *n, void *);
 
-typedef int (*rtnl_listen_filter_t)(const struct sockaddr_nl *,
-				    struct rtnl_ctrl_data *,
+typedef int (*rtnl_listen_filter_t)(struct rtnl_ctrl_data *,
 				    struct nlmsghdr *n, void *);
 
 typedef int (*nl_ext_ack_fn_t)(const char *errmsg, uint32_t off,
diff --git a/include/ll_map.h b/include/ll_map.h
index 8546ff928bc0..fb708191c22c 100644
--- a/include/ll_map.h
+++ b/include/ll_map.h
@@ -2,8 +2,7 @@
 #ifndef __LL_MAP_H__
 #define __LL_MAP_H__ 1
 
-int ll_remember_index(const struct sockaddr_nl *who,
-		      struct nlmsghdr *n, void *arg);
+int ll_remember_index(struct nlmsghdr *n, void *arg);
 
 void ll_init_map(struct rtnl_handle *rth);
 unsigned ll_name_to_index(const char *name);
diff --git a/ip/ip_common.h b/ip/ip_common.h
index 458a9cb7ff2c..53668f598cd2 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -27,14 +27,10 @@ struct link_filter {
 };
 
 int get_operstate(const char *name);
-int print_linkinfo(const struct sockaddr_nl *who,
-		   struct nlmsghdr *n, void *arg);
-int print_addrinfo(const struct sockaddr_nl *who,
-		   struct nlmsghdr *n, void *arg);
-int print_addrlabel(const struct sockaddr_nl *who,
-		    struct nlmsghdr *n, void *arg);
-int print_neigh(const struct sockaddr_nl *who,
-		struct nlmsghdr *n, void *arg);
+int print_linkinfo(struct nlmsghdr *n, void *arg);
+int print_addrinfo(struct nlmsghdr *n, void *arg);
+int print_addrlabel(struct nlmsghdr *n, void *arg);
+int print_neigh(struct nlmsghdr *n, void *arg);
 int ipaddr_list_link(int argc, char **argv);
 void ipaddr_get_vf_rate(int, int *, int *, const char *);
 void iplink_usage(void) __attribute__((noreturn));
@@ -45,21 +41,15 @@ void ipaddr_reset_filter(int oneline, int ifindex);
 void ipneigh_reset_filter(int ifindex);
 void ipnetconf_reset_filter(int ifindex);
 
-int print_route(const struct sockaddr_nl *who,
-		struct nlmsghdr *n, void *arg);
-int print_mroute(const struct sockaddr_nl *who,
-		 struct nlmsghdr *n, void *arg);
-int print_prefix(const struct sockaddr_nl *who,
-		 struct nlmsghdr *n, void *arg);
-int print_rule(const struct sockaddr_nl *who,
-	       struct nlmsghdr *n, void *arg);
-int print_netconf(const struct sockaddr_nl *who,
-		  struct rtnl_ctrl_data *ctrl,
+int print_route(struct nlmsghdr *n, void *arg);
+int print_mroute(struct nlmsghdr *n, void *arg);
+int print_prefix(struct nlmsghdr *n, void *arg);
+int print_rule(struct nlmsghdr *n, void *arg);
+int print_netconf(struct rtnl_ctrl_data *ctrl,
 		  struct nlmsghdr *n, void *arg);
 void netns_map_init(void);
 void netns_nsid_socket_init(void);
-int print_nsid(const struct sockaddr_nl *who,
-	       struct nlmsghdr *n, void *arg);
+int print_nsid(struct nlmsghdr *n, void *arg);
 char *get_name_from_nsid(int nsid);
 int get_netnsid_from_name(const char *name);
 int set_netnsid_from_name(const char *name, int nsid);
@@ -129,8 +119,7 @@ struct link_util {
 					      FILE *);
 	int			(*parse_ifla_xstats)(struct link_util *,
 						     int, char **);
-	int			(*print_ifla_xstats)(const struct sockaddr_nl *,
-						     struct nlmsghdr *, void *);
+	int			(*print_ifla_xstats)(struct nlmsghdr *, void *);
 };
 
 struct link_util *get_link_kind(const char *kind);
@@ -140,8 +129,7 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req, char **type);
 /* iplink_bridge.c */
 void br_dump_bridge_id(const struct ifla_bridge_id *id, char *buf, size_t len);
 int bridge_parse_xstats(struct link_util *lu, int argc, char **argv);
-int bridge_print_xstats(const struct sockaddr_nl *who,
-			struct nlmsghdr *n, void *arg);
+int bridge_print_xstats(struct nlmsghdr *n, void *arg);
 
 /* iproute_lwtunnel.c */
 int lwt_parse_encap(struct rtattr *rta, size_t len, int *argcp, char ***argvp);
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 9481f241cb36..cd8cc76a3473 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -824,8 +824,7 @@ static void print_link_event(FILE *f, __u32 event)
 	}
 }
 
-int print_linkinfo(const struct sockaddr_nl *who,
-		   struct nlmsghdr *n, void *arg)
+int print_linkinfo(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
 	struct ifinfomsg *ifi = NLMSG_DATA(n);
@@ -1261,8 +1260,7 @@ static int ifa_label_match_rta(int ifindex, const struct rtattr *rta)
 	return fnmatch(filter.label, label, 0);
 }
 
-int print_addrinfo(const struct sockaddr_nl *who, struct nlmsghdr *n,
-		   void *arg)
+int print_addrinfo(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = arg;
 	struct ifaddrmsg *ifa = NLMSG_DATA(n);
@@ -1478,7 +1476,7 @@ static int print_selected_addrinfo(struct ifinfomsg *ifi,
 			continue;
 
 		open_json_object(NULL);
-		print_addrinfo(NULL, n, fp);
+		print_addrinfo(n, fp);
 		close_json_object();
 	}
 	close_json_array(PRINT_JSON, NULL);
@@ -1491,8 +1489,7 @@ static int print_selected_addrinfo(struct ifinfomsg *ifi,
 }
 
 
-static int store_nlmsg(const struct sockaddr_nl *who, struct nlmsghdr *n,
-		       void *arg)
+static int store_nlmsg(struct nlmsghdr *n, void *arg)
 {
 	struct nlmsg_chain *lchain = (struct nlmsg_chain *)arg;
 	struct nlmsg_list *h;
@@ -1510,7 +1507,7 @@ static int store_nlmsg(const struct sockaddr_nl *who, struct nlmsghdr *n,
 		lchain->head = h;
 	lchain->tail = h;
 
-	ll_remember_index(who, n, NULL);
+	ll_remember_index(n, NULL);
 	return 0;
 }
 
@@ -1553,8 +1550,7 @@ static int ipadd_dump_check_magic(void)
 	return 0;
 }
 
-static int save_nlmsg(const struct sockaddr_nl *who, struct nlmsghdr *n,
-		       void *arg)
+static int save_nlmsg(struct nlmsghdr *n, void *arg)
 {
 	int ret;
 
@@ -1567,15 +1563,14 @@ static int save_nlmsg(const struct sockaddr_nl *who, struct nlmsghdr *n,
 	return ret == n->nlmsg_len ? 0 : ret;
 }
 
-static int show_handler(const struct sockaddr_nl *nl,
-			struct rtnl_ctrl_data *ctrl,
+static int show_handler(struct rtnl_ctrl_data *ctrl,
 			struct nlmsghdr *n, void *arg)
 {
 	struct ifaddrmsg *ifa = NLMSG_DATA(n);
 
 	open_json_object(NULL);
 	print_int(PRINT_ANY, "index", "if%d:\n", ifa->ifa_index);
-	print_addrinfo(NULL, n, stdout);
+	print_addrinfo(n, stdout);
 	close_json_object();
 	return 0;
 }
@@ -1600,8 +1595,7 @@ static int ipaddr_showdump(void)
 	exit(err);
 }
 
-static int restore_handler(const struct sockaddr_nl *nl,
-			   struct rtnl_ctrl_data *ctrl,
+static int restore_handler(struct rtnl_ctrl_data *ctrl,
 			   struct nlmsghdr *n, void *arg)
 {
 	int ret;
@@ -1970,7 +1964,7 @@ static int ipaddr_list_flush_or_save(int argc, char **argv, int action)
 
 		open_json_object(NULL);
 		if (brief || !no_link)
-			res = print_linkinfo(NULL, n, stdout);
+			res = print_linkinfo(n, stdout);
 		if (res >= 0 && filter.family != AF_PACKET)
 			print_selected_addrinfo(ifi, ainfo->head, stdout);
 		if (res > 0 && !do_link && show_stats)
diff --git a/ip/ipaddrlabel.c b/ip/ipaddrlabel.c
index 845fe4c5db27..3714e41785c0 100644
--- a/ip/ipaddrlabel.c
+++ b/ip/ipaddrlabel.c
@@ -54,7 +54,7 @@ static void usage(void)
 	exit(-1);
 }
 
-int print_addrlabel(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+int print_addrlabel(struct nlmsghdr *n, void *arg)
 {
 	struct ifaddrlblmsg *ifal = NLMSG_DATA(n);
 	int len = n->nlmsg_len;
@@ -196,7 +196,7 @@ static int ipaddrlabel_modify(int cmd, int argc, char **argv)
 }
 
 
-static int flush_addrlabel(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+static int flush_addrlabel(struct nlmsghdr *n, void *arg)
 {
 	struct rtnl_handle rth2;
 	struct rtmsg *r = NLMSG_DATA(n);
diff --git a/ip/ipfou.c b/ip/ipfou.c
index 0cb5e3c7a0c7..346522ddb7f7 100644
--- a/ip/ipfou.c
+++ b/ip/ipfou.c
@@ -137,8 +137,7 @@ static int do_del(int argc, char **argv)
 	return 0;
 }
 
-static int print_fou_mapping(const struct sockaddr_nl *who,
-				 struct nlmsghdr *n, void *arg)
+static int print_fou_mapping(struct nlmsghdr *n, void *arg)
 {
 	struct genlmsghdr *ghdr;
 	struct rtattr *tb[FOU_ATTR_MAX + 1];
diff --git a/ip/ipila.c b/ip/ipila.c
index 895fe0cdaf77..11fbb5fae805 100644
--- a/ip/ipila.c
+++ b/ip/ipila.c
@@ -81,8 +81,7 @@ static void print_ila_locid(const char *tag, int attr, struct rtattr *tb[])
 	print_string(PRINT_ANY, tag, "%-20s", abuf);
 }
 
-static int print_ila_mapping(const struct sockaddr_nl *who,
-			     struct nlmsghdr *n, void *arg)
+static int print_ila_mapping(struct nlmsghdr *n, void *arg)
 {
 	struct genlmsghdr *ghdr;
 	struct rtattr *tb[ILA_ATTR_MAX + 1];
diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
index 16561eccd458..4308b5911965 100644
--- a/ip/ipl2tp.c
+++ b/ip/ipl2tp.c
@@ -437,8 +437,7 @@ static int get_response(struct nlmsghdr *n, void *arg)
 	return 0;
 }
 
-static int session_nlmsg(const struct sockaddr_nl *who,
-			 struct nlmsghdr *n, void *arg)
+static int session_nlmsg(struct nlmsghdr *n, void *arg)
 {
 	int ret = get_response(n, arg);
 
@@ -476,8 +475,7 @@ static int get_session(struct l2tp_data *p)
 	return 0;
 }
 
-static int tunnel_nlmsg(const struct sockaddr_nl *who,
-			struct nlmsghdr *n, void *arg)
+static int tunnel_nlmsg(struct nlmsghdr *n, void *arg)
 {
 	int ret = get_response(n, arg);
 
diff --git a/ip/iplink.c b/ip/iplink.c
index 9f39e3826c19..b5519201fef7 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -199,8 +199,7 @@ static int get_addr_gen_mode(const char *mode)
 #if IPLINK_IOCTL_COMPAT
 static int have_rtnl_newlink = -1;
 
-static int accept_msg(const struct sockaddr_nl *who,
-		      struct rtnl_ctrl_data *ctrl,
+static int accept_msg(struct rtnl_ctrl_data *ctrl,
 		      struct nlmsghdr *n, void *arg)
 {
 	struct nlmsgerr *err = (struct nlmsgerr *)NLMSG_DATA(n);
@@ -1107,7 +1106,7 @@ int iplink_get(char *name, __u32 filt_mask)
 		return -2;
 
 	open_json_object(NULL);
-	print_linkinfo(NULL, answer, stdout);
+	print_linkinfo(answer, stdout);
 	close_json_object();
 
 	free(answer);
@@ -1536,9 +1535,7 @@ struct af_stats_ctx {
 	int ifindex;
 };
 
-static int print_af_stats(const struct sockaddr_nl *who,
-			  struct nlmsghdr *n,
-			  void *arg)
+static int print_af_stats(struct nlmsghdr *n, void *arg)
 {
 	struct if_stats_msg *ifsm = NLMSG_DATA(n);
 	struct rtattr *tb[IFLA_STATS_MAX+1];
diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index 3008e44b7d72..0ba6be3f47da 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -757,8 +757,7 @@ static void bridge_print_stats_attr(FILE *f, struct rtattr *attr, int ifindex)
 	}
 }
 
-int bridge_print_xstats(const struct sockaddr_nl *who,
-			struct nlmsghdr *n, void *arg)
+int bridge_print_xstats(struct nlmsghdr *n, void *arg)
 {
 	struct if_stats_msg *ifsm = NLMSG_DATA(n);
 	struct rtattr *tb[IFLA_STATS_MAX+1];
diff --git a/ip/ipmacsec.c b/ip/ipmacsec.c
index fa56e0eee774..646bd891730f 100644
--- a/ip/ipmacsec.c
+++ b/ip/ipmacsec.c
@@ -929,8 +929,7 @@ static void print_rxsc_list(struct rtattr *sc)
 	close_json_array(PRINT_JSON, NULL);
 }
 
-static int process(const struct sockaddr_nl *who, struct nlmsghdr *n,
-		   void *arg)
+static int process(struct nlmsghdr *n, void *arg)
 {
 	struct genlmsghdr *ghdr;
 	struct rtattr *attrs[MACSEC_ATTR_MAX + 1];
diff --git a/ip/ipmonitor.c b/ip/ipmonitor.c
index a93b62cd6624..9d5ac2b5e4d2 100644
--- a/ip/ipmonitor.c
+++ b/ip/ipmonitor.c
@@ -52,8 +52,7 @@ static void print_headers(FILE *fp, char *label, struct rtnl_ctrl_data *ctrl)
 		fprintf(fp, "%s", label);
 }
 
-static int accept_msg(const struct sockaddr_nl *who,
-		      struct rtnl_ctrl_data *ctrl,
+static int accept_msg(struct rtnl_ctrl_data *ctrl,
 		      struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
@@ -75,32 +74,32 @@ static int accept_msg(const struct sockaddr_nl *who,
 		if (r->rtm_family == RTNL_FAMILY_IPMR ||
 		    r->rtm_family == RTNL_FAMILY_IP6MR) {
 			print_headers(fp, "[MROUTE]", ctrl);
-			print_mroute(who, n, arg);
+			print_mroute(n, arg);
 			return 0;
 		} else {
 			print_headers(fp, "[ROUTE]", ctrl);
-			print_route(who, n, arg);
+			print_route(n, arg);
 			return 0;
 		}
 	}
 
 	case RTM_NEWLINK:
 	case RTM_DELLINK:
-		ll_remember_index(who, n, NULL);
+		ll_remember_index(n, NULL);
 		print_headers(fp, "[LINK]", ctrl);
-		print_linkinfo(who, n, arg);
+		print_linkinfo(n, arg);
 		return 0;
 
 	case RTM_NEWADDR:
 	case RTM_DELADDR:
 		print_headers(fp, "[ADDR]", ctrl);
-		print_addrinfo(who, n, arg);
+		print_addrinfo(n, arg);
 		return 0;
 
 	case RTM_NEWADDRLABEL:
 	case RTM_DELADDRLABEL:
 		print_headers(fp, "[ADDRLABEL]", ctrl);
-		print_addrlabel(who, n, arg);
+		print_addrlabel(n, arg);
 		return 0;
 
 	case RTM_NEWNEIGH:
@@ -114,18 +113,18 @@ static int accept_msg(const struct sockaddr_nl *who,
 		}
 
 		print_headers(fp, "[NEIGH]", ctrl);
-		print_neigh(who, n, arg);
+		print_neigh(n, arg);
 		return 0;
 
 	case RTM_NEWPREFIX:
 		print_headers(fp, "[PREFIX]", ctrl);
-		print_prefix(who, n, arg);
+		print_prefix(n, arg);
 		return 0;
 
 	case RTM_NEWRULE:
 	case RTM_DELRULE:
 		print_headers(fp, "[RULE]", ctrl);
-		print_rule(who, n, arg);
+		print_rule(n, arg);
 		return 0;
 
 	case NLMSG_TSTAMP:
@@ -135,13 +134,13 @@ static int accept_msg(const struct sockaddr_nl *who,
 	case RTM_NEWNETCONF:
 	case RTM_DELNETCONF:
 		print_headers(fp, "[NETCONF]", ctrl);
-		print_netconf(who, ctrl, n, arg);
+		print_netconf(ctrl, n, arg);
 		return 0;
 
 	case RTM_DELNSID:
 	case RTM_NEWNSID:
 		print_headers(fp, "[NSID]", ctrl);
-		print_nsid(who, n, arg);
+		print_nsid(n, arg);
 		return 0;
 
 	case NLMSG_ERROR:
diff --git a/ip/ipmroute.c b/ip/ipmroute.c
index c5dfa9cb1538..4d8867d3219f 100644
--- a/ip/ipmroute.c
+++ b/ip/ipmroute.c
@@ -52,7 +52,7 @@ struct rtfilter {
 	inet_prefix msrc;
 } filter;
 
-int print_mroute(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+int print_mroute(struct nlmsghdr *n, void *arg)
 {
 	struct rtmsg *r = NLMSG_DATA(n);
 	int len = n->nlmsg_len;
diff --git a/ip/ipneigh.c b/ip/ipneigh.c
index 042d01fd24c2..6041c467749c 100644
--- a/ip/ipneigh.c
+++ b/ip/ipneigh.c
@@ -237,7 +237,7 @@ static void print_neigh_state(unsigned int nud)
 	close_json_array(PRINT_JSON, NULL);
 }
 
-int print_neigh(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+int print_neigh(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
 	struct ndmsg *r = NLMSG_DATA(n);
diff --git a/ip/ipnetconf.c b/ip/ipnetconf.c
index 21822e367e11..0e946ca34b4a 100644
--- a/ip/ipnetconf.c
+++ b/ip/ipnetconf.c
@@ -55,8 +55,7 @@ static struct rtattr *netconf_rta(struct netconfmsg *ncm)
 				 + NLMSG_ALIGN(sizeof(struct netconfmsg)));
 }
 
-int print_netconf(const struct sockaddr_nl *who, struct rtnl_ctrl_data *ctrl,
-		  struct nlmsghdr *n, void *arg)
+int print_netconf(struct rtnl_ctrl_data *ctrl, struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
 	struct netconfmsg *ncm = NLMSG_DATA(n);
@@ -154,10 +153,9 @@ int print_netconf(const struct sockaddr_nl *who, struct rtnl_ctrl_data *ctrl,
 	return 0;
 }
 
-static int print_netconf2(const struct sockaddr_nl *who,
-			  struct nlmsghdr *n, void *arg)
+static int print_netconf2(struct nlmsghdr *n, void *arg)
 {
-	return print_netconf(who, NULL, n, arg);
+	return print_netconf(NULL, n, arg);
 }
 
 void ipnetconf_reset_filter(int ifindex)
diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index e8500c773994..0eac18cf2682 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -43,8 +43,7 @@ static struct rtnl_handle rtnsh = { .fd = -1 };
 
 static int have_rtnl_getnsid = -1;
 
-static int ipnetns_accept_msg(const struct sockaddr_nl *who,
-			      struct rtnl_ctrl_data *ctrl,
+static int ipnetns_accept_msg(struct rtnl_ctrl_data *ctrl,
 			      struct nlmsghdr *n, void *arg)
 {
 	struct nlmsgerr *err = (struct nlmsgerr *)NLMSG_DATA(n);
@@ -284,7 +283,7 @@ static int netns_get_name(int nsid, char *name)
 	return -ENOENT;
 }
 
-int print_nsid(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+int print_nsid(struct nlmsghdr *n, void *arg)
 {
 	struct rtgenmsg *rthdr = NLMSG_DATA(n);
 	struct rtattr *tb[NETNSA_MAX+1];
diff --git a/ip/ipntable.c b/ip/ipntable.c
index ce3f4614e32b..5b61dd5cb001 100644
--- a/ip/ipntable.c
+++ b/ip/ipntable.c
@@ -520,8 +520,7 @@ static void print_ndtstats(const struct ndt_stats *ndts)
 	print_nl();
 }
 
-static int print_ntable(const struct sockaddr_nl *who,
-			struct nlmsghdr *n, void *arg)
+static int print_ntable(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
 	struct ndtmsg *ndtm = NLMSG_DATA(n);
diff --git a/ip/ipprefix.c b/ip/ipprefix.c
index 20f23ca799aa..466af2088d90 100644
--- a/ip/ipprefix.c
+++ b/ip/ipprefix.c
@@ -35,7 +35,7 @@
 #define IF_PREFIX_ONLINK	0x01
 #define IF_PREFIX_AUTOCONF	0x02
 
-int print_prefix(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+int print_prefix(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
 	struct prefixmsg *prefix = NLMSG_DATA(n);
diff --git a/ip/iproute.c b/ip/iproute.c
index 699635923764..1326a6574fbe 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -708,7 +708,7 @@ static void print_rta_multipath(FILE *fp, const struct rtmsg *r,
 	}
 }
 
-int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+int print_route(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
 	struct rtmsg *r = NLMSG_DATA(n);
@@ -1580,8 +1580,7 @@ static int iproute_flush_cache(void)
 
 static __u32 route_dump_magic = 0x45311224;
 
-static int save_route(const struct sockaddr_nl *who, struct nlmsghdr *n,
-		      void *arg)
+static int save_route(struct nlmsghdr *n, void *arg)
 {
 	int ret;
 	int len = n->nlmsg_len;
@@ -2082,7 +2081,7 @@ static int iproute_get(int argc, char **argv)
 		int len = answer->nlmsg_len;
 		struct rtattr *tb[RTA_MAX+1];
 
-		if (print_route(NULL, answer, (void *)stdout) < 0) {
+		if (print_route(answer, (void *)stdout) < 0) {
 			fprintf(stderr, "An error :-)\n");
 			free(answer);
 			return -1;
@@ -2126,7 +2125,7 @@ static int iproute_get(int argc, char **argv)
 			return -2;
 	}
 
-	if (print_route(NULL, answer, (void *)stdout) < 0) {
+	if (print_route(answer, (void *)stdout) < 0) {
 		fprintf(stderr, "An error :-)\n");
 		free(answer);
 		return -1;
@@ -2144,8 +2143,7 @@ static int rtattr_cmp(const struct rtattr *rta1, const struct rtattr *rta2)
 	return memcmp(RTA_DATA(rta1), RTA_DATA(rta2), RTA_PAYLOAD(rta1));
 }
 
-static int restore_handler(const struct sockaddr_nl *nl,
-			   struct rtnl_ctrl_data *ctrl,
+static int restore_handler(struct rtnl_ctrl_data *ctrl,
 			   struct nlmsghdr *n, void *arg)
 {
 	struct rtmsg *r = NLMSG_DATA(n);
@@ -2231,11 +2229,10 @@ static int iproute_restore(void)
 	return 0;
 }
 
-static int show_handler(const struct sockaddr_nl *nl,
-			struct rtnl_ctrl_data *ctrl,
+static int show_handler(struct rtnl_ctrl_data *ctrl,
 			struct nlmsghdr *n, void *arg)
 {
-	print_route(nl, n, stdout);
+	print_route(n, stdout);
 	return 0;
 }
 
diff --git a/ip/iprule.c b/ip/iprule.c
index 60fd4c7e9f93..d89d808d8909 100644
--- a/ip/iprule.c
+++ b/ip/iprule.c
@@ -181,7 +181,7 @@ static bool filter_nlmsg(struct nlmsghdr *n, struct rtattr **tb, int host_len)
 	return true;
 }
 
-int print_rule(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+int print_rule(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = arg;
 	struct fib_rule_hdr *frh = NLMSG_DATA(n);
@@ -442,8 +442,7 @@ static int save_rule_prep(void)
 	return 0;
 }
 
-static int save_rule(const struct sockaddr_nl *who,
-		     struct nlmsghdr *n, void *arg)
+static int save_rule(struct nlmsghdr *n, void *arg)
 {
 	int ret;
 
@@ -456,8 +455,7 @@ static int save_rule(const struct sockaddr_nl *who,
 	return ret == n->nlmsg_len ? 0 : ret;
 }
 
-static int flush_rule(const struct sockaddr_nl *who, struct nlmsghdr *n,
-		      void *arg)
+static int flush_rule(struct nlmsghdr *n, void *arg)
 {
 	struct rtnl_handle rth2;
 	struct fib_rule_hdr *frh = NLMSG_DATA(n);
@@ -650,8 +648,7 @@ static int rule_dump_check_magic(void)
 	return 0;
 }
 
-static int restore_handler(const struct sockaddr_nl *nl,
-			   struct rtnl_ctrl_data *ctrl,
+static int restore_handler(struct rtnl_ctrl_data *ctrl,
 			   struct nlmsghdr *n, void *arg)
 {
 	int ret;
diff --git a/ip/ipseg6.c b/ip/ipseg6.c
index 6f5ae4d239f7..33076e726de6 100644
--- a/ip/ipseg6.c
+++ b/ip/ipseg6.c
@@ -99,8 +99,7 @@ static void print_tunsrc(struct rtattr *attrs[])
 		     "tunsrc addr %s\n", dst);
 }
 
-static int process_msg(const struct sockaddr_nl *who, struct nlmsghdr *n,
-		       void *arg)
+static int process_msg(struct nlmsghdr *n, void *arg)
 {
 	struct rtattr *attrs[SEG6_ATTR_MAX + 1];
 	struct genlmsghdr *ghdr;
@@ -180,7 +179,7 @@ static int seg6_do_cmd(void)
 		if (rtnl_talk(&grth, &req.n, &answer) < 0)
 			return -2;
 		new_json_obj(json);
-		if (process_msg(NULL, answer, stdout) < 0) {
+		if (process_msg(answer, stdout) < 0) {
 			fprintf(stderr, "Error parsing reply\n");
 			exit(1);
 		}
diff --git a/ip/iptoken.c b/ip/iptoken.c
index 8605e75c4edc..f1194c3e1aa4 100644
--- a/ip/iptoken.c
+++ b/ip/iptoken.c
@@ -42,7 +42,7 @@ static void usage(void)
 	exit(-1);
 }
 
-static int print_token(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+static int print_token(struct nlmsghdr *n, void *arg)
 {
 	struct rtnl_dump_args *args = arg;
 	FILE *fp = args->fp;
diff --git a/ip/iptuntap.c b/ip/iptuntap.c
index 8c84e6206fa9..528055a0bf46 100644
--- a/ip/iptuntap.c
+++ b/ip/iptuntap.c
@@ -386,8 +386,7 @@ static int tuntap_filter_req(struct nlmsghdr *nlh, int reqlen)
 	return 0;
 }
 
-static int print_tuntap(const struct sockaddr_nl *who,
-			struct nlmsghdr *n, void *arg)
+static int print_tuntap(struct nlmsghdr *n, void *arg)
 {
 	struct ifinfomsg *ifi = NLMSG_DATA(n);
 	struct rtattr *tb[IFLA_MAX+1];
diff --git a/ip/rtmon.c b/ip/rtmon.c
index 7d2405d724a6..7373443f2f8a 100644
--- a/ip/rtmon.c
+++ b/ip/rtmon.c
@@ -43,7 +43,7 @@ static void write_stamp(FILE *fp)
 	fwrite((void *)n1, 1, NLMSG_ALIGN(n1->nlmsg_len), fp);
 }
 
-static int dump_msg(const struct sockaddr_nl *who, struct rtnl_ctrl_data *ctrl,
+static int dump_msg(struct rtnl_ctrl_data *ctrl,
 		    struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
@@ -55,10 +55,9 @@ static int dump_msg(const struct sockaddr_nl *who, struct rtnl_ctrl_data *ctrl,
 	return 0;
 }
 
-static int dump_msg2(const struct sockaddr_nl *who,
-		     struct nlmsghdr *n, void *arg)
+static int dump_msg2(struct nlmsghdr *n, void *arg)
 {
-	return dump_msg(who, NULL, n, arg);
+	return dump_msg(NULL, n, arg);
 }
 
 static void usage(void)
diff --git a/ip/tcp_metrics.c b/ip/tcp_metrics.c
index ad3d6f363003..72ef3ab5cfda 100644
--- a/ip/tcp_metrics.c
+++ b/ip/tcp_metrics.c
@@ -156,8 +156,7 @@ static void print_tcp_metrics(struct rtattr *a)
 	}
 }
 
-static int process_msg(const struct sockaddr_nl *who, struct nlmsghdr *n,
-		       void *arg)
+static int process_msg(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *) arg;
 	struct genlmsghdr *ghdr;
@@ -501,7 +500,7 @@ static int tcpm_do_cmd(int cmd, int argc, char **argv)
 	} else if (atype >= 0) {
 		if (rtnl_talk(&grth, &req.n, &answer) < 0)
 			return -2;
-		if (process_msg(NULL, answer, stdout) < 0) {
+		if (process_msg(answer, stdout) < 0) {
 			fprintf(stderr, "Dump terminated\n");
 			exit(1);
 		}
diff --git a/ip/tunnel.c b/ip/tunnel.c
index 20fe6d7d72f1..d0d55f37169e 100644
--- a/ip/tunnel.c
+++ b/ip/tunnel.c
@@ -321,8 +321,7 @@ static void tnl_print_stats(const struct rtnl_link_stats64 *s)
 	       s->tx_carrier_errors, s->tx_dropped);
 }
 
-static int print_nlmsg_tunnel(const struct sockaddr_nl *who,
-			      struct nlmsghdr *n, void *arg)
+static int print_nlmsg_tunnel(struct nlmsghdr *n, void *arg)
 {
 	struct tnl_print_nlmsg_info *info = arg;
 	struct ifinfomsg *ifi = NLMSG_DATA(n);
diff --git a/ip/xfrm.h b/ip/xfrm.h
index 71be574d90d8..3b158ad71c13 100644
--- a/ip/xfrm.h
+++ b/ip/xfrm.h
@@ -104,10 +104,8 @@ struct xfrm_filter {
 
 extern struct xfrm_filter filter;
 
-int xfrm_state_print(const struct sockaddr_nl *who, struct nlmsghdr *n,
-		     void *arg);
-int xfrm_policy_print(const struct sockaddr_nl *who, struct nlmsghdr *n,
-		      void *arg);
+int xfrm_state_print(struct nlmsghdr *n, void *arg);
+int xfrm_policy_print(struct nlmsghdr *n, void *arg);
 int do_xfrm_state(int argc, char **argv);
 int do_xfrm_policy(int argc, char **argv);
 int do_xfrm_monitor(int argc, char **argv);
diff --git a/ip/xfrm_monitor.c b/ip/xfrm_monitor.c
index 5d086768560e..eb07af17cadf 100644
--- a/ip/xfrm_monitor.c
+++ b/ip/xfrm_monitor.c
@@ -43,8 +43,7 @@ static void usage(void)
 	exit(-1);
 }
 
-static int xfrm_acquire_print(const struct sockaddr_nl *who,
-			      struct nlmsghdr *n, void *arg)
+static int xfrm_acquire_print(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
 	struct xfrm_user_acquire *xacq = NLMSG_DATA(n);
@@ -105,8 +104,7 @@ static int xfrm_acquire_print(const struct sockaddr_nl *who,
 	return 0;
 }
 
-static int xfrm_state_flush_print(const struct sockaddr_nl *who,
-				  struct nlmsghdr *n, void *arg)
+static int xfrm_state_flush_print(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
 	struct xfrm_usersa_flush *xsf = NLMSG_DATA(n);
@@ -135,8 +133,7 @@ static int xfrm_state_flush_print(const struct sockaddr_nl *who,
 	return 0;
 }
 
-static int xfrm_policy_flush_print(const struct sockaddr_nl *who,
-				   struct nlmsghdr *n, void *arg)
+static int xfrm_policy_flush_print(struct nlmsghdr *n, void *arg)
 {
 	struct rtattr *tb[XFRMA_MAX+1];
 	FILE *fp = (FILE *)arg;
@@ -173,8 +170,7 @@ static int xfrm_policy_flush_print(const struct sockaddr_nl *who,
 	return 0;
 }
 
-static int xfrm_report_print(const struct sockaddr_nl *who,
-			     struct nlmsghdr *n, void *arg)
+static int xfrm_report_print(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
 	struct xfrm_user_report *xrep = NLMSG_DATA(n);
@@ -236,8 +232,7 @@ static void xfrm_usersa_print(const struct xfrm_usersa_id *sa_id, __u32 reqid, F
 	fprintf(fp, " SPI 0x%x", ntohl(sa_id->spi));
 }
 
-static int xfrm_ae_print(const struct sockaddr_nl *who,
-			     struct nlmsghdr *n, void *arg)
+static int xfrm_ae_print(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
 	struct xfrm_aevent_id *id = NLMSG_DATA(n);
@@ -261,8 +256,7 @@ static void xfrm_print_addr(FILE *fp, int family, xfrm_address_t *a)
 	fprintf(fp, "%s", rt_addr_n2a(family, sizeof(*a), a));
 }
 
-static int xfrm_mapping_print(const struct sockaddr_nl *who,
-			     struct nlmsghdr *n, void *arg)
+static int xfrm_mapping_print(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
 	struct xfrm_user_mapping *map = NLMSG_DATA(n);
@@ -281,8 +275,7 @@ static int xfrm_mapping_print(const struct sockaddr_nl *who,
 	return 0;
 }
 
-static int xfrm_accept_msg(const struct sockaddr_nl *who,
-			   struct rtnl_ctrl_data *ctrl,
+static int xfrm_accept_msg(struct rtnl_ctrl_data *ctrl,
 			   struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
@@ -302,31 +295,31 @@ static int xfrm_accept_msg(const struct sockaddr_nl *who,
 	case XFRM_MSG_DELSA:
 	case XFRM_MSG_UPDSA:
 	case XFRM_MSG_EXPIRE:
-		xfrm_state_print(who, n, arg);
+		xfrm_state_print(n, arg);
 		return 0;
 	case XFRM_MSG_NEWPOLICY:
 	case XFRM_MSG_DELPOLICY:
 	case XFRM_MSG_UPDPOLICY:
 	case XFRM_MSG_POLEXPIRE:
-		xfrm_policy_print(who, n, arg);
+		xfrm_policy_print(n, arg);
 		return 0;
 	case XFRM_MSG_ACQUIRE:
-		xfrm_acquire_print(who, n, arg);
+		xfrm_acquire_print(n, arg);
 		return 0;
 	case XFRM_MSG_FLUSHSA:
-		xfrm_state_flush_print(who, n, arg);
+		xfrm_state_flush_print(n, arg);
 		return 0;
 	case XFRM_MSG_FLUSHPOLICY:
-		xfrm_policy_flush_print(who, n, arg);
+		xfrm_policy_flush_print(n, arg);
 		return 0;
 	case XFRM_MSG_REPORT:
-		xfrm_report_print(who, n, arg);
+		xfrm_report_print(n, arg);
 		return 0;
 	case XFRM_MSG_NEWAE:
-		xfrm_ae_print(who, n, arg);
+		xfrm_ae_print(n, arg);
 		return 0;
 	case XFRM_MSG_MAPPING:
-		xfrm_mapping_print(who, n, arg);
+		xfrm_mapping_print(n, arg);
 		return 0;
 	default:
 		break;
diff --git a/ip/xfrm_policy.c b/ip/xfrm_policy.c
index d54402691ca0..feccaadac2db 100644
--- a/ip/xfrm_policy.c
+++ b/ip/xfrm_policy.c
@@ -453,8 +453,7 @@ static int xfrm_policy_filter_match(struct xfrm_userpolicy_info *xpinfo,
 	return 1;
 }
 
-int xfrm_policy_print(const struct sockaddr_nl *who, struct nlmsghdr *n,
-		      void *arg)
+int xfrm_policy_print(struct nlmsghdr *n, void *arg)
 {
 	struct rtattr *tb[XFRMA_MAX+1];
 	struct rtattr *rta;
@@ -681,7 +680,7 @@ static int xfrm_policy_get(int argc, char **argv)
 
 	xfrm_policy_get_or_delete(argc, argv, 0, &n);
 
-	if (xfrm_policy_print(NULL, n, (void *)stdout) < 0) {
+	if (xfrm_policy_print(n, (void *)stdout) < 0) {
 		fprintf(stderr, "An error :-)\n");
 		exit(1);
 	}
@@ -694,9 +693,7 @@ static int xfrm_policy_get(int argc, char **argv)
  * With an existing policy of nlmsg, make new nlmsg for deleting the policy
  * and store it to buffer.
  */
-static int xfrm_policy_keep(const struct sockaddr_nl *who,
-			    struct nlmsghdr *n,
-			    void *arg)
+static int xfrm_policy_keep(struct nlmsghdr *n, void *arg)
 {
 	struct xfrm_buffer *xb = (struct xfrm_buffer *)arg;
 	struct rtnl_handle *rth = xb->rth;
diff --git a/ip/xfrm_state.c b/ip/xfrm_state.c
index 913e9fa3bbdb..3cfcad1a4712 100644
--- a/ip/xfrm_state.c
+++ b/ip/xfrm_state.c
@@ -869,7 +869,7 @@ static int xfrm_state_allocspi(int argc, char **argv)
 	if (rtnl_talk(&rth, &req.n, &answer) < 0)
 		exit(2);
 
-	if (xfrm_state_print(NULL, answer, (void *)stdout) < 0) {
+	if (xfrm_state_print(answer, (void *)stdout) < 0) {
 		fprintf(stderr, "An error :-)\n");
 		exit(1);
 	}
@@ -908,8 +908,7 @@ static int xfrm_state_filter_match(struct xfrm_usersa_info *xsinfo)
 	return 1;
 }
 
-int xfrm_state_print(const struct sockaddr_nl *who, struct nlmsghdr *n,
-		     void *arg)
+int xfrm_state_print(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
 	struct rtattr *tb[XFRMA_MAX+1];
@@ -1063,7 +1062,7 @@ static int xfrm_state_get_or_delete(int argc, char **argv, int delete)
 		if (rtnl_talk(&rth, &req.n, &answer) < 0)
 			exit(2);
 
-		if (xfrm_state_print(NULL, answer, (void *)stdout) < 0) {
+		if (xfrm_state_print(answer, (void *)stdout) < 0) {
 			fprintf(stderr, "An error :-)\n");
 			exit(1);
 		}
@@ -1080,9 +1079,7 @@ static int xfrm_state_get_or_delete(int argc, char **argv, int delete)
  * With an existing state of nlmsg, make new nlmsg for deleting the state
  * and store it to buffer.
  */
-static int xfrm_state_keep(const struct sockaddr_nl *who,
-			   struct nlmsghdr *n,
-			   void *arg)
+static int xfrm_state_keep(struct nlmsghdr *n, void *arg)
 {
 	struct xfrm_buffer *xb = (struct xfrm_buffer *)arg;
 	struct rtnl_handle *rth = xb->rth;
diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index e8202f7915ba..fe4a7a4b9c71 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -674,7 +674,7 @@ int rtnl_dump_filter_l(struct rtnl_handle *rth,
 				}
 
 				if (!rth->dump_fp) {
-					err = a->filter(&nladdr, h, a->arg1);
+					err = a->filter(h, a->arg1);
 					if (err < 0) {
 						free(buf);
 						return err;
@@ -983,7 +983,7 @@ int rtnl_listen(struct rtnl_handle *rtnl,
 				exit(1);
 			}
 
-			err = handler(&nladdr, &ctrl, h, jarg);
+			err = handler(&ctrl, h, jarg);
 			if (err < 0)
 				return err;
 
@@ -1005,7 +1005,6 @@ int rtnl_from_file(FILE *rtnl, rtnl_listen_filter_t handler,
 		   void *jarg)
 {
 	int status;
-	struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
 	char buf[16384];
 	struct nlmsghdr *h = (struct nlmsghdr *)buf;
 
@@ -1044,7 +1043,7 @@ int rtnl_from_file(FILE *rtnl, rtnl_listen_filter_t handler,
 			return -1;
 		}
 
-		err = handler(&nladdr, NULL, h, jarg);
+		err = handler(NULL, h, jarg);
 		if (err < 0)
 			return err;
 	}
diff --git a/lib/ll_map.c b/lib/ll_map.c
index 32c8e4429fca..1b4095a7d873 100644
--- a/lib/ll_map.c
+++ b/lib/ll_map.c
@@ -77,8 +77,7 @@ static struct ll_cache *ll_get_by_name(const char *name)
 	return NULL;
 }
 
-int ll_remember_index(const struct sockaddr_nl *who,
-		      struct nlmsghdr *n, void *arg)
+int ll_remember_index(struct nlmsghdr *n, void *arg)
 {
 	unsigned int h;
 	const char *ifname;
diff --git a/misc/ifstat.c b/misc/ifstat.c
index 3a0e780f7569..60efe6cb60fa 100644
--- a/misc/ifstat.c
+++ b/misc/ifstat.c
@@ -110,8 +110,7 @@ static int match(const char *id)
 	return 0;
 }
 
-static int get_nlmsg_extended(const struct sockaddr_nl *who,
-			      struct nlmsghdr *m, void *arg)
+static int get_nlmsg_extended(struct nlmsghdr *m, void *arg)
 {
 	struct if_stats_msg *ifsm = NLMSG_DATA(m);
 	struct rtattr *tb[IFLA_STATS_MAX+1];
@@ -154,8 +153,7 @@ static int get_nlmsg_extended(const struct sockaddr_nl *who,
 	return 0;
 }
 
-static int get_nlmsg(const struct sockaddr_nl *who,
-		     struct nlmsghdr *m, void *arg)
+static int get_nlmsg(struct nlmsghdr *m, void *arg)
 {
 	struct ifinfomsg *ifi = NLMSG_DATA(m);
 	struct rtattr *tb[IFLA_MAX+1];
diff --git a/misc/ss.c b/misc/ss.c
index f99b6874c228..c8970438ce73 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -3156,8 +3156,7 @@ static int kill_inet_sock(struct nlmsghdr *h, void *arg, struct sockstat *s)
 	return rtnl_talk(rth, &req.nlh, NULL);
 }
 
-static int show_one_inet_sock(const struct sockaddr_nl *addr,
-		struct nlmsghdr *h, void *arg)
+static int show_one_inet_sock(struct nlmsghdr *h, void *arg)
 {
 	int err;
 	struct inet_diag_arg *diag_arg = arg;
@@ -3548,8 +3547,7 @@ static void unix_stats_print(struct sockstat *s, struct filter *f)
 	proc_ctx_print(s);
 }
 
-static int unix_show_sock(const struct sockaddr_nl *addr, struct nlmsghdr *nlh,
-		void *arg)
+static int unix_show_sock(struct nlmsghdr *nlh, void *arg)
 {
 	struct filter *f = (struct filter *)arg;
 	struct unix_diag_msg *r = NLMSG_DATA(nlh);
@@ -3843,8 +3841,7 @@ static void packet_show_ring(struct packet_diag_ring *ring)
 	out(",features:0x%x", ring->pdr_features);
 }
 
-static int packet_show_sock(const struct sockaddr_nl *addr,
-		struct nlmsghdr *nlh, void *arg)
+static int packet_show_sock(struct nlmsghdr *nlh, void *arg)
 {
 	const struct filter *f = arg;
 	struct packet_diag_msg *r = NLMSG_DATA(nlh);
@@ -4133,8 +4130,7 @@ static int netlink_show_one(struct filter *f,
 	return 0;
 }
 
-static int netlink_show_sock(const struct sockaddr_nl *addr,
-		struct nlmsghdr *nlh, void *arg)
+static int netlink_show_sock(struct nlmsghdr *nlh, void *arg)
 {
 	struct filter *f = (struct filter *)arg;
 	struct netlink_diag_msg *r = NLMSG_DATA(nlh);
@@ -4257,8 +4253,7 @@ static void vsock_stats_print(struct sockstat *s, struct filter *f)
 	proc_ctx_print(s);
 }
 
-static int vsock_show_sock(const struct sockaddr_nl *addr,
-			   struct nlmsghdr *nlh, void *arg)
+static int vsock_show_sock(struct nlmsghdr *nlh, void *arg)
 {
 	struct filter *f = (struct filter *)arg;
 	struct vsock_diag_msg *r = NLMSG_DATA(nlh);
@@ -4311,8 +4306,7 @@ static void tipc_sock_addr_print(struct rtattr *net_addr, struct rtattr *id)
 
 }
 
-static int tipc_show_sock(const struct sockaddr_nl *addr, struct nlmsghdr *nlh,
-			  void *arg)
+static int tipc_show_sock(struct nlmsghdr *nlh, void *arg)
 {
 	struct rtattr *stat[TIPC_NLA_SOCK_STAT_MAX + 1] = {};
 	struct rtattr *attrs[TIPC_NLA_SOCK_MAX + 1] = {};
@@ -4400,8 +4394,7 @@ struct sock_diag_msg {
 	__u8 sdiag_family;
 };
 
-static int generic_show_sock(const struct sockaddr_nl *addr,
-		struct nlmsghdr *nlh, void *arg)
+static int generic_show_sock(struct nlmsghdr *nlh, void *arg)
 {
 	struct sock_diag_msg *r = NLMSG_DATA(nlh);
 	struct inet_diag_arg inet_arg = { .f = arg, .protocol = IPPROTO_MAX };
@@ -4411,19 +4404,19 @@ static int generic_show_sock(const struct sockaddr_nl *addr,
 	case AF_INET:
 	case AF_INET6:
 		inet_arg.rth = inet_arg.f->rth_for_killing;
-		ret = show_one_inet_sock(addr, nlh, &inet_arg);
+		ret = show_one_inet_sock(nlh, &inet_arg);
 		break;
 	case AF_UNIX:
-		ret = unix_show_sock(addr, nlh, arg);
+		ret = unix_show_sock(nlh, arg);
 		break;
 	case AF_PACKET:
-		ret = packet_show_sock(addr, nlh, arg);
+		ret = packet_show_sock(nlh, arg);
 		break;
 	case AF_NETLINK:
-		ret = netlink_show_sock(addr, nlh, arg);
+		ret = netlink_show_sock(nlh, arg);
 		break;
 	case AF_VSOCK:
-		ret = vsock_show_sock(addr, nlh, arg);
+		ret = vsock_show_sock(nlh, arg);
 		break;
 	default:
 		ret = -1;
diff --git a/tc/m_action.c b/tc/m_action.c
index 8993b93a5c4b..e90867fc6c25 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -386,9 +386,7 @@ tc_print_action(FILE *f, const struct rtattr *arg, unsigned short tot_acts)
 	return 0;
 }
 
-int print_action(const struct sockaddr_nl *who,
-			   struct nlmsghdr *n,
-			   void *arg)
+int print_action(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
 	struct tcamsg *t = NLMSG_DATA(n);
@@ -541,7 +539,7 @@ static int tc_action_gd(int cmd, unsigned int flags,
 
 	if (cmd == RTM_GETACTION) {
 		new_json_obj(json);
-		ret = print_action(NULL, ans, stdout);
+		ret = print_action(ans, stdout);
 		if (ret < 0) {
 			fprintf(stderr, "Dump terminated\n");
 			free(ans);
diff --git a/tc/tc_class.c b/tc/tc_class.c
index 6b4ea48073f2..7e4e17fd7f39 100644
--- a/tc/tc_class.c
+++ b/tc/tc_class.c
@@ -296,8 +296,7 @@ static void graph_cls_show(FILE *fp, char *buf, struct hlist_head *root_list,
 	}
 }
 
-int print_class(const struct sockaddr_nl *who,
-		       struct nlmsghdr *n, void *arg)
+int print_class(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
 	struct tcmsg *t = NLMSG_DATA(n);
diff --git a/tc/tc_common.h b/tc/tc_common.h
index 371ca7d04602..d8a6dfdeabd4 100644
--- a/tc/tc_common.h
+++ b/tc/tc_common.h
@@ -13,10 +13,10 @@ int do_action(int argc, char **argv, void *buf, size_t buflen);
 int do_tcmonitor(int argc, char **argv);
 int do_exec(int argc, char **argv);
 
-int print_action(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg);
-int print_filter(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg);
-int print_qdisc(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg);
-int print_class(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg);
+int print_action(struct nlmsghdr *n, void *arg);
+int print_filter(struct nlmsghdr *n, void *arg);
+int print_qdisc(struct nlmsghdr *n, void *arg);
+int print_class(struct nlmsghdr *n, void *arg);
 void print_size_table(FILE *fp, const char *prefix, struct rtattr *rta);
 
 struct tc_estimator;
diff --git a/tc/tc_filter.c b/tc/tc_filter.c
index 15044b4bc6ed..e5c7bc4605a2 100644
--- a/tc/tc_filter.c
+++ b/tc/tc_filter.c
@@ -258,7 +258,7 @@ static int filter_chain_index_set;
 static __u32 filter_block_index;
 __u16 f_proto;
 
-int print_filter(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+int print_filter(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
 	struct tcmsg *t = NLMSG_DATA(n);
@@ -592,7 +592,7 @@ static int tc_filter_get(int cmd, unsigned int flags, int argc, char **argv)
 	}
 
 	new_json_obj(json);
-	print_filter(NULL, answer, (void *)stdout);
+	print_filter(answer, (void *)stdout);
 	delete_json_obj();
 
 	free(answer);
diff --git a/tc/tc_monitor.c b/tc/tc_monitor.c
index 1f1ee08fb9cf..f8816cc53a46 100644
--- a/tc/tc_monitor.c
+++ b/tc/tc_monitor.c
@@ -34,8 +34,7 @@ static void usage(void)
 }
 
 
-static int accept_tcmsg(const struct sockaddr_nl *who,
-			struct rtnl_ctrl_data *ctrl,
+static int accept_tcmsg(struct rtnl_ctrl_data *ctrl,
 			struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
@@ -47,20 +46,20 @@ static int accept_tcmsg(const struct sockaddr_nl *who,
 	    n->nlmsg_type == RTM_DELTFILTER ||
 	    n->nlmsg_type == RTM_NEWCHAIN ||
 	    n->nlmsg_type == RTM_DELCHAIN) {
-		print_filter(who, n, arg);
+		print_filter(n, arg);
 		return 0;
 	}
 	if (n->nlmsg_type == RTM_NEWTCLASS || n->nlmsg_type == RTM_DELTCLASS) {
-		print_class(who, n, arg);
+		print_class(n, arg);
 		return 0;
 	}
 	if (n->nlmsg_type == RTM_NEWQDISC || n->nlmsg_type == RTM_DELQDISC) {
-		print_qdisc(who, n, arg);
+		print_qdisc(n, arg);
 		return 0;
 	}
 	if (n->nlmsg_type == RTM_GETACTION || n->nlmsg_type == RTM_NEWACTION ||
 	    n->nlmsg_type == RTM_DELACTION) {
-		print_action(who, n, arg);
+		print_action(n, arg);
 		return 0;
 	}
 	if (n->nlmsg_type != NLMSG_ERROR && n->nlmsg_type != NLMSG_NOOP &&
diff --git a/tc/tc_qdisc.c b/tc/tc_qdisc.c
index c1d2df0171a7..c5da5b5c1ed5 100644
--- a/tc/tc_qdisc.c
+++ b/tc/tc_qdisc.c
@@ -212,8 +212,7 @@ static int tc_qdisc_modify(int cmd, unsigned int flags, int argc, char **argv)
 
 static int filter_ifindex;
 
-int print_qdisc(const struct sockaddr_nl *who,
-		struct nlmsghdr *n, void *arg)
+int print_qdisc(struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE *)arg;
 	struct tcmsg *t = NLMSG_DATA(n);
@@ -448,8 +447,7 @@ struct tc_qdisc_block_exists_ctx {
 	bool found;
 };
 
-static int tc_qdisc_block_exists_cb(const struct sockaddr_nl *who,
-				    struct nlmsghdr *n, void *arg)
+static int tc_qdisc_block_exists_cb(struct nlmsghdr *n, void *arg)
 {
 	struct tc_qdisc_block_exists_ctx *ctx = arg;
 	struct tcmsg *t = NLMSG_DATA(n);
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH net v2] net/sched: act_gact: properly init 'goto chain'
From: Cong Wang @ 2018-10-19 20:40 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Jamal Hadi Salim, Jiri Pirko, David Miller,
	Linux Kernel Network Developers
In-Reply-To: <33333af24adf6ef97013445c51f8ce7bd3c87a82.camel@redhat.com>

On Thu, Oct 18, 2018 at 8:30 AM Davide Caratti <dcaratti@redhat.com> wrote:
> The alternative is, we systematically forbid usage of 'goto chain' in
> tcfg_paction, so that:
>
> # tc f a dev v0 egress matchall action <whatever> random determ goto chain 4 5
>
> is systematically rejected with -EINVAL. This command never worked, so we
> are not breaking anything in userspace.

This is exactly why I asked you if we really need to support it. :)

If no one finds it useful, disallowing it is a good solution here, as
we don't need to introduce any additional code to handle filter chains.

^ permalink raw reply

* Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO
From: Martin Lau @ 2018-10-19 20:35 UTC (permalink / raw)
  To: Edward Cree
  Cc: Yonghong Song, Alexei Starovoitov, daniel@iogearbox.net,
	netdev@vger.kernel.org, Kernel Team
In-Reply-To: <20181019193548.g4neaudunoan5gvn@kafai-mbp.dhcp.thefacebook.com>

On Fri, Oct 19, 2018 at 12:36:49PM -0700, Martin Lau wrote:
> On Fri, Oct 19, 2018 at 06:04:11PM +0100, Edward Cree wrote:
> > On 18/10/18 22:19, Martin Lau wrote:
> > > As I have mentioned earlier, it is also special to
> > > the kernel because the BTF verifier and bpf_prog_load()
> > > need to do different checks for FUNC and FUNC_PROTO to
> > > ensure they are sane.
> > >
> > > First, we need to agree that the kernel needs to verify
> > > them differently.
> > >
> > > and then we can proceed to discuss how to distinguish them.
> > > We picked the current way to avoid adding a
> > > new BTF function section and keep it
> > > straightforward to distinguish them w/o relying
> > > on other hints from 'struct btf_type'.
> > >
> > > Are you suggesting another way of doing it?
> > But you *do* have such a new section.
> > The patch comment talks about a 'FuncInfo Table' which appears to
> Note that the new section, which contains the FuncInfo Table,
> is in a new ELF section ".BTF.ext" instead of the ".BTF".
> It is not in the ".BTF" section because it is only useful during
> bpf_prog_load().
> 
> I was meaning a new function section within ".BTF".
> 
> >  map (section, insn_idx) to type_id.  (I think this gets added in
> >  .BTF.ext per patch 9?)  So when you're looking at a FUNC type
> >  because you looked up a type_id from that table, you know it's
> >  the signature of a subprogram, and you're checking it as such.
> > Whereas, if you were doing something with some other type and it
> >  referenced a FUNC type (e.g., in the patch comment's example,
> >  you're checking foo's first argument against the type bar) in
> >  its type_id, you know you're using it as a formal type (a FUNC_
> >  PROTO in your parlance) and not as a subprogram.
> > The context in which you are using a type entry tells you which
> >  kind it is.  And the verifier can and should be smart enough to
> >  know what it's doing in this way.
> > 
> > And it's entirely reasonable for the same type entry to get used
> >  for both those cases; in my example, you'd have a FUNC type for
> >  int foo(int), referenced both by the func_info entry for foo
> >  and by the PTR type for bar.  And if you had another subprogram
> >  int baz(int), its func_info entry could reference the same
> >  type_id, because the (reference to the) name of the function
> >  should live in the func_info, not in the type.
> IIUC, I think what you are suggesting here is to use (type_id, name)
> to describe DW_TAG_subprogram "int foo1(int) {}", "int foo2(int) {}",
> "int foo3(int) {}" where type_id here is referring to the same
> DW_TAG_subroutine_type, and only define that _one_
> DW_TAG_subroutine_type in the BTF "type" section.
> 
> That will require more manipulation/type-merging in the dwarf2btf
> process and it could get quite complicated.
> 
> Note that CTF is also fully spelling out the return type
> and arg types for each DW_TAG_subprogram in a separate
> function section (still within the same ELF section).
> The only difference here is they are merged into the type
> section and FUNC_PROTO is added.
> 
> If the concern is having both FUNC and FUNC_PROTO is confusing,
> we could go back to the CTF way which adds a new function section
> in ".BTF" and it is only for DW_TAG_subprogram.
> BTF_KIND_FUNC_PROTO is then no longer necessary.
> Some of the new BTF verifier checks may also go away.
> The down side is there will be two id spaces.
I discussed the two-id-space situation offline with a few folks, and
it is not good for debugging purposes.

If we must get rid of FUNC_PROTO, it is better to use the
name_off==0 check instead of adding a new function section.
We will go for this path in the next respin.

> 
> > 
> > What you are proposing seems to be saying "if we have this
> >  particular special btf_kind, then this BTF entry doesn't just
> >  define a type, it declares a subprogram of that type".  Oh,
> >  and with the name of the type as the subprogram name.  Which
> >  just creates blurry confusion as to whether BTF entries define
> >  types or declare objects; IMNSHO the correct approach is for
> >  objects to be declared elsewhere and to reference BTF types by
> >  their type_id.
> > Which is what the func_info table in patch 9 appears to do.
> > 
> > (It also rather bothers me the way we are using special type
> >  names to associate maps with their k/v types, rather than
> >  extending the records in the maps section to include type_ids
> >  referencing them.  It's the same kind of weird implicitness,
> >  and if I'd spotted it when it was going in I'd've nacked it,
> >  but I suppose it's ABI now and too late to change.)
> > 
> > -Ed

^ permalink raw reply

* Re: [PATCH net-next v2] netpoll: allow cleanup to be synchronous
From: Neil Horman @ 2018-10-19 20:34 UTC (permalink / raw)
  To: Debabrata Banerjee; +Cc: David S . Miller, netdev
In-Reply-To: <20181018151826.8373-1-dbanerje@akamai.com>

On Thu, Oct 18, 2018 at 11:18:26AM -0400, Debabrata Banerjee wrote:
> This fixes a problem introduced by:
> commit 2cde6acd49da ("netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock")
> 
> When using netconsole on a bond, __netpoll_cleanup can asynchronously
> recurse multiple times, each __netpoll_free_async call can result in
> more __netpoll_free_async's. This means there is now a race between
> cleanup_work queues on multiple netpoll_info's on multiple devices and
> the configuration of a new netpoll. For example if a netconsole is set
> to enable 0, reconfigured, and enable 1 immediately, this netconsole
> will likely not work.
> 
> Given that the reason for __netpoll_free_async is that it can be called
> when rtnl is not locked, if it is locked we should be able to execute
> synchronously. It appears to be locked everywhere it's called from.
> 
> Generalize the design pattern from the teaming driver for current
> callers of __netpoll_free_async.
> 
I presume you've tested this with some of the stacked devices?  I think I'm ok
with this change, but I'd like confirmation that it works.

Neil

> CC: Neil Horman <nhorman@tuxdriver.com>
> CC: "David S. Miller" <davem@davemloft.net>
> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
> ---
>  drivers/net/bonding/bond_main.c |  3 ++-
>  drivers/net/macvlan.c           |  2 +-
>  drivers/net/team/team.c         |  5 +----
>  include/linux/netpoll.h         |  4 +---
>  net/8021q/vlan_dev.c            |  3 +--
>  net/bridge/br_device.c          |  2 +-
>  net/core/netpoll.c              | 20 +++++---------------
>  net/dsa/slave.c                 |  2 +-
>  8 files changed, 13 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index ee28ec9e0aba..ffa37adb7681 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -963,7 +963,8 @@ static inline void slave_disable_netpoll(struct slave *slave)
>  		return;
>  
>  	slave->np = NULL;
> -	__netpoll_free_async(np);
> +
> +	__netpoll_free(np);
>  }
>  
>  static void bond_poll_controller(struct net_device *bond_dev)
> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
> index cfda146f3b3b..fc8d5f1ee1ad 100644
> --- a/drivers/net/macvlan.c
> +++ b/drivers/net/macvlan.c
> @@ -1077,7 +1077,7 @@ static void macvlan_dev_netpoll_cleanup(struct net_device *dev)
>  
>  	vlan->netpoll = NULL;
>  
> -	__netpoll_free_async(netpoll);
> +	__netpoll_free(netpoll);
>  }
>  #endif	/* CONFIG_NET_POLL_CONTROLLER */
>  
> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> index d887016e54b6..db633ae9f784 100644
> --- a/drivers/net/team/team.c
> +++ b/drivers/net/team/team.c
> @@ -1104,10 +1104,7 @@ static void team_port_disable_netpoll(struct team_port *port)
>  		return;
>  	port->np = NULL;
>  
> -	/* Wait for transmitting packets to finish before freeing. */
> -	synchronize_rcu_bh();
> -	__netpoll_cleanup(np);
> -	kfree(np);
> +	__netpoll_free(np);
>  }
>  #else
>  static int team_port_enable_netpoll(struct team_port *port)
> diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
> index 3ef82d3a78db..676f1ff161a9 100644
> --- a/include/linux/netpoll.h
> +++ b/include/linux/netpoll.h
> @@ -31,8 +31,6 @@ struct netpoll {
>  	bool ipv6;
>  	u16 local_port, remote_port;
>  	u8 remote_mac[ETH_ALEN];
> -
> -	struct work_struct cleanup_work;
>  };
>  
>  struct netpoll_info {
> @@ -63,7 +61,7 @@ int netpoll_parse_options(struct netpoll *np, char *opt);
>  int __netpoll_setup(struct netpoll *np, struct net_device *ndev);
>  int netpoll_setup(struct netpoll *np);
>  void __netpoll_cleanup(struct netpoll *np);
> -void __netpoll_free_async(struct netpoll *np);
> +void __netpoll_free(struct netpoll *np);
>  void netpoll_cleanup(struct netpoll *np);
>  void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
>  			     struct net_device *dev);
> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> index 546af0e73ac3..ff720f1ebf73 100644
> --- a/net/8021q/vlan_dev.c
> +++ b/net/8021q/vlan_dev.c
> @@ -756,8 +756,7 @@ static void vlan_dev_netpoll_cleanup(struct net_device *dev)
>  		return;
>  
>  	vlan->netpoll = NULL;
> -
> -	__netpoll_free_async(netpoll);
> +	__netpoll_free(netpoll);
>  }
>  #endif /* CONFIG_NET_POLL_CONTROLLER */
>  
> diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
> index e053a4e43758..c6abf927f0c9 100644
> --- a/net/bridge/br_device.c
> +++ b/net/bridge/br_device.c
> @@ -344,7 +344,7 @@ void br_netpoll_disable(struct net_bridge_port *p)
>  
>  	p->np = NULL;
>  
> -	__netpoll_free_async(np);
> +	__netpoll_free(np);
>  }
>  
>  #endif
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index de1d1ba92f2d..6ac71624ead4 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -591,7 +591,6 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
>  
>  	np->dev = ndev;
>  	strlcpy(np->dev_name, ndev->name, IFNAMSIZ);
> -	INIT_WORK(&np->cleanup_work, netpoll_async_cleanup);
>  
>  	if (ndev->priv_flags & IFF_DISABLE_NETPOLL) {
>  		np_err(np, "%s doesn't support polling, aborting\n",
> @@ -790,10 +789,6 @@ void __netpoll_cleanup(struct netpoll *np)
>  {
>  	struct netpoll_info *npinfo;
>  
> -	/* rtnl_dereference would be preferable here but
> -	 * rcu_cleanup_netpoll path can put us in here safely without
> -	 * holding the rtnl, so plain rcu_dereference it is
> -	 */
>  	npinfo = rtnl_dereference(np->dev->npinfo);
>  	if (!npinfo)
>  		return;
> @@ -814,21 +809,16 @@ void __netpoll_cleanup(struct netpoll *np)
>  }
>  EXPORT_SYMBOL_GPL(__netpoll_cleanup);
>  
> -static void netpoll_async_cleanup(struct work_struct *work)
> +void __netpoll_free(struct netpoll *np)
>  {
> -	struct netpoll *np = container_of(work, struct netpoll, cleanup_work);
> +	ASSERT_RTNL();
>  
> -	rtnl_lock();
> +	/* Wait for transmitting packets to finish before freeing. */
> +	synchronize_rcu_bh();
>  	__netpoll_cleanup(np);
> -	rtnl_unlock();
>  	kfree(np);
>  }
> -
> -void __netpoll_free_async(struct netpoll *np)
> -{
> -	schedule_work(&np->cleanup_work);
> -}
> -EXPORT_SYMBOL_GPL(__netpoll_free_async);
> +EXPORT_SYMBOL_GPL(__netpoll_free);
>  
>  void netpoll_cleanup(struct netpoll *np)
>  {
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index 3f840b6eea69..3679e13b2ead 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -722,7 +722,7 @@ static void dsa_slave_netpoll_cleanup(struct net_device *dev)
>  
>  	p->netpoll = NULL;
>  
> -	__netpoll_free_async(netpoll);
> +	__netpoll_free(netpoll);
>  }
>  
>  static void dsa_slave_poll_controller(struct net_device *dev)
> -- 
> 2.19.1
> 
> 

^ permalink raw reply

* Re: [PATCH bpf-next] bpf: Extend the sk_lookup() helper to XDP hookpoint.
From: Daniel Borkmann @ 2018-10-19 20:32 UTC (permalink / raw)
  To: Joe Stringer, Martin KaFai Lau
  Cc: Nitin Hande, netdev, ast, Jesper Brouer, john fastabend
In-Reply-To: <CAOftzPjyeD2-nGW+NPC4sbxLcQY_CFT5HikXYeKUEWCbRcrpQg@mail.gmail.com>

On 10/19/2018 06:47 PM, Joe Stringer wrote:
> On Thu, 18 Oct 2018 at 22:07, Martin Lau <kafai@fb.com> wrote:
>> On Thu, Oct 18, 2018 at 04:52:40PM -0700, Joe Stringer wrote:
>>> On Thu, 18 Oct 2018 at 14:20, Daniel Borkmann <daniel@iogearbox.net> wrote:
>>>> On 10/18/2018 11:06 PM, Joe Stringer wrote:
>>>>> On Thu, 18 Oct 2018 at 11:54, Nitin Hande <nitin.hande@gmail.com> wrote:
>>>> [...]
>>>>>> Open Issue
>>>>>> * The underlying code relies on presence of an skb to find out the
>>>>>> right sk for the case of REUSEPORT socket option. Since there is
>>>>>> no skb available at XDP hookpoint, the helper function will return
>>>>>> the first available sk based off the 5 tuple hash. If the desire
>>>>>> is to return a particular sk matching reuseport_cb function, please
>>>>>> suggest a way to tackle it, which can be addressed in a future commit.
>>>>
>>>>>> Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>
>>>>>
>>>>> Thanks Nitin, LGTM overall.
>>>>>
>>>>> The REUSEPORT thing suggests that the usage of this helper from XDP
>>>>> layer may lead to a different socket being selected vs. the equivalent
>>>>> call at TC hook, or other places where the selection may occur. This
>>>>> could be a bit counter-intuitive.
>>>>>
>>>>> One thought I had to work around this was to introduce a flag,
>>>>> something like BPF_F_FIND_REUSEPORT_SK_BY_HASH. This flag would
>>>>> effectively communicate in the API that the bpf_sk_lookup_xxx()
>>>>> functions will only select a REUSEPORT socket based on the hash and
>>>>> not by, for example BPF_PROG_TYPE_SK_REUSEPORT programs. The absence
>>>>> of the flag would support finding REUSEPORT sockets by other
>>>>> mechanisms (which would be allowed for now from TC hooks but would be
>>>>> disallowed from XDP, since there's no specific plan to support this).
>>>>
>>>> Hmm, given skb is NULL here the only way to lookup the socket in such
>>>> scenario is based on hash, that is, inet_ehashfn() / inet6_ehashfn(),
>>>> perhaps alternative is to pass this hash in from XDP itself to the
>>>> helper so it could be a custom selector. Do you have a specific use case
>>>> on this for XDP (just curious)?
>>>
>>> I don't have a use case for SO_REUSEPORT introspection from XDP, so
>>> I'm primarily thinking from the perspective of making the behaviour
>>> clear in the API in a way that leaves open the possibility for a
>>> reasonable implementation in future. From that perspective, my main
>>> concern is that it may surprise some BPF writers that the same
>>> "bpf_sk_lookup_tcp()" call (with identical parameters) may have
>>> different behaviour at TC vs. XDP layers, as the BPF selection of
>>> sockets is respected at TC but not at XDP.
>>>
>>> FWIW we're already out of parameters for the actual call, so if we
>>> wanted to allow passing a hash in, we'd need to either dedicate half
>>> the 'flags' field for this configurable hash, or consider adding the
>>> new hash parameter to 'struct bpf_sock_tuple'.
>>>
>>> +Martin for any thoughts on SO_REUSEPORT and XDP here.
>> The XDP/TC prog has read access to the sk fields through
>> 'struct bpf_sock'?
>>
>> A quick thought...
>> Considering all sk in the same reuse->socks[] share
>> many things (e.g. family,type,protocol,ip,port..etc are the same),
>> I wonder whether it matters much which particular sk from reuse->socks[]
>> is returned, since most of the fields in 'struct bpf_sock' will
>> be the same.  Some of the fields in 'struct bpf_sock' could be different
>> though, like priority?  Hence, another possibility is to limit the
>> accessible fields for the XDP prog.  Only allow accessing the fields
>> that must be the same among the sk in the same reuse->socks[].
> 
> This sounds pretty reasonable to me.

Agree, and in any case this difference in returned sk selection should
probably also be documented in the uapi helper description.

^ permalink raw reply
