Netdev List

* Re: [PATCH RFC 2/4] net: phy: allow to bind genphy driver at probe time
From: Heiner Kallweit @ 2019-08-13 23:02 UTC
  To: Florian Fainelli, Andrew Lunn, Marek Behun, David Miller
  Cc: netdev@vger.kernel.org
In-Reply-To: <010ae64f-7e48-5e1e-2928-af3c4364f6e3@gmail.com>

On 14.08.2019 00:53, Florian Fainelli wrote:
> On 8/13/19 2:25 PM, Heiner Kallweit wrote:
>> In cases like a fixed phy that is never attached to a net_device we
>> may want to bind the genphy driver at probe time. Setting a PHY ID of
>> 0xffffffff to bind the genphy driver would fail due to a check in
>> get_phy_device(). Therefore let's change the PHY ID that the genphy
>> driver binds to from 0xffffffff to 0xfffffffe. This still shouldn't match
>> any real PHY, and it will pass the check in get_phy_device().
>>
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
>>  drivers/net/phy/phy_device.c | 3 +--
>>  include/linux/phy.h          | 4 ++++
>>  2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 163295dbc..54f80af31 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -2388,8 +2388,7 @@ void phy_drivers_unregister(struct phy_driver *drv, int n)
>>  EXPORT_SYMBOL(phy_drivers_unregister);
>>  
>>  static struct phy_driver genphy_driver = {
>> -	.phy_id		= 0xffffffff,
>> -	.phy_id_mask	= 0xffffffff,
>> +	PHY_ID_MATCH_EXACT(GENPHY_ID),
>>  	.name		= "Generic PHY",
>>  	.soft_reset	= genphy_no_soft_reset,
>>  	.get_features	= genphy_read_abilities,
>> diff --git a/include/linux/phy.h b/include/linux/phy.h
>> index 5ac7d2137..3b07bce78 100644
>> --- a/include/linux/phy.h
>> +++ b/include/linux/phy.h
>> @@ -37,6 +37,10 @@
>>  #define PHY_1000BT_FEATURES	(SUPPORTED_1000baseT_Half | \
>>  				 SUPPORTED_1000baseT_Full)
>>  
>> +#define GENPHY_ID_HIGH		0xffffU
>> +#define GENPHY_ID_LOW		0xfffeU
>> +#define GENPHY_ID		((GENPHY_ID_HIGH << 16) | GENPHY_ID_LOW)
> 
> This is a possible user ABI change: if anything relies on reading
> 0xffff_ffff as a valid PHY OUI, you would be breaking it. We might as
> well try to assign ourselves a specific PHY OUI, very much like the
> Linux USB hubs show up with a Linux Foundation vendor ID.
> 

I see the point. However in get_phy_device() we have the following check
that should cause a PHY with ID 0xffff_ffff to be ignored. Therefore
I doubt there's any such PHY ID in use.

	/* If the phy_id is mostly Fs, there is no device there */
	if ((phy_id & 0x1fffffff) == 0x1fffffff)
		return ERR_PTR(-ENODEV);
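
As a quick sanity check of the argument above, here is a minimal userspace
sketch, not kernel code. It copies the GENPHY_ID macros from the patch and
the mask test quoted from get_phy_device(); the helper name
phy_id_is_absent() is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values proposed in the patch for the generic PHY driver. */
#define GENPHY_ID_HIGH	0xffffU
#define GENPHY_ID_LOW	0xfffeU
#define GENPHY_ID	((GENPHY_ID_HIGH << 16) | GENPHY_ID_LOW)

/*
 * Mirrors the "mostly Fs" test quoted from get_phy_device().
 * phy_id_is_absent() is a hypothetical name used in this sketch only.
 */
static bool phy_id_is_absent(uint32_t phy_id)
{
	return (phy_id & 0x1fffffff) == 0x1fffffff;
}
```

With these definitions, phy_id_is_absent(0xffffffff) is true, while
phy_id_is_absent(GENPHY_ID) is false because 0xfffffffe & 0x1fffffff
is 0x1ffffffe.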

Heiner

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Andy Lutomirski @ 2019-08-13 23:06 UTC
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Song Liu, Kees Cook, Networking, bpf,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team, Lorenz Bauer,
	Jann Horn, Greg KH, Linux API, LSM List
In-Reply-To: <20190813215823.3sfbakzzjjykyng2@ast-mbp>

On Tue, Aug 13, 2019 at 2:58 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Aug 06, 2019 at 10:24:25PM -0700, Andy Lutomirski wrote:
> > >
> > > Inside containers and inside nested containers we need to start processes
> > > that will use bpf. All of the processes are trusted.
> >
> > Trusted by whom?  In a non-nested container, the container manager
> > *might* be trusted by the outside world.  In a *nested* container,
> > unless the inner container management is controlled from outside the
> > outer container, it's not trusted.  I don't know much about how
> > Facebook's containers work, but the LXC/LXD/Podman world is moving
> > very strongly toward user namespaces and maximally-untrusted
> > containers, and I think bpf() should work in that context.
>
> agree that containers (namespaces) reduce amount of trust necessary
> for apps to run, but the end goal is not security though.
> Linux has become a single user system.
> If user can ssh into the host they can become root.
> If arbitrary code can run on the host it will break out of any sandbox.

I would argue that this is a reasonable assumption to make if you're
designing a system using Linux, but it's not a valid assumption to
make as kernel developers.  Otherwise we should just give everyone
CAP_SYS_ADMIN and call it a day.  There really is a difference between
root and non-root.

> Containers are not providing the level of security that is enough
> to run arbitrary code. VMs can do it better, but cpu bugs don't make it easy.
> Containers are used to make production systems safer.
> Some people call it more 'secure', but it's clearly not secure for
> arbitrary code and that is what kernel.unprivileged_bpf_disabled allows.
> When we say 'unprivileged bpf' we really mean arbitrary malicious bpf program.
> It's been a constant source of pain. The constant blinding, randomization,
> verifier speculative analysis, all spectre v1, v2, v4 mitigations
> are simply not worth it. It's a lot of complex kernel code without users.

Seccomp really will want eBPF some day, and it should work without
privilege.  Maybe it should be a restricted subset of eBPF, and
Spectre will always be an issue until dramatically better hardware
shows up, but I think people will want the ability for regular
programs to load eBPF seccomp programs.

> Hence I prefer this /dev/bpf mechanism to be as simple as possible.
> The applications that will use it are going to be just as trusted as systemd.

I still don't understand your systemd example.  systemd --user is not
trusted systemwide in any respect.  The main PID 1 systemd is root.
No matter how you dice it, granting a user systemd instance extra bpf
access is tantamount to granting the user extra bpf access in general.

It sounds to me like you're thinking of eBPF as a feature a bit like
unprivileged user namespaces: *in principle*, it's supposed to be safe
to give any unprivileged process the ability to use it, and you
consider security flaws in it to be bugs worth fixing.  But you think
it's a large attack surface and that most unprivileged programs
shouldn't be allowed to use it.  Is that reasonable?


>
> > > To solve your concern of bypassing all capable checks...
> > > How about we do /dev/bpf/full_verifier first?
> > > It will replace capable() checks in the verifier only.
> >
> > I'm not convinced that "in the verifier" is the right distinction.
> > Telling administrators that some setting lets certain users bypass
> > bpf() verifier checks doesn't have a clear enough meaning.
>
> linux is a single user system. there are no administrators any more.
> No doubt, folks will disagree, but that game is over.
> At least on bpf side it's done.
>
> > I propose,
> > instead, that the current capable() checks be divided into three
> > categories:
>
> I don't see a use case for these categories.
> All bpf programs extend the kernel in some way.
> The kernel vs user is one category.
> Conceptually CAP_BPF is enough. It would be similar to CAP_NET_ADMIN.
> When application has CAP_NET_ADMIN it covers all of networking knobs.
> There is no use case that would warrant fine grain CAP_ROUTE_ADMIN,
> CAP_ETHTOOL_ADMIN, CAP_ETH0_ADMIN, etc.
> Similarly CAP_BPF as the only knob is enough.
> The only disadvantage of CAP_BPF is that it's not possible to
> pass it from one systemd-like daemon to another systemd-like daemon.
> Hence /dev/bpf idea and passing file descriptor.
>
> > This type of thing actually fits quite nicely into an idea I've been
> > thinking about for a while called "implicit rights". In very brief
> > summary, there would be objects called /dev/rights/xyz, where xyz is
> > the name of a "right".  If there is a readable object of the right
> > type at the literal path "/dev/rights/xyz", then you have right xyz.
> > There's a bit more flexibility on top of this.  BPF could use
> > /dev/rights/bpf/maptypes/lpm and
> > /dev/rights/bpf/verifier/bounded_loops, for example.  Other non-BPF
> > use cases include a biggie:
> > /dev/rights/namespace/create_unprivileged_userns.
> > /dev/rights/bind_port/80 would be nice, too.
>
> The concept of "implicit rights" is very nice and I'm sure it will
> be a good fit somewhere, but I don't see why use it in bpf space.
> There is no use case for fine grain partition of bpf features.
>
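
For readers unfamiliar with the "implicit rights" idea quoted above, the
lookup it describes could be sketched in userspace roughly as follows. This
is purely illustrative: no kernel implements /dev/rights, the function name
has_implicit_right() is hypothetical, and the "right type" check on the
object is left out. The proposal uses the literal root "/dev/rights"; the
root is a parameter here only so the sketch can be tried against a scratch
directory.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical check: a right is held when a readable object exists at
 * <root>/<right>. Checking the object's type is omitted in this sketch.
 */
static bool has_implicit_right(const char *root, const char *right)
{
	char path[4096];
	int n = snprintf(path, sizeof(path), "%s/%s", root, right);

	if (n < 0 || (size_t)n >= sizeof(path))
		return false;	/* path did not fit; treat as "no right" */
	return access(path, R_OK) == 0;
}
```

Under the proposal, a caller would ask something like
has_implicit_right("/dev/rights", "bpf/maptypes/lpm").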

* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
From: Yonghong Song @ 2019-08-13 23:11 UTC
  To: Carlos Neira, netdev@vger.kernel.org
  Cc: ebiederm@xmission.com, brouer@redhat.com, bpf@vger.kernel.org
In-Reply-To: <20190813184747.12225-2-cneirabustos@gmail.com>



On 8/13/19 11:47 AM, Carlos Neira wrote:
> From: Carlos <cneirabustos@gmail.com>
> 
> New bpf helper bpf_get_current_pidns_info.
> This helper obtains the active namespace from current and returns
> pid, tgid, device and namespace id as seen from that namespace,
> allowing to instrument a process inside a container.
> 
> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> ---
>   fs/internal.h            |  2 --
>   fs/namei.c               |  1 -
>   include/linux/bpf.h      |  1 +
>   include/linux/namei.h    |  4 +++
>   include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
>   kernel/bpf/core.c        |  1 +
>   kernel/bpf/helpers.c     | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
>   kernel/trace/bpf_trace.c |  2 ++
>   8 files changed, 102 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/internal.h b/fs/internal.h
> index 315fcd8d237c..6647e15dd419 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
>   /*
>    * namei.c
>    */
> -extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> -			   struct path *path, struct path *root);
>   extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
>   extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
>   			   const char *, unsigned int, struct path *);
> diff --git a/fs/namei.c b/fs/namei.c
> index 209c51a5226c..a89fc72a4a10 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -19,7 +19,6 @@
>   #include <linux/export.h>
>   #include <linux/kernel.h>
>   #include <linux/slab.h>
> -#include <linux/fs.h>
>   #include <linux/namei.h>
>   #include <linux/pagemap.h>
>   #include <linux/fsnotify.h>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f9a506147c8a..e4adf5e05afd 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
>   extern const struct bpf_func_proto bpf_strtol_proto;
>   extern const struct bpf_func_proto bpf_strtoul_proto;
>   extern const struct bpf_func_proto bpf_tcp_sock_proto;
> +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
>   
>   /* Shared helpers among cBPF and eBPF. */
>   void bpf_user_rnd_init_once(void);
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 9138b4471dbf..b45c8b6f7cb4 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -6,6 +6,7 @@
>   #include <linux/path.h>
>   #include <linux/fcntl.h>
>   #include <linux/errno.h>
> +#include <linux/fs.h>
>   
>   enum { MAX_NESTED_LINKS = 8 };
>   
> @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
>   
>   extern void nd_jump_link(struct path *path);
>   
> +extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> +			   struct path *path, struct path *root);
> +
>   static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
>   {
>   	((char *) name)[min(len, maxlen)] = '\0';
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4393bd4b2419..db241857ec15 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2741,6 +2741,28 @@ union bpf_attr {
>    *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
>    *
>    *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
> + *
> + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> + *	Description
> + *		Copies into *pidns* pid, namespace id and tgid as seen by the
> + *		current namespace and also device from /proc/self/ns/pid.
> + *		*size_of_pidns* must be the size of *pidns*
> + *
> + *		This helper is used when pid filtering is needed inside a
> + *		container as bpf_get_current_tgid() helper returns always the
> + *		pid id as seen by the root namespace.
> + *	Return
> + *		0 on success
> + *
> + *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> + *		or tgid of the current task.
> + *
> + *		**-ECHILD** if /proc/self/ns/pid does not exist.
> + *
> + *		**-ENOTDIR** if /proc/self/ns does not exist.
> + *
> + *		**-ENOMEM**  if allocation fails.
> + *
>    */
>   #define __BPF_FUNC_MAPPER(FN)		\
>   	FN(unspec),			\
> @@ -2853,7 +2875,8 @@ union bpf_attr {
>   	FN(sk_storage_get),		\
>   	FN(sk_storage_delete),		\
>   	FN(send_signal),		\
> -	FN(tcp_gen_syncookie),
> +	FN(tcp_gen_syncookie),		\
> +	FN(get_current_pidns_info),
>   
>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>    * function eBPF program intends to call
> @@ -3604,4 +3627,10 @@ struct bpf_sockopt {
>   	__s32	retval;
>   };
>   
> +struct bpf_pidns_info {
> +	__u32 dev;
> +	__u32 nsid;
> +	__u32 tgid;
> +	__u32 pid;
> +};
>   #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 8191a7db2777..3159f2a0188c 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
>   const struct bpf_func_proto bpf_get_current_comm_proto __weak;
>   const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
>   const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> +const struct bpf_func_proto bpf_get_current_pidns_info_proto __weak;
>   
>   const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
>   {
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 5e28718928ca..41fbf1f28a48 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -11,6 +11,12 @@
>   #include <linux/uidgid.h>
>   #include <linux/filter.h>
>   #include <linux/ctype.h>
> +#include <linux/pid_namespace.h>
> +#include <linux/major.h>
> +#include <linux/stat.h>
> +#include <linux/namei.h>
> +#include <linux/version.h>
> +
>   
>   #include "../../lib/kstrtox.h"
>   
> @@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
>   	preempt_enable();
>   }
>   
> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> +	 size)
> +{
> +	const char *pidns_path = "/proc/self/ns/pid";
> +	struct pid_namespace *pidns = NULL;
> +	struct filename *tmp = NULL;
> +	struct inode *inode;
> +	struct path kp;
> +	pid_t tgid = 0;
> +	pid_t pid = 0;
> +	int ret;
> +	int len;

I am running your sample program and get the following kernel bug:

...
[   26.414825] BUG: sleeping function called from invalid context at /data/users/yhs/work/net-next/fs/dcache.c:843
[   26.416314] in_atomic(): 1, irqs_disabled(): 0, pid: 1911, name: ping
[   26.417189] CPU: 0 PID: 1911 Comm: ping Tainted: G        W 5.3.0-rc1+ #280
[   26.418182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
[   26.419393] Call Trace:
[   26.419697]  <IRQ>
[   26.419960]  dump_stack+0x46/0x5b
[   26.420434]  ___might_sleep+0xe4/0x110
[   26.420894]  dput+0x2a/0x200
[   26.421265]  walk_component+0x10c/0x280
[   26.421773]  link_path_walk+0x327/0x560
[   26.422280]  ? proc_ns_dir_readdir+0x1a0/0x1a0
[   26.422848]  ? path_init+0x232/0x330
[   26.423364]  path_lookupat+0x88/0x200
[   26.423808]  ? selinux_parse_skb.constprop.69+0x124/0x430
[   26.424521]  filename_lookup+0xaf/0x190
[   26.425031]  ? simple_attr_release+0x20/0x20
[   26.425560]  bpf_get_current_pidns_info+0xfa/0x190
[   26.426168]  bpf_prog_83627154cefed596+0xe66/0x1000
[   26.426779]  trace_call_bpf+0xb5/0x160
[   26.427317]  ? __netif_receive_skb_core+0x1/0xbb0
[   26.427929]  ? __netif_receive_skb_core+0x1/0xbb0
[   26.428496]  kprobe_perf_func+0x4d/0x280
[   26.428986]  ? tracing_record_taskinfo_skip+0x1a/0x30
[   26.429584]  ? tracing_record_taskinfo+0xe/0x80
[   26.430152]  ? ttwu_do_wakeup.isra.114+0xcf/0xf0
[   26.430737]  ? __netif_receive_skb_core+0x1/0xbb0
[   26.431334]  ? __netif_receive_skb_core+0x5/0xbb0
[   26.431930]  kprobe_ftrace_handler+0x90/0xf0
[   26.432495]  ftrace_ops_assist_func+0x63/0x100
[   26.433060]  0xffffffffc03180bf
[   26.433471]  ? __netif_receive_skb_core+0x1/0xbb0
...

To avoid running in an arbitrary task context (e.g., the idle task),
which may introduce sleeping issues, the following check is
probably appropriate:

        if (in_nmi() || in_softirq())
                return -EPERM;

In any case, in NMI or softirq context the namespace and pid/tgid
we get are only accidentally associated with the running bpf program;
they may belong to a different, unrelated task. So such info
is not reliable anyway.
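
The suggested guard can be modelled outside the kernel roughly as follows.
This is only a sketch: struct exec_ctx and pidns_helper_guard() are
hypothetical stand-ins, since in_nmi() and in_softirq() are only available
in kernel code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state in_nmi()/in_softirq() report. */
struct exec_ctx {
	bool nmi;	/* models in_nmi() */
	bool softirq;	/* models in_softirq() */
};

/*
 * Refuse to run when "current" is not reliably the task of interest,
 * before any lookup that might sleep is attempted.
 */
static int pidns_helper_guard(const struct exec_ctx *ctx)
{
	if (ctx->nmi || ctx->softirq)
		return -EPERM;
	return 0;
}
```

In the helper itself the check would sit at the top of the function, ahead
of the filename_lookup() call that shows up in the trace.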

> +
> +	if (unlikely(size != sizeof(struct bpf_pidns_info)))
> +		return -EINVAL;
> +	pidns = task_active_pid_ns(current);
> +	if (unlikely(!pidns))
> +		goto clear;
> +	pidns_info->nsid =  pidns->ns.inum;
> +	pid = task_pid_nr_ns(current, pidns);
> +	if (unlikely(!pid))
> +		goto clear;
> +	tgid = task_tgid_nr_ns(current, pidns);
> +	if (unlikely(!tgid))
> +		goto clear;
> +	pidns_info->tgid = (u32) tgid;
> +	pidns_info->pid = (u32) pid;
> +	tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> +	if (unlikely(!tmp)) {
> +		memset((void *)pidns_info, 0, (size_t) size);
> +		return -ENOMEM;
> +	}
> +	len = strlen(pidns_path) + 1;
> +	memcpy((char *)tmp->name, pidns_path, len);
> +	tmp->uptr = NULL;
> +	tmp->aname = NULL;
> +	tmp->refcnt = 1;
> +	ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
> +	if (ret) {
> +		memset((void *)pidns_info, 0, (size_t) size);
> +		return ret;
> +	}
> +	inode = d_backing_inode(kp.dentry);
> +	pidns_info->dev = inode->i_sb->s_dev;
> +	return 0;
> +clear:
> +	memset((void *)pidns_info, 0, (size_t) size);
> +	return -EINVAL;
> +}
> +
> +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
> +	.func		= bpf_get_current_pidns_info,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
> +	.arg2_type	= ARG_CONST_SIZE,
> +};
> +
>   #ifdef CONFIG_CGROUPS
>   BPF_CALL_0(bpf_get_current_cgroup_id)
>   {
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index ca1255d14576..5e1dc22765a5 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>   #endif
>   	case BPF_FUNC_send_signal:
>   		return &bpf_send_signal_proto;
> +	case BPF_FUNC_get_current_pidns_info:
> +		return &bpf_get_current_pidns_info_proto;
>   	default:
>   		return NULL;
>   	}
> 

* pull-request: bpf-next 2019-08-14
From: Daniel Borkmann @ 2019-08-13 23:16 UTC
  To: davem, jakub.kicinski; +Cc: daniel, ast, andrii.nakryiko, netdev, bpf

Hi David, hi Jakub,

The following pull-request contains BPF updates for your *net-next* tree.

There is a small merge conflict in libbpf (Cc Andrii so he's in the loop
as well):

        for (i = 1; i <= btf__get_nr_types(btf); i++) {
                t = (struct btf_type *)btf__type_by_id(btf, i);

                if (!has_datasec && btf_is_var(t)) {
                        /* replace VAR with INT */
                        t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
  <<<<<<< HEAD
                        /*
                         * using size = 1 is the safest choice, 4 will be too
                         * big and cause kernel BTF validation failure if
                         * original variable took less than 4 bytes
                         */
                        t->size = 1;
                        *(int *)(t+1) = BTF_INT_ENC(0, 0, 8);
                } else if (!has_datasec && kind == BTF_KIND_DATASEC) {
  =======
                        t->size = sizeof(int);
                        *(int *)(t + 1) = BTF_INT_ENC(0, 0, 32);
                } else if (!has_datasec && btf_is_datasec(t)) {
  >>>>>>> 72ef80b5ee131e96172f19e74b4f98fa3404efe8
                        /* replace DATASEC with STRUCT */

Conflict is between the two commits 1d4126c4e119 ("libbpf: sanitize VAR to
conservative 1-byte INT") and b03bc6853c0e ("libbpf: convert libbpf code to
use new btf helpers"), so we need to pick the sanitization fixup as well as
use the new btf_is_datasec() helper and the whitespace cleanup. Looks like
the following:

  [...]
                if (!has_datasec && btf_is_var(t)) {
                        /* replace VAR with INT */
                        t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
                        /*
                         * using size = 1 is the safest choice, 4 will be too
                         * big and cause kernel BTF validation failure if
                         * original variable took less than 4 bytes
                         */
                        t->size = 1;
                        *(int *)(t + 1) = BTF_INT_ENC(0, 0, 8);
                } else if (!has_datasec && btf_is_datasec(t)) {
                        /* replace DATASEC with STRUCT */
  [...]

The main changes are:

1) Addition of core parts of compile once - run everywhere (co-re) effort,
   that is, relocation of fields offsets in libbpf as well as exposure of
   kernel's own BTF via sysfs and loading through libbpf, from Andrii.

   More info on co-re: http://vger.kernel.org/bpfconf2019.html#session-2
   and http://vger.kernel.org/lpc-bpf2018.html#session-2

2) Enable passing input flags to the BPF flow dissector to customize parsing
   and allowing it to stop early similar to the C based one, from Stanislav.

3) Add a BPF helper function that allows generating SYN cookies from XDP and
   tc BPF, from Petar.

4) Add devmap hash-based map type for more flexibility in device lookup for
   redirects, from Toke.

5) Improvements to XDP forwarding sample code now utilizing recently enabled
   devmap lookups, from Jesper.

6) Add support for reporting the effective cgroup progs in bpftool, from Jakub
   and Takshak.

7) Fix reading kernel config from bpftool via /proc/config.gz, from Peter.

8) Fix AF_XDP umem pages mapping for 32 bit architectures, from Ivan.

9) Follow-up to add two more BPF loop tests for the selftest suite, from Alexei.

10) Add perf event output helper also for other skb-based program types, from Allan.

11) Fix a co-re related compilation error in selftests, from Yonghong.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

Thanks a lot!

----------------------------------------------------------------

The following changes since commit 3e3bb69589e482e0783f28d4cd1d8e56fda0bcbb:

  tc-testing: added tdc tests for [b|p]fifo qdisc (2019-07-23 14:08:15 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git 

for you to fetch changes up to 72ef80b5ee131e96172f19e74b4f98fa3404efe8:

  Merge branch 'bpf-libbpf-read-sysfs-btf' (2019-08-13 23:19:42 +0200)

----------------------------------------------------------------
Alexei Starovoitov (10):
      Merge branch 'convert-tests-to-libbpf'
      Merge branch 'flow_dissector-input-flags'
      Merge branch 'revamp-test_progs'
      Merge branch 'devmap_hash'
      Merge branch 'gen-syn-cookie'
      Merge branch 'setsockopt-extra-mem'
      selftests/bpf: add loop test 4
      selftests/bpf: add loop test 5
      Merge branch 'test_progs-stdio'
      Merge branch 'compile-once-run-everywhere'

Allan Zhang (2):
      bpf: Allow bpf_skb_event_output for a few prog types
      selftests/bpf: Add selftests for bpf_perf_event_output

Andrii Nakryiko (33):
      libbpf: provide more helpful message on uninitialized global var
      selftests/bpf: convert test_get_stack_raw_tp to perf_buffer API
      selftests/bpf: switch test_tcpnotify to perf_buffer API
      samples/bpf: convert xdp_sample_pkts_user to perf_buffer API
      samples/bpf: switch trace_output sample to perf_buffer API
      selftests/bpf: remove perf buffer helpers
      selftests/bpf: prevent headers to be compiled as C code
      selftests/bpf: revamp test_progs to allow more control
      selftests/bpf: add test selectors by number and name to test_progs
      libbpf: return previous print callback from libbpf_set_print
      selftest/bpf: centralize libbpf logging management for test_progs
      selftests/bpf: abstract away test log output
      selftests/bpf: add sub-tests support for test_progs
      selftests/bpf: convert bpf_verif_scale.c to sub-tests API
      selftests/bpf: convert send_signal.c to use subtests
      selftests/bpf: fix clearing buffered output between tests/subtests
      libbpf: add helpers for working with BTF types
      libbpf: convert libbpf code to use new btf helpers
      libbpf: add .BTF.ext offset relocation section loading
      libbpf: implement BPF CO-RE offset relocation algorithm
      selftests/bpf: add BPF_CORE_READ relocatable read macro
      selftests/bpf: add CO-RE relocs testing setup
      selftests/bpf: add CO-RE relocs struct flavors tests
      selftests/bpf: add CO-RE relocs nesting tests
      selftests/bpf: add CO-RE relocs array tests
      selftests/bpf: add CO-RE relocs enum/ptr/func_proto tests
      selftests/bpf: add CO-RE relocs modifiers/typedef tests
      selftests/bpf: add CO-RE relocs ptr-as-array tests
      selftests/bpf: add CO-RE relocs ints tests
      selftests/bpf: add CO-RE relocs misc tests
      btf: expose BTF info through sysfs
      btf: rename /sys/kernel/btf/kernel into /sys/kernel/btf/vmlinux
      libbpf: attempt to load kernel BTF from sysfs first

Daniel Borkmann (2):
      Merge branch 'bpf-xdp-fwd-sample-improvements'
      Merge branch 'bpf-libbpf-read-sysfs-btf'

Ivan Khoronzhuk (1):
      xdp: xdp_umem: fix umem pages mapping for 32bits systems

Jakub Kicinski (1):
      tools: bpftool: add support for reporting the effective cgroup progs

Jesper Dangaard Brouer (3):
      samples/bpf: xdp_fwd rename devmap name to be xdp_tx_ports
      samples/bpf: make xdp_fwd more practically usable via devmap lookup
      samples/bpf: xdp_fwd explain bpf_fib_lookup return codes

Petar Penkov (7):
      tcp: tcp_syn_flood_action read port from socket
      tcp: add skb-less helpers to retrieve SYN cookie
      bpf: add bpf_tcp_gen_syncookie helper
      bpf: sync bpf.h to tools/
      selftests/bpf: bpf_tcp_gen_syncookie->bpf_helpers
      selftests/bpf: add test for bpf_tcp_gen_syncookie
      selftests/bpf: fix race in flow dissector tests

Peter Wu (2):
      tools: bpftool: fix reading from /proc/config.gz
      tools: bpftool: add feature check for zlib

Stanislav Fomichev (12):
      bpf/flow_dissector: pass input flags to BPF flow dissector program
      bpf/flow_dissector: document flags
      bpf/flow_dissector: support flags in BPF_PROG_TEST_RUN
      tools/bpf: sync bpf_flow_keys flags
      selftests/bpf: support BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG
      bpf/flow_dissector: support ipv6 flow_label and BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL
      selftests/bpf: support BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP
      bpf: always allocate at least 16 bytes for setsockopt hook
      selftests/bpf: extend sockopt_sk selftest with TCP_CONGESTION use case
      selftests/bpf: test_progs: switch to open_memstream
      selftests/bpf: test_progs: test__printf -> printf
      selftests/bpf: test_progs: drop extra trailing tab

Toke Høiland-Jørgensen (6):
      include/bpf.h: Remove map_insert_ctx() stubs
      xdp: Refactor devmap allocation code for reuse
      xdp: Add devmap_hash map type for looking up devices by hashed index
      tools/include/uapi: Add devmap_hash BPF map type
      tools/libbpf_probes: Add new devmap_hash type
      tools: Add definitions for devmap_hash map type

Yonghong Song (1):
      tools/bpf: fix core_reloc.c compilation error

 Documentation/ABI/testing/sysfs-kernel-btf         |   17 +
 Documentation/bpf/prog_flow_dissector.rst          |   18 +
 include/linux/bpf.h                                |   11 +-
 include/linux/bpf_types.h                          |    1 +
 include/linux/skbuff.h                             |    2 +-
 include/net/tcp.h                                  |   10 +
 include/trace/events/xdp.h                         |    3 +-
 include/uapi/linux/bpf.h                           |   37 +-
 kernel/bpf/Makefile                                |    3 +
 kernel/bpf/cgroup.c                                |   17 +-
 kernel/bpf/devmap.c                                |  332 ++++++-
 kernel/bpf/sysfs_btf.c                             |   51 +
 kernel/bpf/verifier.c                              |    2 +
 net/bpf/test_run.c                                 |   39 +-
 net/core/filter.c                                  |   88 +-
 net/core/flow_dissector.c                          |   21 +-
 net/ipv4/tcp_input.c                               |   81 +-
 net/ipv4/tcp_ipv4.c                                |   15 +
 net/ipv6/tcp_ipv6.c                                |   15 +
 net/xdp/xdp_umem.c                                 |   12 +-
 samples/bpf/trace_output_user.c                    |   43 +-
 samples/bpf/xdp_fwd_kern.c                         |   39 +-
 samples/bpf/xdp_fwd_user.c                         |   35 +-
 samples/bpf/xdp_sample_pkts_user.c                 |   61 +-
 scripts/link-vmlinux.sh                            |   52 +-
 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst |   16 +-
 tools/bpf/bpftool/Documentation/bpftool-map.rst    |    2 +-
 tools/bpf/bpftool/Makefile                         |   13 +-
 tools/bpf/bpftool/bash-completion/bpftool          |   19 +-
 tools/bpf/bpftool/cgroup.c                         |   83 +-
 tools/bpf/bpftool/feature.c                        |  105 +-
 tools/bpf/bpftool/map.c                            |    3 +-
 tools/include/uapi/linux/bpf.h                     |   44 +-
 tools/lib/bpf/btf.c                                |  250 +++--
 tools/lib/bpf/btf.h                                |  182 ++++
 tools/lib/bpf/btf_dump.c                           |  138 +--
 tools/lib/bpf/libbpf.c                             | 1009 +++++++++++++++++++-
 tools/lib/bpf/libbpf.h                             |    3 +-
 tools/lib/bpf/libbpf_internal.h                    |  105 ++
 tools/lib/bpf/libbpf_probes.c                      |    1 +
 tools/testing/selftests/bpf/Makefile               |   14 +-
 tools/testing/selftests/bpf/bpf_helpers.h          |   23 +
 .../testing/selftests/bpf/prog_tests/bpf_obj_id.c  |    6 +-
 .../selftests/bpf/prog_tests/bpf_verif_scale.c     |   92 +-
 .../testing/selftests/bpf/prog_tests/core_reloc.c  |  385 ++++++++
 .../selftests/bpf/prog_tests/flow_dissector.c      |  265 ++++-
 .../selftests/bpf/prog_tests/get_stack_raw_tp.c    |   82 +-
 .../selftests/bpf/prog_tests/reference_tracking.c  |   15 +-
 .../testing/selftests/bpf/prog_tests/send_signal.c |   15 +-
 .../selftests/bpf/prog_tests/xdp_noinline.c        |    3 +-
 tools/testing/selftests/bpf/progs/bpf_flow.c       |   60 +-
 .../selftests/bpf/progs/btf__core_reloc_arrays.c   |    3 +
 .../progs/btf__core_reloc_arrays___diff_arr_dim.c  |    3 +
 .../btf__core_reloc_arrays___diff_arr_val_sz.c     |    3 +
 .../progs/btf__core_reloc_arrays___err_non_array.c |    3 +
 .../btf__core_reloc_arrays___err_too_shallow.c     |    3 +
 .../progs/btf__core_reloc_arrays___err_too_small.c |    3 +
 .../btf__core_reloc_arrays___err_wrong_val_type1.c |    3 +
 .../btf__core_reloc_arrays___err_wrong_val_type2.c |    3 +
 .../selftests/bpf/progs/btf__core_reloc_flavors.c  |    3 +
 .../btf__core_reloc_flavors__err_wrong_name.c      |    3 +
 .../selftests/bpf/progs/btf__core_reloc_ints.c     |    3 +
 .../bpf/progs/btf__core_reloc_ints___bool.c        |    3 +
 .../progs/btf__core_reloc_ints___err_bitfield.c    |    3 +
 .../progs/btf__core_reloc_ints___err_wrong_sz_16.c |    3 +
 .../progs/btf__core_reloc_ints___err_wrong_sz_32.c |    3 +
 .../progs/btf__core_reloc_ints___err_wrong_sz_64.c |    3 +
 .../progs/btf__core_reloc_ints___err_wrong_sz_8.c  |    3 +
 .../progs/btf__core_reloc_ints___reverse_sign.c    |    3 +
 .../selftests/bpf/progs/btf__core_reloc_misc.c     |    5 +
 .../selftests/bpf/progs/btf__core_reloc_mods.c     |    3 +
 .../bpf/progs/btf__core_reloc_mods___mod_swap.c    |    3 +
 .../bpf/progs/btf__core_reloc_mods___typedefs.c    |    3 +
 .../selftests/bpf/progs/btf__core_reloc_nesting.c  |    3 +
 .../progs/btf__core_reloc_nesting___anon_embed.c   |    3 +
 .../btf__core_reloc_nesting___dup_compat_types.c   |    5 +
 ...btf__core_reloc_nesting___err_array_container.c |    3 +
 .../btf__core_reloc_nesting___err_array_field.c    |    3 +
 ...__core_reloc_nesting___err_dup_incompat_types.c |    4 +
 ...f__core_reloc_nesting___err_missing_container.c |    3 +
 .../btf__core_reloc_nesting___err_missing_field.c  |    3 +
 ..._core_reloc_nesting___err_nonstruct_container.c |    3 +
 ...__core_reloc_nesting___err_partial_match_dups.c |    4 +
 .../progs/btf__core_reloc_nesting___err_too_deep.c |    3 +
 .../btf__core_reloc_nesting___extra_nesting.c      |    3 +
 .../btf__core_reloc_nesting___struct_union_mixup.c |    3 +
 .../bpf/progs/btf__core_reloc_primitives.c         |    3 +
 .../btf__core_reloc_primitives___diff_enum_def.c   |    3 +
 .../btf__core_reloc_primitives___diff_func_proto.c |    3 +
 .../btf__core_reloc_primitives___diff_ptr_type.c   |    3 +
 .../btf__core_reloc_primitives___err_non_enum.c    |    3 +
 .../btf__core_reloc_primitives___err_non_int.c     |    3 +
 .../btf__core_reloc_primitives___err_non_ptr.c     |    3 +
 .../bpf/progs/btf__core_reloc_ptr_as_arr.c         |    3 +
 .../progs/btf__core_reloc_ptr_as_arr___diff_sz.c   |    3 +
 .../testing/selftests/bpf/progs/core_reloc_types.h |  667 +++++++++++++
 tools/testing/selftests/bpf/progs/loop4.c          |   18 +
 tools/testing/selftests/bpf/progs/loop5.c          |   32 +
 tools/testing/selftests/bpf/progs/sockopt_sk.c     |   22 +
 .../selftests/bpf/progs/test_core_reloc_arrays.c   |   55 ++
 .../selftests/bpf/progs/test_core_reloc_flavors.c  |   62 ++
 .../selftests/bpf/progs/test_core_reloc_ints.c     |   44 +
 .../selftests/bpf/progs/test_core_reloc_kernel.c   |   36 +
 .../selftests/bpf/progs/test_core_reloc_misc.c     |   57 ++
 .../selftests/bpf/progs/test_core_reloc_mods.c     |   62 ++
 .../selftests/bpf/progs/test_core_reloc_nesting.c  |   46 +
 .../bpf/progs/test_core_reloc_primitives.c         |   43 +
 .../bpf/progs/test_core_reloc_ptr_as_arr.c         |   30 +
 .../selftests/bpf/progs/test_get_stack_rawtp.c     |    2 +-
 .../bpf/progs/test_tcp_check_syncookie_kern.c      |   48 +-
 tools/testing/selftests/bpf/test_maps.c            |   16 +
 tools/testing/selftests/bpf/test_progs.c           |  374 +++++++-
 tools/testing/selftests/bpf/test_progs.h           |   40 +-
 tools/testing/selftests/bpf/test_sockopt_sk.c      |   25 +
 .../selftests/bpf/test_tcp_check_syncookie.sh      |    3 +
 .../selftests/bpf/test_tcp_check_syncookie_user.c  |   61 +-
 tools/testing/selftests/bpf/test_tcpnotify_user.c  |   90 +-
 tools/testing/selftests/bpf/test_verifier.c        |   12 +-
 tools/testing/selftests/bpf/trace_helpers.c        |  125 ---
 tools/testing/selftests/bpf/trace_helpers.h        |    9 -
 .../testing/selftests/bpf/verifier/event_output.c  |   94 ++
 121 files changed, 5229 insertions(+), 920 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-btf
 create mode 100644 kernel/bpf/sysfs_btf.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/core_reloc.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___diff_arr_dim.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___diff_arr_val_sz.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_non_array.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_too_shallow.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_too_small.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_wrong_val_type1.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_wrong_val_type2.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_flavors.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_flavors__err_wrong_name.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___bool.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_bitfield.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_16.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_32.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_64.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_8.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___reverse_sign.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_misc.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods___mod_swap.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods___typedefs.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___anon_embed.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___dup_compat_types.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_array_container.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_array_field.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_dup_incompat_types.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_missing_container.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_missing_field.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_nonstruct_container.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_partial_match_dups.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_too_deep.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___extra_nesting.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___struct_union_mixup.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_enum_def.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_func_proto.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_ptr_type.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_enum.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_int.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_ptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ptr_as_arr.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ptr_as_arr___diff_sz.c
 create mode 100644 tools/testing/selftests/bpf/progs/core_reloc_types.h
 create mode 100644 tools/testing/selftests/bpf/progs/loop4.c
 create mode 100644 tools/testing/selftests/bpf/progs/loop5.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_flavors.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_ints.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_kernel.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_misc.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_mods.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_nesting.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_primitives.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_ptr_as_arr.c
 create mode 100644 tools/testing/selftests/bpf/verifier/event_output.c


* Re: [PATCH bpf-next V9 3/3] tools/testing/selftests/bpf: Add self-tests for new helper.
From: Yonghong Song @ 2019-08-13 23:19 UTC (permalink / raw)
  To: Carlos Neira, netdev@vger.kernel.org
  Cc: ebiederm@xmission.com, brouer@redhat.com, bpf@vger.kernel.org
In-Reply-To: <20190813184747.12225-4-cneirabustos@gmail.com>



On 8/13/19 11:47 AM, Carlos Neira wrote:
> From: Carlos <cneirabustos@gmail.com>
> 
> Added self-tests for new helper bpf_get_current_pidns_info.
> 
> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> ---
>   tools/include/uapi/linux/bpf.h                     |  31 ++++-
>   tools/testing/selftests/bpf/Makefile               |   2 +-
>   tools/testing/selftests/bpf/bpf_helpers.h          |   3 +
>   .../testing/selftests/bpf/progs/test_pidns_kern.c  |  51 ++++++++
>   tools/testing/selftests/bpf/test_pidns.c           | 138 +++++++++++++++++++++

Could you break this patch into two?
   patch 1: tools/include/uapi/linux/bpf.h
   patch 2: rest of changes

>   5 files changed, 223 insertions(+), 2 deletions(-)
>   create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c
>   create mode 100644 tools/testing/selftests/bpf/test_pidns.c
> 
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 4393bd4b2419..db241857ec15 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -2741,6 +2741,28 @@ union bpf_attr {
>    *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
>    *
>    *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
> + *
> + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> + *	Description
> + *		Copies into *pidns* pid, namespace id and tgid as seen by the
> + *		current namespace and also device from /proc/self/ns/pid.
> + *		*size_of_pidns* must be the size of *pidns*
> + *
> + *		This helper is used when pid filtering is needed inside a
> + *		container as bpf_get_current_tgid() helper returns always the
> + *		pid id as seen by the root namespace.
> + *	Return
> + *		0 on success
> + *
> + *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> + *		or tgid of the current task.
> + *
> + *		**-ECHILD** if /proc/self/ns/pid does not exists.
> + *
> + *		**-ENOTDIR** if /proc/self/ns does not exists.
> + *
> + *		**-ENOMEM**  if allocation fails.
> + *
>    */
>   #define __BPF_FUNC_MAPPER(FN)		\
>   	FN(unspec),			\
> @@ -2853,7 +2875,8 @@ union bpf_attr {
>   	FN(sk_storage_get),		\
>   	FN(sk_storage_delete),		\
>   	FN(send_signal),		\
> -	FN(tcp_gen_syncookie),
> +	FN(tcp_gen_syncookie),		\
> +	FN(get_current_pidns_info),
>   
>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>    * function eBPF program intends to call
> @@ -3604,4 +3627,10 @@ struct bpf_sockopt {
>   	__s32	retval;
>   };
>   
> +struct bpf_pidns_info {
> +	__u32 dev;
> +	__u32 nsid;
> +	__u32 tgid;
> +	__u32 pid;
> +};
>   #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 3bd0f4a0336a..1f97b571b581 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -29,7 +29,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
>   	test_cgroup_storage test_select_reuseport test_section_names \
>   	test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
>   	test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
> -	test_sockopt_multi test_tcp_rtt
> +	test_sockopt_multi test_tcp_rtt test_pidns
>   
>   BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
>   TEST_GEN_FILES = $(BPF_OBJ_FILES)
> diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
> index 8b503ea142f0..3fae3b9fcd2c 100644
> --- a/tools/testing/selftests/bpf/bpf_helpers.h
> +++ b/tools/testing/selftests/bpf/bpf_helpers.h
> @@ -231,6 +231,9 @@ static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal;
>   static long long (*bpf_tcp_gen_syncookie)(struct bpf_sock *sk, void *ip,
>   					  int ip_len, void *tcp, int tcp_len) =
>   	(void *) BPF_FUNC_tcp_gen_syncookie;
> +static int (*bpf_get_current_pidns_info)(struct bpf_pidns_info *buf,
> +					 unsigned int buf_size) =
> +	(void *) BPF_FUNC_get_current_pidns_info;
>   
>   /* llvm builtin functions that eBPF C program may use to
>    * emit BPF_LD_ABS and BPF_LD_IND instructions
> diff --git a/tools/testing/selftests/bpf/progs/test_pidns_kern.c b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
> new file mode 100644
> index 000000000000..e1d2facfa762
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
> @@ -0,0 +1,51 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +
> +#include <linux/bpf.h>
> +#include <errno.h>
> +#include "bpf_helpers.h"
> +
> +struct bpf_map_def SEC("maps") nsidmap = {
> +	.type = BPF_MAP_TYPE_ARRAY,
> +	.key_size = sizeof(__u32),
> +	.value_size = sizeof(__u32),
> +	.max_entries = 1,
> +};
> +
> +struct bpf_map_def SEC("maps") pidmap = {
> +	.type = BPF_MAP_TYPE_ARRAY,
> +	.key_size = sizeof(__u32),
> +	.value_size = sizeof(__u32),
> +	.max_entries = 1,
> +};

Could you use the new map definition style? Search for
"SEC(".maps")" for examples.

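For reference, a rough sketch of the two maps in that style. In the
real program SEC(), __uint() and __type() come from bpf_helpers.h, and
BPF_MAP_TYPE_ARRAY and __u32 come from the kernel headers; the stand-in
definitions at the top are an assumption made only so that the snippet
builds on its own with a host compiler:

```c
/*
 * Stand-ins for what bpf_helpers.h and the kernel headers provide.
 * They are repeated here only to keep the sketch self-contained.
 */
#define SEC(name) __attribute__((section(name), used))
#define __uint(name, val) int (*name)[val]
#define __type(name, val) val *name
#define BPF_MAP_TYPE_ARRAY 2
typedef unsigned int __u32;

/* Single-slot array that receives the namespace id seen by the program. */
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u32);
} nsidmap SEC(".maps");

/* Single-slot array that holds the pid the program should match. */
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u32);
} pidmap SEC(".maps");
```

The map attributes are carried in the types of the struct members
(array sizes for the integers, pointed-to types for key and value), so
no struct bpf_map_def initializer is needed.
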
> +
> +SEC("tracepoint/syscalls/sys_enter_nanosleep")
> +int trace(void *ctx)
> +{
> +	struct bpf_pidns_info nsinfo;
> +	__u32 key = 0, *expected_pid, *val;
> +	char fmt[] = "ERROR nspid:%d\n";
> +
> +	if (bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo)))
> +		return -EINVAL;
> +
> +	expected_pid = bpf_map_lookup_elem(&pidmap, &key);
> +
> +
> +	if (!expected_pid || *expected_pid != nsinfo.pid)
> +		return 0;
> 

I would like you to compare device major/minor, namespace id,
pid and tid. We should test everything here.
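
On the user-space side, the expected values could be collected roughly
like this. This is only a sketch: the struct and function names are
made up, and it makes no assumption about how the helper encodes the
device number in bpf_pidns_info.dev.

```c
#include <stdint.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/sysmacros.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical container for the values the test would compare. */
struct expected_pidns {
	unsigned int dev_major;	/* major number of the nsfs device */
	unsigned int dev_minor;	/* minor number of the nsfs device */
	uint32_t nsid;		/* inode number of /proc/self/ns/pid */
	pid_t tgid;		/* what getpid() reports */
	pid_t tid;		/* what gettid() reports */
};

/* Fill *e from user space; returns 0 on success, -1 if stat() fails. */
static int read_expected_pidns(struct expected_pidns *e)
{
	struct stat st;

	if (stat("/proc/self/ns/pid", &st))
		return -1;

	e->dev_major = major(st.st_dev);
	e->dev_minor = minor(st.st_dev);
	e->nsid = (uint32_t)st.st_ino;	/* same cast as in test_pidns.c */
	e->tgid = getpid();
	e->tid = (pid_t)syscall(SYS_gettid);
	return 0;
}
```

The BPF program would then check every field it gets back from the
helper against these values instead of only the pid.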

> +
> +	val = bpf_map_lookup_elem(&nsidmap, &key);
> +	if (val)
> +		*val = nsinfo.nsid;
> +
> +	return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> +__u32 _version SEC("version") = 1;
> diff --git a/tools/testing/selftests/bpf/test_pidns.c b/tools/testing/selftests/bpf/test_pidns.c
> new file mode 100644
> index 000000000000..a7254055f294
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/test_pidns.c
> @@ -0,0 +1,138 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <syscall.h>
> +#include <unistd.h>
> +#include <linux/perf_event.h>
> +#include <sys/ioctl.h>
> +#include <sys/time.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +
> +#include <linux/bpf.h>
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +
> +#include "cgroup_helpers.h"
> +#include "bpf_rlimit.h"
> +
> +#define CHECK(condition, tag, format...) ({		\
> +	int __ret = !!(condition);			\
> +	if (__ret) {					\
> +		printf("%s:FAIL:%s ", __func__, tag);	\
> +		printf(format);				\
> +	} else {					\
> +		printf("%s:PASS:%s\n", __func__, tag);	\
> +	}						\
> +	__ret;						\
> +})
> +
> +static int bpf_find_map(const char *test, struct bpf_object *obj,
> +			const char *name)
> +{
> +	struct bpf_map *map;
> +
> +	map = bpf_object__find_map_by_name(obj, name);
> +	if (!map)
> +		return -1;
> +	return bpf_map__fd(map);
> +}
> +
> +
> +int main(int argc, char **argv)
> +{
> +	const char *probe_name = "syscalls/sys_enter_nanosleep";
> +	const char *file = "test_pidns_kern.o";
> +	int err, bytes, efd, prog_fd, pmu_fd;
> +	int pidmap_fd, nsidmap_fd;
> +	struct perf_event_attr attr = {};
> +	struct bpf_object *obj;
> +	__u32 knsid = 0;
> +	__u32 key = 0, pid;
> +	int exit_code = 1;
> +	struct stat st;
> +	char buf[256];
> +
> +	err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
> +	if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno))
> +		goto cleanup_cgroup_env;
> +
> +	nsidmap_fd = bpf_find_map(__func__, obj, "nsidmap");
> +	if (CHECK(nsidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
> +		  nsidmap_fd, errno))
> +		goto close_prog;
> +
> +	pidmap_fd = bpf_find_map(__func__, obj, "pidmap");
> +	if (CHECK(pidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
> +		  pidmap_fd, errno))
> +		goto close_prog;
> +
> +	pid = getpid();
> +	bpf_map_update_elem(pidmap_fd, &key, &pid, 0);
> +
> +	snprintf(buf, sizeof(buf),
> +		 "/sys/kernel/debug/tracing/events/%s/id", probe_name);
> +	efd = open(buf, O_RDONLY, 0);
> +	if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
> +		goto close_prog;
> +	bytes = read(efd, buf, sizeof(buf));
> +	close(efd);
> +	if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read",
> +		  "bytes %d errno %d\n", bytes, errno))
> +		goto close_prog;

Please use the libbpf perf APIs.

It would be good if the test actually created a pid namespace and ran
the check inside it.

Do you think it is possible to use the existing test_progs
infrastructure? The current test, which does not create a pid
namespace, would surely fit in. I am not sure about a version that
creates and deletes a namespace, but I would think it should fit in as
well.

> +
> +	attr.config = strtol(buf, NULL, 0);
> +	attr.type = PERF_TYPE_TRACEPOINT;
> +	attr.sample_type = PERF_SAMPLE_RAW;
> +	attr.sample_period = 1;
> +	attr.wakeup_events = 1;
> +
> +	pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0);
> +	if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd,
> +		  errno))
> +		goto close_prog;
> +
> +	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
> +	if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err,
> +		  errno))
> +		goto close_pmu;
> +
> +	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
> +	if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err,
> +		  errno))
> +		goto close_pmu;
> +
> +	/* trigger some syscalls */
> +	sleep(1);
> +
> +	err = bpf_map_lookup_elem(nsidmap_fd, &key, &knsid);
> +	if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno))
> +		goto close_pmu;
> +
> +	if (stat("/proc/self/ns/pid", &st))
> +		goto close_pmu;
> +
> +	if (CHECK(knsid != (__u32) st.st_ino, "compare_namespace_id",
> +		  "kern knsid %u user unsid %u\n", knsid, (__u32) st.st_ino))
> +		goto close_pmu;
> +
> +	exit_code = 0;
> +	printf("%s:PASS\n", argv[0]);
> +
> +close_pmu:
> +	close(pmu_fd);
> +close_prog:
> +	bpf_object__close(obj);
> +cleanup_cgroup_env:
> +	return exit_code;
> +}
> 


* Re: [PATCH net-next] net: hns3: Make hclge_func_reset_sync_vf static
From: Jakub Kicinski @ 2019-08-13 23:20 UTC (permalink / raw)
  To: YueHaibing
  Cc: yisen.zhuang, salil.mehta, davem, lipeng321, tanhuazhong,
	linux-kernel, netdev
In-Reply-To: <20190812144156.70020-1-yuehaibing@huawei.com>

On Mon, 12 Aug 2019 22:41:56 +0800, YueHaibing wrote:
> Fix sparse warning:
> 
> drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:3190:5:
>  warning: symbol 'hclge_func_reset_sync_vf' was not declared. Should it be static?
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied, thanks.


* [PATCH bpf-next] libbpf: make libbpf.map source of truth for libbpf version
From: Andrii Nakryiko @ 2019-08-13 23:24 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Andrey Ignatov

Currently the libbpf version is specified in two places: libbpf.map and
the Makefile. They easily get out of sync, because it is very easy to
update one and forget to update the other. In addition, the GitHub
projection of libbpf has to maintain its own version, which has to be
kept in sync manually; this is a very error-prone approach.

This patch makes libbpf.map the source of truth for the libbpf version
and uses a shell invocation to parse out the correct full and major
libbpf versions to use during the build. From now on, once a new release
cycle starts, we need to add an (initially) empty section with the
correct latest version to libbpf.map.

This will also make it possible to keep the GitHub projection consistent
with the kernel sources version of libbpf by adopting similar parsing of
the version from libbpf.map.

Cc: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/Makefile   | 12 +++++-------
 tools/lib/bpf/libbpf.map |  3 +++
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 9312066a1ae3..d9afc8509725 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -1,9 +1,10 @@
 # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
 # Most of this file is copied from tools/lib/traceevent/Makefile
 
-BPF_VERSION = 0
-BPF_PATCHLEVEL = 0
-BPF_EXTRAVERSION = 4
+BPF_FULL_VERSION = $(shell \
+	grep -E 'LIBBPF_([0-9]+)\.([0-9]+)\.([0-9]+) \{' libbpf.map | \
+	tail -n1 | cut -d'_' -f2 | cut -d' ' -f1)
+BPF_VERSION = $(firstword $(subst ., ,$(BPF_FULL_VERSION)))
 
 MAKEFLAGS += --no-print-directory
 
@@ -79,15 +80,12 @@ export prefix libdir src obj
 libdir_SQ = $(subst ','\'',$(libdir))
 libdir_relative_SQ = $(subst ','\'',$(libdir_relative))
 
+LIBBPF_VERSION	= $(BPF_FULL_VERSION)
 VERSION		= $(BPF_VERSION)
-PATCHLEVEL	= $(BPF_PATCHLEVEL)
-EXTRAVERSION	= $(BPF_EXTRAVERSION)
 
 OBJ		= $@
 N		=
 
-LIBBPF_VERSION	= $(BPF_VERSION).$(BPF_PATCHLEVEL).$(BPF_EXTRAVERSION)
-
 LIB_TARGET	= libbpf.a libbpf.so.$(LIBBPF_VERSION)
 LIB_FILE	= libbpf.a libbpf.so*
 PC_FILE		= libbpf.pc
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index f9d316e873d8..4e72df8e98ba 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -184,3 +184,6 @@ LIBBPF_0.0.4 {
 		perf_buffer__new_raw;
 		perf_buffer__poll;
 } LIBBPF_0.0.3;
+
+LIBBPF_0.0.5 {
+} LIBBPF_0.0.4;
-- 
2.17.1
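
For illustration only (not part of this patch): the rule the shell
pipeline applies, i.e. take the last "LIBBPF_X.Y.Z {" section header and
use its first component as the major version, sketched as a small C
helper. The function name is hypothetical, and unlike the grep it only
matches headers at the start of a line.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan the text of libbpf.map and copy the version
 * of the last "LIBBPF_X.Y.Z {" section header into full (e.g. "0.0.5").
 * Returns the major version number, or -1 if no header is found.
 */
static int libbpf_map_version(const char *map_text, char *full, size_t len)
{
	const char *line = map_text;
	int major = -1;

	while (line && *line) {
		unsigned int x, y, z;
		int end = 0;

		/*
		 * Only lines that open a version section count; closers
		 * such as "} LIBBPF_0.0.4;" do not match this format.
		 * end stays 0 unless the opening brace was seen as well.
		 */
		if (sscanf(line, "LIBBPF_%u.%u.%u {%n", &x, &y, &z, &end) == 3 &&
		    end > 0) {
			snprintf(full, len, "%u.%u.%u", x, y, z);
			major = (int)x;
		}

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return major;
}
```

With the libbpf.map from this patch, the last matching header is
"LIBBPF_0.0.5 {", so the full version would be 0.0.5 and the major
version 0.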



* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Andy Lutomirski @ 2019-08-13 23:24 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Alexei Starovoitov, Andy Lutomirski, Song Liu, Kees Cook,
	Networking, bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team,
	Lorenz Bauer, Jann Horn, Greg KH, Linux API, LSM List
In-Reply-To: <CAKOZuev8XY5+shG8SiWcx4z12QnkgzhcUqCHs9t+eV2z-6nzPA@mail.gmail.com>

On Tue, Aug 13, 2019 at 3:27 PM Daniel Colascione <dancol@google.com> wrote:
>
> On Tue, Aug 13, 2019 at 2:58 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Aug 06, 2019 at 10:24:25PM -0700, Andy Lutomirski wrote:
> > > >
> > > > Inside containers and inside nested containers we need to start processes
> > > > that will use bpf. All of the processes are trusted.
> > >
> > > Trusted by whom?  In a non-nested container, the container manager
> > > *might* be trusted by the outside world.  In a *nested* container,
> > > unless the inner container management is controlled from outside the
> > > outer container, it's not trusted.  I don't know much about how
> > > Facebook's containers work, but the LXC/LXD/Podman world is moving
> > > very strongly toward user namespaces and maximally-untrusted
> > > containers, and I think bpf() should work in that context.
> >
> > agree that containers (namespaces) reduce amount of trust necessary
> > for apps to run, but the end goal is not security though.
> > Linux has become a single user system.
> > If user can ssh into the host they can become root.
> > If arbitrary code can run on the host it will break out of any sandbox.
> > Containers are not providing the level of security that is enough
> > to run arbitrary code. VMs can do it better, but cpu bugs don't make it easy.
> > Containers are used to make production systems safer.
> > Some people call it more 'secure', but it's clearly not secure for
> > arbitrary code and that is what kernel.unprivileged_bpf_disabled allows.
> > When we say 'unprivileged bpf' we really mean arbitrary malicious bpf program.
> > It's been a constant source of pain. The constant blinding, randomization,
> > verifier speculative analysis, all spectre v1, v2, v4 mitigations
> > are simply not worth it. It's a lot of complex kernel code without users.
> > There is not a single use case to allow arbitrary malicious bpf
> > program to be loaded and executed.
> > As soon as we have /dev/bpf to allow all of bpf to be used without root
> > we will set sysctl kernel.unprivileged_bpf_disabled=1
> > Hence I prefer this /dev/bpf mechanism to be as simple as possible.
> > The applications that will use it are going to be just as trusted as systemd.
> >
> > > > To solve your concern of bypassing all capable checks...
> > > > How about we do /dev/bpf/full_verifier first?
> > > > It will replace capable() checks in the verifier only.
> > >
> > > I'm not convinced that "in the verifier" is the right distinction.
> > > Telling administrators that some setting lets certain users bypass
> > > bpf() verifier checks doesn't have a clear enough meaning.
> >
> > linux is a single user system. there are no administrators any more.
> > No doubt, folks will disagree, but that game is over.
> > At least on bpf side it's done.
> >
> > > I propose,
> > > instead, that the current capable() checks be divided into three
> > > categories:
> >
> > I don't see a use case for these categories.
> > All bpf programs extend the kernel in some way.
> > The kernel vs user is one category.
> > Conceptually CAP_BPF is enough. It would be similar to CAP_NET_ADMIN.
> > When application has CAP_NET_ADMIN it covers all of networking knobs.
> > There is no use case that would warrant fine grain CAP_ROUTE_ADMIN,
> > CAP_ETHTOOL_ADMIN, CAP_ETH0_ADMIN, etc.
> > Similarly CAP_BPF as the only knob is enough.
> > The only disadvantage of CAP_BPF is that it's not possible to
> > pass it from one systemd-like daemon to another systemd-like daemon.
> > Hence /dev/bpf idea and passing file descriptor.
> >
> > > This type of thing actually fits quite nicely into an idea I've been
> > > thinking about for a while called "implicit rights". In very brief
> > > summary, there would be objects called /dev/rights/xyz, where xyz is
> > > the name of a "right".  If there is a readable object of the right
> > > type at the literal path "/dev/rights/xyz", then you have right xyz.
> > > There's a bit more flexibility on top of this.  BPF could use
> > > /dev/rights/bpf/maptypes/lpm and
> > > /dev/rights/bpf/verifier/bounded_loops, for example.  Other non-BPF
> > > use cases include a biggie:
> > > /dev/rights/namespace/create_unprivileged_userns.
> > > /dev/rights/bind_port/80 would be nice, too.
> >
> > The concept of "implicit rights" is very nice and I'm sure it will
> > be a good fit somewhere, but I don't see why use it in bpf space.
> > There is no use case for fine grain partition of bpf features.
>
> Isn't this "implicit rights" model just another kind of ambient
> authority --- one that constrains the otherwise-free filesystem
> namespace to boot?

Yes.

> IMHO, the kernel should be moving toward explicit
> authorization tokens modeled by file descriptors and away from
> contextual authorization decisions.

And yes, I agree there too. Here's how I think about it: there are
really two layers here:

Rights: these are objects like /dev/rights/bpf/some_bpf_privilege or
/dev/rights/namespace/unpriv_userns, and you would, ideally, use them
like genuine capabilities.  You'd pass an fd with appropriate access
(FMODE_READ, presumably, since exec is awkward to work with for fds)
into bpf() or similar, and the kernel would say "yep, caller has the
capability" and do something.  There's nothing really restricting them
to /dev/rights, but they more or less have to live on a memory-backed
file system (a real backing store has all kinds of issues), and
putting them in /dev gets a lot of nifty properties for free.  For
example, existing container systems that don't know about them will
automatically deny them to containers, since nothing with an ounce of
sense passes all of /dev through to a container.  But container
systems that are aware of them can bind-mount them into the container.
And /dev is already known to be magical due to things like
/dev/urandom.

The implicit part on top is less than ideal, but it solves two problems:

1. It keeps compatibility with existing code.  There are programs that
expect unshare(CLONE_NEWUSER) to work -- with *implicit* rights, it
will work exactly when it's supposed to.  Also, for cases like
CLONE_NEWUSER, it does have more or less the right semantics -- if
they were explicit, most programs would just try to open
/dev/rights/namespace/unpriv_userns and pass the fd to unshare2, so
we're not losing much.

2. For things like eBPF where the set of rights could be a moving
target, implicit rights lets the model evolve without breaking
userspace.  So if LPM maps eventually become bulletproof and a right
is no longer needed, it still works.  Or if some feature in the
verifier that is currently unrestricted were subsequently deemed to
need restrictions, they could be added without retrofitting all the
users.

There are cases where implicit rights would be totally inappropriate.
For example, a CAP_DAC_READ_SEARCH right could not be safely made
implicit.  In general, I think the implicit model works for system
calls where it's unambiguous what the caller wants to have happen and
where, depending on privilege level, the call either works or it doesn't.
So, for accessing a filesystem, it's not at all obvious whether a
program is accessing it on its own behalf or on a client's behalf, and
privilege usage should be explicit.  For something like "don't
Spectre-mitigate this eBPF program", the semantics change and the
request should IMO be explicit.  For "create an LPM map", I don't
see how a confused deputy is likely, and an implicit right seems
reasonable.  Similarly, for creating a namespace or binding a network
port, confused deputies seem unlikely.  (For connecting to a network
address, if such a thing were ever restricted, confused deputies are
definitely possible and happen all the time, e.g. under a DNS
rebinding attack.)
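
To make the two layers concrete, here is a rough user-space sketch.
Everything in it is hypothetical: /dev/rights does not exist, the helper
names are made up, and a real implementation would do the implicit
check inside the kernel rather than in the caller. The root directory is
a parameter so the sketch can be tried against a scratch directory.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Build "<root>/<right>"; returns 0 on success, -1 if it does not fit. */
static int right_path(char *buf, size_t len, const char *root,
		      const char *right)
{
	int n = snprintf(buf, len, "%s/%s", root, right);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Explicit use: open the right object read-only so that the fd can be
 * passed to whatever consumes it. Returns the fd, or -1 on failure.
 */
static int open_right(const char *root, const char *right)
{
	char path[4096];

	if (right_path(path, sizeof(path), root, right))
		return -1;
	return open(path, O_RDONLY);
}

/*
 * Implicit use: the caller has the right exactly when a readable object
 * exists at the literal path. Returns 1 if so, 0 otherwise.
 */
static int have_implicit_right(const char *root, const char *right)
{
	int fd = open_right(root, right);

	if (fd < 0)
		return 0;
	close(fd);
	return 1;
}
```

A container manager that knows about rights would make the object
visible (for example by bind-mounting it) and the check would start
succeeding; one that does not know about them changes nothing and the
check keeps failing.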


* Re: [PATCH] net: ieee802154: remove redundant assignment to rc
From: Stefan Schmidt @ 2019-08-13 23:28 UTC (permalink / raw)
  To: Colin King, Alexander Aring, David S . Miller, linux-wpan, netdev
  Cc: kernel-janitors, linux-kernel
In-Reply-To: <20190813142818.15022-1-colin.king@canonical.com>

Hello.

On 13.08.19 16:28, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> Variable rc is initialized to a value that is never read and it is
> re-assigned later. The initialization is redundant and can be removed.
> 
> Addresses-Coverity: ("Unused value")
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  net/ieee802154/socket.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
> index dacbd58e1799..badc5cfe4dc6 100644
> --- a/net/ieee802154/socket.c
> +++ b/net/ieee802154/socket.c
> @@ -1092,7 +1092,7 @@ static struct packet_type ieee802154_packet_type = {
>  
>  static int __init af_ieee802154_init(void)
>  {
> -	int rc = -EINVAL;
> +	int rc;
>  
>  	rc = proto_register(&ieee802154_raw_prot, 1);
>  	if (rc)
> 

This patch has been applied to the wpan tree and will be
part of the next pull request to net. Thanks!

regards
Stefan Schmidt

* Re: [PATCH bpf-next 1/3] libbpf: add asm/unistd.h to xsk to get __NR_mmap2
From: Andrii Nakryiko @ 2019-08-13 23:38 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: Magnus Karlsson, Björn Töpel, David S. Miller,
	Jesper Dangaard Brouer, john fastabend, Jakub Kicinski,
	Daniel Borkmann, Networking, bpf, xdp-newbies, open list
In-Reply-To: <20190813102318.5521-2-ivan.khoronzhuk@linaro.org>

On Tue, Aug 13, 2019 at 3:24 AM Ivan Khoronzhuk
<ivan.khoronzhuk@linaro.org> wrote:
>
> That's needed to get __NR_mmap2 when mmap2 syscall is used.
>
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>  tools/lib/bpf/xsk.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> index 5007b5d4fd2c..f2fc40f9804c 100644
> --- a/tools/lib/bpf/xsk.c
> +++ b/tools/lib/bpf/xsk.c
> @@ -12,6 +12,7 @@
>  #include <stdlib.h>
>  #include <string.h>
>  #include <unistd.h>
> +#include <asm/unistd.h>

asm/unistd.h is not present in the GitHub projection of libbpf. Is there
any way to avoid including this header? Generally, libbpf can't easily
use all of the kernel headers: we need to re-implement all the extra
stuff we use for the GitHub version of libbpf, so we try to minimize
usage of new headers that are not just plain uapi headers from
include/uapi.

>  #include <arpa/inet.h>
>  #include <asm/barrier.h>
>  #include <linux/compiler.h>
> --
> 2.17.1
>

* Re: [Potential Spoof] Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
From: Yonghong Song @ 2019-08-13 23:51 UTC (permalink / raw)
  To: Carlos Neira, netdev@vger.kernel.org
  Cc: ebiederm@xmission.com, brouer@redhat.com, bpf@vger.kernel.org
In-Reply-To: <13b7f81f-83b6-07c9-4864-b49749cbf7d9@fb.com>



On 8/13/19 4:11 PM, Yonghong Song wrote:
> 
> 
> On 8/13/19 11:47 AM, Carlos Neira wrote:
>> From: Carlos <cneirabustos@gmail.com>
>>
>> New bpf helper bpf_get_current_pidns_info.
>> This helper obtains the active namespace from current and returns
>> pid, tgid, device and namespace id as seen from that namespace,
>> allowing to instrument a process inside a container.
>>
>> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
>> ---
>>    fs/internal.h            |  2 --
>>    fs/namei.c               |  1 -
>>    include/linux/bpf.h      |  1 +
>>    include/linux/namei.h    |  4 +++
>>    include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
>>    kernel/bpf/core.c        |  1 +
>>    kernel/bpf/helpers.c     | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
>>    kernel/trace/bpf_trace.c |  2 ++
>>    8 files changed, 102 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/internal.h b/fs/internal.h
>> index 315fcd8d237c..6647e15dd419 100644
>> --- a/fs/internal.h
>> +++ b/fs/internal.h
>> @@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
>>    /*
>>     * namei.c
>>     */
>> -extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
>> -			   struct path *path, struct path *root);
>>    extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
>>    extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
>>    			   const char *, unsigned int, struct path *);
>> diff --git a/fs/namei.c b/fs/namei.c
>> index 209c51a5226c..a89fc72a4a10 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -19,7 +19,6 @@
>>    #include <linux/export.h>
>>    #include <linux/kernel.h>
>>    #include <linux/slab.h>
>> -#include <linux/fs.h>
>>    #include <linux/namei.h>
>>    #include <linux/pagemap.h>
>>    #include <linux/fsnotify.h>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index f9a506147c8a..e4adf5e05afd 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
>>    extern const struct bpf_func_proto bpf_strtol_proto;
>>    extern const struct bpf_func_proto bpf_strtoul_proto;
>>    extern const struct bpf_func_proto bpf_tcp_sock_proto;
>> +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
>>    
>>    /* Shared helpers among cBPF and eBPF. */
>>    void bpf_user_rnd_init_once(void);
>> diff --git a/include/linux/namei.h b/include/linux/namei.h
>> index 9138b4471dbf..b45c8b6f7cb4 100644
>> --- a/include/linux/namei.h
>> +++ b/include/linux/namei.h
>> @@ -6,6 +6,7 @@
>>    #include <linux/path.h>
>>    #include <linux/fcntl.h>
>>    #include <linux/errno.h>
>> +#include <linux/fs.h>
>>    
>>    enum { MAX_NESTED_LINKS = 8 };
>>    
>> @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
>>    
>>    extern void nd_jump_link(struct path *path);
>>    
>> +extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
>> +			   struct path *path, struct path *root);
>> +
>>    static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
>>    {
>>    	((char *) name)[min(len, maxlen)] = '\0';
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 4393bd4b2419..db241857ec15 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -2741,6 +2741,28 @@ union bpf_attr {
>>     *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
>>     *
>>     *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
>> + *
>> + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
>> + *	Description
>> + *		Copies into *pidns* pid, namespace id and tgid as seen by the
>> + *		current namespace and also device from /proc/self/ns/pid.
>> + *		*size_of_pidns* must be the size of *pidns*
>> + *
>> + *		This helper is used when pid filtering is needed inside a
>> + *		container as bpf_get_current_tgid() helper returns always the
>> + *		pid id as seen by the root namespace.
>> + *	Return
>> + *		0 on success
>> + *
>> + *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
>> + *		or tgid of the current task.
>> + *
>> + *		**-ECHILD** if /proc/self/ns/pid does not exists.
>> + *
>> + *		**-ENOTDIR** if /proc/self/ns does not exists.
>> + *
>> + *		**-ENOMEM**  if allocation fails.
>> + *
>>     */
>>    #define __BPF_FUNC_MAPPER(FN)		\
>>    	FN(unspec),			\
>> @@ -2853,7 +2875,8 @@ union bpf_attr {
>>    	FN(sk_storage_get),		\
>>    	FN(sk_storage_delete),		\
>>    	FN(send_signal),		\
>> -	FN(tcp_gen_syncookie),
>> +	FN(tcp_gen_syncookie),		\
>> +	FN(get_current_pidns_info),
>>    
>>    /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>>     * function eBPF program intends to call
>> @@ -3604,4 +3627,10 @@ struct bpf_sockopt {
>>    	__s32	retval;
>>    };
>>    
>> +struct bpf_pidns_info {
>> +	__u32 dev;
>> +	__u32 nsid;
>> +	__u32 tgid;
>> +	__u32 pid;
>> +};
>>    #endif /* _UAPI__LINUX_BPF_H__ */
>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>> index 8191a7db2777..3159f2a0188c 100644
>> --- a/kernel/bpf/core.c
>> +++ b/kernel/bpf/core.c
>> @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
>>    const struct bpf_func_proto bpf_get_current_comm_proto __weak;
>>    const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
>>    const struct bpf_func_proto bpf_get_local_storage_proto __weak;
>> +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
>>    
>>    const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
>>    {
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index 5e28718928ca..41fbf1f28a48 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -11,6 +11,12 @@
>>    #include <linux/uidgid.h>
>>    #include <linux/filter.h>
>>    #include <linux/ctype.h>
>> +#include <linux/pid_namespace.h>
>> +#include <linux/major.h>
>> +#include <linux/stat.h>
>> +#include <linux/namei.h>
>> +#include <linux/version.h>
>> +
>>    
>>    #include "../../lib/kstrtox.h"
>>    
>> @@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
>>    	preempt_enable();
>>    }
>>    
>> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
>> +	 size)
>> +{
>> +	const char *pidns_path = "/proc/self/ns/pid";
>> +	struct pid_namespace *pidns = NULL;
>> +	struct filename *tmp = NULL;
>> +	struct inode *inode;
>> +	struct path kp;
>> +	pid_t tgid = 0;
>> +	pid_t pid = 0;
>> +	int ret;
>> +	int len;
> 
> I am running your sample program and get the following kernel bug:
> 
> ...
> [   26.414825] BUG: sleeping function called from invalid context at /data/users/yhs/work/net-next/fs/dcache.c:843
> [   26.416314] in_atomic(): 1, irqs_disabled(): 0, pid: 1911, name: ping
> [   26.417189] CPU: 0 PID: 1911 Comm: ping Tainted: G        W 5.3.0-rc1+ #280
> [   26.418182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
> [   26.419393] Call Trace:
> [   26.419697]  <IRQ>
> [   26.419960]  dump_stack+0x46/0x5b
> [   26.420434]  ___might_sleep+0xe4/0x110
> [   26.420894]  dput+0x2a/0x200
> [   26.421265]  walk_component+0x10c/0x280
> [   26.421773]  link_path_walk+0x327/0x560
> [   26.422280]  ? proc_ns_dir_readdir+0x1a0/0x1a0
> [   26.422848]  ? path_init+0x232/0x330
> [   26.423364]  path_lookupat+0x88/0x200
> [   26.423808]  ? selinux_parse_skb.constprop.69+0x124/0x430
> [   26.424521]  filename_lookup+0xaf/0x190
> [   26.425031]  ? simple_attr_release+0x20/0x20
> [   26.425560]  bpf_get_current_pidns_info+0xfa/0x190
> [   26.426168]  bpf_prog_83627154cefed596+0xe66/0x1000
> [   26.426779]  trace_call_bpf+0xb5/0x160
> [   26.427317]  ? __netif_receive_skb_core+0x1/0xbb0
> [   26.427929]  ? __netif_receive_skb_core+0x1/0xbb0
> [   26.428496]  kprobe_perf_func+0x4d/0x280
> [   26.428986]  ? tracing_record_taskinfo_skip+0x1a/0x30
> [   26.429584]  ? tracing_record_taskinfo+0xe/0x80
> [   26.430152]  ? ttwu_do_wakeup.isra.114+0xcf/0xf0
> [   26.430737]  ? __netif_receive_skb_core+0x1/0xbb0
> [   26.431334]  ? __netif_receive_skb_core+0x5/0xbb0
> [   26.431930]  kprobe_ftrace_handler+0x90/0xf0
> [   26.432495]  ftrace_ops_assist_func+0x63/0x100
> [   26.433060]  0xffffffffc03180bf
> [   26.433471]  ? __netif_receive_skb_core+0x1/0xbb0
> ...
> 
> To prevent we are running in arbitrary task (e.g., idle task)
> context which may introduce sleeping issues, the following
> probably appropriate:
> 
>          if (in_nmi() || in_softirq())
>                  return -EPERM;

A better condition is (from helper bpf_probe_write_user()):
         if (unlikely(in_interrupt() ||
                      current->flags & (PF_KTHREAD | PF_EXITING)))
                 return -EPERM;

> 
> Anyway, if in nmi or softirq, the namespace and pid/tgid
> we get may be just accidentally associated with the bpf running
> context, but it could be in a different context. So such info
> is not reliable any way.
> 
>> +
>> +	if (unlikely(size != sizeof(struct bpf_pidns_info)))
>> +		return -EINVAL;
>> +	pidns = task_active_pid_ns(current);
>> +	if (unlikely(!pidns))
>> +		goto clear;
>> +	pidns_info->nsid =  pidns->ns.inum;
>> +	pid = task_pid_nr_ns(current, pidns);
>> +	if (unlikely(!pid))
>> +		goto clear;
>> +	tgid = task_tgid_nr_ns(current, pidns);
>> +	if (unlikely(!tgid))
>> +		goto clear;
>> +	pidns_info->tgid = (u32) tgid;
>> +	pidns_info->pid = (u32) pid;
>> +	tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
>> +	if (unlikely(!tmp)) {
>> +		memset((void *)pidns_info, 0, (size_t) size);
>> +		return -ENOMEM;
>> +	}
>> +	len = strlen(pidns_path) + 1;
>> +	memcpy((char *)tmp->name, pidns_path, len);
>> +	tmp->uptr = NULL;
>> +	tmp->aname = NULL;
>> +	tmp->refcnt = 1;
>> +	ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
>> +	if (ret) {
>> +		memset((void *)pidns_info, 0, (size_t) size);
>> +		return ret;
>> +	}
>> +	inode = d_backing_inode(kp.dentry);
>> +	pidns_info->dev = inode->i_sb->s_dev;
>> +	return 0;
>> +clear:
>> +	memset((void *)pidns_info, 0, (size_t) size);
>> +	return -EINVAL;
>> +}
>> +
>> +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
>> +	.func		= bpf_get_current_pidns_info,
>> +	.gpl_only	= false,
>> +	.ret_type	= RET_INTEGER,
>> +	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
>> +	.arg2_type	= ARG_CONST_SIZE,
>> +};
>> +
>>    #ifdef CONFIG_CGROUPS
>>    BPF_CALL_0(bpf_get_current_cgroup_id)
>>    {
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index ca1255d14576..5e1dc22765a5 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>>    #endif
>>    	case BPF_FUNC_send_signal:
>>    		return &bpf_send_signal_proto;
>> +	case BPF_FUNC_get_current_pidns_info:
>> +		return &bpf_get_current_pidns_info_proto;
>>    	default:
>>    		return NULL;
>>    	}
>>

* Re: [PATCH net-next] mcast: ensure L-L IPv6 packets are accepted by bridge
From: Nikolay Aleksandrov @ 2019-08-13 23:55 UTC (permalink / raw)
  To: Ido Schimmel, Patrick Ruddy; +Cc: netdev, roopa, linus.luessing
In-Reply-To: <20190813195341.GA27005@splinter>

On 8/13/19 10:53 PM, Ido Schimmel wrote:
> + Bridge maintainers, Linus
> 

Good catch Ido, thanks!
First, I'd say the subject needs to better reflect that this is a bridge
change; please rearrange it like so: bridge: mcast: ...
More below.

> On Tue, Aug 13, 2019 at 03:18:04PM +0100, Patrick Ruddy wrote:
>> At present only all-nodes IPv6 multicast packets are accepted by
>> a bridge interface that is not in multicast router mode. Since
>> other protocols can be running in the absense of multicast
>> forwarding e.g. OSPFv3 IPv6 ND. Change the test to allow
>> all of the FFx2::/16 range to be accepted when not in multicast
>> router mode. This aligns the code with IPv4 link-local reception
>> and RFC4291
> 
> Can you please quote the relevant part from RFC 4291?
>
>>
>> Signed-off-by: Patrick Ruddy <pruddy@vyatta.att-mail.com>
>> ---
>>  include/net/addrconf.h    | 15 +++++++++++++++
>>  net/bridge/br_multicast.c |  2 +-
>>  2 files changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/addrconf.h b/include/net/addrconf.h
>> index becdad576859..05b42867e969 100644
>> --- a/include/net/addrconf.h
>> +++ b/include/net/addrconf.h
>> @@ -434,6 +434,21 @@ static inline void addrconf_addr_solict_mult(const struct in6_addr *addr,
>>  		      htonl(0xFF000000) | addr->s6_addr32[3]);
>>  }
>>  
>> +/*
>> + *      link local multicast address range ffx2::/16 rfc4291
>> + */
>> +static inline bool ipv6_addr_is_ll_mcast(const struct in6_addr *addr)
>> +{
>> +#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
>> +	__be64 *p = (__be64 *)addr;
>> +	return ((p[0] & cpu_to_be64(0xff0f000000000000UL))
>> +		^ cpu_to_be64(0xff02000000000000UL)) == 0UL;
>> +#else
>> +	return ((addr->s6_addr32[0] & htonl(0xff0f0000)) ^
>> +		htonl(0xff020000)) == 0;
>> +#endif
>> +}
>> +
>>  static inline bool ipv6_addr_is_ll_all_nodes(const struct in6_addr *addr)
>>  {
>>  #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
>> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
>> index 9b379e110129..ed3957381fa2 100644
>> --- a/net/bridge/br_multicast.c
>> +++ b/net/bridge/br_multicast.c
>> @@ -1664,7 +1664,7 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
>>  	err = ipv6_mc_check_mld(skb);
>>  
>>  	if (err == -ENOMSG) {
>> -		if (!ipv6_addr_is_ll_all_nodes(&ipv6_hdr(skb)->daddr))
>> +		if (!ipv6_addr_is_ll_mcast(&ipv6_hdr(skb)->daddr))
>>  			BR_INPUT_SKB_CB(skb)->mrouters_only = 1;
> 
> IIUC, you want IPv6 link-local packets to be locally received, but this
> also changes how these packets are flooded. RFC 4541 says that packets

Indeed, we'll start flooding them all, not just the all hosts address.
If that is at all required it'll definitely have to be optional.

> addressed to the all hosts address are a special case and should be
> forwarded to all ports:
> 
> "In IPv6, the data forwarding rules are more straight forward because MLD is
> mandated for addresses with scope 2 (link-scope) or greater. The only exception
> is the address FF02::1 which is the all hosts link-scope address for which MLD
> messages are never sent. Packets with the all hosts link-scope address should
> be forwarded on all ports."
>

I wonder what the problem is with the host joining such a group on behalf of the bridge?
Then you'll receive the traffic at least locally, and the RFC says so itself: MLD is mandated
for the other link-local addresses.
It's very late here and maybe I'm missing something.. :)
 
> Maybe you want something like:
> 

I think we can do without the new field, either pass local_rcv into br_multicast_rcv() or
set it based on return value. The extra test will have to remain unfortunately, but we
can reduce the tests by one if carefully done.

> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index 09b1dd8cd853..9f312a73f61c 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -132,7 +132,8 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
>  		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
>  		    br_multicast_querier_exists(br, eth_hdr(skb))) {
>  			if ((mdst && mdst->host_joined) ||
> -			    br_multicast_is_router(br)) {
> +			    br_multicast_is_router(br) ||
> +			    BR_INPUT_SKB_CB_LOCAL_RECEIVE(skb)) {
>  				local_rcv = true;
>  				br->dev->stats.multicast++;
>  			}
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index 9b379e110129..f03cecf6174e 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -1667,6 +1667,9 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
>  		if (!ipv6_addr_is_ll_all_nodes(&ipv6_hdr(skb)->daddr))
>  			BR_INPUT_SKB_CB(skb)->mrouters_only = 1;
>  
> +		if (ipv6_addr_is_ll_mcast(&ipv6_hdr(skb)->daddr))
> +			BR_INPUT_SKB_CB(skb)->local_receive = 1;
> +
>  		if (ipv6_addr_is_all_snoopers(&ipv6_hdr(skb)->daddr)) {
>  			err = br_ip6_multicast_mrd_rcv(br, port, skb);
>  
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index b7a4942ff1b3..d76394ca4059 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -426,6 +426,7 @@ struct br_input_skb_cb {
>  #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
>  	u8 igmp;
>  	u8 mrouters_only:1;
> +	u8 local_receive:1;
>  #endif
>  	u8 proxyarp_replied:1;
>  	u8 src_port_isolated:1;
> @@ -445,8 +446,10 @@ struct br_input_skb_cb {
>  
>  #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
>  # define BR_INPUT_SKB_CB_MROUTERS_ONLY(__skb)	(BR_INPUT_SKB_CB(__skb)->mrouters_only)
> +# define BR_INPUT_SKB_CB_LOCAL_RECEIVE(__skb)	(BR_INPUT_SKB_CB(__skb)->local_receive)
>  #else
>  # define BR_INPUT_SKB_CB_MROUTERS_ONLY(__skb)	(0)
> +# define BR_INPUT_SKB_CB_LOCAL_RECEIVE(__skb)	(0)
>  #endif
>  
>  #define br_printk(level, br, format, args...)	\
> 

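For reference, a minimal userspace sketch of the ffx2::/16 test discussed
in this thread. It mirrors the mask-and-compare in the proposed
ipv6_addr_is_ll_mcast(): keep the 0xff multicast prefix and the 4-bit
scope field, and ignore the 4 flag bits in between. The Python function
name is_ll_mcast is an illustrative assumption, not a kernel interface.

```python
import ipaddress


def is_ll_mcast(addr: str) -> bool:
    """Return True if addr falls in ffx2::/16 (link-local scope multicast)."""
    packed = ipaddress.IPv6Address(addr).packed
    # First 16 bits of the address: 8-bit 0xff prefix, 4 flag bits, 4 scope bits.
    first16 = int.from_bytes(packed[:2], "big")
    # Mask 0xff0f keeps prefix and scope and drops the flags;
    # 0xff02 means "multicast, scope 2 (link-local)".
    return (first16 & 0xFF0F) == 0xFF02


if __name__ == "__main__":
    for a in ("ff02::1", "ff02::5", "ff12::1234", "ff05::2", "fe80::1"):
        print(a, is_ll_mcast(a))
```

Note that this only classifies the destination address; whether such
packets should additionally be flooded or only received locally is the
open question in the discussion above.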

* Re: pull-request: bpf-next 2019-08-14
From: Jakub Kicinski @ 2019-08-13 23:59 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, ast, andrii.nakryiko, netdev, bpf
In-Reply-To: <20190813231639.29891-1-daniel@iogearbox.net>

On Wed, 14 Aug 2019 01:16:39 +0200, Daniel Borkmann wrote:
> Hi David, hi Jakub,
> 
> The following pull-request contains BPF updates for your *net-next* tree.

Pulled, let me know if I did it wrong 🤞

* Re: [PATCH net-next] net: devlink: remove redundant rtnl lock assert
From: Jakub Kicinski @ 2019-08-14  0:01 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, Jiri Pirko
In-Reply-To: <20190812170202.32314-1-vladbu@mellanox.com>

On Mon, 12 Aug 2019 20:02:02 +0300, Vlad Buslov wrote:
> It is enough for caller of devlink_compat_switch_id_get() to hold the net
> device to guarantee that devlink port is not destroyed concurrently. Remove
> rtnl lock assertion and modify comment to warn user that they must hold
> either rtnl lock or reference to net device. This is necessary to
> accommodate future implementation of rtnl-unlocked TC offloads driver
> callbacks.
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>

Looks good, applied.

* Re: [PATCH] sctp: fix memleak in sctp_send_reset_streams
From: Neil Horman @ 2019-08-14  0:10 UTC (permalink / raw)
  To: zhengbin; +Cc: vyasevich, marcelo.leitner, davem, linux-sctp, netdev, yi.zhang
In-Reply-To: <1565705150-17242-1-git-send-email-zhengbin13@huawei.com>

On Tue, Aug 13, 2019 at 10:05:50PM +0800, zhengbin wrote:
> If the stream outq is not empty, need to kfree nstr_list.
> 
> Fixes: d570a59c5b5f ("sctp: only allow the out stream reset when the stream outq is empty")
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: zhengbin <zhengbin13@huawei.com>
> ---
>  net/sctp/stream.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/sctp/stream.c b/net/sctp/stream.c
> index 2594660..e83cdaa 100644
> --- a/net/sctp/stream.c
> +++ b/net/sctp/stream.c
> @@ -316,6 +316,7 @@ int sctp_send_reset_streams(struct sctp_association *asoc,
>  		nstr_list[i] = htons(str_list[i]);
> 
>  	if (out && !sctp_stream_outq_is_empty(stream, str_nums, nstr_list)) {
> +		kfree(nstr_list);
>  		retval = -EAGAIN;
>  		goto out;
>  	}
> --
> 2.7.4
> 
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>


* Re: [PATCH net-next v2 0/3] net: phy: let phy_speed_down/up support speeds >1Gbps
From: Jakub Kicinski @ 2019-08-14  0:21 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Andrew Lunn, Florian Fainelli, David Miller,
	netdev@vger.kernel.org
In-Reply-To: <dca82a0e-e936-b60a-3a1c-9fdb1714d1d3@gmail.com>

On Mon, 12 Aug 2019 23:47:45 +0200, Heiner Kallweit wrote:
> So far phy_speed_down/up can be used up to 1Gbps only. Remove this
> restriction and add needed helpers to phy-core.c
> 
> v2:
> - remove unused parameter in patch 1
> - rename __phy_speed_down to phy_speed_down_core in patch 2

Applied, thanks.

* Re: [PATCH net-next] net/ncsi: allow to customize BMC MAC Address offset
From: Terry Duncan @ 2019-08-14  0:22 UTC (permalink / raw)
  To: Tao Ren
  Cc: Andrew Lunn, Jakub Kicinski, netdev@vger.kernel.org,
	openbmc@lists.ozlabs.org, Ben Wei, linux-kernel@vger.kernel.org,
	Samuel Mendoza-Jonas, David S.Miller, William Kennington
In-Reply-To: <33e3e783-fb93-e628-8baa-a8374540ea25@linux.intel.com>

On 8/13/19 1:54 PM, Terry Duncan wrote:
> 
> On 8/13/19 11:28 AM, Tao Ren wrote:
>> On 8/13/19 9:31 AM, Terry Duncan wrote:
>>> Tao, in your new patch will it be possible to disable the setting of 
>>> the BMC MAC?  I would like to be able to send NCSI_OEM_GET_MAC 
>>> perhaps with netlink (TBD) to get the system address without it 
>>> affecting the BMC address.
>>>
>>> I was about to send patches to add support for the Intel adapters 
>>> when I saw this thread.
>>>
>>> Thanks,
>>>
>>> Terry
>>
>> Hi Terry,
>>
>> Sounds like you are planning to configure BMC MAC address from user 
>> space via netlink? Ben Wei <benwei@fb.com> started a thread 
>> "Out-of-band NIC management" in openbmc community for NCSI management 
>> using netlink, and you may follow up with him for details.
>>
>> I haven't decided what to do in my v2 patch: maybe using device tree, 
>> maybe moving the logic to uboot, and I'm also evaluating the netlink 
>> option. But it shouldn't impact your patch, because you can disable 
>> NCSI_OEM_GET_MAC option from your config file.
> 
> Thanks Tao. I see now that disabling the NCSI_OEM_GET_MAC option will do 
> what I want.
> 
> Best,
> Terry
Hi Tao,

After a second look, it appears that the OEM handlers for Broadcom and 
Mellanox in ncsi-rsp.c will set the MAC regardless of the origin of the 
request. Even with NCSI_OEM_GET_MAC disabled, sending an OEM command 
with netlink would result in setting the BMC MAC.

Thanks,
Terry

* Re: [PATCH bpf-next] libbpf: make libbpf.map source of truth for libbpf version
From: Andrey Ignatov @ 2019-08-14  0:28 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf@vger.kernel.org, netdev@vger.kernel.org, Alexei Starovoitov,
	daniel@iogearbox.net, andrii.nakryiko@gmail.com, Kernel Team
In-Reply-To: <20190813232408.1246694-1-andriin@fb.com>

Andrii Nakryiko <andriin@fb.com> [Tue, 2019-08-13 16:24 -0700]:
> Currently libbpf version is specified in 2 places: libbpf.map and
> Makefile. They easily get out of sync and it's very easy to update one,
> but forget to update another one. In addition, Github projection of
> libbpf has to maintain its own version which has to be remembered to be
> kept in sync manually, which is very error-prone approach.
> 
> This patch makes libbpf.map a source of truth for libbpf version and
> uses shell invocation to parse out correct full and major libbpf version
> to use during build. Now we need to make sure that once new release
> cycle starts, we need to add (initially) empty section to libbpf.map
> with correct latest version.
> 
> This also will make it possible to keep Github projection consistent
> with kernel sources version of libbpf by adopting similar parsing of
> version from libbpf.map.

Thanks for taking care of this!


> Cc: Andrey Ignatov <rdna@fb.com>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
>  tools/lib/bpf/Makefile   | 12 +++++-------
>  tools/lib/bpf/libbpf.map |  3 +++
>  2 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
> index 9312066a1ae3..d9afc8509725 100644
> --- a/tools/lib/bpf/Makefile
> +++ b/tools/lib/bpf/Makefile
> @@ -1,9 +1,10 @@
>  # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
>  # Most of this file is copied from tools/lib/traceevent/Makefile
>  
> -BPF_VERSION = 0
> -BPF_PATCHLEVEL = 0
> -BPF_EXTRAVERSION = 4
> +BPF_FULL_VERSION = $(shell \

Nit: Should it be LIBBPF_VERSION? IMO it's a more descriptive name.

> +	grep -E 'LIBBPF_([0-9]+)\.([0-9]+)\.([0-9]+) \{' libbpf.map | \
> +	tail -n1 | cut -d'_' -f2 | cut -d' ' -f1)

It can be done more simply, and IMO versions should be sorted before taking
the last one (just in case), something like:

grep -oE '^LIBBPF_[0-9.]+' libbpf.map | cut -d_ -f 2 | sort -nr | head -n 1
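
A minimal sketch of the same idea with an explicit component-wise
comparison, assuming section headers of the form "LIBBPF_x.y.z {" at the
start of a line. A plain numeric sort is not guaranteed to order versions
such as 0.0.9 and 0.0.10 as intended, so the sketch compares integer
tuples instead. The function name and the sample map text are
hypothetical and only serve the illustration.

```python
import re

# Hypothetical excerpt in the style of libbpf.map, not the real file.
SAMPLE_MAP = """\
LIBBPF_0.0.9 {
	global:
		some_symbol;
} LIBBPF_0.0.8;

LIBBPF_0.0.10 {
} LIBBPF_0.0.9;
"""

# Matches only section openers ("LIBBPF_x.y.z {"), not the
# "} LIBBPF_x.y.z;" parent references that close a section.
_SECTION_RE = re.compile(r"^LIBBPF_(\d+)\.(\d+)\.(\d+)\s*\{", re.MULTILINE)


def latest_libbpf_version(map_text: str) -> str:
    """Return the highest x.y.z version declared in a version script."""
    found = _SECTION_RE.findall(map_text)
    if not found:
        raise ValueError("no LIBBPF_x.y.z section found")
    # Compare as integer tuples so that 0.0.10 sorts above 0.0.9.
    best = max(tuple(int(part) for part in version) for version in found)
    return ".".join(str(part) for part in best)


if __name__ == "__main__":
    print(latest_libbpf_version(SAMPLE_MAP))
```

In a Makefile the equivalent would stay a shell pipeline; the point here
is only the ordering rule, which is independent of position in the file.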


> +BPF_VERSION = $(firstword $(subst ., ,$(BPF_FULL_VERSION)))
>  
>  MAKEFLAGS += --no-print-directory
>  
> @@ -79,15 +80,12 @@ export prefix libdir src obj
>  libdir_SQ = $(subst ','\'',$(libdir))
>  libdir_relative_SQ = $(subst ','\'',$(libdir_relative))
>  
> +LIBBPF_VERSION	= $(BPF_FULL_VERSION)
>  VERSION		= $(BPF_VERSION)
> -PATCHLEVEL	= $(BPF_PATCHLEVEL)
> -EXTRAVERSION	= $(BPF_EXTRAVERSION)
>  
>  OBJ		= $@
>  N		=
>  
> -LIBBPF_VERSION	= $(BPF_VERSION).$(BPF_PATCHLEVEL).$(BPF_EXTRAVERSION)
> -
>  LIB_TARGET	= libbpf.a libbpf.so.$(LIBBPF_VERSION)
>  LIB_FILE	= libbpf.a libbpf.so*
>  PC_FILE		= libbpf.pc
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index f9d316e873d8..4e72df8e98ba 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -184,3 +184,6 @@ LIBBPF_0.0.4 {
>  		perf_buffer__new_raw;
>  		perf_buffer__poll;
>  } LIBBPF_0.0.3;
> +
> +LIBBPF_0.0.5 {
> +} LIBBPF_0.0.4;

I'm not sure the version should be bumped in this patch, since this patch
is about keeping the version in one place, not about bumping it, right?


> -- 
> 2.17.1
> 

-- 
Andrey Ignatov

* Re: [PATCH bpf-next 1/3] libbpf: add asm/unistd.h to xsk to get __NR_mmap2
From: Yonghong Song @ 2019-08-14  0:32 UTC (permalink / raw)
  To: Ivan Khoronzhuk, magnus.karlsson@intel.com, bjorn.topel@intel.com
  Cc: davem@davemloft.net, hawk@kernel.org, john.fastabend@gmail.com,
	jakub.kicinski@netronome.com, daniel@iogearbox.net,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	xdp-newbies@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20190813102318.5521-2-ivan.khoronzhuk@linaro.org>



On 8/13/19 3:23 AM, Ivan Khoronzhuk wrote:
> That's needed to get __NR_mmap2 when mmap2 syscall is used.

I don't seem to have this issue on an x64 machine, e.g., Fedora 29.
My glibc version is 2.28, gcc 8.2.1.

What is your particular system's glibc version?
Is the kernel asm/unistd.h needed because of an older glibc on your
system, or something else? Could you clarify?

> 
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---
>   tools/lib/bpf/xsk.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> index 5007b5d4fd2c..f2fc40f9804c 100644
> --- a/tools/lib/bpf/xsk.c
> +++ b/tools/lib/bpf/xsk.c
> @@ -12,6 +12,7 @@
>   #include <stdlib.h>
>   #include <string.h>
>   #include <unistd.h>
> +#include <asm/unistd.h>
>   #include <arpa/inet.h>
>   #include <asm/barrier.h>
>   #include <linux/compiler.h>
> 

* Re: [PATCH bpf-next] libbpf: make libbpf.map source of truth for libbpf version
From: Jakub Kicinski @ 2019-08-14  0:51 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, ast, daniel, andrii.nakryiko, kernel-team,
	Andrey Ignatov
In-Reply-To: <20190813232408.1246694-1-andriin@fb.com>

On Tue, 13 Aug 2019 16:24:08 -0700, Andrii Nakryiko wrote:
> diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
> index 9312066a1ae3..d9afc8509725 100644
> --- a/tools/lib/bpf/Makefile
> +++ b/tools/lib/bpf/Makefile
> @@ -1,9 +1,10 @@
>  # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
>  # Most of this file is copied from tools/lib/traceevent/Makefile
>  
> -BPF_VERSION = 0
> -BPF_PATCHLEVEL = 0
> -BPF_EXTRAVERSION = 4
> +BPF_FULL_VERSION = $(shell \
> +	grep -E 'LIBBPF_([0-9]+)\.([0-9]+)\.([0-9]+) \{' libbpf.map | \
> +	tail -n1 | cut -d'_' -f2 | cut -d' ' -f1)
> +BPF_VERSION = $(firstword $(subst ., ,$(BPF_FULL_VERSION)))
>  
>  MAKEFLAGS += --no-print-directory
>  
> @@ -79,15 +80,12 @@ export prefix libdir src obj
>  libdir_SQ = $(subst ','\'',$(libdir))
>  libdir_relative_SQ = $(subst ','\'',$(libdir_relative))
>  
> +LIBBPF_VERSION	= $(BPF_FULL_VERSION)

Perhaps better to use immediate assignment (':=') here?
I'm not sure how many times this gets evaluated, but it shouldn't
really change either..

>  VERSION		= $(BPF_VERSION)
> -PATCHLEVEL	= $(BPF_PATCHLEVEL)
> -EXTRAVERSION	= $(BPF_EXTRAVERSION)
>  
>  OBJ		= $@
>  N		=
>  
> -LIBBPF_VERSION	= $(BPF_VERSION).$(BPF_PATCHLEVEL).$(BPF_EXTRAVERSION)
> -
>  LIB_TARGET	= libbpf.a libbpf.so.$(LIBBPF_VERSION)
>  LIB_FILE	= libbpf.a libbpf.so*
>  PC_FILE		= libbpf.pc

* Re: [PATCH] MAINTAINERS: PHY LIBRARY: Remove sysfs-bus-mdio record
From: Florian Fainelli @ 2019-08-14  0:55 UTC (permalink / raw)
  To: Denis Efremov, linux-kernel
  Cc: joe, David S . Miller, Andrew Lunn, Heiner Kallweit, netdev
In-Reply-To: <20190813061439.17529-1-efremov@linux.com>

Le 8/12/19 à 11:14 PM, Denis Efremov a écrit :
> Update MAINTAINERS to reflect that sysfs-bus-mdio documentation
> was removed.
> 
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
> Cc: netdev@vger.kernel.org
> Fixes: a6cd0d2d493a ("Documentation: net-sysfs: Remove duplicate PHY device documentation")
> Signed-off-by: Denis Efremov <efremov@linux.com>

Not sure this really deserves a Fixes tag, but thanks for catching
that. A more appropriate change might be to list
Documentation/ABI/testing/sysfs-class-net-phydev instead.
-- 
Florian

* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
From: Carlos Antonio Neira Bustos @ 2019-08-14  0:56 UTC (permalink / raw)
  To: Yonghong Song
  Cc: netdev@vger.kernel.org, ebiederm@xmission.com, brouer@redhat.com,
	bpf@vger.kernel.org
In-Reply-To: <13b7f81f-83b6-07c9-4864-b49749cbf7d9@fb.com>

On Tue, Aug 13, 2019 at 11:11:14PM +0000, Yonghong Song wrote:
> 
> 
> On 8/13/19 11:47 AM, Carlos Neira wrote:
> > From: Carlos <cneirabustos@gmail.com>
> > 
> > New bpf helper bpf_get_current_pidns_info.
> > This helper obtains the active namespace from current and returns
> > pid, tgid, device and namespace id as seen from that namespace,
> > allowing to instrument a process inside a container.
> > 
> > Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> > ---
> >   fs/internal.h            |  2 --
> >   fs/namei.c               |  1 -
> >   include/linux/bpf.h      |  1 +
> >   include/linux/namei.h    |  4 +++
> >   include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
> >   kernel/bpf/core.c        |  1 +
> >   kernel/bpf/helpers.c     | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
> >   kernel/trace/bpf_trace.c |  2 ++
> >   8 files changed, 102 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/internal.h b/fs/internal.h
> > index 315fcd8d237c..6647e15dd419 100644
> > --- a/fs/internal.h
> > +++ b/fs/internal.h
> > @@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
> >   /*
> >    * namei.c
> >    */
> > -extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> > -			   struct path *path, struct path *root);
> >   extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
> >   extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
> >   			   const char *, unsigned int, struct path *);
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 209c51a5226c..a89fc72a4a10 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -19,7 +19,6 @@
> >   #include <linux/export.h>
> >   #include <linux/kernel.h>
> >   #include <linux/slab.h>
> > -#include <linux/fs.h>
> >   #include <linux/namei.h>
> >   #include <linux/pagemap.h>
> >   #include <linux/fsnotify.h>
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index f9a506147c8a..e4adf5e05afd 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
> >   extern const struct bpf_func_proto bpf_strtol_proto;
> >   extern const struct bpf_func_proto bpf_strtoul_proto;
> >   extern const struct bpf_func_proto bpf_tcp_sock_proto;
> > +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
> >   
> >   /* Shared helpers among cBPF and eBPF. */
> >   void bpf_user_rnd_init_once(void);
> > diff --git a/include/linux/namei.h b/include/linux/namei.h
> > index 9138b4471dbf..b45c8b6f7cb4 100644
> > --- a/include/linux/namei.h
> > +++ b/include/linux/namei.h
> > @@ -6,6 +6,7 @@
> >   #include <linux/path.h>
> >   #include <linux/fcntl.h>
> >   #include <linux/errno.h>
> > +#include <linux/fs.h>
> >   
> >   enum { MAX_NESTED_LINKS = 8 };
> >   
> > @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
> >   
> >   extern void nd_jump_link(struct path *path);
> >   
> > +extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> > +			   struct path *path, struct path *root);
> > +
> >   static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
> >   {
> >   	((char *) name)[min(len, maxlen)] = '\0';
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 4393bd4b2419..db241857ec15 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2741,6 +2741,28 @@ union bpf_attr {
> >    *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
> >    *
> >    *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
> > + *
> > + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> > + *	Description
> > + *		Copies into *pidns* pid, namespace id and tgid as seen by the
> > + *		current namespace and also device from /proc/self/ns/pid.
> > + *		*size_of_pidns* must be the size of *pidns*
> > + *
> > + *		This helper is used when pid filtering is needed inside a
> > + *		container as bpf_get_current_tgid() helper returns always the
> > + *		pid id as seen by the root namespace.
> > + *	Return
> > + *		0 on success
> > + *
> > + *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> > + *		or tgid of the current task.
> > + *
> > + *		**-ECHILD** if /proc/self/ns/pid does not exists.
> > + *
> > + *		**-ENOTDIR** if /proc/self/ns does not exists.
> > + *
> > + *		**-ENOMEM**  if allocation fails.
> > + *
> >    */
> >   #define __BPF_FUNC_MAPPER(FN)		\
> >   	FN(unspec),			\
> > @@ -2853,7 +2875,8 @@ union bpf_attr {
> >   	FN(sk_storage_get),		\
> >   	FN(sk_storage_delete),		\
> >   	FN(send_signal),		\
> > -	FN(tcp_gen_syncookie),
> > +	FN(tcp_gen_syncookie),		\
> > +	FN(get_current_pidns_info),
> >   
> >   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> >    * function eBPF program intends to call
> > @@ -3604,4 +3627,10 @@ struct bpf_sockopt {
> >   	__s32	retval;
> >   };
> >   
> > +struct bpf_pidns_info {
> > +	__u32 dev;
> > +	__u32 nsid;
> > +	__u32 tgid;
> > +	__u32 pid;
> > +};
> >   #endif /* _UAPI__LINUX_BPF_H__ */
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 8191a7db2777..3159f2a0188c 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
> >   const struct bpf_func_proto bpf_get_current_comm_proto __weak;
> >   const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
> >   const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> > +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
> >   
> >   const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
> >   {
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 5e28718928ca..41fbf1f28a48 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -11,6 +11,12 @@
> >   #include <linux/uidgid.h>
> >   #include <linux/filter.h>
> >   #include <linux/ctype.h>
> > +#include <linux/pid_namespace.h>
> > +#include <linux/major.h>
> > +#include <linux/stat.h>
> > +#include <linux/namei.h>
> > +#include <linux/version.h>
> > +
> >   
> >   #include "../../lib/kstrtox.h"
> >   
> > @@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
> >   	preempt_enable();
> >   }
> >   
> > +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> > +	 size)
> > +{
> > +	const char *pidns_path = "/proc/self/ns/pid";
> > +	struct pid_namespace *pidns = NULL;
> > +	struct filename *tmp = NULL;
> > +	struct inode *inode;
> > +	struct path kp;
> > +	pid_t tgid = 0;
> > +	pid_t pid = 0;
> > +	int ret;
> > +	int len;
> 

Thank you very much for catching this!
Could you share how to reproduce this bug?

> I am running your sample program and get the following kernel bug:
> 
> ...
> [   26.414825] BUG: sleeping function called from invalid context at /data/users/yhs/work/net-next/fs/dcache.c:843
> [   26.416314] in_atomic(): 1, irqs_disabled(): 0, pid: 1911, name: ping
> [   26.417189] CPU: 0 PID: 1911 Comm: ping Tainted: G        W         5.3.0-rc1+ #280
> [   26.418182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
> [   26.419393] Call Trace:
> [   26.419697]  <IRQ>
> [   26.419960]  dump_stack+0x46/0x5b
> [   26.420434]  ___might_sleep+0xe4/0x110
> [   26.420894]  dput+0x2a/0x200
> [   26.421265]  walk_component+0x10c/0x280
> [   26.421773]  link_path_walk+0x327/0x560
> [   26.422280]  ? proc_ns_dir_readdir+0x1a0/0x1a0
> [   26.422848]  ? path_init+0x232/0x330
> [   26.423364]  path_lookupat+0x88/0x200
> [   26.423808]  ? selinux_parse_skb.constprop.69+0x124/0x430
> [   26.424521]  filename_lookup+0xaf/0x190
> [   26.425031]  ? simple_attr_release+0x20/0x20
> [   26.425560]  bpf_get_current_pidns_info+0xfa/0x190
> [   26.426168]  bpf_prog_83627154cefed596+0xe66/0x1000
> [   26.426779]  trace_call_bpf+0xb5/0x160
> [   26.427317]  ? __netif_receive_skb_core+0x1/0xbb0
> [   26.427929]  ? __netif_receive_skb_core+0x1/0xbb0
> [   26.428496]  kprobe_perf_func+0x4d/0x280
> [   26.428986]  ? tracing_record_taskinfo_skip+0x1a/0x30
> [   26.429584]  ? tracing_record_taskinfo+0xe/0x80
> [   26.430152]  ? ttwu_do_wakeup.isra.114+0xcf/0xf0
> [   26.430737]  ? __netif_receive_skb_core+0x1/0xbb0
> [   26.431334]  ? __netif_receive_skb_core+0x5/0xbb0
> [   26.431930]  kprobe_ftrace_handler+0x90/0xf0
> [   26.432495]  ftrace_ops_assist_func+0x63/0x100
> [   26.433060]  0xffffffffc03180bf
> [   26.433471]  ? __netif_receive_skb_core+0x1/0xbb0
> ...
> 
> To avoid running in an arbitrary task context (e.g., the idle task),
> which may introduce sleeping issues, the following check is
> probably appropriate:
> 
>         if (in_nmi() || in_softirq())
>                 return -EPERM;
> 
> In any case, in nmi or softirq context the namespace and pid/tgid
> we get may be only accidentally associated with the context the bpf
> program runs in, and could belong to a different context. So such
> info is not reliable anyway.
> 
> > +
> > +	if (unlikely(size != sizeof(struct bpf_pidns_info)))
> > +		return -EINVAL;
> > +	pidns = task_active_pid_ns(current);
> > +	if (unlikely(!pidns))
> > +		goto clear;
> > +	pidns_info->nsid =  pidns->ns.inum;
> > +	pid = task_pid_nr_ns(current, pidns);
> > +	if (unlikely(!pid))
> > +		goto clear;
> > +	tgid = task_tgid_nr_ns(current, pidns);
> > +	if (unlikely(!tgid))
> > +		goto clear;
> > +	pidns_info->tgid = (u32) tgid;
> > +	pidns_info->pid = (u32) pid;
> > +	tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> > +	if (unlikely(!tmp)) {
> > +		memset((void *)pidns_info, 0, (size_t) size);
> > +		return -ENOMEM;
> > +	}
> > +	len = strlen(pidns_path) + 1;
> > +	memcpy((char *)tmp->name, pidns_path, len);
> > +	tmp->uptr = NULL;
> > +	tmp->aname = NULL;
> > +	tmp->refcnt = 1;
> > +	ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
> > +	if (ret) {
> > +		memset((void *)pidns_info, 0, (size_t) size);
> > +		return ret;
> > +	}
> > +	inode = d_backing_inode(kp.dentry);
> > +	pidns_info->dev = inode->i_sb->s_dev;
> > +	return 0;
> > +clear:
> > +	memset((void *)pidns_info, 0, (size_t) size);
> > +	return -EINVAL;
> > +}
> > +
> > +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
> > +	.func		= bpf_get_current_pidns_info,
> > +	.gpl_only	= false,
> > +	.ret_type	= RET_INTEGER,
> > +	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
> > +	.arg2_type	= ARG_CONST_SIZE,
> > +};
> > +
> >   #ifdef CONFIG_CGROUPS
> >   BPF_CALL_0(bpf_get_current_cgroup_id)
> >   {
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index ca1255d14576..5e1dc22765a5 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >   #endif
> >   	case BPF_FUNC_send_signal:
> >   		return &bpf_send_signal_proto;
> > +	case BPF_FUNC_get_current_pidns_info:
> > +		return &bpf_get_current_pidns_info_proto;
> >   	default:
> >   		return NULL;
> >   	}
> > 

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Alexei Starovoitov @ 2019-08-14  0:57 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Song Liu, Kees Cook, Networking, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Lorenz Bauer, Jann Horn, Greg KH,
	Linux API, LSM List
In-Reply-To: <CALCETrVT-dDXQGukGs5S1DkzvQv9_e=axzr_GyEd2c4T4z8Qng@mail.gmail.com>

On Tue, Aug 13, 2019 at 04:06:00PM -0700, Andy Lutomirski wrote:
> On Tue, Aug 13, 2019 at 2:58 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Aug 06, 2019 at 10:24:25PM -0700, Andy Lutomirski wrote:
> > > >
> > > > Inside containers and inside nested containers we need to start processes
> > > > that will use bpf. All of the processes are trusted.
> > >
> > > Trusted by whom?  In a non-nested container, the container manager
> > > *might* be trusted by the outside world.  In a *nested* container,
> > > unless the inner container management is controlled from outside the
> > > outer container, it's not trusted.  I don't know much about how
> > > Facebook's containers work, but the LXC/LXD/Podman world is moving
> > > very strongly toward user namespaces and maximally-untrusted
> > > containers, and I think bpf() should work in that context.
> >
> > agree that containers (namespaces) reduce amount of trust necessary
> > for apps to run, but the end goal is not security though.
> > Linux has become a single user system.
> > If user can ssh into the host they can become root.
> > If arbitrary code can run on the host it will break out of any sandbox.
> 
> I would argue that this is a reasonable assumption to make if you're
> designing a system using Linux, but it's not a valid assumption to
> make as kernel developers.  Otherwise we should just give everyone
> CAP_SYS_ADMIN and call it a day.  There really is a difference between
> root and non-root.

Hmm, no. Kernel developers should not make any assumptions.
They should guide their design by real use cases instead. That includes studying
what people do now and the hacks they use to work around the lack of interfaces.
Effectively bpf is root only. There are no unpriv users.
These root applications go out of their way to reduce privileges
while they still want to use bpf. That is the need that /dev/bpf is solving.

> 
> > Containers are not providing the level of security that is enough
> > to run arbitrary code. VMs can do it better, but cpu bugs don't make it easy.
> > Containers are used to make production systems safer.
> > Some people call it more 'secure', but it's clearly not secure for
> > arbitrary code and that is what kernel.unprivileged_bpf_disabled allows.
> > When we say 'unprivileged bpf' we really mean arbitrary malicious bpf program.
> > It's been a constant source of pain. The constant blinding, randomization,
> > verifier speculative analysis, all spectre v1, v2, v4 mitigations
> > are simply not worth it. It's a lot of complex kernel code without users.
> 
> Seccomp really will want eBPF some day, and it should work without
> privilege.  Maybe it should be a restricted subset of eBPF, and
> Spectre will always be an issue until dramatically better hardware
> shows up, but I think people will want the ability for regular
> programs to load eBPF seccomp programs.

I'm absolutely against using eBPF in seccomp.
Precisely due to discussions like the current one.

> 
> > Hence I prefer this /dev/bpf mechanism to be as simple as possible.
> > The applications that will use it are going to be just as trusted as systemd.
> 
> I still don't understand your systemd example.  systemd --users is not
> trusted systemwide in any respect.  The main PID 1 systemd is root.
> No matter how you dice it, granting a user systemd instance extra bpf
> access is tantamount to granting the user extra bpf access in general.

People use systemd --user while their kernels have 'undef CONFIG_USER_NS'.

> It sounds to me like you're thinking of eBPF as a feature a bit like
> unprivileged user namespaces: *in principle*, it's supposed to be safe
> to give any unprivileged process the ability to use it, and you
> consider security flaws in it to be bugs worth fixing. But you think
> it's a large attack surface and that most unprivileged programs
> shouldn't be allowed to use it.  Is that reasonable?

I think there should be no unprivileged bpf at all,
because over all these years we've seen zero use cases.
Hence all new features are root only.
LPM map is a prime example. There was not a single security bug in there.
There were a few functional bugs, but no security issues.
These bugs didn't crash the kernel and didn't expose any data.
Yet we still keep LPM as root only.
Can we flip the switch and make it non-root, given that it's a trivial
single-line patch and the security risk is very low?
Nope, since it will not address the underlying issue.


* Re: [PATCH net-next v2 0/5] r8152: RX improve
From: Jakub Kicinski @ 2019-08-14  1:15 UTC (permalink / raw)
  To: Hayes Wang; +Cc: netdev, nic_swsd, linux-kernel, linux-usb
In-Reply-To: <1394712342-15778-295-albertk@realtek.com>

On Tue, 13 Aug 2019 11:42:04 +0800, Hayes Wang wrote:
> v2:
> For patch #2, replace list_for_each_safe with list_for_each_entry_safe.
> Remove unlikely in WARN_ON. Adjust the coding style.
> 
> For patch #4, replace list_for_each_safe with list_for_each_entry_safe.
> Remove "else" after "continue".
> 
> For patch #5, replace sysfs with ethtool to modify rx_copybreak and
> rx_pending.
> 
> v1:
> Different chips use different rx buffer sizes.
> 
> Use skb_add_rx_frag() to reduce memory copy for RX.

Applied, thank you.

* [PATCH] net/ncsi: Ensure 32-bit boundary for data cksum
From: Terry S. Duncan @ 2019-08-14  1:18 UTC (permalink / raw)
  To: Samuel Mendoza-Jonas, David S . Miller, netdev, linux-kernel,
	openbmc, William Kennington, Joel Stanley
  Cc: Terry S. Duncan

The NCSI spec indicates that if the data does not end on a 32-bit
boundary, one to three padding bytes equal to 0x00 shall be present to
align the checksum field to a 32-bit boundary.

Signed-off-by: Terry S. Duncan <terry.s.duncan@linux.intel.com>
---
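As a rough illustration of the padding rule described above, here is a
minimal userspace sketch of how the pad length and the checksum offset
follow from the payload length. NCSI_ROUND32 mirrors the macro added by
this patch; EXAMPLE_HDR_LEN and the ncsi_example_* helpers are
hypothetical names used only for illustration, and the 16-byte header
length is an assumption standing in for sizeof(struct ncsi_pkt_hdr).

```c
#include <stddef.h>

/* Same rounding as the NCSI_ROUND32 macro added in net/ncsi/internal.h. */
#define NCSI_ROUND32(x)		(((x) + 3) & ~3)

/* Assumption: stand-in for sizeof(struct ncsi_pkt_hdr), illustration only. */
#define EXAMPLE_HDR_LEN		16

/* Hypothetical helper: number of 0x00 pad bytes (0..3) after the payload. */
static unsigned int ncsi_example_pad_len(unsigned int payload)
{
	return NCSI_ROUND32(payload) - payload;
}

/*
 * Hypothetical helper: offset of the checksum field from the start of a
 * command packet, i.e. header length plus the payload rounded up to a
 * 32-bit boundary.
 */
static size_t ncsi_example_cmd_cksum_off(unsigned int payload)
{
	return EXAMPLE_HDR_LEN + NCSI_ROUND32(payload);
}
```

For example, under these assumptions a 5-byte payload needs 3 pad bytes
and its checksum lands at offset 24, the same offset as for an 8-byte
payload, which needs no padding.
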
 net/ncsi/internal.h |  1 +
 net/ncsi/ncsi-cmd.c |  2 +-
 net/ncsi/ncsi-rsp.c | 12 ++++++++----
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
index 0b3f0673e1a2..468a19fdfd88 100644
--- a/net/ncsi/internal.h
+++ b/net/ncsi/internal.h
@@ -185,6 +185,7 @@ struct ncsi_package;
 #define NCSI_TO_CHANNEL(p, c)	(((p) << NCSI_PACKAGE_SHIFT) | (c))
 #define NCSI_MAX_PACKAGE	8
 #define NCSI_MAX_CHANNEL	32
+#define NCSI_ROUND32(x)		(((x) + 3) & ~3) /* Round to 32 bit boundary */
 
 struct ncsi_channel {
 	unsigned char               id;
diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
index 5c3fad8cba57..c12f2183b460 100644
--- a/net/ncsi/ncsi-cmd.c
+++ b/net/ncsi/ncsi-cmd.c
@@ -54,7 +54,7 @@ static void ncsi_cmd_build_header(struct ncsi_pkt_hdr *h,
 	checksum = ncsi_calculate_checksum((unsigned char *)h,
 					   sizeof(*h) + nca->payload);
 	pchecksum = (__be32 *)((void *)h + sizeof(struct ncsi_pkt_hdr) +
-		    nca->payload);
+		    NCSI_ROUND32(nca->payload));
 	*pchecksum = htonl(checksum);
 }
 
diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index 7581bf919885..10a142d0422f 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -47,7 +47,8 @@ static int ncsi_validate_rsp_pkt(struct ncsi_request *nr,
 	if (ntohs(h->code) != NCSI_PKT_RSP_C_COMPLETED ||
 	    ntohs(h->reason) != NCSI_PKT_RSP_R_NO_ERROR) {
 		netdev_dbg(nr->ndp->ndev.dev,
-			   "NCSI: non zero response/reason code\n");
+			   "NCSI: non zero response/reason code %04xh, %04xh\n",
+			    ntohs(h->code), ntohs(h->reason));
 		return -EPERM;
 	}
 
@@ -55,15 +56,18 @@ static int ncsi_validate_rsp_pkt(struct ncsi_request *nr,
 	 * sender doesn't support checksum according to NCSI
 	 * specification.
 	 */
-	pchecksum = (__be32 *)((void *)(h + 1) + payload - 4);
+	pchecksum = (__be32 *)((void *)(h + 1) + NCSI_ROUND32(payload) - 4);
 	if (ntohl(*pchecksum) == 0)
 		return 0;
 
 	checksum = ncsi_calculate_checksum((unsigned char *)h,
-					   sizeof(*h) + payload - 4);
+					   sizeof(*h) +
+					       NCSI_ROUND32(payload) - 4);
 
 	if (*pchecksum != htonl(checksum)) {
-		netdev_dbg(nr->ndp->ndev.dev, "NCSI: checksum mismatched\n");
+		netdev_dbg(nr->ndp->ndev.dev,
+			   "NCSI: checksum mismatched; recd: %08x calc: %08x\n",
+			   *pchecksum, htonl(checksum));
 		return -EINVAL;
 	}
 
-- 
2.17.1

