Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH RFC 2/4] net: phy: allow to bind genphy driver at probe time
From: Heiner Kallweit @ 2019-08-13 23:02 UTC (permalink / raw)
  To: Florian Fainelli, Andrew Lunn, Marek Behun, David Miller
  Cc: netdev@vger.kernel.org
In-Reply-To: <010ae64f-7e48-5e1e-2928-af3c4364f6e3@gmail.com>

On 14.08.2019 00:53, Florian Fainelli wrote:
> On 8/13/19 2:25 PM, Heiner Kallweit wrote:
>> In cases like a fixed phy that is never attached to a net_device we
>> may want to bind the genphy driver at probe time. Setting a PHY ID of
>> 0xffffffff to bind the genphy driver would fail due to a check in
>> get_phy_device(). Therefore let's change the PHY ID the genphy driver
>> binds to to 0xfffffffe. This still shouldn't match any real PHY,
>> and it will pass the check in get_phy_devcie().
>>
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
>>  drivers/net/phy/phy_device.c | 3 +--
>>  include/linux/phy.h          | 4 ++++
>>  2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 163295dbc..54f80af31 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -2388,8 +2388,7 @@ void phy_drivers_unregister(struct phy_driver *drv, int n)
>>  EXPORT_SYMBOL(phy_drivers_unregister);
>>  
>>  static struct phy_driver genphy_driver = {
>> -	.phy_id		= 0xffffffff,
>> -	.phy_id_mask	= 0xffffffff,
>> +	PHY_ID_MATCH_EXACT(GENPHY_ID),
>>  	.name		= "Generic PHY",
>>  	.soft_reset	= genphy_no_soft_reset,
>>  	.get_features	= genphy_read_abilities,
>> diff --git a/include/linux/phy.h b/include/linux/phy.h
>> index 5ac7d2137..3b07bce78 100644
>> --- a/include/linux/phy.h
>> +++ b/include/linux/phy.h
>> @@ -37,6 +37,10 @@
>>  #define PHY_1000BT_FEATURES	(SUPPORTED_1000baseT_Half | \
>>  				 SUPPORTED_1000baseT_Full)
>>  
>> +#define GENPHY_ID_HIGH		0xffffU
>> +#define GENPHY_ID_LOW		0xfffeU
>> +#define GENPHY_ID		((GENPHY_ID_HIGH << 16) | GENPHY_ID_LOW)
> 
> This is a possible user ABI change here, if there is anything that
> relies on reading 0xffff_ffff as a valid PHY OUI, you would be breaking
> it. We might as well try to assign ourselves a specific PHY OUI, very
> much like the Linux USB hubs show up with a Linux Foundation vendor ID.
> 

I see the point. However in get_phy_device() we have the following check
that should cause a PHY with ID 0xffff_ffff to be ignored. Therefore
I doubt there's any such PHY ID in use.

	/* If the phy_id is mostly Fs, there is no device there */
	if ((phy_id & 0x1fffffff) == 0x1fffffff)
		return ERR_PTR(-ENODEV);

Heiner

^ permalink raw reply

* Re: [PATCH RFC 2/4] net: phy: allow to bind genphy driver at probe time
From: Florian Fainelli @ 2019-08-13 22:53 UTC (permalink / raw)
  To: Heiner Kallweit, Andrew Lunn, Marek Behun, David Miller
  Cc: netdev@vger.kernel.org
In-Reply-To: <b066560d-2cc3-2ea5-5233-e63a612c5aa1@gmail.com>

On 8/13/19 2:25 PM, Heiner Kallweit wrote:
> In cases like a fixed phy that is never attached to a net_device we
> may want to bind the genphy driver at probe time. Setting a PHY ID of
> 0xffffffff to bind the genphy driver would fail due to a check in
> get_phy_device(). Therefore let's change the PHY ID the genphy driver
> binds to to 0xfffffffe. This still shouldn't match any real PHY,
> and it will pass the check in get_phy_devcie().
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
>  drivers/net/phy/phy_device.c | 3 +--
>  include/linux/phy.h          | 4 ++++
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 163295dbc..54f80af31 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -2388,8 +2388,7 @@ void phy_drivers_unregister(struct phy_driver *drv, int n)
>  EXPORT_SYMBOL(phy_drivers_unregister);
>  
>  static struct phy_driver genphy_driver = {
> -	.phy_id		= 0xffffffff,
> -	.phy_id_mask	= 0xffffffff,
> +	PHY_ID_MATCH_EXACT(GENPHY_ID),
>  	.name		= "Generic PHY",
>  	.soft_reset	= genphy_no_soft_reset,
>  	.get_features	= genphy_read_abilities,
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 5ac7d2137..3b07bce78 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -37,6 +37,10 @@
>  #define PHY_1000BT_FEATURES	(SUPPORTED_1000baseT_Half | \
>  				 SUPPORTED_1000baseT_Full)
>  
> +#define GENPHY_ID_HIGH		0xffffU
> +#define GENPHY_ID_LOW		0xfffeU
> +#define GENPHY_ID		((GENPHY_ID_HIGH << 16) | GENPHY_ID_LOW)

This is a possible user ABI change here, if there is anything that
relies on reading 0xffff_ffff as a valid PHY OUI, you would be breaking
it. We might as well try to assign ourselves a specific PHY OUI, very
much like the Linux USB hubs show up with a Linux Foundation vendor ID.
-- 
Florian

^ permalink raw reply

* Re: [patch net-next] devlink: send notifications for deleted snapshots on region destroy
From: Jakub Kicinski @ 2019-08-13 22:51 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, mlxsw
In-Reply-To: <20190812122831.1833-1-jiri@resnulli.us>

On Mon, 12 Aug 2019 14:28:31 +0200, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> Currently the notifications for deleted snapshots are sent only in case
> user deletes a snapshot manually. Send the notifications in case region
> is destroyed too.
> 
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

Applied, thanks!

^ permalink raw reply

* Re: [patch net-next] selftests: netdevsim: add devlink params tests
From: Jakub Kicinski @ 2019-08-13 22:41 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, mlxsw
In-Reply-To: <20190813130446.25712-1-jiri@resnulli.us>

On Tue, 13 Aug 2019 15:04:46 +0200, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> Test recently added netdevsim devlink param implementation.
> 
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

Thanks for the test, but it doesn't pass here:

TEST: fw flash test                                                 [ OK ]
TEST: params test                                                   [FAIL]
	Failed to get test1 param value

> diff --git a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
> index 9d8baf5d14b3..858ebdc8d8a3 100755
> --- a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
> +++ b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
> @@ -3,7 +3,7 @@
>  
>  lib_dir=$(dirname $0)/../../../net/forwarding
>  
> -ALL_TESTS="fw_flash_test"
> +ALL_TESTS="fw_flash_test params_test"
>  NUM_NETIFS=0
>  source $lib_dir/lib.sh
>  
> @@ -30,6 +30,66 @@ fw_flash_test()
>  	log_test "fw flash test"
>  }
>  
> +param_get()
> +{
> +	local name=$1
> +
> +	devlink dev param show $DL_HANDLE name $name -j | \
> +		jq -e -r '.[][][].values[] | select(.cmode == "driverinit").value'

                   ^^

The -e makes jq set exit code to 1 when test1 param is false.

Quoting the man page:

       ·   -e / --exit-status:

           Sets the exit status of jq to 0 if the last output values
           was neither false nor null, 1 if the last output value was
           either false or  null,  or  4  if  no valid  result  was
           ever produced. Normally jq exits with 2 if there was any
           usage problem or system error, 3 if there was a jq program
           compile error, or 0 if the jq program ran.

Without the -e all is well:

# ./devlink.sh 
TEST: fw flash test                                                 [ OK ]
TEST: params test                                                   [ OK ]

> +}
> +

^ permalink raw reply

* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
From: Yonghong Song @ 2019-08-13 22:35 UTC (permalink / raw)
  To: Carlos Neira, netdev@vger.kernel.org
  Cc: ebiederm@xmission.com, brouer@redhat.com, bpf@vger.kernel.org
In-Reply-To: <20190813184747.12225-2-cneirabustos@gmail.com>



On 8/13/19 11:47 AM, Carlos Neira wrote:
> From: Carlos <cneirabustos@gmail.com>
> 
> New bpf helper bpf_get_current_pidns_info.
> This helper obtains the active namespace from current and returns
> pid, tgid, device and namespace id as seen from that namespace,
> allowing to instrument a process inside a container.
> 
> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> ---
>   fs/internal.h            |  2 --
>   fs/namei.c               |  1 -
>   include/linux/bpf.h      |  1 +
>   include/linux/namei.h    |  4 +++
>   include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
>   kernel/bpf/core.c        |  1 +
>   kernel/bpf/helpers.c     | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
>   kernel/trace/bpf_trace.c |  2 ++
>   8 files changed, 102 insertions(+), 4 deletions(-)

I prefer to break this into two patches to reduce
the potential merging conflicts:
   patch 1: fs/internal.h, fs/namei.c, include/linux/namei.h
   patch 2: rest of changes
patch 1 is simply a preparing patches to make filename_lookup
available later.

> 
> diff --git a/fs/internal.h b/fs/internal.h
> index 315fcd8d237c..6647e15dd419 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
>   /*
>    * namei.c
>    */
> -extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> -			   struct path *path, struct path *root);
>   extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
>   extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
>   			   const char *, unsigned int, struct path *);
> diff --git a/fs/namei.c b/fs/namei.c
> index 209c51a5226c..a89fc72a4a10 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -19,7 +19,6 @@
>   #include <linux/export.h>
>   #include <linux/kernel.h>
>   #include <linux/slab.h>
> -#include <linux/fs.h>
>   #include <linux/namei.h>
>   #include <linux/pagemap.h>
>   #include <linux/fsnotify.h>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f9a506147c8a..e4adf5e05afd 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
>   extern const struct bpf_func_proto bpf_strtol_proto;
>   extern const struct bpf_func_proto bpf_strtoul_proto;
>   extern const struct bpf_func_proto bpf_tcp_sock_proto;
> +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
>   
>   /* Shared helpers among cBPF and eBPF. */
>   void bpf_user_rnd_init_once(void);
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 9138b4471dbf..b45c8b6f7cb4 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -6,6 +6,7 @@
>   #include <linux/path.h>
>   #include <linux/fcntl.h>
>   #include <linux/errno.h>
> +#include <linux/fs.h>
>   
>   enum { MAX_NESTED_LINKS = 8 };
>   
> @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
>   
>   extern void nd_jump_link(struct path *path);
>   
> +extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> +			   struct path *path, struct path *root);
> +
>   static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
>   {
>   	((char *) name)[min(len, maxlen)] = '\0';
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4393bd4b2419..db241857ec15 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2741,6 +2741,28 @@ union bpf_attr {
>    *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
>    *
>    *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
> + *
> + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)

size_of_pidns => size.

> + *	Description
> + *		Copies into *pidns* pid, namespace id and tgid as seen by the
Copies => Copy.
Maybe something like below:
Get tgid, pid and namespace id as seen by the current namespace, and 
device major/minor numbers from device /proc/self/ns/pid. Such
information is stored in *pidns* of size *size*.

> + *		current namespace and also device from /proc/self/ns/pid.
> + *		*size_of_pidns* must be the size of *pidns*
> + *
> + *		This helper is used when pid filtering is needed inside a
> + *		container as bpf_get_current_tgid() helper returns always the

returns always => always returns.

> + *		pid id as seen by the root namespace.
> + *	Return
> + *		0 on success
> + *
> + *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> + *		or tgid of the current task.
> + *
> + *		**-ECHILD** if /proc/self/ns/pid does not exists.
> + *
> + *		**-ENOTDIR** if /proc/self/ns does not exists.

Let us remove ECHILD and ENOTDIR and replace it with ENOENT as I
described below.

Please *do verify* what happens when namespaces or pid_ns are not
configured.

> + *
> + *		**-ENOMEM**  if allocation fails.

helper internal allocation fails.

> + *
>    */
>   #define __BPF_FUNC_MAPPER(FN)		\
>   	FN(unspec),			\
> @@ -2853,7 +2875,8 @@ union bpf_attr {
>   	FN(sk_storage_get),		\
>   	FN(sk_storage_delete),		\
>   	FN(send_signal),		\
> -	FN(tcp_gen_syncookie),
> +	FN(tcp_gen_syncookie),		\
> +	FN(get_current_pidns_info),
>   
>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>    * function eBPF program intends to call
> @@ -3604,4 +3627,10 @@ struct bpf_sockopt {
>   	__s32	retval;
>   };
>   
> +struct bpf_pidns_info {
> +	__u32 dev;

Please add a comment for dev for how device major and minor number are 
derived. User space gets device major and minor number, they need to
compare to the corresponding major/minor numbers returned by this helper.

> +	__u32 nsid;
> +	__u32 tgid;
> +	__u32 pid;
> +};
>   #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 8191a7db2777..3159f2a0188c 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
>   const struct bpf_func_proto bpf_get_current_comm_proto __weak;
>   const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
>   const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
>   
>   const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
>   {
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 5e28718928ca..41fbf1f28a48 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -11,6 +11,12 @@
>   #include <linux/uidgid.h>
>   #include <linux/filter.h>
>   #include <linux/ctype.h>
> +#include <linux/pid_namespace.h>
> +#include <linux/major.h>
> +#include <linux/stat.h>
> +#include <linux/namei.h>
> +#include <linux/version.h>
> +
>   
>   #include "../../lib/kstrtox.h"
>   
> @@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
>   	preempt_enable();
>   }
>   
> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> +	 size)
> +{
> +	const char *pidns_path = "/proc/self/ns/pid";
> +	struct pid_namespace *pidns = NULL;
> +	struct filename *tmp = NULL;

tmp => fname

> +	struct inode *inode;
> +	struct path kp;
> +	pid_t tgid = 0;
> +	pid_t pid = 0;
> +	int ret;
> +	int len;
> +
> +	if (unlikely(size != sizeof(struct bpf_pidns_info)))
> +		return -EINVAL;

Please put an empty line. As a general rule for readability,
put an empty line if control flow is interrupted, e.g., by
"return", "break" or "continue". At least this is what
I saw most in bpf mailing list.

> +	pidns = task_active_pid_ns(current);
> +	if (unlikely(!pidns))
> +		goto clear;

An empty line. Also, there is nothing to clear.
I prefer an error code -ENOENT.

You can set
	ret = -EINVAL;
here

> +	pidns_info->nsid =  pidns->ns.inum;
> +	pid = task_pid_nr_ns(current, pidns);
> +	if (unlikely(!pid))
> +		goto clear;

An empty line.

> +	tgid = task_tgid_nr_ns(current, pidns);
> +	if (unlikely(!tgid))
> +		goto clear;

An empty line.

> +	pidns_info->tgid = (u32) tgid;
> +	pidns_info->pid = (u32) pid;

Different functionality, an empty line.

> +	tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> +	if (unlikely(!tmp)) {
> +		memset((void *)pidns_info, 0, (size_t) size);
> +		return -ENOMEM;

ret = -ENOMEM;
goto clear;

> +	}

An empty line.

> +	len = strlen(pidns_path) + 1;
> +	memcpy((char *)tmp->name, pidns_path, len);
> +	tmp->uptr = NULL;
> +	tmp->aname = NULL;
> +	tmp->refcnt = 1;
> +	ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
Adding below to free kmem cache memory
	kmem_cache_free(names_cachep, fname);

In the above, we checked task_active_pid_ns().
If not returning NULL, we have a valid pid ns. So the above
filename_lookup should not go wrong. We can still keep
the error checking though.

> +	if (ret) {
> +		memset((void *)pidns_info, 0, (size_t) size);
> +		return ret;

goto clear;

> +	}

An empty line.

> +	inode = d_backing_inode(kp.dentry);
> +	pidns_info->dev = inode->i_sb->s_dev;
> +	return 0;

An empty line.

> +clear > +	memset((void *)pidns_info, 0, (size_t) size);
> +	return -EINVAL;
> +}
> +
> +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
> +	.func		= bpf_get_current_pidns_info,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
> +	.arg2_type	= ARG_CONST_SIZE,
> +};
> +
>   #ifdef CONFIG_CGROUPS
>   BPF_CALL_0(bpf_get_current_cgroup_id)
>   {
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index ca1255d14576..5e1dc22765a5 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>   #endif
>   	case BPF_FUNC_send_signal:
>   		return &bpf_send_signal_proto;
> +	case BPF_FUNC_get_current_pidns_info:
> +		return &bpf_get_current_pidns_info_proto;
>   	default:
>   		return NULL;
>   	}
> 

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Daniel Colascione @ 2019-08-13 22:26 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Song Liu, Kees Cook, Networking, bpf,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team, Lorenz Bauer,
	Jann Horn, Greg KH, Linux API, LSM List
In-Reply-To: <20190813215823.3sfbakzzjjykyng2@ast-mbp>

On Tue, Aug 13, 2019 at 2:58 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Aug 06, 2019 at 10:24:25PM -0700, Andy Lutomirski wrote:
> > >
> > > Inside containers and inside nested containers we need to start processes
> > > that will use bpf. All of the processes are trusted.
> >
> > Trusted by whom?  In a non-nested container, the container manager
> > *might* be trusted by the outside world.  In a *nested* container,
> > unless the inner container management is controlled from outside the
> > outer container, it's not trusted.  I don't know much about how
> > Facebook's containers work, but the LXC/LXD/Podman world is moving
> > very strongly toward user namespaces and maximally-untrusted
> > containers, and I think bpf() should work in that context.
>
> agree that containers (namespaces) reduce amount of trust necessary
> for apps to run, but the end goal is not security though.
> Linux has become a single user system.
> If user can ssh into the host they can become root.
> If arbitrary code can run on the host it will be break out of any sandbox.
> Containers are not providing the level of security that is enough
> to run arbitrary code. VMs can do it better, but cpu bugs don't make it easy.
> Containers are used to make production systems safer.
> Some people call it more 'secure', but it's clearly not secure for
> arbitrary code and that is what kernel.unprivileged_bpf_disabled allows.
> When we say 'unprivileged bpf' we really mean arbitrary malicious bpf program.
> It's been a constant source of pain. The constant blinding, randomization,
> verifier speculative analysis, all spectre v1, v2, v4 mitigations
> are simply not worth it. It's a lot of complex kernel code without users.
> There is not a single use case to allow arbitrary malicious bpf
> program to be loaded and executed.
> As soon as we have /dev/bpf to allow all of bpf to be used without root
> we will set sysctl kernel.unprivileged_bpf_disabled=1
> Hence I prefer this /dev/bpf mechanism to be as simple a possible.
> The applications that will use it are going to be just as trusted as systemd.
>
> > > To solve your concern of bypassing all capable checks...
> > > How about we do /dev/bpf/full_verifier first?
> > > It will replace capable() checks in the verifier only.
> >
> > I'm not convinced that "in the verifier" is the right distinction.
> > Telling administrators that some setting lets certain users bypass
> > bpf() verifier checks doesn't have a clear enough meaning.
>
> linux is a single user system. there are no administrators any more.
> No doubt, folks will disagree, but that game is over.
> At least on bpf side it's done.
>
> > I propose,
> > instead, that the current capable() checks be divided into three
> > categories:
>
> I don't see a use case for these categories.
> All bpf programs extend the kernel in some way.
> The kernel vs user is one category.
> Conceptually CAP_BPF is enough. It would be similar to CAP_NET_ADMIN.
> When application has CAP_NET_ADMIN it covers all of networking knobs.
> There is no use case that would warrant fine grain CAP_ROUTE_ADMIN,
> CAP_ETHTOOL_ADMIN, CAP_ETH0_ADMIN, etc.
> Similarly CAP_BPF as the only knob is enough.
> The only disadvantage of CAP_BPF is that it's not possible to
> pass it from one systemd-like daemon to another systemd-like daemon.
> Hence /dev/bpf idea and passing file descriptor.
>
> > This type of thing actually fits quite nicely into an idea I've been
> > thinking about for a while called "implicit rights". In very brief
> > summary, there would be objects called /dev/rights/xyz, where xyz is
> > the same of a "right".  If there is a readable object of the right
> > type at the literal path "/dev/rights/xyz", then you have right xyz.
> > There's a bit more flexibility on top of this.  BPF could use
> > /dev/rights/bpf/maptypes/lpm and
> > /dev/rights/bpf/verifier/bounded_loops, for example.  Other non-BPF
> > use cases include a biggie:
> > /dev/rights/namespace/create_unprivileged_userns.
> > /dev/rights/bind_port/80 would be nice, too.
>
> The concept of "implicit rights" is very nice and I'm sure it will
> be a good fit somewhere, but I don't see why use it in bpf space.
> There is no use case for fine grain partition of bpf features.

Isn't this "implicit rights" model just another kind of ambient
authority --- one that constrains the otherwise-free filesystem
namespace to boot? IMHO, the kernel should be moving toward explicit
authorization tokens modeled by file descriptors and away from
contextual authorization decisions.

^ permalink raw reply

* Re: [PATCH net-next 2/5] RDS: limit the number of times we loop in rds_send_xmit
From: santosh.shilimkar @ 2019-08-13 22:08 UTC (permalink / raw)
  To: Gerd Rausch, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <4d04cec6-ef2d-392b-233b-0abf7a57fe44@oracle.com>

On 8/13/19 11:20 AM, Gerd Rausch wrote:
> From: Chris Mason <chris.mason@oracle.com>
> Date: Fri, 3 Feb 2012 11:07:54 -0500
> 
> This will kick the RDS worker thread if we have been looping
> too long.
> 
> Original commit from 2012 updated to include a change by
> Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
> that triggers "must_wake" if "rds_ib_recv_refill_one" fails.
> 
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> ---
Acked-by: Santosh Shilimkar<santosh.shilimkar@oracle.com>


^ permalink raw reply

* Re: [PATCH net-next 5/5] rds: check for excessive looping in rds_send_xmit
From: santosh.shilimkar @ 2019-08-13 22:09 UTC (permalink / raw)
  To: Gerd Rausch, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <333232d5-311d-ba38-c906-540ef792ab77@oracle.com>

On 8/13/19 11:21 AM, Gerd Rausch wrote:
> From: Andy Grover <andy.grover@oracle.com>
> Date: Thu, 13 Jan 2011 11:40:31 -0800
> 
> Original commit from 2011 updated to include a change by
> Yuval Shaia <yuval.shaia@oracle.com>
> that adds a new statistic counter "send_stuck_rm"
> to capture the messages looping exessively
> in the send path.
>
You need Andy's SOB as well.

> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Other than that,
Acked-by: Santosh Shilimkar<santosh.shilimkar@oracle.com>


^ permalink raw reply

* Re: [PATCH net-next 3/5] RDS: don't use GFP_ATOMIC for sk_alloc in rds_create
From: santosh.shilimkar @ 2019-08-13 22:08 UTC (permalink / raw)
  To: Gerd Rausch, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <14aa4df7-3b7d-157d-1e9a-9c49ff5feb3b@oracle.com>

On 8/13/19 11:21 AM, Gerd Rausch wrote:
> From: Chris Mason <chris.mason@oracle.com>
> Date: Fri, 3 Feb 2012 11:08:51 -0500
> 
> Signed-off-by: Chris Mason <chris.mason@oracle.com>
> Signed-off-by: Bang Nguyen <bang.nguyen@oracle.com>
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
> ---Acked-by: Santosh Shilimkar<santosh.shilimkar@oracle.com>


^ permalink raw reply

* Re: [PATCH net-next 4/5] net/rds: Add a few missing rds_stat_names entries
From: santosh.shilimkar @ 2019-08-13 22:09 UTC (permalink / raw)
  To: Gerd Rausch, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <9a93fcf5-720c-a771-0329-312a1499eda1@oracle.com>

On 8/13/19 11:21 AM, Gerd Rausch wrote:
> Date: Thu, 11 Jul 2019 12:15:50 -0700
> 
> In a previous commit, fields were added to "struct rds_statistics"
> but array "rds_stat_names" was not updated accordingly.
> 
> Please note the inconsistent naming of the string representations
> that is done in the name of compatibility
> with the Oracle internal code-base.
> 
> s_recv_bytes_added_to_socket     -> "recv_bytes_added_to_sock"
> s_recv_bytes_removed_from_socket -> "recv_bytes_freed_fromsock"
> 
> Fixes: 192a798f5299 ("RDS: add stat for socket recv memory usage")
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> ---
Acked-by: Santosh Shilimkar<santosh.shilimkar@oracle.com>


^ permalink raw reply

* Re: [patch net-next v2 1/2] netdevsim: implement support for devlink region and snapshots
From: Jakub Kicinski @ 2019-08-13 22:08 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, mlxsw
In-Reply-To: <20190813144843.28466-2-jiri@resnulli.us>

On Tue, 13 Aug 2019 16:48:42 +0200, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> Implement dummy region of size 32K and allow user to create snapshots
> or random data using debugfs file trigger.
> 
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

Hmm.. did you send the right version?

> diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> index 127aef85dc99..8485dd805f7c 100644
> --- a/drivers/net/netdevsim/dev.c
> +++ b/drivers/net/netdevsim/dev.c
> @@ -27,6 +27,41 @@
>  
>  static struct dentry *nsim_dev_ddir;
>  
> +#define NSIM_DEV_DUMMY_REGION_SIZE (1024 * 32)
> +
> +static ssize_t nsim_dev_take_snapshot_write(struct file *file,
> +					    const char __user *data,
> +					    size_t count, loff_t *ppos)
> +{
> +	struct nsim_dev *nsim_dev = file->private_data;
> +	void *dummy_data;
> +	u32 id;
> +	int err;
> +
> +	dummy_data = kmalloc(NSIM_DEV_DUMMY_REGION_SIZE, GFP_KERNEL);
> +	if (!dummy_data) {
> +		pr_err("Failed to allocate memory for region snapshot\n");

not needed, without __GFP_NOWARN there will be a huge OOM splat, anyway.

> +		goto out;

		return -ENOMEM;

> +	}
> +
> +	get_random_bytes(dummy_data, NSIM_DEV_DUMMY_REGION_SIZE);
> +
> +	id = devlink_region_shapshot_id_get(priv_to_devlink(nsim_dev));
> +	err = devlink_region_snapshot_create(nsim_dev->dummy_region,
> +					     dummy_data, id, kfree);
> +	if (err)
> +		pr_err("Failed to create region snapshot\n");

		return err;
	}

> +
> +out:
> +	return count;
> +}
> +
> +static const struct file_operations nsim_dev_take_snapshot_fops = {
> +	.open = simple_open,
> +	.write = nsim_dev_take_snapshot_write,
> +	.llseek = generic_file_llseek,
> +};
> +
>  static int nsim_dev_debugfs_init(struct nsim_dev *nsim_dev)
>  {
>  	char dev_ddir_name[16];
> @@ -44,6 +79,8 @@ static int nsim_dev_debugfs_init(struct nsim_dev *nsim_dev)
>  			   &nsim_dev->max_macs);
>  	debugfs_create_bool("test1", 0600, nsim_dev->ddir,
>  			    &nsim_dev->test1);
> +	debugfs_create_file("take_snapshot", 0200, nsim_dev->ddir, nsim_dev,
> +			    &nsim_dev_take_snapshot_fops);
>  	return 0;
>  }
>  
> @@ -248,6 +285,26 @@ static void nsim_devlink_param_load_driverinit_values(struct devlink *devlink)
>  		nsim_dev->test1 = saved_value.vbool;
>  }
>  
> +#define NSIM_DEV_DUMMY_REGION_SNAPSHOT_MAX 16
> +
> +static int nsim_dev_dummy_region_init(struct nsim_dev *nsim_dev,
> +				      struct devlink *devlink)
> +{
> +	nsim_dev->dummy_region =
> +		devlink_region_create(devlink, "dummy",
> +				      NSIM_DEV_DUMMY_REGION_SNAPSHOT_MAX,
> +				      NSIM_DEV_DUMMY_REGION_SIZE);
> +	if (IS_ERR(nsim_dev->dummy_region))
> +		return PTR_ERR(nsim_dev->dummy_region);
> +
> +	return 0;

	return PTR_ERR_OR_ZERO(nsim_dev->dummy_region);

> +}

^ permalink raw reply

* Re: [PATCH net-next 1/5] RDS: Re-add pf/sol access via sysctl
From: santosh.shilimkar @ 2019-08-13 22:07 UTC (permalink / raw)
  To: Gerd Rausch, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <e0397d30-7405-a7af-286c-fe76887caf0a@oracle.com>

On 8/13/19 11:20 AM, Gerd Rausch wrote:
> From: Andy Grover <andy.grover@oracle.com>
> Date: Tue, 24 Nov 2009 15:35:51 -0800
> 
> Although RDS has an official PF_RDS value now, existing software
> expects to look for rds sysctls to determine it. We need to maintain
> these for now, for backwards compatibility.
> 
> Signed-off-by: Andy Grover <andy.grover@oracle.com>
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> ---
Acked-by: Santosh Shilimkar<santosh.shilimkar@oracle.com>

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Alexei Starovoitov @ 2019-08-13 21:58 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Song Liu, Kees Cook, Networking, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Lorenz Bauer, Jann Horn, Greg KH,
	Linux API, LSM List
In-Reply-To: <CALCETrXEHL3+NAY6P6vUj7Pvd9ZpZsYC6VCLXOaNxb90a_POGw@mail.gmail.com>

On Tue, Aug 06, 2019 at 10:24:25PM -0700, Andy Lutomirski wrote:
> >
> > Inside containers and inside nested containers we need to start processes
> > that will use bpf. All of the processes are trusted.
> 
> Trusted by whom?  In a non-nested container, the container manager
> *might* be trusted by the outside world.  In a *nested* container,
> unless the inner container management is controlled from outside the
> outer container, it's not trusted.  I don't know much about how
> Facebook's containers work, but the LXC/LXD/Podman world is moving
> very strongly toward user namespaces and maximally-untrusted
> containers, and I think bpf() should work in that context.

agree that containers (namespaces) reduce amount of trust necessary
for apps to run, but the end goal is not security though.
Linux has become a single user system.
If user can ssh into the host they can become root.
If arbitrary code can run on the host it will be break out of any sandbox.
Containers are not providing the level of security that is enough
to run arbitrary code. VMs can do it better, but cpu bugs don't make it easy.
Containers are used to make production systems safer.
Some people call it more 'secure', but it's clearly not secure for
arbitrary code and that is what kernel.unprivileged_bpf_disabled allows.
When we say 'unprivileged bpf' we really mean arbitrary malicious bpf program.
It's been a constant source of pain. The constant blinding, randomization,
verifier speculative analysis, all spectre v1, v2, v4 mitigations
are simply not worth it. It's a lot of complex kernel code without users.
There is not a single use case to allow arbitrary malicious bpf
program to be loaded and executed.
As soon as we have /dev/bpf to allow all of bpf to be used without root
we will set sysctl kernel.unprivileged_bpf_disabled=1
Hence I prefer this /dev/bpf mechanism to be as simple a possible.
The applications that will use it are going to be just as trusted as systemd.

> > To solve your concern of bypassing all capable checks...
> > How about we do /dev/bpf/full_verifier first?
> > It will replace capable() checks in the verifier only.
> 
> I'm not convinced that "in the verifier" is the right distinction.
> Telling administrators that some setting lets certain users bypass
> bpf() verifier checks doesn't have a clear enough meaning.  

linux is a single user system. there are no administrators any more.
No doubt, folks will disagree, but that game is over.
At least on bpf side it's done.

> I propose,
> instead, that the current capable() checks be divided into three
> categories:

I don't see a use case for these categories.
All bpf programs extend the kernel in some way.
The kernel vs user is one category.
Conceptually CAP_BPF is enough. It would be similar to CAP_NET_ADMIN.
When application has CAP_NET_ADMIN it covers all of networking knobs.
There is no use case that would warrant fine grain CAP_ROUTE_ADMIN,
CAP_ETHTOOL_ADMIN, CAP_ETH0_ADMIN, etc.
Similarly CAP_BPF as the only knob is enough.
The only disadvantage of CAP_BPF is that it's not possible to
pass it from one systemd-like daemon to another systemd-like daemon.
Hence /dev/bpf idea and passing file descriptor.

> This type of thing actually fits quite nicely into an idea I've been
> thinking about for a while called "implicit rights". In very brief
> summary, there would be objects called /dev/rights/xyz, where xyz is
> the same of a "right".  If there is a readable object of the right
> type at the literal path "/dev/rights/xyz", then you have right xyz.
> There's a bit more flexibility on top of this.  BPF could use
> /dev/rights/bpf/maptypes/lpm and
> /dev/rights/bpf/verifier/bounded_loops, for example.  Other non-BPF
> use cases include a biggie:
> /dev/rights/namespace/create_unprivileged_userns.
> /dev/rights/bind_port/80 would be nice, too.

The concept of "implicit rights" is very nice and I'm sure it will
be a good fit somewhere, but I don't see why use it in bpf space.
There is no use case for fine grain partition of bpf features.

^ permalink raw reply

* Re: [PATCH] MAINTAINERS: r8169: Update path to the driver
From: Heiner Kallweit @ 2019-08-13 21:52 UTC (permalink / raw)
  To: Denis Efremov, linux-kernel; +Cc: joe, nic_swsd, David S . Miller, netdev
In-Reply-To: <20190813060759.14256-1-efremov@linux.com>

On 13.08.2019 08:07, Denis Efremov wrote:
> Update MAINTAINERS record to reflect the filename change
> from r8169.c to r8169_main.c
> 
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
> Cc: nic_swsd@realtek.com
> Cc: David S. Miller <davem@davemloft.net>
> Cc: netdev@vger.kernel.org
> Fixes: 25e992a4603c ("r8169: rename r8169.c to r8169_main.c")
> Signed-off-by: Denis Efremov <efremov@linux.com>
> ---
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 99a7392ad6bc..25eb86f3261e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -183,7 +183,7 @@ M:	Realtek linux nic maintainers <nic_swsd@realtek.com>
>  M:	Heiner Kallweit <hkallweit1@gmail.com>
>  L:	netdev@vger.kernel.org
>  S:	Maintained
> -F:	drivers/net/ethernet/realtek/r8169.c
> +F:	drivers/net/ethernet/realtek/r8169_main.c
>  
That's better than before, but wouldn't cover e.g. changes
to r8169_firmware.c. Better may be:

F:	drivers/net/ethernet/realtek/r8169*

>  8250/16?50 (AND CLONE UARTS) SERIAL DRIVER
>  M:	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 

Heiner


^ permalink raw reply

* Re: [v5,0/4] tools: bpftool: add net attach/detach command to attach XDP prog
From: Jakub Kicinski @ 2019-08-13 21:43 UTC (permalink / raw)
  To: Daniel T. Lee; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <20190813024621.29886-1-danieltimlee@gmail.com>

On Tue, 13 Aug 2019 11:46:17 +0900, Daniel T. Lee wrote:
> Currently, bpftool net only supports dumping progs attached on the
> interface. To attach XDP prog on interface, user must use other tool
> (eg. iproute2). By this patch, with `bpftool net attach/detach`, user
> can attach/detach XDP prog on interface.
> 
>     # bpftool prog
>         16: xdp  name xdp_prog1  tag 539ec6ce11b52f98  gpl
>         loaded_at 2019-08-07T08:30:17+0900  uid 0
>         ...
>         20: xdp  name xdp_fwd_prog  tag b9cb69f121e4a274  gpl
>         loaded_at 2019-08-07T08:30:17+0900  uid 0
> 
>     # bpftool net attach xdpdrv id 16 dev enp6s0np0
>     # bpftool net
>     xdp:
>         enp6s0np0(4) driver id 16
> 
>     # bpftool net attach xdpdrv id 20 dev enp6s0np0 overwrite
>     # bpftool net
>     xdp:
>         enp6s0np0(4) driver id 20
> 
>     # bpftool net detach xdpdrv dev enp6s0np0
>     # bpftool net
>     xdp:
> 
> 
> While this patch only contains support for XDP, through `net
> attach/detach`, bpftool can further support other prog attach types.
> 
> XDP attach/detach tested on Mellanox ConnectX-4 and Netronome Agilio.
> 
> ---
> Changes in v5:
>   - fix wrong error message, from errno to err with do_attach/detach

The inconsistency in libbpf's error reporting is generally troubling,
but a problem of this set, so:

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

In the future please keep review tags if you have only made minor
changes to the code.

^ permalink raw reply

* Re: [PATCH bpf-next v2 2/4] bpf: support cloning sk storage on accept()
From: Stanislav Fomichev @ 2019-08-13 21:28 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Stanislav Fomichev, netdev, bpf, davem, ast, Martin KaFai Lau,
	Yonghong Song
In-Reply-To: <2d24378a-73f4-bfa0-dc99-4a0ed761c797@iogearbox.net>

On 08/13, Daniel Borkmann wrote:
> On 8/12/19 7:52 PM, Stanislav Fomichev wrote:
> > On 08/12, Daniel Borkmann wrote:
> > > On 8/9/19 6:10 PM, Stanislav Fomichev wrote:
> > > > Add new helper bpf_sk_storage_clone which optionally clones sk storage
> > > > and call it from sk_clone_lock.
> > > > 
> > > > Cc: Martin KaFai Lau <kafai@fb.com>
> > > > Cc: Yonghong Song <yhs@fb.com>
> > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > [...]
> > > > +int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
> > > > +{
> > > > +	struct bpf_sk_storage *new_sk_storage = NULL;
> > > > +	struct bpf_sk_storage *sk_storage;
> > > > +	struct bpf_sk_storage_elem *selem;
> > > > +	int ret;
> > > > +
> > > > +	RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
> > > > +
> > > > +	rcu_read_lock();
> > > > +	sk_storage = rcu_dereference(sk->sk_bpf_storage);
> > > > +
> > > > +	if (!sk_storage || hlist_empty(&sk_storage->list))
> > > > +		goto out;
> > > > +
> > > > +	hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
> > > > +		struct bpf_sk_storage_elem *copy_selem;
> > > > +		struct bpf_sk_storage_map *smap;
> > > > +		struct bpf_map *map;
> > > > +		int refold;
> > > > +
> > > > +		smap = rcu_dereference(SDATA(selem)->smap);
> > > > +		if (!(smap->map.map_flags & BPF_F_CLONE))
> > > > +			continue;
> > > > +
> > > > +		map = bpf_map_inc_not_zero(&smap->map, false);
> > > > +		if (IS_ERR(map))
> > > > +			continue;
> > > > +
> > > > +		copy_selem = bpf_sk_storage_clone_elem(newsk, smap, selem);
> > > > +		if (!copy_selem) {
> > > > +			ret = -ENOMEM;
> > > > +			bpf_map_put(map);
> > > > +			goto err;
> > > > +		}
> > > > +
> > > > +		if (new_sk_storage) {
> > > > +			selem_link_map(smap, copy_selem);
> > > > +			__selem_link_sk(new_sk_storage, copy_selem);
> > > > +		} else {
> > > > +			ret = sk_storage_alloc(newsk, smap, copy_selem);
> > > > +			if (ret) {
> > > > +				kfree(copy_selem);
> > > > +				atomic_sub(smap->elem_size,
> > > > +					   &newsk->sk_omem_alloc);
> > > > +				bpf_map_put(map);
> > > > +				goto err;
> > > > +			}
> > > > +
> > > > +			new_sk_storage = rcu_dereference(copy_selem->sk_storage);
> > > > +		}
> > > > +		bpf_map_put(map);
> > > 
> > > The map get/put combination /under/ RCU read lock seems a bit odd to me, could
> > > you exactly describe the race that this would be preventing?
> > There is a race between sk storage release and sk storage clone.
> > bpf_sk_storage_map_free uses synchronize_rcu to wait for all existing
> > users to finish and the new ones are prevented via map's refcnt being
> > zero; we need to do something like that for the clone.
> > Martin suggested to use bpf_map_inc_not_zero/bpf_map_put.
> > If I read everythin correctly, I think without map_inc/map_put we
> > get the following race:
> > 
> > CPU0                                   CPU1
> > 
> > bpf_map_put
> >    bpf_sk_storage_map_free(smap)
> >      synchronize_rcu
> > 
> >      // no more users via bpf or
> >      // syscall, but clone
> >      // can still happen
> > 
> >      for each (bucket)
> >        selem_unlink
> >          selem_unlink_map(smap)
> > 
> >          // adding anything at
> >          // this point to the
> >          // bucket will leak
> > 
> >                                         rcu_read_lock
> >                                         tcp_v4_rcv
> >                                           tcp_v4_do_rcv
> >                                             // sk is lockless TCP_LISTEN
> >                                             tcp_v4_cookie_check
> >                                               tcp_v4_syn_recv_sock
> >                                                 bpf_sk_storage_clone
> >                                                   rcu_dereference(sk->sk_bpf_storage)
> >                                                   selem_link_map(smap, copy)
> >                                                   // adding new element to the
> >                                                   // map -> leak
> >                                         rcu_read_unlock
> > 
> >        selem_unlink_sk
> >         sk->sk_bpf_storage = NULL
> > 
> >      synchronize_rcu
> > 
> 
> Makes sense, thanks for clarifying. Perhaps a small comment on top of
> the bpf_map_inc_not_zero() would be great as well, so it's immediately
> clear also from this location when reading the code why this is done.
Sure, no problem, will have something similar to what I have before
synchronize_rcu in bpf_sk_storage_map_free.

> Thanks,
> Daniel

^ permalink raw reply

* [PATCH RFC 4/4] net: phy: fixed_phy: let genphy driver set supported and advertised modes
From: Heiner Kallweit @ 2019-08-13 21:26 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, Marek Behun, David Miller
  Cc: netdev@vger.kernel.org
In-Reply-To: <ac3471d5-deb7-b711-6e74-23f59914758a@gmail.com>

A fixed phy as special swphy binds to the genphy driver that calls
genphy_read_abilities(). This function populates the supported and
advertised modes, so we don't have to do it manually.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/fixed_phy.c | 23 -----------------------
 1 file changed, 23 deletions(-)

diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c
index 7c5265fd2..db4d96f2f 100644
--- a/drivers/net/phy/fixed_phy.c
+++ b/drivers/net/phy/fixed_phy.c
@@ -282,29 +282,6 @@ static struct phy_device *__fixed_phy_register(unsigned int irq,
 	phy->mdio.dev.of_node = np;
 	phy->is_pseudo_fixed_link = true;
 
-	switch (status->speed) {
-	case SPEED_1000:
-		linkmode_set_bit(ETHTOOL_LINK_MODE_1000baseT_Half_BIT,
-				 phy->supported);
-		linkmode_set_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT,
-				 phy->supported);
-		/* fall through */
-	case SPEED_100:
-		linkmode_set_bit(ETHTOOL_LINK_MODE_100baseT_Half_BIT,
-				 phy->supported);
-		linkmode_set_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT,
-				 phy->supported);
-		/* fall through */
-	case SPEED_10:
-	default:
-		linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT,
-				 phy->supported);
-		linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Full_BIT,
-				 phy->supported);
-	}
-
-	phy_advertise_supported(phy);
-
 	ret = phy_device_register(phy);
 	if (ret) {
 		phy_device_free(phy);
-- 
2.22.0



^ permalink raw reply related

* [PATCH RFC 3/4] net: phy: swphy: bind swphy to genphy driver at probe time
From: Heiner Kallweit @ 2019-08-13 21:26 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, Marek Behun, David Miller
  Cc: netdev@vger.kernel.org
In-Reply-To: <ac3471d5-deb7-b711-6e74-23f59914758a@gmail.com>

Let a swphy bind to the genphy driver at probe time. This provides
automatic feature detection even if the swphy never gets attached to a
net_device. So far the genphy driver binds to a PHY as fallback only
once the PHY is attached to a net_device.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/swphy.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/swphy.c b/drivers/net/phy/swphy.c
index 53c214a22..7ac5054fa 100644
--- a/drivers/net/phy/swphy.c
+++ b/drivers/net/phy/swphy.c
@@ -151,8 +151,9 @@ int swphy_read_reg(int reg, const struct fixed_phy_status *state)
 	case MII_BMSR:
 		return bmsr;
 	case MII_PHYSID1:
+		return GENPHY_ID_HIGH;
 	case MII_PHYSID2:
-		return 0;
+		return GENPHY_ID_LOW;
 	case MII_LPA:
 		return lpa;
 	case MII_STAT1000:
-- 
2.22.0



^ permalink raw reply related

* [PATCH RFC 2/4] net: phy: allow to bind genphy driver at probe time
From: Heiner Kallweit @ 2019-08-13 21:25 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, Marek Behun, David Miller
  Cc: netdev@vger.kernel.org
In-Reply-To: <ac3471d5-deb7-b711-6e74-23f59914758a@gmail.com>

In cases like a fixed phy that is never attached to a net_device we
may want to bind the genphy driver at probe time. Setting a PHY ID of
0xffffffff to bind the genphy driver would fail due to a check in
get_phy_device(). Therefore let's change the PHY ID the genphy driver
binds to to 0xfffffffe. This still shouldn't match any real PHY,
and it will pass the check in get_phy_devcie().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy_device.c | 3 +--
 include/linux/phy.h          | 4 ++++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 163295dbc..54f80af31 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2388,8 +2388,7 @@ void phy_drivers_unregister(struct phy_driver *drv, int n)
 EXPORT_SYMBOL(phy_drivers_unregister);
 
 static struct phy_driver genphy_driver = {
-	.phy_id		= 0xffffffff,
-	.phy_id_mask	= 0xffffffff,
+	PHY_ID_MATCH_EXACT(GENPHY_ID),
 	.name		= "Generic PHY",
 	.soft_reset	= genphy_no_soft_reset,
 	.get_features	= genphy_read_abilities,
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 5ac7d2137..3b07bce78 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -37,6 +37,10 @@
 #define PHY_1000BT_FEATURES	(SUPPORTED_1000baseT_Half | \
 				 SUPPORTED_1000baseT_Full)
 
+#define GENPHY_ID_HIGH		0xffffU
+#define GENPHY_ID_LOW		0xfffeU
+#define GENPHY_ID		((GENPHY_ID_HIGH << 16) | GENPHY_ID_LOW)
+
 extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_basic_features) __ro_after_init;
 extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_basic_t1_features) __ro_after_init;
 extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_gbit_features) __ro_after_init;
-- 
2.22.0



^ permalink raw reply related

* [PATCH RFC 1/4] net: phy: swphy: emulate register MII_ESTATUS
From: Heiner Kallweit @ 2019-08-13 21:24 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, Marek Behun, David Miller
  Cc: netdev@vger.kernel.org
In-Reply-To: <ac3471d5-deb7-b711-6e74-23f59914758a@gmail.com>

When the genphy driver binds to a swphy it will call
genphy_read_abilites that will try to read MII_ESTATUS if BMSR_ESTATEN
is set in MII_BMSR. So far this would read the default value 0xffff
and 1000FD and 1000HD are reported as supported just by chance.
Better add explicit support for emulating MII_ESTATUS.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/swphy.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/phy/swphy.c b/drivers/net/phy/swphy.c
index dad22481d..53c214a22 100644
--- a/drivers/net/phy/swphy.c
+++ b/drivers/net/phy/swphy.c
@@ -22,6 +22,7 @@ struct swmii_regs {
 	u16 bmsr;
 	u16 lpa;
 	u16 lpagb;
+	u16 estat;
 };
 
 enum {
@@ -48,6 +49,7 @@ static const struct swmii_regs speed[] = {
 	[SWMII_SPEED_1000] = {
 		.bmsr  = BMSR_ESTATEN,
 		.lpagb = LPA_1000FULL | LPA_1000HALF,
+		.estat = ESTATUS_1000_TFULL | ESTATUS_1000_THALF,
 	},
 };
 
@@ -56,11 +58,13 @@ static const struct swmii_regs duplex[] = {
 		.bmsr  = BMSR_ESTATEN | BMSR_100HALF,
 		.lpa   = LPA_10HALF | LPA_100HALF,
 		.lpagb = LPA_1000HALF,
+		.estat = ESTATUS_1000_THALF,
 	},
 	[SWMII_DUPLEX_FULL] = {
 		.bmsr  = BMSR_ESTATEN | BMSR_100FULL,
 		.lpa   = LPA_10FULL | LPA_100FULL,
 		.lpagb = LPA_1000FULL,
+		.estat = ESTATUS_1000_TFULL,
 	},
 };
 
@@ -112,6 +116,7 @@ int swphy_read_reg(int reg, const struct fixed_phy_status *state)
 {
 	int speed_index, duplex_index;
 	u16 bmsr = BMSR_ANEGCAPABLE;
+	u16 estat = 0;
 	u16 lpagb = 0;
 	u16 lpa = 0;
 
@@ -125,6 +130,7 @@ int swphy_read_reg(int reg, const struct fixed_phy_status *state)
 	duplex_index = state->duplex ? SWMII_DUPLEX_FULL : SWMII_DUPLEX_HALF;
 
 	bmsr |= speed[speed_index].bmsr & duplex[duplex_index].bmsr;
+	estat |= speed[speed_index].estat & duplex[duplex_index].estat;
 
 	if (state->link) {
 		bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE;
@@ -151,6 +157,8 @@ int swphy_read_reg(int reg, const struct fixed_phy_status *state)
 		return lpa;
 	case MII_STAT1000:
 		return lpagb;
+	case MII_ESTATUS:
+		return estat;
 
 	/*
 	 * We do not support emulating Clause 45 over Clause 22 register
-- 
2.22.0



^ permalink raw reply related

* [PATCH RFC 0/4] net: phy: improve fixed_phy / swphy handling
From: Heiner Kallweit @ 2019-08-13 21:23 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, Marek Behun, David Miller
  Cc: netdev@vger.kernel.org

Based on discussion [0] I prepared a patch set for improving few
aspects of swphy and fixed_phy handling. So far it's compile-tested
only. I'd appreciate testing on different devices.

[0] https://marc.info/?t=156553610700001&r=1&w=2

Heiner Kallweit (4):
  net: phy: swphy: emulate register MII_ESTATUS
  net: phy: allow to bind genphy driver at probe time
  net: phy: swphy: bind swphy to genphy driver at probe time
  net: phy: fixed_phy: let genphy driver set supported and advertised
    modes

 drivers/net/phy/fixed_phy.c  | 23 -----------------------
 drivers/net/phy/phy_device.c |  3 +--
 drivers/net/phy/swphy.c      | 11 ++++++++++-
 include/linux/phy.h          |  4 ++++
 4 files changed, 15 insertions(+), 26 deletions(-)

-- 
2.22.0


^ permalink raw reply

* Re: [PATCH bpf-next 0/2] libbpf: make use of BTF through sysfs
From: Daniel Borkmann @ 2019-08-13 21:22 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, acme; +Cc: andrii.nakryiko, kernel-team
In-Reply-To: <20190813185443.437829-1-andriin@fb.com>

On 8/13/19 8:54 PM, Andrii Nakryiko wrote:
> Now that kernel's BTF is exposed through sysfs at well-known location, attempt
> to load it first as a target BTF for the purpose of BPF CO-RE relocations.
> 
> Patch #1 is a follow-up patch to rename /sys/kernel/btf/kernel into
> /sys/kernel/btf/vmlinux.
> Patch #2 adds ability to load raw BTF contents from sysfs and expands the list
> of locations libbpf attempts to load vmlinux BTF from.
> 
> Andrii Nakryiko (2):
>    btf: rename /sys/kernel/btf/kernel into /sys/kernel/btf/vmlinux
>    libbpf: attempt to load kernel BTF from sysfs first
> 
>   Documentation/ABI/testing/sysfs-kernel-btf |  2 +-
>   kernel/bpf/sysfs_btf.c                     | 30 +++++-----
>   scripts/link-vmlinux.sh                    | 18 +++---
>   tools/lib/bpf/libbpf.c                     | 64 +++++++++++++++++++---
>   4 files changed, 82 insertions(+), 32 deletions(-)
> 

LGTM, applied thanks!

^ permalink raw reply

* Re: [PATCH net-next v2 07/12] net: stmmac: Add ethtool register dump for XGMAC cores
From: Jakub Kicinski @ 2019-08-13 21:19 UTC (permalink / raw)
  To: Jose Abreu
  Cc: netdev, Joao Pinto, Giuseppe Cavallaro, Alexandre Torgue,
	David S. Miller, Maxime Coquelin, linux-stm32, linux-arm-kernel,
	linux-kernel
In-Reply-To: <3d860a78ce4e98941f7e292d251d7360755fdf2e.1565602974.git.joabreu@synopsys.com>

On Mon, 12 Aug 2019 11:44:06 +0200, Jose Abreu wrote:
>  static void stmmac_ethtool_gregs(struct net_device *dev,
>  			  struct ethtool_regs *regs, void *space)
>  {
> -	u32 *reg_space = (u32 *) space;
> -
>  	struct stmmac_priv *priv = netdev_priv(dev);
> +	int size = stmmac_ethtool_get_regs_len(dev);
> +	u32 *reg_space = (u32 *) space;
>  
> -	memset(reg_space, 0x0, REG_SPACE_SIZE);
> +	memset(reg_space, 0x0, size);

no need to zero regs, ethtool core zallocs them

^ permalink raw reply

* Re: Error when loading BPF_CGROUP_INET_EGRESS program with bpftool
From: Andrii Nakryiko @ 2019-08-13 21:18 UTC (permalink / raw)
  To: Fejes Ferenc; +Cc: netdev@vger.kernel.org
In-Reply-To: <CAAej5NbwZ80MNQYxP4NiJXheAn1DcSgm+O3zQQgCoP03HGHEgQ@mail.gmail.com>

On Mon, Aug 12, 2019 at 1:48 PM Fejes Ferenc <fejes@inf.elte.hu> wrote:
>
> Thanks for the answer, I really appreciate it. I tried omitting

Please reply inline, no top posting on kernel mailing lists.

> "cgroup/skb" to let libbpf guess the attach type, but I got the same
> error. Really interesting, because I got the error
> > libbpf: failed to load program 'cgroup_skb/egress'
> wich is weird because it shows that libbpf guess the program type
> correctly. So something definitely on my side - thank you for verifyng
> that - I try to investigate it!

What was the error message you got after you provided correct program
attach type?

>
> Ferenc
> Andrii Nakryiko <andrii.nakryiko@gmail.com> ezt írta (időpont: 2019.
> aug. 12., H, 20:27):
> >
> > On Mon, Aug 12, 2019 at 1:59 AM Fejes Ferenc <fejes@inf.elte.hu> wrote:
> > >
> > > Greetings!
> > >
> > > I found a strange error when I tried to load a BPF_CGROUP_INET_EGRESS
> > > prog with bpftool. Loading the same program from C code with
> > > bpf_prog_load_xattr works without problem.
> > >
> > > The error message I got:
> > > bpftool prog loadall hbm_kern.o /sys/fs/bpf/hbm type cgroup/skb
> >
> > You need "cgroup_skb/egress" instead of "cgroup/skb" (or try just
> > dropping it, bpftool will try to guess the type from program's section
> > name, which would be correct in this case).
> >
> > > libbpf: load bpf program failed: Invalid argument
> > > libbpf: -- BEGIN DUMP LOG ---
> > > libbpf:
> > > ; return ALLOW_PKT | REDUCE_CW;
> > > 0: (b7) r0 = 3
> > > 1: (95) exit
> > > At program exit the register R0 has value (0x3; 0x0) should have been
> > > in (0x0; 0x1)
> > > processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0
> > > peak_states 0 mark_read 0
> > >
> > > libbpf: -- END LOG --
> > > libbpf: failed to load program 'cgroup_skb/egress'
> > > libbpf: failed to load object 'hbm_kern.o'
> > > Error: failed to load object file
> > >
> > >
> > > My environment: 5.3-rc3 / net-next master (both producing the error).
> > > Libbpf and bpftool installed from the kernel source (cleaned and
> > > reinstalled when I tried a new kernel). I compiled the program with
> > > Clang 8, on Ubuntu 19.10 server image, the source:
> > >
> > > #include <linux/bpf.h>
> > > #include "bpf_helpers.h"
> > >
> > > #define DROP_PKT        0
> > > #define ALLOW_PKT       1
> > > #define REDUCE_CW       2
> > >
> > > SEC("cgroup_skb/egress")
> > > int hbm(struct __sk_buff *skb)
> > > {
> > >         return ALLOW_PKT | REDUCE_CW;
> > > }
> > > char _license[] SEC("license") = "GPL";
> > >
> > >
> > > I also tried to trace down the bug with gdb. It seems like the
> > > section_names array in libbpf.c filled with garbage, especially the
> >
> > I did the same, section_names appears to be correct, not sure what was
> > going on in your case. The problem is that "cgroup/skb", which you
> > provided on command line, overrides this section name and forces
> > bpftool to guess program type as BPF_CGROUP_INET_INGRESS, which
> > restricts return codes to just 0 or 1, while for
> > BPF_CGROUP_INET_EGRESS i is [0, 3].
> >
> > > expected_attach_type fields (in my case, this contains
> > > BPF_CGROUP_INET_INGRESS instead of BPF_CGROUP_INET_EGRESS).
> > >
> > > Thanks!

^ permalink raw reply

* Re: [PATCH net-next] net/ncsi: allow to customize BMC MAC Address offset
From: Terry Duncan @ 2019-08-13 21:15 UTC (permalink / raw)
  To: Ben Wei, Tao Ren
  Cc: Andrew Lunn, Jakub Kicinski, netdev@vger.kernel.org,
	openbmc@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	Samuel Mendoza-Jonas, David S.Miller, William Kennington
In-Reply-To: <CH2PR15MB3686B3A20A231FC111C42F40A3D20@CH2PR15MB3686.namprd15.prod.outlook.com>

On 8/13/19 12:52 PM, Ben Wei wrote:
>> On 8/13/19 9:31 AM, Terry Duncan wrote:
>>> Tao, in your new patch will it be possible to disable the setting of the BMC MAC?  I would like to be able to send NCSI_OEM_GET_MAC perhaps with netlink (TBD) to get the system address without it affecting the BMC address.
>>>
>>> I was about to send patches to add support for the Intel adapters when I saw this thread.
>>> Thanks,
>>> Terry
>>
> Hi Terry,
> Do you plan to monitor and configure NIC through use space programs via netlink?  I'm curious if you have additional use cases.
> I had planned to add some daemon in user space to monitor NIC through NC-SI, primarily to log AENs, check link status and retrieve NIC counters.
> We can collaborate on these if they align with your needs as well.
> 
> Also about Intel NIC, do you not plan to derive system address from BMC MAC? Is the BMC MAC independent of system address?
> 
> Thanks,
> -Ben
> 
Hi Ben,
Intel has in the past programmed BMC MACs with an offset from the system 
MAC address of the first shared interface. We have found this approach 
causes problems and adds complexity when system interfaces / OCP cards 
are added or removed or disabled in BIOS. Our approach in OpenBMC is to 
keep this simple - provide means for the BMC MAC addresses to be set in 
manufacturing and persist them.

We do also have some of the same use cases you mention.
Thanks,
Terry

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox