Netdev List
* [PATCH net-next 2/3] net/mlx5e: Fix unused variable warning when CONFIG_MLX5_ESWITCH is off
From: Saeed Mahameed @ 2019-07-11 19:39 UTC
  To: David S. Miller
  Cc: netdev@vger.kernel.org, Saeed Mahameed, Mark Bloch, Tariq Toukan,
	Nathan Chancellor
In-Reply-To: <20190711193937.29802-1-saeedm@mellanox.com>

In mlx5e_setup_tc(), the "priv" variable is unused when
CONFIG_MLX5_ESWITCH is off. One way to fix this is to actually use it.

mlx5e_setup_tc_mqprio() also needs "priv" and extracts it on its own.
We can simply pass priv to mlx5e_setup_tc_mqprio() instead of netdev
and avoid extracting it a second time, which also resolves the
compiler warning.

Fixes: 4e95bc268b91 ("net: flow_offload: add flow_block_cb_setup_simple()")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
CC: Nathan Chancellor <natechancellor@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 6d0ae87c8ded..9163d6904741 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3390,10 +3390,9 @@ static int mlx5e_modify_channels_vsd(struct mlx5e_channels *chs, bool vsd)
 	return 0;
 }
 
-static int mlx5e_setup_tc_mqprio(struct net_device *netdev,
+static int mlx5e_setup_tc_mqprio(struct mlx5e_priv *priv,
 				 struct tc_mqprio_qopt *mqprio)
 {
-	struct mlx5e_priv *priv = netdev_priv(netdev);
 	struct mlx5e_channels new_channels = {};
 	u8 tc = mqprio->num_tc;
 	int err = 0;
@@ -3475,7 +3474,7 @@ static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
 						  priv, priv, true);
 #endif
 	case TC_SETUP_QDISC_MQPRIO:
-		return mlx5e_setup_tc_mqprio(dev, type_data);
+		return mlx5e_setup_tc_mqprio(priv, type_data);
 	default:
 		return -EOPNOTSUPP;
 	}
-- 
2.21.0



* [PATCH net-next 3/3] net/mlx5: E-Switch, Reduce ingress acl modify metadata stack usage
From: Saeed Mahameed @ 2019-07-11 19:39 UTC
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Saeed Mahameed, Jianbo Liu
In-Reply-To: <20190711193937.29802-1-saeedm@mellanox.com>

Fix the following compiler warning:
In function ‘esw_vport_add_ingress_acl_modify_metadata’:
the frame size of 1084 bytes is larger than 1024 bytes [-Wframe-larger-than=]

Since the mlx5_flow_spec structure is never written to, we can make it
static const and avoid the stack usage.

Fixes: 7445cfb1169c ("net/mlx5: E-Switch, Tag packet with vport number in VF vports and uplink ingress ACLs")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 8ed4497929b9..5f78e76019c5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -1785,8 +1785,8 @@ static int esw_vport_add_ingress_acl_modify_metadata(struct mlx5_eswitch *esw,
 						     struct mlx5_vport *vport)
 {
 	u8 action[MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto)] = {};
+	static const struct mlx5_flow_spec spec = {};
 	struct mlx5_flow_act flow_act = {};
-	struct mlx5_flow_spec spec = {};
 	int err = 0;
 
 	MLX5_SET(set_action_in, action, action_type, MLX5_ACTION_TYPE_SET);
-- 
2.21.0
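
Why `static const` removes the frame-size warning can be seen in a small userspace sketch: a large zero-initialized local is reserved in the stack frame on every call, while a static object lives in static storage and is shared by all callers. `struct big_spec` and its 1000-byte size are hypothetical, chosen only to be in the same range as the 1084-byte frame from the warning; the real mlx5_flow_spec layout is not reproduced, and `count_set_bytes()` is a made-up read-only consumer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical large, read-only request; not the real mlx5_flow_spec. */
struct big_spec {
	unsigned char match_criteria[500];
	unsigned char match_value[500];
};

/* A consumer that only reads the spec, so a const object is enough. */
static int count_set_bytes(const struct big_spec *spec)
{
	int n = 0;

	assert(spec != NULL);
	for (size_t i = 0; i < sizeof(spec->match_value); i++)
		n += spec->match_value[i] != 0;
	return n;
}

/* Before: about 1000 bytes are reserved in this function's stack frame. */
int add_rule_on_stack(void)
{
	struct big_spec spec = { { 0 } };

	return count_set_bytes(&spec);
}

/* After: a single zero-initialized object in static storage. */
static const struct big_spec *empty_spec(void)
{
	static const struct big_spec spec = { { 0 } };

	return &spec;
}

int add_rule_static(void)
{
	return count_set_bytes(empty_spec());
}
```

The trade-off is that the static object is shared by every caller on every CPU, so this is only safe because nothing writes to it; the const qualifier lets the compiler enforce that.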



* Re: [RFC] virtio-net: share receive_*() and add_recvbuf_*() with virtio-vsock
From: Michael S. Tsirkin @ 2019-07-11 19:52 UTC
  To: Stefano Garzarella; +Cc: Jason Wang, Stefan Hajnoczi, virtualization, netdev
In-Reply-To: <20190711114134.xhmpciyglb2angl6@steredhat>

On Thu, Jul 11, 2019 at 01:41:34PM +0200, Stefano Garzarella wrote:
> On Thu, Jul 11, 2019 at 03:37:00PM +0800, Jason Wang wrote:
> > 
> > On 2019/7/10 11:37 PM, Stefano Garzarella wrote:
> > > Hi,
> > > as Jason suggested some months ago, I looked more closely at the virtio-net driver to
> > > understand whether we can reuse some parts in the virtio-vsock driver too, since we
> > > have similar challenges (mergeable buffers, page allocation, small
> > > packets, etc.).
> > > 
> > > Initially, I would start using skbuff in virtio-vsock in order to reuse the
> > > receive_*() functions.
> > 
> > 
> > Yes, that will be a good step.
> > 
> 
> Okay, I'll go on this way.
> 
> > 
> > > Then I would move receive_[small, big, mergeable]() and
> > > add_recvbuf_[small, big, mergeable]() out of the virtio-net driver, in order to
> > > call them from virtio-vsock as well. I need to do some refactoring (e.g. leave the
> > > XDP part in the virtio-net driver), but I think it is feasible.
> > > 
> > > The idea is to create virtio-skb.[h,c] to hold these functions, plus a new
> > > object that stores some needed attributes (e.g. hdr_len) and state (e.g.
> > > some fields of struct receive_queue).
> > 
> > 
> > My understanding is we could be more ambitious here. Do you see any blocker
> > for reusing virtio-net directly? It's better to reuse not only the functions
> > but also the logic like NAPI to avoid re-inventing something buggy and
> > duplicated.
> > 
> 
> These are my concerns:
> - virtio-vsock is not a "net_device", so a lot of code related to
>   ethtool, net devices (MAC address, MTU, speed, VLAN, XDP, offloading) will
>   not be used by virtio-vsock.
> 
> - virtio-vsock has a different header. We could treat it as part of the
>   virtio_net payload, but that would break compatibility with old hosts. This
>   was one of the main concerns that made me think about using only the
>   send/recv skbuff functions, which shouldn't break compatibility.
> 
> > 
> > > This is an idea of virtio-skb.h that
> > > I have in mind:
> > >      struct virtskb;
> > 
> > 
> > What fields do you want to store in virtskb? It looks like the existing sk_buff is
> > flexible enough for us?
> 
> My idea is to store queue information, like struct receive_queue or
> struct send_queue, and some device attributes (e.g. hdr_len).
> 
> > 
> > 
> > > 
> > >      struct sk_buff *virtskb_receive_small(struct virtskb *vs, ...);
> > >      struct sk_buff *virtskb_receive_big(struct virtskb *vs, ...);
> > >      struct sk_buff *virtskb_receive_mergeable(struct virtskb *vs, ...);
> > > 
> > >      int virtskb_add_recvbuf_small(struct virtskb *vs, ...);
> > >      int virtskb_add_recvbuf_big(struct virtskb *vs, ...);
> > >      int virtskb_add_recvbuf_mergeable(struct virtskb *vs, ...);
> > > 
> > > For the Guest->Host path it should be easier, so maybe I can add a
> > > "virtskb_send(struct virtskb *vs, struct sk_buff *skb)" with part of the code
> > > from xmit_skb().
> > 
> > 
> > I may be missing something, but I don't see anything that prevents us from using
> > xmit_skb() directly.
> > 
> 
> Yes, but my initial idea was to make it more generic and not tied to
> virtio_net_hdr, so 'hdr_len' could be a parameter and
> 'num_buffers' would be handled by the caller.
> 
> > 
> > > 
> > > Let me know if you have better names in mind or if I should put these functions
> > > somewhere else.
> > > 
> > > I would like to leave the control part completely separate, so, for example,
> > > the two drivers will negotiate the features independently and they will call
> > > the right virtskb_receive_*() function based on the negotiation.
> > 
> > 
> > If the only issue is negotiation, we can simply change
> > virtnet_probe() to deal with different devices.
> > 
> > 
> > > 
> > > I already started to work on it, but before taking more steps and sending an RFC
> > > patch, I would like to hear your opinion.
> > > Do you think that makes sense?
> > > Do you see any issue or a better solution?
> > 
> > 
> > I still think we need to find a way of adding some code to virtio-net.c
> > directly if there's no huge difference in TX/RX processing. That would
> > save us a lot of time.
> 
> After reading the buffers from the virtqueue, I think the processing
> is slightly different, because virtio-net interfaces with the network
> stack, while virtio-vsock interfaces with the vsock-core (socket).
> So virtio-vsock implements the following:
> - a flow-control mechanism to avoid losing packets, informing the peer
>   about the amount of memory available in the receive queue using some
>   fields in the virtio_vsock_hdr
> - de-multiplexing: parsing the virtio_vsock_hdr and choosing the right
>   socket depending on the port
> - socket state handling
> 
> We could use virtio-net as the transport, but we would have to add a lot of
> code to skip the "net device" stuff when it is used by virtio-vsock.
> This could break something in virtio-net. For this reason, I thought of reusing
> only the send/recv functions, starting from the idea of splitting the virtio-net
> driver into two parts:
> a. one with all the stuff related to the network stack
> b. one with the stuff needed to communicate with the host
> 
> And use skbuff to communicate between the parts. In this way, virtio-vsock
> can use only part b.
> 
> Maybe we can do this split in a better way, but I'm not sure it is
> simple.
> 
> Thanks,
> Stefano

Frankly, skb is a huge structure which adds a lot of
overhead. I am not sure that using it is such a great idea
if building a device that does not have to interface
with the networking stack.

So I agree with Jason in theory. To clarify, he is basically saying the
current implementation is all wrong: it should be a protocol, and we
should teach the networking stack that there are reliable net devices that
handle just this protocol. We could add a flag in virtio-net that
says it's such a device.

Whether it's doable, I don't know, and it's definitely not simple: in
particular, you will also have to re-implement existing devices in these
terms, and not just virtio but VMware vsock too.

If you want to do a POC you can add a new address family,
that's easier.

Just reusing random functions won't help. The net stack
is very heavy; if it manages to outperform vsock, it's
because vsock was not written with performance in mind.
But the smarts are in the core, not in the virtio driver.
What makes vsock slow are design decisions like
using a workqueue to process packets,
not batching memory management, etc.
All things that the net core does for virtio-net.

-- 
MST


* Re: [bpf-next v3 09/12] bpf: Split out some helper functions
From: Stanislav Fomichev @ 2019-07-11 20:25 UTC
  To: Krzesimir Nowak
  Cc: linux-kernel, Alban Crequy, Iago López Galeiras,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	netdev, bpf, xdp-newbies
In-Reply-To: <20190708163121.18477-10-krzesimir@kinvolk.io>

On 07/08, Krzesimir Nowak wrote:
> The moved functions are generally useful for implementing
> bpf_prog_test_run for other types of BPF programs: they don't have
> any network-specific stuff in them, so I can use them in a test run
> implementation for perf event BPF programs too.
It's a bit hard to follow. Maybe split it into multiple patches?
The first one moves the relevant parts as is.
The second one renames them (though I'm not sure we need to rename; up to
you/maintainers).
The third one removes the duplication from bpf_prog_test_run_flow_dissector.

Also see a possible suggestion about BPF_TEST_RUN_SETUP_CGROUP_STORAGE below.

> Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
> ---
>  include/linux/bpf.h   |  28 +++++
>  kernel/bpf/Makefile   |   1 +
>  kernel/bpf/test_run.c | 212 ++++++++++++++++++++++++++++++++++
>  net/bpf/test_run.c    | 263 +++++++++++-------------------------------
>  4 files changed, 308 insertions(+), 196 deletions(-)
>  create mode 100644 kernel/bpf/test_run.c
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 18f4cc2c6acd..28db8ba57bc3 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1143,4 +1143,32 @@ static inline u32 bpf_xdp_sock_convert_ctx_access(enum bpf_access_type type,
>  }
>  #endif /* CONFIG_INET */
>  
> +/* Helper functions for bpf_prog_test_run implementations */
> +typedef u32 bpf_prog_run_helper_t(struct bpf_prog *prog, void *ctx,
> +				   void *private_data);
> +
> +enum bpf_test_run_flags {
> +	BPF_TEST_RUN_PLAIN = 0,
> +	BPF_TEST_RUN_SETUP_CGROUP_STORAGE = 1 << 0,
> +};
> +
> +int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 flags,
> +		 u32 *retval, u32 *duration);
> +
> +int bpf_test_run_cb(struct bpf_prog *prog, void *ctx, u32 repeat, u32 flags,
> +		    bpf_prog_run_helper_t run_prog, void *private_data,
> +		    u32 *retval, u32 *duration);
> +
> +int bpf_test_finish(union bpf_attr __user *uattr, u32 retval, u32 duration);
> +
> +void *bpf_receive_ctx(const union bpf_attr *kattr, u32 max_size);
> +
> +int bpf_send_ctx(const union bpf_attr *kattr, union bpf_attr __user *uattr,
> +		 const void *data, u32 size);
> +
> +void *bpf_receive_data(const union bpf_attr *kattr, u32 max_size);
> +
> +int bpf_send_data(const union bpf_attr *kattr, union bpf_attr __user *uattr,
> +		  const void *data, u32 size);
> +
>  #endif /* _LINUX_BPF_H */
> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> index 29d781061cd5..570fd40288f4 100644
> --- a/kernel/bpf/Makefile
> +++ b/kernel/bpf/Makefile
> @@ -22,3 +22,4 @@ obj-$(CONFIG_CGROUP_BPF) += cgroup.o
>  ifeq ($(CONFIG_INET),y)
>  obj-$(CONFIG_BPF_SYSCALL) += reuseport_array.o
>  endif
> +obj-$(CONFIG_BPF_SYSCALL) += test_run.o
> diff --git a/kernel/bpf/test_run.c b/kernel/bpf/test_run.c
> new file mode 100644
> index 000000000000..0481373da8be
> --- /dev/null
> +++ b/kernel/bpf/test_run.c
> @@ -0,0 +1,212 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (c) 2017 Facebook
> + * Copyright (c) 2019 Tigera, Inc
> + */
> +
> +#include <asm/div64.h>
> +
> +#include <linux/bpf-cgroup.h>
> +#include <linux/bpf.h>
> +#include <linux/err.h>
> +#include <linux/errno.h>
> +#include <linux/filter.h>
> +#include <linux/gfp.h>
> +#include <linux/kernel.h>
> +#include <linux/limits.h>
> +#include <linux/preempt.h>
> +#include <linux/rcupdate.h>
> +#include <linux/sched.h>
> +#include <linux/sched/signal.h>
> +#include <linux/slab.h>
> +#include <linux/timekeeping.h>
> +#include <linux/uaccess.h>
> +
> +static void teardown_cgroup_storage(struct bpf_cgroup_storage **storage)
> +{
> +	enum bpf_cgroup_storage_type stype;
> +
> +	if (!storage)
> +		return;
> +	for_each_cgroup_storage_type(stype)
> +		bpf_cgroup_storage_free(storage[stype]);
> +	kfree(storage);
> +}
> +
> +static struct bpf_cgroup_storage **setup_cgroup_storage(struct bpf_prog *prog)
> +{
> +	enum bpf_cgroup_storage_type stype;
> +	struct bpf_cgroup_storage **storage;
> +	size_t size = MAX_BPF_CGROUP_STORAGE_TYPE;
> +
> +	size *= sizeof(struct bpf_cgroup_storage *);
> +	storage = kzalloc(size, GFP_KERNEL);
> +	for_each_cgroup_storage_type(stype) {
> +		storage[stype] = bpf_cgroup_storage_alloc(prog, stype);
> +		if (IS_ERR(storage[stype])) {
> +			storage[stype] = NULL;
> +			teardown_cgroup_storage(storage);
> +			return ERR_PTR(-ENOMEM);
> +		}
> +	}
> +	return storage;
> +}
> +
> +static u32 run_bpf_prog(struct bpf_prog *prog, void *ctx, void *private_data)
> +{
> +	return BPF_PROG_RUN(prog, ctx);
> +}
> +
> +int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 flags,
> +		 u32 *retval, u32 *duration)
> +{
> +	return bpf_test_run_cb(prog, ctx, repeat, flags, run_bpf_prog, NULL,
> +			       retval, duration);
> +}
> +
> +int bpf_test_run_cb(struct bpf_prog *prog, void *ctx, u32 repeat, u32 flags,
> +		    bpf_prog_run_helper_t run_prog, void *private_data,
> +		    u32 *retval, u32 *duration)
> +{
> +	struct bpf_cgroup_storage **storage = NULL;
> +	u64 time_start, time_spent = 0;
> +	int ret = 0;
> +	u32 i;
> +
> +	if (flags & BPF_TEST_RUN_SETUP_CGROUP_STORAGE) {
You can probably get away without a flag. prog->aux->cgroup_storage[x] is
a non-NULL pointer if the verifier found that the prog uses cgroup
storage, and bpf_cgroup_storage_alloc() returns NULL if the program doesn't
use cgroup storage. So you should be able to figure out dynamically
whether you need to call bpf_cgroup_storage_set() or not.

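A userspace model of the flag-less approach: decide whether the program needs cgroup storage from the per-type pointers the verifier recorded, instead of from a caller-supplied flag. `enum fake_storage_type`, `struct fake_prog_aux`, `prog_uses_cgroup_storage()` and `run_with_optional_storage()` are hypothetical stand-ins for enum bpf_cgroup_storage_type, prog->aux->cgroup_storage[] and the test-run loop; this sketches the idea and is not a drop-in change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of enum bpf_cgroup_storage_type. */
enum fake_storage_type {
	FAKE_STORAGE_SHARED,
	FAKE_STORAGE_PERCPU,
	FAKE_MAX_STORAGE_TYPE,
};

struct fake_map {
	int id;
};

/* Hypothetical mirror of the relevant part of struct bpf_prog_aux. */
struct fake_prog_aux {
	/* Non-NULL when the verifier saw the program use this storage type. */
	struct fake_map *cgroup_storage[FAKE_MAX_STORAGE_TYPE];
};

/* Derive the decision from the program itself instead of a flag. */
static bool prog_uses_cgroup_storage(const struct fake_prog_aux *aux)
{
	assert(aux != NULL);
	for (int t = 0; t < FAKE_MAX_STORAGE_TYPE; t++)
		if (aux->cgroup_storage[t])
			return true;
	return false;
}

/* Test-run loop skeleton: storage is only set when the prog needs it. */
static int run_with_optional_storage(const struct fake_prog_aux *aux,
				     unsigned int repeat, int *set_calls)
{
	bool need_storage = prog_uses_cgroup_storage(aux);

	*set_calls = 0;
	for (unsigned int i = 0; i < repeat; i++) {
		if (need_storage)
			(*set_calls)++;	/* stands in for bpf_cgroup_storage_set() */
		/* ... run the program here ... */
	}
	return 0;
}
```
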
> +		storage = setup_cgroup_storage(prog);
> +		if (IS_ERR(storage))
> +			return PTR_ERR(storage);
> +	}
> +
> +	if (!repeat)
> +		repeat = 1;
> +
> +	rcu_read_lock();
> +	preempt_disable();
> +	time_start = ktime_get_ns();
> +	for (i = 0; i < repeat; i++) {
> +		if (storage)
> +			bpf_cgroup_storage_set(storage);
> +		*retval = run_prog(prog, ctx, private_data);
> +
> +		if (signal_pending(current)) {
> +			preempt_enable();
> +			rcu_read_unlock();
> +			teardown_cgroup_storage(storage);
> +			return -EINTR;
> +		}
> +
> +		if (need_resched()) {
> +			time_spent += ktime_get_ns() - time_start;
> +			preempt_enable();
> +			rcu_read_unlock();
> +
> +			cond_resched();
> +
> +			rcu_read_lock();
> +			preempt_disable();
> +			time_start = ktime_get_ns();
> +		}
> +	}
> +	time_spent += ktime_get_ns() - time_start;
> +	preempt_enable();
> +	rcu_read_unlock();
> +
> +	do_div(time_spent, repeat);
> +	*duration = time_spent > U32_MAX ? U32_MAX : (u32)time_spent;
> +
> +	teardown_cgroup_storage(storage);
> +
> +	return ret;
> +}
> +
> +int bpf_test_finish(union bpf_attr __user *uattr, u32 retval, u32 duration)
> +{
> +	if (copy_to_user(&uattr->test.retval, &retval, sizeof(retval)))
> +		return -EFAULT;
> +	if (copy_to_user(&uattr->test.duration, &duration, sizeof(duration)))
> +		return -EFAULT;
> +	return 0;
> +}
> +
> +static void *bpf_receive_mem(u64 in, u32 in_size, u32 max_size)
> +{
> +	void __user *data_in = u64_to_user_ptr(in);
> +	void *data;
> +	int err;
> +
> +	if (!data_in && in_size)
> +		return ERR_PTR(-EINVAL);
> +	data = kzalloc(max_size, GFP_USER);
> +	if (!data)
> +		return ERR_PTR(-ENOMEM);
> +
> +	if (data_in) {
> +		err = bpf_check_uarg_tail_zero(data_in, max_size, in_size);
> +		if (err) {
> +			kfree(data);
> +			return ERR_PTR(err);
> +		}
> +
> +		in_size = min_t(u32, max_size, in_size);
> +		if (copy_from_user(data, data_in, in_size)) {
> +			kfree(data);
> +			return ERR_PTR(-EFAULT);
> +		}
> +	}
> +	return data;
> +}
> +
> +static int bpf_send_mem(u64 out, u32 out_size, u32 *out_size_write,
> +                        const void *data, u32 data_size)
> +{
> +	void __user *data_out = u64_to_user_ptr(out);
> +	int err = -EFAULT;
> +	u32 copy_size = data_size;
> +
> +	if (!data_out && out_size)
> +		return -EINVAL;
> +
> +	if (!data || !data_out)
> +		return 0;
> +
> +	if (copy_size > out_size) {
> +		copy_size = out_size;
> +		err = -ENOSPC;
> +	}
> +
> +	if (copy_to_user(data_out, data, copy_size))
> +		goto out;
> +	if (copy_to_user(out_size_write, &data_size, sizeof(data_size)))
> +		goto out;
> +	if (err != -ENOSPC)
> +		err = 0;
> +out:
> +	return err;
> +}
> +
> +void *bpf_receive_data(const union bpf_attr *kattr, u32 max_size)
> +{
> +	return bpf_receive_mem(kattr->test.data_in, kattr->test.data_size_in,
> +                               max_size);
> +}
> +
> +int bpf_send_data(const union bpf_attr *kattr, union bpf_attr __user *uattr,
> +                  const void *data, u32 size)
> +{
> +	return bpf_send_mem(kattr->test.data_out, kattr->test.data_size_out,
> +			    &uattr->test.data_size_out, data, size);
> +}
> +
> +void *bpf_receive_ctx(const union bpf_attr *kattr, u32 max_size)
> +{
> +	return bpf_receive_mem(kattr->test.ctx_in, kattr->test.ctx_size_in,
> +                               max_size);
> +}
> +
> +int bpf_send_ctx(const union bpf_attr *kattr, union bpf_attr __user *uattr,
> +                 const void *data, u32 size)
> +{
> +        return bpf_send_mem(kattr->test.ctx_out, kattr->test.ctx_size_out,
> +                            &uattr->test.ctx_size_out, data, size);
> +}
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 80e6f3a6864d..fe6b7b1af0cc 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -14,97 +14,6 @@
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/bpf_test_run.h>
>  
> -static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
> -			u32 *retval, u32 *time)
> -{
> -	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = { NULL };
> -	enum bpf_cgroup_storage_type stype;
> -	u64 time_start, time_spent = 0;
> -	int ret = 0;
> -	u32 i;
> -
> -	for_each_cgroup_storage_type(stype) {
> -		storage[stype] = bpf_cgroup_storage_alloc(prog, stype);
> -		if (IS_ERR(storage[stype])) {
> -			storage[stype] = NULL;
> -			for_each_cgroup_storage_type(stype)
> -				bpf_cgroup_storage_free(storage[stype]);
> -			return -ENOMEM;
> -		}
> -	}
> -
> -	if (!repeat)
> -		repeat = 1;
> -
> -	rcu_read_lock();
> -	preempt_disable();
> -	time_start = ktime_get_ns();
> -	for (i = 0; i < repeat; i++) {
> -		bpf_cgroup_storage_set(storage);
> -		*retval = BPF_PROG_RUN(prog, ctx);
> -
> -		if (signal_pending(current)) {
> -			ret = -EINTR;
> -			break;
> -		}
> -
> -		if (need_resched()) {
> -			time_spent += ktime_get_ns() - time_start;
> -			preempt_enable();
> -			rcu_read_unlock();
> -
> -			cond_resched();
> -
> -			rcu_read_lock();
> -			preempt_disable();
> -			time_start = ktime_get_ns();
> -		}
> -	}
> -	time_spent += ktime_get_ns() - time_start;
> -	preempt_enable();
> -	rcu_read_unlock();
> -
> -	do_div(time_spent, repeat);
> -	*time = time_spent > U32_MAX ? U32_MAX : (u32)time_spent;
> -
> -	for_each_cgroup_storage_type(stype)
> -		bpf_cgroup_storage_free(storage[stype]);
> -
> -	return ret;
> -}
> -
> -static int bpf_test_finish(const union bpf_attr *kattr,
> -			   union bpf_attr __user *uattr, const void *data,
> -			   u32 size, u32 retval, u32 duration)
> -{
> -	void __user *data_out = u64_to_user_ptr(kattr->test.data_out);
> -	int err = -EFAULT;
> -	u32 copy_size = size;
> -
> -	/* Clamp copy if the user has provided a size hint, but copy the full
> -	 * buffer if not to retain old behaviour.
> -	 */
> -	if (kattr->test.data_size_out &&
> -	    copy_size > kattr->test.data_size_out) {
> -		copy_size = kattr->test.data_size_out;
> -		err = -ENOSPC;
> -	}
> -
> -	if (data_out && copy_to_user(data_out, data, copy_size))
> -		goto out;
> -	if (copy_to_user(&uattr->test.data_size_out, &size, sizeof(size)))
> -		goto out;
> -	if (copy_to_user(&uattr->test.retval, &retval, sizeof(retval)))
> -		goto out;
> -	if (copy_to_user(&uattr->test.duration, &duration, sizeof(duration)))
> -		goto out;
> -	if (err != -ENOSPC)
> -		err = 0;
> -out:
> -	trace_bpf_test_finish(&err);
> -	return err;
> -}
> -
>  static void *bpf_test_init(const union bpf_attr *kattr, u32 size,
>  			   u32 headroom, u32 tailroom)
>  {
> @@ -125,63 +34,6 @@ static void *bpf_test_init(const union bpf_attr *kattr, u32 size,
>  	return data;
>  }
>  
> -static void *bpf_ctx_init(const union bpf_attr *kattr, u32 max_size)
> -{
> -	void __user *data_in = u64_to_user_ptr(kattr->test.ctx_in);
> -	void __user *data_out = u64_to_user_ptr(kattr->test.ctx_out);
> -	u32 size = kattr->test.ctx_size_in;
> -	void *data;
> -	int err;
> -
> -	if (!data_in && !data_out)
> -		return NULL;
> -
> -	data = kzalloc(max_size, GFP_USER);
> -	if (!data)
> -		return ERR_PTR(-ENOMEM);
> -
> -	if (data_in) {
> -		err = bpf_check_uarg_tail_zero(data_in, max_size, size);
> -		if (err) {
> -			kfree(data);
> -			return ERR_PTR(err);
> -		}
> -
> -		size = min_t(u32, max_size, size);
> -		if (copy_from_user(data, data_in, size)) {
> -			kfree(data);
> -			return ERR_PTR(-EFAULT);
> -		}
> -	}
> -	return data;
> -}
> -
> -static int bpf_ctx_finish(const union bpf_attr *kattr,
> -			  union bpf_attr __user *uattr, const void *data,
> -			  u32 size)
> -{
> -	void __user *data_out = u64_to_user_ptr(kattr->test.ctx_out);
> -	int err = -EFAULT;
> -	u32 copy_size = size;
> -
> -	if (!data || !data_out)
> -		return 0;
> -
> -	if (copy_size > kattr->test.ctx_size_out) {
> -		copy_size = kattr->test.ctx_size_out;
> -		err = -ENOSPC;
> -	}
> -
> -	if (copy_to_user(data_out, data, copy_size))
> -		goto out;
> -	if (copy_to_user(&uattr->test.ctx_size_out, &size, sizeof(size)))
> -		goto out;
> -	if (err != -ENOSPC)
> -		err = 0;
> -out:
> -	return err;
> -}
> -
>  /**
>   * range_is_zero - test whether buffer is initialized
>   * @buf: buffer to check
> @@ -238,6 +90,36 @@ static void convert_skb_to___skb(struct sk_buff *skb, struct __sk_buff *__skb)
>  	memcpy(__skb->cb, &cb->data, QDISC_CB_PRIV_LEN);
>  }
>  
> +static int bpf_net_prog_test_run_finish(const union bpf_attr *kattr,
> +                                        union bpf_attr __user *uattr,
> +                                        const void *data, u32 data_size,
> +                                        const void *ctx, u32 ctx_size,
> +                                        u32 retval, u32 duration)
> +{
> +	int ret;
> +	union bpf_attr fixed_kattr;
> +	const union bpf_attr *kattr_ptr = kattr;
> +
> +	/* Clamp copy (in bpf_send_mem) if the user has provided a
> +	 * size hint, but copy the full buffer if not to retain old
> +	 * behaviour.
> +	 */
> +	if (!kattr->test.data_size_out && kattr->test.data_out) {
> +		fixed_kattr = *kattr;
> +		fixed_kattr.test.data_size_out = U32_MAX;
> +		kattr_ptr = &fixed_kattr;
> +	}
> +
> +	ret = bpf_send_data(kattr_ptr, uattr, data, data_size);
> +	if (!ret) {
> +		ret = bpf_test_finish(uattr, retval, duration);
> +		if (!ret && ctx)
> +			ret = bpf_send_ctx(kattr_ptr, uattr, ctx, ctx_size);
> +	}
> +	trace_bpf_test_finish(&ret);
> +	return ret;
> +}
> +
>  int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>  			  union bpf_attr __user *uattr)
>  {
> @@ -257,7 +139,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>  	if (IS_ERR(data))
>  		return PTR_ERR(data);
>  
> -	ctx = bpf_ctx_init(kattr, sizeof(struct __sk_buff));
> +	ctx = bpf_receive_ctx(kattr, sizeof(struct __sk_buff));
>  	if (IS_ERR(ctx)) {
>  		kfree(data);
>  		return PTR_ERR(ctx);
> @@ -307,7 +189,8 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>  	ret = convert___skb_to_skb(skb, ctx);
>  	if (ret)
>  		goto out;
> -	ret = bpf_test_run(prog, skb, repeat, &retval, &duration);
> +	ret = bpf_test_run(prog, skb, repeat, BPF_TEST_RUN_SETUP_CGROUP_STORAGE,
> +			   &retval, &duration);
>  	if (ret)
>  		goto out;
>  	if (!is_l2) {
> @@ -327,10 +210,9 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>  	/* bpf program can never convert linear skb to non-linear */
>  	if (WARN_ON_ONCE(skb_is_nonlinear(skb)))
>  		size = skb_headlen(skb);
> -	ret = bpf_test_finish(kattr, uattr, skb->data, size, retval, duration);
> -	if (!ret)
> -		ret = bpf_ctx_finish(kattr, uattr, ctx,
> -				     sizeof(struct __sk_buff));
> +	ret = bpf_net_prog_test_run_finish(kattr, uattr, skb->data, size,
> +					   ctx, sizeof(struct __sk_buff),
> +					   retval, duration);
>  out:
>  	kfree_skb(skb);
>  	bpf_sk_storage_free(sk);
> @@ -365,32 +247,48 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
>  	rxqueue = __netif_get_rx_queue(current->nsproxy->net_ns->loopback_dev, 0);
>  	xdp.rxq = &rxqueue->xdp_rxq;
>  
> -	ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration);
> +	ret = bpf_test_run(prog, &xdp, repeat,
> +			   BPF_TEST_RUN_SETUP_CGROUP_STORAGE,
> +			   &retval, &duration);
>  	if (ret)
>  		goto out;
>  	if (xdp.data != data + XDP_PACKET_HEADROOM + NET_IP_ALIGN ||
>  	    xdp.data_end != xdp.data + size)
>  		size = xdp.data_end - xdp.data;
> -	ret = bpf_test_finish(kattr, uattr, xdp.data, size, retval, duration);
> +	ret = bpf_net_prog_test_run_finish(kattr, uattr, xdp.data, size,
> +					   NULL, 0, retval, duration);
>  out:
>  	kfree(data);
>  	return ret;
>  }
>  
> +struct bpf_flow_dissect_run_data {
> +	__be16 proto;
> +	int nhoff;
> +	int hlen;
> +};
> +
> +static u32 bpf_flow_dissect_run(struct bpf_prog *prog, void *ctx,
> +				void *private_data)
> +{
> +	struct bpf_flow_dissect_run_data *data = private_data;
> +
> +	return bpf_flow_dissect(prog, ctx, data->proto, data->nhoff, data->hlen);
> +}
> +
>  int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>  				     const union bpf_attr *kattr,
>  				     union bpf_attr __user *uattr)
>  {
> +	struct bpf_flow_dissect_run_data run_data = {};
>  	u32 size = kattr->test.data_size_in;
>  	struct bpf_flow_dissector ctx = {};
>  	u32 repeat = kattr->test.repeat;
>  	struct bpf_flow_keys flow_keys;
> -	u64 time_start, time_spent = 0;
>  	const struct ethhdr *eth;
>  	u32 retval, duration;
>  	void *data;
>  	int ret;
> -	u32 i;
>  
>  	if (prog->type != BPF_PROG_TYPE_FLOW_DISSECTOR)
>  		return -EINVAL;
> @@ -407,49 +305,22 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>  
>  	eth = (struct ethhdr *)data;
>  
> -	if (!repeat)
> -		repeat = 1;
> -
>  	ctx.flow_keys = &flow_keys;
>  	ctx.data = data;
>  	ctx.data_end = (__u8 *)data + size;
>  
> -	rcu_read_lock();
> -	preempt_disable();
> -	time_start = ktime_get_ns();
> -	for (i = 0; i < repeat; i++) {
> -		retval = bpf_flow_dissect(prog, &ctx, eth->h_proto, ETH_HLEN,
> -					  size);
> -
> -		if (signal_pending(current)) {
> -			preempt_enable();
> -			rcu_read_unlock();
> -
> -			ret = -EINTR;
> -			goto out;
> -		}
> -
> -		if (need_resched()) {
> -			time_spent += ktime_get_ns() - time_start;
> -			preempt_enable();
> -			rcu_read_unlock();
> -
> -			cond_resched();
> -
> -			rcu_read_lock();
> -			preempt_disable();
> -			time_start = ktime_get_ns();
> -		}
> -	}
> -	time_spent += ktime_get_ns() - time_start;
> -	preempt_enable();
> -	rcu_read_unlock();
> -
> -	do_div(time_spent, repeat);
> -	duration = time_spent > U32_MAX ? U32_MAX : (u32)time_spent;
> +	run_data.proto = eth->h_proto;
> +	run_data.nhoff = ETH_HLEN;
> +	run_data.hlen = size;
> +	ret = bpf_test_run_cb(prog, &ctx, repeat, BPF_TEST_RUN_PLAIN,
> +			      bpf_flow_dissect_run, &run_data,
> +			      &retval, &duration);
> +	if (!ret)
> +		goto out;
>  
> -	ret = bpf_test_finish(kattr, uattr, &flow_keys, sizeof(flow_keys),
> -			      retval, duration);
> +	ret = bpf_net_prog_test_run_finish(kattr, uattr, &flow_keys,
> +					   sizeof(flow_keys), NULL, 0,
> +					   retval, duration);
>  
>  out:
>  	kfree(data);
> -- 
> 2.20.1
> 
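
The output-copy rules that bpf_send_mem() and bpf_net_prog_test_run_finish() implement in the quoted patch can be modeled in user space: copy at most out_size bytes, always report the full data size, return -ENOSPC on truncation, and, for legacy callers that pass a buffer without a size hint, substitute U32_MAX so the whole buffer is copied. In this sketch memcpy() stands in for copy_to_user(), and `send_mem()` and `legacy_out_size()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Models bpf_send_mem(); memcpy() replaces copy_to_user(). */
static int send_mem(void *out, uint32_t out_size, uint32_t *out_size_write,
		    const void *data, uint32_t data_size)
{
	uint32_t copy_size = data_size;
	int err = 0;

	if (!out && out_size)
		return -EINVAL;		/* a size without a buffer is bogus */
	if (!data || !out)
		return 0;		/* nothing to copy, or caller opted out */

	if (copy_size > out_size) {
		copy_size = out_size;	/* clamp to the user's buffer */
		err = -ENOSPC;		/* and report the truncation */
	}
	assert(out_size_write != NULL);
	memcpy(out, data, copy_size);
	*out_size_write = data_size;	/* always report the full size */
	return err;
}

/* Models the legacy fixup: a buffer with no size hint means "no limit". */
static uint32_t legacy_out_size(const void *out, uint32_t out_size)
{
	return (out && !out_size) ? UINT32_MAX : out_size;
}
```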

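bpf_test_run_cb() in the quoted patch generalizes the timing loop by taking a run callback plus an opaque private_data pointer, which is how bpf_prog_test_run_flow_dissector() passes proto, nhoff and hlen through. A minimal userspace sketch of that pattern follows; `run_cb_t`, `test_run_cb()`, `struct dissect_args` and `fake_dissect()` are hypothetical, clock_gettime() stands in for ktime_get_ns(), and the preemption, RCU, rescheduling and signal handling of the kernel loop are omitted.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Callback type: like bpf_prog_run_helper_t, minus the bpf_prog. */
typedef uint32_t run_cb_t(void *ctx, void *private_data);

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Runs cb `repeat` times (at least once) and reports the average
 * duration per run, clamped to 32 bits, as bpf_test_run_cb() does. */
static int test_run_cb(void *ctx, uint32_t repeat, run_cb_t *cb,
		       void *private_data, uint32_t *retval, uint32_t *duration)
{
	uint64_t start, spent;

	assert(cb != NULL);
	if (!repeat)
		repeat = 1;
	start = now_ns();
	for (uint32_t i = 0; i < repeat; i++)
		*retval = cb(ctx, private_data);
	spent = (now_ns() - start) / repeat;
	*duration = spent > UINT32_MAX ? UINT32_MAX : (uint32_t)spent;
	return 0;
}

/* Extra arguments travel through private_data. */
struct dissect_args {
	uint16_t proto;
	int nhoff;
	int hlen;
};

static uint32_t fake_dissect(void *ctx, void *private_data)
{
	const struct dissect_args *args = private_data;

	(void)ctx;
	return (uint32_t)(args->nhoff + args->hlen);	/* placeholder result */
}
```

A caller would then copy results out only when the runner returns 0, treating any non-zero value as an error.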

* Re: [bpf-next v3 10/12] bpf: Implement bpf_prog_test_run for perf event programs
From: Stanislav Fomichev @ 2019-07-11 20:30 UTC
  To: Krzesimir Nowak
  Cc: linux-kernel, Alban Crequy, Iago López Galeiras,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	netdev, bpf, xdp-newbies
In-Reply-To: <20190708163121.18477-11-krzesimir@kinvolk.io>

On 07/08, Krzesimir Nowak wrote:
> As input, the test run for perf event programs takes struct
> bpf_perf_event_data as ctx_in and struct bpf_perf_event_value as
> data_in. It produces no output: passing ctx_out or data_out (or a
> non-zero size for either) is rejected with -EINVAL.
> 
> The implementation sets up an instance of struct bpf_perf_event_data_kern
> so that a BPF program reading from its context will receive what was
> passed to the test run in ctx_in. The BPF program can also call
> bpf_perf_prog_read_value() to receive what was passed in data_in.
> 
> Changes since v2:
> - drop the changes in the perf event verifier test; they are no longer
>   needed after the reworked ctx size handling
> 
> Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
> ---
>  kernel/trace/bpf_trace.c | 60 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 60 insertions(+)
> 
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index ca1255d14576..b870fc2314d0 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -19,6 +19,8 @@
>  #include "trace_probe.h"
>  #include "trace.h"
>  
> +#include <trace/events/bpf_test_run.h>
> +
>  #define bpf_event_rcu_dereference(p)					\
>  	rcu_dereference_protected(p, lockdep_is_held(&bpf_event_mutex))
>  
> @@ -1160,7 +1162,65 @@ const struct bpf_verifier_ops perf_event_verifier_ops = {
>  	.convert_ctx_access	= pe_prog_convert_ctx_access,
>  };
>  
> +static int pe_prog_test_run(struct bpf_prog *prog,
> +			    const union bpf_attr *kattr,
> +			    union bpf_attr __user *uattr)
> +{
> +	struct bpf_perf_event_data_kern real_ctx = {0, };
> +	struct perf_sample_data sample_data = {0, };
> +	struct bpf_perf_event_data *fake_ctx;
> +	struct bpf_perf_event_value *value;
> +	struct perf_event event = {0, };
> +	u32 retval = 0, duration = 0;
> +	int err;
> +
> +	if (kattr->test.data_size_out || kattr->test.data_out)
> +		return -EINVAL;
> +	if (kattr->test.ctx_size_out || kattr->test.ctx_out)
> +		return -EINVAL;
> +
> +	fake_ctx = bpf_receive_ctx(kattr, sizeof(struct bpf_perf_event_data));
> +	if (IS_ERR(fake_ctx))
> +		return PTR_ERR(fake_ctx);
> +
> +	value = bpf_receive_data(kattr, sizeof(struct bpf_perf_event_value));
> +	if (IS_ERR(value)) {
> +		kfree(fake_ctx);
> +		return PTR_ERR(value);
> +	}
nit: maybe use the bpf_test_ prefix for receive_ctx/data:
* bpf_test_receive_ctx
* bpf_test_receive_data

to signify that they are used only for tests?

> +
> +	real_ctx.regs = &fake_ctx->regs;
> +	real_ctx.data = &sample_data;
> +	real_ctx.event = &event;
> +	perf_sample_data_init(&sample_data, fake_ctx->addr,
> +			      fake_ctx->sample_period);
> +	event.cpu = smp_processor_id();
> +	event.oncpu = -1;
> +	event.state = PERF_EVENT_STATE_OFF;
> +	local64_set(&event.count, value->counter);
> +	event.total_time_enabled = value->enabled;
> +	event.total_time_running = value->running;
> +	/* make self as a leader - it is used only for checking the
> +	 * state field
> +	 */
> +	event.group_leader = &event;
> +	err = bpf_test_run(prog, &real_ctx, kattr->test.repeat,
> +			   BPF_TEST_RUN_PLAIN, &retval, &duration);
> +	if (err) {
> +		kfree(value);
> +		kfree(fake_ctx);
> +		return err;
> +	}
> +
> +	err = bpf_test_finish(uattr, retval, duration);
> +	trace_bpf_test_finish(&err);
Can probably do:

	err = bpf_test_run(...)
	if (!err) {
		err = bpf_test_finish(uattr, retval, duration);
		trace_bpf_test_finish(&err);
	}
	kfree(..);
	kfree(..);
	return err;

So you don't have to copy-paste the error handling.
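
For illustration, here is a minimal userspace sketch of the single-exit shape suggested above. It assumes hypothetical fake_run()/fake_finish() stand-ins for bpf_test_run() and bpf_test_finish(), and uses malloc()/free() in place of the kernel allocators. Both buffers are released on one path whether the run succeeds or fails:

```c
#include <errno.h>
#include <stdlib.h>

/* Counts releases so the shared cleanup path can be observed. */
static int releases;

/* Hypothetical stand-in for bpf_test_run(): fails on request. */
static int fake_run(int should_fail)
{
	return should_fail ? -EINVAL : 0;
}

/* Hypothetical stand-in for bpf_test_finish(). */
static int fake_finish(void)
{
	return 0;
}

static void release(void *p)
{
	free(p);
	releases++;
}

static int test_run_single_exit(int run_fails)
{
	void *ctx, *value;
	int err;

	ctx = malloc(16);
	if (!ctx)
		return -ENOMEM;
	value = malloc(16);
	if (!value) {
		release(ctx);
		return -ENOMEM;
	}

	err = fake_run(run_fails);
	if (!err)
		err = fake_finish();	/* finish (and trace) only on success */

	/* One cleanup path shared by the success and error cases. */
	release(value);
	release(ctx);
	return err;
}
```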

> +	kfree(value);
> +	kfree(fake_ctx);
> +	return err;
> +}
> +
>  const struct bpf_prog_ops perf_event_prog_ops = {
> +	.test_run	= pe_prog_test_run,
>  };
>  
>  static DEFINE_MUTEX(bpf_event_mutex);
> -- 
> 2.20.1
> 

^ permalink raw reply

* Re: [PATCH v4 bpf-next 0/4] selftests/bpf: fix compiling loop{1,2,3}.c on s390
From: Stanislav Fomichev @ 2019-07-11 20:35 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: bpf, netdev, ys114321, daniel, davem, ast
In-Reply-To: <20190711142930.68809-1-iii@linux.ibm.com>

On 07/11, Ilya Leoshkevich wrote:
> Use PT_REGS_RC(ctx) instead of ctx->rax, which is not present on s390.
> 
> This patch series consists of three preparatory commits, which make it
> possible to use PT_REGS_RC in BPF selftests, followed by the actual fix.
> 
> > > Will this also work for 32-bit x86?
> > Thanks, this is a good catch: this builds, but makes 64-bit accesses, as
> > if it used the 64-bit variant of pt_regs. I will fix this.
> I found four problems in this area:
> 
> 1. Selftest tracing progs are built with -target bpf, leading to struct
>    pt_regs and friends being interpreted incorrectly.
> 2. When the Makefile is adjusted to build them without -target bpf, it
>    still lacks -m32/-m64, leading to a similar issue.
> 3. There is no __i386__ define, leading to incorrect userspace struct
>    pt_regs variant being chosen for x86.
> 4. Finally, there is an issue in my patch: when 1-3 are fixed, it fails
>    to build, since i386 defines yet another set of field names.
> 
> I will send fixes for problems 1-3 separately. I believe that for this
> patch series to be correct, it's enough to fix #4 (which I did by adding
> another #ifdef).
> 
> I've also changed ARCH to SRCARCH in patch #1, since while ARCH can be
> e.g. "i386", SRCARCH always corresponds to directory names under arch/.
> 
> v1->v2: Split into multiple patches.
> v2->v3: Added arm64 support.
> v3->v4: Added i386 support, use SRCARCH instead of ARCH.
Still looks good to me, thanks!

Reviewed-by: Stanislav Fomichev <sdf@google.com>

Again, this should probably go via bpf to fix the existing tests, not bpf-next
(but I see the bpf tree is not synced with the net tree yet).

> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> 
> 
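
Here is a simplified sketch of the per-architecture selection behind PT_REGS_RC. One macro name maps to a differently named return-value field on each target, which is why i386 needed yet another #ifdef. The structs are hypothetical, trimmed-down layouts modeled on the x86-64 and i386 userspace pt_regs and the s390 gprs[] array. SKETCH_PT_REGS_RC is an illustrative name, not the selftests' macro, and other hosts fall back to the x86-64 layout for brevity:

```c
/* Simplified, hypothetical register layouts for illustration only. */
struct sketch_regs_x86_64 { unsigned long rax; };
struct sketch_regs_i386   { long eax; };
struct sketch_regs_s390   { unsigned long gprs[16]; };

/* Each target keeps the return value in a differently named field. */
#define SKETCH_RC_X86_64(x)	((x)->rax)
#define SKETCH_RC_I386(x)	((x)->eax)
#define SKETCH_RC_S390(x)	((x)->gprs[2])

/* Pick one spelling at compile time, one #ifdef branch per arch. */
#if defined(__s390x__)
typedef struct sketch_regs_s390 sketch_regs;
#define SKETCH_PT_REGS_RC(x)	SKETCH_RC_S390(x)
#elif defined(__i386__)
typedef struct sketch_regs_i386 sketch_regs;
#define SKETCH_PT_REGS_RC(x)	SKETCH_RC_I386(x)
#else
typedef struct sketch_regs_x86_64 sketch_regs;
#define SKETCH_PT_REGS_RC(x)	SKETCH_RC_X86_64(x)
#endif

/* Architecture-neutral caller: no direct ctx->rax access. */
static long sketch_return_value(const sketch_regs *ctx)
{
	return (long)SKETCH_PT_REGS_RC(ctx);
}
```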

^ permalink raw reply

* Re: [ovs-dev] [PATCH net-next] net: openvswitch: do not update max_headroom if new headroom is equal to old headroom
From: Pravin Shelar @ 2019-07-11 21:07 UTC (permalink / raw)
  To: Gregory Rose
  Cc: David Miller, ap420073, ovs dev, Linux Kernel Network Developers
In-Reply-To: <b40f4a39-8de4-482c-2ee8-66adf5c606be@gmail.com>

I was a bit busy for the last couple of days. I will finish the review by EOD today.

Thanks,
Pravin.

On Mon, Jul 8, 2019 at 4:22 PM Gregory Rose <gvrose8192@gmail.com> wrote:
>
>
>
> On 7/8/2019 4:18 PM, Gregory Rose wrote:
> > On 7/8/2019 4:08 PM, David Miller wrote:
> >> From: Taehee Yoo <ap420073@gmail.com>
> >> Date: Sat,  6 Jul 2019 01:08:09 +0900
> >>
> >>> When a vport is deleted, the maximum headroom size would be changed.
> >>> If the vport which has the largest headroom is deleted,
> >>> the new max_headroom would be set.
> >>> But if the new headroom size is equal to the old headroom size,
> >>> the update routine is unnecessary.
> >>>
> >>> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> >> I'm not so sure about the logic here and I'd therefore like an OVS
> >> expert to review this.
> >
> > I'll review and test it and get back.  Pravin may have input as well.
> >
>
> Err, adding Pravin.
>
> - Greg
>
> > Thanks,
> >
> > - Greg
> >
> >> Thanks.
> >> _______________________________________________
> >> dev mailing list
> >> dev@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
>
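
Here is a minimal sketch of the check described in the quoted commit message. After a vport is removed, it recomputes the maximum headroom of the remaining ports and propagates the result only when the value actually changed. The flat array of headrooms and the function name are hypothetical simplifications, not the OVS datapath code:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Recompute the largest headroom among the remaining vports.
 * Returns true only when *max_headroom changed, i.e. when the
 * per-vport update actually needs to run.
 */
static bool recompute_max_headroom(unsigned int *max_headroom,
				   const unsigned int *headrooms, size_t n)
{
	unsigned int new_max = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (headrooms[i] > new_max)
			new_max = headrooms[i];

	if (new_max == *max_headroom)
		return false;	/* another vport still has the old maximum */

	*max_headroom = new_max;
	return true;
}
```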

^ permalink raw reply

* Re: [net-next] net: fib_rules: do not flow dissect local packets
From: David Miller @ 2019-07-11 21:23 UTC (permalink / raw)
  To: ppenkov; +Cc: netdev, roopa, edumazet
In-Reply-To: <20190705184643.249884-1-ppenkov@google.com>

From: Petar Penkov <ppenkov@google.com>
Date: Fri,  5 Jul 2019 11:46:43 -0700

> Rules matching on loopback iif do not need early flow dissection as the
> packet originates from the host. Stop counting such rules in
> fib_rule_requires_fldissect().
> 
> Signed-off-by: Petar Penkov <ppenkov@google.com>

Applied, thank you.
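
The condition in the commit message can be sketched as a predicate: a rule needs early flow dissection only if it matches on L4 fields and its input interface is not loopback. The struct and its fields are hypothetical simplifications. Only the loopback ifindex value of 1 corresponds to the kernel's LOOPBACK_IFINDEX:

```c
#include <stdbool.h>

#define SKETCH_LOOPBACK_IFINDEX 1	/* same value as LOOPBACK_IFINDEX */

/* Hypothetical, trimmed-down rule: only the fields the check uses. */
struct sketch_rule {
	int iifindex;		/* input interface the rule matches on */
	unsigned char ip_proto;	/* nonzero if the rule matches an L4 protocol */
	bool has_port_range;	/* true if sport/dport ranges are set */
};

/* Locally originated packets (iif == lo) never need early dissection. */
static bool sketch_requires_fldissect(const struct sketch_rule *r)
{
	return r->iifindex != SKETCH_LOOPBACK_IFINDEX &&
	       (r->ip_proto || r->has_port_range);
}
```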

^ permalink raw reply

* Re: [bpf PATCH v2 2/6] bpf: tls fix transition through disconnect with close
From: John Fastabend @ 2019-07-11 21:25 UTC (permalink / raw)
  To: Jakub Kicinski, John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf
In-Reply-To: <20190711113218.2f0b8c1f@cakuba.netronome.com>

Jakub Kicinski wrote:
> On Thu, 11 Jul 2019 09:47:16 -0700, John Fastabend wrote:
> > Jakub Kicinski wrote:
> > > On Wed, 10 Jul 2019 12:34:17 -0700, Jakub Kicinski wrote:  
> > > > > > > +		if (sk->sk_prot->unhash)
> > > > > > > +			sk->sk_prot->unhash(sk);
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	ctx = tls_get_ctx(sk);
> > > > > > > +	if (ctx->tx_conf == TLS_SW || ctx->rx_conf == TLS_SW)
> > > > > > > +		tls_sk_proto_cleanup(sk, ctx, timeo);  
> > > 
> > > Do we still need to hook into unhash? With patch 6 in place perhaps we
> > > can just do disconnect 🥺  
> > 
> > ?? "can just do a disconnect", not sure I follow. We still need unhash
> > in cases where we have a TLS socket transition from ESTABLISHED
> > to LISTEN state without calling close(). This is independent of if
> > sockmap is running or not.
> > 
> > Originally, I thought this would be extremely rare but I did see it
> > in real applications on the sockmap side so presumably it is possible
> > here as well.
> 
> Ugh, sorry, I meant shutdown. Instead of replacing the unhash callback
> replace the shutdown callback. We probably shouldn't release the socket
> lock either there, but we can sleep, so I'll be able to run the device
> connection remove callback (which sleeps).
> 

Ah OK, that seems doable to me. Do you want to write that on top of this
series? Or would you like to push it onto your branch so I can pull
it in, push the rest of the patches on top, and send it out? I think
if you can get to it in the next few days, it makes sense to wait.

I can't test the hardware side, so it probably makes more sense for
you to do it if you can.


> > > cleanup is going to kick off TX but also:
> > > 
> > > 	if (unlikely(sk->sk_write_pending) &&
> > > 	    !wait_on_pending_writer(sk, &timeo))
> > > 		tls_handle_open_record(sk, 0);
> > > 
> > > Are we guaranteed that sk_write_pending is 0?  Otherwise
> > > wait_on_pending_writer is hiding yet another release_sock() :(  
> > 
> > Not seeing the path to release_sock() at the moment?
> > 
> >    tls_handle_open_record
> >      push_pending_record
> >       tls_sw_push_pending_record
> >         bpf_exec_tx_verdict
> 
> wait_on_pending_writer
>   sk_wait_event
>     release_sock
> 

ah OK. I'll check on sk_write_pending...

> > If bpf_exec_tx_verdict does a redirect we could hit a release but that
> > is another fix I have to get queued up shortly. I think we can fix
> > that in another series.
> 
> Ugh.



^ permalink raw reply

* Re: [PATCH 00/12] treewide: Fix GENMASK misuses
From: David Miller @ 2019-07-11 21:30 UTC (permalink / raw)
  To: joe
  Cc: akpm, venture, yuenn, benjaminfair, andrew, openbmc, linux-kernel,
	linux-aspeed, linux-arm-kernel, linux-amlogic, netdev,
	linux-mediatek, linux-stm32, linux-wireless, linux-media,
	dri-devel, linux-iio, linux-mmc, devel, alsa-devel
In-Reply-To: <cover.1562734889.git.joe@perches.com>

From: Joe Perches <joe@perches.com>
Date: Tue,  9 Jul 2019 22:04:13 -0700

> These GENMASK uses have inverted argument order, and the
> actual masks produced are incorrect.  Fix them.
> 
> Add checkpatch tests to help avoid more misuses too.

Patches #7 and #8 applied to 'net', with appropriate Fixes tags
added to #8.
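
GENMASK(h, l) takes the high bit first. A userspace copy modeled on the kernel's GENMASK() definition shows why swapping the arguments quietly produces a wrong mask (zero in this case). SKETCH_GENMASK is a local name used for illustration:

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/* Bits l..h (inclusive) set; the high bit comes first, as in GENMASK(h, l). */
#define SKETCH_GENMASK(h, l) \
	((~0UL - (1UL << (l)) + 1) & \
	 (~0UL >> (SKETCH_BITS_PER_LONG - 1 - (h))))
```

So a field occupying bits 4 to 7 must be written as GENMASK(7, 4), not GENMASK(4, 7).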

^ permalink raw reply

* Re: [Patch net] hsr: switch ->dellink() to ->ndo_uninit()
From: David Miller @ 2019-07-11 21:38 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, syzbot+097ef84cdc95843fbaa8, arvid.brodin
In-Reply-To: <20190710062454.16386-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Tue,  9 Jul 2019 23:24:54 -0700

> Switching from ->priv_destructor to dellink() has an unexpected
> consequence: existing RCU readers, that is, hsr_port_get_hsr()
> callers, may still be able to read the port list.
> 
> Instead of checking the return value of each hsr_port_get_hsr(),
> we can just move the cleanup to ->ndo_uninit(), which is called after
> device unregister and synchronize_net(), while we still hold the RTNL
> lock.
> 
> Fixes: b9a1e627405d ("hsr: implement dellink to clean up resources")
> Fixes: edf070a0fb45 ("hsr: fix a NULL pointer deref in hsr_dev_xmit()")
> Reported-by: syzbot+097ef84cdc95843fbaa8@syzkaller.appspotmail.com
> Cc: Arvid Brodin <arvid.brodin@alten.se>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH] ipv6: Use ipv6_authlen for len
From: David Miller @ 2019-07-11 21:43 UTC (permalink / raw)
  To: xingwu.yang
  Cc: kuznet, yoshfuji, netdev, linux-kernel, pablo, kadlec, fw,
	netfilter-devel, coreteam
In-Reply-To: <20190710131410.75825-1-xingwu.yang@gmail.com>

From: yangxingwu <xingwu.yang@gmail.com>
Date: Wed, 10 Jul 2019 21:14:10 +0800

> The length of AH header is computed manually as (hp->hdrlen+2)<<2.
> However, in include/linux/ipv6.h, a macro named ipv6_authlen is
> already defined for exactly the same job. This commit replaces
> the manual computation code with the macro.
> 
> Signed-off-by: yangxingwu <xingwu.yang@gmail.com>

Applied.
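
In AH, hdrlen holds the header length in 32-bit words minus 2 (RFC 4302). The byte length is therefore (hdrlen + 2) * 4, which is what the (hp->hdrlen+2)<<2 expression computes. Here is a sketch with a local copy of that computation and a hypothetical, trimmed-down header struct:

```c
#include <stdint.h>

/* Trimmed-down AH header: only the fields needed here (hypothetical). */
struct sketch_ah {
	uint8_t nexthdr;
	uint8_t hdrlen;	/* length in 32-bit words, minus 2 */
};

/* The computation the patch replaces with ipv6_authlen(). */
#define sketch_authlen(p) (((p)->hdrlen + 2) << 2)
```

For example, an AH header carrying a 96-bit ICV has hdrlen 4, giving 24 bytes.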

^ permalink raw reply

* Re: [PATCH net] ipv6: tcp: fix flowlabels reflection for RST packets
From: David Miller @ 2019-07-11 21:43 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, marek
In-Reply-To: <20190710134011.221210-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Wed, 10 Jul 2019 06:40:09 -0700

> In 323a53c41292 ("ipv6: tcp: enable flowlabel reflection in some RST packets")
> and 50a8accf1062 ("ipv6: tcp: send consistent flowlabel in TIME_WAIT state")
> we took care of IPv6 flowlabel reflections for two cases.
> 
> This patch takes care of the remaining case, when the RST packet
> is sent on behalf of a 'full' socket.
> 
> In Marek's use case, this was a socket in TCP_CLOSE state.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Marek Majkowski <marek@cloudflare.com>
> Tested-by: Marek Majkowski <marek@cloudflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] ipv6: fix potential crash in ip6_datagram_dst_update()
From: David Miller @ 2019-07-11 21:43 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, willemb, syzkaller
In-Reply-To: <20190710134011.221210-2-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Wed, 10 Jul 2019 06:40:10 -0700

> Willem forgot to change one of the calls to fl6_sock_lookup(),
> which can now return an error or NULL.
> 
> syzbot reported :
 ...
> Fixes: 59c820b2317f ("ipv6: elide flowlabel check if no exclusive leases exist")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Willem de Bruijn <willemb@google.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>

Applied.
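
Because fl6_sock_lookup() can now return either NULL or an error pointer, a caller has to treat both as "no label" before dereferencing. Here is a userspace sketch of the kernel's error-pointer convention, where errors up to MAX_ERRNO (4095) are encoded in the top of the address space. It shows such a check. The sketch_* helpers reimplement the idea locally, and sketch_lookup() is a hypothetical lookup, not the real function:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

static inline void *sketch_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long sketch_ptr_err(const void *p) { return (long)(intptr_t)p; }

static inline bool sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static inline bool sketch_is_err_or_null(const void *p)
{
	return !p || sketch_is_err(p);
}

/* Hypothetical lookup: NULL for "not found", error pointer on failure. */
static int label_store = 42;
static int *sketch_lookup(int key)
{
	if (key < 0)
		return sketch_err_ptr(-ENOENT);
	if (key == 0)
		return NULL;
	return &label_store;
}

/* Caller that handles both failure encodings before dereferencing. */
static int sketch_label_or_default(int key)
{
	int *label = sketch_lookup(key);

	if (sketch_is_err_or_null(label))
		return -1;
	return *label;
}
```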

^ permalink raw reply

* Re: [PATCH net] ipv6: fix static key imbalance in fl_create()
From: David Miller @ 2019-07-11 21:44 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, willemb, syzkaller
In-Reply-To: <20190710134011.221210-3-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Wed, 10 Jul 2019 06:40:11 -0700

> fl_create() should call static_branch_deferred_inc() only in
> case of success.
> 
> Also we should not call fl_free() in error path, as this could
> cause a static key imbalance.
 ...
> Fixes: 59c820b2317f ("ipv6: elide flowlabel check if no exclusive leases exist")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Willem de Bruijn <willemb@google.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>

Applied.
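
The invariant behind this fix is that each increment of the key must be matched by exactly one decrement. The increment therefore belongs after the last point of failure, and the error path must not call a free routine that decrements. Here is a counter-based sketch, with hypothetical names standing in for the static key, fl_create() and fl_free():

```c
#include <errno.h>
#include <stdlib.h>

static int key_count;	/* stands in for the deferred static key */

struct sketch_label { int exclusive; };

/* Release path for a fully created label: drops its key reference. */
static void sketch_label_free(struct sketch_label *fl)
{
	if (fl->exclusive)
		key_count--;
	free(fl);
}

/* Take the key reference only once nothing can fail any more. */
static struct sketch_label *sketch_label_create(int exclusive, int fail_late,
						 int *err)
{
	struct sketch_label *fl = calloc(1, sizeof(*fl));

	if (!fl) {
		*err = -ENOMEM;
		return NULL;
	}
	fl->exclusive = exclusive;
	if (fail_late) {
		/* Error path: plain free(), no key reference was taken. */
		free(fl);
		*err = -EINVAL;
		return NULL;
	}
	if (exclusive)
		key_count++;	/* success: the reference now belongs to fl */
	*err = 0;
	return fl;
}
```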

^ permalink raw reply

* Re: [PATCH net-next 0/3] Mellanox, mlx5 build fixes
From: David Miller @ 2019-07-11 22:05 UTC (permalink / raw)
  To: saeedm; +Cc: netdev
In-Reply-To: <20190711193937.29802-1-saeedm@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Thu, 11 Jul 2019 19:39:53 +0000

> I know net-next is closed, but these patches fix some build errors
> and compiler warnings that people have been complaining about.
> 
> I hope it is not too late, but in case it is a lot of trouble for
> you, I guess they can wait.

Never too late to submit build fixes :-)

Series applied, thanks.

^ permalink raw reply

* Re: [pull request][net 0/6] Mellanox, mlx5 fixes 2019-07-11
From: David Miller @ 2019-07-11 22:08 UTC (permalink / raw)
  To: saeedm; +Cc: netdev
In-Reply-To: <20190711185353.5715-1-saeedm@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Thu, 11 Jul 2019 18:54:08 +0000

> This series introduces some fixes to mlx5 driver.
> 
> Please pull and let me know if there is any problem.

Pulled.

> For -stable v4.15
> ('net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn')
> 
> For -stable v5.1
> ('net/mlx5e: Fix port tunnel GRE entropy control')
> ('net/mlx5e: Rx, Fix checksum calculation for new hardware')
> ('net/mlx5e: Fix return value from timeout recover function')
> ('net/mlx5e: Fix error flow in tx reporter diagnose')
> 
> For -stable v5.2
> ('net/mlx5: E-Switch, Fix default encap mode')

Queued up.

> Conflict note: This pull request will produce a small conflict when
> merged with net-next.
> In drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> Take the hunk from net and replace:
> esw_offloads_steering_init(esw, vf_nvports, total_nvports);
> with:
> esw_offloads_steering_init(esw);

Thank you.

^ permalink raw reply

* Re: [net][PATCH 0/5] rds fixes
From: David Miller @ 2019-07-11 22:27 UTC (permalink / raw)
  To: santosh.shilimkar; +Cc: netdev
In-Reply-To: <1562736764-31752-1-git-send-email-santosh.shilimkar@oracle.com>

From: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Date: Tue,  9 Jul 2019 22:32:39 -0700

> A few RDS fixes that make the RDS RDMA transport work reliably on mainline.
> 
> The first two fixes are applicable to v4.11+ stable versions, and the last
> three patches apply only to v5.1 stable and current mainline.
> 
> The patchset is rebased against 'net' and is also available in the tree below.
> 
> The following changes since commit 1ff2f0fa450ea4e4f87793d9ed513098ec6e12be:
> 
>   net/mlx5e: Return in default case statement in tx_post_resync_params (2019-07-09 21:40:20 -0700)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux.git net/rds-fixes

Pulled, thanks.

^ permalink raw reply

* [PATCH v2] tipc: ensure head->lock is initialised
From: Chris Packham @ 2019-07-11 22:41 UTC (permalink / raw)
  To: jon.maloy, eric.dumazet, ying.xue, davem
  Cc: linux-kernel, netdev, tipc-discussion, Chris Packham

tipc_named_node_up() creates an skb list. It passes the list to
tipc_node_xmit(), which has some code paths that can call
skb_queue_purge(), which relies on list->lock being initialised.

The spin_lock is only needed if the messages end up on the receive path,
but when the list is created in tipc_named_node_up() we don't
necessarily know if it is going to end up there.

Once all the skb list users are updated in tipc it will then be possible
to update them to use the unlocked variants of the skb list functions
and initialise the lock when we know the message will follow the receive
path.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
---

I'm updating our products to use the latest kernel. One change that we have that
doesn't appear to have been upstreamed is related to the following soft lockup.

NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [swapper/3:0]
Modules linked in: tipc jitterentropy_rng echainiv drbg platform_driver(O) ipifwd(PO)
CPU: 3 PID: 0 Comm: swapper/3 Tainted: P           O    4.4.6-at1 #1
task: a3054e00 ti: ac6b4000 task.ti: a307a000
NIP: 806891c4 LR: 804f5060 CTR: 804f50d0
REGS: ac6b59b0 TRAP: 0901   Tainted: P           O     (4.4.6-at1)
MSR: 00029002 <CE,EE,ME>  CR: 84002088  XER: 20000000

GPR00: 804f50fc ac6b5a60 a3054e00 00029002 00000101 01001011 00000000 00000001
GPR08: 00021002 c1502d1c ac6b5ae4 00000000 804f50d0
NIP [806891c4] _raw_spin_lock_irqsave+0x44/0x80
LR [804f5060] skb_dequeue+0x20/0x90
Call Trace:
[ac6b5a80] [804f50fc] skb_queue_purge+0x2c/0x50
[ac6b5a90] [c1511058] tipc_node_xmit+0x138/0x170 [tipc]
[ac6b5ad0] [c1509e58] tipc_named_node_up+0x88/0xa0 [tipc]
[ac6b5b00] [c150fc1c] tipc_netlink_compat_stop+0x9bc/0xf50 [tipc]
[ac6b5b20] [c1511638] tipc_rcv+0x418/0x9b0 [tipc]
[ac6b5bc0] [c150218c] tipc_bcast_stop+0xfc/0x7b0 [tipc]
[ac6b5bd0] [80504e38] __netif_receive_skb_core+0x468/0xa10
[ac6b5c70] [805082fc] netif_receive_skb_internal+0x3c/0xe0
[ac6b5ca0] [80642a48] br_handle_frame_finish+0x1d8/0x4d0
[ac6b5d10] [80642f30] br_handle_frame+0x1f0/0x330
[ac6b5d60] [80504ec8] __netif_receive_skb_core+0x4f8/0xa10
[ac6b5e00] [805082fc] netif_receive_skb_internal+0x3c/0xe0
[ac6b5e30] [8044c868] _dpa_rx+0x148/0x5c0
[ac6b5ea0] [8044b0c8] priv_rx_default_dqrr+0x98/0x170
[ac6b5ed0] [804d1338] qman_p_poll_dqrr+0x1b8/0x240
[ac6b5f00] [8044b1c0] dpaa_eth_poll+0x20/0x60
[ac6b5f20] [805087cc] net_rx_action+0x15c/0x320
[ac6b5f80] [8002594c] __do_softirq+0x13c/0x250
[ac6b5fe0] [80025c34] irq_exit+0xb4/0xf0
[ac6b5ff0] [8000d81c] call_do_irq+0x24/0x3c
[a307be60] [80004acc] do_IRQ+0x8c/0x120
[a307be80] [8000f450] ret_from_except+0x0/0x18
--- interrupt: 501 at arch_cpu_idle+0x24/0x70

Eyeballing the code I think it can still happen since tipc_named_node_up
allocates struct sk_buff_head head on the stack so it could have arbitrary
content.
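
The difference between the two initialisers can be sketched in userspace. This sketch uses a hypothetical lock with a debug magic value (loosely like CONFIG_DEBUG_SPINLOCK) in place of the spinlock, and a hypothetical sketch_queue_head in place of struct sk_buff_head. The double-underscore variant resets only the list fields, so a head on the stack keeps whatever its lock field held. The full variant also initialises the lock:

```c
#include <stdbool.h>

/* Hypothetical lock with a debug magic, loosely like DEBUG_SPINLOCK. */
#define SKETCH_LOCK_MAGIC 0xdead4eadU

struct sketch_lock {
	unsigned int magic;
	bool held;
};

struct sketch_queue_head {
	void *next, *prev;
	unsigned int qlen;
	struct sketch_lock lock;
};

static void sketch_lock_init(struct sketch_lock *l)
{
	l->magic = SKETCH_LOCK_MAGIC;
	l->held = false;
}

/* Returns false where a real spinlock would misbehave or spin forever. */
static bool sketch_lock_acquire(struct sketch_lock *l)
{
	if (l->magic != SKETCH_LOCK_MAGIC || l->held)
		return false;
	l->held = true;
	return true;
}

static void sketch_lock_release(struct sketch_lock *l)
{
	l->held = false;
}

/* Like __skb_queue_head_init(): list fields only, lock untouched. */
static void sketch_queue_head_init_nolock(struct sketch_queue_head *q)
{
	q->next = q->prev = q;
	q->qlen = 0;
}

/* Like skb_queue_head_init(): initialise the lock, then the list. */
static void sketch_queue_head_init(struct sketch_queue_head *q)
{
	sketch_lock_init(&q->lock);
	sketch_queue_head_init_nolock(q);
}

/* Locked helper, like skb_queue_purge(); fails if the lock is garbage. */
static bool sketch_queue_purge(struct sketch_queue_head *q)
{
	if (!sketch_lock_acquire(&q->lock))
		return false;
	q->qlen = 0;
	q->next = q->prev = q;
	sketch_lock_release(&q->lock);
	return true;
}
```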

Changes in v2:
- fixup commit subject
- add more information to commit message from mailing list discussion

 net/tipc/name_distr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
index 61219f0b9677..44abc8e9c990 100644
--- a/net/tipc/name_distr.c
+++ b/net/tipc/name_distr.c
@@ -190,7 +190,7 @@ void tipc_named_node_up(struct net *net, u32 dnode)
 	struct name_table *nt = tipc_name_table(net);
 	struct sk_buff_head head;
 
-	__skb_queue_head_init(&head);
+	skb_queue_head_init(&head);
 
 	read_lock_bh(&nt->cluster_scope_lock);
 	named_distribute(net, &head, dnode, &nt->cluster_scope);
-- 
2.22.0


^ permalink raw reply related

* [PATCH v1 0/6] Harden list_for_each_entry_rcu() and family
From: Joel Fernandes (Google) @ 2019-07-11 23:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Josh Triplett, keescook, kernel-hardening,
	Lai Jiangshan, Len Brown, linux-acpi, linux-pci, linux-pm,
	Mathieu Desnoyers, neilb, netdev, oleg, Paul E. McKenney,
	Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
	Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)

Hi,
This series adds lockdep checking to the RCU list macros.

RCU has a number of primitives for "consumption" of an RCU-protected pointer.
Most of these consumers check that such accesses happen either inside an RCU
reader section (as rcu_dereference{,_sched,_bh}() do) or under a lock (as
rcu_dereference_protected() does).

However, there are other ways to consume RCU pointers, such as
list_for_each_entry_rcu() or hlist_for_each_entry_rcu(). Unlike the
rcu_dereference family, these consumers do no lockdep checking at all. With
the growing number of RCU list uses (1000+), bugs that lockdep checks could
catch can creep in and go unnoticed.

Since last year's RCU consolidation work, the traditional RCU flavors (preempt,
bh, sched) are all consolidated. In other words, any of these flavors can
start a reader section, and all of them must end before the reader section is
considered unlocked. Thanks to this, we can generically check whether we are
in an RCU reader; this is what patch 1 does. Note that list_for_each_entry_rcu()
and family differ from the rcu_dereference family in that there is no _bh or
_sched version of these macros. They are used under many different RCU reader
flavors, and also under SRCU.

Patch 1 adds a new internal function, rcu_read_lock_any_held(), which checks
whether any reader section is active at all when these macros are called. If no
reader section exists, the optional fourth argument to
list_for_each_entry_rcu() can be a lockdep expression, which is then evaluated
(similar to how rcu_dereference_check() works). If no lockdep expression is
passed and we are not in a reader, a splat occurs. To see this, apply the
patches, then remove the lockdep expression with the following diff and see
what happens:

+++ b/arch/x86/pci/mmconfig-shared.c
@@ -55,7 +55,7 @@ static void list_add_sorted(struct pci_mmcfg_region *new)
        struct pci_mmcfg_region *cfg;

        /* keep list sorted by segment and starting bus number */
-       list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list, pci_mmcfg_lock_held()) {
+       list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
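
One way to implement such an optional trailing argument can be sketched in userspace C with the GNU `cond...` / `, ## cond` preprocessor extension (GCC and Clang). This is a simplified, hypothetical model: plain booleans stand in for rcu_read_lock_any_held() and the lockdep expression, and a counter stands in for the splat. It is not the code from patch 1:

```c
#include <stdbool.h>
#include <stddef.h>

static bool in_reader;	/* hypothetical stand-in for rcu_read_lock_any_held() */
static bool lock_held;	/* hypothetical stand-in for a lockdep expression */
static int splats;	/* counts would-be lockdep warnings */

/*
 * When cond is omitted, ", ## cond" drops the comma, so the trailing 0
 * becomes cond; when cond is given, the 0 lands in "extra" and is unused.
 */
#define sketch_check(dummy, cond, extra...) \
	((!(cond) && !in_reader) ? (void)splats++ : (void)0)

#define sketch_for_each(pos, head, cond...) \
	for (sketch_check(dummy, ## cond, 0), (pos) = (head); \
	     (pos); (pos) = (pos)->next)

struct sketch_node {
	int val;
	struct sketch_node *next;
};

/* Traversal without a lock expression: warns unless inside a reader. */
static int sum_plain(struct sketch_node *head)
{
	struct sketch_node *n;
	int sum = 0;

	sketch_for_each(n, head)
		sum += n->val;
	return sum;
}

/* Traversal that passes a lockdep-style expression as the extra argument. */
static int sum_locked(struct sketch_node *head)
{
	struct sketch_node *n;
	int sum = 0;

	sketch_for_each(n, head, lock_held)
		sum += n->val;
	return sum;
}
```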


The optional-argument trick in list_for_each_entry_rcu() could also be used in
the future to remove the rcu_dereference_{,bh,sched}_protected() APIs, by
passing an optional lockdep expression to rcu_dereference() itself. That would
eliminate three more RCU APIs.

Note that some list macro wrappers already do their own lockdep checking on the
caller side. These can be eliminated in favor of the built-in lockdep checking
in the list macro that this series adds. For example, workqueue code has an
assert_rcu_or_wq_mutex() function, which is called in for_each_wq(). This
series replaces it with the built-in check.

Also in the future, we can extend these checks to list_entry_rcu() and other
list macros as well, if needed.

Joel Fernandes (Google) (6):
rcu: Add support for consolidated-RCU reader checking
ipv4: add lockdep condition to fix for_each_entry
driver/core: Convert to use built-in RCU list checking
workqueue: Convert for_each_wq to use built-in list check
x86/pci: Pass lockdep condition to pcm_mmcfg_list iterator
acpi: Use built-in RCU list checking for acpi_ioremaps list

arch/x86/pci/mmconfig-shared.c |  5 +++--
drivers/acpi/osl.c             |  6 ++++--
drivers/base/base.h            |  1 +
drivers/base/core.c            | 10 ++++++++++
drivers/base/power/runtime.c   | 15 ++++++++++-----
include/linux/rculist.h        | 29 ++++++++++++++++++++++++-----
include/linux/rcupdate.h       |  7 +++++++
kernel/rcu/Kconfig.debug       | 11 +++++++++++
kernel/rcu/update.c            | 26 ++++++++++++++++++++++++++
kernel/workqueue.c             |  5 ++---
net/ipv4/fib_frontend.c        |  3 ++-
11 files changed, 100 insertions(+), 18 deletions(-)

--
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply

* [PATCH v1 2/6] ipv4: add lockdep condition to fix for_each_entry
From: Joel Fernandes (Google) @ 2019-07-11 23:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Josh Triplett, keescook, kernel-hardening,
	Lai Jiangshan, Len Brown, linux-acpi, linux-pci, linux-pm,
	Mathieu Desnoyers, neilb, netdev, oleg, Paul E. McKenney,
	Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
	Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190711234401.220336-1-joel@joelfernandes.org>

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 net/ipv4/fib_frontend.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index b298255f6fdb..ef7c9f8e8682 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -127,7 +127,8 @@ struct fib_table *fib_get_table(struct net *net, u32 id)
 	h = id & (FIB_TABLE_HASHSZ - 1);
 
 	head = &net->ipv4.fib_table_hash[h];
-	hlist_for_each_entry_rcu(tb, head, tb_hlist) {
+	hlist_for_each_entry_rcu(tb, head, tb_hlist,
+				 lockdep_rtnl_is_held()) {
 		if (tb->tb_id == id)
 			return tb;
 	}
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related

* [PATCH v1 3/6] driver/core: Convert to use built-in RCU list checking
From: Joel Fernandes (Google) @ 2019-07-11 23:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Josh Triplett, keescook, kernel-hardening,
	Lai Jiangshan, Len Brown, linux-acpi, linux-pci, linux-pm,
	Mathieu Desnoyers, neilb, netdev, oleg, Paul E. McKenney,
	Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
	Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190711234401.220336-1-joel@joelfernandes.org>

list_for_each_entry_rcu has built-in RCU and lock checking. Make use of
it in driver core.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 drivers/base/base.h          |  1 +
 drivers/base/core.c          | 10 ++++++++++
 drivers/base/power/runtime.c | 15 ++++++++++-----
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index b405436ee28e..0d32544b6f91 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -165,6 +165,7 @@ static inline int devtmpfs_init(void) { return 0; }
 /* Device links support */
 extern int device_links_read_lock(void);
 extern void device_links_read_unlock(int idx);
+extern int device_links_read_lock_held(void);
 extern int device_links_check_suppliers(struct device *dev);
 extern void device_links_driver_bound(struct device *dev);
 extern void device_links_driver_cleanup(struct device *dev);
diff --git a/drivers/base/core.c b/drivers/base/core.c
index fd7511e04e62..6c5ca9685647 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -68,6 +68,11 @@ void device_links_read_unlock(int idx)
 {
 	srcu_read_unlock(&device_links_srcu, idx);
 }
+
+int device_links_read_lock_held(void)
+{
+	return srcu_read_lock_held(&device_links_srcu);
+}
 #else /* !CONFIG_SRCU */
 static DECLARE_RWSEM(device_links_lock);
 
@@ -91,6 +96,11 @@ void device_links_read_unlock(int not_used)
 {
 	up_read(&device_links_lock);
 }
+
+int device_links_read_lock_held(void)
+{
+	return lock_is_held(&device_links_lock.dep_map);
+}
 #endif /* !CONFIG_SRCU */
 
 /**
diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 952a1e7057c7..7a10e8379a70 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -287,7 +287,8 @@ static int rpm_get_suppliers(struct device *dev)
 {
 	struct device_link *link;
 
-	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node) {
+	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
+				device_links_read_lock_held()) {
 		int retval;
 
 		if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
@@ -309,7 +310,8 @@ static void rpm_put_suppliers(struct device *dev)
 {
 	struct device_link *link;
 
-	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node) {
+	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
+				device_links_read_lock_held()) {
 		if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
 			continue;
 
@@ -1640,7 +1642,8 @@ void pm_runtime_clean_up_links(struct device *dev)
 
 	idx = device_links_read_lock();
 
-	list_for_each_entry_rcu(link, &dev->links.consumers, s_node) {
+	list_for_each_entry_rcu(link, &dev->links.consumers, s_node,
+				device_links_read_lock_held()) {
 		if (link->flags & DL_FLAG_STATELESS)
 			continue;
 
@@ -1662,7 +1665,8 @@ void pm_runtime_get_suppliers(struct device *dev)
 
 	idx = device_links_read_lock();
 
-	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
+	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
+				device_links_read_lock_held())
 		if (link->flags & DL_FLAG_PM_RUNTIME) {
 			link->supplier_preactivated = true;
 			refcount_inc(&link->rpm_active);
@@ -1683,7 +1687,8 @@ void pm_runtime_put_suppliers(struct device *dev)
 
 	idx = device_links_read_lock();
 
-	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
+	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
+				device_links_read_lock_held())
 		if (link->supplier_preactivated) {
 			link->supplier_preactivated = false;
 			if (refcount_dec_not_one(&link->rpm_active))
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related

* [PATCH v1 6/6] acpi: Use built-in RCU list checking for acpi_ioremaps list
From: Joel Fernandes (Google) @ 2019-07-11 23:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Josh Triplett, keescook, kernel-hardening,
	Lai Jiangshan, Len Brown, linux-acpi, linux-pci, linux-pm,
	Mathieu Desnoyers, neilb, netdev, oleg, Paul E. McKenney,
	Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
	Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190711234401.220336-1-joel@joelfernandes.org>

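list_for_each_entry_rcu() has built-in RCU and lock checking. Make use of
it for acpi_ioremaps list traversal by passing acpi_ioremap_lock_held() as
the lockdep condition, so that traversals done with acpi_ioremap_lock held
are not reported as RCU-list misuse.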
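
A minimal userspace sketch of the check this patch relies on, under a
simplified model: `rcu_read_depth`, `ioremap_lock_held`,
`check_rcu_list()`, `list_for_each_checked()` and `map_lookup()` are
hypothetical stand-ins for the kernel's RCU/lockdep state, for
list_for_each_entry_rcu() and for acpi_map_lookup(); they are not the
kernel implementation. The modelled iterator reports a warning only when
neither the caller-supplied condition nor an RCU read-side section is in
effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of lockdep state; not the kernel's data structures. */
static int rcu_read_depth;     /* > 0 inside a modelled rcu_read_lock() section */
static bool ioremap_lock_held; /* models lock_is_held(&acpi_ioremap_lock.dep_map) */
static int lockdep_warnings;   /* number of modelled "non-reader section" reports */

static void model_rcu_read_lock(void) { rcu_read_depth++; }

static void model_rcu_read_unlock(void)
{
	assert(rcu_read_depth > 0); /* catch unbalanced unlocks in the model */
	rcu_read_depth--;
}

/* Warn unless the caller's condition holds or an RCU reader is active. */
static void check_rcu_list(bool cond)
{
	if (!cond && rcu_read_depth == 0) {
		lockdep_warnings++;
		fprintf(stderr, "WARNING: RCU-list traversed in non-reader section!\n");
	}
}

/* Hypothetical iterator: run the check once, then walk a NULL-terminated list. */
#define list_for_each_checked(pos, head, cond) \
	for (check_rcu_list(cond), (pos) = (head); (pos) != NULL; (pos) = (pos)->next)

struct ioremap {
	unsigned long phys;
	unsigned long size;
	struct ioremap *next;
};

/* Same shape as acpi_map_lookup(): find a mapping covering [phys, phys + size). */
static struct ioremap *map_lookup(struct ioremap *head, unsigned long phys,
				  unsigned long size)
{
	struct ioremap *map;

	list_for_each_checked(map, head, ioremap_lock_held)
		if (map->phys <= phys && phys + size <= map->phys + map->size)
			return map;
	return NULL;
}
```

Passing `false` as the condition reproduces a traversal without a lockdep
expression, where only an RCU reader satisfies the check; passing the
lock-held flag mirrors what acpi_ioremap_lock_held() contributes in the
patch.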

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 drivers/acpi/osl.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index f29e427d0d1d..c8b5d712c7ae 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -28,6 +28,7 @@
 #include <linux/slab.h>
 #include <linux/mm.h>
 #include <linux/highmem.h>
+#include <linux/lockdep.h>
 #include <linux/pci.h>
 #include <linux/interrupt.h>
 #include <linux/kmod.h>
@@ -94,6 +95,7 @@ struct acpi_ioremap {
 
 static LIST_HEAD(acpi_ioremaps);
 static DEFINE_MUTEX(acpi_ioremap_lock);
+#define acpi_ioremap_lock_held() lock_is_held(&acpi_ioremap_lock.dep_map)
 
 static void __init acpi_request_region (struct acpi_generic_address *gas,
 	unsigned int length, char *desc)
@@ -220,7 +222,7 @@ acpi_map_lookup(acpi_physical_address phys, acpi_size size)
 {
 	struct acpi_ioremap *map;
 
-	list_for_each_entry_rcu(map, &acpi_ioremaps, list)
+	list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
 		if (map->phys <= phys &&
 		    phys + size <= map->phys + map->size)
 			return map;
@@ -263,7 +265,7 @@ acpi_map_lookup_virt(void __iomem *virt, acpi_size size)
 {
 	struct acpi_ioremap *map;
 
-	list_for_each_entry_rcu(map, &acpi_ioremaps, list)
+	list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
 		if (map->virt <= virt &&
 		    virt + size <= map->virt + map->size)
 			return map;
-- 
2.22.0.410.gd8fdbe21b5-goog


* [PATCH v1 5/6] x86/pci: Pass lockdep condition to pci_mmcfg_list iterator
From: Joel Fernandes (Google) @ 2019-07-11 23:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes (Google), Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Josh Triplett, keescook, kernel-hardening,
	Lai Jiangshan, Len Brown, linux-acpi, linux-pci, linux-pm,
	Mathieu Desnoyers, neilb, netdev, oleg, Paul E. McKenney,
	Pavel Machek, peterz, Rafael J. Wysocki, Rasmus Villemoes, rcu,
	Steven Rostedt, Tejun Heo, Thomas Gleixner, will,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190711234401.220336-1-joel@joelfernandes.org>

The pci_mmcfg_list is traversed with list_for_each_entry_rcu() without an
RCU read lock held, because pci_mmcfg_lock is already held. Make this
known to the list macro so that the lockdep checks newly added to
list_for_each_entry_rcu() do not emit false-positive warnings.
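
A similar userspace sketch for the update side, assuming a simplified
model: `struct model_dep_map` (a single `held` flag, whereas real lockdep
tracks held locks per task), `struct model_mutex`, `model_lock_is_held()`,
`check_list_cond()` and `add_sorted()` are hypothetical stand-ins, and the
list is a plain singly linked list rather than an RCU-protected list_head.
It shows why a traversal made while holding pci_mmcfg_lock passes the
check once the lock-held expression is supplied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified lockdep map: a single "held" flag. */
struct model_dep_map {
	bool held;
};

struct model_mutex {
	struct model_dep_map dep_map;
};

static struct model_mutex pci_mmcfg_lock; /* models the patch's mutex */
static int rcu_readers;                   /* > 0 inside a modelled RCU reader */
static int list_warnings;                 /* count of modelled lockdep reports */

static bool model_lock_is_held(const struct model_dep_map *map) { return map->held; }

static void model_mutex_lock(struct model_mutex *m)
{
	assert(!m->dep_map.held); /* no recursive locking in this model */
	m->dep_map.held = true;
}

static void model_mutex_unlock(struct model_mutex *m)
{
	assert(m->dep_map.held);
	m->dep_map.held = false;
}

/* Same shape as the patch's helper macro, but built on the model above. */
#define pci_mmcfg_lock_held() model_lock_is_held(&(pci_mmcfg_lock).dep_map)

/* Warn unless the caller's condition holds or an RCU reader is active. */
static void check_list_cond(bool cond)
{
	if (!cond && rcu_readers == 0) {
		list_warnings++;
		fprintf(stderr, "WARNING: RCU-list traversed in non-reader section!\n");
	}
}

struct mmcfg_region {
	int segment;
	int start_bus;
	struct mmcfg_region *next;
};

static struct mmcfg_region *mmcfg_list;

/* Keep the list sorted by segment, then start bus; caller should hold the lock. */
static void add_sorted(struct mmcfg_region *new)
{
	struct mmcfg_region **link;

	check_list_cond(pci_mmcfg_lock_held());
	for (link = &mmcfg_list; *link != NULL; link = &(*link)->next) {
		struct mmcfg_region *cfg = *link;

		if (cfg->segment > new->segment ||
		    (cfg->segment == new->segment &&
		     cfg->start_bus >= new->start_bus))
			break;
	}
	new->next = *link;
	*link = new;
}
```

Defining the condition once as a macro next to the mutex, as the patch
does with pci_mmcfg_lock_held(), lets every traversal site pass the same
expression.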

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 arch/x86/pci/mmconfig-shared.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 7389db538c30..6fa42e9c4e6f 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -29,6 +29,7 @@
 static bool pci_mmcfg_running_state;
 static bool pci_mmcfg_arch_init_failed;
 static DEFINE_MUTEX(pci_mmcfg_lock);
+#define pci_mmcfg_lock_held() lock_is_held(&(pci_mmcfg_lock).dep_map)
 
 LIST_HEAD(pci_mmcfg_list);
 
@@ -54,7 +55,7 @@ static void list_add_sorted(struct pci_mmcfg_region *new)
 	struct pci_mmcfg_region *cfg;
 
 	/* keep list sorted by segment and starting bus number */
-	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
+	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list, pci_mmcfg_lock_held()) {
 		if (cfg->segment > new->segment ||
 		    (cfg->segment == new->segment &&
 		     cfg->start_bus >= new->start_bus)) {
@@ -118,7 +119,7 @@ struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
 {
 	struct pci_mmcfg_region *cfg;
 
-	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
+	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list, pci_mmcfg_lock_held())
 		if (cfg->segment == segment &&
 		    cfg->start_bus <= bus && bus <= cfg->end_bus)
 			return cfg;
-- 
2.22.0.410.gd8fdbe21b5-goog

