Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites
From: Michal Hocko @ 2019-08-02  9:12 UTC (permalink / raw)
  To: john.hubbard
  Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Dave Hansen, Ira Weiny, Jan Kara, Jason Gunthorpe,
	Jérôme Glisse, LKML, amd-gfx, ceph-devel, devel, devel,
	dri-devel, intel-gfx, kvm, linux-arm-kernel, linux-block,
	linux-crypto, linux-fbdev, linux-fsdevel, linux-media, linux-mm,
	linux-nfs, linux-rdma, linux-rpi-kernel, linux-xfs, netdev,
	rds-devel, sparclinux, x86, xen-devel, John Hubbard
In-Reply-To: <20190802022005.5117-1-jhubbard@nvidia.com>

On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote:
[...]
> 2) Convert all of the call sites for get_user_pages*(), to
> invoke put_user_page*(), instead of put_page(). This involves dozens of
> call sites, and will take some time.

How do we make sure this is the case and it will remain the case in the
future? There must be some automagic to enforce/check that. It is simply
not manageable to do it every now and then because then 3) will simply
be never safe.

Have you considered coccinele or some other scripted way to do the
transition? I have no idea how to deal with future changes that would
break the balance though.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH net-next] net: can: Fix compiling warning
From: Sergei Shtylyov @ 2019-08-02  8:59 UTC (permalink / raw)
  To: Mao Wenan, socketcan, davem, netdev; +Cc: linux-kernel, kernel-janitors
In-Reply-To: <20190802033643.84243-1-maowenan@huawei.com>

Hello!

On 02.08.2019 6:36, Mao Wenan wrote:

> There are two warings in net/can, fix them by setting bcm_sock_no_ioctlcmd

    Warnings. :-)

> and raw_sock_no_ioctlcmd as static.
> 
> net/can/bcm.c:1683:5: warning: symbol 'bcm_sock_no_ioctlcmd' was not declared. Should it be static?
> net/can/raw.c:840:5: warning: symbol 'raw_sock_no_ioctlcmd' was not declared. Should it be static?
> 
> Fixes: 473d924d7d46 ("can: fix ioctl function removal")
> 
> Signed-off-by: Mao Wenan <maowenan@huawei.com>
[...]

MBR, Sergei

^ permalink raw reply

* Re: [v2,1/2] tools: bpftool: add net attach command to attach XDP on interface
From: Daniel T. Lee @ 2019-08-02  8:49 UTC (permalink / raw)
  To: Y Song; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <CAH3MdRXkr5oD=yTr8qevFMLBuLyv1v-E7BLme8n2FA8+uPe6sg@mail.gmail.com>

On Fri, Aug 2, 2019 at 3:23 PM Y Song <ys114321@gmail.com> wrote:
>
> On Thu, Aug 1, 2019 at 2:04 AM Daniel T. Lee <danieltimlee@gmail.com> wrote:
> >
> > By this commit, using `bpftool net attach`, user can attach XDP prog on
> > interface. New type of enum 'net_attach_type' has been made, as stated at
> > cover-letter, the meaning of 'attach' is, prog will be attached on interface.
> >
> > BPF prog will be attached through libbpf 'bpf_set_link_xdp_fd'.
> >
> > Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> > ---
> > Changes in v2:
> >   - command 'load' changed to 'attach' for the consistency
> >   - 'NET_ATTACH_TYPE_XDP_DRIVE' changed to 'NET_ATTACH_TYPE_XDP_DRIVER'
> >
> >  tools/bpf/bpftool/net.c | 107 +++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 106 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
> > index 67e99c56bc88..f3b57660b303 100644
> > --- a/tools/bpf/bpftool/net.c
> > +++ b/tools/bpf/bpftool/net.c
> > @@ -55,6 +55,35 @@ struct bpf_attach_info {
> >         __u32 flow_dissector_id;
> >  };
> >
> > +enum net_attach_type {
> > +       NET_ATTACH_TYPE_XDP,
> > +       NET_ATTACH_TYPE_XDP_GENERIC,
> > +       NET_ATTACH_TYPE_XDP_DRIVER,
> > +       NET_ATTACH_TYPE_XDP_OFFLOAD,
> > +       __MAX_NET_ATTACH_TYPE
> > +};
> > +
> > +static const char * const attach_type_strings[] = {
> > +       [NET_ATTACH_TYPE_XDP] = "xdp",
> > +       [NET_ATTACH_TYPE_XDP_GENERIC] = "xdpgeneric",
> > +       [NET_ATTACH_TYPE_XDP_DRIVER] = "xdpdrv",
> > +       [NET_ATTACH_TYPE_XDP_OFFLOAD] = "xdpoffload",
> > +       [__MAX_NET_ATTACH_TYPE] = NULL,
> > +};
> > +
> > +static enum net_attach_type parse_attach_type(const char *str)
> > +{
> > +       enum net_attach_type type;
> > +
> > +       for (type = 0; type < __MAX_NET_ATTACH_TYPE; type++) {
> > +               if (attach_type_strings[type] &&
> > +                  is_prefix(str, attach_type_strings[type]))
> > +                       return type;
> > +       }
> > +
> > +       return __MAX_NET_ATTACH_TYPE;
> > +}
> > +
> >  static int dump_link_nlmsg(void *cookie, void *msg, struct nlattr **tb)
> >  {
> >         struct bpf_netdev_t *netinfo = cookie;
> > @@ -223,6 +252,77 @@ static int query_flow_dissector(struct bpf_attach_info *attach_info)
> >         return 0;
> >  }
> >
> > +static int parse_attach_args(int argc, char **argv, int *progfd,
> > +                            enum net_attach_type *attach_type, int *ifindex)
> > +{
> > +       if (!REQ_ARGS(3))
> > +               return -EINVAL;
> > +
> > +       *progfd = prog_parse_fd(&argc, &argv);
> > +       if (*progfd < 0)
> > +               return *progfd;
> > +
> > +       *attach_type = parse_attach_type(*argv);
> > +       if (*attach_type == __MAX_NET_ATTACH_TYPE) {
> > +               p_err("invalid net attach/detach type");
> > +               return -EINVAL;
> > +       }
> > +
> > +       NEXT_ARG();
> > +       if (!REQ_ARGS(1))
> > +               return -EINVAL;
> > +
> > +       *ifindex = if_nametoindex(*argv);
> > +       if (!*ifindex) {
> > +               p_err("Invalid ifname");
>
> Do you want to use the full function name "invalid if_nametoindex" here?
>

No. I was trying to fix the message as "Invalid devname", since the
word "devanme"
is mentioned in 'bpftool net help'.

> > +               return -EINVAL;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static int do_attach_detach_xdp(int *progfd, enum net_attach_type *attach_type,
> > +                               int *ifindex)
>
> You can just use plain int as the argument type here.
>

I will change the parameter as pass by value.

> > +{
> > +       __u32 flags;
> > +       int err;
> > +
> > +       flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
> > +       if (*attach_type == NET_ATTACH_TYPE_XDP_GENERIC)
> > +               flags |= XDP_FLAGS_SKB_MODE;
> > +       if (*attach_type == NET_ATTACH_TYPE_XDP_DRIVER)
> > +               flags |= XDP_FLAGS_DRV_MODE;
> > +       if (*attach_type == NET_ATTACH_TYPE_XDP_OFFLOAD)
> > +               flags |= XDP_FLAGS_HW_MODE;
> > +
> > +       err = bpf_set_link_xdp_fd(*ifindex, *progfd, flags);
> > +
> > +       return err;
>
> Just do "return bpf_set_link_xdp_fd(...)" and you do not need variable err.
>

Oh. I've misunderstood why variable err won't be needed.
I'll remove the variable err and update to it.

> > +}
> > +
> > +static int do_attach(int argc, char **argv)
> > +{
> > +       enum net_attach_type attach_type;
> > +       int err, progfd, ifindex;
> > +
> > +       err = parse_attach_args(argc, argv, &progfd, &attach_type, &ifindex);
> > +       if (err)
> > +               return err;
> > +
> > +       if (is_prefix("xdp", attach_type_strings[attach_type]))
> > +               err = do_attach_detach_xdp(&progfd, &attach_type, &ifindex);
> > +
> > +       if (err < 0) {
> > +               p_err("link set %s failed", attach_type_strings[attach_type]);
> > +               return -1;
> > +       }
>
> The above "if (err < 0)" can be true only under the above "if (is_prefix(..))"
> conditions. But compiler may optimize this. So I think current form is okay.
> But could you change the return value to "return err" instead of "return -1"?
>

Okay. I'll update the return value to "return err".

> > +
> > +       if (json_output)
> > +               jsonw_null(json_wtr);
> > +
> > +       return 0;
> > +}
> > +
> >  static int do_show(int argc, char **argv)
> >  {
> >         struct bpf_attach_info attach_info = {};
> > @@ -305,13 +405,17 @@ static int do_help(int argc, char **argv)
> >
> >         fprintf(stderr,
> >                 "Usage: %s %s { show | list } [dev <devname>]\n"
> > +               "       %s %s attach PROG LOAD_TYPE <devname>\n"
> >                 "       %s %s help\n"
> > +               "\n"
> > +               "       " HELP_SPEC_PROGRAM "\n"
> > +               "       LOAD_TYPE := { xdp | xdpgeneric | xdpdrv | xdpoffload }\n"
> >                 "Note: Only xdp and tc attachments are supported now.\n"
> >                 "      For progs attached to cgroups, use \"bpftool cgroup\"\n"
> >                 "      to dump program attachments. For program types\n"
> >                 "      sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
> >                 "      consult iproute2.\n",
> > -               bin_name, argv[-2], bin_name, argv[-2]);
> > +               bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
> >
> >         return 0;
> >  }
> > @@ -319,6 +423,7 @@ static int do_help(int argc, char **argv)
> >  static const struct cmd cmds[] = {
> >         { "show",       do_show },
> >         { "list",       do_show },
> > +       { "attach",     do_attach },
> >         { "help",       do_help },
> >         { 0 }
> >  };
> > --
> > 2.20.1
> >

Thank you for the review :)

^ permalink raw reply

* [patch 1/1] drivers/net/ethernet/marvell/mvmdio.c: Fix non OF case
From: Arnaud Patard @ 2019-08-02  8:32 UTC (permalink / raw)
  To: netdev; +Cc: Andrew Lunn, Arnaud Patard

Orion5.x systems are still using machine files and not device-tree.
Commit 96cb4342382290c9 ("net: mvmdio: allow up to three clocks to be
specified for orion-mdio") has replaced devm_clk_get() with of_clk_get(),
leading to a oops at boot and not working network, as reported in 
https://lists.debian.org/debian-arm/2019/07/msg00088.html and possibly in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=908712.

Link: https://lists.debian.org/debian-arm/2019/07/msg00088.html
Fixes: 96cb4342382290c9 ("net: mvmdio: allow up to three clocks to be specified for orion-mdio")
Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Index: linux-next/drivers/net/ethernet/marvell/mvmdio.c
===================================================================
--- linux-next.orig/drivers/net/ethernet/marvell/mvmdio.c
+++ linux-next/drivers/net/ethernet/marvell/mvmdio.c
@@ -319,20 +319,33 @@ static int orion_mdio_probe(struct platf
 
 	init_waitqueue_head(&dev->smi_busy_wait);
 
-	for (i = 0; i < ARRAY_SIZE(dev->clk); i++) {
-		dev->clk[i] = of_clk_get(pdev->dev.of_node, i);
-		if (PTR_ERR(dev->clk[i]) == -EPROBE_DEFER) {
+	if (pdev->dev.of_node) {
+		for (i = 0; i < ARRAY_SIZE(dev->clk); i++) {
+			dev->clk[i] = of_clk_get(pdev->dev.of_node, i);
+			if (PTR_ERR(dev->clk[i]) == -EPROBE_DEFER) {
+				ret = -EPROBE_DEFER;
+				goto out_clk;
+			}
+			if (IS_ERR(dev->clk[i]))
+				break;
+			clk_prepare_enable(dev->clk[i]);
+		}
+
+		if (!IS_ERR(of_clk_get(pdev->dev.of_node,
+				       ARRAY_SIZE(dev->clk))))
+			dev_warn(&pdev->dev,
+				 "unsupported number of clocks, limiting to the first "
+				 __stringify(ARRAY_SIZE(dev->clk)) "\n");
+	} else {
+		dev->clk[0] = clk_get(&pdev->dev, NULL);
+		if (PTR_ERR(dev->clk[0]) == -EPROBE_DEFER) {
 			ret = -EPROBE_DEFER;
 			goto out_clk;
 		}
-		if (IS_ERR(dev->clk[i]))
-			break;
-		clk_prepare_enable(dev->clk[i]);
+		if (!IS_ERR(dev->clk[0]))
+			clk_prepare_enable(dev->clk[0]);
 	}
 
-	if (!IS_ERR(of_clk_get(pdev->dev.of_node, ARRAY_SIZE(dev->clk))))
-		dev_warn(&pdev->dev, "unsupported number of clocks, limiting to the first "
-			 __stringify(ARRAY_SIZE(dev->clk)) "\n");
 
 	dev->err_interrupt = platform_get_irq(pdev, 0);
 	if (dev->err_interrupt > 0 &&



^ permalink raw reply

* [PATCH net] net/smc: avoid fallback in case of non-blocking connect
From: Karsten Graul @ 2019-08-02  8:47 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, gor, heiko.carstens, raspl, ubraun

From: Ursula Braun <ubraun@linux.ibm.com>

FASTOPEN is not possible with SMC. sendmsg() with msg_flag MSG_FASTOPEN
triggers a fallback to TCP if the socket is in state SMC_INIT.
But if a nonblocking connect is already started, fallback to TCP
is no longer possible, even though the socket may still be in state
SMC_INIT.
And if a nonblocking connect is already started, a listen() call
does not make sense.

Reported-by: syzbot+bd8cc73d665590a1fcad@syzkaller.appspotmail.com
Fixes: 50717a37db032 ("net/smc: nonblocking connect rework")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
---
 net/smc/af_smc.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index f5ea09258ab0..5b932583e407 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -263,7 +263,7 @@ static int smc_bind(struct socket *sock, struct sockaddr *uaddr,
 
 	/* Check if socket is already active */
 	rc = -EINVAL;
-	if (sk->sk_state != SMC_INIT)
+	if (sk->sk_state != SMC_INIT || smc->connect_nonblock)
 		goto out_rel;
 
 	smc->clcsock->sk->sk_reuse = sk->sk_reuse;
@@ -1390,7 +1390,8 @@ static int smc_listen(struct socket *sock, int backlog)
 	lock_sock(sk);
 
 	rc = -EINVAL;
-	if ((sk->sk_state != SMC_INIT) && (sk->sk_state != SMC_LISTEN))
+	if ((sk->sk_state != SMC_INIT && sk->sk_state != SMC_LISTEN) ||
+	    smc->connect_nonblock)
 		goto out;
 
 	rc = 0;
@@ -1518,7 +1519,7 @@ static int smc_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
 		goto out;
 
 	if (msg->msg_flags & MSG_FASTOPEN) {
-		if (sk->sk_state == SMC_INIT) {
+		if (sk->sk_state == SMC_INIT && !smc->connect_nonblock) {
 			smc_switch_to_fallback(smc);
 			smc->fallback_rsn = SMC_CLC_DECL_OPTUNSUPP;
 		} else {
-- 
2.21.0


^ permalink raw reply related

* [PATCH v2 2/2] cxgb4: smt: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-02  8:41 UTC (permalink / raw)
  Cc: Vishal Kulkarni, David S . Miller, netdev, linux-kernel,
	Chuhong Yuan

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
Changes in v2:
  - Convert refcount from 0-base to 1-base.

 drivers/net/ethernet/chelsio/cxgb4/smt.c | 14 +++++++-------
 drivers/net/ethernet/chelsio/cxgb4/smt.h |  2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/smt.c b/drivers/net/ethernet/chelsio/cxgb4/smt.c
index eaf1fb74689c..343887fa52aa 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/smt.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/smt.c
@@ -57,7 +57,7 @@ struct smt_data *t4_init_smt(void)
 		s->smtab[i].state = SMT_STATE_UNUSED;
 		memset(&s->smtab[i].src_mac, 0, ETH_ALEN);
 		spin_lock_init(&s->smtab[i].lock);
-		atomic_set(&s->smtab[i].refcnt, 0);
+		refcount_set(&s->smtab[i].refcnt, 1);
 	}
 	return s;
 }
@@ -68,7 +68,7 @@ static struct smt_entry *find_or_alloc_smte(struct smt_data *s, u8 *smac)
 	struct smt_entry *e, *end;
 
 	for (e = &s->smtab[0], end = &s->smtab[s->smt_size]; e != end; ++e) {
-		if (atomic_read(&e->refcnt) == 0) {
+		if (refcount_read(&e->refcnt) == 1) {
 			if (!first_free)
 				first_free = e;
 		} else {
@@ -98,7 +98,7 @@ static struct smt_entry *find_or_alloc_smte(struct smt_data *s, u8 *smac)
 static void t4_smte_free(struct smt_entry *e)
 {
 	spin_lock_bh(&e->lock);
-	if (atomic_read(&e->refcnt) == 0) {  /* hasn't been recycled */
+	if (refcount_read(&e->refcnt) == 1) {  /* hasn't been recycled */
 		e->state = SMT_STATE_UNUSED;
 	}
 	spin_unlock_bh(&e->lock);
@@ -111,7 +111,7 @@ static void t4_smte_free(struct smt_entry *e)
  */
 void cxgb4_smt_release(struct smt_entry *e)
 {
-	if (atomic_dec_and_test(&e->refcnt))
+	if (refcount_dec_and_test(&e->refcnt))
 		t4_smte_free(e);
 }
 EXPORT_SYMBOL(cxgb4_smt_release);
@@ -215,14 +215,14 @@ static struct smt_entry *t4_smt_alloc_switching(struct adapter *adap, u16 pfvf,
 	e = find_or_alloc_smte(s, smac);
 	if (e) {
 		spin_lock(&e->lock);
-		if (!atomic_read(&e->refcnt)) {
-			atomic_set(&e->refcnt, 1);
+		if (refcount_read(&e->refcnt) == 1) {
+			refcount_set(&e->refcnt, 2);
 			e->state = SMT_STATE_SWITCHING;
 			e->pfvf = pfvf;
 			memcpy(e->src_mac, smac, ETH_ALEN);
 			write_smt_entry(adap, e);
 		} else {
-			atomic_inc(&e->refcnt);
+			refcount_inc(&e->refcnt);
 		}
 		spin_unlock(&e->lock);
 	}
diff --git a/drivers/net/ethernet/chelsio/cxgb4/smt.h b/drivers/net/ethernet/chelsio/cxgb4/smt.h
index d6c2cc271398..4774606a0101 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/smt.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/smt.h
@@ -59,7 +59,7 @@ struct smt_entry {
 	u16 idx;
 	u16 pfvf;
 	u8 src_mac[ETH_ALEN];
-	atomic_t refcnt;
+	refcount_t refcnt;
 	spinlock_t lock;	/* protect smt entry add,removal */
 };
 
-- 
2.20.1


^ permalink raw reply related

* [PATCH v2 1/2] cxgb4: sched: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-02  8:41 UTC (permalink / raw)
  Cc: Vishal Kulkarni, David S . Miller, netdev, linux-kernel,
	Chuhong Yuan

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
Changes in v2:
  - Convert refcount from 0-base to 1-base.

 drivers/net/ethernet/chelsio/cxgb4/sched.c | 8 ++++----
 drivers/net/ethernet/chelsio/cxgb4/sched.h | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/sched.c b/drivers/net/ethernet/chelsio/cxgb4/sched.c
index 60218dc676a8..0d902d125be4 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sched.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sched.c
@@ -173,7 +173,7 @@ static int t4_sched_queue_unbind(struct port_info *pi, struct ch_sched_queue *p)
 
 		list_del(&qe->list);
 		kvfree(qe);
-		if (atomic_dec_and_test(&e->refcnt)) {
+		if (refcount_dec_and_test(&e->refcnt)) {
 			e->state = SCHED_STATE_UNUSED;
 			memset(&e->info, 0, sizeof(e->info));
 		}
@@ -216,7 +216,7 @@ static int t4_sched_queue_bind(struct port_info *pi, struct ch_sched_queue *p)
 		goto out_err;
 
 	list_add_tail(&qe->list, &e->queue_list);
-	atomic_inc(&e->refcnt);
+	refcount_inc(&e->refcnt);
 	return err;
 
 out_err:
@@ -434,7 +434,7 @@ static struct sched_class *t4_sched_class_alloc(struct port_info *pi,
 		if (err)
 			return NULL;
 		memcpy(&e->info, &np, sizeof(e->info));
-		atomic_set(&e->refcnt, 0);
+		refcount_set(&e->refcnt, 1);
 		e->state = SCHED_STATE_ACTIVE;
 	}
 
@@ -488,7 +488,7 @@ struct sched_table *t4_init_sched(unsigned int sched_size)
 		s->tab[i].idx = i;
 		s->tab[i].state = SCHED_STATE_UNUSED;
 		INIT_LIST_HEAD(&s->tab[i].queue_list);
-		atomic_set(&s->tab[i].refcnt, 0);
+		refcount_set(&s->tab[i].refcnt, 1);
 	}
 	return s;
 }
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sched.h b/drivers/net/ethernet/chelsio/cxgb4/sched.h
index 168fb4ce3759..23a6ca1e6d3e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sched.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/sched.h
@@ -69,7 +69,7 @@ struct sched_class {
 	u8 idx;
 	struct ch_sched_params info;
 	struct list_head queue_list;
-	atomic_t refcnt;
+	refcount_t refcnt;
 };
 
 struct sched_table {      /* per port scheduling table */
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next] net: can: Fix compiling warning
From: maowenan @ 2019-08-02  8:39 UTC (permalink / raw)
  To: Oliver Hartkopp, davem, netdev; +Cc: linux-kernel, kernel-janitors
In-Reply-To: <0050efdb-af9f-49b9-8d83-f574b3d46a2e@hartkopp.net>



On 2019/8/2 16:10, Oliver Hartkopp wrote:
> On 02/08/2019 05.36, Mao Wenan wrote:
>> There are two warings in net/can, fix them by setting bcm_sock_no_ioctlcmd
>> and raw_sock_no_ioctlcmd as static.
>>
>> net/can/bcm.c:1683:5: warning: symbol 'bcm_sock_no_ioctlcmd' was not declared. Should it be static?
>> net/can/raw.c:840:5: warning: symbol 'raw_sock_no_ioctlcmd' was not declared. Should it be static?
>>
>> Fixes: 473d924d7d46 ("can: fix ioctl function removal")
>>
>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> 
> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
> 
> Thanks Mao!
> 
> Btw. what kind of compiler/make switches are you using so that I can see these warnings myself the next time?
> 

I use ARCH=mips CROSS_COMPILE=mips-linux-gnu- .

> Best regards,
> Oliver
> 
>> ---
>>   net/can/bcm.c | 2 +-
>>   net/can/raw.c | 2 +-
>>   2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/can/bcm.c b/net/can/bcm.c
>> index bf1d0bbecec8..b8a32b4ac368 100644
>> --- a/net/can/bcm.c
>> +++ b/net/can/bcm.c
>> @@ -1680,7 +1680,7 @@ static int bcm_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
>>       return size;
>>   }
>>   -int bcm_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
>> +static int bcm_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
>>                unsigned long arg)
>>   {
>>       /* no ioctls for socket layer -> hand it down to NIC layer */
>> diff --git a/net/can/raw.c b/net/can/raw.c
>> index da386f1fa815..a01848ff9b12 100644
>> --- a/net/can/raw.c
>> +++ b/net/can/raw.c
>> @@ -837,7 +837,7 @@ static int raw_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
>>       return size;
>>   }
>>   -int raw_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
>> +static int raw_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
>>                unsigned long arg)
>>   {
>>       /* no ioctls for socket layer -> hand it down to NIC layer */
>>
> 
> .
> 


^ permalink raw reply

* [PATCH v2 2/2] cxgb4: smt: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-02  8:35 UTC (permalink / raw)
  Cc: Vishal Kulkarni, David S . Miller, netdev, linux-kernel,
	Chuhong Yuan

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
Changes in v2:
  - Convert refcount from 0-base to 1-base.

 drivers/net/ethernet/chelsio/cxgb4/smt.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/smt.c b/drivers/net/ethernet/chelsio/cxgb4/smt.c
index eaf1fb74689c..343887fa52aa 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/smt.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/smt.c
@@ -57,7 +57,7 @@ struct smt_data *t4_init_smt(void)
 		s->smtab[i].state = SMT_STATE_UNUSED;
 		memset(&s->smtab[i].src_mac, 0, ETH_ALEN);
 		spin_lock_init(&s->smtab[i].lock);
-		atomic_set(&s->smtab[i].refcnt, 0);
+		refcount_set(&s->smtab[i].refcnt, 1);
 	}
 	return s;
 }
@@ -68,7 +68,7 @@ static struct smt_entry *find_or_alloc_smte(struct smt_data *s, u8 *smac)
 	struct smt_entry *e, *end;
 
 	for (e = &s->smtab[0], end = &s->smtab[s->smt_size]; e != end; ++e) {
-		if (atomic_read(&e->refcnt) == 0) {
+		if (refcount_read(&e->refcnt) == 1) {
 			if (!first_free)
 				first_free = e;
 		} else {
@@ -98,7 +98,7 @@ static struct smt_entry *find_or_alloc_smte(struct smt_data *s, u8 *smac)
 static void t4_smte_free(struct smt_entry *e)
 {
 	spin_lock_bh(&e->lock);
-	if (atomic_read(&e->refcnt) == 0) {  /* hasn't been recycled */
+	if (refcount_read(&e->refcnt) == 1) {  /* hasn't been recycled */
 		e->state = SMT_STATE_UNUSED;
 	}
 	spin_unlock_bh(&e->lock);
@@ -111,7 +111,7 @@ static void t4_smte_free(struct smt_entry *e)
  */
 void cxgb4_smt_release(struct smt_entry *e)
 {
-	if (atomic_dec_and_test(&e->refcnt))
+	if (refcount_dec_and_test(&e->refcnt))
 		t4_smte_free(e);
 }
 EXPORT_SYMBOL(cxgb4_smt_release);
@@ -215,14 +215,14 @@ static struct smt_entry *t4_smt_alloc_switching(struct adapter *adap, u16 pfvf,
 	e = find_or_alloc_smte(s, smac);
 	if (e) {
 		spin_lock(&e->lock);
-		if (!atomic_read(&e->refcnt)) {
-			atomic_set(&e->refcnt, 1);
+		if (refcount_read(&e->refcnt) == 1) {
+			refcount_set(&e->refcnt, 2);
 			e->state = SMT_STATE_SWITCHING;
 			e->pfvf = pfvf;
 			memcpy(e->src_mac, smac, ETH_ALEN);
 			write_smt_entry(adap, e);
 		} else {
-			atomic_inc(&e->refcnt);
+			refcount_inc(&e->refcnt);
 		}
 		spin_unlock(&e->lock);
 	}
-- 
2.20.1


^ permalink raw reply related

* [PATCH v2 1/2] cxgb4: sched: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-02  8:35 UTC (permalink / raw)
  Cc: Vishal Kulkarni, David S . Miller, netdev, linux-kernel,
	Chuhong Yuan

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
Changes in v2:
  - Convert refcount from 0-base to 1-base.

 drivers/net/ethernet/chelsio/cxgb4/sched.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/sched.c b/drivers/net/ethernet/chelsio/cxgb4/sched.c
index 60218dc676a8..0d902d125be4 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sched.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sched.c
@@ -173,7 +173,7 @@ static int t4_sched_queue_unbind(struct port_info *pi, struct ch_sched_queue *p)
 
 		list_del(&qe->list);
 		kvfree(qe);
-		if (atomic_dec_and_test(&e->refcnt)) {
+		if (refcount_dec_and_test(&e->refcnt)) {
 			e->state = SCHED_STATE_UNUSED;
 			memset(&e->info, 0, sizeof(e->info));
 		}
@@ -216,7 +216,7 @@ static int t4_sched_queue_bind(struct port_info *pi, struct ch_sched_queue *p)
 		goto out_err;
 
 	list_add_tail(&qe->list, &e->queue_list);
-	atomic_inc(&e->refcnt);
+	refcount_inc(&e->refcnt);
 	return err;
 
 out_err:
@@ -434,7 +434,7 @@ static struct sched_class *t4_sched_class_alloc(struct port_info *pi,
 		if (err)
 			return NULL;
 		memcpy(&e->info, &np, sizeof(e->info));
-		atomic_set(&e->refcnt, 0);
+		refcount_set(&e->refcnt, 1);
 		e->state = SCHED_STATE_ACTIVE;
 	}
 
@@ -488,7 +488,7 @@ struct sched_table *t4_init_sched(unsigned int sched_size)
 		s->tab[i].idx = i;
 		s->tab[i].state = SCHED_STATE_UNUSED;
 		INIT_LIST_HEAD(&s->tab[i].queue_list);
-		atomic_set(&s->tab[i].refcnt, 0);
+		refcount_set(&s->tab[i].refcnt, 1);
 	}
 	return s;
 }
-- 
2.20.1


^ permalink raw reply related

* Re: [v2,2/2] tools: bpftool: add net detach command to detach XDP on interface
From: Daniel T. Lee @ 2019-08-02  8:33 UTC (permalink / raw)
  To: Y Song; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <CAH3MdRX_OCnN82GESHBD+-wZvZqo7fba0ExDyqTh_3_tfRR1Nw@mail.gmail.com>

On Fri, Aug 2, 2019 at 3:26 PM Y Song <ys114321@gmail.com> wrote:
>
> On Thu, Aug 1, 2019 at 2:04 AM Daniel T. Lee <danieltimlee@gmail.com> wrote:
> >
> > By this commit, using `bpftool net detach`, the attached XDP prog can
> > be detached. Detaching the BPF prog will be done through libbpf
> > 'bpf_set_link_xdp_fd' with the progfd set to -1.
> >
> > Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> > ---
> > Changes in v2:
> >   - command 'unload' changed to 'detach' for the consistency
> >
> >  tools/bpf/bpftool/net.c | 55 ++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 54 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
> > index f3b57660b303..2ae9a613b05c 100644
> > --- a/tools/bpf/bpftool/net.c
> > +++ b/tools/bpf/bpftool/net.c
> > @@ -281,6 +281,31 @@ static int parse_attach_args(int argc, char **argv, int *progfd,
> >         return 0;
> >  }
> >
> > +static int parse_detach_args(int argc, char **argv,
> > +                            enum net_attach_type *attach_type, int *ifindex)
> > +{
> > +       if (!REQ_ARGS(2))
> > +               return -EINVAL;
> > +
> > +       *attach_type = parse_attach_type(*argv);
> > +       if (*attach_type == __MAX_NET_ATTACH_TYPE) {
> > +               p_err("invalid net attach/detach type");
> > +               return -EINVAL;
> > +       }
> > +
> > +       NEXT_ARG();
> > +       if (!REQ_ARGS(1))
> > +               return -EINVAL;
> > +
> > +       *ifindex = if_nametoindex(*argv);
> > +       if (!*ifindex) {
> > +               p_err("Invalid ifname");
> > +               return -EINVAL;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> >  static int do_attach_detach_xdp(int *progfd, enum net_attach_type *attach_type,
> >                                 int *ifindex)
> >  {
> > @@ -323,6 +348,31 @@ static int do_attach(int argc, char **argv)
> >         return 0;
> >  }
> >
> > +static int do_detach(int argc, char **argv)
> > +{
> > +       enum net_attach_type attach_type;
> > +       int err, progfd, ifindex;
> > +
> > +       err = parse_detach_args(argc, argv, &attach_type, &ifindex);
> > +       if (err)
> > +               return err;
> > +
> > +       /* to detach xdp prog */
> > +       progfd = -1;
> > +       if (is_prefix("xdp", attach_type_strings[attach_type]))
> > +               err = do_attach_detach_xdp(&progfd, &attach_type, &ifindex);
>
> Similar to previous patch, parameters no need to be pointer.
>

I will change the parameter as pass by value.

> > +
> > +       if (err < 0) {
> > +               p_err("link set %s failed", attach_type_strings[attach_type]);
> > +               return -1;
>
> Maybe "return err"?
>

Hadn't thought of that.
I will change to it!

> > +       }
> > +
> > +       if (json_output)
> > +               jsonw_null(json_wtr);
> > +
> > +       return 0;
> > +}
> > +
> >  static int do_show(int argc, char **argv)
> >  {
> >         struct bpf_attach_info attach_info = {};
> > @@ -406,6 +456,7 @@ static int do_help(int argc, char **argv)
> >         fprintf(stderr,
> >                 "Usage: %s %s { show | list } [dev <devname>]\n"
> >                 "       %s %s attach PROG LOAD_TYPE <devname>\n"
> > +               "       %s %s detach LOAD_TYPE <devname>\n"
> >                 "       %s %s help\n"
> >                 "\n"
> >                 "       " HELP_SPEC_PROGRAM "\n"
> > @@ -415,7 +466,8 @@ static int do_help(int argc, char **argv)
> >                 "      to dump program attachments. For program types\n"
> >                 "      sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
> >                 "      consult iproute2.\n",
> > -               bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
> > +               bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
> > +               bin_name, argv[-2]);
> >
> >         return 0;
> >  }
> > @@ -424,6 +476,7 @@ static const struct cmd cmds[] = {
> >         { "show",       do_show },
> >         { "list",       do_show },
> >         { "attach",     do_attach },
> > +       { "detach",     do_detach },
> >         { "help",       do_help },
> >         { 0 }
> >  };
> > --
> > 2.20.1
> >

Thanks for the review!

^ permalink raw reply

* [PATCH net] net/smc: do not schedule tx_work in SMC_CLOSED state
From: Karsten Graul @ 2019-08-02  8:16 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, gor, heiko.carstens, raspl, ubraun

From: Ursula Braun <ubraun@linux.ibm.com>

The setsockopts options TCP_NODELAY and TCP_CORK may schedule the
tx worker. Make sure the socket is not yet moved into SMC_CLOSED
state (for instance by a shutdown SHUT_RDWR call).

Reported-by: syzbot+92209502e7aab127c75f@syzkaller.appspotmail.com
Reported-by: syzbot+b972214bb803a343f4fe@syzkaller.appspotmail.com
Fixes: 01d2f7e2cdd31 ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
---
 net/smc/af_smc.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 302e355f2ebc..f5ea09258ab0 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1732,14 +1732,18 @@ static int smc_setsockopt(struct socket *sock, int level, int optname,
 		}
 		break;
 	case TCP_NODELAY:
-		if (sk->sk_state != SMC_INIT && sk->sk_state != SMC_LISTEN) {
+		if (sk->sk_state != SMC_INIT &&
+		    sk->sk_state != SMC_LISTEN &&
+		    sk->sk_state != SMC_CLOSED) {
 			if (val && !smc->use_fallback)
 				mod_delayed_work(system_wq, &smc->conn.tx_work,
 						 0);
 		}
 		break;
 	case TCP_CORK:
-		if (sk->sk_state != SMC_INIT && sk->sk_state != SMC_LISTEN) {
+		if (sk->sk_state != SMC_INIT &&
+		    sk->sk_state != SMC_LISTEN &&
+		    sk->sk_state != SMC_CLOSED) {
 			if (!val && !smc->use_fallback)
 				mod_delayed_work(system_wq, &smc->conn.tx_work,
 						 0);
-- 
2.21.0


^ permalink raw reply related

* RE: [net-next 2/9] i40e: make visible changed vf mac on host
From: Loktionov, Aleksandr @ 2019-08-02  8:14 UTC (permalink / raw)
  To: Shannon Nelson, Kirsher, Jeffrey T, davem@davemloft.net
  Cc: netdev@vger.kernel.org, nhorman@redhat.com, sassmann@redhat.com,
	Bowers, AndrewX
In-Reply-To: <9a3a4675-b031-7666-f259-978d18b6db19@pensando.io>

Good day Nelson

In 99% cases VF has _only one_ unicast mac anyway, and the last MAC has been chosen because of VF mac address change algo -  it marks unicast filter for deletion and appends a new unicast filter to the list.
The implementation has been chosen because of simplicity /* Just 3 more lines to solve the issue */, from one point it may look wasteful for some 1% of VF corner cases.
But from another point of view, more complicated code will affect 99% normal cases. Modern cpu are sensitive to cache thrash by code size and pipeline stalls by conditionals.

Alex

-----Original Message-----
From: Shannon Nelson [mailto:snelson@pensando.io] 
Sent: Friday, August 2, 2019 2:11 AM
To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; davem@davemloft.net
Cc: Loktionov, Aleksandr <aleksandr.loktionov@intel.com>; netdev@vger.kernel.org; nhorman@redhat.com; sassmann@redhat.com; Bowers, AndrewX <andrewx.bowers@intel.com>
Subject: Re: [net-next 2/9] i40e: make visible changed vf mac on host

On 8/1/19 1:51 PM, Jeff Kirsher wrote:
> From: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
>
> This patch makes changed VM mac address visible on host via ip link 
> show command. This problem is fixed by copying last unicast mac filter 
> to vf->default_lan_addr.addr. Without this patch if VF MAC was not set 
> from host side and if you run ip link show $pf, on host side you'd 
> always see a zero MAC, not the real VF MAC that VF assigned to itself.
>
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>   drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c 
> b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> index 02b09a8ad54c..21f7ac514d1f 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> @@ -2629,6 +2629,9 @@ static int i40e_vc_add_mac_addr_msg(struct i40e_vf *vf, u8 *msg)
>   			} else {
>   				vf->num_mac++;
>   			}
> +			if (is_valid_ether_addr(al->list[i].addr))
> +				ether_addr_copy(vf->default_lan_addr.addr,
> +						al->list[i].addr);
>   		}
>   	}
>   	spin_unlock_bh(&vsi->mac_filter_hash_lock);

Since this copy is done inside the for-loop, it looks like you are copying every address in the list, not just the last one.  This seems wasteful and unnecessary.

Since it is possible, altho' unlikely, that the filter sync that happens a little later could fail, might it be better to do the copy after you know that the sync has succeeded?

Why is the last mac chosen for display rather than the first?  Is there anything special about the last mac as opposed to the first mac?

sln

--------------------------------------------------------------------

Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek
przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by
others is strictly prohibited.

^ permalink raw reply

* Re: [PATCH net-next] net: can: Fix compiling warning
From: Oliver Hartkopp @ 2019-08-02  8:10 UTC (permalink / raw)
  To: Mao Wenan, davem, netdev; +Cc: linux-kernel, kernel-janitors
In-Reply-To: <20190802033643.84243-1-maowenan@huawei.com>

On 02/08/2019 05.36, Mao Wenan wrote:
> There are two warings in net/can, fix them by setting bcm_sock_no_ioctlcmd
> and raw_sock_no_ioctlcmd as static.
> 
> net/can/bcm.c:1683:5: warning: symbol 'bcm_sock_no_ioctlcmd' was not declared. Should it be static?
> net/can/raw.c:840:5: warning: symbol 'raw_sock_no_ioctlcmd' was not declared. Should it be static?
> 
> Fixes: 473d924d7d46 ("can: fix ioctl function removal")
> 
> Signed-off-by: Mao Wenan <maowenan@huawei.com>

Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>

Thanks Mao!

Btw. what kind of compiler/make switches are you using so that I can see 
these warnings myself the next time?

Best regards,
Oliver

> ---
>   net/can/bcm.c | 2 +-
>   net/can/raw.c | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/can/bcm.c b/net/can/bcm.c
> index bf1d0bbecec8..b8a32b4ac368 100644
> --- a/net/can/bcm.c
> +++ b/net/can/bcm.c
> @@ -1680,7 +1680,7 @@ static int bcm_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
>   	return size;
>   }
>   
> -int bcm_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
> +static int bcm_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
>   			 unsigned long arg)
>   {
>   	/* no ioctls for socket layer -> hand it down to NIC layer */
> diff --git a/net/can/raw.c b/net/can/raw.c
> index da386f1fa815..a01848ff9b12 100644
> --- a/net/can/raw.c
> +++ b/net/can/raw.c
> @@ -837,7 +837,7 @@ static int raw_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
>   	return size;
>   }
>   
> -int raw_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
> +static int raw_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
>   			 unsigned long arg)
>   {
>   	/* no ioctls for socket layer -> hand it down to NIC layer */
> 

^ permalink raw reply

* [PATCH bpf-next v4 2/2] xsk: support BPF_EXIST and BPF_NOEXIST flags in XSKMAP
From: Björn Töpel @ 2019-08-02  8:11 UTC (permalink / raw)
  To: ast, daniel, netdev
  Cc: Björn Töpel, magnus.karlsson, bruce.richardson,
	songliubraving, bpf
In-Reply-To: <20190802081154.30962-1-bjorn.topel@gmail.com>

From: Björn Töpel <bjorn.topel@intel.com>

The XSKMAP did not honor the BPF_EXIST/BPF_NOEXIST flags when updating
an entry. This patch addresses that.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 kernel/bpf/xskmap.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c
index 780639309f6b..8864dfe1d9ef 100644
--- a/kernel/bpf/xskmap.c
+++ b/kernel/bpf/xskmap.c
@@ -226,8 +226,6 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
 		return -EINVAL;
 	if (unlikely(i >= m->map.max_entries))
 		return -E2BIG;
-	if (unlikely(map_flags == BPF_NOEXIST))
-		return -EEXIST;
 
 	sock = sockfd_lookup(fd, &err);
 	if (!sock)
@@ -253,14 +251,29 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
 	}
 
 	spin_lock_bh(&m->lock);
+	entry = &m->xsk_map[i];
+	old_xs = READ_ONCE(*entry);
+	if (old_xs && map_flags == BPF_NOEXIST) {
+		err = -EEXIST;
+		goto out;
+	} else if (!old_xs && map_flags == BPF_EXIST) {
+		err = -ENOENT;
+		goto out;
+	}
 	xsk_map_sock_add(xs, node);
-	old_xs = xchg(entry, xs);
+	WRITE_ONCE(*entry, xs);
 	if (old_xs)
 		xsk_map_sock_delete(old_xs, entry);
 	spin_unlock_bh(&m->lock);
 
 	sockfd_put(sock);
 	return 0;
+
+out:
+	spin_unlock_bh(&m->lock);
+	sockfd_put(sock);
+	xsk_map_node_free(node);
+	return err;
 }
 
 static int xsk_map_delete_elem(struct bpf_map *map, void *key)
-- 
2.20.1


^ permalink raw reply related

* [PATCH bpf-next v4 1/2] xsk: remove AF_XDP socket from map when the socket is released
From: Björn Töpel @ 2019-08-02  8:11 UTC (permalink / raw)
  To: ast, daniel, netdev
  Cc: Björn Töpel, magnus.karlsson, bruce.richardson,
	songliubraving, bpf
In-Reply-To: <20190802081154.30962-1-bjorn.topel@gmail.com>

From: Björn Töpel <bjorn.topel@intel.com>

When an AF_XDP socket is released/closed the XSKMAP still holds a
reference to the socket in a "released" state. The socket will still
use the netdev queue resource, and block newly created sockets from
attaching to that queue, but no user application can access the
fill/complete/rx/tx queues. This results in that all applications need
to explicitly clear the map entry from the old "zombie state"
socket. This should be done automatically.

In this patch, the sockets tracks, and have a reference to, which maps
it resides in. When the socket is released, it will remove itself from
all maps.

Suggested-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 include/net/xdp_sock.h |  18 +++++++
 kernel/bpf/xskmap.c    | 113 ++++++++++++++++++++++++++++++++++-------
 net/xdp/xsk.c          |  48 +++++++++++++++++
 3 files changed, 160 insertions(+), 19 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 69796d264f06..066e3ae446a8 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -50,6 +50,16 @@ struct xdp_umem {
 	struct list_head xsk_list;
 };
 
+/* Nodes are linked in the struct xdp_sock map_list field, and used to
+ * track which maps a certain socket reside in.
+ */
+struct xsk_map;
+struct xsk_map_node {
+	struct list_head node;
+	struct xsk_map *map;
+	struct xdp_sock **map_entry;
+};
+
 struct xdp_sock {
 	/* struct sock must be the first member of struct xdp_sock */
 	struct sock sk;
@@ -75,6 +85,9 @@ struct xdp_sock {
 	/* Protects generic receive. */
 	spinlock_t rx_lock;
 	u64 rx_dropped;
+	struct list_head map_list;
+	/* Protects map_list */
+	spinlock_t map_list_lock;
 };
 
 struct xdp_buff;
@@ -96,6 +109,11 @@ struct xdp_umem_fq_reuse *xsk_reuseq_swap(struct xdp_umem *umem,
 void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq);
 struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, u16 queue_id);
 
+void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
+			     struct xdp_sock **map_entry);
+int xsk_map_inc(struct xsk_map *map);
+void xsk_map_put(struct xsk_map *map);
+
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
 	return umem->pages[addr >> PAGE_SHIFT].addr + (addr & (PAGE_SIZE - 1));
diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c
index 9bb96ace9fa1..780639309f6b 100644
--- a/kernel/bpf/xskmap.c
+++ b/kernel/bpf/xskmap.c
@@ -13,8 +13,71 @@ struct xsk_map {
 	struct bpf_map map;
 	struct xdp_sock **xsk_map;
 	struct list_head __percpu *flush_list;
+	spinlock_t lock; /* Synchronize map updates */
 };
 
+int xsk_map_inc(struct xsk_map *map)
+{
+	struct bpf_map *m = &map->map;
+
+	m = bpf_map_inc(m, false);
+	return IS_ERR(m) ? PTR_ERR(m) : 0;
+}
+
+void xsk_map_put(struct xsk_map *map)
+{
+	bpf_map_put(&map->map);
+}
+
+static struct xsk_map_node *xsk_map_node_alloc(struct xsk_map *map,
+					       struct xdp_sock **map_entry)
+{
+	struct xsk_map_node *node;
+	int err;
+
+	node = kzalloc(sizeof(*node), GFP_ATOMIC | __GFP_NOWARN);
+	if (!node)
+		return NULL;
+
+	err = xsk_map_inc(map);
+	if (err) {
+		kfree(node);
+		return ERR_PTR(err);
+	}
+
+	node->map = map;
+	node->map_entry = map_entry;
+	return node;
+}
+
+static void xsk_map_node_free(struct xsk_map_node *node)
+{
+	xsk_map_put(node->map);
+	kfree(node);
+}
+
+static void xsk_map_sock_add(struct xdp_sock *xs, struct xsk_map_node *node)
+{
+	spin_lock_bh(&xs->map_list_lock);
+	list_add_tail(&node->node, &xs->map_list);
+	spin_unlock_bh(&xs->map_list_lock);
+}
+
+static void xsk_map_sock_delete(struct xdp_sock *xs,
+				struct xdp_sock **map_entry)
+{
+	struct xsk_map_node *n, *tmp;
+
+	spin_lock_bh(&xs->map_list_lock);
+	list_for_each_entry_safe(n, tmp, &xs->map_list, node) {
+		if (map_entry == n->map_entry) {
+			list_del(&n->node);
+			xsk_map_node_free(n);
+		}
+	}
+	spin_unlock_bh(&xs->map_list_lock);
+}
+
 static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 {
 	struct xsk_map *m;
@@ -34,6 +97,7 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 		return ERR_PTR(-ENOMEM);
 
 	bpf_map_init_from_attr(&m->map, attr);
+	spin_lock_init(&m->lock);
 
 	cost = (u64)m->map.max_entries * sizeof(struct xdp_sock *);
 	cost += sizeof(struct list_head) * num_possible_cpus();
@@ -71,21 +135,9 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 static void xsk_map_free(struct bpf_map *map)
 {
 	struct xsk_map *m = container_of(map, struct xsk_map, map);
-	int i;
 
 	bpf_clear_redirect_map(map);
 	synchronize_net();
-
-	for (i = 0; i < map->max_entries; i++) {
-		struct xdp_sock *xs;
-
-		xs = m->xsk_map[i];
-		if (!xs)
-			continue;
-
-		sock_put((struct sock *)xs);
-	}
-
 	free_percpu(m->flush_list);
 	bpf_map_area_free(m->xsk_map);
 	kfree(m);
@@ -165,7 +217,8 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
 {
 	struct xsk_map *m = container_of(map, struct xsk_map, map);
 	u32 i = *(u32 *)key, fd = *(u32 *)value;
-	struct xdp_sock *xs, *old_xs;
+	struct xdp_sock *xs, *old_xs, **entry;
+	struct xsk_map_node *node;
 	struct socket *sock;
 	int err;
 
@@ -192,11 +245,19 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
 		return -EOPNOTSUPP;
 	}
 
-	sock_hold(sock->sk);
+	entry = &m->xsk_map[i];
+	node = xsk_map_node_alloc(m, entry);
+	if (IS_ERR(node)) {
+		sockfd_put(sock);
+		return PTR_ERR(node);
+	}
 
-	old_xs = xchg(&m->xsk_map[i], xs);
+	spin_lock_bh(&m->lock);
+	xsk_map_sock_add(xs, node);
+	old_xs = xchg(entry, xs);
 	if (old_xs)
-		sock_put((struct sock *)old_xs);
+		xsk_map_sock_delete(old_xs, entry);
+	spin_unlock_bh(&m->lock);
 
 	sockfd_put(sock);
 	return 0;
@@ -205,19 +266,33 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
 static int xsk_map_delete_elem(struct bpf_map *map, void *key)
 {
 	struct xsk_map *m = container_of(map, struct xsk_map, map);
-	struct xdp_sock *old_xs;
+	struct xdp_sock *old_xs, **map_entry;
 	int k = *(u32 *)key;
 
 	if (k >= map->max_entries)
 		return -EINVAL;
 
-	old_xs = xchg(&m->xsk_map[k], NULL);
+	spin_lock_bh(&m->lock);
+	map_entry = &m->xsk_map[k];
+	old_xs = xchg(map_entry, NULL);
 	if (old_xs)
-		sock_put((struct sock *)old_xs);
+		xsk_map_sock_delete(old_xs, map_entry);
+	spin_unlock_bh(&m->lock);
 
 	return 0;
 }
 
+void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
+			     struct xdp_sock **map_entry)
+{
+	spin_lock_bh(&map->lock);
+	if (READ_ONCE(*map_entry) == xs) {
+		WRITE_ONCE(*map_entry, NULL);
+		xsk_map_sock_delete(xs, map_entry);
+	}
+	spin_unlock_bh(&map->lock);
+}
+
 const struct bpf_map_ops xsk_map_ops = {
 	.map_alloc = xsk_map_alloc,
 	.map_free = xsk_map_free,
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 59b57d708697..c3447bad608a 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -362,6 +362,50 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
 	dev_put(dev);
 }
 
+static struct xsk_map *xsk_get_map_list_entry(struct xdp_sock *xs,
+					      struct xdp_sock ***map_entry)
+{
+	struct xsk_map *map = NULL;
+	struct xsk_map_node *node;
+
+	*map_entry = NULL;
+
+	spin_lock_bh(&xs->map_list_lock);
+	node = list_first_entry_or_null(&xs->map_list, struct xsk_map_node,
+					node);
+	if (node) {
+		WARN_ON(xsk_map_inc(node->map));
+		map = node->map;
+		*map_entry = node->map_entry;
+	}
+	spin_unlock_bh(&xs->map_list_lock);
+	return map;
+}
+
+static void xsk_delete_from_maps(struct xdp_sock *xs)
+{
+	/* This function removes the current XDP socket from all the
+	 * maps it resides in. We need to take extra care here, due to
+	 * the two locks involved. Each map has a lock synchronizing
+	 * updates to the entries, and each socket has a lock that
+	 * synchronizes access to the list of maps (map_list). For
+	 * deadlock avoidance the locks need to be taken in the order
+	 * "map lock"->"socket map list lock". We start off by
+	 * accessing the socket map list, and take a reference to the
+	 * map to guarantee existence. Then we ask the map to remove
+	 * the socket, which tries to remove the socket from the
+	 * map. Note that there might be updates to the map between
+	 * xsk_get_map_list_entry() and xsk_map_try_sock_delete().
+	 */
+	struct xdp_sock **map_entry = NULL;
+	struct xsk_map *map;
+
+	while ((map = xsk_get_map_list_entry(xs, &map_entry))) {
+		xsk_map_try_sock_delete(map, xs, map_entry);
+		xsk_map_put(map);
+	}
+}
+
 static int xsk_release(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
@@ -381,6 +425,7 @@ static int xsk_release(struct socket *sock)
 	sock_prot_inuse_add(net, sk->sk_prot, -1);
 	local_bh_enable();
 
+	xsk_delete_from_maps(xs);
 	xsk_unbind_dev(xs);
 
 	xskq_destroy(xs->rx);
@@ -855,6 +900,9 @@ static int xsk_create(struct net *net, struct socket *sock, int protocol,
 	spin_lock_init(&xs->rx_lock);
 	spin_lock_init(&xs->tx_completion_lock);
 
+	INIT_LIST_HEAD(&xs->map_list);
+	spin_lock_init(&xs->map_list_lock);
+
 	mutex_lock(&net->xdp.lock);
 	sk_add_node_rcu(sk, &net->xdp.list);
 	mutex_unlock(&net->xdp.lock);
-- 
2.20.1


^ permalink raw reply related

* [PATCH bpf-next v4 0/2] net: xdp: XSKMAP improvements
From: Björn Töpel @ 2019-08-02  8:11 UTC (permalink / raw)
  To: ast, daniel, netdev
  Cc: Björn Töpel, magnus.karlsson, bjorn.topel,
	bruce.richardson, songliubraving, bpf

This series (v4 and counting) add two improvements for the XSKMAP,
used by AF_XDP sockets.

1. Automatic cleanup when an AF_XDP socket goes out of scope/is
   released. Instead of require that the user manually clears the
   "released" state socket from the map, this is done
   automatically. Each socket tracks which maps it resides in, and
   remove itself from those maps at relase. A notable implementation
   change, is that the sockets references the map, instead of the map
   referencing the sockets. Which implies that when the XSKMAP is
   freed, it is by definition cleared of sockets.

2. The XSKMAP did not honor the BPF_EXIST/BPF_NOEXIST flag on insert,
   which this patch addresses.

Daniel, I (hopefully...) addressed the issues you found in
[1]. Instead of popping the tracked map, it's simply read. Then, the
socket is removed iff it's the same socket, i.e. no updates has
occurred. There are some code comments in the xsk_delete_from_maps()
function as well.

Thanks,
Björn

[1] https://lore.kernel.org/bpf/2417e1ab-16fa-d3ed-564e-1a50c4cb6717@iogearbox.net/

v1->v2: Fixed deadlock and broken cleanup. (Daniel)
v2->v3: Rebased onto bpf-next
v3->v4: {READ, WRITE}_ONCE consistency. (Daniel)
        Socket release/map update race. (Daniel)

Björn Töpel (2):
  xsk: remove AF_XDP socket from map when the socket is released
  xsk: support BPF_EXIST and BPF_NOEXIST flags in XSKMAP

 include/net/xdp_sock.h |  18 ++++++
 kernel/bpf/xskmap.c    | 130 ++++++++++++++++++++++++++++++++++-------
 net/xdp/xsk.c          |  48 +++++++++++++++
 3 files changed, 175 insertions(+), 21 deletions(-)

-- 
2.20.1

^ permalink raw reply

* Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites
From: Peter Zijlstra @ 2019-08-02  8:05 UTC (permalink / raw)
  To: john.hubbard
  Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Dave Hansen, Ira Weiny, Jan Kara, Jason Gunthorpe,
	Jérôme Glisse, LKML, amd-gfx, ceph-devel, devel, devel,
	dri-devel, intel-gfx, kvm, linux-arm-kernel, linux-block,
	linux-crypto, linux-fbdev, linux-fsdevel, linux-media, linux-mm,
	linux-nfs, linux-rdma, linux-rpi-kernel, linux-xfs, netdev,
	rds-devel, sparclinux, x86, xen-devel, John Hubbard
In-Reply-To: <20190802021653.4882-1-jhubbard@nvidia.com>

On Thu, Aug 01, 2019 at 07:16:19PM -0700, john.hubbard@gmail.com wrote:

> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions"). That commit
> has an extensive description of the problem and the planned steps to
> solve it, but the highlites are:

That is one horridly mangled Changelog there :-/ It looks like it's
partially duplicated.

Anyway; no objections to any of that, but I just wanted to mention that
there are other problems with long term pinning that haven't been
mentioned, notably they inhibit compaction.

A long time ago I proposed an interface to mark pages as pinned, such
that we could run compaction before we actually did the pinning.

^ permalink raw reply

* [PATCH v2 2/2] cnic: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-02  8:05 UTC (permalink / raw)
  Cc: David S . Miller, netdev, linux-kernel, Chuhong Yuan

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

This patch depends on the patch:
"cnic: Explicitly initialize all reference counts to 0."

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
Changes in v2:
  - Convert refcount from 0-base to 1-base.

 drivers/net/ethernet/broadcom/cnic.c    | 30 ++++++++++++-------------
 drivers/net/ethernet/broadcom/cnic_if.h |  6 ++---
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index 155599dcee76..1f59f9606b85 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -141,22 +141,22 @@ static int cnic_uio_close(struct uio_info *uinfo, struct inode *inode)
 
 static inline void cnic_hold(struct cnic_dev *dev)
 {
-	atomic_inc(&dev->ref_count);
+	refcount_inc(&dev->ref_count);
 }
 
 static inline void cnic_put(struct cnic_dev *dev)
 {
-	atomic_dec(&dev->ref_count);
+	refcount_dec(&dev->ref_count);
 }
 
 static inline void csk_hold(struct cnic_sock *csk)
 {
-	atomic_inc(&csk->ref_count);
+	refcount_inc(&csk->ref_count);
 }
 
 static inline void csk_put(struct cnic_sock *csk)
 {
-	atomic_dec(&csk->ref_count);
+	refcount_dec(&csk->ref_count);
 }
 
 static struct cnic_dev *cnic_from_netdev(struct net_device *netdev)
@@ -177,12 +177,12 @@ static struct cnic_dev *cnic_from_netdev(struct net_device *netdev)
 
 static inline void ulp_get(struct cnic_ulp_ops *ulp_ops)
 {
-	atomic_inc(&ulp_ops->ref_count);
+	refcount_inc(&ulp_ops->ref_count);
 }
 
 static inline void ulp_put(struct cnic_ulp_ops *ulp_ops)
 {
-	atomic_dec(&ulp_ops->ref_count);
+	refcount_dec(&ulp_ops->ref_count);
 }
 
 static void cnic_ctx_wr(struct cnic_dev *dev, u32 cid_addr, u32 off, u32 val)
@@ -494,7 +494,7 @@ int cnic_register_driver(int ulp_type, struct cnic_ulp_ops *ulp_ops)
 	}
 	read_unlock(&cnic_dev_lock);
 
-	atomic_set(&ulp_ops->ref_count, 0);
+	refcount_set(&ulp_ops->ref_count, 1);
 	rcu_assign_pointer(cnic_ulp_tbl[ulp_type], ulp_ops);
 	mutex_unlock(&cnic_lock);
 
@@ -545,12 +545,12 @@ int cnic_unregister_driver(int ulp_type)
 
 	mutex_unlock(&cnic_lock);
 	synchronize_rcu();
-	while ((atomic_read(&ulp_ops->ref_count) != 0) && (i < 20)) {
+	while ((refcount_read(&ulp_ops->ref_count) != 1) && (i < 20)) {
 		msleep(100);
 		i++;
 	}
 
-	if (atomic_read(&ulp_ops->ref_count) != 0)
+	if (refcount_read(&ulp_ops->ref_count) != 1)
 		pr_warn("%s: Failed waiting for ref count to go to zero\n",
 			__func__);
 	return 0;
@@ -3596,7 +3596,7 @@ static int cnic_cm_create(struct cnic_dev *dev, int ulp_type, u32 cid,
 	}
 
 	csk1 = &cp->csk_tbl[l5_cid];
-	if (atomic_read(&csk1->ref_count))
+	if (refcount_read(&csk1->ref_count) != 1)
 		return -EAGAIN;
 
 	if (test_and_set_bit(SK_F_INUSE, &csk1->flags))
@@ -3651,7 +3651,7 @@ static int cnic_cm_destroy(struct cnic_sock *csk)
 	csk_hold(csk);
 	clear_bit(SK_F_INUSE, &csk->flags);
 	smp_mb__after_atomic();
-	while (atomic_read(&csk->ref_count) != 1)
+	while (refcount_read(&csk->ref_count) != 2)
 		msleep(1);
 	cnic_cm_cleanup(csk);
 
@@ -4104,7 +4104,7 @@ static int cnic_cm_alloc_mem(struct cnic_dev *dev)
 		return -ENOMEM;
 
 	for (i = 0; i < MAX_CM_SK_TBL_SZ; i++)
-		atomic_set(&cp->csk_tbl[i].ref_count, 0);
+		refcount_set(&cp->csk_tbl[i].ref_count, 1);
 
 	port_id = prandom_u32();
 	port_id %= CNIC_LOCAL_PORT_RANGE;
@@ -5436,11 +5436,11 @@ static void cnic_free_dev(struct cnic_dev *dev)
 {
 	int i = 0;
 
-	while ((atomic_read(&dev->ref_count) != 0) && i < 10) {
+	while ((refcount_read(&dev->ref_count) != 1) && i < 10) {
 		msleep(100);
 		i++;
 	}
-	if (atomic_read(&dev->ref_count) != 0)
+	if (refcount_read(&dev->ref_count) != 1)
 		netdev_err(dev->netdev, "Failed waiting for ref count to go to zero\n");
 
 	netdev_info(dev->netdev, "Removed CNIC device\n");
@@ -5484,7 +5484,7 @@ static struct cnic_dev *cnic_alloc_dev(struct net_device *dev,
 	cdev->unregister_device = cnic_unregister_device;
 	cdev->iscsi_nl_msg_recv = cnic_iscsi_nl_msg_recv;
 	cdev->get_fc_npiv_tbl = cnic_get_fc_npiv_tbl;
-	atomic_set(&cdev->ref_count, 0);
+	refcount_set(&cdev->ref_count, 1);
 
 	cp = cdev->cnic_priv;
 	cp->dev = cdev;
diff --git a/drivers/net/ethernet/broadcom/cnic_if.h b/drivers/net/ethernet/broadcom/cnic_if.h
index 789e5c7e9311..5232a05ac7ba 100644
--- a/drivers/net/ethernet/broadcom/cnic_if.h
+++ b/drivers/net/ethernet/broadcom/cnic_if.h
@@ -300,7 +300,7 @@ struct cnic_sock {
 #define SK_F_CLOSING		7
 #define SK_F_HW_ERR		8
 
-	atomic_t ref_count;
+	refcount_t ref_count;
 	u32 state;
 	struct kwqe kwqe1;
 	struct kwqe kwqe2;
@@ -335,7 +335,7 @@ struct cnic_dev {
 #define CNIC_F_CNIC_UP		1
 #define CNIC_F_BNX2_CLASS	3
 #define CNIC_F_BNX2X_CLASS	4
-	atomic_t	ref_count;
+	refcount_t	ref_count;
 	u8		mac_addr[ETH_ALEN];
 
 	int		max_iscsi_conn;
@@ -378,7 +378,7 @@ struct cnic_ulp_ops {
 				  char *data, u16 data_size);
 	int (*cnic_get_stats)(void *ulp_ctx);
 	struct module *owner;
-	atomic_t ref_count;
+	refcount_t ref_count;
 };
 
 int cnic_register_driver(int ulp_type, struct cnic_ulp_ops *ulp_ops);
-- 
2.20.1


^ permalink raw reply related

* [PATCH v2 1/2] bnxt_en: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-02  8:05 UTC (permalink / raw)
  Cc: Michael Chan, David S . Miller, netdev, linux-kernel,
	Chuhong Yuan

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
Changes in v2:
  - Convert refcount from 0-base to 1-base.

 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 8 ++++----
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
index fc77caf0a076..742a8ed200cb 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
@@ -49,7 +49,7 @@ static int bnxt_register_dev(struct bnxt_en_dev *edev, int ulp_id,
 			return -ENOMEM;
 	}
 
-	atomic_set(&ulp->ref_count, 0);
+	refcount_set(&ulp->ref_count, 1);
 	ulp->handle = handle;
 	rcu_assign_pointer(ulp->ulp_ops, ulp_ops);
 
@@ -87,7 +87,7 @@ static int bnxt_unregister_dev(struct bnxt_en_dev *edev, int ulp_id)
 	synchronize_rcu();
 	ulp->max_async_event_id = 0;
 	ulp->async_events_bmap = NULL;
-	while (atomic_read(&ulp->ref_count) != 0 && i < 10) {
+	while (refcount_read(&ulp->ref_count) != 1 && i < 10) {
 		msleep(100);
 		i++;
 	}
@@ -246,12 +246,12 @@ static int bnxt_send_msg(struct bnxt_en_dev *edev, int ulp_id,
 
 static void bnxt_ulp_get(struct bnxt_ulp *ulp)
 {
-	atomic_inc(&ulp->ref_count);
+	refcount_inc(&ulp->ref_count);
 }
 
 static void bnxt_ulp_put(struct bnxt_ulp *ulp)
 {
-	atomic_dec(&ulp->ref_count);
+	refcount_dec(&ulp->ref_count);
 }
 
 void bnxt_ulp_stop(struct bnxt *bp)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
index cd78453d0bf0..fc4aa582d190 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
@@ -52,7 +52,7 @@ struct bnxt_ulp {
 	u16		max_async_event_id;
 	u16		msix_requested;
 	u16		msix_base;
-	atomic_t	ref_count;
+	refcount_t	ref_count;
 };
 
 struct bnxt_en_dev {
-- 
2.20.1


^ permalink raw reply related

* Re: [net v1 PATCH 4/4] net: fix bpf_xdp_adjust_head regression for generic-XDP
From: Jesper Dangaard Brouer @ 2019-08-02  7:53 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, David S. Miller, xdp-newbies, Daniel Borkmann,
	brandon.cazander, Alexei Starovoitov, brouer
In-Reply-To: <20190801174406.0b554bb9@cakuba.netronome.com>

On Thu, 1 Aug 2019 17:44:06 -0700
Jakub Kicinski <jakub.kicinski@netronome.com> wrote:

> On Thu, 01 Aug 2019 20:00:31 +0200, Jesper Dangaard Brouer wrote:
> > When generic-XDP was moved to a later processing step by commit
> > 458bf2f224f0 ("net: core: support XDP generic on stacked devices.")
> > a regression was introduced when using bpf_xdp_adjust_head.
> > 
> > The issue is that after this commit the skb->network_header is now
> > changed prior to calling generic XDP and not after. Thus, if the header
> > is changed by XDP (via bpf_xdp_adjust_head), then skb->network_header
> > also need to be updated again.  Fix by calling skb_reset_network_header().
> > 
> > Fixes: 458bf2f224f0 ("net: core: support XDP generic on stacked devices.")
> > Reported-by: Brandon Cazander <brandon.cazander@multapplied.net>
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>  
> 
> Out of curiosity what was your conclusion regarding resetting the
> transport header as well?

Well, I don't know... I need some review, from e.g. Stephen that
changed this... I've added code snippets below signature to helper
reviewers (also helps understand below paragraph).

I think, we perhaps should call skb_reset_transport_header(), as we
change skb->data (via either __skb_pull() or __skb_push()), *BUT* I'm
not sure it is needed/required, as someone/something afterwards still
need to call skb_set_transport_header(), which also calls
skb_reset_transport_header() anyway.

I also want to ask reviewers, if we should move the call to
skb_reset_mac_len() (which Stephen added) in __netif_receive_skb_core()
into netif_receive_generic_xdp() (after the call to
skb_reset_network_header()), (it would be more natural to keep them
together)?

- - 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

Code snippet
	/* check if bpf_xdp_adjust_head was used */
	off = xdp->data - orig_data;
	if (off) {
		if (off > 0)
			__skb_pull(skb, off);
		else if (off < 0)
			__skb_push(skb, -off);

		skb->mac_header += off;
		skb_reset_network_header(skb);
	}


static inline bool skb_transport_header_was_set(const struct sk_buff *skb)
{
	return skb->transport_header != (typeof(skb->transport_header))~0U;
}

static inline unsigned char *skb_transport_header(const struct sk_buff *skb)
{
	return skb->head + skb->transport_header;
}

static inline void skb_reset_transport_header(struct sk_buff *skb)
{
	skb->transport_header = skb->data - skb->head;
}

static inline void skb_set_transport_header(struct sk_buff *skb,
					    const int offset)
{
	skb_reset_transport_header(skb);
	skb->transport_header += offset;
}


^ permalink raw reply

* Re: [PATCH v2 00/11] VSOCK: add vsock_test test suite
From: Stefano Garzarella @ 2019-08-02  7:55 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org, Stefan Hajnoczi,
	virtualization@lists.linux-foundation.org, David S. Miller,
	Jorgen Hansen, linux-kernel@vger.kernel.org
In-Reply-To: <PU1P153MB0169B265ECA51CB0AE1212DEBFDE0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On Thu, Aug 01, 2019 at 04:16:37PM +0000, Dexuan Cui wrote:
> > From: Stefano Garzarella <sgarzare@redhat.com>
> > Sent: Thursday, August 1, 2019 8:26 AM
> > 
> > The vsock_diag.ko module already has a test suite but the core AF_VSOCK
> > functionality has no tests.  This patch series adds several test cases that
> > exercise AF_VSOCK SOCK_STREAM socket semantics (send/recv,
> > connect/accept,
> > half-closed connections, simultaneous connections).
> > 
> > Dexuan: Do you think can be useful to test HyperV?
> 
> Hi Stefano,
> Thanks! This should be useful, though I have to write the Windows host side
> code to use the test program(s). :-)
> 

Oh, yeah, I thought so :-)

Let me know when you'll try to find out if there's a problem.

Thanks,
Stefano

^ permalink raw reply

* Re: [patch net-next v2 1/3] net: devlink: allow to change namespaces
From: Jiri Pirko @ 2019-08-02  7:48 UTC (permalink / raw)
  To: David Ahern; +Cc: Jakub Kicinski, netdev, davem, sthemmin, mlxsw
In-Reply-To: <89dc6908-68b8-5b0d-0ef7-1eaf1e4e886b@gmail.com>

Wed, Jul 31, 2019 at 09:58:10PM CEST, dsahern@gmail.com wrote:
>On 7/31/19 1:46 PM, David Ahern wrote:
>> On 7/31/19 1:45 PM, Jiri Pirko wrote:
>>>> check. e.g., what happens if a resource controller has been configured
>>>> for the devlink instance and it is moved to a namespace whose existing
>>>> config exceeds those limits?
>>>
>>> It's moved with all the values. The whole instance is moved.
>>>
>> 
>> The values are moved, but the FIB in a namespace could already contain
>> more routes than the devlink instance allows.
>> 
>
>From a quick test your recent refactoring to netdevsim broke the
>resource controller. It was, and is intended to be, per network namespace.

unifying devlink instances with network namespace in netdevsim was
really odd. Netdevsim is also a device, like any other. With other
devices, you do not do this so I don't see why to do this with netdevsim.

Now you create netdevsim instance in sysfs, there is proper bus probe
mechanism done, there is a devlink instance created for this device,
there are netdevices and devlink ports created. Same as for the real
hardware.

Honestly, creating a devlink instance per-network namespace
automagically, no relation to netdevsim devices, that is simply wrong.
There should be always 1:1 relationshin between a device and devlink
instance.

^ permalink raw reply

* Re: [patch net-next v2 1/3] net: devlink: allow to change namespaces
From: Jiri Pirko @ 2019-08-02  7:43 UTC (permalink / raw)
  To: David Ahern; +Cc: Jakub Kicinski, netdev, davem, sthemmin, mlxsw
In-Reply-To: <087f584d-06c5-f4b9-722b-ccb72ce0e5de@gmail.com>

Wed, Jul 31, 2019 at 09:46:13PM CEST, dsahern@gmail.com wrote:
>On 7/31/19 1:45 PM, Jiri Pirko wrote:
>>> check. e.g., what happens if a resource controller has been configured
>>> for the devlink instance and it is moved to a namespace whose existing
>>> config exceeds those limits?
>> 
>> It's moved with all the values. The whole instance is moved.
>> 
>
>The values are moved, but the FIB in a namespace could already contain
>more routes than the devlink instance allows.


There is no relation between fib and devlink.
>

^ permalink raw reply

* Re: [patch net-next 0/3] net: devlink: Finish network namespace support
From: Jiri Pirko @ 2019-08-02  7:42 UTC (permalink / raw)
  To: David Ahern; +Cc: Jakub Kicinski, netdev, davem, sthemmin, mlxsw
In-Reply-To: <de94a881-cb87-e251-d55c-9f40d24b08d5@gmail.com>

Thu, Aug 01, 2019 at 12:31:52AM CEST, dsahern@gmail.com wrote:
>On 7/31/19 4:28 PM, Jakub Kicinski wrote:
>> On Wed, 31 Jul 2019 16:07:31 -0600, David Ahern wrote:
>>> On 7/31/19 4:02 PM, Jakub Kicinski wrote:
>>>> Can you elaborate further? Ports for most purposes are represented by
>>>> netdevices. Devlink port instances expose global topological view of
>>>> the ports which is primarily relevant if you can see the entire ASIC.
>>>> I think the global configuration and global view of resources is still
>>>> the most relevant need, so in your diagram you must account for some
>>>> "all-seeing" instance, e.g.:
>>>>
>>>>    namespace 1 |  namespace 2  | ... | namespace N
>>>>                |               |     |
>>>>   { ports 1 }  |   { ports 2 } | ... | { ports N }
>>>>                |               |     |
>>>> subdevlink 1   | subdevlink 2  | ... |  subdevlink N
>>>>          \______      |              _______/
>>>>                  master ASIC devlink
>>>>   =================================================
>>>>                    driver
>>>>
>>>> No?
>>>
>>> sure, there could be a master devlink visible to the user if that makes
>>> sense or the driver can account for it behind the scenes as the sum of
>>> the devlink instances.
>>>
>>> The goal is to allow ports within an asic [1] to be divided across
>>> network namespace where each namespace sees a subset of the ports. This
>>> allows creating multiple logical switches from a single physical asic.
>>>
>>> [1] within constraints imposed by the driver/hardware - for example to
>>> account for resources shared by a set of ports. e.g., front panel ports
>>> 1 - 4 have shared resources and must always be in the same devlink instance.
>> 
>> So the ASIC would start out all partitioned? Presumably some would
>> still like to use it non-partitioned? What follows there must be a top
>> level instance to decide on partitioning, and moving resources between
>> sub-instances.
>> 
>> Right now I don't think there is much info in devlink ports which would
>> be relevant without full view of the ASIC..
>> 
>
>not sure how it would play out. really just 'thinking out loud' about
>the above use case to make sure devlink with proper namespace support
>allows it - or does not prevent it.

I Don't see reason or usecase to have ports or other objects of devlink
in separate namespaces. Devlink and it's objects are one big item,
should be always together.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox