Netdev List


* [PATCH net-next 0/3] net: ethoc: Misc improvements
From: Florian Fainelli @ 2016-12-04 20:40 UTC (permalink / raw)
  To: netdev; +Cc: tremyfr, tklauser, davem, thierry.reding, andrew,
	Florian Fainelli

Hi all,

This patch series fixes/improves a few things:

- implement a proper PHYLIB adjust_link callback to set the duplex mode
  accordingly
- do not open code the fetching of a MAC address in OF/DT environments
- demote an error message that occurs more frequently than expected in low
  CPU/memory/bandwidth environments

Tested on a Cirrus Logic EP93xx / TS7300 board.

Florian Fainelli (3):
  net: ethoc: Account for duplex changes
  net: ethoc: Utilize of_get_mac_address()
  net: ethoc: Demote packet dropped error message to debug

 drivers/net/ethernet/ethoc.c | 44 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 39 insertions(+), 5 deletions(-)

-- 
2.9.3


* [PATCH net-next 1/3] net: ethoc: Account for duplex changes
From: Florian Fainelli @ 2016-12-04 20:40 UTC (permalink / raw)
  To: netdev; +Cc: tremyfr, tklauser, davem, thierry.reding, andrew,
	Florian Fainelli
In-Reply-To: <20161204204030.9853-1-f.fainelli@gmail.com>

ethoc_mdio_poll(), which is our PHYLIB adjust_link callback, does nothing;
we should at least react to duplex changes and change MODER accordingly.
Speed changes are not a problem, since the OpenCores Ethernet core seems
to react correctly without us telling it.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/ethoc.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
index 6456c180114b..877c02a36c85 100644
--- a/drivers/net/ethernet/ethoc.c
+++ b/drivers/net/ethernet/ethoc.c
@@ -221,6 +221,9 @@ struct ethoc {
 	struct mii_bus *mdio;
 	struct clk *clk;
 	s8 phy_id;
+
+	int old_link;
+	int old_duplex;
 };
 
 /**
@@ -667,6 +670,32 @@ static int ethoc_mdio_write(struct mii_bus *bus, int phy, int reg, u16 val)
 
 static void ethoc_mdio_poll(struct net_device *dev)
 {
+	struct ethoc *priv = netdev_priv(dev);
+	struct phy_device *phydev = dev->phydev;
+	bool changed = false;
+	u32 mode;
+
+	if (priv->old_link != phydev->link) {
+		changed = true;
+		priv->old_link = phydev->link;
+	}
+
+	if (priv->old_duplex != phydev->duplex) {
+		changed = true;
+		priv->old_duplex = phydev->duplex;
+	}
+
+	if (!changed)
+		return;
+
+	mode = ethoc_read(priv, MODER);
+	if (phydev->duplex == DUPLEX_FULL)
+		mode |= MODER_FULLD;
+	else
+		mode &= ~MODER_FULLD;
+	ethoc_write(priv, MODER, mode);
+
+	phy_print_status(phydev);
 }
 
 static int ethoc_mdio_probe(struct net_device *dev)
@@ -685,6 +714,9 @@ static int ethoc_mdio_probe(struct net_device *dev)
 		return -ENXIO;
 	}
 
+	priv->old_duplex = -1;
+	priv->old_link = -1;
+
 	err = phy_connect_direct(dev, phy, ethoc_mdio_poll,
 				 PHY_INTERFACE_MODE_GMII);
 	if (err) {
@@ -721,6 +753,9 @@ static int ethoc_open(struct net_device *dev)
 		netif_start_queue(dev);
 	}
 
+	priv->old_link = -1;
+	priv->old_duplex = -1;
+
 	phy_start(dev->phydev);
 	napi_enable(&priv->napi);
 
-- 
2.9.3
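
For readers following along, here is a rough userspace model of the
duplex handling described in this patch. It is only a sketch, not the
driver code: every sketch_* name is made up for illustration, and the
SKETCH_MODER_FULLD bit value is an assumption standing in for the
driver's own MODER_FULLD definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit value, for illustration; the driver's MODER_FULLD applies. */
#define SKETCH_MODER_FULLD (1u << 10)

enum { SKETCH_DUPLEX_HALF = 0, SKETCH_DUPLEX_FULL = 1 };

/* Hypothetical stand-in for the MAC state touched by the callback. */
struct sketch_mac {
	uint32_t moder;		/* models the MODER register */
	int old_link;		/* last link state seen, -1 = unknown */
	int old_duplex;		/* last duplex seen, -1 = unknown */
};

/* Forget the cached state so the next callback always reprograms. */
static void sketch_forget_state(struct sketch_mac *mac)
{
	mac->old_link = -1;
	mac->old_duplex = -1;
}

/* Returns true when link or duplex changed and the register was rewritten. */
static bool sketch_adjust_link(struct sketch_mac *mac, int link, int duplex)
{
	bool changed = false;

	assert(mac != NULL);

	if (mac->old_link != link) {
		changed = true;
		mac->old_link = link;
	}
	if (mac->old_duplex != duplex) {
		changed = true;
		mac->old_duplex = duplex;
	}
	if (!changed)
		return false;

	if (duplex == SKETCH_DUPLEX_FULL)
		mac->moder |= SKETCH_MODER_FULLD;
	else
		mac->moder &= ~SKETCH_MODER_FULLD;
	return true;
}
```

Resetting the cached values to -1 (as the patch does at probe and open
time) guarantees that the first callback after a restart is treated as
a change, whatever the PHY reports.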


* [PATCH net-next 2/3] net: ethoc: Utilize of_get_mac_address()
From: Florian Fainelli @ 2016-12-04 20:40 UTC (permalink / raw)
  To: netdev; +Cc: tremyfr, tklauser, davem, thierry.reding, andrew,
	Florian Fainelli
In-Reply-To: <20161204204030.9853-1-f.fainelli@gmail.com>

Do not open code getting the MAC address exclusively from the
"local-mac-address" property, but instead use of_get_mac_address() which
looks up the MAC address using the 3 typical property names.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/ethoc.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
index 877c02a36c85..8d0cb5ce87ee 100644
--- a/drivers/net/ethernet/ethoc.c
+++ b/drivers/net/ethernet/ethoc.c
@@ -23,6 +23,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/of.h>
+#include <linux/of_net.h>
 #include <linux/module.h>
 #include <net/ethoc.h>
 
@@ -1158,11 +1159,9 @@ static int ethoc_probe(struct platform_device *pdev)
 		memcpy(netdev->dev_addr, pdata->hwaddr, IFHWADDRLEN);
 		priv->phy_id = pdata->phy_id;
 	} else {
-		const uint8_t *mac;
+		const void *mac;
 
-		mac = of_get_property(pdev->dev.of_node,
-				      "local-mac-address",
-				      NULL);
+		mac = of_get_mac_address(pdev->dev.of_node);
 		if (mac)
 			memcpy(netdev->dev_addr, mac, IFHWADDRLEN);
 		priv->phy_id = -1;
-- 
2.9.3
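
As an illustration of the lookup order this patch relies on, here is a
small userspace sketch. It assumes the three property names tried are
"mac-address", "local-mac-address" and "address", in that order, and
that an all-zero or multicast value is skipped. The sketch_* types and
functions are hypothetical; they model a device node as a flat property
table and are not the kernel's OF API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

/* Hypothetical flat model of one device-tree property. */
struct sketch_prop {
	const char *name;
	const uint8_t *value;
	size_t len;
};

/* Valid means: unicast (low bit of first byte clear) and not all zeros. */
static int sketch_valid_ether_addr(const uint8_t *addr)
{
	static const uint8_t zero[SKETCH_ETH_ALEN];

	return !(addr[0] & 1) && memcmp(addr, zero, SKETCH_ETH_ALEN) != 0;
}

/* Try each property name in priority order; return the first valid MAC. */
static const uint8_t *sketch_get_mac(const struct sketch_prop *props,
				     size_t nprops)
{
	static const char *const names[] = {
		"mac-address", "local-mac-address", "address",
	};
	size_t n, i;

	assert(props != NULL || nprops == 0);

	for (n = 0; n < sizeof(names) / sizeof(names[0]); n++) {
		for (i = 0; i < nprops; i++) {
			const struct sketch_prop *p = &props[i];

			if (strcmp(p->name, names[n]) == 0 &&
			    p->len == SKETCH_ETH_ALEN &&
			    sketch_valid_ether_addr(p->value))
				return p->value;
		}
	}
	return NULL;	/* caller keeps whatever address it already had */
}
```

A NULL result mirrors the "if (mac)" check in the diff: when no usable
property exists, the probe path simply leaves dev_addr untouched.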


* [PATCH net-next 3/3] net: ethoc: Demote packet dropped error message to debug
From: Florian Fainelli @ 2016-12-04 20:40 UTC (permalink / raw)
  To: netdev; +Cc: tremyfr, tklauser, davem, thierry.reding, andrew,
	Florian Fainelli
In-Reply-To: <20161204204030.9853-1-f.fainelli@gmail.com>

Spamming the console with "net eth1: packet dropped" can happen
fairly frequently if the adapter is busy transmitting; demote the
message to a debug print.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/ethoc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
index 8d0cb5ce87ee..45abc81f6f55 100644
--- a/drivers/net/ethernet/ethoc.c
+++ b/drivers/net/ethernet/ethoc.c
@@ -576,7 +576,7 @@ static irqreturn_t ethoc_interrupt(int irq, void *dev_id)
 
 	/* We always handle the dropped packet interrupt */
 	if (pending & INT_MASK_BUSY) {
-		dev_err(&dev->dev, "packet dropped\n");
+		dev_dbg(&dev->dev, "packet dropped\n");
 		dev->stats.rx_dropped++;
 	}
 
-- 
2.9.3


* "af_unix: conditionally use freezable blocking calls in read" is wrong
From: Al Viro @ 2016-12-04 21:04 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang

	Could we please kill that kludge?  "af_unix: use freezable blocking
calls in read" had been wrong to start with; having a method make assumptions
of that sort ("nobody will call me while holding locks I hadn't thought of")
is asking for serious trouble.  splice is just a place where lockdep has
caught that - we *can't* assume that nobody will ever call kernel_recvmsg()
while holding some locks.

	I've run into that converting AF_UNIX to generic_file_splice_read();
I can kludge around that ("freezable unless ->msg_iter is ITER_PIPE"), but
that only delays trouble.

	Note that the only other user of freezable_schedule_timeout() is
a very different story - it's a kernel thread, which *does* have a guaranteed
locking environment.  Making such assumptions in unix_stream_recvmsg(),
OTOH, is insane...


* Re: [PATCH] mlx4: Use kernel sizeof and alloc styles
From: Eric Dumazet @ 2016-12-04 20:58 UTC (permalink / raw)
  To: Joe Perches; +Cc: Yishai Hadas, Tariq Toukan, netdev, linux-rdma, linux-kernel
In-Reply-To: <8b20668cb08cce8b4af872863440e149bd25fa94.1480882180.git.joe@perches.com>

On Sun, 2016-12-04 at 12:11 -0800, Joe Perches wrote:
> Convert sizeof foo to sizeof(foo) and allocations with multiplications
> to the appropriate kcalloc/kmalloc_array styles.
> 
> Signed-off-by: Joe Perches <joe@perches.com>
> ---

Gah.

This is one of the hottest NIC drivers on Linux at this moment,
with XDP and other efforts going on.

Some kmalloc() are becoming kmalloc_node() in some dev branches, and
there is no kmalloc_array_node() yet.

This kind of patch is making rebases/backports very painful.

Could we wait ~6 months before doing such cleanups/changes, please?

If you believe a bug needs a fix, please send a patch to address it.

Thanks.


* Re: [PATCH net-next 2/3] net/act_pedit: Support using offset relative to the conventional network headers
From: Or Gerlitz @ 2016-12-04 21:55 UTC (permalink / raw)
  To: David Miller
  Cc: Linux Netdev List, Jamal Hadi Salim, Hadar Hen Zion, Jiri Pirko
In-Reply-To: <20161202104029.GA7729@office.localdomain>

On Fri, Dec 2, 2016 at 12:40 PM, Amir Vadai <amir@vadai.me> wrote:
> On Thu, Dec 01, 2016 at 02:41:14PM -0500, David Miller wrote:
>> From: Amir Vadai <amir@vadai.me>
>> Date: Wed, 30 Nov 2016 11:09:27 +0200

>> > +static int pedit_skb_hdr_offset(struct sk_buff *skb,
>> > +                           enum pedit_header_type htype, int *hoffset)
>> > +{
>> > +   int ret = -1;
>> > +
>> > +   switch (htype) {
>> > +   case PEDIT_HDR_TYPE_ETH:
>> > +           if (skb_mac_header_was_set(skb)) {
>> > +                   *hoffset = skb_mac_offset(skb);
>> > +                   ret = 0;
>> > +           }
>> > +           break;
>> > +   case PEDIT_HDR_TYPE_RAW:
>> > +   case PEDIT_HDR_TYPE_IP4:
>> > +   case PEDIT_HDR_TYPE_IP6:
>> > +           *hoffset = skb_network_offset(skb);
>> > +           ret = 0;
>> > +           break;
>> > +   case PEDIT_HDR_TYPE_TCP:
>> > +   case PEDIT_HDR_TYPE_UDP:
>> > +           if (skb_transport_header_was_set(skb)) {
>> > +                   *hoffset = skb_transport_offset(skb);
>> > +                   ret = 0;
>> > +           }
>> > +           break;
>> > +   };
>> > +
>> > +   return ret;
>> > +}
>> > +

>> The only distinction between the cases is "L2", "L3", and "L4".

>> Therefore I don't see any reason to break it down into IP4 vs. IP6 vs.
>> RAW, for example.  They all map to the same thing.

>> So why not just have PEDIT_HDR_TYPE_L2, PEDIT_HDR_TYPE_L3, and
>> PEDIT_HDR_TYPE_L4?  It definitely seems more straightforward
>> and cleaner that way.

> Yeah, it isn't by mistake. The next step will be to implement hardware
> offloading of the action, and for that we would like to keep the
> information about the specific header type.

Hi Dave,

I see that this patch is marked as "Changes Requested" @ your patchworks.

Just wanted to note that, as Amir explained here and as mentioned in
the change log, this was done on purpose, as a heads-up for HW offloads.
Typically HW APIs let you do things based on the header type
they have parsed, etc., so that's why we added this small redundancy,
e.g. an IPv4/IPv6 header ID instead of a network header ID. While SW-wise
both IPv4 and IPv6 use the same code path, for HW offloads the
HW driver could choose to use the IPv4/IPv6 header ID info.

Or.
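
To make the point above concrete, here is a small sketch of the two
views of the same header type. The sketch_* enums and functions are
hypothetical and only mirror the header types named in the quoted
patch; the "hardware" predicate is an invented example of a device
that can rewrite IPv4 headers only.

```c
#include <assert.h>

/* Hypothetical mirror of the header types in the quoted patch. */
enum sketch_hdr_type {
	SKETCH_HDR_ETH,
	SKETCH_HDR_RAW,
	SKETCH_HDR_IP4,
	SKETCH_HDR_IP6,
	SKETCH_HDR_TCP,
	SKETCH_HDR_UDP,
};

enum sketch_layer { SKETCH_L2 = 2, SKETCH_L3 = 3, SKETCH_L4 = 4 };

/* Software path: only the layer matters, so several types collapse. */
static enum sketch_layer sketch_sw_layer(enum sketch_hdr_type type)
{
	switch (type) {
	case SKETCH_HDR_ETH:
		return SKETCH_L2;
	case SKETCH_HDR_TCP:
	case SKETCH_HDR_UDP:
		return SKETCH_L4;
	case SKETCH_HDR_RAW:
	case SKETCH_HDR_IP4:
	case SKETCH_HDR_IP6:
	default:
		return SKETCH_L3;
	}
}

/* Invented offload rule: this imaginary device only edits IPv4 headers. */
static int sketch_hw_can_offload(enum sketch_hdr_type type)
{
	return type == SKETCH_HDR_IP4;
}
```

With only L2/L3/L4 types, the second function could not be written,
because the driver would no longer know which L3 protocol was meant.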


* Re: [PATCH v1 net-next 1/5] net: dsa: mv88e6xxx: Reserved Management frames to CPU
From: Vivien Didelot @ 2016-12-04 22:12 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: David Miller, netdev
In-Reply-To: <20161204202234.GA20743@lunn.ch>

Hi Andrew,

Andrew Lunn <andrew@lunn.ch> writes:

>> The mv88e6xxx_ops actually implements the *features*. They can be
>> prefixed for clarity (e.g. .ppu_*, port_*, .atu_*, etc.). They don't
>> describe the register layout.
>> 
>> But we can discuss two ways of seeing this structure implementation:
>
> or
>
> 3) We have a prefix for us humans to help us find the code. Now we
> have ops, i cannot simply do M-. and emacs will take me to the
> implementation. I have to search for it a bit. Having the hint g1_
> tells me to go look in global1.c. Having the hint g2_ tells me to go
> look in global2.c. Having the port_ tells me to go look in port.c.
> Having no prefix tells me the code is scattered around and grep is my
> friend.

Just to be clear:

I totally agree for an implementation (e.g. mv88e6095_g1_set_cpu_port),
that's why I've been doing it since I started splitting the code around
in device-specific files. But I disagree for an mv88e6xxx_ops member.

You can have several implementations in the same file (e.g. global1.c),
so again the only value is the function name, not the struct member.


Thanks,

     Vivien
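
A minimal sketch of the naming split argued for above: the ops member
is named after the feature, while the implementation name carries the
chip family and the file hint. All sketch_* and sketch6095_* names are
hypothetical stand-ins, not the driver's real structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chip state, reduced to the one field the example needs. */
struct sketch_chip {
	int cpu_port;
};

/* Ops members describe features, with no register-layout prefix. */
struct sketch_ops {
	int (*set_cpu_port)(struct sketch_chip *chip, int port);
};

/* The implementation name keeps the hint ("g1" would live in global1.c). */
static int sketch6095_g1_set_cpu_port(struct sketch_chip *chip, int port)
{
	chip->cpu_port = port;
	return 0;
}

static const struct sketch_ops sketch6095_ops = {
	.set_cpu_port = sketch6095_g1_set_cpu_port,
};

/* Callers go through the feature name and never see the g1 detail. */
static int sketch_set_cpu_port(struct sketch_chip *chip,
			       const struct sketch_ops *ops, int port)
{
	assert(chip != NULL && ops != NULL);

	if (!ops->set_cpu_port)
		return -1;	/* feature not provided by this chip */
	return ops->set_cpu_port(chip, port);
}
```

Another family could point the same member at a function living in a
different file, which is why the prefix is kept out of the member name.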


* [PATCH net-next 1/3] bpf: remove type arg from __is_valid_{,xdp_}access
From: Daniel Borkmann @ 2016-12-04 22:19 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, netdev, Daniel Borkmann
In-Reply-To: <cover.1480889473.git.daniel@iogearbox.net>

Commit d691f9e8d440 ("bpf: allow programs to write to certain skb
fields") pushed access type check outside of __is_valid_access()
to have different restrictions for socket filters and tc programs.
type is thus not used anymore within __is_valid_access() and should
be removed as a function argument. Same for __is_valid_xdp_access()
introduced by 6a773a15a1e8 ("bpf: add XDP prog type for early driver
filter").

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 net/core/filter.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 56b4358..b751202 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2748,7 +2748,7 @@ static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
 	}
 }
 
-static bool __is_valid_access(int off, int size, enum bpf_access_type type)
+static bool __is_valid_access(int off, int size)
 {
 	if (off < 0 || off >= sizeof(struct __sk_buff))
 		return false;
@@ -2782,7 +2782,7 @@ static bool sk_filter_is_valid_access(int off, int size,
 		}
 	}
 
-	return __is_valid_access(off, size, type);
+	return __is_valid_access(off, size);
 }
 
 static bool lwt_is_valid_access(int off, int size,
@@ -2815,7 +2815,7 @@ static bool lwt_is_valid_access(int off, int size,
 		break;
 	}
 
-	return __is_valid_access(off, size, type);
+	return __is_valid_access(off, size);
 }
 
 static bool sock_filter_is_valid_access(int off, int size,
@@ -2833,11 +2833,9 @@ static bool sock_filter_is_valid_access(int off, int size,
 
 	if (off < 0 || off + size > sizeof(struct bpf_sock))
 		return false;
-
 	/* The verifier guarantees that size > 0. */
 	if (off % size != 0)
 		return false;
-
 	if (size != sizeof(__u32))
 		return false;
 
@@ -2910,11 +2908,10 @@ static bool tc_cls_act_is_valid_access(int off, int size,
 		break;
 	}
 
-	return __is_valid_access(off, size, type);
+	return __is_valid_access(off, size);
 }
 
-static bool __is_valid_xdp_access(int off, int size,
-				  enum bpf_access_type type)
+static bool __is_valid_xdp_access(int off, int size)
 {
 	if (off < 0 || off >= sizeof(struct xdp_md))
 		return false;
@@ -2942,7 +2939,7 @@ static bool xdp_is_valid_access(int off, int size,
 		break;
 	}
 
-	return __is_valid_xdp_access(off, size, type);
+	return __is_valid_xdp_access(off, size);
 }
 
 void bpf_warn_invalid_xdp_action(u32 act)
-- 
1.9.3


* [PATCH net-next 2/3] bpf, cls: consolidate prog deletion path
From: Daniel Borkmann @ 2016-12-04 22:19 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, netdev, Daniel Borkmann
In-Reply-To: <cover.1480889473.git.daniel@iogearbox.net>

Commit 18cdb37ebf4c ("net: sched: do not use tcf_proto 'tp' argument from
call_rcu") removed the last usage of tp from cls_bpf_delete_prog(), so also
remove it from the function as argument to not give a wrong impression. tp
is illegal to access from this callback, since it could already have been
freed.

Refactor the deletion code a bit, so that cls_bpf_destroy() can call into
the same code for prog deletion as cls_bpf_delete() op, instead of having
it unnecessarily duplicated.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 net/sched/cls_bpf.c | 30 +++++++++++++-----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index c37aa8b..f70e03d 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -241,7 +241,7 @@ static int cls_bpf_init(struct tcf_proto *tp)
 	return 0;
 }
 
-static void cls_bpf_delete_prog(struct tcf_proto *tp, struct cls_bpf_prog *prog)
+static void __cls_bpf_delete_prog(struct cls_bpf_prog *prog)
 {
 	tcf_exts_destroy(&prog->exts);
 
@@ -255,22 +255,22 @@ static void cls_bpf_delete_prog(struct tcf_proto *tp, struct cls_bpf_prog *prog)
 	kfree(prog);
 }
 
-static void __cls_bpf_delete_prog(struct rcu_head *rcu)
+static void cls_bpf_delete_prog_rcu(struct rcu_head *rcu)
 {
-	struct cls_bpf_prog *prog = container_of(rcu, struct cls_bpf_prog, rcu);
-
-	cls_bpf_delete_prog(prog->tp, prog);
+	__cls_bpf_delete_prog(container_of(rcu, struct cls_bpf_prog, rcu));
 }
 
-static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg)
+static void __cls_bpf_delete(struct tcf_proto *tp, struct cls_bpf_prog *prog)
 {
-	struct cls_bpf_prog *prog = (struct cls_bpf_prog *) arg;
-
 	cls_bpf_stop_offload(tp, prog);
 	list_del_rcu(&prog->link);
 	tcf_unbind_filter(tp, &prog->res);
-	call_rcu(&prog->rcu, __cls_bpf_delete_prog);
+	call_rcu(&prog->rcu, cls_bpf_delete_prog_rcu);
+}
 
+static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg)
+{
+	__cls_bpf_delete(tp, (struct cls_bpf_prog *) arg);
 	return 0;
 }
 
@@ -282,12 +282,8 @@ static bool cls_bpf_destroy(struct tcf_proto *tp, bool force)
 	if (!force && !list_empty(&head->plist))
 		return false;
 
-	list_for_each_entry_safe(prog, tmp, &head->plist, link) {
-		cls_bpf_stop_offload(tp, prog);
-		list_del_rcu(&prog->link);
-		tcf_unbind_filter(tp, &prog->res);
-		call_rcu(&prog->rcu, __cls_bpf_delete_prog);
-	}
+	list_for_each_entry_safe(prog, tmp, &head->plist, link)
+		__cls_bpf_delete(tp, prog);
 
 	kfree_rcu(head, rcu);
 	return true;
@@ -511,14 +507,14 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 
 	ret = cls_bpf_offload(tp, prog, oldprog);
 	if (ret) {
-		cls_bpf_delete_prog(tp, prog);
+		__cls_bpf_delete_prog(prog);
 		return ret;
 	}
 
 	if (oldprog) {
 		list_replace_rcu(&oldprog->link, &prog->link);
 		tcf_unbind_filter(tp, &oldprog->res);
-		call_rcu(&oldprog->rcu, __cls_bpf_delete_prog);
+		call_rcu(&oldprog->rcu, cls_bpf_delete_prog_rcu);
 	} else {
 		list_add_rcu(&prog->link, &head->plist);
 	}
-- 
1.9.3


* [PATCH net-next 0/3] Minor BPF cleanups and digest
From: Daniel Borkmann @ 2016-12-04 22:19 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, netdev, Daniel Borkmann

First two patches are minor cleanups, and the third one adds
a prog digest. For details, please see individual patches.
After this one, I have a set with tracepoint support that makes
use of this facility as well.

Thanks!

Daniel Borkmann (3):
  bpf: remove type arg from __is_valid_{,xdp_}access
  bpf, cls: consolidate prog deletion path
  bpf: add prog_digest and expose it via fdinfo/netlink

 include/linux/bpf.h                |  1 +
 include/linux/filter.h             |  7 +++-
 include/uapi/linux/pkt_cls.h       |  1 +
 include/uapi/linux/tc_act/tc_bpf.h |  1 +
 kernel/bpf/core.c                  | 65 ++++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c               | 24 +++++++++++++-
 kernel/bpf/verifier.c              |  2 ++
 net/core/filter.c                  | 15 ++++-----
 net/sched/act_bpf.c                |  9 ++++++
 net/sched/cls_bpf.c                | 38 ++++++++++++----------
 10 files changed, 135 insertions(+), 28 deletions(-)

-- 
1.9.3


* [PATCH net-next 3/3] bpf: add prog_digest and expose it via fdinfo/netlink
From: Daniel Borkmann @ 2016-12-04 22:19 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, netdev, Daniel Borkmann
In-Reply-To: <cover.1480889473.git.daniel@iogearbox.net>

When loading a BPF program via bpf(2), calculate the digest over
the program's instruction stream and store it in struct bpf_prog's
digest member. This is done at a point in time before any instructions
are rewritten by the verifier. Any unstable map file descriptor
number part of the imm field will be zeroed for the hash.

fdinfo example output for progs:

  # cat /proc/1590/fdinfo/5
  pos:          0
  flags:        02000002
  mnt_id:       11
  prog_type:    1
  prog_jited:   1
  prog_digest:  b27e8b06da22707513aa97363dfb11c7c3675d28
  memlock:      4096

When programs are pinned and retrieved by an ELF loader, the loader
can check the program's digest through fdinfo and compare it against
one that was generated over the ELF file's program section to see
if the program needs to be reloaded. Furthermore, this can also be
exposed through other means such as netlink in case of a tc cls/act
dump (or xdp in future), but also through tracepoints or other
facilities to identify the program. Other than that, the digest can
also serve as a base name for the work in progress kallsyms support
of programs. The digest doesn't depend/select the crypto layer, since
we need to keep dependencies to a minimum. iproute2 will get support
for this facility.
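
As a rough idea of the loader-side check, the sketch below pulls the
digest out of fdinfo text shaped like the example output above. It
assumes each field starts at the beginning of a line and that the
digest is 40 lowercase hex characters; sketch_fdinfo_digest() is a
hypothetical helper, not part of any existing loader.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_DIGEST_HEX_LEN 40	/* 20-byte digest printed as hex */

/*
 * Copy the prog_digest value from fdinfo text into out.
 * Returns 0 on success, -1 if the field is missing or malformed.
 */
static int sketch_fdinfo_digest(const char *fdinfo,
				char out[SKETCH_DIGEST_HEX_LEN + 1])
{
	static const char key[] = "prog_digest:";
	const char *line = fdinfo;

	assert(fdinfo != NULL && out != NULL);

	while (line && *line) {
		if (strncmp(line, key, sizeof(key) - 1) == 0) {
			const char *val = line + sizeof(key) - 1;

			val += strspn(val, " \t");	/* skip separator */
			if (strspn(val, "0123456789abcdef") !=
			    SKETCH_DIGEST_HEX_LEN)
				return -1;
			memcpy(out, val, SKETCH_DIGEST_HEX_LEN);
			out[SKETCH_DIGEST_HEX_LEN] = '\0';
			return 0;
		}
		line = strchr(line, '\n');	/* advance to next line */
		if (line)
			line++;
	}
	return -1;
}
```

A loader would read /proc/<pid>/fdinfo/<fd> into a buffer, call this
helper, and compare the result with the digest it computed over the
ELF program section to decide whether a reload is needed.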

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf.h                |  1 +
 include/linux/filter.h             |  7 +++-
 include/uapi/linux/pkt_cls.h       |  1 +
 include/uapi/linux/tc_act/tc_bpf.h |  1 +
 kernel/bpf/core.c                  | 65 ++++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c               | 24 +++++++++++++-
 kernel/bpf/verifier.c              |  2 ++
 net/sched/act_bpf.c                |  9 ++++++
 net/sched/cls_bpf.c                |  8 +++++
 9 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 69d0a7f..8796ff0 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -216,6 +216,7 @@ struct bpf_event_entry {
 u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 
 bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp);
+void bpf_prog_calc_digest(struct bpf_prog *fp);
 
 const struct bpf_func_proto *bpf_get_trace_printk_proto(void);
 
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 9733813..f078d2b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -14,6 +14,7 @@
 #include <linux/workqueue.h>
 #include <linux/sched.h>
 #include <linux/capability.h>
+#include <linux/cryptohash.h>
 
 #include <net/sch_generic.h>
 
@@ -56,6 +57,9 @@
 /* BPF program can access up to 512 bytes of stack space. */
 #define MAX_BPF_STACK	512
 
+/* Maximum BPF program size in bytes. */
+#define MAX_BPF_SIZE	(BPF_MAXINSNS * sizeof(struct bpf_insn))
+
 /* Helper macros for filter block array initializers. */
 
 /* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
@@ -404,8 +408,9 @@ struct bpf_prog {
 				cb_access:1,	/* Is control block accessed? */
 				dst_needed:1;	/* Do we need dst entry? */
 	kmemcheck_bitfield_end(meta);
-	u32			len;		/* Number of filter blocks */
 	enum bpf_prog_type	type;		/* Type of BPF program */
+	u32			len;		/* Number of filter blocks */
+	u32			digest[SHA_DIGEST_WORDS]; /* Program digest */
 	struct bpf_prog_aux	*aux;		/* Auxiliary fields */
 	struct sock_fprog_kern	*orig_prog;	/* Original BPF program */
 	unsigned int		(*bpf_func)(const void *ctx,
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 86786d4..1adc0b6 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -397,6 +397,7 @@ enum {
 	TCA_BPF_NAME,
 	TCA_BPF_FLAGS,
 	TCA_BPF_FLAGS_GEN,
+	TCA_BPF_DIGEST,
 	__TCA_BPF_MAX,
 };
 
diff --git a/include/uapi/linux/tc_act/tc_bpf.h b/include/uapi/linux/tc_act/tc_bpf.h
index 063d9d4..a6b88a6 100644
--- a/include/uapi/linux/tc_act/tc_bpf.h
+++ b/include/uapi/linux/tc_act/tc_bpf.h
@@ -27,6 +27,7 @@ enum {
 	TCA_ACT_BPF_FD,
 	TCA_ACT_BPF_NAME,
 	TCA_ACT_BPF_PAD,
+	TCA_ACT_BPF_DIGEST,
 	__TCA_ACT_BPF_MAX,
 };
 #define TCA_ACT_BPF_MAX (__TCA_ACT_BPF_MAX - 1)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 82a0414..bdcc9f4 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -136,6 +136,71 @@ void __bpf_prog_free(struct bpf_prog *fp)
 	vfree(fp);
 }
 
+#define SHA_BPF_RAW_SIZE						\
+	round_up(MAX_BPF_SIZE + sizeof(__be64) + 1, SHA_MESSAGE_BYTES)
+
+/* Called under verifier mutex. */
+void bpf_prog_calc_digest(struct bpf_prog *fp)
+{
+	const u32 bits_offset = SHA_MESSAGE_BYTES - sizeof(__be64);
+	static u32 ws[SHA_WORKSPACE_WORDS];
+	static u8 raw[SHA_BPF_RAW_SIZE];
+	struct bpf_insn *dst = (void *)raw;
+	u32 i, bsize, psize, blocks;
+	bool was_ld_map;
+	u8 *todo = raw;
+	__be32 *result;
+	__be64 *bits;
+
+	sha_init(fp->digest);
+	memset(ws, 0, sizeof(ws));
+
+	/* We need to take out the map fd for the digest calculation
+	 * since they are unstable from user space side.
+	 */
+	for (i = 0, was_ld_map = false; i < fp->len; i++) {
+		dst[i] = fp->insnsi[i];
+		if (!was_ld_map &&
+		    dst[i].code == (BPF_LD | BPF_IMM | BPF_DW) &&
+		    dst[i].src_reg == BPF_PSEUDO_MAP_FD) {
+			was_ld_map = true;
+			dst[i].imm = 0;
+		} else if (was_ld_map &&
+			   dst[i].code == 0 &&
+			   dst[i].dst_reg == 0 &&
+			   dst[i].src_reg == 0 &&
+			   dst[i].off == 0) {
+			was_ld_map = false;
+			dst[i].imm = 0;
+		} else {
+			was_ld_map = false;
+		}
+	}
+
+	psize = fp->len * sizeof(struct bpf_insn);
+	memset(&raw[psize], 0, sizeof(raw) - psize);
+	raw[psize++] = 0x80;
+
+	bsize  = round_up(psize, SHA_MESSAGE_BYTES);
+	blocks = bsize / SHA_MESSAGE_BYTES;
+	if (bsize - psize >= sizeof(__be64)) {
+		bits = (__be64 *)(todo + bsize - sizeof(__be64));
+	} else {
+		bits = (__be64 *)(todo + bsize + bits_offset);
+		blocks++;
+	}
+	*bits = cpu_to_be64((psize - 1) << 3);
+
+	while (blocks--) {
+		sha_transform(fp->digest, todo, ws);
+		todo += SHA_MESSAGE_BYTES;
+	}
+
+	result = (__force __be32 *)fp->digest;
+	for (i = 0; i < SHA_DIGEST_WORDS; i++)
+		result[i] = cpu_to_be32(fp->digest[i]);
+}
+
 static bool bpf_is_jmp_and_has_target(const struct bpf_insn *insn)
 {
 	return BPF_CLASS(insn->code) == BPF_JMP  &&
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 85af86c..c0d2b42 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -662,8 +662,30 @@ static int bpf_prog_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
+#ifdef CONFIG_PROC_FS
+static void bpf_prog_show_fdinfo(struct seq_file *m, struct file *filp)
+{
+	const struct bpf_prog *prog = filp->private_data;
+	char prog_digest[sizeof(prog->digest) * 2 + 1] = { };
+
+	bin2hex(prog_digest, prog->digest, sizeof(prog->digest));
+	seq_printf(m,
+		   "prog_type:\t%u\n"
+		   "prog_jited:\t%u\n"
+		   "prog_digest:\t%s\n"
+		   "memlock:\t%llu\n",
+		   prog->type,
+		   prog->jited,
+		   prog_digest,
+		   prog->pages * 1ULL << PAGE_SHIFT);
+}
+#endif
+
 static const struct file_operations bpf_prog_fops = {
-        .release = bpf_prog_release,
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo	= bpf_prog_show_fdinfo,
+#endif
+	.release	= bpf_prog_release,
 };
 
 int bpf_prog_new_fd(struct bpf_prog *prog)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0e74221..16ad38b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3171,6 +3171,8 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
 		log_level = 0;
 	}
 
+	bpf_prog_calc_digest(env->prog);
+
 	ret = replace_map_fd_with_map_ptr(env);
 	if (ret < 0)
 		goto skip_full_check;
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 84c1d2d..1c60317 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -117,10 +117,19 @@ static int tcf_bpf_dump_bpf_info(const struct tcf_bpf *prog,
 static int tcf_bpf_dump_ebpf_info(const struct tcf_bpf *prog,
 				  struct sk_buff *skb)
 {
+	struct nlattr *nla;
+
 	if (prog->bpf_name &&
 	    nla_put_string(skb, TCA_ACT_BPF_NAME, prog->bpf_name))
 		return -EMSGSIZE;
 
+	nla = nla_reserve(skb, TCA_ACT_BPF_DIGEST,
+			  sizeof(prog->filter->digest));
+	if (nla == NULL)
+		return -EMSGSIZE;
+
+	memcpy(nla_data(nla), prog->filter->digest, nla_len(nla));
+
 	return 0;
 }
 
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index f70e03d..adc7760 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -549,10 +549,18 @@ static int cls_bpf_dump_bpf_info(const struct cls_bpf_prog *prog,
 static int cls_bpf_dump_ebpf_info(const struct cls_bpf_prog *prog,
 				  struct sk_buff *skb)
 {
+	struct nlattr *nla;
+
 	if (prog->bpf_name &&
 	    nla_put_string(skb, TCA_BPF_NAME, prog->bpf_name))
 		return -EMSGSIZE;
 
+	nla = nla_reserve(skb, TCA_BPF_DIGEST, sizeof(prog->filter->digest));
+	if (nla == NULL)
+		return -EMSGSIZE;
+
+	memcpy(nla_data(nla), prog->filter->digest, nla_len(nla));
+
 	return 0;
 }
 
-- 
1.9.3
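
The padding arithmetic in bpf_prog_calc_digest() can be checked in
isolation. The sketch below redoes just that calculation in userspace:
given a message length in bytes, it reports how many 64-byte SHA-1
blocks get hashed and where the 64-bit length field lands. The
sketch_* names are made up for illustration; only the arithmetic
follows the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SHA_BLOCK	64	/* SHA-1 block size in bytes */
#define SKETCH_SHA_LENFIELD	8	/* trailing 64-bit bit count */

struct sketch_pad {
	size_t blocks;		/* number of blocks to transform */
	size_t len_offset;	/* byte offset of the length field */
};

static struct sketch_pad sketch_sha1_pad(size_t msg_len)
{
	struct sketch_pad pad;
	size_t psize, bsize;

	/* Guard the round-up below against wrapping. */
	assert(msg_len <= (size_t)-1 - SKETCH_SHA_BLOCK);

	psize = msg_len + 1;	/* message plus the 0x80 marker byte */
	bsize = (psize + SKETCH_SHA_BLOCK - 1) / SKETCH_SHA_BLOCK *
		SKETCH_SHA_BLOCK;

	pad.blocks = bsize / SKETCH_SHA_BLOCK;
	/* No room left for the length field: spill into one more block. */
	if (bsize - psize < SKETCH_SHA_LENFIELD)
		pad.blocks++;

	pad.len_offset = pad.blocks * SKETCH_SHA_BLOCK - SKETCH_SHA_LENFIELD;
	return pad;
}
```

For instance, a 55-byte message still fits in one block (55 + 1 + 8 =
64), while a 56-byte message needs a second block with the length at
offset 120, which matches the "bsize + bits_offset" branch in the diff.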


* [PATCH v2 main-v4.9-rc7] net/ipv6: allow sysctl to change link-local address generation mode
From: Felix Jia @ 2016-12-04 22:31 UTC (permalink / raw)
  To: netdev; +Cc: Felix Jia, Carl Smith

Removed the rtnl lock and switched to the RCU lock to iterate through
the netdev list.

The address generation mode for IPv6 link-local can only be configured
by netlink messages. This patch adds the ability to change the address
generation mode via sysctl.

A possible improvement is to remove the addrgenmode variable from the
idev structure and use the sysctl storage for the flag.

The patch is based on v4.9-rc7 in mainline.

Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz>
Cc: Carl Smith <carl.smith@alliedtelesis.co.nz>
---
 include/linux/ipv6.h      |  1 +
 include/uapi/linux/ipv6.h |  1 +
 net/ipv6/addrconf.c       | 73 ++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index a064997..0d9e5d4 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -64,6 +64,7 @@ struct ipv6_devconf {
 	} stable_secret;
 	__s32		use_oif_addrs_only;
 	__s32		keep_addr_on_down;
+	__s32		addrgenmode;
 
 	struct ctl_table_header *sysctl_header;
 };
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 8c27723..0524e2c 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -178,6 +178,7 @@ enum {
 	DEVCONF_DROP_UNSOLICITED_NA,
 	DEVCONF_KEEP_ADDR_ON_DOWN,
 	DEVCONF_RTR_SOLICIT_MAX_INTERVAL,
+	DEVCONF_ADDRGENMODE,
 	DEVCONF_MAX
 };
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 4bc5ba3..2b83cc7 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -238,6 +238,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
 	.use_oif_addrs_only	= 0,
 	.ignore_routes_with_linkdown = 0,
 	.keep_addr_on_down	= 0,
+	.addrgenmode = IN6_ADDR_GEN_MODE_EUI64,
 };
 
 static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
@@ -284,6 +285,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
 	.use_oif_addrs_only	= 0,
 	.ignore_routes_with_linkdown = 0,
 	.keep_addr_on_down	= 0,
+	.addrgenmode = IN6_ADDR_GEN_MODE_EUI64,
 };
 
 /* Check if a valid qdisc is available */
@@ -378,7 +380,7 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev)
 	if (ndev->cnf.stable_secret.initialized)
 		ndev->addr_gen_mode = IN6_ADDR_GEN_MODE_STABLE_PRIVACY;
 	else
-		ndev->addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64;
+		ndev->addr_gen_mode = ipv6_devconf_dflt.addrgenmode;
 
 	ndev->cnf.mtu6 = dev->mtu;
 	ndev->nd_parms = neigh_parms_alloc(dev, &nd_tbl);
@@ -4950,6 +4952,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
 	array[DEVCONF_DROP_UNICAST_IN_L2_MULTICAST] = cnf->drop_unicast_in_l2_multicast;
 	array[DEVCONF_DROP_UNSOLICITED_NA] = cnf->drop_unsolicited_na;
 	array[DEVCONF_KEEP_ADDR_ON_DOWN] = cnf->keep_addr_on_down;
+	array[DEVCONF_ADDRGENMODE] = cnf->addrgenmode;
 }
 
 static inline size_t inet6_ifla6_size(void)
@@ -5496,6 +5499,67 @@ int addrconf_sysctl_mtu(struct ctl_table *ctl, int write,
 	return proc_dointvec_minmax(&lctl, write, buffer, lenp, ppos);
 }
 
+static void addrconf_addrgenmode_change(struct net *net)
+{
+	struct net_device *dev;
+	struct inet6_dev *idev;
+
+	rcu_read_lock();
+	for_each_netdev_rcu(net, dev) {
+		idev = __in6_dev_get(dev);
+		if (idev) {
+			idev->cnf.addrgenmode = ipv6_devconf_dflt.addrgenmode;
+			idev->addr_gen_mode = ipv6_devconf_dflt.addrgenmode;
+			addrconf_dev_config(idev->dev);
+		}
+	}
+	rcu_read_unlock();
+}
+
+static int addrconf_sysctl_addrgenmode(struct ctl_table *ctl, int write,
+								void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int ret;
+	int new_val;
+	struct inet6_dev *idev = (struct inet6_dev *)ctl->extra1;
+	struct net *net = (struct net *)ctl->extra2;
+
+	if (write) { /* sysctl write request */
+		ret = proc_dointvec(ctl, write, buffer, lenp, ppos);
+		new_val = *((int *)ctl->data);
+
+		/* request for the all */
+		if (&net->ipv6.devconf_all->addrgenmode == ctl->data) {
+			ipv6_devconf_dflt.addrgenmode = new_val;
+			addrconf_addrgenmode_change(net);
+
+		/* request for default */
+		} else if (&net->ipv6.devconf_dflt->addrgenmode == ctl->data) {
+			ipv6_devconf_dflt.addrgenmode = new_val;
+
+		/* request for individual inet device */
+		} else {
+			if (!idev) {
+				return ret;
+			}
+			if (idev->addr_gen_mode != new_val) {
+				idev->addr_gen_mode = new_val;
+				rtnl_lock();
+				addrconf_dev_config(idev->dev);
+				rtnl_unlock();
+			}
+		}
+
+	} else { /* sysctl read request */
+		if (idev) {
+			idev->cnf.addrgenmode = idev->addr_gen_mode;
+		}
+		ret = proc_dointvec(ctl, 0, buffer, lenp, ppos);
+	}
+
+	return ret;
+}
+
 static void dev_disable_change(struct inet6_dev *idev)
 {
 	struct netdev_notifier_info info;
@@ -6042,6 +6106,13 @@ static const struct ctl_table addrconf_sysctl[] = {
 
 	},
 	{
+		.procname	= "addrgenmode",
+		.data		= &ipv6_devconf.addrgenmode,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= addrconf_sysctl_addrgenmode,
+	},
+	{
 		/* sentinel */
 	}
 };
-- 
2.10.2

^ permalink raw reply related

* [PATCH] net: calxeda: xgmac: use new api ethtool_{get|set}_link_ksettings
From: Philippe Reynes @ 2016-12-04 22:37 UTC (permalink / raw)
  To: davem, jarod; +Cc: netdev, linux-kernel, Philippe Reynes

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
---
 drivers/net/ethernet/calxeda/xgmac.c |   17 ++++++++---------
 1 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/calxeda/xgmac.c b/drivers/net/ethernet/calxeda/xgmac.c
index 6e72366..ce7de6f 100644
--- a/drivers/net/ethernet/calxeda/xgmac.c
+++ b/drivers/net/ethernet/calxeda/xgmac.c
@@ -1530,15 +1530,14 @@ static int xgmac_set_features(struct net_device *dev, netdev_features_t features
 	.ndo_set_features = xgmac_set_features,
 };
 
-static int xgmac_ethtool_getsettings(struct net_device *dev,
-					  struct ethtool_cmd *cmd)
+static int xgmac_ethtool_get_link_ksettings(struct net_device *dev,
+					    struct ethtool_link_ksettings *cmd)
 {
-	cmd->autoneg = 0;
-	cmd->duplex = DUPLEX_FULL;
-	ethtool_cmd_speed_set(cmd, 10000);
-	cmd->supported = 0;
-	cmd->advertising = 0;
-	cmd->transceiver = XCVR_INTERNAL;
+	cmd->base.autoneg = 0;
+	cmd->base.duplex = DUPLEX_FULL;
+	cmd->base.speed = 10000;
+	ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported, 0);
+	ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.advertising, 0);
 	return 0;
 }
 
@@ -1681,7 +1680,6 @@ static int xgmac_set_wol(struct net_device *dev,
 }
 
 static const struct ethtool_ops xgmac_ethtool_ops = {
-	.get_settings = xgmac_ethtool_getsettings,
 	.get_link = ethtool_op_get_link,
 	.get_pauseparam = xgmac_get_pauseparam,
 	.set_pauseparam = xgmac_set_pauseparam,
@@ -1690,6 +1688,7 @@ static int xgmac_set_wol(struct net_device *dev,
 	.get_wol = xgmac_get_wol,
 	.set_wol = xgmac_set_wol,
 	.get_sset_count = xgmac_get_sset_count,
+	.get_link_ksettings = xgmac_ethtool_get_link_ksettings,
 };
 
 /**
-- 
1.7.4.4

^ permalink raw reply related

* Re: [PATCH v1 net-next 1/5] net: dsa: mv88e6xxx: Reserved Management frames to CPU
From: Andrew Lunn @ 2016-12-04 22:40 UTC (permalink / raw)
  To: Vivien Didelot; +Cc: David Miller, netdev
In-Reply-To: <87oa0r9wei.fsf@ketchup.i-did-not-set--mail-host-address--so-tickle-me>

> You can have several implementations in the same file (e.g. global1.c),
> so again the only value is the function name, not the struct member.

Having g1_ in the structure member name has a lot of value.

        if (chip->info->ops->set_cpu_port) {
                err = chip->info->ops->set_cpu_port(chip, upstream_port);
                if (err)
                        return err;
        }

Where do I need to go look for set_cpu_port? I have no idea.

        if (chip->info->ops->g1_set_cpu_port) {
                err = chip->info->ops->g1_set_cpu_port(chip, upstream_port);
                if (err)
                        return err;
        }

Humm, the hint tells me it is in global1.c. And I also know that all
of them are in global1.c.

These ops do make the code simpler. But the downside is that it is
harder to find the actual code, now that it is spread over multiple
files. These hints help offset that downside a little.
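
The optional-op pattern under discussion can be sketched in a few lines
of standalone C. This is only an illustration: the struct layouts, the
two ops tables and g1_set_cpu_port_impl() are hypothetical stand-ins,
not the mv88e6xxx definitions.

```c
#include <stddef.h>

struct chip;

/* Hypothetical per-chip ops table; a NULL member means "not supported". */
struct chip_ops {
	int (*g1_set_cpu_port)(struct chip *chip, int port);
};

struct chip {
	const struct chip_ops *ops;
	int cpu_port;
};

/* Hypothetical implementation; the g1_ prefix hints that it lives in global1.c. */
static int g1_set_cpu_port_impl(struct chip *chip, int port)
{
	chip->cpu_port = port;
	return 0;
}

/* One chip family provides the op, another leaves it unset. */
static const struct chip_ops ops_with_g1 = {
	.g1_set_cpu_port = g1_set_cpu_port_impl,
};

static const struct chip_ops ops_without_g1 = {
	.g1_set_cpu_port = NULL,
};

/* Caller side: invoke the op only when the chip provides it. */
static int setup_cpu_port(struct chip *chip, int upstream_port)
{
	if (chip->ops->g1_set_cpu_port) {
		int err = chip->ops->g1_set_cpu_port(chip, upstream_port);

		if (err)
			return err;
	}
	return 0;
}
```

The caller never learns which file implements the op; only the member
name carries that hint, which is the point being argued here.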

   Andrew

^ permalink raw reply

* Re: [PATCH] mlx4: Use kernel sizeof and alloc styles
From: Joe Perches @ 2016-12-04 22:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Yishai Hadas, Tariq Toukan, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1480885139.18162.484.camel-XN9IlZ5yJG9HTL0Zs8A6p+yfmBU6pStAUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1044 bytes --]

On Sun, 2016-12-04 at 12:58 -0800, Eric Dumazet wrote:
> On Sun, 2016-12-04 at 12:11 -0800, Joe Perches wrote:
> > Convert sizeof foo to sizeof(foo) and allocations with multiplications
> > to the appropriate kcalloc/kmalloc_array styles.
> > 
> > Signed-off-by: Joe Perches <joe-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>
> > ---
> 
> Gah.
> 
> This is one of the hotest NIC driver on linux at this moment, 
> with XDP and other efforts going on.
> 
> Some kmalloc() are becoming kmalloc_node() in some dev branches, and
> there is no kmalloc_array_node() yet.

Well, kmalloc_array_node, like this patch, is pretty trivial to add.
Something like the attached would cover kmalloc_array_node and kcalloc_node.
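
The core of such a helper is the multiplication overflow check. A
minimal userspace sketch of that check, assuming malloc() as a stand-in
for the node-aware allocator (array_alloc_checked() is a hypothetical
name, not a kernel function):

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace stand-in for an array allocator: refuse the request when
 * n * size would not fit in a size_t, instead of silently wrapping.
 */
static void *array_alloc_checked(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would overflow */
	return malloc(n * size);
}
```

A zeroing variant would follow the same shape, with calloc() or a
memset() in place of the plain allocation.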

> This kind of patch is making rebases/backports very painful.

That's really not an issue for me.

> Could we wait ~6 months before doing such cleanup/changes please ?

This is certainly a trivial patch that could be
done at almost any time.

> If you believe a bug needs a fix, please send a patch to address it.
> 
> Thanks.

No worries.

[-- Attachment #2: slab.diff --]
[-- Type: text/x-patch, Size: 1494 bytes --]

 include/linux/slab.h | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 084b12bad198..d98c07713c03 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -647,6 +647,37 @@ static inline void *kzalloc_node(size_t size, gfp_t flags, int node)
 	return kmalloc_node(size, flags | __GFP_ZERO, node);
 }
 
+/**
+ * kmalloc_array_node - allocate memory for an array
+ * from a particular memory node.
+ * @n: number of elements.
+ * @size: element size.
+ * @flags: the type of memory to allocate (see kmalloc).
+ * @node: memory node from which to allocate
+ */
+static inline void *kmalloc_array_node(size_t n, size_t size, gfp_t flags,
+				       int node)
+{
+	if (size != 0 && n > SIZE_MAX / size)
+		return NULL;
+	if (__builtin_constant_p(n) && __builtin_constant_p(size))
+		return kmalloc_node(n * size, flags, node);
+	return __kmalloc_node(n * size, flags, node);
+}
+
+/**
+ * kcalloc_node - allocate memory for an array from a particular memory node.
+ * The memory is set to zero.
+ * @n: number of elements.
+ * @size: element size.
+ * @flags: the type of memory to allocate (see kmalloc).
+ * @node: memory node from which to allocate
+ */
+static inline void *kcalloc_node(size_t n, size_t size, gfp_t flags, int node)
+{
+	return kmalloc_array_node(n, size, flags | __GFP_ZERO, node);
+}
+
 unsigned int kmem_cache_size(struct kmem_cache *s);
 void __init kmem_cache_init_late(void);
 

^ permalink raw reply related

* Re: [PATCH net-next 2/4] mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs
From: Saeed Mahameed @ 2016-12-04 23:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, Martin KaFai Lau, Linux Netdev List, Brenden Blanco,
	Daniel Borkmann, David Miller, Saeed Mahameed, Tariq Toukan,
	Kernel Team
In-Reply-To: <58421775.6090905@fb.com>

On Sat, Dec 3, 2016 at 2:53 AM, Alexei Starovoitov <ast@fb.com> wrote:
> On 12/2/16 4:38 PM, Eric Dumazet wrote:
>>
>> On Fri, 2016-12-02 at 15:23 -0800, Martin KaFai Lau wrote:
>>>
>>> When XDP prog is attached, it is currently limiting
>>> MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN) which is 1514
>>> in x86.
>>>
>>> AFAICT, since mlx4 is doing one page per packet for XDP,
>>> we can at least raise the MTU limitation up to
>>> PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this patch is
>>> doing.  It will be useful in the next patch which allows
>>> XDP program to extend the packet by adding new header(s).
>>>
>>> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
>>> ---
>>
>>
>> Have you tested your patch on a host with PAGE_SIZE = 64 KB ?
>>
>> Looks XDP really kills arches with bigger pages :(
>
>
> I'm afraid xdp mlx[45] support was not tested on arches
> with 64k pages at all. Not just this patch.

Yep, in mlx5 page per packet became the default, with or without XDP,
unlike mlx4.
Currently we allow 64KB pages per packet, which is wrong and needs to be fixed.

I will get to this task soon.

> I think people who care about such archs should test?

We do test mlx5 and mlx4 on PPC. Other than requiring more
memory than we need, we don't see any issues. We don't test XDP on
those archs.

> Note page per packet is not a hard requirement for all drivers
> and all archs. For mlx[45] it was the easiest and the most
> convenient way to achieve desired performance.
> If there are ways to do the same performance differently,
> I'm all ears :)
>

With bigger pages, i.e. PAGE_SIZE > 8K, my current low-hanging-fruit
options for mlx5 are:
1. Start sharing pages between multiple packets.
2. Go back to the SKB allocator (allocate a ring of SKBs in advance
rather than a page per packet).

This means that the default RX memory scheme will differ from XDP's
on such archs (XDP will still use page per packet).

Alexei, we should start considering PPC archs for XDP use cases;
demanding page per packet on those archs is a little bit of a heavy
requirement.
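
For reference, the MTU limits discussed in this thread work out as
follows. This is a small arithmetic sketch, not driver code:
max_xdp_mtu() is a hypothetical helper, and the 1536-byte buffer is an
assumption inferred from the 1514 figure quoted above for x86.

```c
#define ETH_HLEN	14	/* Ethernet header length in bytes */
#define VLAN_HLEN	4	/* one 802.1Q tag in bytes */

/* Largest MTU that fits one frame plus two VLAN tags in a buffer. */
static long max_xdp_mtu(long buf_size)
{
	return buf_size - ETH_HLEN - 2 * VLAN_HLEN;
}
```

With these constants a 1536-byte buffer gives 1514, a 4 KB page gives
4074, and a 64 KB page gives 65514, which is why one page per packet
becomes so expensive on 64 KB-page architectures.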

^ permalink raw reply

* Re: [PATCH v3 net-next 2/3] openvswitch: Use is_skb_forwardable() for length check.
From: Pravin Shelar @ 2016-12-05  0:22 UTC (permalink / raw)
  To: Jiri Benc; +Cc: Jarno Rajahalme, Linux Kernel Network Developers, Eric Garver
In-Reply-To: <20161202102509.065df1e8@griffin>

On Fri, Dec 2, 2016 at 1:25 AM, Jiri Benc <jbenc@redhat.com> wrote:
> On Thu, 1 Dec 2016 11:50:00 -0800, Pravin Shelar wrote:
>> This is not changing any behavior compared to current OVS vlan checks.
>> Single vlan header is not considered for MTU check.
>
> It is changing it.
>
> Consider the case when there's an interface with MTU 1500 forwarding to
> an interface with MTU 1496. Obviously, full-sized vlan frames
> ingressing on the first interface are not forwardable to the second
> one. Yet, if the vlan tag is accelerated (and thus not counted in
> skb->len), is_skb_forwardable happily returns true because of the check
>
>         len = dev->mtu + dev->hard_header_len + VLAN_HLEN;
>         if (skb->len <= len)
>
OK, this case would be allowed due to this patch. But the core Linux stack
and the bridge use this check, so why not use the same forwarding
check in OVS too? That makes it consistent with core networking
forwarding expectations.
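
The case described above can be reproduced with a standalone sketch of
just the length comparison. Assumptions: dev_sketch and
len_forwardable() are hypothetical names, the real is_skb_forwardable()
performs additional checks that are omitted here, and skb_len is taken
to include the Ethernet header but not an accelerated VLAN tag.

```c
#include <stdbool.h>

#define VLAN_HLEN 4	/* one 802.1Q tag in bytes */

/* Hypothetical stand-in for the two net_device fields used by the check. */
struct dev_sketch {
	unsigned int mtu;
	unsigned int hard_header_len;
};

/* Length part of the check only: one VLAN header is always tolerated. */
static bool len_forwardable(const struct dev_sketch *dev, unsigned int skb_len)
{
	unsigned int len = dev->mtu + dev->hard_header_len + VLAN_HLEN;

	return skb_len <= len;
}
```

A full-sized frame from the MTU 1500 port has skb_len 1514 once the tag
is accelerated. For an egress device with MTU 1496 the limit is
1496 + 14 + 4 = 1514, so the frame passes even though it carries 1500
bytes of payload.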

^ permalink raw reply

* Re: [PATCH v1 net-next 1/5] net: dsa: mv88e6xxx: Reserved Management frames to CPU
From: Vivien Didelot @ 2016-12-05  0:23 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: David Miller, netdev
In-Reply-To: <20161204224054.GA24118@lunn.ch>

Hi Andrew,

Andrew Lunn <andrew@lunn.ch> writes:

>> You can have several implementations in the same file (e.g. global1.c),
>> so again the only value is the function name, not the struct member.
>
> The structure member have g1_ has a lot of value.
>
>         if (chip->info->ops->set_cpu_port) {
>                 err = chip->info->ops->set_cpu_port(chip, upstream_port);
>                 if (err)
>                         return err;
>         }
>
> Where to i need to go look for set_cpu_port? I have no idea.

In your chip's ops definition, as for any ops structure. Same as for
your example right below which is unfortunately not a solution per-se.

>
>         if (chip->info->ops->g1_set_cpu_port) {
>                 err = chip->info->ops->g1_set_cpu_port(chip, upstream_port);
>                 if (err)
>                         return err;
>         }
>
> Humm, the hint tells me it is in global1.c. And i also know that all
> of them are in global1.c.

Until a new chip relocates a feature somewhere else.

Then you'll have to rename the structure member(s) because you have a
policy saying "no prefix means different set of registers".


       Vivien

^ permalink raw reply

* Re: [PATCH net-next 2/4] mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs
From: Eric Dumazet @ 2016-12-05  0:52 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Alexei Starovoitov, Martin KaFai Lau, Linux Netdev List,
	Brenden Blanco, Daniel Borkmann, David Miller, Saeed Mahameed,
	Tariq Toukan, Kernel Team
In-Reply-To: <CALzJLG_82MNOyuq+y63tR2SDmKo3ZQA5XAgT1_r15B8_V19xKg@mail.gmail.com>

On Mon, 2016-12-05 at 01:31 +0200, Saeed Mahameed wrote:

> Alexei, we should start considering PPC archs for XDP use cases,
> demanding page per packet on those archs is a little bit heavy
> requirement

Well, 'little' is an understatement ;)

Note that PPC had serious problems before  commit bd68a2a854ad5a85f0
("net: set SK_MEM_QUANTUM to 4096")

So I suspect one page per frame will likely be a huge problem
for hosts dealing with 10^5 or more TCP sockets.

Either skb->truesize is set to 64KB and TCP window must be really tiny,
or skb->truesize is set to ~2KB and OOM is waiting to happen.
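
The trade-off can be put into rough numbers. The sketch below is a
simplification that ignores per-skb overhead; the helper names and the
6 MiB receive buffer used in the note that follows are hypothetical and
chosen only for illustration.

```c
#include <stdint.h>

/* Frames a socket accepts before its receive budget is exhausted. */
static uint64_t frames_admitted(uint64_t rcvbuf, uint64_t truesize)
{
	return rcvbuf / truesize;
}

/* Memory actually held when every admitted frame occupies a whole page. */
static uint64_t bytes_pinned(uint64_t rcvbuf, uint64_t truesize,
			     uint64_t page_size)
{
	return frames_admitted(rcvbuf, truesize) * page_size;
}
```

With a 6 MiB budget, charging 64 KB per frame admits only 96 frames,
while charging 2 KB admits 3072 frames that together hold 192 MiB of
64 KB pages for a single socket.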

^ permalink raw reply

* Re: [PATCH v2 net-next 3/4] mlx4: xdp: Reserve headroom for receiving packet when XDP prog is active
From: Saeed Mahameed @ 2016-12-05  0:54 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Linux Netdev List, Alexei Starovoitov, Brenden Blanco,
	Daniel Borkmann, David Miller, Jesper Dangaard Brouer,
	Saeed Mahameed, Tariq Toukan, Kernel Team
In-Reply-To: <1480821446-4122277-4-git-send-email-kafai@fb.com>

On Sun, Dec 4, 2016 at 5:17 AM, Martin KaFai Lau <kafai@fb.com> wrote:
> Reserve XDP_PACKET_HEADROOM and honor bpf_xdp_adjust_head()
> when XDP prog is active.  This patch only affects the code
> path when XDP is active.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---

Hi Martin, sorry for the late review. I have some comments below.

>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 17 +++++++++++++++--
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c     | 23 +++++++++++++++++------
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c     |  9 +++++----
>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  3 ++-
>  4 files changed, 39 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 311c14153b8b..094a13b52cf6 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -51,7 +51,8 @@
>  #include "mlx4_en.h"
>  #include "en_port.h"
>
> -#define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN)))
> +#define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) - \
> +                                  XDP_PACKET_HEADROOM))
>
>  int mlx4_en_setup_tc(struct net_device *dev, u8 up)
>  {
> @@ -1551,6 +1552,7 @@ int mlx4_en_start_port(struct net_device *dev)
>         struct mlx4_en_tx_ring *tx_ring;
>         int rx_index = 0;
>         int err = 0;
> +       int mtu;
>         int i, t;
>         int j;
>         u8 mc_list[16] = {0};
> @@ -1684,8 +1686,12 @@ int mlx4_en_start_port(struct net_device *dev)
>         }
>
>         /* Configure port */
> +       mtu = priv->rx_skb_size + ETH_FCS_LEN;
> +       if (priv->tx_ring_num[TX_XDP])
> +               mtu += XDP_PACKET_HEADROOM;
> +

Why would the physical MTU care about the headroom you reserve for the XDP prog?
This is the wire MTU and it shouldn't be changed; please keep it as
before. Any space you reserve in the packet buffers is needed only
for the FWD or modify cases (the HW and the wire should not care about it).

>         err = mlx4_SET_PORT_general(mdev->dev, priv->port,
> -                                   priv->rx_skb_size + ETH_FCS_LEN,
> +                                   mtu,
>                                     priv->prof->tx_pause,
>                                     priv->prof->tx_ppp,
>                                     priv->prof->rx_pause,
> @@ -2255,6 +2261,13 @@ static bool mlx4_en_check_xdp_mtu(struct net_device *dev, int mtu)
>  {
>         struct mlx4_en_priv *priv = netdev_priv(dev);
>
> +       if (mtu + XDP_PACKET_HEADROOM > priv->max_mtu) {
> +               en_err(priv,
> +                      "Device max mtu:%d does not allow %d bytes reserved headroom for XDP prog\n",
> +                      priv->max_mtu, XDP_PACKET_HEADROOM);
> +               return false;
> +       }
> +
>         if (mtu > MLX4_EN_MAX_XDP_MTU) {
>                 en_err(priv, "mtu:%d > max:%d when XDP prog is attached\n",
>                        mtu, MLX4_EN_MAX_XDP_MTU);
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 23e9d04d1ef4..324771ac929e 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -96,7 +96,6 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
>         struct mlx4_en_rx_alloc page_alloc[MLX4_EN_MAX_RX_FRAGS];
>         const struct mlx4_en_frag_info *frag_info;
>         struct page *page;
> -       dma_addr_t dma;
>         int i;
>
>         for (i = 0; i < priv->num_frags; i++) {
> @@ -115,9 +114,10 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
>
>         for (i = 0; i < priv->num_frags; i++) {
>                 frags[i] = ring_alloc[i];
> -               dma = ring_alloc[i].dma + ring_alloc[i].page_offset;
> +               frags[i].page_offset += priv->frag_info[i].rx_headroom;

I don't see any need for headroom on any frag_info other than frag0 (which
is where the packet starts).
What is the meaning of headroom for a frag in the middle of a packet?

If you agree with me, then you can use XDP_PACKET_HEADROOM as is where
needed (i.e. frag0 page offset) and remove
"priv->frag_info[i].rx_headroom".

...

After going through the code a little bit, I see that this code is
shared between XDP and the common path, and you didn't want to add boolean
conditions.

OK, I see what you did here.

Maybe we can pass headroom as a function parameter and split frag0
handling from the rest?
If that is too much, then I am OK with the code as it is.

> +               rx_desc->data[i].addr = cpu_to_be64(frags[i].dma +
> +                                                   frags[i].page_offset);
>                 ring_alloc[i] = page_alloc[i];
> -               rx_desc->data[i].addr = cpu_to_be64(dma);
>         }
>
>         return 0;
> @@ -250,7 +250,8 @@ static int mlx4_en_prepare_rx_desc(struct mlx4_en_priv *priv,
>
>         if (ring->page_cache.index > 0) {
>                 frags[0] = ring->page_cache.buf[--ring->page_cache.index];
> -               rx_desc->data[0].addr = cpu_to_be64(frags[0].dma);
> +               rx_desc->data[0].addr = cpu_to_be64(frags[0].dma +
> +                                                   frags[0].page_offset);
>                 return 0;
>         }
>
> @@ -889,6 +890,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>                 if (xdp_prog) {
>                         struct xdp_buff xdp;
>                         dma_addr_t dma;
> +                       void *pg_addr, *orig_data;
>                         u32 act;
>
>                         dma = be64_to_cpu(rx_desc->data[0].addr);
> @@ -896,11 +898,18 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>                                                 priv->frag_info[0].frag_size,
>                                                 DMA_FROM_DEVICE);
>
> -                       xdp.data = page_address(frags[0].page) +
> -                                                       frags[0].page_offset;
> +                       pg_addr = page_address(frags[0].page);
> +                       orig_data = pg_addr + frags[0].page_offset;
> +                       xdp.data = orig_data;
>                         xdp.data_end = xdp.data + length;
>
>                         act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +
> +                       if (xdp.data != orig_data) {
> +                               length = xdp.data_end - xdp.data;
> +                               frags[0].page_offset = xdp.data - pg_addr;
> +                       }
> +
>

Is this needed only for the XDP FWD case?
Is this the only way to detect that the user modified the packet
headers (comparing pointers before and after)?

If the answer is yes, it should be faster to unconditionally reset
the packet offset and length on XDP_FWD:
case XDP_FWD:
   length = xdp.data_end - xdp.data;
   frags[0].page_offset = xdp.data - pg_addr;
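
The unconditional variant can be sketched as a standalone helper. The
struct and function names below are hypothetical stand-ins for the
driver's xdp_buff and RX fragment state. Note also that the upstream
action enum names the transmit case XDP_TX.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two xdp_buff pointers used here. */
struct xdp_buff_sketch {
	void *data;
	void *data_end;
};

/* Hypothetical stand-in for the RX fragment bookkeeping. */
struct rx_frag_sketch {
	char *page_addr;		/* start of the page */
	unsigned int page_offset;	/* where the packet starts in the page */
};

/*
 * Re-derive offset and length from the buffer after the program ran,
 * without first checking whether xdp->data moved. Returns the new length.
 */
static unsigned int resync_after_xdp(const struct xdp_buff_sketch *xdp,
				     struct rx_frag_sketch *frag)
{
	frag->page_offset = (unsigned int)((char *)xdp->data - frag->page_addr);
	return (unsigned int)((char *)xdp->data_end - (char *)xdp->data);
}
```

If the program did not touch the headers, the helper simply writes back
the values that were already there, so the pointer comparison is traded
for two subtractions.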


>                         switch (act) {
>                         case XDP_PASS:
>                                 break;
> @@ -1180,6 +1189,7 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
>                  */
>                 priv->frag_info[0].frag_stride = PAGE_SIZE;
>                 priv->frag_info[0].dma_dir = PCI_DMA_BIDIRECTIONAL;
> +               priv->frag_info[0].rx_headroom = XDP_PACKET_HEADROOM;
>                 i = 1;
>         } else {
>                 int buf_size = 0;
> @@ -1194,6 +1204,7 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
>                                 ALIGN(priv->frag_info[i].frag_size,
>                                       SMP_CACHE_BYTES);
>                         priv->frag_info[i].dma_dir = PCI_DMA_FROMDEVICE;
> +                       priv->frag_info[i].rx_headroom = 0;

IMHO, redundant. As you see here, frag0 and the other frags are handled
separately; maybe we can do the same in mlx4_en_alloc_frags.

>                         buf_size += priv->frag_info[i].frag_size;
>                         i++;
>                 }
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index 4b597dca5c52..9e5f38cefe5f 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> @@ -354,7 +354,7 @@ u32 mlx4_en_recycle_tx_desc(struct mlx4_en_priv *priv,
>         struct mlx4_en_rx_alloc frame = {
>                 .page = tx_info->page,
>                 .dma = tx_info->map0_dma,
> -               .page_offset = 0,
> +               .page_offset = XDP_PACKET_HEADROOM,
>                 .page_size = PAGE_SIZE,
>         };
>
> @@ -1132,7 +1132,7 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
>         tx_info->page = frame->page;
>         frame->page = NULL;
>         tx_info->map0_dma = dma;
> -       tx_info->map0_byte_count = length;
> +       tx_info->map0_byte_count = length + frame->page_offset;

Didn't you already take care of length with the following code:
                       if (xdp.data != orig_data) {
                               length = xdp.data_end - xdp.data;
                               frags[0].page_offset = xdp.data - pg_addr;
                        }

And here frame->page_offset is not really a page offset; it can only be
XDP_PACKET_HEADROOM.

>         tx_info->nr_txbb = nr_txbb;
>         tx_info->nr_bytes = max_t(unsigned int, length, ETH_ZLEN);
>         tx_info->data_offset = (void *)data - (void *)tx_desc;
> @@ -1141,9 +1141,10 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
>         tx_info->linear = 1;
>         tx_info->inl = 0;
>
> -       dma_sync_single_for_device(priv->ddev, dma, length, PCI_DMA_TODEVICE);
> +       dma_sync_single_range_for_device(priv->ddev, dma, frame->page_offset,
> +                                        length, PCI_DMA_TODEVICE);
>
> -       data->addr = cpu_to_be64(dma);
> +       data->addr = cpu_to_be64(dma + frame->page_offset);
>         data->lkey = ring->mr_key;
>         dma_wmb();
>         data->byte_count = cpu_to_be32(length);
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index 20a936428f4a..ba1c6cd0cc79 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -475,7 +475,8 @@ struct mlx4_en_frag_info {
>         u16 frag_prefix_size;
>         u32 frag_stride;
>         enum dma_data_direction dma_dir;
> -       int order;
> +       u16 order;
> +       u16 rx_headroom;
>  };
>
>  #ifdef CONFIG_MLX4_EN_DCB
> --
> 2.5.1
>

^ permalink raw reply

* Re: [PATCH v3 net-next 3/3] openvswitch: Fix skb->protocol for vlan frames.
From: Pravin Shelar @ 2016-12-05  0:58 UTC (permalink / raw)
  To: Jiri Benc; +Cc: Jarno Rajahalme, Linux Kernel Network Developers, Eric Garver
In-Reply-To: <20161202104202.426b2c80@griffin>

On Fri, Dec 2, 2016 at 1:42 AM, Jiri Benc <jbenc@redhat.com> wrote:
> On Thu, 1 Dec 2016 12:31:09 -0800, Pravin Shelar wrote:
>> On Wed, Nov 30, 2016 at 6:30 AM, Jiri Benc <jbenc@redhat.com> wrote:
>> > I'm not opposed to changing this but I'm afraid it needs much deeper
>> > review. Because with this in place, no core kernel functions that
>> > depend on skb->protocol may be called from within openvswitch.
>> >
>> Can you give specific example where it does not work?
>
> I can't, I haven't reviewed the usage. I'm just saying that the stack
> does not expect skb->protocol being ETH_P_8021Q for e.g. IPv4 packets.
> It may not be relevant for the calls used by openvswitch but we should
> be sure about that. Especially defragmentation and conntrack is worth
> looking at.
>
> Again, I'm not saying this is wrong nor that there is an actual
> problem. I'm just pointing out that openvswitch has different
> expectations about skb wrt. vlans than the rest of the kernel and we
> should be reasonably sure the behavior is correct when passing between
> the two.
>
I agree that conntrack does not expect skb->protocol to be a vlan
protocol. We could accelerate the vlan tag if there is a vlan header in
the packet itself. That would make the packet consistent across upcalls.

>> skb-protocol value is set by the caller, so it should not be
>> arbitrary. is it missing in any case?
>
> It's not set exactly by the caller, because that's what this patch is
> removing. It is set by whoever handed over the packet to openvswitch.
> The point is we don't know *what* it is set to. It may as well be
> ETH_P_8021Q, breaking the conditions here. It should not happen in
> practice but still, it seems weird to depend on the fact that the
> packet coming to ovs has never skb->protocol equal to ETH_P_8021Q nor
> ETH_P_8021AD.
>

We are kind of dependent on this, at least for L3 packets injected back
by vswitchd. For the rest of the entry points I think we have to trust
that the networking stack sets skb->protocol to the correct value. If
that is not true in some case, it is a bug and we will need to fix it.

^ permalink raw reply

* Re: [PATCH v2 main-v4.9-rc7] net/ipv6: allow sysctl to change link-local address generation mode
From: Roopa Prabhu @ 2016-12-05  2:08 UTC (permalink / raw)
  To: Felix Jia; +Cc: netdev, Carl Smith
In-Reply-To: <20161204223136.12119-1-felix.jia@alliedtelesis.co.nz>

On 12/4/16, 2:31 PM, Felix Jia wrote:
> Removed the rtnl lock and switch to use RCU lock to iterate through
> the netdev list.
>
> The address generation mode for IPv6 link-local can only be configured
> by netlink messages. This patch adds the ability to change the address
> generation mode via sysctl.
>
> An possible improvement is to remove the addrgenmode variable from the
> idev structure and use the systcl storage for the flag.
>
> The patch is based from v4.9-rc7 in mainline.
>
> Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz>
> Cc: Carl Smith <carl.smith@alliedtelesis.co.nz>
> ---
>  include/linux/ipv6.h      |  1 +
>  include/uapi/linux/ipv6.h |  1 +
>  net/ipv6/addrconf.c       | 73 ++++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 74 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index a064997..0d9e5d4 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -64,6 +64,7 @@ struct ipv6_devconf {
>  	} stable_secret;
>  	__s32		use_oif_addrs_only;
>  	__s32		keep_addr_on_down;
> +	__s32		addrgenmode;
>  
>  	struct ctl_table_header *sysctl_header;
>  };
> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> index 8c27723..0524e2c 100644
> --- a/include/uapi/linux/ipv6.h
> +++ b/include/uapi/linux/ipv6.h
> @@ -178,6 +178,7 @@ enum {
>  	DEVCONF_DROP_UNSOLICITED_NA,
>  	DEVCONF_KEEP_ADDR_ON_DOWN,
>  	DEVCONF_RTR_SOLICIT_MAX_INTERVAL,
> +	DEVCONF_ADDRGENMODE,
>  	DEVCONF_MAX
>  };
>  
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 4bc5ba3..2b83cc7 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -238,6 +238,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
>  	.use_oif_addrs_only	= 0,
>  	.ignore_routes_with_linkdown = 0,
>  	.keep_addr_on_down	= 0,
> +	.addrgenmode = IN6_ADDR_GEN_MODE_EUI64,
>  };
>  
>  static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
> @@ -284,6 +285,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
>  	.use_oif_addrs_only	= 0,
>  	.ignore_routes_with_linkdown = 0,
>  	.keep_addr_on_down	= 0,
> +	.addrgenmode = IN6_ADDR_GEN_MODE_EUI64,
>  };
>  
>  /* Check if a valid qdisc is available */
> @@ -378,7 +380,7 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev)
>  	if (ndev->cnf.stable_secret.initialized)
>  		ndev->addr_gen_mode = IN6_ADDR_GEN_MODE_STABLE_PRIVACY;
>  	else
> -		ndev->addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64;
> +		ndev->addr_gen_mode = ipv6_devconf_dflt.addrgenmode;
>  
>  	ndev->cnf.mtu6 = dev->mtu;
>  	ndev->nd_parms = neigh_parms_alloc(dev, &nd_tbl);
> @@ -4950,6 +4952,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
>  	array[DEVCONF_DROP_UNICAST_IN_L2_MULTICAST] = cnf->drop_unicast_in_l2_multicast;
>  	array[DEVCONF_DROP_UNSOLICITED_NA] = cnf->drop_unsolicited_na;
>  	array[DEVCONF_KEEP_ADDR_ON_DOWN] = cnf->keep_addr_on_down;
> +	array[DEVCONF_ADDRGENMODE] = cnf->addrgenmode;
>  }
>  
>  static inline size_t inet6_ifla6_size(void)
> @@ -5496,6 +5499,67 @@ int addrconf_sysctl_mtu(struct ctl_table *ctl, int write,
>  	return proc_dointvec_minmax(&lctl, write, buffer, lenp, ppos);
>  }
>  
> +static void addrconf_addrgenmode_change(struct net *net)
> +{
> +	struct net_device *dev;
> +	struct inet6_dev *idev;
> +
> +	rcu_read_lock();
> +	for_each_netdev_rcu(net, dev) {
> +		idev = __in6_dev_get(dev);
> +		if (idev) {
> +			idev->cnf.addrgenmode = ipv6_devconf_dflt.addrgenmode;
> +			idev->addr_gen_mode = ipv6_devconf_dflt.addrgenmode;
> +			addrconf_dev_config(idev->dev);
> +		}
> +	}
> +	rcu_read_unlock();
> +}
> +
> +static int addrconf_sysctl_addrgenmode(struct ctl_table *ctl, int write,
> +								void __user *buffer, size_t *lenp, loff_t *ppos)
> +{
> +	int ret;
> +	int new_val;
> +	struct inet6_dev *idev = (struct inet6_dev *)ctl->extra1;
> +	struct net *net = (struct net *)ctl->extra2;
> +
> +	if (write) { /* sysctl write request */
> +		ret = proc_dointvec(ctl, write, buffer, lenp, ppos);
> +		new_val = *((int *)ctl->data);
> +
>
Unless I missed it, I don't see a check for valid values of new_val.
The netlink attribute is checked for valid values in the existing equivalent netlink code.
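
One possible shape for such a check, sketched in standalone C. The enum
below is a local mirror that assumes the four modes of enum
in6_addr_gen_mode (EUI64, NONE, STABLE_PRIVACY, RANDOM) in that order;
the function names are hypothetical and this is not the netlink code
itself.

```c
#include <errno.h>

/* Local mirror of the address generation modes; assumed order and count. */
enum addr_gen_mode_sketch {
	MODE_EUI64,
	MODE_NONE,
	MODE_STABLE_PRIVACY,
	MODE_RANDOM,
};

/* Reject anything outside the known range of modes. */
static int check_addr_gen_mode(int mode)
{
	if (mode < MODE_EUI64 || mode > MODE_RANDOM)
		return -EINVAL;
	return 0;
}

/* Commit the new value only after it has passed validation. */
static int store_addr_gen_mode(int *slot, int new_val)
{
	int err = check_addr_gen_mode(new_val);

	if (err)
		return err;
	*slot = new_val;
	return 0;
}
```

In the sysctl handler this would mean parsing the written value into a
temporary first, so that an invalid write never reaches ctl->data or
the per-device state.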

^ permalink raw reply

* Re: [PATCH v2 net-next 8/8] tcp: tsq: move tsq_flags close to sk_wmem_alloc
From: David Miller @ 2016-12-04  1:37 UTC (permalink / raw)
  To: eric.dumazet; +Cc: edumazet, netdev, ycheng
In-Reply-To: <1480814031.18162.439.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 03 Dec 2016 17:13:51 -0800

> On Sat, 2016-12-03 at 19:16 -0500, David Miller wrote:
>> From: Eric Dumazet <edumazet@google.com>
>> Date: Sat,  3 Dec 2016 11:14:57 -0800
>> 
>> > diff --git a/include/linux/tcp.h b/include/linux/tcp.h
>> > index d8be083ab0b0..fc5848dad7a4 100644
>> > --- a/include/linux/tcp.h
>> > +++ b/include/linux/tcp.h
>> > @@ -186,7 +186,6 @@ struct tcp_sock {
>> >  	u32	tsoffset;	/* timestamp offset */
>> >  
>> >  	struct list_head tsq_node; /* anchor in tsq_tasklet.head list */
>> > -	unsigned long	tsq_flags;
>> >  
>> >  	/* Data for direct copy to user */
>> >  	struct {
>> 
>> Hmmm, did you forget to "git add include/net/sock.h" before making
>> this commit?
> 
> sk_tsq_flags was added in prior patch in the series ( 7/8 net:
> reorganize struct sock for better data locality)
> 
> What is the problem with this part ?

Sorry, just noticed by visual inspection.  I expected the
struct sock part to show up in the same patch as the one
that removed it from tcp_sock and adjusted the users.

I'll re-review this series, thanks.

^ permalink raw reply

* Re: [PATCH V2 net 00/20] Increase ENA driver version to 1.1.2
From: David Miller @ 2016-12-05  2:37 UTC (permalink / raw)
  To: netanel
  Cc: linux-kernel, netdev, dwmw, zorik, alex, saeed, msw, aliguori,
	nafea
In-Reply-To: <1480857578-5065-1-git-send-email-netanel@annapurnalabs.com>

It is not appropriate to submit so many patches at one time.

Please keep your patch series to no more than about a dozen
at a time.

Also, group your changes logically and tie an appropriately
descriptive cover letter.

"Increase driver version to X.Y.Z" tells the reader absolutely
nothing.  Someone reading that Subject line in the GIT logs
will have no idea what the overall purpose of the patch series
is and what it accomplishes.

You really need to describe the high level purpose of the patch set.
Is it adding a new feature?  What is that feature?  Why are you
adding that feature?  How is that feature implemented?  Why is
it implemented that way?

^ permalink raw reply
