Netdev List

Netdev List
 help / color / mirror / Atom feed

* Multiple Txqs with One Qdisc
From: Michael Ma @ 2017-04-26  4:46 UTC (permalink / raw)
  To: Linux Kernel Network Developers

Hi -

As I mentioned in a previous thread, we're implementing a qdisc
similar to mqprio which can associate multiple txqs to one qdisc, so
that we can work around the root qdisc bottleneck. I think this should
be in general fine since without MQ, root qdisc is associated with
multiple txqs anyway. However we encountered kernel panic as I
mentioned in the "corrupted skb" thread.

Any thought on potential issue of doing this? I'm asking this because
mqprio only associates one txq with one qdisc which I think must have
been done this way for a reason.

Would appreciate if anyone can shed some light on this.

Thanks,
Michael

^ permalink raw reply

* [Patch net 0/3] net: fix a stack out-of-bound access
From: Cong Wang @ 2017-04-26  5:03 UTC (permalink / raw)
  To: netdev; +Cc: andreyknvl, Cong Wang

This patchset fixes the following kernel bug reported by Andrey:

 BUG: KASAN: stack-out-of-bounds in bond_enslave+0xe0a/0x4ef0 at addr
 ffff8800666b7792
 Write of size 16 by task a.out/3894
 page:ffffea000199adc0 count:0 mapcount:0 mapping:          (null) index:0x0
 flags: 0x100000000000000()
 raw: 0100000000000000 0000000000000000 0000000000000000 00000000ffffffff
 raw: 0000000000000000 ffffea000199ade0 0000000000000000 0000000000000000
 page dumped because: kasan: bad access detected
 CPU: 1 PID: 3894 Comm: a.out Not tainted 4.11.0-rc7+ #251
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:16
  dump_stack+0x292/0x398 lib/dump_stack.c:52
  kasan_report_error mm/kasan/report.c:212
  kasan_report+0x4d8/0x510 mm/kasan/report.c:347
  check_memory_region_inline mm/kasan/kasan.c:326
  check_memory_region+0x139/0x190 mm/kasan/kasan.c:333
  memcpy+0x37/0x50 mm/kasan/kasan.c:369
  bond_enslave+0xe0a/0x4ef0 drivers/net/bonding/bond_main.c:1491
  bond_do_ioctl+0xb5d/0xec0 drivers/net/bonding/bond_main.c:3449
  dev_ifsioc+0x53f/0x9f0 net/core/dev_ioctl.c:338
  dev_ioctl+0x249/0x1160 net/core/dev_ioctl.c:532
  sock_do_ioctl+0x94/0xb0 net/socket.c:913
  sock_ioctl+0x28f/0x440 net/socket.c:1004
  vfs_ioctl fs/ioctl.c:45
  do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:685
  SYSC_ioctl fs/ioctl.c:700
  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
  entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:204
 RIP: 0033:0x7fc764b31b79
 RSP: 002b:00007ffc1d73aa58 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
 RAX: ffffffffffffffda RBX: 00007ffc1d73abb0 RCX: 00007fc764b31b79
 RDX: 00000000209eafd8 RSI: 0000008000008990 RDI: 0000000000000003
 RBP: 00000000004004e0 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
 R13: 00007ffc1d73abb0 R14: 0000000000000000 R15: 0000000000000000

Please see each patch for details.

Cong Wang (3):
  net: check mac address length for dev_set_mac_address()
  bonding: use a larger struct for mac address
  team: use a larger struct for mac address

 drivers/net/bonding/bond_alb.c     |  8 ++++----
 drivers/net/bonding/bond_main.c    | 17 +++++++++--------
 drivers/net/bonding/bonding_priv.h |  6 ++++++
 drivers/net/team/team.c            |  9 ++++++---
 include/linux/netdevice.h          |  2 +-
 net/core/dev.c                     | 10 +++++++---
 net/core/dev_ioctl.c               |  2 ++
 7 files changed, 35 insertions(+), 19 deletions(-)

-- 
2.5.5

^ permalink raw reply

* [Patch net 1/3] net: check mac address length for dev_set_mac_address()
From: Cong Wang @ 2017-04-26  5:03 UTC (permalink / raw)
  To: netdev; +Cc: andreyknvl, Cong Wang
In-Reply-To: <1493183003-884-1-git-send-email-xiyou.wangcong@gmail.com>

dev_set_mac_address() accepts a struct sockaddr pointer as
input but we have various types of mac addresse whose lengths
are up to MAX_ADDR_LEN, this is confusing.

Make it void like ->ndo_set_mac_address() and let callers check
its length before calling it. It is too late to fix dev_ifsioc()
due to API compatiblity, so just reject those larger than
sizeof(struct sockaddr).

This is also a preparation for the following patches.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/linux/netdevice.h |  2 +-
 net/core/dev.c            | 10 +++++++---
 net/core/dev_ioctl.c      |  2 ++
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 97456b25..40e674c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3271,7 +3271,7 @@ int dev_set_alias(struct net_device *, const char *, size_t);
 int dev_change_net_namespace(struct net_device *, struct net *, const char *);
 int dev_set_mtu(struct net_device *, int);
 void dev_set_group(struct net_device *, int);
-int dev_set_mac_address(struct net_device *, struct sockaddr *);
+int dev_set_mac_address(struct net_device *, void *);
 int dev_change_carrier(struct net_device *, bool new_carrier);
 int dev_get_phys_port_id(struct net_device *dev,
 			 struct netdev_phys_item_id *ppid);
diff --git a/net/core/dev.c b/net/core/dev.c
index 533a6d6..cc670e1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6610,13 +6610,17 @@ EXPORT_SYMBOL(dev_set_group);
 /**
  *	dev_set_mac_address - Change Media Access Control Address
  *	@dev: device
- *	@sa: new address
+ *	@addr: new address, whose type could be either struct sockaddr or
+ *	any other compatible type whose length is up to MAX_ADDR_LEN depending
+ *	on the dev->addr_len. Callers should check if its length is smaller than
+ *	dev->addr_len!!
  *
  *	Change the hardware (MAC) address of the device
  */
-int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa)
+int dev_set_mac_address(struct net_device *dev, void *addr)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
+	struct sockaddr *sa = addr;
 	int err;
 
 	if (!ops->ndo_set_mac_address)
@@ -6625,7 +6629,7 @@ int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa)
 		return -EINVAL;
 	if (!netif_device_present(dev))
 		return -ENODEV;
-	err = ops->ndo_set_mac_address(dev, sa);
+	err = ops->ndo_set_mac_address(dev, addr);
 	if (err)
 		return err;
 	dev->addr_assign_type = NET_ADDR_SET;
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index b94b1d2..11f262e 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -261,6 +261,8 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, unsigned int cmd)
 		return dev_set_mtu(dev, ifr->ifr_mtu);
 
 	case SIOCSIFHWADDR:
+		if (dev->addr_len > sizeof(struct sockaddr))
+			return -EINVAL;
 		return dev_set_mac_address(dev, &ifr->ifr_hwaddr);
 
 	case SIOCSIFHWBROADCAST:
-- 
2.5.5

^ permalink raw reply related

* [Patch net 2/3] bonding: use a larger struct for mac address
From: Cong Wang @ 2017-04-26  5:03 UTC (permalink / raw)
  To: netdev; +Cc: andreyknvl, Cong Wang, Jay Vosburgh
In-Reply-To: <1493183003-884-1-git-send-email-xiyou.wangcong@gmail.com>

IPv6 tunnels use sizeof(struct in6_addr) as dev->addr_len,
but in many places especially bonding, we use struct sockaddr
to copy and set mac addr, this could lead to stack out-of-bounds
access.

Fix it by using a larger address storage.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 drivers/net/bonding/bond_alb.c     |  8 ++++----
 drivers/net/bonding/bond_main.c    | 17 +++++++++--------
 drivers/net/bonding/bonding_priv.h |  6 ++++++
 3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index c80b023..0038266 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -39,7 +39,7 @@
 #include <asm/byteorder.h>
 #include <net/bonding.h>
 #include <net/bond_alb.h>
-
+#include "bonding_priv.h"
 
 
 static const u8 mac_bcast[ETH_ALEN + 2] __long_aligned = {
@@ -1234,7 +1234,7 @@ static int alb_set_mac_address(struct bonding *bond, void *addr)
 {
 	struct slave *slave, *rollback_slave;
 	struct list_head *iter;
-	struct sockaddr sa;
+	struct bond_mac_addr sa;
 	char tmp_addr[ETH_ALEN];
 	int res;
 
@@ -1257,8 +1257,8 @@ static int alb_set_mac_address(struct bonding *bond, void *addr)
 	return 0;
 
 unwind:
-	memcpy(sa.sa_data, bond->dev->dev_addr, bond->dev->addr_len);
-	sa.sa_family = bond->dev->type;
+	memcpy(sa.addr, bond->dev->dev_addr, bond->dev->addr_len);
+	sa.type = bond->dev->type;
 
 	/* unwind from head to the slave that failed */
 	bond_for_each_slave(bond, rollback_slave, iter) {
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 8a4ba8b..7a9bd30 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1330,7 +1330,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	struct bonding *bond = netdev_priv(bond_dev);
 	const struct net_device_ops *slave_ops = slave_dev->netdev_ops;
 	struct slave *new_slave = NULL, *prev_slave;
-	struct sockaddr addr;
+	struct bond_mac_addr addr;
 	int link_reporting;
 	int res = 0, i;
 
@@ -1488,8 +1488,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		/* Set slave to master's mac address.  The application already
 		 * set the master's mac address to that of the first slave
 		 */
-		memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
-		addr.sa_family = slave_dev->type;
+		memcpy(addr.addr, bond_dev->dev_addr, bond_dev->addr_len);
+		addr.type = slave_dev->type;
 		res = dev_set_mac_address(slave_dev, &addr);
 		if (res) {
 			netdev_dbg(bond_dev, "Error %d calling set_mac_address\n", res);
@@ -1773,8 +1773,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		 * MAC if this slave's MAC is in use by the bond, or at
 		 * least print a warning.
 		 */
-		ether_addr_copy(addr.sa_data, new_slave->perm_hwaddr);
-		addr.sa_family = slave_dev->type;
+		ether_addr_copy(addr.addr, new_slave->perm_hwaddr);
+		addr.type = slave_dev->type;
 		dev_set_mac_address(slave_dev, &addr);
 	}
 
@@ -3619,7 +3619,8 @@ static int bond_set_mac_address(struct net_device *bond_dev, void *addr)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct slave *slave, *rollback_slave;
-	struct sockaddr *sa = addr, tmp_sa;
+	struct sockaddr *sa = addr;
+	struct bond_mac_addr tmp_sa;
 	struct list_head *iter;
 	int res = 0;
 
@@ -3659,8 +3660,8 @@ static int bond_set_mac_address(struct net_device *bond_dev, void *addr)
 	return 0;
 
 unwind:
-	memcpy(tmp_sa.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
-	tmp_sa.sa_family = bond_dev->type;
+	memcpy(tmp_sa.addr, bond_dev->dev_addr, bond_dev->addr_len);
+	tmp_sa.type = bond_dev->type;
 
 	/* unwind from head to the slave that failed */
 	bond_for_each_slave(bond, rollback_slave, iter) {
diff --git a/drivers/net/bonding/bonding_priv.h b/drivers/net/bonding/bonding_priv.h
index 5a4d81a..8b3a7e3 100644
--- a/drivers/net/bonding/bonding_priv.h
+++ b/drivers/net/bonding/bonding_priv.h
@@ -22,4 +22,10 @@
 
 #define bond_version DRV_DESCRIPTION ": v" DRV_VERSION " (" DRV_RELDATE ")\n"
 
+/* Must be compatible with struct sockaddr */
+struct bond_mac_addr {
+	unsigned short type;
+	unsigned char addr[MAX_ADDR_LEN];
+};
+
 #endif
-- 
2.5.5

^ permalink raw reply related

* [Patch net 3/3] team: use a larger struct for mac address
From: Cong Wang @ 2017-04-26  5:03 UTC (permalink / raw)
  To: netdev; +Cc: andreyknvl, Cong Wang, Jiri Pirko
In-Reply-To: <1493183003-884-1-git-send-email-xiyou.wangcong@gmail.com>

IPv6 tunnels use sizeof(struct in6_addr) as dev->addr_len,
but in many places especially bonding, we use struct sockaddr
to copy and set mac addr, this could lead to stack out-of-bounds
access.

Fix it by using a larger address storage.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 drivers/net/team/team.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 85c0124..88878f1 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -60,10 +60,13 @@ static struct team_port *team_port_get_rtnl(const struct net_device *dev)
 static int __set_port_dev_addr(struct net_device *port_dev,
 			       const unsigned char *dev_addr)
 {
-	struct sockaddr addr;
+	struct {
+		unsigned short type;
+		unsigned char addr[MAX_ADDR_LEN];
+	} addr;
 
-	memcpy(addr.sa_data, dev_addr, port_dev->addr_len);
-	addr.sa_family = port_dev->type;
+	memcpy(addr.addr, dev_addr, port_dev->addr_len);
+	addr.type = port_dev->type;
 	return dev_set_mac_address(port_dev, &addr);
 }
 
-- 
2.5.5

^ permalink raw reply related

* Re: [PATCH net-next 3/3] samples/bpf: check before defining offsetof
From: Alexander Alemayhu @ 2017-04-26  5:17 UTC (permalink / raw)
  To: David Laight
  Cc: 'Daniel Borkmann', netdev@vger.kernel.org, ast@fb.com
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DCFFDAD25@AcuExch.aculab.com>

On Tue, Apr 25, 2017 at 02:27:16PM +0000, David Laight wrote:
> 
> Isn't the correct fix to include stddef.h ?
>
If you think it's the correct fix, please send a patch to netdev.

Thanks.

-- 
Mit freundlichen Grüßen

Alexander Alemayhu

^ permalink raw reply

* 47490 netdev
From: kelley @ 2017-04-26  5:19 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: 47176743700.zip --]
[-- Type: application/zip, Size: 1674 bytes --]

^ permalink raw reply

* Re: IGMP on IPv6
From: Cong Wang @ 2017-04-26  5:24 UTC (permalink / raw)
  To: Murali Karicheri
  Cc: Hangbin Liu, open list:TI NETCP ETHERNET DRIVER, David Miller
In-Reply-To: <58FF7656.8050506@ti.com>

On Tue, Apr 25, 2017 at 9:16 AM, Murali Karicheri <m-karicheri2@ti.com> wrote:
> On 04/18/2017 06:37 PM, Cong Wang wrote:
>> Did you assign IPv4 and IPv6 addresses to the HSR master device?
>
> No. I just used IPv4. From the trace mld_ifc_timer_expire() -> mld_sendpack() -> ip6_output()
> do you know what is it trying to do? Is it some neighbor discovery message or something
> going over the lower interface instead of the hsr interface?
>

IPv6 is special here because it has link-local address configured
automatically for each device, so this mld_ifc_timer is armed
for link-local multicasts. See:


 464         /* Join interface-local all-node multicast group */
 465         ipv6_dev_mc_inc(dev, &in6addr_interfacelocal_allnodes);
 466
 467         /* Join all-node multicast group */
 468         ipv6_dev_mc_inc(dev, &in6addr_linklocal_allnodes);
 469
 470         /* Join all-router multicast group if forwarding is set */
 471         if (ndev->cnf.forwarding && (dev->flags & IFF_MULTICAST))
 472                 ipv6_dev_mc_inc(dev, &in6addr_linklocal_allrouters);

^ permalink raw reply

* Re: hmmm...
From: Alexei Starovoitov @ 2017-04-26  5:31 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20170425.233820.1357836307906136732.davem@davemloft.net>

On 4/25/17 8:38 PM, David Miller wrote:
> From: Alexei Starovoitov <ast@fb.com>
> Date: Tue, 25 Apr 2017 19:56:06 -0700
>
>> On 4/25/17 6:52 PM, David Miller wrote:
>>>
>>> Alexei, I found something strange on my computer :-)
>>>
>>> [davem@localhost binutils]$ ./objdump -d x.o
>>
>> No way! :) I thought it will take weeks!
>> Ship it. Ship it. Ship it.
>> Cannot wait to pull.
>> This is awesome. Thanks a ton!
>
> Relax, it is in a very raw state still. :)

hehe, before replied i checked binutils repo...
hoping to find a miracle there :)

>> What is the mnemonic for 32-bit alu ?
>
> It is of the form "xxx32".  Here is opcodes table below.
>
> I think there are no formal mnenomics defined anywhere yet right?
> So just tell me what adjustments you want to make.
>
> const struct bpf_opcode bpf_opcodes[] = {
>   { "mov32",   BPF_OPC_ALU   | BPF_OPC_MOV  | BPF_OPC_X,     "1,2" },
>   { "mov",     BPF_OPC_ALU64 | BPF_OPC_MOV  | BPF_OPC_X,     "1,2" },
>   { "add32",   BPF_OPC_ALU   | BPF_OPC_ADD  | BPF_OPC_X,     "1,2" },

the very first bpf gen looked like canonical asm,
but then I found myself spending too much time
looking at ldb and ldd and thinking is it 8-byte
or 64-byte load?
jgt/jge/jsgt/sge was a stumbling block for me as well,
since it still takes me longer than necessary to disambiguate
into > vs >= and signed/unsigned
gcc bpf backend was done way before llvm and it had
quite confusing GT vs bpf GT vs x86 GT.
https://github.com/iovisor-obsolete/bpf_gcc/commit/ff47b3acd6d3c60900fab8cd93afd8106c49c8c3#diff-ee684c67b9329b734b7fb535d86a558dR452
gcc GTU == bpf GT == x86 GA
gcc GT == bpf SGT == x86 G
gcc GE == bpf SGE == x86 GE
I couldn't change bpf GT to GTU to match gcc, since it came
from classic bpf, so could only add SGT. S for Signed.
Through all this confusion I figured that the only output
not superhuman can quickly read is C-like output,
hence the final gcc output looked like:
https://github.com/iovisor-obsolete/bpf_gcc/commit/ff47b3acd6d3c60900fab8cd93afd8106c49c8c3#diff-0db71d60fd74b52befee102a27f0b4c4R292
+(define_insn "movdi_2mem"
+  [(set (match_operand:DI 0 "memory_operand" "=m,m")
+	(match_operand:DI 1 "int_reg_or_const_operand" "r,K"))]
+  ""
+  "@
+   BPF_INSN_ST(BPF_DW, %0, %1), // *(uint64*)(%0)=%1
+   BPF_INSN_ST_IMM(BPF_DW, %0, %1), // *(uint64*)(%0)=%1"
+[(set_attr "type" "store, store")])

yes, gcc was emitting C macros with C looking comments
into fake .s file that we #included into .c code to be loaded
into the kernel.
(that code is abandoned and probably doesn't compile anymore)

I made the verifier to emit C-like output to make it easier
for program writers to understand.

llvm asm until version 4.0 looked like:
         mov     r0, 1
         ldd     r4, 8(r1)
         ldd     r3, 0(r1)
         mov     r5, r3
         addi    r5, 14
         jgt     r5, r4 goto LBB0_3
which was causing confusion as well, so when I realized
that llvm can emit and parse any type of text, I switched
to emit verifier like asm code:
         r2 = *(u64 *)(r10 - 16)
         r5 = r2
         r5 += 34
         r0 = 2
         if r5 > r3 goto LBB0_1
which I think is way easier to read.

Though I think Daniel still prefers old classic bpf asm ;)

Anyway, back to the question...
since BFD and GCC are so much entrenched into canonical style
of asm code, I don't mind that gnu toolchain will be using it.

I like that you used 'dw' in 'ldxdw' instead of just 'd'
though 'x' can probably be dropped.

'x' should be added here instead:
  { "stb", BPF_OPC_ST    | BPF_OPC_MEM  | BPF_OPC_B, "[1+O],i" },
  { "stb", BPF_OPC_STX   | BPF_OPC_MEM  | BPF_OPC_B, "[1+O],2" },

I would use ld8/ld16/ld32/ld64 and st8/stx8/st16/stx16/...
but I spent too much time looking into asm and I'm biased.

Thanks again for working on it! This is major milestone.

^ permalink raw reply

* Re: [Patch net 3/3] team: use a larger struct for mac address
From: Jiri Pirko @ 2017-04-26  5:40 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, andreyknvl
In-Reply-To: <1493183003-884-4-git-send-email-xiyou.wangcong@gmail.com>

Wed, Apr 26, 2017 at 07:03:23AM CEST, xiyou.wangcong@gmail.com wrote:
>IPv6 tunnels use sizeof(struct in6_addr) as dev->addr_len,
>but in many places especially bonding, we use struct sockaddr
>to copy and set mac addr, this could lead to stack out-of-bounds
>access.
>
>Fix it by using a larger address storage.
>
>Reported-by: Andrey Konovalov <andreyknvl@google.com>
>Cc: Jiri Pirko <jiri@resnulli.us>
>Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>---
> drivers/net/team/team.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>index 85c0124..88878f1 100644
>--- a/drivers/net/team/team.c
>+++ b/drivers/net/team/team.c
>@@ -60,10 +60,13 @@ static struct team_port *team_port_get_rtnl(const struct net_device *dev)
> static int __set_port_dev_addr(struct net_device *port_dev,
> 			       const unsigned char *dev_addr)
> {
>-	struct sockaddr addr;
>+	struct {
>+		unsigned short type;
>+		unsigned char addr[MAX_ADDR_LEN];
>+	} addr;

Wouldn't it make sense to define this struct somewhere in the core
headers?


> 
>-	memcpy(addr.sa_data, dev_addr, port_dev->addr_len);
>-	addr.sa_family = port_dev->type;
>+	memcpy(addr.addr, dev_addr, port_dev->addr_len);
>+	addr.type = port_dev->type;
> 	return dev_set_mac_address(port_dev, &addr);
> }
> 
>-- 
>2.5.5
>

^ permalink raw reply

* Re: [PATCH net] net: batman-adv: Fix possible memleaks when fail to register_netdevice
From: Sven Eckelmann @ 2017-04-26  5:57 UTC (permalink / raw)
  To: Gao Feng
  Cc: mareklindner-rVWd3aGhH2z5bpWLKbzFeg,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r, a,
	'Gao Feng', davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <002901d2be25$d9092bd0$8b1b8370$@foxmail.com>

[-- Attachment #1: Type: text/plain, Size: 1536 bytes --]

On Mittwoch, 26. April 2017 08:41:30 CEST Gao Feng wrote:
> On Dienstag, 25. April 2017 20:03:20 CEST gfree.wind-H32Fclmsjq1BDgjK7y7TUQ@public.gmane.org wrote:
> > From: Gao Feng <fgao-KlmEoCYek3zQT0dZR+AlfA@public.gmane.org>
> > 
> > Because the func batadv_softif_init_late allocate some resources and
> > it would be invoked in register_netdevice. So we need to invoke the
> > func batadv_softif_free instead of free_netdev to cleanup when fail
> > to register_netdevice.
> 
> I could be wrong, but shouldn't the destructor be replaced with free_netdevice 
> and the batadv_softif_free (without the free_netdev) used as ndo_uninit? The 
> line you've changed should then be kept as free_netdevice.
> 
> At least this seems to be important when using rtnl_newlink() instead of the 
> legacy sysfs netdev stuff from batman-adv. rtnl_newlink() would also only call 
> free_netdevice and thus also not run batadv_softif_free. This seems to be only 
> fixable by calling ndo_uninit.
> 
> Sorry, I don't get you.
> The net_dev is created in this func batadv_softif_create.
> Why couldn't invoke batadv_softif_free to cleanup when fail to
> register_netdevice?
> 

Because it is the legacy way to create the batadv interfaces and there is a 
"new" one. The new way is to use rtnl link (see batadv_link_ops). 

The rtnl linke (rtnl_newlink) would not benefit from your fix and therefore 
still show the old behavior. I think a different fix is necessary to solve the 
problem for both ways to create an batadv interface.

Kind regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v8 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jiri Pirko @ 2017-04-26  6:19 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: davem, xiyou.wangcong, eric.dumazet, netdev
In-Reply-To: <4b7789f7-69e0-4764-7029-f6e15d6e7d69@mojatatu.com>

Tue, Apr 25, 2017 at 10:29:40PM CEST, jhs@mojatatu.com wrote:
>On 17-04-25 12:04 PM, Jiri Pirko wrote:
>> Tue, Apr 25, 2017 at 03:01:22PM CEST, jhs@mojatatu.com wrote:
>> > On 17-04-25 08:13 AM, Jiri Pirko wrote:
>> > > Tue, Apr 25, 2017 at 01:54:06PM CEST, jhs@mojatatu.com wrote:
>
>> > > > 
>> > > > +static inline bool tca_flags_valid(u32 act_flags)
>> > > > +{
>> > > > +	u32 invalid_flags_mask  = ~VALID_TCA_FLAGS;
>> > > > +
>> > > > +	if (act_flags & invalid_flags_mask)
>> > > > +		return false;
>> > > 
>> > > I don't see how this resolves anything. VALID_TCA_FLAGS is set in stone
>> > > not going to change anytime in future, right?
>> > 
>> > Every time a new bit gets added VALID_TCA_FLAGS changes.
>> 
>> You mean flag that user can set? If that is the case, you are breaking
>> UAPI for newer app running on older kernel.
>> 
>
>Ok, let me try to explain with more clarity. The rules Iam
>trying to follow are:
>if i see any bit set that i dont understand I will reject.
>
>So lets in first kernel I have support for bit 0.
>My validation check is to make sure only bit 0 is set.
>The valid_flags currently then only constitutes bit 0.
>i.e
>If you set bit 2 or 3, the function above will reject and i return
>the error to the user.
>
>That is expected behavior correct?
>
>3 months down the road:
>I add two flags - bit 1 and 2.
>So now my valid_flags changes to bits 1, 2 and 0.
>
>The function above will now return true for bits 0-2 but
>will reject if you set bit 3.
>
>That is expected behavior, correct?

The same app compiled against new kernel with bits (0, 1, 2) will run with
this kernel good. But if you run it with older kernel, the kernel (0)
would refuse. Is that ok?



>
>On u32/16/8:
>I am choosing u32 so it allows me to add more upto 32 bit flags.
>Not all 32 are needed today but it is better insurance.
>If I used u8 then the 24 of those 32 bits i dont use will be used
>as pads in the TLV. So it doesnt make sense for me to use a u8/16.

Jamal, note that I never suggested having more flags in a single attr.
Therefore I suggested u8 to carry a single flag.

You say that it has performance impact having 3 flag attrs in compare to
one bit flag attr. Could you please provide some numbers?

I expect that you will not be able to show the difference. And if there
is no difference in performance, your main argument goes away. And we
can do this in a nice, clear, TLV fashion.


>
>Does that make more sense?
>
>cheers,
>jamal

^ permalink raw reply

* [PATCH net v3 1/3] net: hns: support deferred probe when can not obtain irq
From: Yankejian @ 2017-04-26  7:00 UTC (permalink / raw)
  To: davem, salil.mehta, yisen.zhuang, matthias.bgg, yankejian,
	lipeng321, zhouhuiru, huangdaode
  Cc: netdev, linuxarm
In-Reply-To: <1493190022-91343-1-git-send-email-yankejian@huawei.com>

From: lipeng <lipeng321@huawei.com>

In the hip06 and hip07 SoCs, the interrupt lines from the
DSAF controllers are connected to mbigen hw module.
The mbigen module is probed with module_init, and, as such,
is not guaranteed to probe before the HNS driver. So we need
to support deferred probe.

We check for probe deferral in the hw layer probe, so we not
probe into the main layer and memories, etc., to later learn
that we need to defer the probe.

Signed-off-by: lipeng <lipeng321@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
---
V2 -> V3:
1. Check return value when  platform_get_irq in hns_rcb_get_cfg;
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c | 4 +++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c | 8 +++++++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h | 2 +-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
index 6ea8722..a41cf95 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
@@ -510,7 +510,9 @@ int hns_ppe_init(struct dsaf_device *dsaf_dev)
 
 		hns_ppe_get_cfg(dsaf_dev->ppe_common[i]);
 
-		hns_rcb_get_cfg(dsaf_dev->rcb_common[i]);
+		ret = hns_rcb_get_cfg(dsaf_dev->rcb_common[i]);
+		if (ret)
+			goto get_rcb_cfg_fail;
 	}
 
 	for (i = 0; i < HNS_PPE_COM_NUM; i++)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
index f0ed80d6..673a5d3 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
@@ -452,7 +452,7 @@ static int hns_rcb_get_base_irq_idx(struct rcb_common_cb *rcb_common)
  *hns_rcb_get_cfg - get rcb config
  *@rcb_common: rcb common device
  */
-void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
+int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
 {
 	struct ring_pair_cb *ring_pair_cb;
 	u32 i;
@@ -477,10 +477,16 @@ void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
 		ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] =
 		is_ver1 ? platform_get_irq(pdev, base_irq_idx + i * 2 + 1) :
 			  platform_get_irq(pdev, base_irq_idx + i * 3);
+		if ((ring_pair_cb->virq[HNS_RCB_IRQ_IDX_TX] == -EPROBE_DEFER) ||
+		    (ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] == -EPROBE_DEFER))
+			return -EPROBE_DEFER;
+
 		ring_pair_cb->q.phy_base =
 			RCB_COMM_BASE_TO_RING_BASE(rcb_common->phy_base, i);
 		hns_rcb_ring_pair_get_cfg(ring_pair_cb);
 	}
+
+	return 0;
 }
 
 /**
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
index 99b4e1b..3d7b484 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
@@ -110,7 +110,7 @@ struct rcb_common_cb {
 void hns_rcb_common_free_cfg(struct dsaf_device *dsaf_dev, u32 comm_index);
 int hns_rcb_common_init_hw(struct rcb_common_cb *rcb_common);
 void hns_rcb_start(struct hnae_queue *q, u32 val);
-void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common);
+int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common);
 void hns_rcb_get_queue_mode(enum dsaf_mode dsaf_mode,
 			    u16 *max_vfn, u16 *max_q_per_vf);
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH net v3 0/3] net: hns: bug fix for HNS driver
From: Yankejian @ 2017-04-26  7:00 UTC (permalink / raw)
  To: davem, salil.mehta, yisen.zhuang, matthias.bgg, yankejian,
	lipeng321, zhouhuiru, huangdaode
  Cc: netdev, linuxarm

From: lipeng <lipeng321@huawei.com>

The first two patches [1/3] [2/3] of this serie add support defered
dsaf probe when mdio and mbigen module is not insmod. The third
patch [3/3] fixes a bug that skb still be used after freed, which will
cuase panic.

For more details, please refer to individual patch.

change log:
V2 -> V3:
1. Check return value when  platform_get_irq in hns_rcb_get_cfg;

V1 -> V2:
1. Return appropriate errno in hns_mac_register_phy;

lipeng (3):
  net: hns: support deferred probe when can not obtain irq
  net: hns: support deferred probe when no mdio
  net: hns: fixed bug that skb used after kfree

 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 39 +++++++++++++++++++----
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c |  4 ++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c |  8 ++++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h |  2 +-
 drivers/net/ethernet/hisilicon/hns/hns_enet.c     | 22 ++++++-------
 drivers/net/ethernet/hisilicon/hns/hns_enet.h     |  6 ++--
 6 files changed, 57 insertions(+), 24 deletions(-)

-- 
1.9.1

^ permalink raw reply

* [PATCH net v3 2/3] net: hns: support deferred probe when no mdio
From: Yankejian @ 2017-04-26  7:00 UTC (permalink / raw)
  To: davem, salil.mehta, yisen.zhuang, matthias.bgg, yankejian,
	lipeng321, zhouhuiru, huangdaode
  Cc: netdev, linuxarm
In-Reply-To: <1493190022-91343-1-git-send-email-yankejian@huawei.com>

From: lipeng <lipeng321@huawei.com>

In the hip06 and hip07 SoCs, phy connect to mdio bus.The mdio
module is probed with module_init, and, as such,
is not guaranteed to probe before the HNS driver. So we need
to support deferred probe.

We check for probe deferral in the mac init, so we not init DSAF
when there is no mdio, and free all resource, to later learn that
we need to defer the probe.

Signed-off-by: lipeng <lipeng321@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 39 +++++++++++++++++++----
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
index bdd8cdd..8c00e09 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
@@ -735,6 +735,8 @@ static int hns_mac_init_ex(struct hns_mac_cb *mac_cb)
 	rc = phy_device_register(phy);
 	if (rc) {
 		phy_device_free(phy);
+		dev_err(&mdio->dev, "registered phy fail at address %i\n",
+			addr);
 		return -ENODEV;
 	}
 
@@ -745,7 +747,7 @@ static int hns_mac_init_ex(struct hns_mac_cb *mac_cb)
 	return 0;
 }
 
-static void hns_mac_register_phy(struct hns_mac_cb *mac_cb)
+static int hns_mac_register_phy(struct hns_mac_cb *mac_cb)
 {
 	struct acpi_reference_args args;
 	struct platform_device *pdev;
@@ -755,24 +757,39 @@ static void hns_mac_register_phy(struct hns_mac_cb *mac_cb)
 
 	/* Loop over the child nodes and register a phy_device for each one */
 	if (!to_acpi_device_node(mac_cb->fw_port))
-		return;
+		return -ENODEV;
 
 	rc = acpi_node_get_property_reference(
 			mac_cb->fw_port, "mdio-node", 0, &args);
 	if (rc)
-		return;
+		return rc;
 
 	addr = hns_mac_phy_parse_addr(mac_cb->dev, mac_cb->fw_port);
 	if (addr < 0)
-		return;
+		return addr;
 
 	/* dev address in adev */
 	pdev = hns_dsaf_find_platform_device(acpi_fwnode_handle(args.adev));
+	if (!pdev) {
+		dev_err(mac_cb->dev, "mac%d mdio pdev is NULL\n",
+			mac_cb->mac_id);
+		return  -EINVAL;
+	}
+
 	mii_bus = platform_get_drvdata(pdev);
+	if (!mii_bus) {
+		dev_err(mac_cb->dev,
+			"mac%d mdio is NULL, dsaf will probe again later\n",
+			mac_cb->mac_id);
+		return -EPROBE_DEFER;
+	}
+
 	rc = hns_mac_register_phydev(mii_bus, mac_cb, addr);
 	if (!rc)
 		dev_dbg(mac_cb->dev, "mac%d register phy addr:%d\n",
 			mac_cb->mac_id, addr);
+
+	return rc;
 }
 
 #define MAC_MEDIA_TYPE_MAX_LEN		16
@@ -793,7 +810,7 @@ static void hns_mac_register_phy(struct hns_mac_cb *mac_cb)
  *@np:device node
  * return: 0 --success, negative --fail
  */
-static int  hns_mac_get_info(struct hns_mac_cb *mac_cb)
+static int hns_mac_get_info(struct hns_mac_cb *mac_cb)
 {
 	struct device_node *np;
 	struct regmap *syscon;
@@ -903,7 +920,15 @@ static int  hns_mac_get_info(struct hns_mac_cb *mac_cb)
 			}
 		}
 	} else if (is_acpi_node(mac_cb->fw_port)) {
-		hns_mac_register_phy(mac_cb);
+		ret = hns_mac_register_phy(mac_cb);
+		/*
+		 * Mac can work well if there is phy or not.If the port don't
+		 * connect with phy, the return value will be ignored. Only
+		 * when there is phy but can't find mdio bus, the return value
+		 * will be handled.
+		 */
+		if (ret == -EPROBE_DEFER)
+			return ret;
 	} else {
 		dev_err(mac_cb->dev, "mac%d cannot find phy node\n",
 			mac_cb->mac_id);
@@ -1065,6 +1090,7 @@ int hns_mac_init(struct dsaf_device *dsaf_dev)
 			dsaf_dev->mac_cb[port_id] = mac_cb;
 		}
 	}
+
 	/* init mac_cb for all port */
 	for (port_id = 0; port_id < max_port_num; port_id++) {
 		mac_cb = dsaf_dev->mac_cb[port_id];
@@ -1074,6 +1100,7 @@ int hns_mac_init(struct dsaf_device *dsaf_dev)
 		ret = hns_mac_get_cfg(dsaf_dev, mac_cb);
 		if (ret)
 			return ret;
+
 		ret = hns_mac_init_ex(mac_cb);
 		if (ret)
 			return ret;
-- 
1.9.1

^ permalink raw reply related

* [PATCH net v3 3/3] net: hns: fixed bug that skb used after kfree
From: Yankejian @ 2017-04-26  7:00 UTC (permalink / raw)
  To: davem, salil.mehta, yisen.zhuang, matthias.bgg, yankejian,
	lipeng321, zhouhuiru, huangdaode
  Cc: netdev, linuxarm
In-Reply-To: <1493190022-91343-1-git-send-email-yankejian@huawei.com>

From: lipeng <lipeng321@huawei.com>

There is KASAN warning which turn out it's a skb used after free:

    BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x62c/0x940...
    [17659.112635]      alloc_debug_processing+0x18c/0x1a0
    [17659.117208]      __slab_alloc+0x52c/0x560
    [17659.120909]      kmem_cache_alloc_node+0xac/0x2c0
    [17659.125309]      __alloc_skb+0x6c/0x260
    [17659.128837]      tcp_send_ack+0x8c/0x280
    [17659.132449]      __tcp_ack_snd_check+0x9c/0xf0
    [17659.136587]      tcp_rcv_established+0x5a4/0xa70
    [17659.140899]      tcp_v4_do_rcv+0x27c/0x620
    [17659.144687]      tcp_prequeue_process+0x108/0x170
    [17659.149085]      tcp_recvmsg+0x940/0x1020
    [17659.152787]      inet_recvmsg+0x124/0x180
    [17659.156488]      sock_recvmsg+0x64/0x80
    [17659.160012]      SyS_recvfrom+0xd8/0x180
    [17659.163626]      __sys_trace_return+0x0/0x4
    [17659.167506] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=23 cpu=1 pid=13
    [17659.174000]      free_debug_processing+0x1d4/0x2c0
    [17659.178486]      __slab_free+0x240/0x390
    [17659.182100]      kmem_cache_free+0x24c/0x270
    [17659.186062]      kfree_skbmem+0xa0/0xb0
    [17659.189587]      __kfree_skb+0x28/0x40
    [17659.193025]      napi_gro_receive+0x168/0x1c0
    [17659.197074]      hns_nic_rx_up_pro+0x58/0x90
    [17659.201038]      hns_nic_rx_poll_one+0x518/0xbc0
    [17659.205352]      hns_nic_common_poll+0x94/0x140
    [17659.209576]      net_rx_action+0x458/0x5e0
    [17659.213363]      __do_softirq+0x1b8/0x480
    [17659.217062]      run_ksoftirqd+0x64/0x80
    [17659.220679]      smpboot_thread_fn+0x224/0x310
    [17659.224821]      kthread+0x150/0x170
    [17659.228084]      ret_from_fork+0x10/0x40

    BUG: KASAN: use-after-free in hns_nic_net_xmit+0x8c/0xc0...
    [17751.080490]      __slab_alloc+0x52c/0x560
    [17751.084188]      kmem_cache_alloc+0x244/0x280
    [17751.088238]      __build_skb+0x40/0x150
    [17751.091764]      build_skb+0x28/0x100
    [17751.095115]      __alloc_rx_skb+0x94/0x150
    [17751.098900]      __napi_alloc_skb+0x34/0x90
    [17751.102776]      hns_nic_rx_poll_one+0x180/0xbc0
    [17751.107097]      hns_nic_common_poll+0x94/0x140
    [17751.111333]      net_rx_action+0x458/0x5e0
    [17751.115123]      __do_softirq+0x1b8/0x480
    [17751.118823]      run_ksoftirqd+0x64/0x80
    [17751.122437]      smpboot_thread_fn+0x224/0x310
    [17751.126575]      kthread+0x150/0x170
    [17751.129838]      ret_from_fork+0x10/0x40
    [17751.133454] INFO: Freed in kfree_skbmem+0xa0/0xb0 age=19 cpu=7 pid=43
    [17751.139951]      free_debug_processing+0x1d4/0x2c0
    [17751.144436]      __slab_free+0x240/0x390
    [17751.148051]      kmem_cache_free+0x24c/0x270
    [17751.152014]      kfree_skbmem+0xa0/0xb0
    [17751.155543]      __kfree_skb+0x28/0x40
    [17751.159022]      napi_gro_receive+0x168/0x1c0
    [17751.163074]      hns_nic_rx_up_pro+0x58/0x90
    [17751.167041]      hns_nic_rx_poll_one+0x518/0xbc0
    [17751.171358]      hns_nic_common_poll+0x94/0x140
    [17751.175585]      net_rx_action+0x458/0x5e0
    [17751.179373]      __do_softirq+0x1b8/0x480
    [17751.183076]      run_ksoftirqd+0x64/0x80
    [17751.186691]      smpboot_thread_fn+0x224/0x310
    [17751.190826]      kthread+0x150/0x170
    [17751.194093]      ret_from_fork+0x10/0x40

in drivers/net/ethernet/hisilicon/hns/hns_enet.c

    static netdev_tx_t hns_nic_net_xmit(struct sk_buff *skb,
                                        struct net_device *ndev)
    {
        struct hns_nic_priv *priv = netdev_priv(ndev);
        int ret;

        assert(skb->queue_mapping < ndev->ae_handle->q_num);
        ret = hns_nic_net_xmit_hw(ndev, skb,
                                  &tx_ring_data(priv, skb->queue_mapping));
        if (ret == NETDEV_TX_OK) {
                netif_trans_update(ndev);
                ndev->stats.tx_bytes += skb->len;
                ndev->stats.tx_packets++;
        }
         return (netdev_tx_t)ret;
    }

skb maybe freed in hns_nic_net_xmit_hw() and return NETDEV_TX_OK:

    out_err_tx_ok:

        dev_kfree_skb_any(skb);
        return NETDEV_TX_OK;

This patch fix this bug.

Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 22 ++++++++++------------
 drivers/net/ethernet/hisilicon/hns/hns_enet.h |  6 +++---
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index fca37e2..1e04df6 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -300,9 +300,9 @@ static void fill_tso_desc(struct hnae_ring *ring, void *priv,
 			     mtu);
 }
 
-int hns_nic_net_xmit_hw(struct net_device *ndev,
-			struct sk_buff *skb,
-			struct hns_nic_ring_data *ring_data)
+netdev_tx_t hns_nic_net_xmit_hw(struct net_device *ndev,
+				struct sk_buff *skb,
+				struct hns_nic_ring_data *ring_data)
 {
 	struct hns_nic_priv *priv = netdev_priv(ndev);
 	struct hnae_ring *ring = ring_data->ring;
@@ -361,6 +361,10 @@ int hns_nic_net_xmit_hw(struct net_device *ndev,
 	dev_queue = netdev_get_tx_queue(ndev, skb->queue_mapping);
 	netdev_tx_sent_queue(dev_queue, skb->len);
 
+	netif_trans_update(ndev);
+	ndev->stats.tx_bytes += skb->len;
+	ndev->stats.tx_packets++;
+
 	wmb(); /* commit all data before submit */
 	assert(skb->queue_mapping < priv->ae_handle->q_num);
 	hnae_queue_xmit(priv->ae_handle->qs[skb->queue_mapping], buf_num);
@@ -1474,17 +1478,11 @@ static netdev_tx_t hns_nic_net_xmit(struct sk_buff *skb,
 				    struct net_device *ndev)
 {
 	struct hns_nic_priv *priv = netdev_priv(ndev);
-	int ret;
 
 	assert(skb->queue_mapping < ndev->ae_handle->q_num);
-	ret = hns_nic_net_xmit_hw(ndev, skb,
-				  &tx_ring_data(priv, skb->queue_mapping));
-	if (ret == NETDEV_TX_OK) {
-		netif_trans_update(ndev);
-		ndev->stats.tx_bytes += skb->len;
-		ndev->stats.tx_packets++;
-	}
-	return (netdev_tx_t)ret;
+
+	return hns_nic_net_xmit_hw(ndev, skb,
+				   &tx_ring_data(priv, skb->queue_mapping));
 }
 
 static int hns_nic_change_mtu(struct net_device *ndev, int new_mtu)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.h b/drivers/net/ethernet/hisilicon/hns/hns_enet.h
index 5b412de..7bc6a6e 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.h
@@ -91,8 +91,8 @@ struct hns_nic_priv {
 void hns_nic_net_reset(struct net_device *ndev);
 void hns_nic_net_reinit(struct net_device *netdev);
 int hns_nic_init_phy(struct net_device *ndev, struct hnae_handle *h);
-int hns_nic_net_xmit_hw(struct net_device *ndev,
-			struct sk_buff *skb,
-			struct hns_nic_ring_data *ring_data);
+netdev_tx_t hns_nic_net_xmit_hw(struct net_device *ndev,
+				struct sk_buff *skb,
+				struct hns_nic_ring_data *ring_data);
 
 #endif	/**__HNS_ENET_H */
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net] net: batman-adv: Fix possible memleaks when fail to register_netdevice
From: Gao Feng @ 2017-04-26  6:44 UTC (permalink / raw)
  To: 'Sven Eckelmann'
  Cc: mareklindner-rVWd3aGhH2z5bpWLKbzFeg,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r, a,
	'Gao Feng', davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <3994535.Z9kqCnttI2@bentobox>

> From: Sven Eckelmann [mailto:sven-KaDOiPu9UxWEi8DpZVb4nw@public.gmane.org]
> Sent: Wednesday, April 26, 2017 1:58 PM
> On Mittwoch, 26. April 2017 08:41:30 CEST Gao Feng wrote:
> > On Dienstag, 25. April 2017 20:03:20 CEST gfree.wind-H32Fclmsjq1BDgjK7y7TUQ@public.gmane.org wrote:
> > > From: Gao Feng <fgao-KlmEoCYek3zQT0dZR+AlfA@public.gmane.org>
> > >
> > > Because the func batadv_softif_init_late allocate some resources and
> > > it would be invoked in register_netdevice. So we need to invoke the
> > > func batadv_softif_free instead of free_netdev to cleanup when fail
> > > to register_netdevice.
> >
> > I could be wrong, but shouldn't the destructor be replaced with
> > free_netdevice and the batadv_softif_free (without the free_netdev)
> > used as ndo_uninit? The line you've changed should then be kept as
> free_netdevice.
> >
> > At least this seems to be important when using rtnl_newlink() instead
> > of the legacy sysfs netdev stuff from batman-adv. rtnl_newlink() would
> > also only call free_netdevice and thus also not run
> > batadv_softif_free. This seems to be only fixable by calling ndo_uninit.
> >
> > Sorry, I don't get you.
> > The net_dev is created in this func batadv_softif_create.
> > Why couldn't invoke batadv_softif_free to cleanup when fail to
> > register_netdevice?
> >
> 
> Because it is the legacy way to create the batadv interfaces and there is
a
> "new" one. The new way is to use rtnl link (see batadv_link_ops).
> 
> The rtnl linke (rtnl_newlink) would not benefit from your fix and
therefore still
> show the old behavior. I think a different fix is necessary to solve the
problem
> for both ways to create an batadv interface.
> 
> Kind regards,
> 	Sven

I get it now, thanks.
Actually I sent another patch about the memleaks when invoke newlink and
fail to register_netdev.
You could refer to it by
https://www.mail-archive.com/netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg165253.html.

In this patch, I add the cleanup when fail to register_netdev.
It would be more simple if modify the rtnl_newlink and invoke the destructor
of netdev when failed.
Like the following codes.
	if (ops->newlink) {
		err = ops->newlink(link_net ? : net, dev, tb, data);
		if (err < 0) {
			if (dev->reg_state == NETREG_UNINITIALIZED)
				if (dev->destructor)
					dev->destructor(dev)
				else
					free_netdev(dev);
				goto out;
			}
		} else {
			err = register_netdevice(dev);
			if (err < 0) {
				if (dev->destructor)
					dev->destructor(dev);
				else
					free_netdev(dev);
				goto out;
			}
	}

But I don't know if it breaks the design newlink and destructor.

BTW, I think although the batadv_softif_create is legacy, we should fix it
when it still exists :)

Regards
Feng

^ permalink raw reply

* [iproute2] iplink: add support for IFLA_CARRIER attribute
From: Zhang Shengju @ 2017-04-26  7:08 UTC (permalink / raw)
  To: netdev

Add support to set IFLA_CARRIER attribute.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
---
 ip/iplink.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/ip/iplink.c b/ip/iplink.c
index 866ad72..263bfdd 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -72,6 +72,7 @@ void iplink_usage(void)
 		"	                  [ allmulticast { on | off } ]\n"
 		"	                  [ promisc { on | off } ]\n"
 		"	                  [ trailers { on | off } ]\n"
+		"	                  [ carrier { on | off } ]\n"
 		"	                  [ txqueuelen PACKETS ]\n"
 		"	                  [ name NEWNAME ]\n"
 		"	                  [ address LLADDR ]\n"
@@ -673,6 +674,17 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
 				req->i.ifi_flags |= IFF_NOARP;
 			else
 				return on_off("arp", *argv);
+		} else if (strcmp(*argv, "carrier") == 0) {
+			int carrier;
+			NEXT_ARG();
+			if (strcmp(*argv, "on") == 0)
+				carrier = 1;
+			else if (strcmp(*argv, "off") == 0)
+				carrier = 0;
+			else
+				return on_off("carrier", *argv);
+
+			addattr8(&req->n, sizeof(*req), IFLA_CARRIER, carrier);
 		} else if (strcmp(*argv, "vf") == 0) {
 			struct rtattr *vflist;
 
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH net] net: batman-adv: Fix possible memleaks when fail to register_netdevice
From: Sven Eckelmann @ 2017-04-26  7:16 UTC (permalink / raw)
  To: Gao Feng
  Cc: mareklindner-rVWd3aGhH2z5bpWLKbzFeg,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r, a,
	'Gao Feng', davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <002101d2be58$8bcaaa00$a35ffe00$@foxmail.com>

[-- Attachment #1: Type: text/plain, Size: 1383 bytes --]

On Mittwoch, 26. April 2017 14:44:24 CEST Gao Feng wrote:
[...]
> I get it now, thanks.
[...]
> BTW, I think although the batadv_softif_create is legacy, we should fix it
> when it still exists :)

I didn't meant that we should not fix it. I just said that it looks to me like 
the fix should look different to ensure that it actually fixes the sysfs and 
rtnl link implementation for the batadv interface creation. Right now the 
ndo_uninit (when it would be set by batadv) is called in the netdev core 
functions when an error happens during the registration. This is not the case 
for the destructor. Your patch would not change it. It therefore looks like 
you simply have to move the current destructor (without the free_netdev) to 
ndo_uninit and change the destructor to free_netdev.

The batadv ops doesn't have a newlink function. It will therefore use the 
register_netdevice code path which calls free_netdev on failures. The extra 
cleanup you've added in
 https://www.mail-archive.com/netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg165253.html can 
therefore not work for batman-adv. Actually, it is not touching anything 
batman-adv related. The suggestion to change the register_netdevice -> 
free_netdev part in rtnl_newlink was new in the reply to the batadv 
discussion. It is therefore still an open discussion how it is correctly 
fixed.

Kind regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [RFC 1/4] netlink: make extended ACK setting NULL-friendly
From: Johannes Berg @ 2017-04-26  7:17 UTC (permalink / raw)
  To: Jakub Kicinski, daniel
  Cc: netdev, davem, dsa, alexei.starovoitov, bblanco, john.fastabend,
	oss-drivers
In-Reply-To: <20170425135306.0ff1db27@cakuba.netronome.com>

On Tue, 2017-04-25 at 13:53 -0700, Jakub Kicinski wrote:
> On Tue, 25 Apr 2017 10:13:34 +0200, Johannes Berg wrote:
> > On Tue, 2017-04-25 at 01:06 -0700, Jakub Kicinski wrote:
> > 
> > > +#define NL_SET_ERR_MSG(extack, msg) do {		\
> > > +	struct netlink_ext_ack *_extack = (extack);	\
> > > +	static const char _msg[] = (msg);		\
> > > +							\
> > > +	if (_extack)					\
> > > +		_extack->_msg = _msg;			\
> > > +	else						\
> > > +		pr_info("%s\n", _msg);			\
> > >  } while (0)  

> I'm leaning towards dropping the else clause and never printing, that
> will add an incentive for people to convert more paths to provide the
> ext ack.  Any thoughts on that?

Personally, I'm happy with that.

johannes

^ permalink raw reply

* [PATCH net-next v1] net: ipv6: make sure multicast packets are not forwarded beyond the different scopes
From: Donatas Abraitis @ 2017-04-26  7:15 UTC (permalink / raw)
  To: davem; +Cc: netdev, stable, donatas.abraitis

	    RFC4291 2.7 Routers must not forward any multicast packets
	    beyond of the scope indicated by the scop field in the
	    destination multicast address.

Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
---
 net/ipv6/ip6_input.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 9ee208a..a834797 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -165,6 +165,14 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt
 	    IPV6_ADDR_MC_SCOPE(&hdr->daddr) == 0)
 		goto err;
 
+	/* RFC4291 2.7
+	 * Routers must not forward any multicast packets beyond of the scope
+	 * indicated by the scop field in the destination multicast address.
+	*/
+	if (ipv6_addr_is_multicast(&hdr->daddr) &&
+	    IPV6_ADDR_MC_SCOPE(&hdr->daddr) != IPV6_ADDR_MC_SCOPE(&hdr->saddr)
+	        goto err;
+
 	/*
 	 * RFC4291 2.7
 	 * Multicast addresses must not be used as source addresses in IPv6
-- 
2.10.0

^ permalink raw reply related

* Re: [PATCH net] net: batman-adv: Fix possible memleaks when fail to register_netdevice
From: Gao Feng @ 2017-04-26  7:28 UTC (permalink / raw)
  To: 'Sven Eckelmann'
  Cc: mareklindner-rVWd3aGhH2z5bpWLKbzFeg,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r, a,
	'Gao Feng', davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <2857108.moq831zFsU@bentobox>

> From: Sven Eckelmann [mailto:sven-KaDOiPu9UxWEi8DpZVb4nw@public.gmane.org]
> Sent: Wednesday, April 26, 2017 3:17 PM
> On Mittwoch, 26. April 2017 14:44:24 CEST Gao Feng wrote:
> [...]
> > I get it now, thanks.
> [...]
> > BTW, I think although the batadv_softif_create is legacy, we should
> > fix it when it still exists :)
> 
> I didn't meant that we should not fix it. I just said that it looks to me
like the fix
> should look different to ensure that it actually fixes the sysfs and rtnl
link
> implementation for the batadv interface creation. Right now the ndo_uninit
> (when it would be set by batadv) is called in the netdev core functions
when an
> error happens during the registration. This is not the case for the
destructor.

Thanks your answer.
I assumed the destructor is not for this case before.. 

> Your patch would not change it. It therefore looks like you simply have to
move
> the current destructor (without the free_netdev) to ndo_uninit and change
the
> destructor to free_netdev.

Yes, that patch didn't touch badman-adv. Because current badman-adv doesn't
support newlink now.
It would be good that cleanup the resource in ndo_uninit routine.

Best Regards
Feng
 
> 
> The batadv ops doesn't have a newlink function. It will therefore use the
> register_netdevice code path which calls free_netdev on failures. The
extra
> cleanup you've added in
> https://www.mail-archive.com/netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg165253.html can
> therefore not work for batman-adv. Actually, it is not touching anything
> batman-adv related. The suggestion to change the register_netdevice ->
> free_netdev part in rtnl_newlink was new in the reply to the batadv
discussion.
> It is therefore still an open discussion how it is correctly fixed.
> 
> Kind regards,
> 	Sven

^ permalink raw reply

* Re: [PATCH v2 15/21] xen-blkfront: Make use of the new sg_map helper function
From: Roger Pau Monné @ 2017-04-26  7:37 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Boris Ostrovsky, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	target-devel-u79uwXL29TY76Z2rM5mHXA, Christoph Hellwig,
	devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b, James E.J. Bottomley,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Sumit Semwal,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw,
	linux-media-u79uwXL29TY76Z2rM5mHXA, Juergen Gross, Julien Grall,
	intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	sparmaintainer-GLv8BlqOqDDQT0dZR+AlfA,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	megaraidlinux.pdl-dY08KVG/lbpWk0Htik3J/w, Jens Axboe,
	Martin K. Petersen, netdev-u79uwXL29TY76Z2rM5mHXA, Matthew Wilcox,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA, Greg Kroah-Hartman
In-Reply-To: <1493144468-22493-16-git-send-email-logang-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org>

On Tue, Apr 25, 2017 at 12:21:02PM -0600, Logan Gunthorpe wrote:
> Straightforward conversion to the new helper, except due to the lack
> of error path, we have to use SG_MAP_MUST_NOT_FAIL which may BUG_ON in
> certain cases in the future.
> 
> Signed-off-by: Logan Gunthorpe <logang-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org>
> Cc: Boris Ostrovsky <boris.ostrovsky-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> Cc: Juergen Gross <jgross-IBi9RG/b67k@public.gmane.org>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> Cc: "Roger Pau Monné" <roger.pau-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/block/xen-blkfront.c | 20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 3945963..ed62175 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -816,8 +816,9 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
>  		BUG_ON(sg->offset + sg->length > PAGE_SIZE);
>  
>  		if (setup.need_copy) {
> -			setup.bvec_off = sg->offset;
> -			setup.bvec_data = kmap_atomic(sg_page(sg));
> +			setup.bvec_off = 0;
> +			setup.bvec_data = sg_map(sg, 0, SG_KMAP_ATOMIC |
> +						 SG_MAP_MUST_NOT_FAIL);

I assume that sg_map already adds sg->offset to the address?

Also wondering whether we can get rid of bvec_off and just increment bvec_data,
adding Julien who IIRC added this code.

>  		}
>  
>  		gnttab_foreach_grant_in_range(sg_page(sg),
> @@ -827,7 +828,7 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
>  					      &setup);
>  
>  		if (setup.need_copy)
> -			kunmap_atomic(setup.bvec_data);
> +			sg_unmap(sg, setup.bvec_data, 0, SG_KMAP_ATOMIC);
>  	}
>  	if (setup.segments)
>  		kunmap_atomic(setup.segments);
> @@ -1053,7 +1054,7 @@ static int xen_translate_vdev(int vdevice, int *minor, unsigned int *offset)
>  		case XEN_SCSI_DISK5_MAJOR:
>  		case XEN_SCSI_DISK6_MAJOR:
>  		case XEN_SCSI_DISK7_MAJOR:
> -			*offset = (*minor / PARTS_PER_DISK) + 
> +			*offset = (*minor / PARTS_PER_DISK) +
>  				((major - XEN_SCSI_DISK1_MAJOR + 1) * 16) +
>  				EMULATED_SD_DISK_NAME_OFFSET;
>  			*minor = *minor +
> @@ -1068,7 +1069,7 @@ static int xen_translate_vdev(int vdevice, int *minor, unsigned int *offset)
>  		case XEN_SCSI_DISK13_MAJOR:
>  		case XEN_SCSI_DISK14_MAJOR:
>  		case XEN_SCSI_DISK15_MAJOR:
> -			*offset = (*minor / PARTS_PER_DISK) + 
> +			*offset = (*minor / PARTS_PER_DISK) +
>  				((major - XEN_SCSI_DISK8_MAJOR + 8) * 16) +
>  				EMULATED_SD_DISK_NAME_OFFSET;
>  			*minor = *minor +
> @@ -1119,7 +1120,7 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
>  	if (!VDEV_IS_EXTENDED(info->vdevice)) {
>  		err = xen_translate_vdev(info->vdevice, &minor, &offset);
>  		if (err)
> -			return err;		
> +			return err;

Cosmetic changes should go in a separate patch please.

Roger.

^ permalink raw reply

* Re: [PATCH v2 01/21] scatterlist: Introduce sg_map helper functions
From: Christoph Hellwig @ 2017-04-26  7:44 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	linux-media-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-raid-u79uwXL29TY76Z2rM5mHXA,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA,
	linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw,
	megaraidlinux.pdl-dY08KVG/lbpWk0Htik3J/w,
	sparmaintainer-GLv8BlqOqDDQT0dZR+AlfA,
	devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b,
	target-devel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA, Christoph Hellwig,
	Martin K. Petersen, James E.J. Bottomley, Jens Axboe,
	Greg Kroah-Hartman, Dan Williams, Ross Zwisler, Matthew Wilcox,
	Sumit Semwal, Stephen Bates <s
In-Reply-To: <1493144468-22493-2-git-send-email-logang-OTvnGxWRz7hWk0Htik3J/w@public.gmane.org>

On Tue, Apr 25, 2017 at 12:20:48PM -0600, Logan Gunthorpe wrote:
> This patch introduces functions which kmap the pages inside an sgl.
> These functions replace a common pattern of kmap(sg_page(sg)) that is
> used in more than 50 places within the kernel.
> 
> The motivation for this work is to eventually safely support sgls that
> contain io memory. In order for that to work, any access to the contents
> of an iomem SGL will need to be done with iomemcpy or hit some warning.
> (The exact details of how this will work have yet to be worked out.)

I think we'll at least need a draft of those to make sense of these
patches.  Otherwise they just look very clumsy.

> + *   Use this function to map a page in the scatterlist at the specified
> + *   offset. sg->offset is already added for you. Note: the semantics of
> + *   this function are that it may fail. Thus, its output should be checked
> + *   with IS_ERR and PTR_ERR. Otherwise, a pointer to the specified offset
> + *   in the mapped page is returned.
> + *
> + *   Flags can be any of:
> + *	* SG_KMAP		- Use kmap to create the mapping
> + *	* SG_KMAP_ATOMIC	- Use kmap_atomic to map the page atommically.
> + *				  Thus, the rules of that function apply: the
> + *				  cpu may not sleep until it is unmaped.
> + *	* SG_MAP_MUST_NOT_FAIL	- Indicate that sg_map must not fail.
> + *				  If it does, it will issue a BUG_ON instead.
> + *				  This is intended for legacy code only, it
> + *				  is not to be used in new code.

I'm sorry but this API is just a trainwreck.  Right now we have the
nice little kmap_atomic API, which never fails and has a very nice
calling convention where we just pass back the return address, but does
not support sleeping inside the critical section.

And kmap, whіch may fail and requires the original page to be passed
back.  Anything that mixes these two concepts up is simply a non-starter.

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* Re: [PATCH] IB/IPoIB: Check the headroom size
From: Paolo Abeni @ 2017-04-26  7:46 UTC (permalink / raw)
  To: Or Gerlitz, Doug Ledford
  Cc: Erez Shitrit, Honggang LI, Erez Shitrit,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Linux Netdev List, David Miller
In-Reply-To: <CAJ3xEMjqkJrKi+6ronuPBn2P6y8p6sdhVppDzFtMwQrhL13bzg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Wed, 2017-04-26 at 00:50 +0300, Or Gerlitz wrote:
> so maybe @ least for the time being, we should be picking Hong's patch
> with proper change log and without the giant stack dump till we have
> something better. If you agree, can you do the re-write of the change
> log?

I think that Hong's patch is following the correct way to fix the
issue: ipoib_hard_header() can't assume that the skb headroom is at
least IPOIB_HARD_LEN bytes, as wrongly implied by commit fc791b633515
(my fault, I'm sorry).

Perhaps we can make the code a little more robust with something
alongside the following (only compile tested):
---
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index d1d3fb7..d53d2e1 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1160,6 +1160,11 @@ static int ipoib_hard_header(struct sk_buff *skb,
                             const void *daddr, const void *saddr, unsigned len)
 {
        struct ipoib_header *header;
+       int ret;
+
+       ret = skb_cow_head(skb, IPOIB_HARD_LEN);
+       if (ret)
+               return ret;
 
        header = (struct ipoib_header *) skb_push(skb, sizeof *header);
---

Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox