Netdev List
 help / color / mirror / Atom feed
* [PATCH net v3 1/3] net: hns: support deferred probe when can not obtain irq
From: Yankejian @ 2017-04-26  7:00 UTC (permalink / raw)
  To: davem, salil.mehta, yisen.zhuang, matthias.bgg, yankejian,
	lipeng321, zhouhuiru, huangdaode
  Cc: netdev, linuxarm
In-Reply-To: <1493190022-91343-1-git-send-email-yankejian@huawei.com>

From: lipeng <lipeng321@huawei.com>

In the hip06 and hip07 SoCs, the interrupt lines from the
DSAF controllers are connected to mbigen hw module.
The mbigen module is probed with module_init, and, as such,
is not guaranteed to probe before the HNS driver. So we need
to support deferred probe.

We check for probe deferral in the hw layer probe, so we not
probe into the main layer and memories, etc., to later learn
that we need to defer the probe.

Signed-off-by: lipeng <lipeng321@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
---
V2 -> V3:
1. Check return value when  platform_get_irq in hns_rcb_get_cfg;
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c | 4 +++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c | 8 +++++++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h | 2 +-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
index 6ea8722..a41cf95 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
@@ -510,7 +510,9 @@ int hns_ppe_init(struct dsaf_device *dsaf_dev)
 
 		hns_ppe_get_cfg(dsaf_dev->ppe_common[i]);
 
-		hns_rcb_get_cfg(dsaf_dev->rcb_common[i]);
+		ret = hns_rcb_get_cfg(dsaf_dev->rcb_common[i]);
+		if (ret)
+			goto get_rcb_cfg_fail;
 	}
 
 	for (i = 0; i < HNS_PPE_COM_NUM; i++)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
index f0ed80d6..673a5d3 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
@@ -452,7 +452,7 @@ static int hns_rcb_get_base_irq_idx(struct rcb_common_cb *rcb_common)
  *hns_rcb_get_cfg - get rcb config
  *@rcb_common: rcb common device
  */
-void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
+int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
 {
 	struct ring_pair_cb *ring_pair_cb;
 	u32 i;
@@ -477,10 +477,16 @@ void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
 		ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] =
 		is_ver1 ? platform_get_irq(pdev, base_irq_idx + i * 2 + 1) :
 			  platform_get_irq(pdev, base_irq_idx + i * 3);
+		if ((ring_pair_cb->virq[HNS_RCB_IRQ_IDX_TX] == -EPROBE_DEFER) ||
+		    (ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] == -EPROBE_DEFER))
+			return -EPROBE_DEFER;
+
 		ring_pair_cb->q.phy_base =
 			RCB_COMM_BASE_TO_RING_BASE(rcb_common->phy_base, i);
 		hns_rcb_ring_pair_get_cfg(ring_pair_cb);
 	}
+
+	return 0;
 }
 
 /**
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
index 99b4e1b..3d7b484 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
@@ -110,7 +110,7 @@ struct rcb_common_cb {
 void hns_rcb_common_free_cfg(struct dsaf_device *dsaf_dev, u32 comm_index);
 int hns_rcb_common_init_hw(struct rcb_common_cb *rcb_common);
 void hns_rcb_start(struct hnae_queue *q, u32 val);
-void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common);
+int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common);
 void hns_rcb_get_queue_mode(enum dsaf_mode dsaf_mode,
 			    u16 *max_vfn, u16 *max_q_per_vf);
 
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net-next v8 2/3] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jiri Pirko @ 2017-04-26  6:19 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: davem, xiyou.wangcong, eric.dumazet, netdev
In-Reply-To: <4b7789f7-69e0-4764-7029-f6e15d6e7d69@mojatatu.com>

Tue, Apr 25, 2017 at 10:29:40PM CEST, jhs@mojatatu.com wrote:
>On 17-04-25 12:04 PM, Jiri Pirko wrote:
>> Tue, Apr 25, 2017 at 03:01:22PM CEST, jhs@mojatatu.com wrote:
>> > On 17-04-25 08:13 AM, Jiri Pirko wrote:
>> > > Tue, Apr 25, 2017 at 01:54:06PM CEST, jhs@mojatatu.com wrote:
>
>> > > > 
>> > > > +static inline bool tca_flags_valid(u32 act_flags)
>> > > > +{
>> > > > +	u32 invalid_flags_mask  = ~VALID_TCA_FLAGS;
>> > > > +
>> > > > +	if (act_flags & invalid_flags_mask)
>> > > > +		return false;
>> > > 
>> > > I don't see how this resolves anything. VALID_TCA_FLAGS is set in stone
>> > > not going to change anytime in future, right?
>> > 
>> > Every time a new bit gets added VALID_TCA_FLAGS changes.
>> 
>> You mean flag that user can set? If that is the case, you are breaking
>> UAPI for newer app running on older kernel.
>> 
>
>Ok, let me try to explain with more clarity. The rules Iam
>trying to follow are:
>if i see any bit set that i dont understand I will reject.
>
>So lets in first kernel I have support for bit 0.
>My validation check is to make sure only bit 0 is set.
>The valid_flags currently then only constitutes bit 0.
>i.e
>If you set bit 2 or 3, the function above will reject and i return
>the error to the user.
>
>That is expected behavior correct?
>
>3 months down the road:
>I add two flags - bit 1 and 2.
>So now my valid_flags changes to bits 1, 2 and 0.
>
>The function above will now return true for bits 0-2 but
>will reject if you set bit 3.
>
>That is expected behavior, correct?

The same app compiled against new kernel with bits (0, 1, 2) will run with
this kernel good. But if you run it with older kernel, the kernel (0)
would refuse. Is that ok?



>
>On u32/16/8:
>I am choosing u32 so it allows me to add more upto 32 bit flags.
>Not all 32 are needed today but it is better insurance.
>If I used u8 then the 24 of those 32 bits i dont use will be used
>as pads in the TLV. So it doesnt make sense for me to use a u8/16.

Jamal, note that I never suggested having more flags in a single attr.
Therefore I suggested u8 to carry a single flag.

You say that it has performance impact having 3 flag attrs in compare to
one bit flag attr. Could you please provide some numbers?

I expect that you will not be able to show the difference. And if there
is no difference in performance, your main argument goes away. And we
can do this in a nice, clear, TLV fashion.


>
>Does that make more sense?
>
>cheers,
>jamal

^ permalink raw reply

* Re: [PATCH net] net: batman-adv: Fix possible memleaks when fail to register_netdevice
From: Sven Eckelmann @ 2017-04-26  5:57 UTC (permalink / raw)
  To: Gao Feng
  Cc: mareklindner-rVWd3aGhH2z5bpWLKbzFeg,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r, a,
	'Gao Feng', davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <002901d2be25$d9092bd0$8b1b8370$@foxmail.com>

[-- Attachment #1: Type: text/plain, Size: 1536 bytes --]

On Mittwoch, 26. April 2017 08:41:30 CEST Gao Feng wrote:
> On Dienstag, 25. April 2017 20:03:20 CEST gfree.wind-H32Fclmsjq1BDgjK7y7TUQ@public.gmane.org wrote:
> > From: Gao Feng <fgao-KlmEoCYek3zQT0dZR+AlfA@public.gmane.org>
> > 
> > Because the func batadv_softif_init_late allocate some resources and
> > it would be invoked in register_netdevice. So we need to invoke the
> > func batadv_softif_free instead of free_netdev to cleanup when fail
> > to register_netdevice.
> 
> I could be wrong, but shouldn't the destructor be replaced with free_netdevice 
> and the batadv_softif_free (without the free_netdev) used as ndo_uninit? The 
> line you've changed should then be kept as free_netdevice.
> 
> At least this seems to be important when using rtnl_newlink() instead of the 
> legacy sysfs netdev stuff from batman-adv. rtnl_newlink() would also only call 
> free_netdevice and thus also not run batadv_softif_free. This seems to be only 
> fixable by calling ndo_uninit.
> 
> Sorry, I don't get you.
> The net_dev is created in this func batadv_softif_create.
> Why couldn't invoke batadv_softif_free to cleanup when fail to
> register_netdevice?
> 

Because it is the legacy way to create the batadv interfaces and there is a 
"new" one. The new way is to use rtnl link (see batadv_link_ops). 

The rtnl linke (rtnl_newlink) would not benefit from your fix and therefore 
still show the old behavior. I think a different fix is necessary to solve the 
problem for both ways to create an batadv interface.

Kind regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [Patch net 3/3] team: use a larger struct for mac address
From: Jiri Pirko @ 2017-04-26  5:40 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, andreyknvl
In-Reply-To: <1493183003-884-4-git-send-email-xiyou.wangcong@gmail.com>

Wed, Apr 26, 2017 at 07:03:23AM CEST, xiyou.wangcong@gmail.com wrote:
>IPv6 tunnels use sizeof(struct in6_addr) as dev->addr_len,
>but in many places especially bonding, we use struct sockaddr
>to copy and set mac addr, this could lead to stack out-of-bounds
>access.
>
>Fix it by using a larger address storage.
>
>Reported-by: Andrey Konovalov <andreyknvl@google.com>
>Cc: Jiri Pirko <jiri@resnulli.us>
>Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>---
> drivers/net/team/team.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>index 85c0124..88878f1 100644
>--- a/drivers/net/team/team.c
>+++ b/drivers/net/team/team.c
>@@ -60,10 +60,13 @@ static struct team_port *team_port_get_rtnl(const struct net_device *dev)
> static int __set_port_dev_addr(struct net_device *port_dev,
> 			       const unsigned char *dev_addr)
> {
>-	struct sockaddr addr;
>+	struct {
>+		unsigned short type;
>+		unsigned char addr[MAX_ADDR_LEN];
>+	} addr;

Wouldn't it make sense to define this struct somewhere in the core
headers?


> 
>-	memcpy(addr.sa_data, dev_addr, port_dev->addr_len);
>-	addr.sa_family = port_dev->type;
>+	memcpy(addr.addr, dev_addr, port_dev->addr_len);
>+	addr.type = port_dev->type;
> 	return dev_set_mac_address(port_dev, &addr);
> }
> 
>-- 
>2.5.5
>

^ permalink raw reply

* Re: hmmm...
From: Alexei Starovoitov @ 2017-04-26  5:31 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20170425.233820.1357836307906136732.davem@davemloft.net>

On 4/25/17 8:38 PM, David Miller wrote:
> From: Alexei Starovoitov <ast@fb.com>
> Date: Tue, 25 Apr 2017 19:56:06 -0700
>
>> On 4/25/17 6:52 PM, David Miller wrote:
>>>
>>> Alexei, I found something strange on my computer :-)
>>>
>>> [davem@localhost binutils]$ ./objdump -d x.o
>>
>> No way! :) I thought it will take weeks!
>> Ship it. Ship it. Ship it.
>> Cannot wait to pull.
>> This is awesome. Thanks a ton!
>
> Relax, it is in a very raw state still. :)

hehe, before replied i checked binutils repo...
hoping to find a miracle there :)

>> What is the mnemonic for 32-bit alu ?
>
> It is of the form "xxx32".  Here is opcodes table below.
>
> I think there are no formal mnenomics defined anywhere yet right?
> So just tell me what adjustments you want to make.
>
> const struct bpf_opcode bpf_opcodes[] = {
>   { "mov32",   BPF_OPC_ALU   | BPF_OPC_MOV  | BPF_OPC_X,     "1,2" },
>   { "mov",     BPF_OPC_ALU64 | BPF_OPC_MOV  | BPF_OPC_X,     "1,2" },
>   { "add32",   BPF_OPC_ALU   | BPF_OPC_ADD  | BPF_OPC_X,     "1,2" },

the very first bpf gen looked like canonical asm,
but then I found myself spending too much time
looking at ldb and ldd and thinking is it 8-byte
or 64-byte load?
jgt/jge/jsgt/sge was a stumbling block for me as well,
since it still takes me longer than necessary to disambiguate
into > vs >= and signed/unsigned
gcc bpf backend was done way before llvm and it had
quite confusing GT vs bpf GT vs x86 GT.
https://github.com/iovisor-obsolete/bpf_gcc/commit/ff47b3acd6d3c60900fab8cd93afd8106c49c8c3#diff-ee684c67b9329b734b7fb535d86a558dR452
gcc GTU == bpf GT == x86 GA
gcc GT == bpf SGT == x86 G
gcc GE == bpf SGE == x86 GE
I couldn't change bpf GT to GTU to match gcc, since it came
from classic bpf, so could only add SGT. S for Signed.
Through all this confusion I figured that the only output
not superhuman can quickly read is C-like output,
hence the final gcc output looked like:
https://github.com/iovisor-obsolete/bpf_gcc/commit/ff47b3acd6d3c60900fab8cd93afd8106c49c8c3#diff-0db71d60fd74b52befee102a27f0b4c4R292
+(define_insn "movdi_2mem"
+  [(set (match_operand:DI 0 "memory_operand" "=m,m")
+	(match_operand:DI 1 "int_reg_or_const_operand" "r,K"))]
+  ""
+  "@
+   BPF_INSN_ST(BPF_DW, %0, %1), // *(uint64*)(%0)=%1
+   BPF_INSN_ST_IMM(BPF_DW, %0, %1), // *(uint64*)(%0)=%1"
+[(set_attr "type" "store, store")])

yes, gcc was emitting C macros with C looking comments
into fake .s file that we #included into .c code to be loaded
into the kernel.
(that code is abandoned and probably doesn't compile anymore)

I made the verifier to emit C-like output to make it easier
for program writers to understand.

llvm asm until version 4.0 looked like:
         mov     r0, 1
         ldd     r4, 8(r1)
         ldd     r3, 0(r1)
         mov     r5, r3
         addi    r5, 14
         jgt     r5, r4 goto LBB0_3
which was causing confusion as well, so when I realized
that llvm can emit and parse any type of text, I switched
to emit verifier like asm code:
         r2 = *(u64 *)(r10 - 16)
         r5 = r2
         r5 += 34
         r0 = 2
         if r5 > r3 goto LBB0_1
which I think is way easier to read.

Though I think Daniel still prefers old classic bpf asm ;)

Anyway, back to the question...
since BFD and GCC are so much entrenched into canonical style
of asm code, I don't mind that gnu toolchain will be using it.

I like that you used 'dw' in 'ldxdw' instead of just 'd'
though 'x' can probably be dropped.

'x' should be added here instead:
  { "stb", BPF_OPC_ST    | BPF_OPC_MEM  | BPF_OPC_B, "[1+O],i" },
  { "stb", BPF_OPC_STX   | BPF_OPC_MEM  | BPF_OPC_B, "[1+O],2" },

I would use ld8/ld16/ld32/ld64 and st8/stx8/st16/stx16/...
but I spent too much time looking into asm and I'm biased.

Thanks again for working on it! This is major milestone.

^ permalink raw reply

* Re: IGMP on IPv6
From: Cong Wang @ 2017-04-26  5:24 UTC (permalink / raw)
  To: Murali Karicheri
  Cc: Hangbin Liu, open list:TI NETCP ETHERNET DRIVER, David Miller
In-Reply-To: <58FF7656.8050506@ti.com>

On Tue, Apr 25, 2017 at 9:16 AM, Murali Karicheri <m-karicheri2@ti.com> wrote:
> On 04/18/2017 06:37 PM, Cong Wang wrote:
>> Did you assign IPv4 and IPv6 addresses to the HSR master device?
>
> No. I just used IPv4. From the trace mld_ifc_timer_expire() -> mld_sendpack() -> ip6_output()
> do you know what is it trying to do? Is it some neighbor discovery message or something
> going over the lower interface instead of the hsr interface?
>

IPv6 is special here because it has link-local address configured
automatically for each device, so this mld_ifc_timer is armed
for link-local multicasts. See:


 464         /* Join interface-local all-node multicast group */
 465         ipv6_dev_mc_inc(dev, &in6addr_interfacelocal_allnodes);
 466
 467         /* Join all-node multicast group */
 468         ipv6_dev_mc_inc(dev, &in6addr_linklocal_allnodes);
 469
 470         /* Join all-router multicast group if forwarding is set */
 471         if (ndev->cnf.forwarding && (dev->flags & IFF_MULTICAST))
 472                 ipv6_dev_mc_inc(dev, &in6addr_linklocal_allrouters);

^ permalink raw reply

* 47490 netdev
From: kelley @ 2017-04-26  5:19 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: 47176743700.zip --]
[-- Type: application/zip, Size: 1674 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 3/3] samples/bpf: check before defining offsetof
From: Alexander Alemayhu @ 2017-04-26  5:17 UTC (permalink / raw)
  To: David Laight
  Cc: 'Daniel Borkmann', netdev@vger.kernel.org, ast@fb.com
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DCFFDAD25@AcuExch.aculab.com>

On Tue, Apr 25, 2017 at 02:27:16PM +0000, David Laight wrote:
> 
> Isn't the correct fix to include stddef.h ?
>
If you think it's the correct fix, please send a patch to netdev.

Thanks.

-- 
Mit freundlichen Grüßen

Alexander Alemayhu

^ permalink raw reply

* [Patch net 3/3] team: use a larger struct for mac address
From: Cong Wang @ 2017-04-26  5:03 UTC (permalink / raw)
  To: netdev; +Cc: andreyknvl, Cong Wang, Jiri Pirko
In-Reply-To: <1493183003-884-1-git-send-email-xiyou.wangcong@gmail.com>

IPv6 tunnels use sizeof(struct in6_addr) as dev->addr_len,
but in many places especially bonding, we use struct sockaddr
to copy and set mac addr, this could lead to stack out-of-bounds
access.

Fix it by using a larger address storage.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 drivers/net/team/team.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 85c0124..88878f1 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -60,10 +60,13 @@ static struct team_port *team_port_get_rtnl(const struct net_device *dev)
 static int __set_port_dev_addr(struct net_device *port_dev,
 			       const unsigned char *dev_addr)
 {
-	struct sockaddr addr;
+	struct {
+		unsigned short type;
+		unsigned char addr[MAX_ADDR_LEN];
+	} addr;
 
-	memcpy(addr.sa_data, dev_addr, port_dev->addr_len);
-	addr.sa_family = port_dev->type;
+	memcpy(addr.addr, dev_addr, port_dev->addr_len);
+	addr.type = port_dev->type;
 	return dev_set_mac_address(port_dev, &addr);
 }
 
-- 
2.5.5

^ permalink raw reply related

* [Patch net 2/3] bonding: use a larger struct for mac address
From: Cong Wang @ 2017-04-26  5:03 UTC (permalink / raw)
  To: netdev; +Cc: andreyknvl, Cong Wang, Jay Vosburgh
In-Reply-To: <1493183003-884-1-git-send-email-xiyou.wangcong@gmail.com>

IPv6 tunnels use sizeof(struct in6_addr) as dev->addr_len,
but in many places especially bonding, we use struct sockaddr
to copy and set mac addr, this could lead to stack out-of-bounds
access.

Fix it by using a larger address storage.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 drivers/net/bonding/bond_alb.c     |  8 ++++----
 drivers/net/bonding/bond_main.c    | 17 +++++++++--------
 drivers/net/bonding/bonding_priv.h |  6 ++++++
 3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index c80b023..0038266 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -39,7 +39,7 @@
 #include <asm/byteorder.h>
 #include <net/bonding.h>
 #include <net/bond_alb.h>
-
+#include "bonding_priv.h"
 
 
 static const u8 mac_bcast[ETH_ALEN + 2] __long_aligned = {
@@ -1234,7 +1234,7 @@ static int alb_set_mac_address(struct bonding *bond, void *addr)
 {
 	struct slave *slave, *rollback_slave;
 	struct list_head *iter;
-	struct sockaddr sa;
+	struct bond_mac_addr sa;
 	char tmp_addr[ETH_ALEN];
 	int res;
 
@@ -1257,8 +1257,8 @@ static int alb_set_mac_address(struct bonding *bond, void *addr)
 	return 0;
 
 unwind:
-	memcpy(sa.sa_data, bond->dev->dev_addr, bond->dev->addr_len);
-	sa.sa_family = bond->dev->type;
+	memcpy(sa.addr, bond->dev->dev_addr, bond->dev->addr_len);
+	sa.type = bond->dev->type;
 
 	/* unwind from head to the slave that failed */
 	bond_for_each_slave(bond, rollback_slave, iter) {
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 8a4ba8b..7a9bd30 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1330,7 +1330,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	struct bonding *bond = netdev_priv(bond_dev);
 	const struct net_device_ops *slave_ops = slave_dev->netdev_ops;
 	struct slave *new_slave = NULL, *prev_slave;
-	struct sockaddr addr;
+	struct bond_mac_addr addr;
 	int link_reporting;
 	int res = 0, i;
 
@@ -1488,8 +1488,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		/* Set slave to master's mac address.  The application already
 		 * set the master's mac address to that of the first slave
 		 */
-		memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
-		addr.sa_family = slave_dev->type;
+		memcpy(addr.addr, bond_dev->dev_addr, bond_dev->addr_len);
+		addr.type = slave_dev->type;
 		res = dev_set_mac_address(slave_dev, &addr);
 		if (res) {
 			netdev_dbg(bond_dev, "Error %d calling set_mac_address\n", res);
@@ -1773,8 +1773,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		 * MAC if this slave's MAC is in use by the bond, or at
 		 * least print a warning.
 		 */
-		ether_addr_copy(addr.sa_data, new_slave->perm_hwaddr);
-		addr.sa_family = slave_dev->type;
+		ether_addr_copy(addr.addr, new_slave->perm_hwaddr);
+		addr.type = slave_dev->type;
 		dev_set_mac_address(slave_dev, &addr);
 	}
 
@@ -3619,7 +3619,8 @@ static int bond_set_mac_address(struct net_device *bond_dev, void *addr)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct slave *slave, *rollback_slave;
-	struct sockaddr *sa = addr, tmp_sa;
+	struct sockaddr *sa = addr;
+	struct bond_mac_addr tmp_sa;
 	struct list_head *iter;
 	int res = 0;
 
@@ -3659,8 +3660,8 @@ static int bond_set_mac_address(struct net_device *bond_dev, void *addr)
 	return 0;
 
 unwind:
-	memcpy(tmp_sa.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
-	tmp_sa.sa_family = bond_dev->type;
+	memcpy(tmp_sa.addr, bond_dev->dev_addr, bond_dev->addr_len);
+	tmp_sa.type = bond_dev->type;
 
 	/* unwind from head to the slave that failed */
 	bond_for_each_slave(bond, rollback_slave, iter) {
diff --git a/drivers/net/bonding/bonding_priv.h b/drivers/net/bonding/bonding_priv.h
index 5a4d81a..8b3a7e3 100644
--- a/drivers/net/bonding/bonding_priv.h
+++ b/drivers/net/bonding/bonding_priv.h
@@ -22,4 +22,10 @@
 
 #define bond_version DRV_DESCRIPTION ": v" DRV_VERSION " (" DRV_RELDATE ")\n"
 
+/* Must be compatible with struct sockaddr */
+struct bond_mac_addr {
+	unsigned short type;
+	unsigned char addr[MAX_ADDR_LEN];
+};
+
 #endif
-- 
2.5.5

^ permalink raw reply related

* [Patch net 1/3] net: check mac address length for dev_set_mac_address()
From: Cong Wang @ 2017-04-26  5:03 UTC (permalink / raw)
  To: netdev; +Cc: andreyknvl, Cong Wang
In-Reply-To: <1493183003-884-1-git-send-email-xiyou.wangcong@gmail.com>

dev_set_mac_address() accepts a struct sockaddr pointer as
input but we have various types of mac addresse whose lengths
are up to MAX_ADDR_LEN, this is confusing.

Make it void like ->ndo_set_mac_address() and let callers check
its length before calling it. It is too late to fix dev_ifsioc()
due to API compatiblity, so just reject those larger than
sizeof(struct sockaddr).

This is also a preparation for the following patches.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/linux/netdevice.h |  2 +-
 net/core/dev.c            | 10 +++++++---
 net/core/dev_ioctl.c      |  2 ++
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 97456b25..40e674c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3271,7 +3271,7 @@ int dev_set_alias(struct net_device *, const char *, size_t);
 int dev_change_net_namespace(struct net_device *, struct net *, const char *);
 int dev_set_mtu(struct net_device *, int);
 void dev_set_group(struct net_device *, int);
-int dev_set_mac_address(struct net_device *, struct sockaddr *);
+int dev_set_mac_address(struct net_device *, void *);
 int dev_change_carrier(struct net_device *, bool new_carrier);
 int dev_get_phys_port_id(struct net_device *dev,
 			 struct netdev_phys_item_id *ppid);
diff --git a/net/core/dev.c b/net/core/dev.c
index 533a6d6..cc670e1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6610,13 +6610,17 @@ EXPORT_SYMBOL(dev_set_group);
 /**
  *	dev_set_mac_address - Change Media Access Control Address
  *	@dev: device
- *	@sa: new address
+ *	@addr: new address, whose type could be either struct sockaddr or
+ *	any other compatible type whose length is up to MAX_ADDR_LEN depending
+ *	on the dev->addr_len. Callers should check if its length is smaller than
+ *	dev->addr_len!!
  *
  *	Change the hardware (MAC) address of the device
  */
-int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa)
+int dev_set_mac_address(struct net_device *dev, void *addr)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
+	struct sockaddr *sa = addr;
 	int err;
 
 	if (!ops->ndo_set_mac_address)
@@ -6625,7 +6629,7 @@ int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa)
 		return -EINVAL;
 	if (!netif_device_present(dev))
 		return -ENODEV;
-	err = ops->ndo_set_mac_address(dev, sa);
+	err = ops->ndo_set_mac_address(dev, addr);
 	if (err)
 		return err;
 	dev->addr_assign_type = NET_ADDR_SET;
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index b94b1d2..11f262e 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -261,6 +261,8 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, unsigned int cmd)
 		return dev_set_mtu(dev, ifr->ifr_mtu);
 
 	case SIOCSIFHWADDR:
+		if (dev->addr_len > sizeof(struct sockaddr))
+			return -EINVAL;
 		return dev_set_mac_address(dev, &ifr->ifr_hwaddr);
 
 	case SIOCSIFHWBROADCAST:
-- 
2.5.5

^ permalink raw reply related

* [Patch net 0/3] net: fix a stack out-of-bound access
From: Cong Wang @ 2017-04-26  5:03 UTC (permalink / raw)
  To: netdev; +Cc: andreyknvl, Cong Wang

This patchset fixes the following kernel bug reported by Andrey:

 BUG: KASAN: stack-out-of-bounds in bond_enslave+0xe0a/0x4ef0 at addr
 ffff8800666b7792
 Write of size 16 by task a.out/3894
 page:ffffea000199adc0 count:0 mapcount:0 mapping:          (null) index:0x0
 flags: 0x100000000000000()
 raw: 0100000000000000 0000000000000000 0000000000000000 00000000ffffffff
 raw: 0000000000000000 ffffea000199ade0 0000000000000000 0000000000000000
 page dumped because: kasan: bad access detected
 CPU: 1 PID: 3894 Comm: a.out Not tainted 4.11.0-rc7+ #251
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:16
  dump_stack+0x292/0x398 lib/dump_stack.c:52
  kasan_report_error mm/kasan/report.c:212
  kasan_report+0x4d8/0x510 mm/kasan/report.c:347
  check_memory_region_inline mm/kasan/kasan.c:326
  check_memory_region+0x139/0x190 mm/kasan/kasan.c:333
  memcpy+0x37/0x50 mm/kasan/kasan.c:369
  bond_enslave+0xe0a/0x4ef0 drivers/net/bonding/bond_main.c:1491
  bond_do_ioctl+0xb5d/0xec0 drivers/net/bonding/bond_main.c:3449
  dev_ifsioc+0x53f/0x9f0 net/core/dev_ioctl.c:338
  dev_ioctl+0x249/0x1160 net/core/dev_ioctl.c:532
  sock_do_ioctl+0x94/0xb0 net/socket.c:913
  sock_ioctl+0x28f/0x440 net/socket.c:1004
  vfs_ioctl fs/ioctl.c:45
  do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:685
  SYSC_ioctl fs/ioctl.c:700
  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
  entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:204
 RIP: 0033:0x7fc764b31b79
 RSP: 002b:00007ffc1d73aa58 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
 RAX: ffffffffffffffda RBX: 00007ffc1d73abb0 RCX: 00007fc764b31b79
 RDX: 00000000209eafd8 RSI: 0000008000008990 RDI: 0000000000000003
 RBP: 00000000004004e0 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
 R13: 00007ffc1d73abb0 R14: 0000000000000000 R15: 0000000000000000

Please see each patch for details.

Cong Wang (3):
  net: check mac address length for dev_set_mac_address()
  bonding: use a larger struct for mac address
  team: use a larger struct for mac address

 drivers/net/bonding/bond_alb.c     |  8 ++++----
 drivers/net/bonding/bond_main.c    | 17 +++++++++--------
 drivers/net/bonding/bonding_priv.h |  6 ++++++
 drivers/net/team/team.c            |  9 ++++++---
 include/linux/netdevice.h          |  2 +-
 net/core/dev.c                     | 10 +++++++---
 net/core/dev_ioctl.c               |  2 ++
 7 files changed, 35 insertions(+), 19 deletions(-)

-- 
2.5.5

^ permalink raw reply

* Multiple Txqs with One Qdisc
From: Michael Ma @ 2017-04-26  4:46 UTC (permalink / raw)
  To: Linux Kernel Network Developers

Hi -

As I mentioned in a previous thread, we're implementing a qdisc
similar to mqprio which can associate multiple txqs to one qdisc, so
that we can work around the root qdisc bottleneck. I think this should
be in general fine since without MQ, root qdisc is associated with
multiple txqs anyway. However we encountered kernel panic as I
mentioned in the "corrupted skb" thread.

Any thought on potential issue of doing this? I'm asking this because
mqprio only associates one txq with one qdisc which I think must have
been done this way for a reason.

Would appreciate if anyone can shed some light on this.

Thanks,
Michael

^ permalink raw reply

* Re: Corrupted SKB
From: Michael Ma @ 2017-04-26  4:42 UTC (permalink / raw)
  To: Cong Wang; +Cc: Linux Kernel Network Developers, jin.oyj
In-Reply-To: <CAAmHdhy0N2VttXNXL+S+4G=4=mf4ihpW7KsNWUYpiOFXez3B7w@mail.gmail.com>

2017-04-18 21:46 GMT-07:00 Michael Ma <make0818@gmail.com>:
> 2017-04-18 16:12 GMT-07:00 Cong Wang <xiyou.wangcong@gmail.com>:
>> On Mon, Apr 17, 2017 at 5:39 PM, Michael Ma <make0818@gmail.com> wrote:
>>> Hi -
>>>
>>> We've implemented a "glue" qdisc similar to mqprio which can associate
>>> one qdisc to multiple txqs as the root qdisc. Reference count of the
>>> child qdiscs have been adjusted properly in this case so that it
>>> represents the number of txqs it has been attached to. However when
>>> sending packets we saw the skb from dequeue_skb() corrupted with the
>>> following call stack:
>>>
>>>     [exception RIP: netif_skb_features+51]
>>>     RIP: ffffffff815292b3  RSP: ffff8817f6987940  RFLAGS: 00010246
>>>
>>>  #9 [ffff8817f6987968] validate_xmit_skb at ffffffff815294aa
>>> #10 [ffff8817f69879a0] validate_xmit_skb at ffffffff8152a0d9
>>> #11 [ffff8817f69879b0] __qdisc_run at ffffffff8154a193
>>> #12 [ffff8817f6987a00] dev_queue_xmit at ffffffff81529e03
>>>
>>> It looks like the skb has already been released since its dev pointer
>>> field is invalid.
>>>
>>> Any clue on how this can be investigated further? My current thought
>>> is to add some instrumentation to the place where skb is released and
>>> analyze whether there is any race condition happening there. However
>>
>> Either dropwatch or perf could do the work to instrument kfree_skb().
>
> Thanks - will try it out.

I'm using perf to collect the callstack for kfree_skb and trying to
correlate that with the corrupted SKB address however when system
crashes the perf.data file is also corrupted - how can I view this
file in case the system crashes before perf exits?
>>
>>> by looking through the existing code I think the case where one root
>>> qdisc is associated with multiple txqs already exists (when mqprio is
>>> not used) so not sure why it won't work when we group txqs and assign
>>> each group a root qdisc. Any insight on this issue would be much
>>> appreciated!
>>
>> How do you implement ->attach()? How does it work with netdev_pick_tx()?
>
> attach() essentially grafts the default qdisc(pfifo) to each "txq
> group" represented by a TC class. For netdev_pick_txq() we use classid
> of the socket to select a class based on a "class id base" and the
> class to txq mapping defined together with this glue qdisc - it's
> pretty much the same as mqprio with the difference of mapping one
> class to multiple txqs and selecting the txq through a hash.

^ permalink raw reply

* (unknown), 
From: prasad padiyar @ 2017-04-26  3:57 UTC (permalink / raw)
  To: netdev

subscribe linux-netdev

^ permalink raw reply

* Re: llvm-objdump...
From: Alexei Starovoitov @ 2017-04-26  3:48 UTC (permalink / raw)
  To: David Miller; +Cc: daniel, netdev
In-Reply-To: <20170425.131337.2175176102097155053.davem@davemloft.net>

On 4/25/17 10:13 AM, David Miller wrote:
>
> I think there are some endianness issues ;-)
>
> davem@patience:~/src/GIT/net-next/tools/testing/selftests/bpf$ llvm-objdump -S x.o

nice host name ;)

> x.o:    file format ELF64-BPF
>
> Disassembly of section test1:
> process:
>        0:       b7 00 00 00 00 00 00 02         r0 = 33554432
>        1:       61 21 00 50 00 00 00 00         r1 = *(u32 *)(r2 + 20480)
>
> That first instruction should be "r0 = 2"

hmm. I haven't tested it on big endian.
When last time s390 folks tested samples/bpf with llvm we didn't even
have automatic -march=bpf in llvm, so they used -march=bpfeb.
There was no llvm-objdump support either.

llvm side does this:
tatic Triple::ArchType parseBPFArch(StringRef ArchName) {
   if (ArchName.equals("bpf")) {
     if (sys::IsLittleEndianHost)
       return Triple::bpfel;
     else
       return Triple::bpfeb;
   } else if (ArchName.equals("bpf_be") || ArchName.equals("bpfeb")) {
     return Triple::bpfeb;
   } else if (ArchName.equals("bpf_le") || ArchName.equals("bpfel")) {
     return Triple::bpfel;

It works for clang and for llvm.
I thought llvm-objdump should infer triple from elf file
and do the 'right thing'... hmm

could you please test it with -g and see whether dwarf is still
correct in .o ?
llvm-objdump -S should print original C code next to asm.
Hope bpf dwarf is not broken on big-endian...

^ permalink raw reply

* Re: more test_progs...
From: Alexei Starovoitov @ 2017-04-26  3:42 UTC (permalink / raw)
  To: David Miller; +Cc: daniel, netdev
In-Reply-To: <20170425.125217.1962662516948420246.davem@davemloft.net>

On 4/25/17 9:52 AM, David Miller wrote:
>
> 10: (15) if r3 == 0xdd86 goto pc+9
>  R0=imm2,min_value=2,max_value=2 R1=pkt(id=0,off=0,r=14) R2=pkt_end R3=inv R4=pkt(id=0,off=14,r=14) R5=inv56 R10=fp
>
> Hmmm, endianness looks wrong here.  "-target bpf" defaults to the
> endianness of whatever cpu that llvm was built for, right?

yeah. so here the code comes from:
         } else if (eth->h_proto == _htons(ETH_P_IPV6)) {

and in the beginning of that .c file:
#define _htons __builtin_bswap16
^^^ that's the bug.
It should have been nop for sparc.
We cannot use htons() from system header, since it uses x86 inline asm.

>  R0=imm2,min_value=2,max_value=2 R1=inv56,min_value=6,max_value=6 R2=pkt_end R3=imm3,min_value=3,max_value=3 R4=pkt(id=1,off=14,r=34) R5=pkt(id=1,off=34,r=34) R10=fp
> 36: (07) r4 += 18
> 37: (2d) if r4 > r2 goto pc+5
>  R0=imm2,min_value=2,max_value=2 R1=inv56,min_value=6,max_value=6 R2=pkt_end R3=imm3,min_value=3,max_value=3 R4=pkt(id=1,off=32,r=34) R5=pkt(id=1,off=34,r=34) R10=fp
> 38: (b7) r0 = 0
> 39: (69) r1 = *(u16 *)(r4 +0)
> Unknown alignment. Only byte-sized access allowed in packet access.
>
> And this seems to load the urgent pointer as a u16 which is what the verifier rejects.
>
> Oh I see, this is guarded by CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.

yes. exactly.
Your analysis is correct and offending 16-bit load is this C code:
                 if (tcp->urg_ptr == 123)

the parsing code before that line deals with variable length ip header:
                 tcp = (struct tcphdr *)((void *)(iph) + ihl_len);

and at that point verifier cannot track the alignment of the pointer
anymore. It knows 'iph' alignment, but as soon as some variable is added
to it cannot assume good alignment anymore and from there on it
allows >1 byte accesses only on HAVE_EFFICIENT_UNALIGNED_ACCESS
architectures.
That's the 'if' that you found:
  'if (reg->id && size != 1) {'

ref->id > 0 means that some variable was added to ptr_to_packet.

That sucks for sparc, of course. I don't have good suggestions for
workaround other than marking tcphdr as packed :(
 From code efficiency point of view it still will be faster than
LD_ABS insn.
Another idea is to try to track that ihl_len variable is also
somehow multiple of 4, but it will be quite complicated on
the verifier side in its existing architecture and maybe? worth
waiting until the verifier has proper dataflow analysis.

^ permalink raw reply

* Re: hmmm...
From: David Miller @ 2017-04-26  3:38 UTC (permalink / raw)
  To: ast; +Cc: netdev
In-Reply-To: <af3781d2-f8af-1c6e-b3a0-163d8c1de75a@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Tue, 25 Apr 2017 19:56:06 -0700

> On 4/25/17 6:52 PM, David Miller wrote:
>>
>> Alexei, I found something strange on my computer :-)
>>
>> [davem@localhost binutils]$ ./objdump -d x.o
> 
> No way! :) I thought it will take weeks!
> Ship it. Ship it. Ship it.
> Cannot wait to pull.
> This is awesome. Thanks a ton!

Relax, it is in a very raw state still. :)

> What is the mnemonic for 32-bit alu ?

It is of the form "xxx32".  Here is opcodes table below.

I think there are no formal mnenomics defined anywhere yet right?
So just tell me what adjustments you want to make.

#include "sysdep.h"
#include <stdio.h>
#include "opcode/bpf.h"

#define BPF_OPC_ALU64	0x07
#define BPF_OPC_DW	0x18
#define BPF_OPC_XADD	0xc0
#define BPF_OPC_MOV	0xb0
#define BPF_OPC_ARSH	0xc0
#define BPF_OPC_END	0xd0
#define BPF_OPC_TO_LE	0x00
#define BPF_OPC_TO_BE	0x08
#define BPF_OPC_JNE	0x50
#define BPF_OPC_JSGT	0x60
#define BPF_OPC_JSGE	0x70
#define BPF_OPC_CALL	0x80
#define BPF_OPC_EXIT	0x90

#define BPF_OPC_LD	0x00
#define BPF_OPC_LDX	0x01
#define BPF_OPC_ST	0x02
#define BPF_OPC_STX	0x03
#define BPF_OPC_ALU	0x04
#define BPF_OPC_JMP	0x05
#define BPF_OPC_RET	0x06
#define BPF_OPC_MISC	0x07

#define BPF_OPC_W	0x00
#define BPF_OPC_H	0x08
#define BPF_OPC_B	0x10

#define BPF_OPC_IMM	0x00
#define BPF_OPC_ABS	0x20
#define BPF_OPC_IND	0x40
#define BPF_OPC_MEM	0x60
#define BPF_OPC_LEL	0x80
#define BPF_OPC_MSH	0xa0

#define BPF_OPC_ADD	0x00
#define BPF_OPC_SUB	0x10
#define BPF_OPC_MUL	0x20
#define BPF_OPC_DIV	0x30
#define BPF_OPC_OR	0x40
#define BPF_OPC_AND	0x50
#define BPF_OPC_LSH	0x60
#define BPF_OPC_RSH	0x70
#define BPF_OPC_NEG	0x80
#define BPF_OPC_MOD	0x90
#define BPF_OPC_XOR	0xa0

#define BPF_OPC_JA	0x00
#define BPF_OPC_JEQ	0x10
#define BPF_OPC_JGT	0x20
#define BPF_OPC_JGE	0x30
#define BPF_OPC_JSET	0x40

#define BPF_OPC_K	0x00
#define BPF_OPC_X	0x08

const struct bpf_opcode bpf_opcodes[] = {
  { "mov32",   BPF_OPC_ALU   | BPF_OPC_MOV  | BPF_OPC_X,     "1,2" },
  { "mov",     BPF_OPC_ALU64 | BPF_OPC_MOV  | BPF_OPC_X,     "1,2" },
  { "add32",   BPF_OPC_ALU   | BPF_OPC_ADD  | BPF_OPC_X,     "1,2" },
  { "add",     BPF_OPC_ALU64 | BPF_OPC_ADD  | BPF_OPC_X,     "1,2" },
  { "sub32",   BPF_OPC_ALU   | BPF_OPC_SUB  | BPF_OPC_X,     "1,2" },
  { "sub",     BPF_OPC_ALU64 | BPF_OPC_SUB  | BPF_OPC_X,     "1,2" },
  { "and32",   BPF_OPC_ALU   | BPF_OPC_AND  | BPF_OPC_X,     "1,2" },
  { "and",     BPF_OPC_ALU64 | BPF_OPC_AND  | BPF_OPC_X,     "1,2" },
  { "or32",    BPF_OPC_ALU   | BPF_OPC_OR   | BPF_OPC_X,     "1,2" },
  { "or",      BPF_OPC_ALU64 | BPF_OPC_OR   | BPF_OPC_X,     "1,2" },
  { "xor32",   BPF_OPC_ALU   | BPF_OPC_XOR  | BPF_OPC_X,     "1,2" },
  { "xor",     BPF_OPC_ALU64 | BPF_OPC_XOR  | BPF_OPC_X,     "1,2" },
  { "mul32",   BPF_OPC_ALU   | BPF_OPC_MUL  | BPF_OPC_X,     "1,2" },
  { "mul",     BPF_OPC_ALU64 | BPF_OPC_MUL  | BPF_OPC_X,     "1,2" },
  { "div32",   BPF_OPC_ALU   | BPF_OPC_DIV  | BPF_OPC_X,     "1,2" },
  { "div",     BPF_OPC_ALU64 | BPF_OPC_DIV  | BPF_OPC_X,     "1,2" },
  { "mod32",   BPF_OPC_ALU   | BPF_OPC_MOD  | BPF_OPC_X,     "1,2" },
  { "mod",     BPF_OPC_ALU64 | BPF_OPC_MOD  | BPF_OPC_X,     "1,2" },
  { "lsh32",   BPF_OPC_ALU   | BPF_OPC_LSH  | BPF_OPC_X,     "1,2" },
  { "lsh",     BPF_OPC_ALU64 | BPF_OPC_LSH  | BPF_OPC_X,     "1,2" },
  { "rsh32",   BPF_OPC_ALU   | BPF_OPC_RSH  | BPF_OPC_X,     "1,2" },
  { "rsh",     BPF_OPC_ALU64 | BPF_OPC_RSH  | BPF_OPC_X,     "1,2" },
  { "arsh32",  BPF_OPC_ALU   | BPF_OPC_ARSH | BPF_OPC_X,     "1,2" },
  { "arsh",    BPF_OPC_ALU64 | BPF_OPC_ARSH | BPF_OPC_X,     "1,2" },
  { "neg32",   BPF_OPC_ALU   | BPF_OPC_NEG  | BPF_OPC_X,     "1" },
  { "neg",     BPF_OPC_ALU64 | BPF_OPC_NEG  | BPF_OPC_X,     "1" },
  { "endbe",   BPF_OPC_ALU   | BPF_OPC_END  | BPF_OPC_TO_BE, "1,i" },
  { "endle",   BPF_OPC_ALU   | BPF_OPC_END  | BPF_OPC_TO_LE, "1,i" },
  { "mov32",   BPF_OPC_ALU   | BPF_OPC_MOV  | BPF_OPC_K,     "1,i" },
  { "mov",     BPF_OPC_ALU64 | BPF_OPC_MOV  | BPF_OPC_K,     "1,i" },
  { "add32",   BPF_OPC_ALU   | BPF_OPC_ADD  | BPF_OPC_K,     "1,i" },
  { "add",     BPF_OPC_ALU64 | BPF_OPC_ADD  | BPF_OPC_K,     "1,i" },
  { "sub32",   BPF_OPC_ALU   | BPF_OPC_SUB  | BPF_OPC_K,     "1,i" },
  { "sub",     BPF_OPC_ALU64 | BPF_OPC_SUB  | BPF_OPC_K,     "1,i" },
  { "and32",   BPF_OPC_ALU   | BPF_OPC_AND  | BPF_OPC_K,     "1,i" },
  { "and",     BPF_OPC_ALU64 | BPF_OPC_AND  | BPF_OPC_K,     "1,i" },
  { "or32",    BPF_OPC_ALU   | BPF_OPC_XOR  | BPF_OPC_K,     "1,i" },
  { "or",      BPF_OPC_ALU64 | BPF_OPC_XOR  | BPF_OPC_K,     "1,i" },
  { "xor32",   BPF_OPC_ALU   | BPF_OPC_OR   | BPF_OPC_K,     "1,i" },
  { "xor",     BPF_OPC_ALU64 | BPF_OPC_OR   | BPF_OPC_K,     "1,i" },
  { "mul32",   BPF_OPC_ALU   | BPF_OPC_MUL  | BPF_OPC_K,     "1,i" },
  { "mul",     BPF_OPC_ALU64 | BPF_OPC_MUL  | BPF_OPC_K,     "1,i" },
  { "div32",   BPF_OPC_ALU   | BPF_OPC_DIV  | BPF_OPC_K,     "1,i" },
  { "div",     BPF_OPC_ALU64 | BPF_OPC_DIV  | BPF_OPC_K,     "1,i" },
  { "mod32",   BPF_OPC_ALU   | BPF_OPC_MOD  | BPF_OPC_K,     "1,i" },
  { "mod",     BPF_OPC_ALU64 | BPF_OPC_MOD  | BPF_OPC_K,     "1,i" },
  { "lsh32",   BPF_OPC_ALU   | BPF_OPC_LSH  | BPF_OPC_K,     "1,i" },
  { "lsh",     BPF_OPC_ALU64 | BPF_OPC_LSH  | BPF_OPC_K,     "1,i" },
  { "rsh32",   BPF_OPC_ALU   | BPF_OPC_RSH  | BPF_OPC_K,     "1,i" },
  { "rsh",     BPF_OPC_ALU64 | BPF_OPC_RSH  | BPF_OPC_K,     "1,i" },
  { "arsh32",  BPF_OPC_ALU   | BPF_OPC_ARSH | BPF_OPC_K,     "1,i" },
  { "arsh",    BPF_OPC_ALU64 | BPF_OPC_ARSH | BPF_OPC_K,     "1,i" },
  { "ja",      BPF_OPC_JMP   | BPF_OPC_JA,                   "L" },
  { "jeq",     BPF_OPC_JMP   | BPF_OPC_JEQ  | BPF_OPC_X,     "1,2,L" },
  { "jgt",     BPF_OPC_JMP   | BPF_OPC_JGT  | BPF_OPC_X,     "1,2,L" },
  { "jge",     BPF_OPC_JMP   | BPF_OPC_JGE  | BPF_OPC_X,     "1,2,L" },
  { "jne",     BPF_OPC_JMP   | BPF_OPC_JNE  | BPF_OPC_X,     "1,2,L" },
  { "jsgt",    BPF_OPC_JMP   | BPF_OPC_JSGT | BPF_OPC_X,     "1,2,L" },
  { "jsge",    BPF_OPC_JMP   | BPF_OPC_JSGE | BPF_OPC_X,     "1,2,L" },
  { "jset",    BPF_OPC_JMP   | BPF_OPC_JSET | BPF_OPC_X,     "1,2,L" },
  { "jeq",     BPF_OPC_JMP   | BPF_OPC_JEQ  | BPF_OPC_K,     "1,i,L" },
  { "jgt",     BPF_OPC_JMP   | BPF_OPC_JGT  | BPF_OPC_K,     "1,i,L" },
  { "jge",     BPF_OPC_JMP   | BPF_OPC_JGE  | BPF_OPC_K,     "1,i,L" },
  { "jne",     BPF_OPC_JMP   | BPF_OPC_JNE  | BPF_OPC_K,     "1,i,L" },
  { "jsgt",    BPF_OPC_JMP   | BPF_OPC_JSGT | BPF_OPC_K,     "1,i,L" },
  { "jsge",    BPF_OPC_JMP   | BPF_OPC_JSGE | BPF_OPC_K,     "1,i,L" },
  { "jset",    BPF_OPC_JMP   | BPF_OPC_JSET | BPF_OPC_K,     "1,i,L" },
  { "call",    BPF_OPC_JMP   | BPF_OPC_CALL,                 "C" },
  { "tailcall",BPF_OPC_JMP   | BPF_OPC_CALL | BPF_OPC_X,     "C" },
  { "exit",    BPF_OPC_JMP   | BPF_OPC_EXIT,                 "" },
  { "ldimm64", BPF_OPC_LD    | BPF_OPC_IMM  | BPF_OPC_DW,    "1,D" },
  { "ldxw",    BPF_OPC_LDX   | BPF_OPC_MEM  | BPF_OPC_W,     "1,[2+O]" },
  { "ldxh",    BPF_OPC_LDX   | BPF_OPC_MEM  | BPF_OPC_H,     "1,[2+O]" },
  { "ldxb",    BPF_OPC_LDX   | BPF_OPC_MEM  | BPF_OPC_B,     "1,[2+O]" },
  { "ldxdw",   BPF_OPC_LDX   | BPF_OPC_MEM  | BPF_OPC_DW,    "1,[2+O]" },
  { "stw",     BPF_OPC_ST    | BPF_OPC_MEM  | BPF_OPC_W,     "[1+O],i" },
  { "sth",     BPF_OPC_ST    | BPF_OPC_MEM  | BPF_OPC_H,     "[1+O],i" },
  { "stb",     BPF_OPC_ST    | BPF_OPC_MEM  | BPF_OPC_B,     "[1+O],i" },
  { "stdw",    BPF_OPC_ST    | BPF_OPC_MEM  | BPF_OPC_DW,    "[1+O],i" },
  { "stw",     BPF_OPC_STX   | BPF_OPC_MEM  | BPF_OPC_W,     "[1+O],2" },
  { "sth",     BPF_OPC_STX   | BPF_OPC_MEM  | BPF_OPC_H,     "[1+O],2" },
  { "stb",     BPF_OPC_STX   | BPF_OPC_MEM  | BPF_OPC_B,     "[1+O],2" },
  { "stdw",    BPF_OPC_STX   | BPF_OPC_MEM  | BPF_OPC_DW,    "[1+O],2" },
  { "xaddw",   BPF_OPC_STX   | BPF_OPC_XADD | BPF_OPC_W,     "[1+O],2" },
  { "xadddw",  BPF_OPC_STX   | BPF_OPC_XADD | BPF_OPC_DW,    "[1+O],2" },
};
const int bpf_num_opcodes = ((sizeof bpf_opcodes)/(sizeof bpf_opcodes[0]));

^ permalink raw reply

* [net-next] net: update comment for netif_dormant() function
From: Zhang Shengju @ 2017-04-26  3:05 UTC (permalink / raw)
  To: netdev, davem

This patch updates the comment for netif_dormant() function to reflect
the intended usage.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
---
 include/linux/netdevice.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5d5267f..b8c8e5b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3400,10 +3400,10 @@ static inline void netif_dormant_off(struct net_device *dev)
 }
 
 /**
- *	netif_dormant - test if carrier present
+ *	netif_dormant - test if device is dormant
  *	@dev: network device
  *
- * Check if carrier is present on device
+ * Check if device is dormant.
  */
 static inline bool netif_dormant(const struct net_device *dev)
 {
-- 
1.8.3.1

^ permalink raw reply related

* Re: xdp_redirect ifindex vs port. Was: best API for returning/setting egress port?
From: John Fastabend @ 2017-04-26  3:07 UTC (permalink / raw)
  To: Alexei Starovoitov, Jesper Dangaard Brouer
  Cc: Daniel Borkmann, Andy Gospodarek, Daniel Borkmann,
	Alexei Starovoitov, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org
In-Reply-To: <20170426002610.eihwmz4knbmrolfw@ast-mbp.dhcp.thefacebook.com>

On 17-04-25 05:26 PM, Alexei Starovoitov wrote:
> On Tue, Apr 25, 2017 at 11:34:53AM +0200, Jesper Dangaard Brouer wrote:
>>> Note the very first bpf patchset years ago contained the port table
>>> abstraction. ovs has concept of vports as well. These two very
>>> different projects needed port table to provide a layer of
>>> indirection between ifindex==netdev and virtual port number.
>>> This is still the case and I'd like to see this port table to be
>>> implemented for both cls_bpf and xdp. In that sense xdp is not
>>> special.
>>
>> Glad to hear you want to see this implemented, I will start coding on
>> this then.  Good point with cls_bpf, I was planning to make this port
>> table strongly connected to XDP, guess I should also think of cls_bpf.
> 
> perfect.
> I think we should try to make all additions to bpf networking world
> to be usable for both tc and xdp, since both are actively used and
> it wouldn't be great to have cool feature for one, but not the other.
> I think port table is an excellent candidate that applies to both.

+1

Jesper, I was working up the code for the redirect piece for ixgbe and
virtio, please use this as a base for your virtual port number table. I'll
push an update onto github tomorrow. I think the table should drop in fairly
nicely.

One piece that isn't clear to me is how do you plan to instantiate and
program this table. Is it a new static bpf map that is created any time we see
the redirect command? I think this would be preferred.

> 
>> I'm not worried about the DROP case, I agree that is fine (as you also
>> say).  The problem is unintentionally sending a packet to a wrong
>> ifindex.  This is clearly an eBPF program error, BUT with XDP this
>> becomes a very hard to debug program error.  With TC-redirect/cls_bpf
>> we can tcpdump the packets, with XDP there is no visibility into this
>> happening (the NSA is going to love this "feature").  Maybe we could add
>> yet-another tracepoint to allow debugging this.  My proposal that we
>> simply remove the possibility for such program errors, by as you say
>> move the validation from run-time into static insertion-time, via a
>> port table.
> 
> I think lack of tcpdump-like debugging in xdp is a separate issue.
> As I was saying in the other thread we have trivial 'xdpdump' kern+user
> app that emits pcap file, but it's too specific to how we use
> tail_calls+prog_array in our xdp setup. I'm working on the program
> chaining that will be generic and allow us transparently add multiple
> xdp or tc progs to the same attachment point and will allow us to
> do 'xdpdump' at any point of this pipeline, so debugging of what
> happened to the packet will be easier and done in the same way
> for both tc and xdp.
> btw in our experience working with both tc and xdp the tc+bpf was
> actually harder to use and more bug prone.
> 

Nice, the tcpdump-like debugging looks interesting.

^ permalink raw reply

* Re: [PATCH net v2 0/3] net: hns: bug fix for HNS driver
From: lipeng (Y) @ 2017-04-26  3:05 UTC (permalink / raw)
  To: David Miller, yankejian
  Cc: salil.mehta, yisen.zhuang, huangdaode, zhouhuiru, netdev,
	charles.chenxin, linuxarm
In-Reply-To: <20170424.122448.1775404720510848518.davem@davemloft.net>



On 2017/4/25 0:24, David Miller wrote:
> From: Yankejian <yankejian@huawei.com>
> Date: Fri, 21 Apr 2017 15:44:41 +0800
>
>> From: lipeng <lipeng321@huawei.com>
>>
>> This series adds support defered probe when mdio or mbigen module
>> insmod behind HNS driver, and fixes a bug that a skb has been
>> freed, but it may be still used in driver.
>>
>> change log:
>> V1 -> V2:
>> 1. Return appropriate errno in hns_mac_register_phy;
> At a minimum you're going to have to expand your commit message in
> patch #1 so that it has more detail and explains the situation better.
> .
OK, will add more detail and explains for #1 patch.

thanks

^ permalink raw reply

* Re: hmmm...
From: Alexei Starovoitov @ 2017-04-26  2:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20170425.215232.727894789143301373.davem@davemloft.net>

On 4/25/17 6:52 PM, David Miller wrote:
>
> Alexei, I found something strange on my computer :-)
>
> [davem@localhost binutils]$ ./objdump -d x.o

No way! :) I thought it will take weeks!
Ship it. Ship it. Ship it.
Cannot wait to pull.
This is awesome. Thanks a ton!

What is the mnemonic for 32-bit alu ?

^ permalink raw reply

* Re: Bug and configuration MPLS error?
From: David Ahern @ 2017-04-26  2:22 UTC (permalink / raw)
  To: Алексей Болдырев,
	netdev
In-Reply-To: <184081493141314@web16g.yandex.ru>

On 4/25/17 11:28 AM, Алексей Болдырев wrote:
> 226 sysctl -w net.mpls.conf.lo.input=1
> 227 sysctl -w net.mpls.platform_labels=1048575
> 228 ip link add veth0 type veth peer name veth1
> 229 ip link add veth2 type veth peer name veth3
> 230 sysctl -w net.mpls.conf.veth0.input=1
> 231 sysctl -w net.mpls.conf.veth2.input=1
> 232 ifconfig veth0 10.3.3.1 netmask 255.255.255.0
> 233 ifconfig veth2 10.4.4.1 netmask 255.255.255.0
> 234 ip netns add host1
> 235 ip netns add host2
> 236 ip link set veth1 netns host1
> 237 ip link set veth3 netns host2
> 238 ip netns exec host1 ifconfig veth1 10.3.3.2 netmask 255.255.255.0 up
> 239 ip netns exec host2 ifconfig veth3 10.4.4.2 netmask 255.255.255.0 up
> 240 ip netns exec host1 ip route add 10.10.10.2/32 encap mpls 112 via inet 10.3.3.1
> 241 ip netns exec host2 ip route add 10.10.10.1/32 encap mpls 111 via inet 10.4.4.1
> 242 ip -f mpls route add 111 via inet 10.3.3.2
> 243 ip -f mpls route add 112 via inet 10.4.4.2

your setup is incomplete.

# ip netns exec host2 ping 10.10.10.1
PING 10.10.10.1 (10.10.10.1) 56(84) bytes of data.
^C
--- 10.10.10.1 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1038ms


If you run tcpdump on veth1 in host1 you see the packets come in but no
response:

# ip netns exec host1 tcpdump -n -i veth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on veth1, link-type EN10MB (Ethernet), capture size 262144 bytes
19:20:24.599529 IP6 fe80::347d:e3ff:fe93:944b > ff02::2: ICMP6, router
solicitation, length 16
19:20:27.413901 IP 10.4.4.2 > 10.10.10.1: ICMP echo request, id 978, seq
1, length 64
19:20:28.439574 IP 10.4.4.2 > 10.10.10.1: ICMP echo request, id 978, seq
2, length 64

and the lack of response is b/c:
1. host1 has no address for 10.10.10.1 and
2. even if it did, there is no return route to 10.4.4.2:

# ip -netns host1 ro ls
10.3.3.0/24 dev veth1 proto kernel scope link src 10.3.3.2
10.10.10.2  encap mpls  112 via 10.3.3.1 dev veth1

^ permalink raw reply

* hmmm...
From: David Miller @ 2017-04-26  1:52 UTC (permalink / raw)
  To: ast; +Cc: netdev


Alexei, I found something strange on my computer :-)

[davem@localhost binutils]$ ./objdump -d x.o

x.o:     file format elf64-bpf


Disassembly of section test1:

0000000000000000 <process>:
   0:	b7 00 00 00 00 00 00 02 	mov	r0, 2
   8:	61 21 00 50 00 00 00 00 	ldxw	r2, [r1+80]
  10:	61 11 00 4c 00 00 00 00 	ldxw	r1, [r1+76]
  18:	bf 41 00 00 00 00 00 00 	mov	r4, r1
  20:	07 40 00 00 00 00 00 0e 	add	r4, 14
  28:	2d 42 00 25 00 00 00 00 	jgt	r4, r2, 148 <LBB0_11>
  30:	71 51 00 0d 00 00 00 00 	ldxb	r5, [r1+13]
  38:	71 31 00 0c 00 00 00 00 	ldxb	r3, [r1+12]
  40:	67 30 00 00 00 00 00 08 	lsh	r3, 8
  48:	4f 35 00 00 00 00 00 00 	or	r3, r5
  50:	15 30 00 09 00 00 dd 86 	jeq	r3, 56710, 90 <process+0x90>
  58:	55 30 00 1d 00 00 00 08 	jne	r3, 8, 138 <LBB0_6+0x70>
  60:	bf 31 00 00 00 00 00 00 	mov	r3, r1
  68:	07 30 00 00 00 00 00 22 	add	r3, 34
  70:	2d 32 00 1c 00 00 00 00 	jgt	r3, r2, 148 <LBB0_11>
  78:	b7 30 00 00 00 00 00 03 	mov	r3, 3
  80:	71 54 00 00 00 00 00 00 	ldxb	r5, [r4+0]
  88:	67 50 00 00 00 00 00 02 	lsh	r5, 2
  90:	57 50 00 00 00 00 00 3c 	and	r5, 60
  98:	05 00 00 05 00 00 00 00 	ja	b8 <LBB0_5+0x18>

00000000000000a0 <LBB0_5>:
  a0:	b7 50 00 00 00 00 00 28 	mov	r5, 40
  a8:	b7 30 00 00 00 00 00 00 	mov	r3, 0
  b0:	bf 41 00 00 00 00 00 00 	mov	r4, r1
  b8:	07 40 00 00 00 00 00 36 	add	r4, 54
  c0:	2d 42 00 12 00 00 00 00 	jgt	r4, r2, 148 <LBB0_11>

00000000000000c8 <LBB0_6>:
  c8:	bf 41 00 00 00 00 00 00 	mov	r4, r1
  d0:	0f 45 00 00 00 00 00 00 	add	r4, r5
  d8:	07 40 00 00 00 00 00 0e 	add	r4, 14
  e0:	15 40 00 0c 00 00 00 00 	jeq	r4, 0, 138 <LBB0_6+0x70>
  e8:	bf 54 00 00 00 00 00 00 	mov	r5, r4
  f0:	07 50 00 00 00 00 00 14 	add	r5, 20
  f8:	2d 52 00 0b 00 00 00 00 	jgt	r5, r2, 148 <LBB0_11>
 100:	0f 13 00 00 00 00 00 00 	add	r1, r3
 108:	71 11 00 14 00 00 00 00 	ldxb	r1, [r1+20]
 110:	57 10 00 00 00 00 00 ff 	and	r1, 255
 118:	55 10 00 07 00 00 00 06 	jne	r1, 6, 148 <LBB0_11>
 120:	07 40 00 00 00 00 00 12 	add	r4, 18
 128:	2d 42 00 05 00 00 00 00 	jgt	r4, r2, 148 <LBB0_11>
 130:	b7 00 00 00 00 00 00 00 	mov	r0, 0
 138:	69 14 00 00 00 00 00 00 	ldxh	r1, [r4+0]
 140:	15 10 00 02 00 00 00 7b 	jeq	r1, 123, 148 <LBB0_11>

0000000000000148 <LBB0_11>:
 148:	18 00 00 00 ff ff ff ff 	ldimm64	r0, 4294967295
 150:	00 00 00 00 00 00 00 00 

0000000000000158 <LBB0_12>:
 158:	95 00 00 00 00 00 00 00 	exit	
[davem@localhost binutils]$ 

^ permalink raw reply

* linux-next: manual merge of the net-next tree with the kbuild tree
From: Stephen Rothwell @ 2017-04-26  1:09 UTC (permalink / raw)
  To: David Miller, Networking, Masahiro Yamada
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List,
	Nicolas Dichtel, Gerard Garcia

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  include/uapi/linux/Kbuild

between commit:

  65017bab8a9e ("uapi: export all headers under uapi directories")

from the kbuild tree and commit:

  0b2e66448ba2 ("VSOCK: Add vsockmon device")

from the net-next tree.

I fixed it up (I just used the kbuild tree version as new entries are not
needed any more in this file) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox