Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v3 07/29] x86: bpf_jit, use ENTRY+ENDPROC
From: David Miller @ 2017-04-24 14:41 UTC (permalink / raw)
  To: jslaby
  Cc: alexei.starovoitov, mingo, tglx, hpa, x86, jpoimboe, linux-kernel,
	netdev, daniel, edumazet
In-Reply-To: <697947f4-0a2c-1480-0995-9919556dc020@suse.cz>

From: Jiri Slaby <jslaby@suse.cz>
Date: Mon, 24 Apr 2017 08:45:11 +0200

> On 04/21/2017, 09:32 PM, Alexei Starovoitov wrote:
>> On Fri, Apr 21, 2017 at 04:12:43PM +0200, Jiri Slaby wrote:
>>> Do not use a custom macro FUNC for starts of the global functions, use
>>> ENTRY instead.
>>>
>>> And while at it, annotate also ends of the functions by ENDPROC.
>>>
>>> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
>>> Cc: "David S. Miller" <davem@davemloft.net>
>>> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
>>> Cc: James Morris <jmorris@namei.org>
>>> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
>>> Cc: Patrick McHardy <kaber@trash.net>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Ingo Molnar <mingo@redhat.com>
>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>> Cc: x86@kernel.org
>>> Cc: netdev@vger.kernel.org
>>> ---
>>>  arch/x86/net/bpf_jit.S | 32 ++++++++++++++++++--------------
>>>  1 file changed, 18 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/arch/x86/net/bpf_jit.S b/arch/x86/net/bpf_jit.S
>>> index f2a7faf4706e..762c29fb8832 100644
>>> --- a/arch/x86/net/bpf_jit.S
>>> +++ b/arch/x86/net/bpf_jit.S
>>> @@ -23,16 +23,12 @@
>>>  	32 /* space for rbx,r13,r14,r15 */ + \
>>>  	8 /* space for skb_copy_bits */)
>>>  
>>> -#define FUNC(name) \
>>> -	.globl name; \
>>> -	.type name, @function; \
>>> -	name:
>>> -
>>> -FUNC(sk_load_word)
>>> +ENTRY(sk_load_word)
>>>  	test	%esi,%esi
>>>  	js	bpf_slow_path_word_neg
>>> +ENDPROC(sk_load_word)
>> 
>> this doens't look right.
>> It will add alignment nops in critical paths of these pseudo functions.
>> I'm also not sure whether it will still work afterwards.
>> Was it tested?
>> I'd prefer if this code kept as-is.
> 
> It cannot stay as-is simply because we want to know where the functions
> end to inject debuginfo properly. The code above does not warrant for
> any exception.

I totally and completely disagree.

> Executing a nop takes a little and having externally-callable functions
> aligned can actually help performance (no, I haven't measured nor tested
> the code). But sure, the tool is generic, so I can introduce a local
> macros to avoid alignments in the functions:

Not for this case, it's a bunch of entry points all packed together
intentionally so that SKB accesses of different access sizes (which is
almost always the case) from BPF programs use the smallest amount of
I-cache as possible.

^ permalink raw reply

* Re: [PATCH net-next 3/3] samples/bpf: check before defining offsetof
From: Daniel Borkmann @ 2017-04-24 14:41 UTC (permalink / raw)
  To: Alexander Alemayhu, netdev; +Cc: ast
In-Reply-To: <20170424133108.31595-4-alexander@alemayhu.com>

On 04/24/2017 03:31 PM, Alexander Alemayhu wrote:
> Fixes the following warning
>
> samples/bpf/test_lru_dist.c:28:0: warning: "offsetof" redefined
>   #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)
>
> In file included from ./tools/lib/bpf/bpf.h:25:0,
>                   from samples/bpf/libbpf.h:5,
>                   from samples/bpf/test_lru_dist.c:24:
> /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include/stddef.h:417:0: note: this is the location of the previous definition
>   #define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER)
>
> Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [PATCH net] bridge: shutdown bridge device before removing it
From: Xin Long @ 2017-04-24 14:41 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: network dev, David S. Miller, Stephen Hemminger,
	bridge@lists.linux-foundation.org
In-Reply-To: <6144408f-4cfd-dd87-7444-8c61269b105a@cumulusnetworks.com>

On Mon, Apr 24, 2017 at 8:07 PM, Nikolay Aleksandrov
<nikolay@cumulusnetworks.com> wrote:
> On 24/04/17 14:01, Nikolay Aleksandrov wrote:
>> On 24/04/17 10:25, Xin Long wrote:
>>> During removing a bridge device, if the bridge is still up, a new mdb entry
>>> still can be added in br_multicast_add_group() after all mdb entries are
>>> removed in br_multicast_dev_del(). Like the path:
>>>
>>>   mld_ifc_timer_expire ->
>>>     mld_sendpack -> ...
>>>       br_multicast_rcv ->
>>>         br_multicast_add_group
>>>
>>> The new mp's timer will be set up. If the timer expires after the bridge
>>> is freed, it may cause use-after-free panic in br_multicast_group_expired.
>>> This can happen when ip link remove a bridge or destroy a netns with a
>>> bridge device inside.
>>>
>>> As we can see in br_del_bridge, brctl is also supposed to remove a bridge
>>> device after it's shutdown.
>>>
>>> This patch is to call dev_close at the beginning of br_dev_delete so that
>>> netif_running check in br_multicast_add_group can avoid this issue. But
>>> to keep consistent with before, it will not remove the IFF_UP check in
>>> br_del_bridge for brctl.
>>>
>>> Reported-by: Jianwen Ji <jiji@redhat.com>
>>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>>> ---
>>>  net/bridge/br_if.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>
>> +CC bridge maintainers
>>
>> I can see how this could happen, could you also provide the traceback ?
>>
>> The patch looks good to me, actually I think it fixes another issue with
>> mcast stats where the percpu pointer can be accessed after it's freed if
>> an mcast packet can get sent via br->dev after the br_multicast_dev_del() call.
>> This is definitely stable material, if I'm not mistaken the issue is there since
>> the introduction of br_dev_delete:
>> commit e10177abf842
>> Author: Satish Ashok <sashok@cumulusnetworks.com>
>> Date:   Wed Jul 15 07:16:51 2015 -0700
>>
>>     bridge: multicast: fix handling of temp and perm entries
>>
>>
>>
>> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>
>
> Actually I have a better idea for a fix because dev_close() for a single device is rather heavy.
> Why don't you move the mdb flush logic in the bridge's ndo_uninit() callback ?
> That should have the same effect and be much faster.
Yes. But it seems that all cleanups for bridge should be done after
it's shutdown since beginning according to brctl. I'm not sure if there
are still other problems caused by this. maybe safer to use dev_close.
I need to check more to confirm this.

I also have another question about mp->timer removing.
As we can see, now it removes this timer with del_timer, instead of
del_timer_sync. What if the timer is running when del_timer ?
How can we be sure that br_multicast_group_expired will be done
before the bridge dev is freed. synchronize_net ?

>
> By the way I just noticed that there's also a memory leak - the mdb hash is reallocated
> and not freed due to the mdb rehash, here's also kmemleak's object:
>
yeps, ;-)

> unreferenced object 0xffff8800540ba800 (size 2048):
>   comm "softirq", pid 0, jiffies 4520588901 (age 5787.284s)
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<ffffffff816e2287>] kmemleak_alloc+0x67/0xc0
>     [<ffffffff81260bea>] __kmalloc+0x1ba/0x3e0
>     [<ffffffffa05c60ee>] br_mdb_rehash+0x5e/0x340 [bridge]
>     [<ffffffffa05c74af>] br_multicast_new_group+0x43f/0x6e0 [bridge]
>     [<ffffffffa05c7aa3>] br_multicast_add_group+0x203/0x260 [bridge]
>     [<ffffffffa05ca4b5>] br_multicast_rcv+0x945/0x11d0 [bridge]
>     [<ffffffffa05b6b10>] br_dev_xmit+0x180/0x470 [bridge]
>     [<ffffffff815c781b>] dev_hard_start_xmit+0xbb/0x3d0
>     [<ffffffff815c8743>] __dev_queue_xmit+0xb13/0xc10
>     [<ffffffff815c8850>] dev_queue_xmit+0x10/0x20
>     [<ffffffffa02f8d7a>] ip6_finish_output2+0x5ca/0xac0 [ipv6]
>     [<ffffffffa02fbfc6>] ip6_finish_output+0x126/0x2c0 [ipv6]
>     [<ffffffffa02fc245>] ip6_output+0xe5/0x390 [ipv6]
>     [<ffffffffa032b92c>] NF_HOOK.constprop.44+0x6c/0x240 [ipv6]
>     [<ffffffffa032bd16>] mld_sendpack+0x216/0x3e0 [ipv6]
>     [<ffffffffa032d5eb>] mld_ifc_timer_expire+0x18b/0x2b0 [ipv6]
>
>
>

^ permalink raw reply

* Re: [PATCH net-next 2/3] samples/bpf: add static to function with no prototype
From: Daniel Borkmann @ 2017-04-24 14:40 UTC (permalink / raw)
  To: Alexander Alemayhu, netdev; +Cc: ast
In-Reply-To: <20170424133108.31595-3-alexander@alemayhu.com>

On 04/24/2017 03:31 PM, Alexander Alemayhu wrote:
> Fixes the following warning
>
> samples/bpf/cookie_uid_helper_example.c: At top level:
> samples/bpf/cookie_uid_helper_example.c:276:6: warning: no previous prototype for ‘finish’ [-Wmissing-prototypes]
>   void finish(int ret)
>        ^~~~~~
>    HOSTLD  samples/bpf/per_socket_stats_example
>
> Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [PATCH net-next 1/3] samples/bpf: add -Wno-unknown-warning-option to clang
From: Daniel Borkmann @ 2017-04-24 14:37 UTC (permalink / raw)
  To: Alexander Alemayhu, netdev; +Cc: ast
In-Reply-To: <20170424133108.31595-2-alexander@alemayhu.com>

On 04/24/2017 03:31 PM, Alexander Alemayhu wrote:
> I was initially going to remove '-Wno-address-of-packed-member' because I
> thought it was not supposed to be there but Daniel suggested using
> '-Wno-unknown-warning-option'.
>
> This silences several warnings similiar to the one below
>
> warning: unknown warning option '-Wno-address-of-packed-member' [-Wunknown-warning-option]
> 1 warning generated.

Yeah, that feature seems fairly new (Feb 2017 accepted if I
see this correctly): https://reviews.llvm.org/D20561

Given the -Wno-address-of-packed-member was there to silence
warnings in the first place, we should also silence warnings
when -Wno-address-of-packed-member is unknown to clang/llvm.

[...]
>
> Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [PATCH 5/7] IB/hfi1: use pcie_flr instead of duplicating it
From: Christoph Hellwig @ 2017-04-24 14:35 UTC (permalink / raw)
  To: Byczkowski, Jakub
  Cc: Christoph Hellwig, Bjorn Helgaas, Cabiddu, Giovanni,
	Benedetto, Salvatore, Marciniszyn, Mike, Dalessandro, Dennis,
	Derek Chickles, Satanand Burla, Felix Manlunas, Raghu Vatsavayi,
	Kirsher, Jeffrey T, linux-pci@vger.kernel.org, qat-linux,
	linux-crypto@vger.kernel.org, linux-rdma@vger.kernel.org,
	"netdev@vger.kernel.or
In-Reply-To: <A33E225544E47D4DB5713032774B132783960382@irsmsx105.ger.corp.intel.com>

On Mon, Apr 24, 2017 at 02:16:31PM +0000, Byczkowski, Jakub wrote:
> Tested-by: Jakub Byczkowski <jakub.byczkowski@intel.com>

Are you (and Doug) ok with queueing this up in the PCI tree?

^ permalink raw reply

* Blogpost evaluation this [PATCH v4 net-next RFC] net: Generic XDP
From: Jesper Dangaard Brouer @ 2017-04-24 14:24 UTC (permalink / raw)
  To: David Miller, xdp-newbies; +Cc: netdev, brouer, Andy Gospodarek
In-Reply-To: <20170413.120925.2082322246776478766.davem@davemloft.net>


I've done a very detailed evaluation of this patch, and I've created a
blogpost like report here:

 https://prototype-kernel.readthedocs.io/en/latest/blogposts/xdp25_eval_generic_xdp_tx.html

I didn't evaluate the adjust_head part, so I hope Andy is still
planning to validate that part?

--Jesper



On Thu, 13 Apr 2017 12:09:25 -0400 (EDT)
David Miller <davem@davemloft.net> wrote:

> This provides a generic SKB based non-optimized XDP path which is used
> if either the driver lacks a specific XDP implementation, or the user
> requests it via a new IFLA_XDP_FLAGS value named XDP_FLAGS_SKB_MODE.
> 
> It is arguable that perhaps I should have required something like
> this as part of the initial XDP feature merge.
> 
> I believe this is critical for two reasons:
> 
> 1) Accessibility.  More people can play with XDP with less
>    dependencies.  Yes I know we have XDP support in virtio_net, but
>    that just creates another depedency for learning how to use this
>    facility.
> 
>    I wrote this to make life easier for the XDP newbies.
> 
> 2) As a model for what the expected semantics are.  If there is a pure
>    generic core implementation, it serves as a semantic example for
>    driver folks adding XDP support.
> 
> This is just a rough draft and is untested.



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH net-next v5 1/2] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Pablo Neira Ayuso @ 2017-04-24 14:20 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Simon Horman, Jiri Pirko, davem, xiyou.wangcong, eric.dumazet,
	netdev, Tom Herbert
In-Reply-To: <2ca4266f-a164-d2ad-37fa-45f7ae354eb8@mojatatu.com>

On Mon, Apr 24, 2017 at 08:49:00AM -0400, Jamal Hadi Salim wrote:
> On 17-04-24 05:14 AM, Simon Horman wrote:
> [..]
> 
> >Jamal, I am confused about why are you so concerned about the space
> >consumed by this attribute, it's per-message, right? Is it the bigger
> >picture you are worried about - a similar per-entry flag at some point in
> >the future?
> 
> 
> To me the two worries are one and the same.
> 
> Jiri strongly believes (from a big picture view) we must use
> TLVs for extensibility.
> While I agree with him in general i have strong reservations
> in this case because i can get both extensibility and
> build for performance with using a flag bitmask as the
> content of the TLV.
> 
> A TLV consumes 64 bits minimum. It doesnt matter if we decide
> to use a u8 or a u16, we are still sending 64 bits on that
> TLV with the rest being PADding. Not to be melodramatic, but
> the worst case scenario of putting everything in a TLV for 32
> flags is using about 30x more space than using a bitmask.
> 
> Yes, space is important and if i can express upto 32 flags
> with one TLV rather than 32 TLVs i choose one TLV.
> I am always looking for ways to filter out crap i dont need
> when i do stats collection. I have numerous wounds from fdb
> entries which decided to use a TLV per flag.
> 
> The design approach we have used in netlink is: flags start
> as a bitmap (whether they are on main headers or TLVs); they may be
> complemented with a bitmask/selector (refer to IFLINK messages).
> 
> Lets look at this specific patch I have sending. I have already
> changed it 3 times and involved a churn of 3 different flags.
> If you asked me in the beggining i wouldve scratched my head
> thinking for a near term use for bit #3, #4 etc,
> 
> I am fine with the counter-Postel view of having the kernel
> validate that appropriate bits are set as long as we dont make
> user space to now start learning how to play acrobatics.

jamal, what performance concern you have in building this error
message? TLVs is the most flexible way. And this is error path, so we
should build this message rarely, only if the user sends us something
incorrect, why bother...

^ permalink raw reply

* RE: [PATCH 5/7] IB/hfi1: use pcie_flr instead of duplicating it
From: Byczkowski, Jakub @ 2017-04-24 14:16 UTC (permalink / raw)
  To: Christoph Hellwig, Bjorn Helgaas, Cabiddu, Giovanni,
	Benedetto, Salvatore, Marciniszyn, Mike, Dalessandro, Dennis,
	Derek Chickles, Satanand Burla, Felix Manlunas, Raghu Vatsavayi,
	Kirsher, Jeffrey T
  Cc: linux-pci@vger.kernel.org, qat-linux,
	linux-crypto@vger.kernel.org, linux-rdma@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20170414191131.14286-6-hch@lst.de>

Tested-by: Jakub Byczkowski <jakub.byczkowski@intel.com>

-----Original Message-----
From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Christoph Hellwig
Sent: Friday, April 14, 2017 9:11 PM
To: Bjorn Helgaas <bhelgaas@google.com>; Cabiddu, Giovanni <giovanni.cabiddu@intel.com>; Benedetto, Salvatore <salvatore.benedetto@intel.com>; Marciniszyn, Mike <mike.marciniszyn@intel.com>; Dalessandro, Dennis <dennis.dalessandro@intel.com>; Derek Chickles <derek.chickles@caviumnetworks.com>; Satanand Burla <satananda.burla@caviumnetworks.com>; Felix Manlunas <felix.manlunas@caviumnetworks.com>; Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
Cc: linux-pci@vger.kernel.org; qat-linux <qat-linux@intel.com>; linux-crypto@vger.kernel.org; linux-rdma@vger.kernel.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: [PATCH 5/7] IB/hfi1: use pcie_flr instead of duplicating it

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/hw/hfi1/chip.c |  4 ++--  drivers/infiniband/hw/hfi1/hfi.h  |  1 -  drivers/infiniband/hw/hfi1/pcie.c | 30 ------------------------------
 3 files changed, 2 insertions(+), 33 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index 121a4c920f1b..d037f72e4d96 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -13610,14 +13610,14 @@ static void init_chip(struct hfi1_devdata *dd)
 		dd_dev_info(dd, "Resetting CSRs with FLR\n");
 
 		/* do the FLR, the DC reset will remain */
-		hfi1_pcie_flr(dd);
+		pcie_flr(dd->pcidev);
 
 		/* restore command and BARs */
 		restore_pci_variables(dd);
 
 		if (is_ax(dd)) {
 			dd_dev_info(dd, "Resetting CSRs with FLR\n");
-			hfi1_pcie_flr(dd);
+			pcie_flr(dd->pcidev);
 			restore_pci_variables(dd);
 		}
 	} else {
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h
index 0808e3c3ba39..40d7559fa723 100644
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -1764,7 +1764,6 @@ int hfi1_pcie_init(struct pci_dev *, const struct pci_device_id *);  void hfi1_pcie_cleanup(struct pci_dev *);  int hfi1_pcie_ddinit(struct hfi1_devdata *, struct pci_dev *);  void hfi1_pcie_ddcleanup(struct hfi1_devdata *); -void hfi1_pcie_flr(struct hfi1_devdata *);  int pcie_speeds(struct hfi1_devdata *);  void request_msix(struct hfi1_devdata *, u32 *, struct hfi1_msix_entry *);  void hfi1_enable_intx(struct pci_dev *); diff --git a/drivers/infiniband/hw/hfi1/pcie.c b/drivers/infiniband/hw/hfi1/pcie.c
index 0829fce06172..c81556e84831 100644
--- a/drivers/infiniband/hw/hfi1/pcie.c
+++ b/drivers/infiniband/hw/hfi1/pcie.c
@@ -240,36 +240,6 @@ void hfi1_pcie_ddcleanup(struct hfi1_devdata *dd)
 		iounmap(dd->piobase);
 }
 
-/*
- * Do a Function Level Reset (FLR) on the device.
- * Based on static function drivers/pci/pci.c:pcie_flr().
- */
-void hfi1_pcie_flr(struct hfi1_devdata *dd) -{
-	int i;
-	u16 status;
-
-	/* no need to check for the capability - we know the device has it */
-
-	/* wait for Transaction Pending bit to clear, at most a few ms */
-	for (i = 0; i < 4; i++) {
-		if (i)
-			msleep((1 << (i - 1)) * 100);
-
-		pcie_capability_read_word(dd->pcidev, PCI_EXP_DEVSTA, &status);
-		if (!(status & PCI_EXP_DEVSTA_TRPND))
-			goto clear;
-	}
-
-	dd_dev_err(dd, "Transaction Pending bit is not clearing, proceeding with reset anyway\n");
-
-clear:
-	pcie_capability_set_word(dd->pcidev, PCI_EXP_DEVCTL,
-				 PCI_EXP_DEVCTL_BCR_FLR);
-	/* PCIe spec requires the function to be back within 100ms */
-	msleep(100);
-}
-
 static void msix_setup(struct hfi1_devdata *dd, int pos, u32 *msixcnt,
 		       struct hfi1_msix_entry *hfi1_msix_entry)  {
--
2.11.0

^ permalink raw reply related

* [PATCH] net: bridge: suppress broadcast when multicast flood is disabled
From: Mike Manning @ 2017-04-24 14:09 UTC (permalink / raw)
  To: netdev; +Cc: Nikolay Aleksandrov

Flood suppression for packets that are not unicast needs to be handled
consistently by also not flooding broadcast packets. As broadcast is a
special case of multicast, the same kernel parameter should be used to
suppress flooding for both of these packet types.

Fixes: b6cb5ac8331b ("net: bridge: add per-port multicast flood flag")
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Mike Manning <mmanning@brocade.com>
---
 net/bridge/br_forward.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 902af6b..a61c7ad 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -183,13 +183,16 @@ void br_flood(struct net_bridge *br, struct sk_buff *skb,
 	struct net_bridge_port *p;
 
 	list_for_each_entry_rcu(p, &br->port_list, list) {
-		/* Do not flood unicast traffic to ports that turn it off */
-		if (pkt_type == BR_PKT_UNICAST && !(p->flags & BR_FLOOD))
-			continue;
-		/* Do not flood if mc off, except for traffic we originate */
-		if (pkt_type == BR_PKT_MULTICAST &&
-		    !(p->flags & BR_MCAST_FLOOD) && skb->dev != br->dev)
-			continue;
+		/* Do not flood unicast traffic to ports that turn it off, nor
+		 * other traffic if mc flood off except for traffic we originate
+		 */
+		if (pkt_type == BR_PKT_UNICAST) {
+			if (!(p->flags & BR_FLOOD))
+				continue;
+		} else {
+			if (!(p->flags & BR_MCAST_FLOOD) && skb->dev != br->dev)
+				continue;
+		}
 
 		/* Do not flood to ports that enable proxy ARP */
 		if (p->flags & BR_PROXYARP)
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH v2 net] net: ipv6: regenerate host route if moved to gc list
From: Andrey Konovalov @ 2017-04-24 14:03 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, Dmitry Vyukov, mmanning
In-Reply-To: <1492879237-31566-1-git-send-email-dsa@cumulusnetworks.com>

On Sat, Apr 22, 2017 at 6:40 PM, David Ahern <dsa@cumulusnetworks.com> wrote:
> Taking down the loopback device wreaks havoc on IPv6 routes. By
> extension, taking a VRF device wreaks havoc on its table.
>
> Dmitry and Andrey both reported heap out-of-bounds reports in the IPv6
> FIB code while running syzkaller fuzzer. The root cause is a dead dst
> that is on the garbage list gets reinserted into the IPv6 FIB. While on
> the gc (or perhaps when it gets added to the gc list) the dst->next is
> set to an IPv4 dst. A subsequent walk of the ipv6 tables causes the
> out-of-bounds access.
>
> Andrey's reproducer was the key to getting to the bottom of this.
>
> With IPv6, host routes for an address have the dst->dev set to the
> loopback device. When the 'lo' device is taken down, rt6_ifdown initiates
> a walk of the fib evicting routes with the 'lo' device which means all
> host routes are removed. That process moves the dst which is attached to
> an inet6_ifaddr to the gc list and marks it as dead.
>
> The recent change to keep global IPv6 addresses added a new function
> fixup_permanent_addr that is called on admin up. That function restarts
> dad for an inet6_ifaddr and when it completes the host route attached
> to it is inserted into the fib. Since the route was marked dead and
> moved to the gc list, we get the reported out-of-bounds accesses. If
> the device with the address is taken down or the address is removed, the
> WARN_ON in fib6_del is triggered.
>
> All of those faults are fixed by regenerating the host route of the
> existing one has been moved to the gc list, something that can be
> determined by checking if the rt6i_ref counter is 0.
>
> Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Reported-by: Andrey Konovalov <andreyknvl@google.com>
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ---
> v2
> - change ifp->rt under spinlock vs cmpxchg
> - add comment about rt6i_ref == 0
>
> Dmitry / Andrey: can you guys add this patch to your tree and run
> syzkaller tests? I'd like to confirm that all of the fib traces
> are fixed. Thanks.

Tried fuzzing with your patch, I don't see any fib6 related reports
any more, so it should be good. I'll let you know if I see some
possibly related crashes.

Thanks!

>
>  net/ipv6/addrconf.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 08f9e8ea7a81..97e86158bbcb 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -3303,14 +3303,24 @@ static void addrconf_gre_config(struct net_device *dev)
>  static int fixup_permanent_addr(struct inet6_dev *idev,
>                                 struct inet6_ifaddr *ifp)
>  {
> -       if (!ifp->rt) {
> -               struct rt6_info *rt;
> +       /* rt6i_ref == 0 means the host route was removed from the
> +        * FIB, for example, if 'lo' device is taken down. In that
> +        * case regenerate the host route.
> +        */
> +       if (!ifp->rt || !atomic_read(&ifp->rt->rt6i_ref)) {
> +               struct rt6_info *rt, *prev;
>
>                 rt = addrconf_dst_alloc(idev, &ifp->addr, false);
>                 if (unlikely(IS_ERR(rt)))
>                         return PTR_ERR(rt);
>
> +               spin_lock(&ifp->lock);
> +               prev = ifp->rt;
>                 ifp->rt = rt;
> +               spin_unlock(&ifp->lock);
> +
> +               if (prev)
> +                       ip6_rt_put(prev);
>         }
>
>         if (!(ifp->flags & IFA_F_NOPREFIXROUTE)) {
> --
> 2.1.4
>

^ permalink raw reply

* [PATCH net-next 3/3] samples/bpf: check before defining offsetof
From: Alexander Alemayhu @ 2017-04-24 13:31 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Alemayhu, daniel, ast
In-Reply-To: <20170424133108.31595-1-alexander@alemayhu.com>

Fixes the following warning

samples/bpf/test_lru_dist.c:28:0: warning: "offsetof" redefined
 #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)

In file included from ./tools/lib/bpf/bpf.h:25:0,
                 from samples/bpf/libbpf.h:5,
                 from samples/bpf/test_lru_dist.c:24:
/usr/lib/gcc/x86_64-redhat-linux/6.3.1/include/stddef.h:417:0: note: this is the location of the previous definition
 #define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER)

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
---
 samples/bpf/test_lru_dist.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/samples/bpf/test_lru_dist.c b/samples/bpf/test_lru_dist.c
index d96dc88d3b04..73c357142268 100644
--- a/samples/bpf/test_lru_dist.c
+++ b/samples/bpf/test_lru_dist.c
@@ -25,7 +25,9 @@
 #include "bpf_util.h"
 
 #define min(a, b) ((a) < (b) ? (a) : (b))
-#define offsetof(TYPE, MEMBER)	((size_t)&((TYPE *)0)->MEMBER)
+#ifndef offsetof
+# define offsetof(TYPE, MEMBER)	((size_t)&((TYPE *)0)->MEMBER)
+#endif
 #define container_of(ptr, type, member) ({			\
 	const typeof( ((type *)0)->member ) *__mptr = (ptr);	\
 	(type *)( (char *)__mptr - offsetof(type,member) );})
-- 
2.9.3

^ permalink raw reply related

* [PATCH net-next 2/3] samples/bpf: add static to function with no prototype
From: Alexander Alemayhu @ 2017-04-24 13:31 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Alemayhu, daniel, ast
In-Reply-To: <20170424133108.31595-1-alexander@alemayhu.com>

Fixes the following warning

samples/bpf/cookie_uid_helper_example.c: At top level:
samples/bpf/cookie_uid_helper_example.c:276:6: warning: no previous prototype for ‘finish’ [-Wmissing-prototypes]
 void finish(int ret)
      ^~~~~~
  HOSTLD  samples/bpf/per_socket_stats_example

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
---
 samples/bpf/cookie_uid_helper_example.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/samples/bpf/cookie_uid_helper_example.c b/samples/bpf/cookie_uid_helper_example.c
index ad5afedf2e70..9ce55840d61d 100644
--- a/samples/bpf/cookie_uid_helper_example.c
+++ b/samples/bpf/cookie_uid_helper_example.c
@@ -273,7 +273,7 @@ static int usage(void)
 	return 1;
 }
 
-void finish(int ret)
+static void finish(int ret)
 {
 	test_finish = true;
 }
-- 
2.9.3

^ permalink raw reply related

* [PATCH net-next 0/3] Misc BPF cleanup
From: Alexander Alemayhu @ 2017-04-24 13:31 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Alemayhu, daniel, ast

Hei,

while looking into making the Makefile in samples/bpf better handle O= I saw
several warnings when running `make clean && make samples/bpf/`. This series
reduces those warnings.

Thanks.

Alexander Alemayhu (3):
  samples/bpf: add -Wno-unknown-warning-option to clang
  samples/bpf: add static to function with no prototype
  samples/bpf: check before defining offsetof

 samples/bpf/Makefile                    | 1 +
 samples/bpf/cookie_uid_helper_example.c | 2 +-
 samples/bpf/test_lru_dist.c             | 4 +++-
 3 files changed, 5 insertions(+), 2 deletions(-)

-- 
2.9.3

^ permalink raw reply

* [PATCH net-next 1/3] samples/bpf: add -Wno-unknown-warning-option to clang
From: Alexander Alemayhu @ 2017-04-24 13:31 UTC (permalink / raw)
  To: netdev; +Cc: Alexander Alemayhu, daniel, ast
In-Reply-To: <20170424133108.31595-1-alexander@alemayhu.com>

I was initially going to remove '-Wno-address-of-packed-member' because I
thought it was not supposed to be there but Daniel suggested using
'-Wno-unknown-warning-option'. 

This silences several warnings similiar to the one below

warning: unknown warning option '-Wno-address-of-packed-member' [-Wunknown-warning-option]
1 warning generated.
clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include -I./arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated  -I./include
 -I./arch/x86/include/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h  \
        -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
        -Wno-compare-distinct-pointer-types \
        -Wno-gnu-variable-sized-type-not-at-end \
        -Wno-address-of-packed-member -Wno-tautological-compare \
        -O2 -emit-llvm -c samples/bpf/xdp_tx_iptunnel_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/xdp_tx_iptunnel_kern.o

$ clang --version

 clang version 3.9.1 (tags/RELEASE_391/final)
 Target: x86_64-unknown-linux-gnu
 Thread model: posix
 InstalledDir: /usr/bin

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
---
 samples/bpf/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index d42b495b0992..6c7468eb3684 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -189,4 +189,5 @@ $(obj)/%.o: $(src)/%.c
 		-Wno-compare-distinct-pointer-types \
 		-Wno-gnu-variable-sized-type-not-at-end \
 		-Wno-address-of-packed-member -Wno-tautological-compare \
+		-Wno-unknown-warning-option \
 		-O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH v4 net-next RFC] net: Generic XDP
From: Jesper Dangaard Brouer @ 2017-04-24 13:18 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: David Miller, alexei.starovoitov, michael.chan, netdev,
	xdp-newbies, brouer
In-Reply-To: <20170420163034.053ec42c@redhat.com>


On Thu, 20 Apr 2017 16:30:34 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> On Wed, 19 Apr 2017 10:29:03 -0400
> Andy Gospodarek <andy@greyhouse.net> wrote:
> 
> > I ran this on top of a card that uses the bnxt_en driver on a desktop
> > class system with an i7-6700 CPU @ 3.40GHz, sending a single stream of
> > UDP traffic with flow control disabled and saw the following (all stats
> > in Million PPS).
> > 
> >                 xdp1                xdp2            xdp_tx_tunnel
> > Generic XDP      7.8    5.5 (1.3 actual)         4.6 (1.1 actual)
> > Optimized XDP   11.7		     9.7                      4.6
> > 
> > One thing to note is that the Generic XDP case shows some different
> > results for reported by the application vs actual (seen on the wire).  I
> > did not debug where the drops are happening and what counter needs to be
> > incremented to note this -- I'll add that to my TODO list.  The
> > Optimized XDP case does not have a difference in reported vs actual
> > frames on the wire.  
> 
> The reported application vs actual (seen on the wire) number sound scary.
> How do you evaluate/measure "seen on the wire"?
> 
> Perhaps you could use ethtool -S stats to see if anything is fishy?
> I recommend using my tool[1] like:
> 
>  ~/git/network-testing/bin/ethtool_stats.pl --dev mlx5p2 --sec 2
> 
> [1] https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
> 
> I'm evaluating this patch on a mlx5 NIC, and something is not right...
> I'm seeing:
> 
>  Ethtool(mlx5p2) stat:     349599 (        349,599) <= tx_multicast_phy /sec
>  Ethtool(mlx5p2) stat:    4940185 (      4,940,185) <= tx_packets /sec
>  Ethtool(mlx5p2) stat:     349596 (        349,596) <= tx_packets_phy /sec
>  [...]
>  Ethtool(mlx5p2) stat:      36898 (         36,898) <= rx_cache_busy /sec
>  Ethtool(mlx5p2) stat:      36898 (         36,898) <= rx_cache_full /sec
>  Ethtool(mlx5p2) stat:    4903287 (      4,903,287) <= rx_cache_reuse /sec
>  Ethtool(mlx5p2) stat:    4940185 (      4,940,185) <= rx_csum_complete /sec
>  Ethtool(mlx5p2) stat:    4940185 (      4,940,185) <= rx_packets /sec
> 
> Something is wrong... when I tcpdump on the generator machine, I see
> garbled packets with IPv6 multicast addresses.
> 
> And it looks like I'm only sending 349,596 tx_packets_phy/sec on the "wire".
> 

Not seeing packets on the TX wire was caused by the NIC HW dropping the
packets, because the ethernet MAC-addr were not changed/swapped.

Fixed this XDP_TX bug in my test program xdp_bench01_mem_access_cost.
https://github.com/netoptimizer/prototype-kernel/commit/85f7ba2f0ea2

Even added a new option --swapmac for creating another test option for
modifying the packet.
https://github.com/netoptimizer/prototype-kernel/commit/fe080e6f3ccf

I will shortly publish a full report of testing this patch.
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH v2] brcmfmac: Make skb header writable before use
From: James Hughes @ 2017-04-24 13:03 UTC (permalink / raw)
  To: Arend van Spriel, Franky Lin, Hante Meuleman, Kalle Valo,
	linux-wireless, brcm80211-dev-list.pdl, netdev
  Cc: James Hughes

The driver was making changes to the skb_header without
ensuring it was writable (i.e. uncloned).
This patch also removes some boiler plate header size
checking/adjustment code as that is also handled by the
skb_cow_header function used to make header writable.

This patch depends on
brcmfmac: Ensure pointer correctly set if skb data location changes

Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
---
Changes in v2
  Makes the _cow_ call at the entry point of the skb in to the
  stack, means only needs to be done once, and error handling
  is easier.
  Split a separate minor bug fix off to a separate patch (which
  this patch depends on)



 .../net/wireless/broadcom/brcm80211/brcmfmac/core.c   | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
index 9b7c19a508ac..88f8675a94c2 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
@@ -211,22 +211,11 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
 		goto done;
 	}
 
-	/* Make sure there's enough room for any header */
-	if (skb_headroom(skb) < drvr->hdrlen) {
-		struct sk_buff *skb2;
-
-		brcmf_dbg(INFO, "%s: insufficient headroom\n",
-			  brcmf_ifname(ifp));
-		drvr->bus_if->tx_realloc++;
-		skb2 = skb_realloc_headroom(skb, drvr->hdrlen);
+	/* Make sure there's enough room for any header, and make it writable */
+	if (skb_cow_head(skb, drvr->hdrlen)) {
 		dev_kfree_skb(skb);
-		skb = skb2;
-		if (skb == NULL) {
-			brcmf_err("%s: skb_realloc_headroom failed\n",
-				  brcmf_ifname(ifp));
-			ret = -ENOMEM;
-			goto done;
-		}
+		ret = -ENOMEM;
+		goto done;
 	}
 
 	/* validate length for ether packet */
-- 
2.11.0

^ permalink raw reply related

* [PATCH net v1 1/2] tipc: fix socket flow control accounting error at tipc_send_stream
From: Parthasarathy Bhuvaragan @ 2017-04-24 13:00 UTC (permalink / raw)
  To: netdev; +Cc: tipc-discussion

Until now in tipc_send_stream(), we return -1 when the socket
encounters link congestion even if the socket had successfully
sent partial data. This is incorrect as the application resends
the same the partial data leading to data corruption at
receiver's end.

In this commit, we return the partially sent bytes as the return
value at link congestion.

Fixes: 10724cc7bb78 ("tipc: redesign connection-level flow control")
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 7130e73bd42c..b28e94f1c739 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1083,7 +1083,7 @@ static int __tipc_sendstream(struct socket *sock, struct msghdr *m, size_t dlen)
 		}
 	} while (sent < dlen && !rc);
 
-	return rc ? rc : sent;
+	return sent ? sent : rc;
 }
 
 /**
-- 
2.1.4

^ permalink raw reply related

* [PATCH net v1 2/2] tipc: fix socket flow control accounting error at tipc_recv_stream
From: Parthasarathy Bhuvaragan @ 2017-04-24 13:00 UTC (permalink / raw)
  To: netdev; +Cc: tipc-discussion
In-Reply-To: <1493038843-30621-1-git-send-email-parthasarathy.bhuvaragan@ericsson.com>

Until now in tipc_recv_stream(), we update the received
unacknowledged bytes based on a stack variable and not based on the
actual message size.
If the user buffer passed at tipc_recv_stream() is smaller than the
received skb, the size variable in stack differs from the actual
message size in the skb. This leads to a flow control accounting
error causing permanent congestion.

In this commit, we fix this accounting error by always using the
size of the incoming message.

Fixes: 10724cc7bb78 ("tipc: redesign connection-level flow control")
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index b28e94f1c739..566906795c8c 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1484,7 +1484,7 @@ static int tipc_recv_stream(struct socket *sock, struct msghdr *m,
 	if (unlikely(flags & MSG_PEEK))
 		goto exit;
 
-	tsk->rcv_unacked += tsk_inc(tsk, hlen + sz);
+	tsk->rcv_unacked += tsk_inc(tsk, hlen + msg_data_sz(msg));
 	if (unlikely(tsk->rcv_unacked >= (tsk->rcv_win / 4)))
 		tipc_sk_send_ack(tsk);
 	tsk_advance_rx_queue(sk);
-- 
2.1.4


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot

^ permalink raw reply related

* [PATCH v2] net/packet: initialize val in packet_getsockopt()
From: Alexander Potapenko @ 2017-04-24 12:59 UTC (permalink / raw)
  To: dvyukov, kcc, edumazet, davem, kuznet; +Cc: linux-kernel, netdev

In the case getsockopt() is called with PACKET_HDRLEN and optlen < 4
|val| remains uninitialized and the syscall may behave differently
depending on its value. This doesn't have security consequences (as the
uninit bytes aren't copied back), but it's still cleaner to initialize
|val| and ensure optlen is not less than sizeof(int).

This bug has been detected with KMSAN.

Signed-off-by: Alexander Potapenko <glider@google.com>
---
v2: - if len < sizeof(int), make it 0

KMSAN report below:

==================================================================
BUG: KMSAN: use of unitialized memory in packet_getsockopt+0xb9b/0xbe0
inter: 0
CPU: 0 PID: 1036 Comm: probe Tainted: G    B           4.11.0-rc5+ #2444
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16
 dump_stack+0x143/0x1b0 lib/dump_stack.c:52
 kmsan_report+0x16b/0x1e0 mm/kmsan/kmsan.c:1078
 __kmsan_warning_32+0x5c/0xa0 mm/kmsan/kmsan_instr.c:510
 packet_getsockopt+0xb9b/0xbe0 net/packet/af_packet.c:3839
 SYSC_getsockopt+0x495/0x540 net/socket.c:1829
 SyS_getsockopt+0xb0/0xd0 net/socket.c:1811
 entry_SYSCALL_64_fastpath+0x13/0x94 arch/x86/entry/entry_64.S:204
RIP: 0033:0x436d8a
RSP: 002b:00007ffce54e52c8 EFLAGS: 00000203 ORIG_RAX: 0000000000000037
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000436d8a
RDX: 000000000000000b RSI: 0000000000000107 RDI: 0000000000000003
RBP: 00007ffce54e52b0 R08: 00007ffce54e52d8 R09: 0000000000000004
R10: 00007ffce54e52d4 R11: 0000000000000203 R12: 00007ffce54e53c8
R13: 00007ffce54e53d8 R14: 0000000000000002 R15: 0000000000000000
origin description: ----val@packet_getsockopt (origin=00000000f6600052)
local variable created at:
 packet_getsockopt+0xcd/0xbe0 net/packet/af_packet.c:3789
 SYSC_getsockopt+0x495/0x540 net/socket.c:1829
==================================================================
---
 net/packet/af_packet.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8489beff5c25..dfc762df5da9 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3787,7 +3787,7 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 			     char __user *optval, int __user *optlen)
 {
 	int len;
-	int val, lv = sizeof(val);
+	int val = 0, lv = sizeof(val);
 	struct sock *sk = sock->sk;
 	struct packet_sock *po = pkt_sk(sk);
 	void *data = &val;
@@ -3836,6 +3836,8 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 	case PACKET_HDRLEN:
 		if (len > sizeof(int))
 			len = sizeof(int);
+		if (len < sizeof(int))
+			len = 0;
 		if (copy_from_user(&val, optval, len))
 			return -EFAULT;
 		switch (val) {
-- 
2.12.2.816.g2cccc81164-goog

^ permalink raw reply related

* Re: [PATCH v4 13/18] arm64: allwinner: sun50i-a64: add dwmac-sun8i Ethernet driver
From: Chen-Yu Tsai @ 2017-04-24 12:58 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: Maxime Ripard, Rob Herring, Mark Rutland, Chen-Yu Tsai,
	Russell King, Catalin Marinas, Will Deacon, Giuseppe Cavallaro,
	alexandre.torgue-qxv4g6HH51o, linux-sunxi, devicetree,
	linux-kernel, netdev, linux-arm-kernel
In-Reply-To: <20170424122411.GA9349@Red>

On Mon, Apr 24, 2017 at 8:24 PM, Corentin Labbe
<clabbe.montjoie-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Wed, Apr 12, 2017 at 02:41:53PM +0200, Maxime Ripard wrote:
>> On Wed, Apr 12, 2017 at 01:13:55PM +0200, Corentin Labbe wrote:
>> > The dwmac-sun8i is an Ethernet MAC that supports 10/100/1000 Mbit
>> > connections. It is very similar to the device found in the Allwinner
>> > H3, but lacks the internal 100 Mbit PHY and its associated control
>> > bits.
>> > This adds the necessary bits to the Allwinner A64 SoC .dtsi, but keeps
>> > it disabled at this level.
>> >
>> > Signed-off-by: Corentin Labbe <clabbe.montjoie-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> > ---
>> >  arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi | 37 +++++++++++++++++++++++++++
>> >  1 file changed, 37 insertions(+)
>> >
>> > diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
>> > index 0b0f4ab..2569827 100644
>> > --- a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
>> > +++ b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
>> > @@ -287,6 +287,23 @@
>> >                             bias-pull-up;
>> >                     };
>> >
>> > +                   rmii_pins: rmii_pins {
>> > +                           pins = "PD10", "PD11", "PD13", "PD14",
>> > +                                           "PD17", "PD18", "PD19", "PD20",
>> > +                                           "PD22", "PD23";
>>
>> Please align the wrapped lines on the first pin.
>>
>
> OK
>
>> > +                           function = "emac";
>> > +                           drive-strength = <40>;
>>
>> Do you actually need that for all the boards, or only a few of them?
>
> I have tried to use lower value without success on some boards. (opipc/pine64 in my memory)

FYI we need them for all the boards that use RGMII.
The signals at gigabit speed run at 125 MHz DDR.

For RMII we probably don't need it. Even at 100 Mbps,
it's only 50 MHz SDR. drive-strength = <30> should be
enough.

ChenYu

^ permalink raw reply

* Re: [PATCH net-next v4 1/2] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jamal Hadi Salim @ 2017-04-24 12:54 UTC (permalink / raw)
  To: Simon Horman; +Cc: David Miller, eric.dumazet, jiri, netdev, xiyou.wangcong
In-Reply-To: <20170424092754.GB25218@vergenet.net>

On 17-04-24 05:27 AM, Simon Horman wrote:
> On Fri, Apr 21, 2017 at 02:11:00PM -0400, Jamal Hadi Salim wrote:
>> On 17-04-21 12:12 PM, David Miller wrote:

>
> From my PoV, for #d user-space has to either get smart or fail.
>
> Creating new tc involved work in order to support the new feature.
> Part of that work would be a decision weather or not to provide
> compatibility for old kernel or to bail out gracefully.
>

But not that much work as is being ascribed now.
This is a big change to user space code. Are we planning to
change all netlink apps (everything in iproute2) to now discover
by testing whether their flags are accepted or not? One extra
crossing to the kernel just to test the waters.

I think there's a middle ground and see which idea takes off.
Refer to the last patch I sent which implements the idea below.

cheers,
jamal

>> There is a simpler approach that would work going forward.
>> How about letting the user choose their fate? Set something maybe
>> in the netlink header to tell the kernel "if you dont understand
>> something I am asking for - please ignore it and do what you can".
>> This would maintain current behavior but would force the user to
>> explicitly state so.
>>
>> cheers,
>> jamal
>>

^ permalink raw reply

* Re: [PATCH net-next v5 1/2] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jamal Hadi Salim @ 2017-04-24 12:49 UTC (permalink / raw)
  To: Simon Horman, Jiri Pirko
  Cc: davem, xiyou.wangcong, eric.dumazet, netdev, Tom Herbert,
	Pablo Neira Ayuso
In-Reply-To: <20170424091455.GA25218@vergenet.net>

On 17-04-24 05:14 AM, Simon Horman wrote:
[..]

> Jamal, I am confused about why are you so concerned about the space
> consumed by this attribute, it's per-message, right? Is it the bigger
> picture you are worried about - a similar per-entry flag at some point in
> the future?


To me the two worries are one and the same.

Jiri strongly believes (from a big picture view) we must use
TLVs for extensibility.
While I agree with him in general i have strong reservations
in this case because i can get both extensibility and
build for performance with using a flag bitmask as the
content of the TLV.

A TLV consumes 64 bits minimum. It doesnt matter if we decide
to use a u8 or a u16, we are still sending 64 bits on that
TLV with the rest being PADding. Not to be melodramatic, but
the worst case scenario of putting everything in a TLV for 32
flags is using about 30x more space than using a bitmask.

Yes, space is important and if i can express upto 32 flags
with one TLV rather than 32 TLVs i choose one TLV.
I am always looking for ways to filter out crap i dont need
when i do stats collection. I have numerous wounds from fdb
entries which decided to use a TLV per flag.

The design approach we have used in netlink is: flags start
as a bitmap (whether they are on main headers or TLVs); they may be
complemented with a bitmask/selector (refer to IFLINK messages).

Lets look at this specific patch I have sending. I have already
changed it 3 times and involved a churn of 3 different flags.
If you asked me in the beggining i wouldve scratched my head
thinking for a near term use for bit #3, #4 etc,

I am fine with the counter-Postel view of having the kernel
validate that appropriate bits are set as long as we dont make
user space to now start learning how to play acrobatics.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH v4 13/18] arm64: allwinner: sun50i-a64: add dwmac-sun8i Ethernet driver
From: Corentin Labbe @ 2017-04-24 12:24 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: robh+dt-DgEjT+Ai2ygdnm+yROfE0A, mark.rutland-5wv7dgnIgG8,
	wens-jdAy2FN1RRM, linux-I+IVW8TIWO2tmTQ+vhA3Yw,
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	peppe.cavallaro-qxv4g6HH51o, alexandre.torgue-qxv4g6HH51o,
	linux-sunxi-/JYPxA39Uh5TLH3MbocFFw,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <20170412124153.q6zvdvqkroizaxgb@lukather>

On Wed, Apr 12, 2017 at 02:41:53PM +0200, Maxime Ripard wrote:
> On Wed, Apr 12, 2017 at 01:13:55PM +0200, Corentin Labbe wrote:
> > The dwmac-sun8i is an Ethernet MAC that supports 10/100/1000 Mbit
> > connections. It is very similar to the device found in the Allwinner
> > H3, but lacks the internal 100 Mbit PHY and its associated control
> > bits.
> > This adds the necessary bits to the Allwinner A64 SoC .dtsi, but keeps
> > it disabled at this level.
> > 
> > Signed-off-by: Corentin Labbe <clabbe.montjoie-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> > ---
> >  arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi | 37 +++++++++++++++++++++++++++
> >  1 file changed, 37 insertions(+)
> > 
> > diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> > index 0b0f4ab..2569827 100644
> > --- a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> > +++ b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> > @@ -287,6 +287,23 @@
> >  				bias-pull-up;
> >  			};
> >  
> > +			rmii_pins: rmii_pins {
> > +				pins = "PD10", "PD11", "PD13", "PD14",
> > +						"PD17", "PD18", "PD19", "PD20",
> > +						"PD22", "PD23";
> 
> Please align the wrapped lines on the first pin.
> 

OK

> > +				function = "emac";
> > +				drive-strength = <40>;
> 
> Do you actually need that for all the boards, or only a few of them?

I have tried to use lower value without success on some boards. (opipc/pine64 in my memory)

Regards
Corentin Labbe

^ permalink raw reply

* [PATCH net] ipv6: move stub initialization after ipv6 setup completion
From: Paolo Abeni @ 2017-04-24 12:18 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Cong Wang, Hannes Frederic Sowa

The ipv6 stub pointer is currently initialized before the ipv6
routing subsystem: a 3rd party can access and use such stub
before the routing data is ready.
Moreover, such pointer is not cleared in case of initialization
error, possibly leading to dangling pointers usage.

This change addresses the above moving the stub initialization
at the end of ipv6 init code.

Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv6/af_inet6.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index a9a9553..e82e59f 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -933,8 +933,6 @@ static int __init inet6_init(void)
 	if (err)
 		goto igmp_fail;
 
-	ipv6_stub = &ipv6_stub_impl;
-
 	err = ipv6_netfilter_init();
 	if (err)
 		goto netfilter_fail;
@@ -1010,6 +1008,10 @@ static int __init inet6_init(void)
 	if (err)
 		goto sysctl_fail;
 #endif
+
+	/* ensure that ipv6 stubs are visible only after ipv6 is ready */
+	wmb();
+	ipv6_stub = &ipv6_stub_impl;
 out:
 	return err;
 
-- 
2.9.3

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox