Netdev List
* Re: net: deadlock on genl_mutex
From: Eric Dumazet @ 2016-11-29  6:06 UTC (permalink / raw)
  To: subashab
  Cc: Eric Dumazet, Dmitry Vyukov, David Miller, Matti Vaittinen,
	Tycho Andersen, Cong Wang, Florian Westphal, stephen hemminger,
	Tom Herbert, netdev, LKML, Richard Guy Briggs, syzkaller,
	netdev-owner
In-Reply-To: <0227d7e83cc5ac0a192d1ba0fee61413@codeaurora.org>

On Mon, 2016-11-28 at 22:59 -0700, subashab@codeaurora.org wrote:
> > 
> > Issue was reported yesterday and is under investigation.
> > 
> > 
> > http://marc.info/?l=linux-netdev&m=148014004331663&w=2
> > 
> > 
> > Thanks !
> 
> Hi Dmitry
> 
> Can you try the patch below with your reproducer? I haven't seen similar 
> crashes reported after this (or even with Eric's patch).
> 
> https://patchwork.ozlabs.org/patch/699937/

Yeah, I will post my patch on top of this one.


* [PATCH net-next] samples/bpf: fix include path
From: Alexei Starovoitov @ 2016-11-29  6:07 UTC (permalink / raw)
  To: David S . Miller; +Cc: Daniel Borkmann, netdev

Fix the following build error:
HOSTCC  samples/bpf/test_lru_dist.o
../samples/bpf/test_lru_dist.c:25:22: fatal error: bpf_util.h: No such file or directory

This is due to objtree != srctree.
Use srctree, since that's where bpf_util.h is located.

Fixes: e00c7b216f34 ("bpf: fix multiple issues in selftest suite and samples")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 samples/bpf/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 22b6407efa4f..3ceb5a9d86df 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -91,7 +91,7 @@ always += trace_event_kern.o
 always += sampleip_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
-HOSTCFLAGS += -I$(objtree)/tools/testing/selftests/bpf/
+HOSTCFLAGS += -I$(srctree)/tools/testing/selftests/bpf/
 
 HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
 HOSTLOADLIBES_fds_example += -lelf
-- 
2.8.0


* Re: [RFC 02/10] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) Bus driver
From: Vishwanathapura, Niranjana @ 2016-11-29  6:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: ira.weiny, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Dennis Dalessandro
In-Reply-To: <20161124161545.GA20818-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Thu, Nov 24, 2016 at 09:15:45AM -0700, Jason Gunthorpe wrote:
>> And will move the hfi_vnic module under
>> ‘drivers/infiniband/ulp/hfi_vnic’.
>
>I would prefer drivers/net/ethernet
>
>This is clearly not a ULP since it doesn't use verbs.
>

I understand it is not using verbs, but the control path (ib_device client) is
using verbs (IB MAD).
Our preference is to keep it somewhere under drivers/infiniband. Summarizing
the reasons again here:

- The VNIC control driver (ib_device client) is an IB MAD agent.
- It is purely a software construct: it encapsulates Ethernet packets in
  Omni-Path packets and depends on the hfi1 driver for HW access.

Doug,
Any comments?

>Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [RFC 02/10] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) Bus driver
From: Vishwanathapura, Niranjana @ 2016-11-29  6:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: ira.weiny, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Dennis Dalessandro
In-Reply-To: <20161125190509.GB16504-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Fri, Nov 25, 2016 at 12:05:09PM -0700, Jason Gunthorpe wrote:
>On Thu, Nov 24, 2016 at 06:13:50PM -0800, Vishwanathapura, Niranjana wrote:
>
>> In order to be truely device independent the hfi_vnic ULP should not depend
>> on a device exported symbol. Instead device should register its functions
>> with the ULP. Hence the approaches a) and b).
>
>It is not device independent, it is hard linked to hfi1, just like our
>other multi-component drivers.. So don't worry about that.
>

We would like to keep the design clean and avoid any tight coupling here (our 
original design in this series tackled these).
Any strong reason not to go with a) or b) ?
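
As a rough illustration of the "device registers its functions with the ULP" idea quoted above, here is a minimal userspace sketch. It is not approach a) or b) from the series, and every name in it (vnic_hw_ops, vnic_register_hw, demo_hw_xmit, ...) is hypothetical; it only shows the direction of the dependency.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback table that a device driver hands to the ULP. */
struct vnic_hw_ops {
	int (*xmit)(void *hw_ctx, const void *buf, size_t len);
};

static const struct vnic_hw_ops *registered_ops;
static void *registered_ctx;

/* Called by the device driver; the ULP never references a device symbol. */
int vnic_register_hw(const struct vnic_hw_ops *ops, void *hw_ctx)
{
	if (!ops || !ops->xmit)
		return -EINVAL;
	if (registered_ops)
		return -EBUSY;
	registered_ops = ops;
	registered_ctx = hw_ctx;
	return 0;
}

void vnic_unregister_hw(void)
{
	registered_ops = NULL;
	registered_ctx = NULL;
}

/* ULP transmit path: dispatch through whatever device has registered. */
int vnic_xmit(const void *buf, size_t len)
{
	if (!registered_ops)
		return -ENODEV;
	return registered_ops->xmit(registered_ctx, buf, len);
}

/* Stand-in for the device side: counts the bytes it was asked to send. */
static int demo_hw_xmit(void *hw_ctx, const void *buf, size_t len)
{
	(void)buf;
	*(size_t *)hw_ctx += len;
	return 0;
}

const struct vnic_hw_ops demo_hw_ops = { .xmit = demo_hw_xmit };
```

With this shape the ULP builds and loads without the device module; the cost is one indirect call per operation and the need to handle the "nothing registered" case.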

Niranjana

>Jason


* Re: linux-next: manual merge of the net-next tree with the net tree
From: Daniel Borkmann @ 2016-11-29  6:32 UTC (permalink / raw)
  To: Stephen Rothwell, David Miller, Networking; +Cc: linux-next, linux-kernel
In-Reply-To: <20161129113126.2626e7fe@canb.auug.org.au>

On 11/29/2016 01:31 AM, Stephen Rothwell wrote:
> Hi all,
>
> Today's linux-next merge of the net-next tree got a conflict in:
>
>    net/sched/cls_flower.c
>
> between commit:
>
>    d936377414fa ("net, sched: respect rcu grace period on cls destruction")
>
> from the net tree and commit:
>
>    13fa876ebd03 ("net/sched: cls_flower: merge filter delete/destroy common code")
>
> from the net-next tree.
>
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

Looks good to me, thanks!


* Re: [PATCH net-next] samples/bpf: fix include path
From: Daniel Borkmann @ 2016-11-29  6:34 UTC (permalink / raw)
  To: Alexei Starovoitov, David S . Miller; +Cc: netdev
In-Reply-To: <1480399642-2475887-1-git-send-email-ast@fb.com>

On 11/29/2016 07:07 AM, Alexei Starovoitov wrote:
> Fix the following build error:
> HOSTCC  samples/bpf/test_lru_dist.o
> ../samples/bpf/test_lru_dist.c:25:22: fatal error: bpf_util.h: No such file or directory
>
> This is due to objtree != srctree.
> Use srctree, since that's where bpf_util.h is located.
>
> Fixes: e00c7b216f34 ("bpf: fix multiple issues in selftest suite and samples")
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Didn't run into this so far, thanks for fixing!

Acked-by: Daniel Borkmann <daniel@iogearbox.net>


* [PATCH] ipv6:ip6_xmit remove unnecessary np NULL check
From: Manjeet Pawar @ 2016-11-29  6:32 UTC (permalink / raw)
  To: davem, kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
  Cc: pankaj.m, ajeet.y, Rohit Thapliyal, Manjeet Pawar

From: Rohit Thapliyal <r.thapliyal@samsung.com>

The np NULL check doesn't seem to be required here, as np is never
expected to be NULL when obtained via inet6_sk(sk).

Signed-off-by: Rohit Thapliyal <r.thapliyal@samsung.com>
Signed-off-by: Manjeet Pawar <manjeet.p@samsung.com>
Signed-off-by: David Miller <davem@davemloft.net>
Reviewed-by: Akhilesh Kumar <akhilesh.k@samsung.com>

---
v2->v3: Modified as per the suggestion from David Miller
        ip6_xmit calls are made without checking NULL np
        pointer, so no need to explicitly check NULL np in
        ip6_xmit.

 include/linux/ipv6.h  | 2 +-
 net/ipv6/ip6_output.c | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index a064997..6c9c604 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -299,7 +299,7 @@ struct tcp6_timewait_sock {
 
 static inline struct ipv6_pinfo *inet6_sk(const struct sock *__sk)
 {
-	return sk_fullsock(__sk) ? inet_sk(__sk)->pinet6 : NULL;
+	return inet_sk(__sk)->pinet6;
 }
 
 static inline struct raw6_sock *raw6_sk(const struct sock *sk)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 59eb4ed..f8c63ec 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -213,8 +213,7 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
 	/*
 	 *	Fill in the IPv6 header
 	 */
-	if (np)
-		hlimit = np->hop_limit;
+	hlimit = np->hop_limit;
 	if (hlimit < 0)
 		hlimit = ip6_dst_hoplimit(dst);
 
-- 
1.9.1


* bpf debug info
From: Alexei Starovoitov @ 2016-11-29  6:42 UTC (permalink / raw)
  To: netdev
  Cc: Daniel Borkmann, Brenden Blanco, Thomas Graf, Wangnan, He Kuang,
	kernel-team

Hi All,

The support for debug information in BPF was recently added to llvm.

In order to use it recompile bpf programs with the following patch
in samples/bpf/Makefile
@@ -155,4 +155,4 @@ $(obj)/%.o: $(src)/%.c
        $(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
                -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
                -Wno-compare-distinct-pointer-types \
-               -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
+               -O2 -emit-llvm -g -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@

and compiled .o files can be consumed by standard llvm-objdump utility.

$ llvm-objdump -S -no-show-raw-insn samples/bpf/xdp1_kern.o
xdp1_kern.o:    file format ELF64-BPF

Disassembly of section xdp1:
xdp_prog1:
; {
       0:       r2 = *(u32 *)(r1 + 4)
; void *data = (void *)(long)ctx->data;
       8:       r1 = *(u32 *)(r1 + 0)
; if (data + nh_off > data_end)
      10:       r3 = r1
      18:       r3 += 14
      20:       if r3 > r2 goto 55
; h_proto = eth->h_proto;
      28:       r3 = *(u8 *)(r1 + 12)
      30:       r4 = *(u8 *)(r1 + 13)
      38:       r4 <<= 8
      40:       r4 |= r3
; if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) {
      48:       if r4 == 43144 goto 2
      50:       r3 = 14
      58:       if r4 != 129 goto 5

LBB0_3:
; if (data + nh_off > data_end)
      60:       r3 = r1
      68:       r3 += 18
      70:       if r3 > r2 goto 45
      78:       r3 = 18
; h_proto = vhdr->h_vlan_encapsulated_proto;
      80:       r4 = *(u16 *)(r1 + 16)

LBB0_5:
      88:       r5 = r4
      90:       r5 &= 65535
; if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) {
      98:       if r5 == 43144 goto 1
      a0:       if r5 != 129 goto 9

Notice that 'clang -S -o a.s' output and llvm-objdump disassembler
were changed to use kernel verifier style, so now it should be easier
to see what's going on.
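
For readers wondering about the constants 129 and 43144 in the listing: they are what htons(ETH_P_8021Q) and htons(ETH_P_8021AD) evaluate to on a little-endian host. A small sketch of that arithmetic follows; swap16 is an illustrative helper, not something from the samples.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value, which is what htons() does on a
 * little-endian host such as x86. */
uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* ETH_P_8021Q  = 0x8100 -> swap16() gives 0x0081 = 129
 * ETH_P_8021AD = 0x88A8 -> swap16() gives 0xA888 = 43144
 */
```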

The main advantage of debug info is that verifier error messages
are now easier to correlate to original C code.

For example, say, in samples/bpf/parse_varlen.c I forgot
to compare pointer into packet with data_end:
--- a/samples/bpf/parse_varlen.c
+++ b/samples/bpf/parse_varlen.c
@@ -33,8 +33,8 @@ static int udp(void *data, uint64_t tp_off, void *data_end)
 {
        struct udphdr *udp = data + tp_off;

-       if (udp + 1 > data_end)
-               return 0;
+//     if (udp + 1 > data_end)
+//             return 0;
        if (udp->dest == htons(DEFAULT_PKTGEN_UDP_PORT) ||
            udp->source == htons(DEFAULT_PKTGEN_UDP_PORT)) {

If I try to run samples/bpf/test_cls_bpf.sh the verifier will complain:
R0=imm0,min_value=0,max_value=0 R1=pkt(id=0,off=0,r=42) R2=pkt_end
112: (0f) r4 += r3
113: (0f) r1 += r4
114: (b7) r0 = 2
115: (69) r2 = *(u16 *)(r1 +2)
invalid access to packet, off=2 size=2, R1(id=3,off=0,r=0)

Now multiply 115 * 8 and convert to hex. This is address 0x398 in llvm-objdump:
; struct udphdr *udp = data + tp_off;
     388:       r1 += r4
     390:       r0 = 2
; if (udp->dest == htons(DEFAULT_PKTGEN_UDP_PORT) ||
     398:       r2 = *(u16 *)(r1 + 2)
     3a0:       if r2 == 2304 goto 16

Now it's clear which line of C code is causing the verifier to reject.
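
The index-to-address step above is just a multiplication by the 8-byte size of a BPF instruction. As a sketch (the macro and function names are made up for illustration):

```c
#include <stdint.h>

/* Each BPF instruction slot is 8 bytes, i.e. sizeof(struct bpf_insn). */
#define BPF_INSN_SIZE 8u

/* Map a verifier instruction index to the address llvm-objdump prints. */
uint32_t bpf_insn_to_objdump_addr(uint32_t insn_idx)
{
	return insn_idx * BPF_INSN_SIZE;
}
```

So instruction 115 from the verifier log lands at 115 * 8 = 920 = 0x398, and 113 and 114 land at 0x388 and 0x390, matching the listing.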

It's still not obvious why register R1 is an 'invalid pointer to packet'.
The 'r=0' part of 'R1(id=3,off=0,r=0)' means that zero bytes were
verified to be valid through this register.
Since 'if (udp + 1 > data_end)' was not done, the kernel doesn't
know that there are valid bytes in the packet after the 'udp' pointer.
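
To make the missing check concrete, here is a plain userspace sketch of the same comparison. struct udp_hdr is a stand-in for struct udphdr, the function name is hypothetical, and the pointer math is done on char pointers so the sketch stays well-defined outside the kernel.

```c
#include <stdint.h>

/* Minimal stand-in for struct udphdr; fields are in network byte order. */
struct udp_hdr {
	uint16_t source;
	uint16_t dest;
	uint16_t len;
	uint16_t check;
};

/* Returns 1 if the whole UDP header at data + tp_off ends at or before
 * data_end, else 0. This mirrors 'if (udp + 1 > data_end) return 0;'. */
int udp_header_in_bounds(const void *data, uint64_t tp_off,
			 const void *data_end)
{
	const char *hdr_end =
		(const char *)data + tp_off + sizeof(struct udp_hdr);

	return hdr_end <= (const char *)data_end;
}
```

Only after a comparison of this kind against data_end does the verifier record a non-zero valid range for the pointer, which is what allows the later load of udp->dest.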

So the next step is to improve verifier messages to be more human friendly.
The step after that is to introduce a BPF_COMMENT pseudo instruction
that will be ignored by the interpreter yet will contain the text
of the original source code. Then the llvm-objdump step won't be necessary.
The bpf loader will load both instructions and pieces of C sources.
Then verifier errors should be even easier to read and humans
can easily understand the purpose of the program.

PS
A year ago He Kuang reported that dwarf emitted by bpf llvm backend is broken.
Sorry it took so long to fix. It's probably still broken on big endian,
since I've only tested on x86.


* [patch net v2] net: fec: cache statistics while device is down
From: Nikita Yushchenko @ 2016-11-29  6:44 UTC (permalink / raw)
  To: David S. Miller, Fugang Duan, Troy Kisky, Andrew Lunn,
	Eric Nelson, Philippe Reynes, Johannes Berg, netdev
  Cc: Chris Healy, Fabio Estevam, linux-kernel, Nikita Yushchenko

Executing 'ethtool -S' on a fec device that is down causes an OOPS on a
Vybrid board:

Unhandled fault: external abort on non-linefetch (0x1008) at 0xe0898200
pgd = ddecc000
[e0898200] *pgd=9e406811, *pte=400d1653, *ppte=400d1453
Internal error: : 1008 [#1] SMP ARM
...

The OOPS happens because fec_enet_get_ethtool_stats() accesses fec
registers while the IPG clock is stopped by PM.

Fix that by caching statistics in fec_enet_private. The cache is
initialized at device probe time, updated at statistics request time if
the device is up, and updated once more just before the device is turned
off on the down path.

Additional locking is not needed, since the cached statistics are accessed
either before the device is registered or under rtnl_lock().

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
---
Changes since v1:
- initialize cache at device probe time

 drivers/net/ethernet/freescale/fec.h      |  2 ++
 drivers/net/ethernet/freescale/fec_main.c | 23 +++++++++++++++++++----
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
index c865135f3cb9..5ea740b4cf14 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -574,6 +574,8 @@ struct fec_enet_private {
 	unsigned int reload_period;
 	int pps_enable;
 	unsigned int next_counter;
+
+	u64 ethtool_stats[0];
 };
 
 void fec_ptp_init(struct platform_device *pdev);
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 5aa9d4ded214..6a20c24a2003 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2313,14 +2313,24 @@ static const struct fec_stat {
 	{ "IEEE_rx_octets_ok", IEEE_R_OCTETS_OK },
 };
 
-static void fec_enet_get_ethtool_stats(struct net_device *dev,
-	struct ethtool_stats *stats, u64 *data)
+static void fec_enet_update_ethtool_stats(struct net_device *dev)
 {
 	struct fec_enet_private *fep = netdev_priv(dev);
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(fec_stats); i++)
-		data[i] = readl(fep->hwp + fec_stats[i].offset);
+		fep->ethtool_stats[i] = readl(fep->hwp + fec_stats[i].offset);
+}
+
+static void fec_enet_get_ethtool_stats(struct net_device *dev,
+				       struct ethtool_stats *stats, u64 *data)
+{
+	struct fec_enet_private *fep = netdev_priv(dev);
+
+	if (netif_running(dev))
+		fec_enet_update_ethtool_stats(dev);
+
+	memcpy(data, fep->ethtool_stats, ARRAY_SIZE(fec_stats) * sizeof(u64));
 }
 
 static void fec_enet_get_strings(struct net_device *netdev,
@@ -2874,6 +2884,8 @@ fec_enet_close(struct net_device *ndev)
 	if (fep->quirks & FEC_QUIRK_ERR006687)
 		imx6q_cpuidle_fec_irqs_unused();
 
+	fec_enet_update_ethtool_stats(ndev);
+
 	fec_enet_clk_enable(ndev, false);
 	pinctrl_pm_select_sleep_state(&fep->pdev->dev);
 	pm_runtime_mark_last_busy(&fep->pdev->dev);
@@ -3180,6 +3192,8 @@ static int fec_enet_init(struct net_device *ndev)
 
 	fec_restart(ndev);
 
+	fec_enet_update_ethtool_stats(ndev);
+
 	return 0;
 }
 
@@ -3278,7 +3292,8 @@ fec_probe(struct platform_device *pdev)
 	fec_enet_get_queue_num(pdev, &num_tx_qs, &num_rx_qs);
 
 	/* Init network device */
-	ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private),
+	ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private) +
+				  ARRAY_SIZE(fec_stats) * sizeof(u64),
 				  num_tx_qs, num_rx_qs);
 	if (!ndev)
 		return -ENOMEM;
-- 
2.1.4
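
For readers who want the caching scheme in isolation, the following is a userspace model of it, not driver code. struct fake_fec and the fake_* functions are invented for the sketch; the hw array plays the role of the counter registers that must not be read once the clock is stopped.

```c
#include <stdint.h>
#include <string.h>

#define NSTATS 3	/* illustrative; the driver uses ARRAY_SIZE(fec_stats) */

/* Userspace stand-in for the device state. */
struct fake_fec {
	int running;
	uint64_t hw[NSTATS];	/* "registers", readable only while running */
	uint64_t cache[NSTATS];	/* last snapshot taken while running */
};

/* Counterpart of fec_enet_update_ethtool_stats(): snapshot the registers. */
void fake_update_stats(struct fake_fec *f)
{
	memcpy(f->cache, f->hw, sizeof(f->cache));
}

/* Counterpart of fec_enet_get_ethtool_stats(): touch hardware only if up. */
void fake_get_stats(struct fake_fec *f, uint64_t *data)
{
	if (f->running)
		fake_update_stats(f);
	memcpy(data, f->cache, sizeof(f->cache));
}

/* Counterpart of the fec_enet_close() hunk: snapshot, then stop the clock. */
void fake_close(struct fake_fec *f)
{
	fake_update_stats(f);
	f->running = 0;
}
```

After fake_close(), fake_get_stats() keeps returning the values captured at close time regardless of what the "registers" hold, which is the behaviour the patch relies on to avoid touching unclocked hardware.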


* Re: [PATCH] ethernet :mellanox :mlx4: Replace pci_pool_alloc by pci_pool_zalloc
From: Souptick Joarder @ 2016-11-29  6:49 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: yishaih, netdev, linux-rdma, Rameshwar Sahu
In-Reply-To: <a1b6f877-c40b-656d-2278-d32af1a93bc7@cogentembedded.com>

On Tue, Nov 29, 2016 at 12:36 AM, Sergei Shtylyov
<sergei.shtylyov@cogentembedded.com> wrote:
> Hello.
>
> On 11/28/2016 04:28 PM, Souptick Joarder wrote:
>
>> In mlx4_alloc_cmd_mailbox(), pci_pool_alloc() followed by memset will be
>> replaced by pci_pool_zalloc()
>>
>> Signed-off-by: Souptick joarder <jrdr.linux@gmail.com>
>> ---
>>  drivers/net/ethernet/mellanox/mlx4/cmd.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c
>> b/drivers/net/ethernet/mellanox/mlx4/cmd.c
>> index e36bebc..ee3bd76 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
>> @@ -2679,14 +2679,13 @@ struct mlx4_cmd_mailbox
>> *mlx4_alloc_cmd_mailbox(struct mlx4_dev *dev)
>>         if (!mailbox)
>>                 return ERR_PTR(-ENOMEM);
>>
>> -       mailbox->buf = pci_pool_alloc(mlx4_priv(dev)->cmd.pool,
>> GFP_KERNEL,
>> +       mailbox->buf = pci_pool_zalloc(mlx4_priv(dev)->cmd.pool,
>> GFP_KERNEL,
>>                                       &mailbox->dma);
>
>
>    You need to realign he continuation line now, the way it was aligned in
> the original code.
>

Ok, I will do that.





>
> MBR, Sergei
>


* Re: [PATCH net] net, sched: respect rcu grace period on cls destruction
From: Cong Wang @ 2016-11-29  6:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Daniel Borkmann, David Miller, John Fastabend, Roi Dayan, ast,
	Hannes Frederic Sowa, Jiri Pirko, Linux Kernel Network Developers
In-Reply-To: <20161128104736.GX31360@linux.vnet.ibm.com>

On Mon, Nov 28, 2016 at 2:47 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> RCU callbacks are always executed in softirq context, so yes, you do need
> to use something like a work struct.  (Or a wakeup to a kthread or
> whatever.)

Thanks for your information.


* bnx2 breaks Dell R815 BMC IPMI since 4.8
From: Brice Goglin @ 2016-11-29  6:57 UTC (permalink / raw)
  To: Linux Network Development list, Baoquan He

Hello

My Dell PowerEdge R815 doesn't have IPMI anymore when I boot a 4.8
kernel; the BMC doesn't even respond to ping anymore. Its Ethernet
devices are four of these:

01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
	DeviceName: Embedded NIC 1                          
	Subsystem: Dell NetXtreme II BCM5709 Gigabit Ethernet
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 42
	Region 0: Memory at e6000000 (64-bit, non-prefetchable) [size=32M]
	Capabilities: <access denied>
	Kernel driver in use: bnx2
	Kernel modules: bnx2

The only change in bnx2 between 4.7 and 4.8 appears to be this one:

commit 3e1be7ad2d38c6bd6aeef96df9bd0a7822f4e51c
Author: Baoquan He <bhe@redhat.com>
Date:   Fri Sep 9 22:43:12 2016 +0800

    bnx2: Reset device during driver initialization

Could your patch actually break the BMC? What do I need to do to debug
this issue further?

Thanks
Brice


* Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
From: Cong Wang @ 2016-11-29  6:57 UTC (permalink / raw)
  To: John Fastabend
  Cc: Linux Kernel Network Developers, Roi Dayan, Jiri Pirko,
	Daniel Borkmann
In-Reply-To: <583B9D22.8090906@gmail.com>

On Sun, Nov 27, 2016 at 6:57 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> Hi Cong,
>
> Thanks a lot for doing this. Can you rebase it on top of Daniel's patch
> though,
>
>  [PATCH net] net, sched: respect rcu grace period on cls destruction
>
> And then push the NULL pointer work for the cls_fw and cls_route
> classifiers into another patch.
>
> Then I believe the last thing to make this correct is to convert the
> call_rcu() paths to call_rcu_bh().

Sure, will rebase my patch once DaveM merges net into net-next.

Thanks.


* [WIP] net+mlx4: auto doorbell
From: Eric Dumazet @ 2016-11-29  6:58 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: Rick Jones, netdev, Saeed Mahameed, Tariq Toukan
In-Reply-To: <1479751857.8455.419.camel@edumazet-glaptop3.roam.corp.google.com>

On Mon, 2016-11-21 at 10:10 -0800, Eric Dumazet wrote:


> Not sure it this has been tried before, but the doorbell avoidance could
> be done by the driver itself, because it knows a TX completion will come
> shortly (well... if softirqs are not delayed too much !)
> 
> Doorbell would be forced only if :
> 
> (    "skb->xmit_more is not set" AND "TX engine is not 'started yet'" )
> OR
> ( too many [1] packets were put in TX ring buffer, no point deferring
> more)
> 
> Start the pump, but once it is started, let the doorbells being done by
> TX completion.
> 
> ndo_start_xmit and TX completion handler would have to maintain a shared
> state describing if packets were ready but doorbell deferred.
> 
> 
> Note that TX completion means "if at least one packet was drained",
> otherwise busy polling, constantly calling napi->poll() would force a
> doorbell too soon for devices sharing a NAPI for both RX and TX.
> 
> But then, maybe busy poll would like to force a doorbell...
> 
> I could try these ideas on mlx4 shortly.
> 
> 
> [1] limit could be derived from active "ethtool -c" params, eg tx-frames
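
The rule quoted above can be modelled in a few lines of userspace C. This is only a sketch of the decision, not the mlx4 code: prod, prod_bell and ncons borrow their names from the patch further down, while struct tx_state, max_deferred and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ring state shared by the xmit path and TX completion. */
struct tx_state {
	uint32_t prod;		/* descriptors posted by the xmit path */
	uint32_t prod_bell;	/* value of prod at the last doorbell */
	uint32_t ncons;		/* descriptors known to be completed */
	uint32_t max_deferred;	/* illustrative cap, e.g. from tx-frames */
};

/* The "pump" is running if the NIC was told about descriptors that have
 * not completed yet; the signed cast keeps this correct across u32 wrap. */
static bool tx_engine_started(const struct tx_state *s)
{
	return (int32_t)(s->prod_bell - s->ncons) > 0;
}

/* Decide whether the xmit path must ring the doorbell itself. */
bool must_ring_doorbell(const struct tx_state *s, bool xmit_more)
{
	uint32_t deferred = s->prod - s->prod_bell;

	if (deferred >= s->max_deferred)
		return true;	/* too much queued, no point deferring more */
	return !xmit_more && !tx_engine_started(s);
}
```

When must_ring_doorbell() returns false with xmit_more unset, the model assumes a TX completion is still pending and leaves the doorbell to the completion handler.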

I have a WIP that increases the pktgen rate by 75 % on mlx4 when bulking
is not used.

lpaa23:~# echo 0 >/sys/class/net/eth0/doorbell_opt 
lpaa23:~# sar -n DEV 1 10|grep eth0
22:43:26         eth0      0.00 5822800.00      0.00 597064.41      0.00      0.00      1.00
22:43:27         eth0     24.00 5788237.00      2.09 593520.26      0.00      0.00      0.00
22:43:28         eth0     12.00 5817777.00      1.43 596551.47      0.00      0.00      1.00
22:43:29         eth0     22.00 5841516.00      1.61 598982.87      0.00      0.00      0.00
22:43:30         eth0      4.00 4389137.00      0.71 450058.08      0.00      0.00      1.00
22:43:31         eth0      4.00 5871008.00      0.72 602007.79      0.00      0.00      0.00
22:43:32         eth0     12.00 5891809.00      1.43 604142.60      0.00      0.00      1.00
22:43:33         eth0     10.00 5901904.00      1.12 605175.70      0.00      0.00      0.00
22:43:34         eth0      5.00 5907982.00      0.69 605798.99      0.00      0.00      1.00
22:43:35         eth0      2.00 5847086.00      0.12 599554.71      0.00      0.00      0.00
Average:         eth0      9.50 5707925.60      0.99 585285.69      0.00      0.00      0.50
lpaa23:~# echo 1 >/sys/class/net/eth0/doorbell_opt 
lpaa23:~# sar -n DEV 1 10|grep eth0
22:43:47         eth0      9.00 10226424.00      1.02 1048608.05      0.00      0.00      1.00
22:43:48         eth0      1.00 10316955.00      0.06 1057890.89      0.00      0.00      0.00
22:43:49         eth0      1.00 10310104.00      0.10 1057188.32      0.00      0.00      1.00
22:43:50         eth0      0.00 10249423.00      0.00 1050966.23      0.00      0.00      0.00
22:43:51         eth0      0.00 10210441.00      0.00 1046969.05      0.00      0.00      1.00
22:43:52         eth0      2.00 10198389.00      0.16 1045733.17      0.00      0.00      1.00
22:43:53         eth0      8.00 10079257.00      1.43 1033517.83      0.00      0.00      0.00
22:43:54         eth0      0.00 7693509.00      0.00 788885.16      0.00      0.00      0.00
22:43:55         eth0      2.00 10343076.00      0.20 1060569.32      0.00      0.00      1.00
22:43:56         eth0      1.00 10224571.00      0.14 1048417.93      0.00      0.00      0.00
Average:         eth0      2.40 9985214.90      0.31 1023874.60      0.00      0.00      0.50

And about 11 % improvement on a mono-flow UDP_STREAM test.

skb_set_owner_w() is now the most CPU-consuming function.


lpaa23:~# ./udpsnd -4 -H 10.246.7.152 -d 2 &
[1] 13696
lpaa23:~# echo 0 >/sys/class/net/eth0/doorbell_opt
lpaa23:~# sar -n DEV 1 10|grep eth0
22:50:47         eth0      3.00 1355422.00      0.45 319706.04      0.00      0.00      0.00
22:50:48         eth0      2.00 1344270.00      0.42 317035.21      0.00      0.00      1.00
22:50:49         eth0      3.00 1350503.00      0.51 318478.34      0.00      0.00      0.00
22:50:50         eth0     29.00 1348593.00      2.86 318113.02      0.00      0.00      1.00
22:50:51         eth0     14.00 1354855.00      1.83 319508.56      0.00      0.00      0.00
22:50:52         eth0      7.00 1357794.00      0.73 320226.89      0.00      0.00      1.00
22:50:53         eth0      5.00 1326130.00      0.63 312784.72      0.00      0.00      0.00
22:50:54         eth0      2.00 994584.00      0.12 234598.40      0.00      0.00      1.00
22:50:55         eth0      5.00 1318209.00      0.75 310932.46      0.00      0.00      0.00
22:50:56         eth0     20.00 1323445.00      1.73 312178.11      0.00      0.00      1.00
Average:         eth0      9.00 1307380.50      1.00 308356.18      0.00      0.00      0.50
lpaa23:~# echo 3 >/sys/class/net/eth0/doorbell_opt
lpaa23:~# sar -n DEV 1 10|grep eth0
22:51:03         eth0      4.00 1512055.00      0.54 356599.40      0.00      0.00      0.00
22:51:04         eth0      4.00 1507631.00      0.55 355609.46      0.00      0.00      1.00
22:51:05         eth0      4.00 1487789.00      0.42 350917.47      0.00      0.00      0.00
22:51:06         eth0      7.00 1474460.00      1.22 347811.16      0.00      0.00      1.00
22:51:07         eth0      2.00 1496529.00      0.24 352995.18      0.00      0.00      0.00
22:51:08         eth0      3.00 1485856.00      0.49 350425.65      0.00      0.00      1.00
22:51:09         eth0      1.00 1114808.00      0.06 262905.38      0.00      0.00      0.00
22:51:10         eth0      2.00 1510924.00      0.30 356397.53      0.00      0.00      1.00
22:51:11         eth0      2.00 1506408.00      0.30 355345.76      0.00      0.00      0.00
22:51:12         eth0      2.00 1499122.00      0.32 353668.75      0.00      0.00      1.00
Average:         eth0      3.10 1459558.20      0.44 344267.57      0.00      0.00      0.50
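
The two percentages can be re-derived from the "Average:" rows, assuming the second numeric column is txpck/s as in the usual sar -n DEV layout (the header line is filtered out by grep). percent_gain is just an illustrative helper.

```c
/* Relative improvement of 'after' over 'before', in percent. */
double percent_gain(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* pktgen:     percent_gain(5707925.60, 9985214.90) is a little under 75
 * UDP_STREAM: percent_gain(1307380.50, 1459558.20) is between 11 and 12
 */
```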

 drivers/net/ethernet/mellanox/mlx4/en_rx.c   |    2 
 drivers/net/ethernet/mellanox/mlx4/en_tx.c   |   90 +++++++++++------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |    4 
 include/linux/netdevice.h                    |    1 
 net/core/net-sysfs.c                         |   18 +++
 5 files changed, 83 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 6562f78b07f4..fbea83218fc0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -1089,7 +1089,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 
 	if (polled) {
 		if (doorbell_pending)
-			mlx4_en_xmit_doorbell(priv->tx_ring[TX_XDP][cq->ring]);
+			mlx4_en_xmit_doorbell(dev, priv->tx_ring[TX_XDP][cq->ring]);
 
 		mlx4_cq_set_ci(&cq->mcq);
 		wmb(); /* ensure HW sees CQ consumer before we post new buffers */
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 4b597dca5c52..affebb435679 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -67,7 +67,7 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
 	ring->size = size;
 	ring->size_mask = size - 1;
 	ring->sp_stride = stride;
-	ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
+	ring->full_size = ring->size - HEADROOM - 2*MAX_DESC_TXBBS;
 
 	tmp = size * sizeof(struct mlx4_en_tx_info);
 	ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node);
@@ -193,6 +193,7 @@ int mlx4_en_activate_tx_ring(struct mlx4_en_priv *priv,
 	ring->sp_cqn = cq;
 	ring->prod = 0;
 	ring->cons = 0xffffffff;
+	ring->ncons = 0;
 	ring->last_nr_txbb = 1;
 	memset(ring->tx_info, 0, ring->size * sizeof(struct mlx4_en_tx_info));
 	memset(ring->buf, 0, ring->buf_size);
@@ -227,9 +228,9 @@ void mlx4_en_deactivate_tx_ring(struct mlx4_en_priv *priv,
 		       MLX4_QP_STATE_RST, NULL, 0, 0, &ring->sp_qp);
 }
 
-static inline bool mlx4_en_is_tx_ring_full(struct mlx4_en_tx_ring *ring)
+static inline bool mlx4_en_is_tx_ring_full(const struct mlx4_en_tx_ring *ring)
 {
-	return ring->prod - ring->cons > ring->full_size;
+	return READ_ONCE(ring->prod) - READ_ONCE(ring->cons) > ring->full_size;
 }
 
 static void mlx4_en_stamp_wqe(struct mlx4_en_priv *priv,
@@ -374,6 +375,7 @@ int mlx4_en_free_tx_buf(struct net_device *dev, struct mlx4_en_tx_ring *ring)
 
 	/* Skip last polled descriptor */
 	ring->cons += ring->last_nr_txbb;
+	ring->ncons += ring->last_nr_txbb;
 	en_dbg(DRV, priv, "Freeing Tx buf - cons:0x%x prod:0x%x\n",
 		 ring->cons, ring->prod);
 
@@ -389,6 +391,7 @@ int mlx4_en_free_tx_buf(struct net_device *dev, struct mlx4_en_tx_ring *ring)
 						!!(ring->cons & ring->size), 0,
 						0 /* Non-NAPI caller */);
 		ring->cons += ring->last_nr_txbb;
+		ring->ncons += ring->last_nr_txbb;
 		cnt++;
 	}
 
@@ -401,6 +404,38 @@ int mlx4_en_free_tx_buf(struct net_device *dev, struct mlx4_en_tx_ring *ring)
 	return cnt;
 }
 
+void mlx4_en_xmit_doorbell(const struct net_device *dev,
+			   struct mlx4_en_tx_ring *ring)
+{
+
+	if (dev->doorbell_opt & 1) {
+		u32 oval = READ_ONCE(ring->prod_bell);
+		u32 nval = READ_ONCE(ring->prod);
+
+		if (oval == nval)
+			return;
+
+		/* I can not tell yet if a cmpxchg() is needed or not */
+		if (dev->doorbell_opt & 2)
+			WRITE_ONCE(ring->prod_bell, nval);
+		else
+			if (cmpxchg(&ring->prod_bell, oval, nval) != oval)
+				return;
+	}
+	/* Since there is no iowrite*_native() that writes the
+	 * value as is, without byteswapping - using the one
+	 * the doesn't do byteswapping in the relevant arch
+	 * endianness.
+	 */
+#if defined(__LITTLE_ENDIAN)
+	iowrite32(
+#else
+	iowrite32be(
+#endif
+		  ring->doorbell_qpn,
+		  ring->bf.uar->map + MLX4_SEND_DOORBELL);
+}
+
 static bool mlx4_en_process_tx_cq(struct net_device *dev,
 				  struct mlx4_en_cq *cq, int napi_budget)
 {
@@ -496,8 +531,13 @@ static bool mlx4_en_process_tx_cq(struct net_device *dev,
 	wmb();
 
 	/* we want to dirty this cache line once */
-	ACCESS_ONCE(ring->last_nr_txbb) = last_nr_txbb;
-	ACCESS_ONCE(ring->cons) = ring_cons + txbbs_skipped;
+	WRITE_ONCE(ring->last_nr_txbb, last_nr_txbb);
+	ring_cons += txbbs_skipped;
+	WRITE_ONCE(ring->cons, ring_cons);
+	WRITE_ONCE(ring->ncons, ring_cons + last_nr_txbb);
+
+	if (dev->doorbell_opt)
+		mlx4_en_xmit_doorbell(dev, ring);
 
 	if (ring->free_tx_desc == mlx4_en_recycle_tx_desc)
 		return done < budget;
@@ -725,29 +765,14 @@ static void mlx4_bf_copy(void __iomem *dst, const void *src,
 	__iowrite64_copy(dst, src, bytecnt / 8);
 }
 
-void mlx4_en_xmit_doorbell(struct mlx4_en_tx_ring *ring)
-{
-	wmb();
-	/* Since there is no iowrite*_native() that writes the
-	 * value as is, without byteswapping - using the one
-	 * the doesn't do byteswapping in the relevant arch
-	 * endianness.
-	 */
-#if defined(__LITTLE_ENDIAN)
-	iowrite32(
-#else
-	iowrite32be(
-#endif
-		  ring->doorbell_qpn,
-		  ring->bf.uar->map + MLX4_SEND_DOORBELL);
-}
 
 static void mlx4_en_tx_write_desc(struct mlx4_en_tx_ring *ring,
 				  struct mlx4_en_tx_desc *tx_desc,
 				  union mlx4_wqe_qpn_vlan qpn_vlan,
 				  int desc_size, int bf_index,
 				  __be32 op_own, bool bf_ok,
-				  bool send_doorbell)
+				  bool send_doorbell,
+				  const struct net_device *dev, int nr_txbb)
 {
 	tx_desc->ctrl.qpn_vlan = qpn_vlan;
 
@@ -761,6 +786,7 @@ static void mlx4_en_tx_write_desc(struct mlx4_en_tx_ring *ring,
 
 		wmb();
 
+		ring->prod += nr_txbb;
 		mlx4_bf_copy(ring->bf.reg + ring->bf.offset, &tx_desc->ctrl,
 			     desc_size);
 
@@ -773,8 +799,9 @@ static void mlx4_en_tx_write_desc(struct mlx4_en_tx_ring *ring,
 		 */
 		dma_wmb();
 		tx_desc->ctrl.owner_opcode = op_own;
+		ring->prod += nr_txbb;
 		if (send_doorbell)
-			mlx4_en_xmit_doorbell(ring);
+			mlx4_en_xmit_doorbell(dev, ring);
 		else
 			ring->xmit_more++;
 	}
@@ -1017,8 +1044,6 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 			op_own |= cpu_to_be32(MLX4_WQE_CTRL_IIP);
 	}
 
-	ring->prod += nr_txbb;
-
 	/* If we used a bounce buffer then copy descriptor back into place */
 	if (unlikely(bounce))
 		tx_desc = mlx4_en_bounce_to_desc(priv, ring, index, desc_size);
@@ -1033,6 +1058,14 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 	send_doorbell = !skb->xmit_more || netif_xmit_stopped(ring->tx_queue);
 
+	/* Doorbell avoidance : We can omit doorbell if we know a TX completion
+	 * will happen shortly.
+	 */
+	if (send_doorbell &&
+	    dev->doorbell_opt &&
+	    (s32)(READ_ONCE(ring->prod_bell) - READ_ONCE(ring->ncons)) > 0)
+		send_doorbell = false;
+
 	real_size = (real_size / 16) & 0x3f;
 
 	bf_ok &= desc_size <= MAX_BF && send_doorbell;
@@ -1043,7 +1076,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		qpn_vlan.fence_size = real_size;
 
 	mlx4_en_tx_write_desc(ring, tx_desc, qpn_vlan, desc_size, bf_index,
-			      op_own, bf_ok, send_doorbell);
+			      op_own, bf_ok, send_doorbell, dev, nr_txbb);
 
 	if (unlikely(stop_queue)) {
 		/* If queue was emptied after the if (stop_queue) , and before
@@ -1054,7 +1087,6 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		 */
 		smp_rmb();
 
-		ring_cons = ACCESS_ONCE(ring->cons);
 		if (unlikely(!mlx4_en_is_tx_ring_full(ring))) {
 			netif_tx_wake_queue(ring->tx_queue);
 			ring->wake_queue++;
@@ -1158,8 +1190,6 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
 	rx_ring->xdp_tx++;
 	AVG_PERF_COUNTER(priv->pstats.tx_pktsz_avg, length);
 
-	ring->prod += nr_txbb;
-
 	stop_queue = mlx4_en_is_tx_ring_full(ring);
 	send_doorbell = stop_queue ||
 				*doorbell_pending > MLX4_EN_DOORBELL_BUDGET;
@@ -1173,7 +1203,7 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
 		qpn_vlan.fence_size = real_size;
 
 	mlx4_en_tx_write_desc(ring, tx_desc, qpn_vlan, TXBB_SIZE, bf_index,
-			      op_own, bf_ok, send_doorbell);
+			      op_own, bf_ok, send_doorbell, dev, nr_txbb);
 	*doorbell_pending = send_doorbell ? 0 : *doorbell_pending + 1;
 
 	return NETDEV_TX_OK;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 574bcbb1b38f..c3fd0deda198 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -280,6 +280,7 @@ struct mlx4_en_tx_ring {
 	 */
 	u32			last_nr_txbb;
 	u32			cons;
+	u32			ncons;
 	unsigned long		wake_queue;
 	struct netdev_queue	*tx_queue;
 	u32			(*free_tx_desc)(struct mlx4_en_priv *priv,
@@ -290,6 +291,7 @@ struct mlx4_en_tx_ring {
 
 	/* cache line used and dirtied in mlx4_en_xmit() */
 	u32			prod ____cacheline_aligned_in_smp;
+	u32			prod_bell;
 	unsigned int		tx_dropped;
 	unsigned long		bytes;
 	unsigned long		packets;
@@ -699,7 +701,7 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
 			       struct mlx4_en_rx_alloc *frame,
 			       struct net_device *dev, unsigned int length,
 			       int tx_ind, int *doorbell_pending);
-void mlx4_en_xmit_doorbell(struct mlx4_en_tx_ring *ring);
+void mlx4_en_xmit_doorbell(const struct net_device *dev, struct mlx4_en_tx_ring *ring);
 bool mlx4_en_rx_recycle(struct mlx4_en_rx_ring *ring,
 			struct mlx4_en_rx_alloc *frame);
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4ffcd874cc20..39565b5425a6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1816,6 +1816,7 @@ struct net_device {
 	DECLARE_HASHTABLE	(qdisc_hash, 4);
 #endif
 	unsigned long		tx_queue_len;
+	unsigned long		doorbell_opt;
 	spinlock_t		tx_global_lock;
 	int			watchdog_timeo;
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b0c04cf4851d..df05f81f5150 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -367,6 +367,23 @@ static ssize_t gro_flush_timeout_store(struct device *dev,
 }
 NETDEVICE_SHOW_RW(gro_flush_timeout, fmt_ulong);
 
+static int change_doorbell_opt(struct net_device *dev, unsigned long val)
+{
+	dev->doorbell_opt = val;
+	return 0;
+}
+
+static ssize_t doorbell_opt_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t len)
+{
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	return netdev_store(dev, attr, buf, len, change_doorbell_opt);
+}
+NETDEVICE_SHOW_RW(doorbell_opt, fmt_ulong);
+
 static ssize_t ifalias_store(struct device *dev, struct device_attribute *attr,
 			     const char *buf, size_t len)
 {
@@ -531,6 +548,7 @@ static struct attribute *net_class_attrs[] = {
 	&dev_attr_phys_port_name.attr,
 	&dev_attr_phys_switch_id.attr,
 	&dev_attr_proto_down.attr,
+	&dev_attr_doorbell_opt.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(net_class);
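
The doorbell-avoidance test added to mlx4_en_xmit() above compares two
free-running u32 counters through a signed difference,
(s32)(prod_bell - ncons) > 0, which stays correct when the counters wrap.
A minimal userspace sketch of that idiom follows. seq_after() and
seq_after_selftest() are hypothetical names made up for this illustration,
not part of the patch, and the cast assumes the usual two's complement
conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not in the patch): true when free-running
 * counter a is ahead of counter b, even if a has already wrapped
 * past 2^32. Only valid while a and b are less than 2^31 apart.
 */
bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/* Quick self-check of the idiom, including the wraparound case. */
void seq_after_selftest(void)
{
	assert(seq_after(5, 3));
	assert(!seq_after(3, 5));
	assert(!seq_after(7, 7));
	assert(seq_after(2, 0xfffffffeu));	/* a wrapped, still ahead */
}
```

A plain "a > b" would give the wrong answer in the last case, which is
presumably why the patch casts the difference instead of comparing the
counters directly.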

^ permalink raw reply related

* [PATCH v2] ethernet :mellanox :mlx4: Replace pci_pool_alloc by pci_pool_zalloc
From: Souptick Joarder @ 2016-11-29  6:59 UTC (permalink / raw)
  To: sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	yishaih-VPRAkNaXOzVWk0Htik3J/w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	sahu.rameshwar73-Re5JQEeQqe8AvxtiuMwx3w

In mlx4_alloc_cmd_mailbox(), pci_pool_alloc() followed by memset will be
replaced by pci_pool_zalloc().

Signed-off-by: Souptick joarder <jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
v2:
  - Address comment from sergei
    Alignment was not proper

 drivers/net/ethernet/mellanox/mlx4/cmd.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index e36bebc..96cdf9a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -2679,14 +2679,13 @@ struct mlx4_cmd_mailbox *mlx4_alloc_cmd_mailbox(struct mlx4_dev *dev)
 	if (!mailbox)
 		return ERR_PTR(-ENOMEM);
 
-	mailbox->buf = pci_pool_alloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
-				      &mailbox->dma);
+	mailbox->buf = pci_pool_zalloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
+				       &mailbox->dma);
 	if (!mailbox->buf) {
 		kfree(mailbox);
 		return ERR_PTR(-ENOMEM);
 	}
 
-	memset(mailbox->buf, 0, MLX4_MAILBOX_SIZE);
 
 	return mailbox;
 }
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()
From: Cong Wang @ 2016-11-29  6:59 UTC (permalink / raw)
  To: Roi Dayan
  Cc: Daniel Borkmann, Linux Kernel Network Developers, Jiri Pirko,
	John Fastabend
In-Reply-To: <583A7D67.50003@mellanox.com>

On Sat, Nov 26, 2016 at 10:29 PM, Roi Dayan <roid@mellanox.com> wrote:
> Hi,
>
> I tested "[PATCH net] net, sched: respect rcu grace period on cls
> destruction" and could not reproduce my original issue.
> I rebased "[Patch net-next] net_sched: move the empty tp check from
> ->destroy() to ->delete()" over to test it in the same tree and got into a
> new trace in fl_delete.

I will take care of this when I rebase my patch.

Thanks for testing anyway.

^ permalink raw reply

* Re: bnx2 breaks Dell R815 BMC IPMI since 4.8
From: Baoquan He @ 2016-11-29  7:02 UTC (permalink / raw)
  To: Brice Goglin; +Cc: Linux Network Development list
In-Reply-To: <583D26EF.60207@inria.fr>

Sorry, Brice. This has already been reported, and it has been fixed by a
later post. The commits in Linus's tree are:

commit 6df77862f63f389df3b1ad879738e04440d7385d
Author: Baoquan He <bhe@redhat.com>
Date:   Sun Nov 13 13:01:33 2016 +0800

    bnx2: Wait for in-flight DMA to complete at probe stage

commit 5d0d4b91bf627f14f95167b738d524156c9d440b
Author: Baoquan He <bhe@redhat.com>
Date:   Sun Nov 13 13:01:32 2016 +0800

    Revert "bnx2: Reset device during driver initialization"
    
    This reverts commit 3e1be7ad2d38c6bd6aeef96df9bd0a7822f4e51c.

I believe both of them have also been picked up into the 4.8-stable kernel.
Please find a way to get them.

Sorry again!

Thanks
Baoquan


On 11/29/16 at 07:57am, Brice Goglin wrote:
> Hello
> 
> My Dell PowerEdge R815 doesn't have IPMI anymore when I boot a 4.8
> kernel, the BMC doesn't even ping anymore. Its Ethernet devices are 4 of
> those:
> 
> 01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
> 	DeviceName: Embedded NIC 1                          
> 	Subsystem: Dell NetXtreme II BCM5709 Gigabit Ethernet
> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin A routed to IRQ 42
> 	Region 0: Memory at e6000000 (64-bit, non-prefetchable) [size=32M]
> 	Capabilities: <access denied>
> 	Kernel driver in use: bnx2
> 	Kernel modules: bnx2
> 
> The only change in bnx2 between 4.7 and 4.8 appears to be this one:
> 
> commit 3e1be7ad2d38c6bd6aeef96df9bd0a7822f4e51c
> Author: Baoquan He <bhe@redhat.com>
> Date:   Fri Sep 9 22:43:12 2016 +0800
> 
>     bnx2: Reset device during driver initialization
> 
> Could your patch actually break the BMC? What do I need to further debug
> this issue?
> 
> Thanks
> Brice
> 

^ permalink raw reply

* Re: [PATCH net V2] net/sched: pedit: make sure that offset is valid
From: Amir Vadai @ 2016-11-29  7:14 UTC (permalink / raw)
  To: zhuyj
  Cc: David S. Miller, netdev, Cong Wang, Jamal Hadi Salim, Or Gerlitz,
	Hadar Har-Zion, Jiri Pirko
In-Reply-To: <CAD=hENcSiWiH-9e6=gjn+wK4R6ZQsNa21R_w_eWzDtCxiUDNVQ@mail.gmail.com>

On Tue, Nov 29, 2016 at 10:32:05AM +0800, zhuyj wrote:
>  +       if (offset > 0 && offset > skb->len)
> 
> offset > skb->len is enough?
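offset is signed and skb->len is unsigned, so offset is converted to unsigned
before the comparison. For example, with offset=-1 and skb->len=10 the actual
comparison is 0xff... > 10, and a valid negative offset would be rejected.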
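
The conversion can be reproduced in userspace with a small sketch. Here
pkt_len and headroom stand in for skb->len and skb_headroom(skb), and all
three function names are made up for this example; they are not kernel
code.

```c
#include <assert.h>
#include <stdbool.h>

/* Models "offset > skb->len" on its own. The cast is written out
 * explicitly; it is what the compiler does implicitly when an int is
 * compared with an unsigned int.
 */
bool naive_offset_too_big(int offset, unsigned int pkt_len)
{
	return (unsigned int)offset > pkt_len;
}

/* Userspace analogue of offset_valid() from the patch. */
bool offset_valid_sketch(int offset, unsigned int pkt_len,
			 unsigned int headroom)
{
	if (offset > 0 && (unsigned int)offset > pkt_len)
		return false;
	/* 0u - (unsigned int)offset is |offset| without signed overflow */
	if (offset < 0 && 0u - (unsigned int)offset > headroom)
		return false;
	return true;
}

void offset_selftest(void)
{
	assert(naive_offset_too_big(-1, 10));		/* 0xff... > 10 */
	assert(offset_valid_sketch(-1, 10, 4));		/* 1 byte into headroom */
	assert(!offset_valid_sketch(-5, 10, 4));	/* below the head */
	assert(!offset_valid_sketch(11, 10, 4));	/* past the end */
}
```

With the "offset > 0" guard in place, the negative case is left to the
headroom check instead of being rejected by the unsigned comparison.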

> 
> On Mon, Nov 28, 2016 at 6:56 PM, Amir Vadai <amir@vadai.me> wrote:
> > Add a validation function to make sure offset is valid:
> > 1. Not below skb head (could happen when offset is negative).
> > 2. Validate both 'offset' and 'at'.
> >
> > Signed-off-by: Amir Vadai <amir@vadai.me>
> > ---
> > Hi Dave,
> >
> > Please pull to -stable branches.
> >
> > Changes from V0:
> > - Add a validation to the 'at' value (this is used as an offset too)
> > - Instead of validating the output of skb_header_pointer(), make sure that the
> >         offset is good before calling it.
> >
> > Thanks,
> > Amir
> >  net/sched/act_pedit.c | 24 ++++++++++++++++++++----
> >  1 file changed, 20 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> > index b54d56d4959b..cf9b2fe8eac6 100644
> > --- a/net/sched/act_pedit.c
> > +++ b/net/sched/act_pedit.c
> > @@ -108,6 +108,17 @@ static void tcf_pedit_cleanup(struct tc_action *a, int bind)
> >         kfree(keys);
> >  }
> >
> > +static bool offset_valid(struct sk_buff *skb, int offset)
> > +{
> > +       if (offset > 0 && offset > skb->len)
> > +               return false;
> > +
> > +       if  (offset < 0 && -offset > skb_headroom(skb))
> > +               return false;
> > +
> > +       return true;
> > +}
> > +
> >  static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
> >                      struct tcf_result *res)
> >  {
> > @@ -134,6 +145,11 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
> >                         if (tkey->offmask) {
> >                                 char *d, _d;
> >
> > +                               if (!offset_valid(skb, off + tkey->at)) {
> > +                                       pr_info("tc filter pedit 'at' offset %d out of bounds\n",
> > +                                               off + tkey->at);
> > +                                       goto bad;
> > +                               }
> >                                 d = skb_header_pointer(skb, off + tkey->at, 1,
> >                                                        &_d);
> >                                 if (!d)
> > @@ -146,10 +162,10 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
> >                                         " offset must be on 32 bit boundaries\n");
> >                                 goto bad;
> >                         }
> > -                       if (offset > 0 && offset > skb->len) {
> > -                               pr_info("tc filter pedit"
> > -                                       " offset %d can't exceed pkt length %d\n",
> > -                                      offset, skb->len);
> > +
> > +                       if (!offset_valid(skb, off + offset)) {
> > +                               pr_info("tc filter pedit offset %d out of bounds\n",
> > +                                       offset);
> >                                 goto bad;
> >                         }
> >
> > --
> > 2.10.2
> >

^ permalink raw reply

* Re: [PATCH v2 net-next 2/2] openvswitch: Fix skb->protocol for vlan frames.
From: Pravin Shelar @ 2016-11-29  7:21 UTC (permalink / raw)
  To: Jarno Rajahalme; +Cc: Linux Kernel Network Developers, Jiri Benc
In-Reply-To: <1480387276-123557-2-git-send-email-jarno@ovn.org>

On Mon, Nov 28, 2016 at 6:41 PM, Jarno Rajahalme <jarno@ovn.org> wrote:
> Do not set skb->protocol to be the ethertype of the L3 header, unless
> the packet only has the L3 header.  For a non-hardware offloaded VLAN
> frame skb->protocol needs to be one of the VLAN ethertypes.
>
> Any VLAN offloading is undone on the OVS netlink interface.  Also any
> VLAN tags added by userspace are non-offloaded.
>
> Incorrect skb->protocol value on a full-size non-offloaded VLAN skb
> causes packet drop due to failing MTU check, as the VLAN header should
> not be counted in when considering MTU in ovs_vport_send().
>
I think we should move to an is_skb_forwardable() type of packet length
check in ovs_vport_send() and get rid of the skb->protocol checks altogether.

> Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets")
> Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
> ---
> v2: Set skb->protocol when an ETH_P_TEB frame is received via ARPHRD_NONE
>     interface.
>
>  net/openvswitch/datapath.c |  1 -
>  net/openvswitch/flow.c     | 30 ++++++++++++++++++++++--------
>  2 files changed, 22 insertions(+), 9 deletions(-)
...
...
> @@ -531,15 +538,22 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
>                 if (unlikely(parse_vlan(skb, key)))
>                         return -ENOMEM;
>
> -               skb->protocol = parse_ethertype(skb);
> -               if (unlikely(skb->protocol == htons(0)))
> +               key->eth.type = parse_ethertype(skb);
> +               if (unlikely(key->eth.type == htons(0)))
>                         return -ENOMEM;
>
> +               if (skb->protocol == htons(ETH_P_TEB)) {
> +                       if (key->eth.vlan.tci & htons(VLAN_TAG_PRESENT)
> +                           && !skb_vlan_tag_present(skb))
> +                               skb->protocol = key->eth.vlan.tpid;
> +                       else
> +                               skb->protocol = key->eth.type;
> +               }
> +

I am not sure if this works in the case of nested VLANs.
Can we move the skb->protocol assignment to parse_vlan() to avoid checking
for the non-accelerated VLAN case again here?

^ permalink raw reply

* Re: bnx2 breaks Dell R815 BMC IPMI since 4.8
From: Brice Goglin @ 2016-11-29  7:21 UTC (permalink / raw)
  To: Baoquan He; +Cc: Linux Network Development list
In-Reply-To: <20161129070211.GC3126@x1>

Unfortunately I only tested 4.8.5 and 4.9-rc5; the fixes came later. I'll
ping my distro.
Thanks for the quick reply!
Brice



Le 29/11/2016 08:02, Baoquan He a écrit :
> Sorry, Brice. This has already been reported, and it has been fixed by a
> later post. The commits in Linus's tree are:
>
> commit 6df77862f63f389df3b1ad879738e04440d7385d
> Author: Baoquan He <bhe@redhat.com>
> Date:   Sun Nov 13 13:01:33 2016 +0800
>
>     bnx2: Wait for in-flight DMA to complete at probe stage
>
> commit 5d0d4b91bf627f14f95167b738d524156c9d440b
> Author: Baoquan He <bhe@redhat.com>
> Date:   Sun Nov 13 13:01:32 2016 +0800
>
>     Revert "bnx2: Reset device during driver initialization"
>     
>     This reverts commit 3e1be7ad2d38c6bd6aeef96df9bd0a7822f4e51c.
>
> I believe both of them have also been picked up into the 4.8-stable kernel.
> Please find a way to get them.
>
> Sorry again!
>
> Thanks
> Baoquan
>
>
> On 11/29/16 at 07:57am, Brice Goglin wrote:
>> Hello
>>
>> My Dell PowerEdge R815 doesn't have IPMI anymore when I boot a 4.8
>> kernel, the BMC doesn't even ping anymore. Its Ethernet devices are 4 of
>> those:
>>
>> 01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
>> 	DeviceName: Embedded NIC 1                          
>> 	Subsystem: Dell NetXtreme II BCM5709 Gigabit Ethernet
>> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>> 	Latency: 0, Cache Line Size: 64 bytes
>> 	Interrupt: pin A routed to IRQ 42
>> 	Region 0: Memory at e6000000 (64-bit, non-prefetchable) [size=32M]
>> 	Capabilities: <access denied>
>> 	Kernel driver in use: bnx2
>> 	Kernel modules: bnx2
>>
>> The only change in bnx2 between 4.7 and 4.8 appears to be this one:
>>
>> commit 3e1be7ad2d38c6bd6aeef96df9bd0a7822f4e51c
>> Author: Baoquan He <bhe@redhat.com>
>> Date:   Fri Sep 9 22:43:12 2016 +0800
>>
>>     bnx2: Reset device during driver initialization
>>
>> Could your patch actually break the BMC? What do I need to further debug
>> this issue?
>>
>> Thanks
>> Brice
>>

^ permalink raw reply

* Re: [PATCH v2] ethernet :mellanox :mlx4: Replace pci_pool_alloc by pci_pool_zalloc
From: Souptick Joarder @ 2016-11-29  7:25 UTC (permalink / raw)
  To: Sergei Shtylyov, yishaih-VPRAkNaXOzVWk0Htik3J/w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Rameshwar Sahu
In-Reply-To: <20161129065931.GA3245@gnr743-HP-ZBook-15>

Please ignore this v2 patch.

On Tue, Nov 29, 2016 at 12:29 PM, Souptick Joarder <jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> In mlx4_alloc_cmd_mailbox(), pci_pool_alloc() followed by memset will be
> replaced by pci_pool_zalloc().
>
> Signed-off-by: Souptick joarder <jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
> v2:
>   - Address comment from sergei
>     Alignment was not proper
>
>  drivers/net/ethernet/mellanox/mlx4/cmd.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
> index e36bebc..96cdf9a 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
> @@ -2679,14 +2679,13 @@ struct mlx4_cmd_mailbox *mlx4_alloc_cmd_mailbox(struct mlx4_dev *dev)
>         if (!mailbox)
>                 return ERR_PTR(-ENOMEM);
>
> -       mailbox->buf = pci_pool_alloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
> -                                     &mailbox->dma);
> +       mailbox->buf = pci_pool_zalloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
> +                                      &mailbox->dma);
>         if (!mailbox->buf) {
>                 kfree(mailbox);
>                 return ERR_PTR(-ENOMEM);
>         }
>
> -       memset(mailbox->buf, 0, MLX4_MAILBOX_SIZE);
>
>         return mailbox;
>  }
> --
> 1.9.1
>

^ permalink raw reply

* RE: [PATCH] net: brocade: bna: use new api ethtool_{get|set}_link_ksettings
From: Mody, Rasesh @ 2016-11-29  7:37 UTC (permalink / raw)
  To: Philippe Reynes, Kalluru, Sudarsana, Dept-GE Linux NIC Dev,
	davem@davemloft.net
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <1480373539-3257-1-git-send-email-tremyfr@gmail.com>

> From: Philippe Reynes [mailto:tremyfr@gmail.com]
> Sent: Monday, November 28, 2016 2:52 PM
> 
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>

Acked-by: Rasesh Mody <Rasesh.Mody@cavium.com> 

> ---
>  drivers/net/ethernet/brocade/bna/bnad_ethtool.c |   54 +++++++++++++--
> --------
>  1 files changed, 30 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
> b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
> index 31f61a7..2865939 100644
> --- a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
> +++ b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c

^ permalink raw reply

* Re: [PATCH net] openvswitch: Fix skb leak in IPv6 reassembly.
From: Pravin Shelar @ 2016-11-29  7:39 UTC (permalink / raw)
  To: Daniele Di Proietto
  Cc: Linux Kernel Network Developers, Florian Westphal, Joe Stringer
In-Reply-To: <20161128234353.4262-1-diproiettod@ovn.org>

On Mon, Nov 28, 2016 at 3:43 PM, Daniele Di Proietto
<diproiettod@ovn.org> wrote:
> If nf_ct_frag6_gather() returns an error other than -EINPROGRESS, it
> means that we still have a reference to the skb.  We should free it
> before returning from handle_fragments, as stated in the comment above.
>
> Fixes: daaa7d647f81 ("netfilter: ipv6: avoid nf_iterate recursion")
> CC: Florian Westphal <fw@strlen.de>
> CC: Pravin B Shelar <pshelar@ovn.org>
> CC: Joe Stringer <joe@ovn.org>
> Signed-off-by: Daniele Di Proietto <diproiettod@ovn.org>
> ---
>  net/openvswitch/conntrack.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index 31045ef..fecefa2 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -370,8 +370,11 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key,
>                 skb_orphan(skb);
>                 memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
>                 err = nf_ct_frag6_gather(net, skb, user);
> -               if (err)
> +               if (err) {
> +                       if (err != -EINPROGRESS)
> +                               kfree_skb(skb);
>                         return err;
> +               }
>

This fixes the code, but the patch adds yet another kfree_skb() in the
conntrack code. We could simplify it by reusing the error handling in
do_execute_actions().
If you think that is too complicated for the stable branch, I am fine with
this patch going in as it is.

^ permalink raw reply

* Re: [PATCH v2] vxlan: fix a potential issue when create a new vxlan fdb entry.
From: Jiri Benc @ 2016-11-29  8:20 UTC (permalink / raw)
  To: Haishuang Yan
  Cc: David S. Miller, Hannes Frederic Sowa, Pravin B Shelar, netdev,
	linux-kernel
In-Reply-To: <1480384776-8252-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

On Tue, 29 Nov 2016 09:59:36 +0800, Haishuang Yan wrote:
> vxlan_fdb_append may return error, so add the proper check,
> otherwise it will cause memory leak.
> 
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
> 
> Changes in v2:
>   - Unnecessary to initialize rc to zero.

Acked-by: Jiri Benc <jbenc@redhat.com>

^ permalink raw reply

* [net-next] neigh: remove duplicate check for same neigh
From: Zhang Shengju @ 2016-11-29  8:22 UTC (permalink / raw)
  To: netdev, dsa

Currently the loop index 'idx' is used as the index into the neigh list of
interest. It is incremented only for neighs that pass the filters, so it is
not the absolute index in the list. Because nothing records which neighs
were already scanned by a previous loop, the filtered-out neighs are scanned
multiple times.

This patch makes idx the absolute index in the list: it is incremented
whether or not the neigh is filtered, which prevents the problem above.

This is also in line with other dump functions.
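
The resume scheme described above can be sketched in userspace as follows.
This is only an illustration under simplifying assumptions: an array stands
in for the list of one hash bucket, and struct entry, dump_chunk() and the
room limit are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct entry {
	int value;
	bool filtered;	/* true: entry must not be dumped */
};

/* Copy up to room unfiltered values into out, starting at absolute
 * position *resume, and store in *resume where the next call has to
 * start. Returns the number of values written.
 */
size_t dump_chunk(const struct entry *tab, size_t n, size_t *resume,
		  int *out, size_t room)
{
	size_t idx, written = 0;

	assert(tab && resume && out);

	/* Walk from 0 to mirror a list walk; idx counts every entry. */
	for (idx = 0; idx < n; idx++) {
		if (idx < *resume || tab[idx].filtered)
			continue;	/* handled earlier, or filtered out */
		if (written == room)
			break;		/* out of space: resume at idx */
		out[written++] = tab[idx].value;
	}
	*resume = idx;
	return written;
}
```

Because idx counts every entry, a resumed call skips the entries before
*resume with a single integer comparison and never evaluates the filter
on them again.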

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
---
 net/core/neighbour.c | 39 ++++++++++++++++++---------------------
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 2ae929f..ce32e9c 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2256,6 +2256,16 @@ static bool neigh_ifindex_filtered(struct net_device *dev, int filter_idx)
 	return false;
 }
 
+static bool neigh_dump_filtered(struct net_device *dev, int filter_idx,
+		int filter_master_idx)
+{
+	if (neigh_ifindex_filtered(dev, filter_idx) ||
+	    neigh_master_filtered(dev, filter_master_idx))
+		return true;
+
+	return false;
+}
+
 static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 			    struct netlink_callback *cb)
 {
@@ -2285,20 +2295,15 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 	rcu_read_lock_bh();
 	nht = rcu_dereference_bh(tbl->nht);
 
-	for (h = s_h; h < (1 << nht->hash_shift); h++) {
-		if (h > s_h)
-			s_idx = 0;
+	for (h = s_h; h < (1 << nht->hash_shift); h++, s_idx = 0) {
 		for (n = rcu_dereference_bh(nht->hash_buckets[h]), idx = 0;
 		     n != NULL;
-		     n = rcu_dereference_bh(n->next)) {
-			if (!net_eq(dev_net(n->dev), net))
-				continue;
-			if (neigh_ifindex_filtered(n->dev, filter_idx))
+		     n = rcu_dereference_bh(n->next), idx++) {
+			if (idx < s_idx || !net_eq(dev_net(n->dev), net))
 				continue;
-			if (neigh_master_filtered(n->dev, filter_master_idx))
+			if (neigh_dump_filtered(n->dev, filter_idx,
+						filter_master_idx))
 				continue;
-			if (idx < s_idx)
-				goto next;
 			if (neigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
 					    cb->nlh->nlmsg_seq,
 					    RTM_NEWNEIGH,
@@ -2306,8 +2311,6 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 				rc = -1;
 				goto out;
 			}
-next:
-			idx++;
 		}
 	}
 	rc = skb->len;
@@ -2328,14 +2331,10 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 
 	read_lock_bh(&tbl->lock);
 
-	for (h = s_h; h <= PNEIGH_HASHMASK; h++) {
-		if (h > s_h)
-			s_idx = 0;
-		for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next) {
-			if (pneigh_net(n) != net)
+	for (h = s_h; h <= PNEIGH_HASHMASK; h++, s_idx = 0) {
+		for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next, idx++) {
+			if (idx < s_idx || pneigh_net(n) != net)
 				continue;
-			if (idx < s_idx)
-				goto next;
 			if (pneigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
 					    cb->nlh->nlmsg_seq,
 					    RTM_NEWNEIGH,
@@ -2344,8 +2343,6 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 				rc = -1;
 				goto out;
 			}
-		next:
-			idx++;
 		}
 	}
 
-- 
1.8.3.1

^ permalink raw reply related

