Netdev List
* Re: [PATCH net-next 1/2] net: phy: add extension of phy-mode for XLGMII
From: Andrew Lunn @ 2016-12-09 16:39 UTC (permalink / raw)
  To: Jie Deng
  Cc: Florian Fainelli, davem, netdev, linux-kernel, CARLOS.PALMINHA,
	lars.persson, thomas.lendacky
In-Reply-To: <d42cbc77-1409-281a-161f-cf9c85443369@synopsys.com>

On Fri, Dec 09, 2016 at 01:19:07PM +0800, Jie Deng wrote:
> 
> 
> On 2016/12/9 6:15, Florian Fainelli wrote:
> > On 12/06/2016 07:57 PM, Jie Deng wrote:
> >> This patch adds phy-mode support for Synopsys XLGMAC
> > The functional changes look good, but I would like to see some
> > description of what the XL part stands for here.
> >
> > While you are modifying this, do you also mind submitting a Device Tree
> > specification change:
> >
> > https://www.devicetree.org/specifications/
> >
> > Thanks!
> Thank you for the information.
> 
> Currently, the XLGMAC is a new IP from Synopsys.

I think Florian wants to know about the IEEE standard or whatever
which defines what the phy-mode XLGMAC is, in the same way there are
standards for RGMII, SGMII, etc.

	  Andrew


* Re: [PATCH v2 net-next 0/4] udp: receive path optimizations
From: Eric Dumazet @ 2016-12-09 16:26 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, David S . Miller, netdev, Paolo Abeni
In-Reply-To: <20161209170509.25347c9b@redhat.com>

On Fri, 2016-12-09 at 17:05 +0100, Jesper Dangaard Brouer wrote:
> On Thu, 08 Dec 2016 13:13:15 -0800
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > On Thu, 2016-12-08 at 21:48 +0100, Jesper Dangaard Brouer wrote:
> > > On Thu,  8 Dec 2016 09:38:55 -0800
> > > Eric Dumazet <edumazet@google.com> wrote:
> > >   
> > > > This patch series provides about 100 % performance increase under flood.   
> > > 
> > > Could you please explain a bit more about what kind of testing you are
> > > doing that can show 100% performance improvement?
> > > 
> > > I've tested this patchset and my tests show *huge* speedups, but
> > > reaping the performance benefit depends heavily on setup and enabling
> > > the right UDP socket settings, and most importantly where the
> > > performance bottleneck is: ksoftirqd(producer) or udp_sink(consumer).  
> > 
> > Right.
> > 
> > So here at Google we do not try (yet) to downgrade our expensive
> > Multiqueue Nics into dumb NICS from last decade by using a single queue
> > on them. Maybe it will happen when we can process 10Mpps per core,
> > but we are not there yet  ;)
> > 
> > So my test is using a NIC, programmed with 8 queues, on a dual-socket
> > machine. (2 physical packages)
> > 
> > 4 queues are handled by 4 cpus on socket0 (NUMA node 0)
> > 4 queues are handled by 4 cpus on socket1 (NUMA node 1)
> 
> Interesting setup, it will be good to catch cache-line bouncing and
> false-sharing, which the streak of recent patches shows ;-) (Hopefully
> such setups are avoided for production).

Well, if you have 100Gbit NIC, and 2 NUMA nodes, what do you suggest
exactly, when jobs run on both nodes ?

If you suggest to remove one package, or force jobs to run on Socket0,
just because the NIC is attached to it, it won't be an option.

Most of the traffic is TCP, so RSS comes nicely here to affine traffic
on one RX queue of the NIC.

Now, if for some reason an innocent UDP socket is the target of a flood,
we need to not make all cpus blocked in a spinlock to eventually queue a
packet.

Be assured that high performance UDP servers use kernel bypass, or
SO_REUSEPORT already. My effort is not targeting these special users,
since they already have good performance.

My effort is to provide some isolation, a bit like the effort I did for
SYN flood attacks (Cpus were all spinning on listener spinlock)




> 
> 
> > So I explicitly put my poor single thread UDP application in the worst
> > condition, having skbs produced on two NUMA nodes. 
> 
> On which CPU do you place the single thread UDP application?

It does not matter in this case. You can either force it to run on a
group of cpus, or let the scheduler choose.

If you let the scheduler choose, then it might help in the single tuple
flood attack case, since the user thread will be moved to a different cpu
than the ksoftirqd.

> 
> E.g. do you allow it to run on a CPU that also processes ksoftirq?
> My experience is that performance is approx half, if ksoftirq and
> UDP-thread share a CPU (after you fixed the softirq issue).

Well, this is exactly what I said earlier. Your choices about cpu
pinning might help or might hurt in different scenarios.

> 
> 
> > Then my load generator uses trafgen, with spoofed UDP source addresses,
> > like a UDP flood would use. Or typical DNS traffic, malicious or not.
> 
> I also like trafgen
>  https://github.com/netoptimizer/network-testing/tree/master/trafgen
> 
> > So I have 8 cpus all trying to queue packets in a single UDP socket.
> > 
> > Of course, a real high performance server would use 8 UDP sockets, and
> > SO_REUSEPORT with nice eBPF filter to spread the packets based on the
> > queue/cpu they arrived.
> 
> Once the ksoftirq and UDP-threads are silo'ed like that, it should
> basically correspond to the benchmarks of my single queue test,
> multiplied by the number of CPUs/UDP-threads.

Well, if one cpu is shared by the producer and consumer then packets are
hot in caches, so trying to avoid cache line misses as I did is not
really helping.

I optimized the case where we do not assume both parties run on the same
cpu. If you let the process scheduler do its job, then your throughput can
be doubled ;)

Now if for some reason you are stuck with a single CPU, this is a very
different problem, and af_packet might be better.


> 
> I think it might be a good idea (for me) to implement such a
> UDP-multi-threaded sink example program (with SO_REUSEPORT and eBPF
> filter) to demonstrate and make sure the stack scales (and every
> time we/I improve single queue performance, the numbers should multiply
> with the scaling). Maybe you already have such an example program?


Well, I do have something using SO_REUSEPORT, but not yet BPF, so not in
a state I can share at this moment.


* Re: [PATCH V2  00/22] Broadcom RoCE Driver (bnxt_re)
From: Leon Romanovsky @ 2016-12-09 16:27 UTC (permalink / raw)
  To: Selvin Xavier; +Cc: dledford, linux-rdma, netdev
In-Reply-To: <1481266096-23331-1-git-send-email-selvin.xavier@broadcom.com>

[-- Attachment #1: Type: text/plain, Size: 1178 bytes --]

On Thu, Dec 08, 2016 at 10:47:54PM -0800, Selvin Xavier wrote:
> 

...

>  create mode 100644 include/uapi/rdma/bnxt_re_uverbs_abi.h

Please use the already established naming format for this file.
It will simplify our future integration with rdma-core library.

Thanks

➜  linux-rdma git:(master) ls -l include/uapi/rdma/*-abi.h 
-rw-r--r-- 1 leonro leonro 2291 Dec  7 13:07 include/uapi/rdma/cxgb3-abi.h
-rw-r--r-- 1 leonro leonro 2488 Dec  7 13:07 include/uapi/rdma/cxgb4-abi.h
-rw-r--r-- 1 leonro leonro 2864 Dec  7 13:07 include/uapi/rdma/mlx4-abi.h
-rw-r--r-- 1 leonro leonro 6103 Dec  8 12:52 include/uapi/rdma/mlx5-abi.h
-rw-r--r-- 1 leonro leonro 2932 Dec  7 13:07 include/uapi/rdma/mthca-abi.h
-rw-r--r-- 1 leonro leonro 3380 Dec  7 13:07 include/uapi/rdma/nes-abi.h
-rw-r--r-- 1 leonro leonro 3918 Dec  7 13:07 include/uapi/rdma/ocrdma-abi.h
-rw-r--r-- 1 leonro leonro 2559 Dec  7 13:07 include/uapi/rdma/qedr-abi.h

> 
> -- 
> 2.5.5
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: stmmac DT property snps,axi_all
From: Alexandre Torgue @ 2016-12-09 16:06 UTC (permalink / raw)
  To: Niklas Cassel, Giuseppe Cavallaro; +Cc: netdev
In-Reply-To: <e0362693-4ae9-b3d6-3955-c72df7a1b0c0@axis.com>

Hi Niklas

On 12/09/2016 10:53 AM, Niklas Cassel wrote:
> On 12/09/2016 10:20 AM, Niklas Cassel wrote:
>> On 12/08/2016 02:36 PM, Alexandre Torgue wrote:
>>> Hi Niklas,
>>>
>>> On 12/05/2016 05:18 PM, Niklas Cassel wrote:
>>>> Hello Giuseppe
>>>>
>>>>
>>>> I'm trying to figure out what snps,axi_all is supposed to represent.
>>>>
>>>> It appears that the value is saved, but never used in the code.
>>>>
>>>> Looking at the register specification, I'm guessing that it represents
>>>> Address-Aligned Beats, but there is already the property snps,aal
>>>> for that.
>>> IMO, it is not useful. Indeed AXI_AAL is a read only bit (in AXI bus mode register) and reflects the aal bit in DMA bus register.
>>> As you know we use "snps,aal" to set aal bit in DMA bus register.
>>> So "snps,axi_all" entry seems useless. Let's see with Peppe.
>> Ok, I see. GMAC and GMAC4 are different here.
>>
>> For GMAC4 AAL only exists in DMA_SYS_BUS_MODE.
>> It's not reflected anywhere else.
>>
>> The code is correct in the driver.
>>
>> If snps,axi_all is just created for a read-only register,
>> and it is currently never used in the code,
>> while we have snps,aal, which is correct and works,
>> I guess it should be ok to remove snps,axi_all.
>>
>> I can cook up a patch.
>>
>
> Here we go :)
>
> I will send it as a real patch once net-next reopens.

Thanks ;). Just check with Peppe next week (as he added this property
in the past).

Regards
Alex

>
>
> From defc01cb7c22611b89d9cf1fcae72544092bd62c Mon Sep 17 00:00:00 2001
> From: Niklas Cassel <niklas.cassel@axis.com>
> Date: Fri, 9 Dec 2016 10:27:00 +0100
> Subject: [PATCH net-next] net: stmmac: remove unused duplicate property
>  snps,axi_all
>
> For core revision 3.x Address-Aligned Beats is available in two registers.
> The DT property snps,aal was created for AAL in the DMA bus register,
> which is a read/write bit.
> The DT property snps,axi_all was created for AXI_AAL in the AXI bus mode
> register, which is a read only bit that reflects the value of AAL in the
> DMA bus register.
>
> Since the value of snps,axi_all is never used in the driver,
> and since the property was created for a bit that is read only,
> it should be safe to remove the property.
>
> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
> ---
>  Documentation/devicetree/bindings/net/stmmac.txt      | 1 -
>  drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 1 -
>  include/linux/stmmac.h                                | 1 -
>  3 files changed, 3 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt
> index 128da752fec9..c3d2fd480a1b 100644
> --- a/Documentation/devicetree/bindings/net/stmmac.txt
> +++ b/Documentation/devicetree/bindings/net/stmmac.txt
> @@ -65,7 +65,6 @@ Optional properties:
>      - snps,wr_osr_lmt: max write outstanding req. limit
>      - snps,rd_osr_lmt: max read outstanding req. limit
>      - snps,kbbe: do not cross 1KiB boundary.
> -    - snps,axi_all: align address
>      - snps,blen: this is a vector of supported burst length.
>      - snps,fb: fixed-burst
>      - snps,mb: mixed-burst
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> index 082cd48db6a7..60ba8993c650 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
> @@ -121,7 +121,6 @@ static struct stmmac_axi *stmmac_axi_setup(struct platform_device *pdev)
>      axi->axi_lpi_en = of_property_read_bool(np, "snps,lpi_en");
>      axi->axi_xit_frm = of_property_read_bool(np, "snps,xit_frm");
>      axi->axi_kbbe = of_property_read_bool(np, "snps,axi_kbbe");
> -    axi->axi_axi_all = of_property_read_bool(np, "snps,axi_all");
>      axi->axi_fb = of_property_read_bool(np, "snps,axi_fb");
>      axi->axi_mb = of_property_read_bool(np, "snps,axi_mb");
>      axi->axi_rb =  of_property_read_bool(np, "snps,axi_rb");
> diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
> index 266dab9ad782..889e0e9a3f1c 100644
> --- a/include/linux/stmmac.h
> +++ b/include/linux/stmmac.h
> @@ -103,7 +103,6 @@ struct stmmac_axi {
>      u32 axi_wr_osr_lmt;
>      u32 axi_rd_osr_lmt;
>      bool axi_kbbe;
> -    bool axi_axi_all;
>      u32 axi_blen[AXI_BLEN];
>      bool axi_fb;
>      bool axi_mb;
>


* Re: [PATCH v2 net-next 0/4] udp: receive path optimizations
From: Jesper Dangaard Brouer @ 2016-12-09 16:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, David S . Miller, netdev, Paolo Abeni, brouer
In-Reply-To: <1481231595.4930.142.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, 08 Dec 2016 13:13:15 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thu, 2016-12-08 at 21:48 +0100, Jesper Dangaard Brouer wrote:
> > On Thu,  8 Dec 2016 09:38:55 -0800
> > Eric Dumazet <edumazet@google.com> wrote:
> >   
> > > This patch series provides about 100 % performance increase under flood.   
> > 
> > Could you please explain a bit more about what kind of testing you are
> > doing that can show 100% performance improvement?
> > 
> > I've tested this patchset and my tests show *huge* speedups, but
> > reaping the performance benefit depends heavily on setup and enabling
> > the right UDP socket settings, and most importantly where the
> > performance bottleneck is: ksoftirqd(producer) or udp_sink(consumer).  
> 
> Right.
> 
> So here at Google we do not try (yet) to downgrade our expensive
> Multiqueue Nics into dumb NICS from last decade by using a single queue
> on them. Maybe it will happen when we can process 10Mpps per core,
> but we are not there yet  ;)
> 
> So my test is using a NIC, programmed with 8 queues, on a dual-socket
> machine. (2 physical packages)
> 
> 4 queues are handled by 4 cpus on socket0 (NUMA node 0)
> 4 queues are handled by 4 cpus on socket1 (NUMA node 1)

Interesting setup, it will be good to catch cache-line bouncing and
false-sharing, which the streak of recent patches shows ;-) (Hopefully
such setups are avoided for production).


> So I explicitly put my poor single thread UDP application in the worst
> condition, having skbs produced on two NUMA nodes. 

On which CPU do you place the single thread UDP application?

E.g. do you allow it to run on a CPU that also processes ksoftirq?
My experience is that performance is approx half, if ksoftirq and
UDP-thread share a CPU (after you fixed the softirq issue).


> Then my load generator uses trafgen, with spoofed UDP source addresses,
> like a UDP flood would use. Or typical DNS traffic, malicious or not.

I also like trafgen
 https://github.com/netoptimizer/network-testing/tree/master/trafgen

> So I have 8 cpus all trying to queue packets in a single UDP socket.
> 
> Of course, a real high performance server would use 8 UDP sockets, and
> SO_REUSEPORT with nice eBPF filter to spread the packets based on the
> queue/cpu they arrived.

Once the ksoftirq and UDP-threads are silo'ed like that, it should
basically correspond to the benchmarks of my single queue test,
multiplied by the number of CPUs/UDP-threads.

I think it might be a good idea (for me) to implement such a
UDP-multi-threaded sink example program (with SO_REUSEPORT and eBPF
filter) to demonstrate and make sure the stack scales (and every
time we/I improve single queue performance, the numbers should multiply
with the scaling). Maybe you already have such an example program?


> In the case you have one cpu that you need to share between ksoftirq and
> all user threads, then your test results depend on process scheduler
> decisions more than anything we can code in network land.

Yes, that is also my experience: the scheduler has a large influence.
 
> It is actually easy for user space to get more than 50% of the cycles,
> and 'starve' ksoftirqd.

FYI, Paolo recently added an option for parsing of pktgen payload in
the udp_sink.c program, this way we can simulate the app doing something.

I've started testing with 4 CPUs doing ksoftirq, multiple flows
(pktgen_sample04_many_flows.sh), and then adding an increasing number of
udp_sink --reuse-port programs on the other 4 CPUs, and it looks like it
scales nicely :-)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* [PATCH net-next] net: skb_condense() can also deal with empty skbs
From: Eric Dumazet @ 2016-12-09 16:02 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

It seems attackers can also send UDP packets with no payload at all.

skb_condense() can still be a win in this case.

It will be possible to replace the custom code in tcp_add_backlog()
to get full benefit from skb_condense()

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/skbuff.c |   22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 84151cf40aebb973bad5bee3ee4be0758084d83c..b1451e66d570269252ce628b2dc1714b860e1ca4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4946,16 +4946,20 @@ EXPORT_SYMBOL(pskb_extract);
  */
 void skb_condense(struct sk_buff *skb)
 {
-	if (!skb->data_len ||
-	    skb->data_len > skb->end - skb->tail ||
-	    skb_cloned(skb))
-		return;
-
-	/* Nice, we can free page frag(s) right now */
-	__pskb_pull_tail(skb, skb->data_len);
+	if (skb->data_len) {
+		if (skb->data_len > skb->end - skb->tail ||
+		    skb_cloned(skb))
+			return;
 
-	/* Now adjust skb->truesize, since __pskb_pull_tail() does
-	 * not do this.
+		/* Nice, we can free page frag(s) right now */
+		__pskb_pull_tail(skb, skb->data_len);
+	}
+	/* At this point, skb->truesize might be over estimated,
+	 * because skb had a fragment, and fragments do not tell
+	 * their truesize.
+	 * When we pulled its content into skb->head, fragment
+	 * was freed, but __pskb_pull_tail() could not possibly
+	 * adjust skb->truesize, not knowing the frag truesize.
 	 */
 	skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
 }
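
The control flow of the new skb_condense() can be modelled in userspace
as follows. This is only a sketch under stated assumptions: struct toy_skb,
toy_condense() and TOY_OVERHEAD are hypothetical stand-ins, the pull of
fragment data is reduced to bookkeeping, and the overhead constant is not
the value SKB_TRUESIZE() uses in the kernel. It only shows that truesize is
now recomputed even when data_len is zero, while the two early-return
conditions still apply when fragments are present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed fixed per-buffer overhead for the model; not a kernel value. */
#define TOY_OVERHEAD 64

/* Hypothetical stand-in for the few sk_buff fields the patch looks at. */
struct toy_skb {
	size_t head_size;	/* size of linear area, models skb_end_offset() */
	size_t tail;		/* bytes of the linear area already in use */
	size_t data_len;	/* bytes still held in page fragments */
	size_t truesize;	/* memory accounted for this buffer */
	bool cloned;		/* models skb_cloned() */
};

void toy_condense(struct toy_skb *skb)
{
	assert(skb != NULL && skb->tail <= skb->head_size);

	if (skb->data_len) {
		/* Fragments do not fit in the tailroom, or the buffer is
		 * shared: leave everything untouched.
		 */
		if (skb->data_len > skb->head_size - skb->tail ||
		    skb->cloned)
			return;

		/* Models __pskb_pull_tail(): fragment bytes move into
		 * the linear area and the fragments go away.
		 */
		skb->tail += skb->data_len;
		skb->data_len = 0;
	}
	/* Reached for empty payloads too: account the linear area only. */
	skb->truesize = TOY_OVERHEAD + skb->head_size;
}
```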


* Re: Synopsys Ethernet QoS
From: Joao Pinto @ 2016-12-09 15:54 UTC (permalink / raw)
  To: David Miller, Joao.Pinto
  Cc: peppe.cavallaro, lars.persson, rabin.vincent, netdev,
	andy.shevchenko, CARLOS.PALMINHA
In-Reply-To: <20161209.104152.1969880574279771010.davem@davemloft.net>

At 3:41 PM on 12/9/2016, David Miller wrote:
> From: Joao Pinto <Joao.Pinto@synopsys.com>
> Date: Fri, 9 Dec 2016 15:36:38 +0000
> 
>> Of course, I started a general discussion about the subject and
>> those were the conclusions, but I would like to know if you as the
>> subsystem maintainer also support the approach or have any
>> suggestion.
> 
> Generally, I support whatever the interested parties agree to.
> 
> But one thing I am against is changing the driver name for existing
> users.  If an existing chip is supported by the stmmac driver for
> existing users, they should still continue to use the "stmmac" driver.
> 
> Therefore, if consolidation changes the driver module name for
> existing users, then that is not a good plan at all.
> 

Of course, I am 100% with you! Backward compatibility for existing drivers is a
must-have. The consolidation is going to be done with extreme care.

Joao


* Re: Synopsys Ethernet QoS
From: David Miller @ 2016-12-09 15:41 UTC (permalink / raw)
  To: Joao.Pinto
  Cc: peppe.cavallaro, lars.persson, rabin.vincent, netdev,
	andy.shevchenko, CARLOS.PALMINHA
In-Reply-To: <93b73b79-36aa-56b8-f975-b890b7a48bd1@synopsys.com>

From: Joao Pinto <Joao.Pinto@synopsys.com>
Date: Fri, 9 Dec 2016 15:36:38 +0000

> Of course, I started a general discussion about the subject and
> those were the conclusions, but I would like to know if you as the
> subsystem maintainer also support the approach or have any
> suggestion.

Generally, I support whatever the interested parties agree to.

But one thing I am against is changing the driver name for existing
users.  If an existing chip is supported by the stmmac driver for
existing users, they should still continue to use the "stmmac" driver.

Therefore, if consolidation changes the driver module name for
existing users, then that is not a good plan at all.


* Re: Synopsys Ethernet QoS
From: Joao Pinto @ 2016-12-09 15:36 UTC (permalink / raw)
  To: David Miller, Joao.Pinto
  Cc: peppe.cavallaro, lars.persson, rabin.vincent, netdev,
	andy.shevchenko, CARLOS.PALMINHA
In-Reply-To: <20161209.103327.1742213347114742435.davem@davemloft.net>

Hi David,

Of course, I started a general discussion about the subject and those were the
conclusions, but I would like to know if you as the subsystem maintainer also
support the approach or have any suggestion.

Thanks,
Joao

At 3:33 PM on 12/9/2016, David Miller wrote:
> From: Joao Pinto <Joao.Pinto@synopsys.com>
> Date: Fri, 9 Dec 2016 11:29:02 +0000
> 
>> Dear David Miller,
>  ...
>> I would like to know if you support this plan.
> 
> This is not how this works.
> 
> You need to discuss and work out a plan with the other people
> with a direct interest in the existing drivers and maintenance.
> 
> Not me.
> 


* Re: Synopsys Ethernet QoS
From: David Miller @ 2016-12-09 15:33 UTC (permalink / raw)
  To: Joao.Pinto
  Cc: peppe.cavallaro, lars.persson, rabin.vincent, netdev,
	andy.shevchenko, CARLOS.PALMINHA
In-Reply-To: <2df7a6dd-1128-d1d6-bf61-891f76cf7200@synopsys.com>

From: Joao Pinto <Joao.Pinto@synopsys.com>
Date: Fri, 9 Dec 2016 11:29:02 +0000

> Dear David Miller,
 ...
> I would like to know if you support this plan.

This is not how this works.

You need to discuss and work out a plan with the other people
with a direct interest in the existing drivers and maintenance.

Not me.


* Re: [PATCHv3 perf/core 0/7] Reuse libbpf from samples/bpf
From: Daniel Borkmann @ 2016-12-09 15:30 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Joe Stringer
  Cc: linux-kernel, netdev, wangnan0, ast
In-Reply-To: <20161209150907.GM8257@kernel.org>

Hi Arnaldo,

On 12/09/2016 04:09 PM, Arnaldo Carvalho de Melo wrote:
> On Thu, Dec 08, 2016 at 06:46:13PM -0800, Joe Stringer wrote:
>> (Was "libbpf: Synchronize implementations")
>>
>> Update tools/lib/bpf to provide the remaining bpf wrapper pieces needed by the
>> samples/bpf/ code, then get rid of all of the duplicate BPF libraries in
>> samples/bpf/libbpf.[ch].
>>
>> ---
>> v3: Add ack for first patch.
>>      Split out second patch from v2 into separate changes for remaining diff.
>>      Add patches to switch samples/bpf over to using tools/lib/.
>> v2: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html
>>      Don't shift non-bpf code into libbpf.
>>      Drop the patch to synchronize ELF definitions with tc.
>> v1: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html
>>      First post.
>
> Thanks, applied after addressing the -I$(objtree) issue raised by Wang,

[ Sorry for late reply. ]

First of all, glad to see us getting rid of the duplicate lib eventually! :)

Please note that this might result in hopefully just a minor merge issue
with net-next. Looks like patch 4/7 touches test_maps.c and test_verifier.c,
which moved to a new bpf selftest suite [1] this net-next cycle. Seems it's
just log buffer and some renames there, which can be discarded for both
files sitting in selftests.

Thanks,
Daniel

   [1] https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/tree/tools/testing/selftests/bpf


* Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)
From: David Miller @ 2016-12-09 15:26 UTC (permalink / raw)
  To: selvin.xavier; +Cc: dledford, linux-rdma, netdev
In-Reply-To: <1481266096-23331-1-git-send-email-selvin.xavier@broadcom.com>

From: Selvin Xavier <selvin.xavier@broadcom.com>
Date: Thu,  8 Dec 2016 22:47:54 -0800

> This series introduces the RoCE driver for the Broadcom
> NetXtreme-E 10/25/40/50 gigabit RoCE HCAs. 
> This driver is dependent on the bnxt_en NIC driver and is 
> based on the bnxt_re branch in Doug's repository. bnxt_en changes
> required for this patch series are already available in this branch.
> 
> I am preparing a git repository with these changes as per Jason's
> comment and will share the details later today.

If this is targeted at the net-next tree, it is too late as I've
closed the net-next tree two nights ago.

Please resubmit this after the upcoming merge window closes.

Thanks.


* Re: [PATCH 37/50] netfilter: nf_tables: atomic dump and reset for stateful objects
From: Eric Dumazet @ 2016-12-09 15:22 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Paul Gortmaker, netfilter-devel, David Miller, netdev,
	linux-next@vger.kernel.org
In-Reply-To: <1481293492.4930.168.camel@edumazet-glaptop3.roam.corp.google.com>

On Fri, 2016-12-09 at 06:24 -0800, Eric Dumazet wrote:

> It looks like you want a seqcount, even on 64bit arches,
> so that CPU 2 can restart its loop, and more importantly you need
> to not accumulate the values you read, because they might be old/invalid.

Untested patch to give the general idea. I can polish it a bit later today.

 net/netfilter/nft_counter.c |   59 +++++++++++++---------------------
 1 file changed, 23 insertions(+), 36 deletions(-)

diff --git a/net/netfilter/nft_counter.c b/net/netfilter/nft_counter.c
index f6a02c5071c2aeafca7635da3282a809aa04d6ab..57ed95b024473a2aa76298fe5bb5013bf709801b 100644
--- a/net/netfilter/nft_counter.c
+++ b/net/netfilter/nft_counter.c
@@ -31,18 +31,25 @@ struct nft_counter_percpu_priv {
 	struct nft_counter_percpu __percpu *counter;
 };
 
+static DEFINE_PER_CPU(seqcount_t, nft_counter_seq);
+
 static inline void nft_counter_do_eval(struct nft_counter_percpu_priv *priv,
 				       struct nft_regs *regs,
 				       const struct nft_pktinfo *pkt)
 {
 	struct nft_counter_percpu *this_cpu;
+	seqcount_t *myseq;
 
 	local_bh_disable();
 	this_cpu = this_cpu_ptr(priv->counter);
-	u64_stats_update_begin(&this_cpu->syncp);
+	myseq = this_cpu_ptr(&nft_counter_seq);
+
+	write_seqcount_begin(myseq);
+
 	this_cpu->counter.bytes += pkt->skb->len;
 	this_cpu->counter.packets++;
-	u64_stats_update_end(&this_cpu->syncp);
+
+	write_seqcount_end(myseq);
 	local_bh_enable();
 }
 
@@ -110,52 +117,30 @@ static void nft_counter_fetch(struct nft_counter_percpu __percpu *counter,
 
 	memset(total, 0, sizeof(*total));
 	for_each_possible_cpu(cpu) {
+		seqcount_t *seqp = per_cpu_ptr(&nft_counter_seq, cpu);
+
 		cpu_stats = per_cpu_ptr(counter, cpu);
 		do {
-			seq	= u64_stats_fetch_begin_irq(&cpu_stats->syncp);
+			seq	= read_seqcount_begin(seqp);
 			bytes	= cpu_stats->counter.bytes;
 			packets	= cpu_stats->counter.packets;
-		} while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, seq));
+		} while (read_seqcount_retry(seqp, seq));
 
 		total->packets += packets;
 		total->bytes += bytes;
 	}
 }
 
-static u64 __nft_counter_reset(u64 *counter)
-{
-	u64 ret, old;
-
-	do {
-		old = *counter;
-		ret = cmpxchg64(counter, old, 0);
-	} while (ret != old);
-
-	return ret;
-}
-
 static void nft_counter_reset(struct nft_counter_percpu __percpu *counter,
 			      struct nft_counter *total)
 {
 	struct nft_counter_percpu *cpu_stats;
-	u64 bytes, packets;
-	unsigned int seq;
-	int cpu;
 
-	memset(total, 0, sizeof(*total));
-	for_each_possible_cpu(cpu) {
-		bytes = packets = 0;
-
-		cpu_stats = per_cpu_ptr(counter, cpu);
-		do {
-			seq	= u64_stats_fetch_begin_irq(&cpu_stats->syncp);
-			packets	+= __nft_counter_reset(&cpu_stats->counter.packets);
-			bytes	+= __nft_counter_reset(&cpu_stats->counter.bytes);
-		} while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, seq));
-
-		total->packets += packets;
-		total->bytes += bytes;
-	}
+	local_bh_disable();
+	cpu_stats = this_cpu_ptr(counter);
+	cpu_stats->counter.packets -= total->packets;
+	cpu_stats->counter.bytes -= total->bytes;
+	local_bh_enable();
 }
 
 static int nft_counter_do_dump(struct sk_buff *skb,
@@ -164,10 +149,9 @@ static int nft_counter_do_dump(struct sk_buff *skb,
 {
 	struct nft_counter total;
 
+	nft_counter_fetch(priv->counter, &total);
 	if (reset)
 		nft_counter_reset(priv->counter, &total);
-	else
-		nft_counter_fetch(priv->counter, &total);
 
 	if (nla_put_be64(skb, NFTA_COUNTER_BYTES, cpu_to_be64(total.bytes),
 			 NFTA_COUNTER_PAD) ||
@@ -285,7 +269,10 @@ static struct nft_expr_type nft_counter_type __read_mostly = {
 
 static int __init nft_counter_module_init(void)
 {
-	int err;
+	int err, cpu;
+
+	for_each_possible_cpu(cpu)
+		seqcount_init(per_cpu_ptr(&nft_counter_seq, cpu));
 
 	err = nft_register_obj(&nft_counter_obj);
 	if (err < 0)


* Re: [PATCH] linux/types.h: enable endian checks for all sparse builds
From: Bart Van Assche @ 2016-12-09 15:18 UTC (permalink / raw)
  To: Madhani, Himanshu, Michael S. Tsirkin
  Cc: kvm@vger.kernel.org, Neil Armstrong, David Airlie,
	linux-remoteproc@vger.kernel.org, dri-devel@lists.freedesktop.org,
	virtualization@lists.linux-foundation.org,
	linux-s390@vger.kernel.org, James E.J. Bottomley, Herbert Xu,
	linux-scsi@vger.kernel.org, Christoph Hellwig,
	v9fs-developer@lists.sourceforge.net, Asias He, Arnd Bergmann,
	linux-kbuild@vger.kernel.org, Jens Axboe, Michal Marek,
	Stefan Hajnoczi <stef
In-Reply-To: <6199215E-2AA4-4705-9552-5D61FE03F866@cavium.com>

On 12/08/16 22:40, Madhani, Himanshu wrote:
> We’ll take a look and send patches to resolve these warnings.

Thanks!

Bart.


* Re: [PATCH net-next 0/2] Initial driver for Synopsys DWC XLGMAC
From: Carlos Palminha @ 2016-12-09 15:15 UTC (permalink / raw)
  To: Jie Deng, davem@davemloft.net, f.fainelli@gmail.com,
	netdev@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org, lars.persson@axis.com,
	thomas.lendacky@amd.com
In-Reply-To: <cover.1481075763.git.jiedeng@synopsys.com>

Hi Jie,

I don't think we need to create the "dwc" subdirectory under "synopsys".
It's preferable to have them directly under drivers/net/ethernet/synopsys.

Regards,
C.Palminha

On 07-12-2016 03:57, Jie Deng wrote:
> This series provides the support for 25/40/50/100 GbE
> devices using Synopsys DWC Enterprise Ethernet (XLGMAC).
> 
> The first patch adds support for Synopsys XLGMII.
> The second patch provides the initial driver for Synopsys XLGMAC
> 
> The driver has three layers by refactoring AMD XGBE.
> 
> dwc-eth-xxx.x
>   The DWC ethernet core layer (DWC ECL). This layer contains code that
> can be shared by different DWC series ethernet cores
> 
> dwc-xxx.x (e.g. dwc-xlgmac.c)
>   The DWC MAC HW adapter layer (DWC MHAL). This layer contains
> special support for a specific MAC. e.g. currently, XLGMAC.
> 
> xxx-xxx-pci.c xxx-xxx-plat.c (e.g. dwc-xlgmac-pci.c)
>   The glue adapter layer (GAL). Vendors who adopt Synopsys Ethernet
> cores can develop a glue driver for their platform.
> 
> Jie Deng (2):
>   net: phy: add extension of phy-mode for XLGMII
>   net: ethernet: Initial driver for Synopsys DWC XLGMAC
> 
>  Documentation/devicetree/bindings/net/ethernet.txt |    1 +
>  MAINTAINERS                                        |    6 +
>  drivers/net/ethernet/synopsys/Kconfig              |    2 +
>  drivers/net/ethernet/synopsys/Makefile             |    1 +
>  drivers/net/ethernet/synopsys/dwc/Kconfig          |   37 +
>  drivers/net/ethernet/synopsys/dwc/Makefile         |    9 +
>  drivers/net/ethernet/synopsys/dwc/dwc-eth-dcb.c    |  228 ++
>  .../net/ethernet/synopsys/dwc/dwc-eth-debugfs.c    |  328 +++
>  drivers/net/ethernet/synopsys/dwc/dwc-eth-desc.c   |  715 +++++
>  .../net/ethernet/synopsys/dwc/dwc-eth-ethtool.c    |  567 ++++
>  drivers/net/ethernet/synopsys/dwc/dwc-eth-hw.c     | 3098 ++++++++++++++++++++
>  drivers/net/ethernet/synopsys/dwc/dwc-eth-mdio.c   |  252 ++
>  drivers/net/ethernet/synopsys/dwc/dwc-eth-net.c    | 2319 +++++++++++++++
>  drivers/net/ethernet/synopsys/dwc/dwc-eth-ptp.c    |  216 ++
>  drivers/net/ethernet/synopsys/dwc/dwc-eth-regacc.h | 1115 +++++++
>  drivers/net/ethernet/synopsys/dwc/dwc-eth.h        |  738 +++++
>  drivers/net/ethernet/synopsys/dwc/dwc-xlgmac-pci.c |  538 ++++
>  drivers/net/ethernet/synopsys/dwc/dwc-xlgmac.c     |  135 +
>  drivers/net/ethernet/synopsys/dwc/dwc-xlgmac.h     |   85 +
>  include/linux/phy.h                                |    3 +
>  20 files changed, 10393 insertions(+)
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/Kconfig
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/Makefile
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-dcb.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-debugfs.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-desc.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-ethtool.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-hw.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-mdio.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-net.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-ptp.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth-regacc.h
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-eth.h
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-xlgmac-pci.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-xlgmac.c
>  create mode 100644 drivers/net/ethernet/synopsys/dwc/dwc-xlgmac.h
> 

^ permalink raw reply

* Re: [PATCHv3 perf/core 0/7] Reuse libbpf from samples/bpf
From: Arnaldo Carvalho de Melo @ 2016-12-09 15:09 UTC (permalink / raw)
  To: Joe Stringer; +Cc: linux-kernel, netdev, wangnan0, ast, daniel
In-Reply-To: <20161209024620.31660-1-joe@ovn.org>

Em Thu, Dec 08, 2016 at 06:46:13PM -0800, Joe Stringer escreveu:
> (Was "libbpf: Synchronize implementations")
> 
> Update tools/lib/bpf to provide the remaining bpf wrapper pieces needed by the
> samples/bpf/ code, then get rid of all of the duplicate BPF libraries in
> samples/bpf/libbpf.[ch].
> 
> ---
> v3: Add ack for first patch.
>     Split out second patch from v2 into separate changes for remaining diff.
>     Add patches to switch samples/bpf over to using tools/lib/.
> v2: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html
>     Don't shift non-bpf code into libbpf.
>     Drop the patch to synchronize ELF definitions with tc.
> v1: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html
>     First post.

Thanks, applied after addressing the -I$(objtree) issue raised by Wang,

- Arnaldo

^ permalink raw reply

* Re: [PATCHv3 perf/core 6/7] samples/bpf: Remove perf_event_open() declaration
From: Arnaldo Carvalho de Melo @ 2016-12-09 14:59 UTC (permalink / raw)
  To: Joe Stringer; +Cc: linux-kernel, wangnan0, ast, daniel, netdev
In-Reply-To: <20161209024620.31660-7-joe@ovn.org>

Em Thu, Dec 08, 2016 at 06:46:19PM -0800, Joe Stringer escreveu:
> This declaration was made in samples/bpf/libbpf.c for convenience, but
> there's already one in tools/perf/perf-sys.h. Reuse that one.
> 
> Signed-off-by: Joe Stringer <joe@ovn.org>
> ---
> v3: First post.
> ---
>  samples/bpf/Makefile            | 3 ++-
>  samples/bpf/bpf_load.c          | 3 ++-
>  samples/bpf/libbpf.c            | 7 -------
>  samples/bpf/libbpf.h            | 3 ---
>  samples/bpf/sampleip_user.c     | 3 ++-
>  samples/bpf/trace_event_user.c  | 9 +++++----
>  samples/bpf/trace_output_user.c | 3 ++-
>  samples/bpf/tracex6_user.c      | 3 ++-
>  8 files changed, 15 insertions(+), 19 deletions(-)
> 
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index c8f7ed37b2de..0adc47e67e65 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -92,7 +92,8 @@ always += test_current_task_under_cgroup_kern.o
>  always += trace_event_kern.o
>  always += sampleip_kern.o
>  
> -HOSTCFLAGS += -I$(objtree)/usr/include -I$(objtree)/tools/lib/
> +HOSTCFLAGS += -I$(objtree)/usr/include -I$(objtree)/tools/lib/ \
> +	      -I$(objtree)/tools/include -I$(objtree)/tools/perf

Switching these to $(srctree) as well, to support building it like:

  make -j4 O=../build/v4.9.0-rc8+ samples/bpf/

>  
>  HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
>  HOSTLOADLIBES_fds_example += -lelf
> diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
> index f8e3c58a0897..d683bd278171 100644
> --- a/samples/bpf/bpf_load.c
> +++ b/samples/bpf/bpf_load.c
> @@ -19,6 +19,7 @@
>  #include <ctype.h>
>  #include "libbpf.h"
>  #include "bpf_load.h"
> +#include "perf-sys.h"
>  
>  #define DEBUGFS "/sys/kernel/debug/tracing/"
>  
> @@ -168,7 +169,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
>  	id = atoi(buf);
>  	attr.config = id;
>  
> -	efd = perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
> +	efd = sys_perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
>  	if (efd < 0) {
>  		printf("event %d fd %d err %s\n", id, efd, strerror(errno));
>  		return -1;
> diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c
> index d9af876b4a2c..bee473a494f1 100644
> --- a/samples/bpf/libbpf.c
> +++ b/samples/bpf/libbpf.c
> @@ -34,10 +34,3 @@ int open_raw_sock(const char *name)
>  
>  	return sock;
>  }
> -
> -int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
> -		    int group_fd, unsigned long flags)
> -{
> -	return syscall(__NR_perf_event_open, attr, pid, cpu,
> -		       group_fd, flags);
> -}
> diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
> index cc815624aacf..09aedc320009 100644
> --- a/samples/bpf/libbpf.h
> +++ b/samples/bpf/libbpf.h
> @@ -188,7 +188,4 @@ struct bpf_insn;
>  /* create RAW socket and bind to interface 'name' */
>  int open_raw_sock(const char *name);
>  
> -struct perf_event_attr;
> -int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
> -		    int group_fd, unsigned long flags);
>  #endif
> diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c
> index 09ab620b324c..476a11947180 100644
> --- a/samples/bpf/sampleip_user.c
> +++ b/samples/bpf/sampleip_user.c
> @@ -21,6 +21,7 @@
>  #include <sys/ioctl.h>
>  #include "libbpf.h"
>  #include "bpf_load.h"
> +#include "perf-sys.h"
>  
>  #define DEFAULT_FREQ	99
>  #define DEFAULT_SECS	5
> @@ -50,7 +51,7 @@ static int sampling_start(int *pmu_fd, int freq)
>  	};
>  
>  	for (i = 0; i < nr_cpus; i++) {
> -		pmu_fd[i] = perf_event_open(&pe_sample_attr, -1 /* pid */, i,
> +		pmu_fd[i] = sys_perf_event_open(&pe_sample_attr, -1 /* pid */, i,
>  					    -1 /* group_fd */, 0 /* flags */);
>  		if (pmu_fd[i] < 0) {
>  			fprintf(stderr, "ERROR: Initializing perf sampling\n");
> diff --git a/samples/bpf/trace_event_user.c b/samples/bpf/trace_event_user.c
> index de8fd0266d78..ccb0cba8324a 100644
> --- a/samples/bpf/trace_event_user.c
> +++ b/samples/bpf/trace_event_user.c
> @@ -20,6 +20,7 @@
>  #include <sys/resource.h>
>  #include "libbpf.h"
>  #include "bpf_load.h"
> +#include "perf-sys.h"
>  
>  #define SAMPLE_FREQ 50
>  
> @@ -126,9 +127,9 @@ static void test_perf_event_all_cpu(struct perf_event_attr *attr)
>  
>  	/* open perf_event on all cpus */
>  	for (i = 0; i < nr_cpus; i++) {
> -		pmu_fd[i] = perf_event_open(attr, -1, i, -1, 0);
> +		pmu_fd[i] = sys_perf_event_open(attr, -1, i, -1, 0);
>  		if (pmu_fd[i] < 0) {
> -			printf("perf_event_open failed\n");
> +			printf("sys_perf_event_open failed\n");
>  			goto all_cpu_err;
>  		}
>  		assert(ioctl(pmu_fd[i], PERF_EVENT_IOC_SET_BPF, prog_fd[0]) == 0);
> @@ -147,9 +148,9 @@ static void test_perf_event_task(struct perf_event_attr *attr)
>  	int pmu_fd;
>  
>  	/* open task bound event */
> -	pmu_fd = perf_event_open(attr, 0, -1, -1, 0);
> +	pmu_fd = sys_perf_event_open(attr, 0, -1, -1, 0);
>  	if (pmu_fd < 0) {
> -		printf("perf_event_open failed\n");
> +		printf("sys_perf_event_open failed\n");
>  		return;
>  	}
>  	assert(ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd[0]) == 0);
> diff --git a/samples/bpf/trace_output_user.c b/samples/bpf/trace_output_user.c
> index 9c38f7aa4515..64e692fd7d51 100644
> --- a/samples/bpf/trace_output_user.c
> +++ b/samples/bpf/trace_output_user.c
> @@ -21,6 +21,7 @@
>  #include <signal.h>
>  #include "libbpf.h"
>  #include "bpf_load.h"
> +#include "perf-sys.h"
>  
>  static int pmu_fd;
>  
> @@ -160,7 +161,7 @@ static void test_bpf_perf_event(void)
>  	};
>  	int key = 0;
>  
> -	pmu_fd = perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
> +	pmu_fd = sys_perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
>  
>  	assert(pmu_fd >= 0);
>  	assert(bpf_map_update_elem(map_fd[0], &key, &pmu_fd, BPF_ANY) == 0);
> diff --git a/samples/bpf/tracex6_user.c b/samples/bpf/tracex6_user.c
> index 7a3b4a4b19f3..1681cb7cd713 100644
> --- a/samples/bpf/tracex6_user.c
> +++ b/samples/bpf/tracex6_user.c
> @@ -10,6 +10,7 @@
>  #include <linux/bpf.h>
>  #include "libbpf.h"
>  #include "bpf_load.h"
> +#include "perf-sys.h"
>  
>  #define SAMPLE_PERIOD  0x7fffffffffffffffULL
>  
> @@ -32,7 +33,7 @@ static void test_bpf_perf_event(void)
>  	};
>  
>  	for (i = 0; i < nr_cpus; i++) {
> -		pmu_fd[i] = perf_event_open(&attr_insn_pmu, -1/*pid*/, i/*cpu*/, -1/*group_fd*/, 0);
> +		pmu_fd[i] = sys_perf_event_open(&attr_insn_pmu, -1/*pid*/, i/*cpu*/, -1/*group_fd*/, 0);
>  		if (pmu_fd[i] < 0) {
>  			printf("event syscall failed\n");
>  			goto exit;
> -- 
> 2.10.2
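
For readers following along: glibc ships no wrapper for perf_event_open(2),
which is why both samples/bpf and tools/perf carry a thin syscall() wrapper.
Below is a minimal standalone sketch of such a wrapper. The names
my_perf_event_open() and open_task_clock_counter() are made up for
illustration and are not the ones in tools/perf/perf-sys.h; the attr settings
are only one plausible choice.

```c
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical name: thin wrapper, since glibc has no perf_event_open(). */
static int my_perf_event_open(struct perf_event_attr *attr, pid_t pid,
			      int cpu, int group_fd, unsigned long flags)
{
	return (int)syscall(__NR_perf_event_open, attr, pid, cpu,
			    group_fd, flags);
}

/* Open a software CPU-clock counter bound to the calling task, any CPU. */
static int open_task_clock_counter(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_SOFTWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_SW_CPU_CLOCK;
	attr.disabled = 1;		/* created stopped */
	attr.exclude_kernel = 1;	/* count user space only */

	return my_perf_event_open(&attr, 0 /* pid: self */, -1 /* cpu: any */,
				  -1 /* group_fd */, 0 /* flags */);
}
```

The call returns a file descriptor on success, or -1 with errno set (for
example when perf_event_paranoid forbids it), so callers should check for a
negative value as the samples do.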

^ permalink raw reply

* Re: 4.9.0-rc8: tg3 dead after resume
From: Billy Shuman @ 2016-12-09 14:29 UTC (permalink / raw)
  To: Siva Reddy Kallam; +Cc: Michael Chan, Netdev
In-Reply-To: <CAMet4B6t9neFPcGstZw6ebhFCBQzRsesStXZ8bjSaC5ggcuKxw@mail.gmail.com>

On Thu, Dec 8, 2016 at 4:03 AM, Siva Reddy Kallam
<siva.kallam@broadcom.com> wrote:
> On Thu, Dec 8, 2016 at 12:14 AM, Billy Shuman <wshuman3@gmail.com> wrote:
>> On Wed, Dec 7, 2016 at 12:37 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>>> On Wed, Dec 7, 2016 at 7:20 AM, Billy Shuman <wshuman3@gmail.com> wrote:
>>>> After resume on 4.9.0-rc8 tg3 is dead.
>>>>
>>>> In logs I see:
>>>> kernel: tg3 0000:44:00.0: phy probe failed, err -19
>>>> kernel: tg3 0000:44:00.0: Problem fetching invariants of chip, aborting
>>>
>>> -19 is -ENODEV which means tg3 cannot read the PHY ID.
>>>
>>> If it's a true suspend/resume operation, the driver does not have to
>>> go through probe during resume.  Please explain how you do
>>> suspend/resume.
>>>
>>
>> Sorry, my previous message was accidentally sent too early.
>>
>> I used systemd (systemctl suspend) to suspend.
>>
> We need more information to proceed further.
> Without suspend, are you able to use the tg3 port?

Yes the port works fine without suspend.

> Which Broadcom card do you have in the laptop?

The nic is a NetXtreme BCM57762 Gigabit Ethernet PCIe in a thunderbolt3 dock.

> Please provide complete tg3 specific logs in dmesg.
>

[   32.084010] tg3.c:v3.137 (May 11, 2014)
[   32.124695] tg3 0000:44:00.0 eth0: Tigon3 [partno(BCM957762) rev 57766001] (PCI Express) MAC address 98:e7:f4:8b:13:19
[   32.124698] tg3 0000:44:00.0 eth0: attached PHY is 57765 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[   32.124699] tg3 0000:44:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[   32.124700] tg3 0000:44:00.0 eth0: dma_rwctrl[00000001] dma_mask[64-bit]
[   32.219764] tg3 0000:44:00.0 enp68s0: renamed from eth0
[   36.219245] tg3 0000:44:00.0 enp68s0: Link is up at 1000 Mbps, full duplex
[   36.219250] tg3 0000:44:00.0 enp68s0: Flow control is on for TX and on for RX
[   36.219251] tg3 0000:44:00.0 enp68s0: EEE is disabled

after resume
[   92.292838] tg3 0000:44:00.0 enp68s0: No firmware running
[   93.521744] tg3 0000:44:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
[  106.704655] tg3 0000:44:00.0 enp68s0: Link is down
[  108.370356] tg3 0000:44:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff

after rmmod, modprobe
[  570.933636] tg3 0000:44:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
[  604.847215] tg3.c:v3.137 (May 11, 2014)
[  605.010075] tg3 0000:44:00.0: phy probe failed, err -19
[  605.010077] tg3 0000:44:00.0: Problem fetching invariants of chip, aborting
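
For reference, the two numbers in these logs can be decoded with a few lines
of C. This is only a sketch: both helper names are made up, and reading an
all-ones value as "the device is no longer answering on the bus" is the usual
PCI convention, assumed here rather than confirmed for this dock.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: kernel logs print errors as negative errno values. */
static const char *decode_kernel_err(int err)
{
	return strerror(err < 0 ? -err : err);
}

/*
 * Hypothetical helper: a 32-bit register read of 0xffffffff usually means
 * the read never reached a responding device (assumption, see above).
 */
static bool reg_read_looks_dead(uint32_t val)
{
	return val == UINT32_MAX;
}
```

decode_kernel_err(-19) gives the ENODEV message on Linux, and
reg_read_looks_dead(0xffffffff) is true for the MAC_TX_MODE value above.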




>>> Did this work before?  There has been very few changes to tg3 recently.
>>>
>>
>> This is a new laptop for me, but the same behavior is seen on 4.4.36 and 4.8.12.
>>
>>>>
>>>> rmmod and modprobe do not fix the problem; only a reboot resolves the issue.
>>>>
>>>> Billy

^ permalink raw reply

* Re: [PATCHv3 perf/core 3/7] tools lib bpf: Add flags to bpf_create_map()
From: Arnaldo Carvalho de Melo @ 2016-12-09 14:27 UTC (permalink / raw)
  To: Wangnan (F); +Cc: Joe Stringer, linux-kernel, ast, daniel, netdev
In-Reply-To: <b2c274c5-5e2c-749f-648e-7664a1daa6c2@huawei.com>

Em Fri, Dec 09, 2016 at 11:36:18AM +0800, Wangnan (F) escreveu:
> 
> 
> On 2016/12/9 10:46, Joe Stringer wrote:
> > The map_flags argument to bpf_create_map() was previously not exposed.
> > By exposing it, users can access flags such as whether or not to
> > preallocate the map.
> > 
> > Signed-off-by: Joe Stringer <joe@ovn.org>
> 
> Please mention commit 6c90598174322b8888029e40dd84a4eb01f56afe in
> commit message:
> 
> Commit 6c905981743 ("bpf: pre-allocate hash map elements") introduces
> map_flags to bpf_attr for BPF_MAP_CREATE command. Expose this new
> parameter in libbpf.

will do it, thanks.

- Arnaldo
 
> Acked-by: Wang Nan <wangnan0@huawei.com>
> 
> > ---
> > v3: Split from "tools lib bpf: Sync with samples/bpf/libbpf".
> > ---
> >   tools/lib/bpf/bpf.c    | 3 ++-
> >   tools/lib/bpf/bpf.h    | 2 +-
> >   tools/lib/bpf/libbpf.c | 3 ++-
> >   3 files changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > index 89e8e8e5b60e..d0afb26c2e0f 100644
> > --- a/tools/lib/bpf/bpf.c
> > +++ b/tools/lib/bpf/bpf.c
> > @@ -54,7 +54,7 @@ static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
> >   }
> >   int bpf_create_map(enum bpf_map_type map_type, int key_size,
> > -		   int value_size, int max_entries)
> > +		   int value_size, int max_entries, __u32 map_flags)
> >   {
> >   	union bpf_attr attr;
> > @@ -64,6 +64,7 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size,
> >   	attr.key_size = key_size;
> >   	attr.value_size = value_size;
> >   	attr.max_entries = max_entries;
> > +	attr.map_flags = map_flags;
> >   	return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
> >   }
> > diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> > index 61130170a6ad..7fcdce16fd62 100644
> > --- a/tools/lib/bpf/bpf.h
> > +++ b/tools/lib/bpf/bpf.h
> > @@ -24,7 +24,7 @@
> >   #include <linux/bpf.h>
> >   int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
> > -		   int max_entries);
> > +		   int max_entries, __u32 map_flags);
> >   /* Recommend log buffer size */
> >   #define BPF_LOG_BUF_SIZE 65536
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index 2e974593f3e8..84e6b35da4bd 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -854,7 +854,8 @@ bpf_object__create_maps(struct bpf_object *obj)
> >   		*pfd = bpf_create_map(def->type,
> >   				      def->key_size,
> >   				      def->value_size,
> > -				      def->max_entries);
> > +				      def->max_entries,
> > +				      0);
> >   		if (*pfd < 0) {
> >   			size_t j;
> >   			int err = *pfd;
> 
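
To illustrate what the new argument buys callers, here is a minimal
standalone sketch that issues BPF_MAP_CREATE directly with map_flags set. It
mirrors the patched bpf_create_map() above but is not the libbpf code itself:
create_map_with_flags() is a made-up name, and it calls syscall() directly
instead of going through sys_bpf().

```c
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical standalone equivalent of the patched bpf_create_map(). */
static int create_map_with_flags(enum bpf_map_type type, int key_size,
				 int value_size, int max_entries,
				 __u32 map_flags)
{
	union bpf_attr attr;

	/* Zero the whole union so unused fields are well defined. */
	memset(&attr, 0, sizeof(attr));
	attr.map_type = type;
	attr.key_size = key_size;
	attr.value_size = value_size;
	attr.max_entries = max_entries;
	attr.map_flags = map_flags;

	return (int)syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}
```

A caller that wants a hash map without preallocated elements would pass
BPF_F_NO_PREALLOC, e.g. create_map_with_flags(BPF_MAP_TYPE_HASH, sizeof(int),
sizeof(long), 1024, BPF_F_NO_PREALLOC). The call returns -1 with errno set if
the kernel rejects it or the caller lacks privileges.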

^ permalink raw reply

* Re: [PATCH 37/50] netfilter: nf_tables: atomic dump and reset for stateful objects
From: Eric Dumazet @ 2016-12-09 14:24 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Paul Gortmaker, netfilter-devel, David Miller, netdev,
	linux-next@vger.kernel.org
In-Reply-To: <20161209102432.GA986@salvia>

On Fri, 2016-12-09 at 11:24 +0100, Pablo Neira Ayuso wrote:
> Hi Paul,

Hi Pablo

Given that bytes/packets counters are modified without cmpxchg64()  :

static inline void nft_counter_do_eval(struct nft_counter_percpu_priv *priv,
                                       struct nft_regs *regs,
                                       const struct nft_pktinfo *pkt)
{
        struct nft_counter_percpu *this_cpu;

        local_bh_disable();
        this_cpu = this_cpu_ptr(priv->counter);
        u64_stats_update_begin(&this_cpu->syncp);
        this_cpu->counter.bytes += pkt->skb->len;
        this_cpu->counter.packets++;
        u64_stats_update_end(&this_cpu->syncp);
        local_bh_enable();
}

It means that the cmpxchg64() used to clear the stats is not good enough.

It does not help to make sure stats are properly cleared.

On 64 bit, the ->syncp is not there, so the nft_counter_reset() might
not see that a bytes or packets counter was modified by another cpu.


CPU 1                              CPU 2

LOAD PTR->BYTES into REG_A         old = *counter;
REG_A += skb->len;
                                   cmpxchg64(counter, old, 0);
PTR->BYTES = REG_A

It looks like you want a seqcount, even on 64bit arches,
so that CPU 2 can restart its loop, and more importantly you need
to not accumulate the values you read, because they might be old/invalid.

Another way would be to not use cmpxchg64() at all.
Way too expensive in the fast path!

The percpu value would never be modified by a cpu other than the owner.

You need a per cpu seqcount, no need to add a syncp per nft percpu counter.


static DEFINE_PER_CPU(seqcount_t, nft_pcpu_seq);

static inline void nft_counter_do_eval(struct nft_counter_percpu_priv *priv,
                                       struct nft_regs *regs,
                                       const struct nft_pktinfo *pkt)
{
        struct nft_counter_percpu *this_cpu;
	seqcount_t *myseq;

        local_bh_disable();
        this_cpu = this_cpu_ptr(priv->counter);
	myseq = this_cpu_ptr(&nft_pcpu_seq);

	write_seqcount_begin(myseq);

        this_cpu->counter.bytes += pkt->skb->len;
        this_cpu->counter.packets++;

	write_seqcount_end(myseq);
	
        local_bh_enable();
}
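
The reader side that goes with this could look roughly like the sketch below.
It is a user-space analogue written with C11 atomics, not kernel code: the
struct and function names are made up, and a real implementation would use
read_seqcount_begin()/read_seqcount_retry() on the per-cpu seqcount instead.
The point it shows is that the snapshot is overwritten on every retry, so
stale values are never accumulated.

```c
#include <stdatomic.h>
#include <stdint.h>

struct counter_snapshot {
	uint64_t bytes;
	uint64_t packets;
};

struct pcpu_counter {
	atomic_uint seq;		/* odd while an update is in flight */
	_Atomic uint64_t bytes;
	_Atomic uint64_t packets;
};

/* Writer: only the owning CPU/thread may call this. */
static void counter_add(struct pcpu_counter *c, uint64_t len)
{
	unsigned int s = atomic_load_explicit(&c->seq, memory_order_relaxed);

	atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&c->bytes,
		atomic_load_explicit(&c->bytes, memory_order_relaxed) + len,
		memory_order_relaxed);
	atomic_store_explicit(&c->packets,
		atomic_load_explicit(&c->packets, memory_order_relaxed) + 1,
		memory_order_relaxed);

	atomic_store_explicit(&c->seq, s + 2, memory_order_release);
}

/* Reader: any thread; restarts if a write overlapped the read. */
static struct counter_snapshot counter_read(struct pcpu_counter *c)
{
	struct counter_snapshot snap;
	unsigned int start;

	do {
		start = atomic_load_explicit(&c->seq, memory_order_acquire);
		/* Overwrite, never accumulate: a failed pass is discarded. */
		snap.bytes = atomic_load_explicit(&c->bytes,
						  memory_order_relaxed);
		snap.packets = atomic_load_explicit(&c->packets,
						    memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while ((start & 1) ||
		 start != atomic_load_explicit(&c->seq, memory_order_relaxed));

	return snap;
}
```

A dump would sum the snapshots from each cpu only after the loop has exited
for that cpu.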

Thanks !

^ permalink raw reply

* Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups
From: Tejun Heo @ 2016-12-09 13:27 UTC (permalink / raw)
  To: John Stultz
  Cc: Andy Lutomirski, Alexei Starovoitov, Andy Lutomirski,
	Mickaël Salaün, Daniel Mack, David S. Miller, kafai,
	Florian Westphal, Harald Hoyer, Network Development,
	Sargun Dhillon, Pablo Neira Ayuso, lkml, Li Zefan,
	Jonathan Corbet, open list:CONTROL GROUP (CGROUP),
	Android Kernel Team, Rom Lemarchand, Colin Cross
In-Reply-To: <CALAqxLUkqSAEiHE038w+ZGUmhPgj2SpG7BLcPrrtU46VYcO=KA@mail.gmail.com>

Hello, John.

On Thu, Dec 08, 2016 at 09:39:38PM -0800, John Stultz wrote:
> So just to clarify the discussion for my purposes and make sure I
> understood, per-cgroup CAP rules was not desired, and instead we
> should either utilize an existing cap (are there still objections to
> CAP_SYS_RESOURCE? - this isn't clear to me) or create a new one (ie,
> bring back the older CAP_CGROUP_MIGRATE patch).

Let's create a new one.  It looks to be a bit too different to share
with an existing one.

> Tejun: Do you have a more finished version of your patch that I should
> add my changes on top of?

Oh, just submit the patch on top of the current for-next.  I can queue
mine on top of yours.  They are mostly orthogonal.

Thanks.

-- 
tejun

^ permalink raw reply

* [PATCH] net: smsc911x: back out silently on probe deferrals
From: Linus Walleij @ 2016-12-09 13:18 UTC (permalink / raw)
  To: netdev, David S . Miller, Steve Glendinning
  Cc: Guenter Roeck, Jeremy Linton, Kamlakant Patel, Pavel Fedin,
	Linus Walleij

When trying to get a regulator we may get deferred and we see
this noise:

smsc911x 1b800000.ethernet-ebi2 (unnamed net_device) (uninitialized):
   couldn't get regulators -517

Then the driver continues anyway. Which means that the regulator
may not be properly retrieved and reference counted, and may be
switched off in case no one else is using it.

Fix this by returning silently on deferred probe and let the
system work it out.

Cc: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/net/ethernet/smsc/smsc911x.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
index 86b7c04e3738..c492e4ffd9e7 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -442,9 +442,16 @@ static int smsc911x_request_resources(struct platform_device *pdev)
 	ret = regulator_bulk_get(&pdev->dev,
 			ARRAY_SIZE(pdata->supplies),
 			pdata->supplies);
-	if (ret)
+	if (ret) {
+		/*
+		 * Retry on deferrals, else just report the error
+		 * and try to continue.
+		 */
+		if (ret == -EPROBE_DEFER)
+			return ret;
 		netdev_err(ndev, "couldn't get regulators %d\n",
 				ret);
+	}
 
 	/* Request optional RESET GPIO */
 	pdata->reset_gpiod = devm_gpiod_get_optional(&pdev->dev,
-- 
2.7.4
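
The control flow this patch introduces can be tried outside the kernel with
the small sketch below. Everything except the EPROBE_DEFER value (517 in
include/linux/errno.h) is an assumption for illustration: request_resources()
and the supplies_*() stubs are made-up stand-ins for
smsc911x_request_resources() and regulator_bulk_get().

```c
#include <stdio.h>

#define EPROBE_DEFER 517	/* kernel-internal errno value */

/* Hypothetical stand-in for regulator_bulk_get(): returns 0 or -errno. */
typedef int (*get_supplies_fn)(void);

static int supplies_ok(void)       { return 0; }
static int supplies_deferred(void) { return -EPROBE_DEFER; }
static int supplies_missing(void)  { return -19; /* -ENODEV */ }

static int request_resources(get_supplies_fn get_supplies)
{
	int ret = get_supplies();

	if (ret) {
		/* Back out silently so the probe can be retried later. */
		if (ret == -EPROBE_DEFER)
			return ret;
		/* Any other error is reported, then we try to continue. */
		fprintf(stderr, "couldn't get regulators %d\n", ret);
	}

	/* ... the remaining optional resources would be requested here ... */
	return 0;
}
```

Only the deferral propagates to the caller; the other error path keeps the
driver's existing "report and continue" behaviour.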

^ permalink raw reply related

* Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable
From: Mark Lord @ 2016-12-09 13:05 UTC (permalink / raw)
  To: Hayes Wang
  Cc: netdev@vger.kernel.org, nic_swsd, linux-kernel@vger.kernel.org,
	linux-usb@vger.kernel.org
In-Reply-To: <0835B3720019904CB8F7AA43166CEEB20105C9EE@RTITMBSV03.realtek.com.tw>

On 16-12-08 10:23 PM, Hayes Wang wrote:
> Mark Lord <mlord@pobox.com>
> 
> I found an issue with autosuspend, and it may cause the same
> problem you are seeing. I am not sure if this is helpful to you,
> because it only occurs when autosuspend is enabled.

Thanks.  I am using ASIX adapters now.

I did try the latest 4.9-rc8, and 4.8.12 kernels with the r8152 dongle yesterday,
in hope that perhaps the many EHCI fixes from those kernels might help out.

The dongle was unusable with those newer kernels.
Most of the time it failed with "Get ether addr fail\n" at startup.

On the occasions where it got past that point, it often failed
the DHCP negotiation, but this looks more like a bug elsewhere in
the kernel, possibly racing against initialization of the random
number generators.  Adding a 2-second sleep to the r8152 probe
function made this error mostly go away.

Cheers
-- 
Mark Lord

^ permalink raw reply

* [PATCH] net:ethernet:samsung:initialize cur_rx_qnum
From: Rayagond Kokatanur @ 2016-12-09 12:14 UTC (permalink / raw)
  To: siva.kallam, bh74.an, ks.giri, vipul.pandya; +Cc: netdev, Rayagond Kokatanur

This patch initializes cur_rx_qnum when an rx interrupt occurs.
Without this initialization, the driver will not work with multiple-rx-queue
configurations.

NOTE: This patch is not tested on actual hw.
---
 drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c b/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c
index ea44a24..580a1a4 100644
--- a/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c
+++ b/drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c
@@ -1681,6 +1681,7 @@ static irqreturn_t sxgbe_rx_interrupt(int irq, void *dev_id)
 	struct sxgbe_rx_queue *rxq = (struct sxgbe_rx_queue *)dev_id;
 	struct sxgbe_priv_data *priv = rxq->priv_ptr;
 
+	priv->cur_rx_qnum = rxq->queue_no;
 	/* get the channel status */
 	status = priv->hw->dma->rx_dma_int_status(priv->ioaddr, rxq->queue_no,
 						  &priv->xstats);
-- 
1.9.1

^ permalink raw reply related

* Re: netlink: GPF in sock_sndtimeo
From: Richard Guy Briggs @ 2016-12-09 12:12 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: netdev, LKML, linux-audit
In-Reply-To: <CACT4Y+ZsOXoQqVE4vhenb9fUJkwAbGL6wUZxGyaT2h7Cncbfog@mail.gmail.com>

On 2016-12-09 12:53, Dmitry Vyukov wrote:
> On Fri, Dec 9, 2016 at 12:48 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
> > On 2016-12-09 11:49, Dmitry Vyukov wrote:
> >> On Fri, Dec 9, 2016 at 7:02 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> >> > On 2016-11-29 23:52, Richard Guy Briggs wrote:
> >> > I tried a quick compile attempt on the test case (I assume it is a
> >> > socket fuzzer) and get the following compile error:
> >> > cc -g -O0 -Wall -D_GNU_SOURCE -o socket_fuzz socket_fuzz.c
> >> > socket_fuzz.c:16:1: warning: "_GNU_SOURCE" redefined
> >> > <command-line>: warning: this is the location of the previous definition
> >> > socket_fuzz.c: In function ‘segv_handler’:
> >> > socket_fuzz.c:89: warning: implicit declaration of function ‘__atomic_load_n’
> >> > socket_fuzz.c:89: error: ‘__ATOMIC_RELAXED’ undeclared (first use in this function)
> >> > socket_fuzz.c:89: error: (Each undeclared identifier is reported only once
> >> > socket_fuzz.c:89: error: for each function it appears in.)
> >> > socket_fuzz.c: In function ‘loop’:
> >> > socket_fuzz.c:280: warning: unused variable ‘errno0’
> >> > socket_fuzz.c: In function ‘test’:
> >> > socket_fuzz.c:303: warning: implicit declaration of function ‘__atomic_fetch_add’
> >> > socket_fuzz.c:303: error: ‘__ATOMIC_SEQ_CST’ undeclared (first use in this function)
> >> > socket_fuzz.c:303: warning: implicit declaration of function ‘__atomic_fetch_sub’
> >>
> >> -std=gnu99 should help
> >> ignore warnings
> >
> > I got a little further, left with "__ATOMIC_RELAXED undeclared", "__ATOMIC_SEQ_CST
> > undeclared" under gcc 4.4.7-16.
> >
> > gcc 4.8.2-15 leaves me with "undefined reference to `clock_gettime'"
> 
> add -lrt

Ok, that helped.  Thanks!

> > What compiler version do you recommend?
> 
> 6.x sounds reasonable
> 4.4 branch is 7.5 years old, surprised that it does not disintegrate
> into dust yet :)

  These are under RHEL6...  so there are updates to them, but yeah, they are old.

> >> >> - RGB
> >> >
> >> > - RGB
> >
> > - RGB
> >
> > --
> > Richard Guy Briggs <rgb@redhat.com>
> > Kernel Security Engineering, Base Operating Systems, Red Hat
> > Remote, Ottawa, Canada
> > Voice: +1.647.777.2635, Internal: (81) 32635

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

