Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v2 net-next 04/10] perf, bpf: allow bpf programs attach to tracepoints
From: Peter Zijlstra @ 2016-04-07 20:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Steven Rostedt, David S . Miller, Ingo Molnar, Daniel Borkmann,
	Arnaldo Carvalho de Melo, Wang Nan, Josef Bacik, Brendan Gregg,
	netdev, linux-kernel, kernel-team
In-Reply-To: <1459993411-2754735-5-git-send-email-ast@fb.com>

On Wed, Apr 06, 2016 at 06:43:25PM -0700, Alexei Starovoitov wrote:
> introduce BPF_PROG_TYPE_TRACEPOINT program type and allow it to be attached
> to the perf tracepoint handler, which will copy the arguments into
> the per-cpu buffer and pass it to the bpf program as its first argument.
> The layout of the fields can be discovered by doing
> 'cat /sys/kernel/debug/tracing/events/sched/sched_switch/format'
> prior to the compilation of the program with exception that first 8 bytes
> are reserved and not accessible to the program. This area is used to store
> the pointer to 'struct pt_regs' which some of the bpf helpers will use:
> +---------+
> | 8 bytes | hidden 'struct pt_regs *' (inaccessible to bpf program)
> +---------+
> | N bytes | static tracepoint fields defined in tracepoint/format (bpf readonly)
> +---------+
> | dynamic | __dynamic_array bytes of tracepoint (inaccessible to bpf yet)
> +---------+
> 
> Not that all of the fields are already dumped to user space via perf ring buffer
> and broken application access it directly without consulting tracepoint/format.
> Same rule applies here: static tracepoint fields should only be accessed
> in a format defined in tracepoint/format. The order of fields and
> field sizes are not an ABI.
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply

* Re: [PATCH v2 net-next 03/10] perf: split perf_trace_buf_prepare into alloc and update parts
From: Peter Zijlstra @ 2016-04-07 20:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Steven Rostedt, David S . Miller, Ingo Molnar, Daniel Borkmann,
	Arnaldo Carvalho de Melo, Wang Nan, Josef Bacik, Brendan Gregg,
	netdev, linux-kernel, kernel-team
In-Reply-To: <1459993411-2754735-4-git-send-email-ast@fb.com>

On Wed, Apr 06, 2016 at 06:43:24PM -0700, Alexei Starovoitov wrote:
> split allows to move expensive update of 'struct trace_entry' to later phase.
> Repurpose unused 1st argument of perf_tp_event() to indicate event type.
> 
> While splitting use temp variable 'rctx' instead of '*rctx' to avoid
> unnecessary loads done by the compiler due to -fno-strict-aliasing
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply

* Re: [net PATCH v3] GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU
From: David Miller @ 2016-04-07 20:57 UTC (permalink / raw)
  To: aduyck; +Cc: jesse, netdev, alexander.duyck, tom
In-Reply-To: <20160405161249.3861.1683.stgit@localhost.localdomain>

From: Alexander Duyck <aduyck@mirantis.com>
Date: Tue, 05 Apr 2016 09:13:39 -0700

> This patch fixes an issue I found in which we were dropping frames if we
> had enabled checksums on GRE headers that were encapsulated by either FOU
> or GUE.  Without this patch I was barely able to get 1 Gb/s of throughput.
> With this patch applied I am now at least getting around 6 Gb/s.
> 
> The issue is due to the fact that with FOU or GUE applied we do not provide
> a transport offset pointing to the GRE header, nor do we offload it in
> software as the GRE header is completely skipped by GSO and treated like a
> VXLAN or GENEVE type header.  As such we need to prevent the stack from
> generating it and also prevent GRE from generating it via any interface we
> create.
> 
> Fixes: c3483384ee511 ("gro: Allow tunnel stacking in the case of FOU/GUE")
> Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
> ---
> 
> v3: Basically the same patch as v1 and v2, but I am cutting it loose from
>     the IPv4 ID patch as that one will likely need to be resolved in
>     net-next.

Applied, thanks Alexander.

^ permalink raw reply

* Re: [PATCH v2 net-next 02/10] perf: remove unused __addr variable
From: Peter Zijlstra @ 2016-04-07 20:54 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Steven Rostedt, David S . Miller, Ingo Molnar, Daniel Borkmann,
	Arnaldo Carvalho de Melo, Wang Nan, Josef Bacik, Brendan Gregg,
	netdev, linux-kernel, kernel-team
In-Reply-To: <1459993411-2754735-3-git-send-email-ast@fb.com>

On Wed, Apr 06, 2016 at 06:43:23PM -0700, Alexei Starovoitov wrote:
> now all calls to perf_trace_buf_submit() pass 0 as 4th
> argument which will be repurposed in the next patch which will
> change the meaning of 1st arg of perf_tp_event() to event_type
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  include/trace/perf.h         | 7 ++-----
>  include/trace/trace_events.h | 3 ---
>  2 files changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/include/trace/perf.h b/include/trace/perf.h
> index 26486fcd74ce..6f7e37869065 100644
> --- a/include/trace/perf.h
> +++ b/include/trace/perf.h
> @@ -20,9 +20,6 @@
>  #undef __get_bitmask
>  #define __get_bitmask(field) (char *)__get_dynamic_array(field)
>  
> -#undef __perf_addr
> -#define __perf_addr(a)	(__addr = (a))
> -

Ah, we're not using that anylonger..

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply

* Re: [PATCH net-next 0/8] udp: GRO in UDP sockets
From: David Miller @ 2016-04-07 20:54 UTC (permalink / raw)
  To: tom; +Cc: netdev, kernel-team
In-Reply-To: <1459869776-2090500-1-git-send-email-tom@herbertland.com>

From: Tom Herbert <tom@herbertland.com>
Date: Tue, 5 Apr 2016 08:22:48 -0700

> This patch set adds GRO functions (gro_receive and gro_complete) to UDP
> sockets and removes udp_offload infrastructure.
> 
> Add GRO functions (gro_receive and gro_complete) to UDP sockets. In
> udp_gro_receive and udp_gro_complete a socket lookup is done instead of
> looking up the port number in udp_offloads.  If a socket is found and
> there are GRO functions for it then those are called. This feature
> allows binding GRO functions to more than just a port number.
> Eventually, we will be able to use this technique to allow application
> defined GRO for an application protocol by attaching BPF porgrams to UDP
> sockets for doing GRO.
> 
> In order to implement these functions, we added exported
> udp6_lib_lookup_skb and udp4_lib_lookup_skb functions in ipv4/udp.c and
> ipv6/udp.c. Also, inet_iif and references to skb_dst() were changed to
> check that dst is set in skbuf before derefencing. In the GRO path there
> is now a UDP socket lookup performed before dst is set, to the get the
> device in that case we simply use skb->dev.
> 
> Tested:
> 
> Ran various combinations of VXLAN and GUE TCP_STREAM and TCP_RR tests.
> Did not see any material regression.

Series applied thanks Tom.

^ permalink raw reply

* Re: [PATCH 2/9] rxrpc: Disable a debugging statement that has been left enabled.
From: David Miller @ 2016-04-07 20:50 UTC (permalink / raw)
  To: dhowells; +Cc: joe, linux-afs, netdev, linux-kernel
In-Reply-To: <5702.1460062039@warthog.procyon.org.uk>

From: David Howells <dhowells@redhat.com>
Date: Thu, 07 Apr 2016 21:47:19 +0100

> David Miller <davem@davemloft.net> wrote:
> 
>> > Excellent only if I can get at it to find out why the something went wrong.
>> > If it's lost because the machine panics, then it is worthless.
>> 
>> If you're ok with these kenter things spewing into the logs with the
>> current facility, you can run the function tracer and have it record
>> into the logs all the time too.
> 
> They don't do that unless you enable them.  And I can enable them individually
> at compile time or by set at runtime.

As you can with the function tracer and tracepoints.

We're not telling you to kill this now, we're just saying that you should
thinking about tossing this custom stuff when you have a chance.

^ permalink raw reply

* Re: [PATCH 2/9] rxrpc: Disable a debugging statement that has been left enabled.
From: David Howells @ 2016-04-07 20:47 UTC (permalink / raw)
  To: David Miller; +Cc: dhowells, joe, linux-afs, netdev, linux-kernel
In-Reply-To: <20160407.163616.385449823878901734.davem@davemloft.net>

David Miller <davem@davemloft.net> wrote:

> > Excellent only if I can get at it to find out why the something went wrong.
> > If it's lost because the machine panics, then it is worthless.
> 
> If you're ok with these kenter things spewing into the logs with the
> current facility, you can run the function tracer and have it record
> into the logs all the time too.

They don't do that unless you enable them.  And I can enable them individually
at compile time or by set at runtime.

> I don't see any argument which is appropriate for keeping this stuff
> around.

Sigh.

As I said to Joe: I don't want to deal with converting to using ftrace at the
moment whilst I'm towards the end of reworking the rewrite of the code.  I
have a stack of patches that would need reworking yet again.

David

^ permalink raw reply

* Re: [PATCH v2 net-next 00/10] allow bpf attach to tracepoints
From: David Miller @ 2016-04-07 20:46 UTC (permalink / raw)
  To: ast
  Cc: rostedt, peterz, mingo, daniel, acme, wangnan0, jbacik,
	brendan.d.gregg, netdev, linux-kernel, kernel-team
In-Reply-To: <1459993411-2754735-1-git-send-email-ast@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Wed, 6 Apr 2016 18:43:21 -0700

> Hi Steven, Peter,

Steven/Peter, can you give this series a review?

Thanks!

^ permalink raw reply

* Re: [PATCH net-next 0/3] sock: lockdep tightening
From: David Miller @ 2016-04-07 20:44 UTC (permalink / raw)
  To: hannes; +Cc: netdev, daniel
In-Reply-To: <1459869016-13896-1-git-send-email-hannes@stressinduktion.org>

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Tue,  5 Apr 2016 17:10:13 +0200

> First patch is from Eric Dumazet and improves lockdep accuracy for
> socket locks. After that, second patch introduces lockdep_sock_is_held
> and uses it. Final patch reverts and reworks the lockdep fix from Daniel
> in the filter code, as we now have tighter lockdep support.

Series applied, thanks Hannes.

^ permalink raw reply

* Re: [PATCH 2/9] rxrpc: Disable a debugging statement that has been left enabled.
From: David Howells @ 2016-04-07 20:42 UTC (permalink / raw)
  To: Joe Perches; +Cc: dhowells, linux-afs, netdev, linux-kernel
In-Reply-To: <1460060518.1800.8.camel@perches.com>

Joe Perches <joe@perches.com> wrote:

> > Let's see...  If the machine panics whilst I'm developing stuff (quite
> > likely if something goes wrong in BH context), how do I get at the
> > function tracing log to find out why it panicked if the log is then
> > lost?  With the serial console, at least I automatically capture the
> > output of kenter and co..
> 
> But you also don't get a bunch of other useful stuff.
> 
> btw: there is ftrace_dump_on_oops
> 
> https://lwn.net/Articles/366796/

Cute.

How do I get at the argument values and suchlike that kenter() and co print?
I presume I would need to replace any kenter() or suchlike that printed more
than just the address with a tracing hook?

David

^ permalink raw reply

* Re: [Lsf] [LSF/MM TOPIC] Generic page-pool recycle facility?
From: Jesper Dangaard Brouer @ 2016-04-07 20:38 UTC (permalink / raw)
  To: Waskiewicz, PJ
  Cc: lsf@lists.linux-foundation.org, linux-mm@kvack.org,
	netdev@vger.kernel.org, bblanco@plumgrid.com,
	alexei.starovoitov@gmail.com,
	James.Bottomley@HansenPartnership.com, tom@herbertland.com,
	lsf-pc@lists.linux-foundation.org, brouer
In-Reply-To: <1460058531.13579.12.camel@netapp.com>

On Thu, 7 Apr 2016 19:48:50 +0000
"Waskiewicz, PJ" <PJ.Waskiewicz@netapp.com> wrote:

> On Thu, 2016-04-07 at 16:17 +0200, Jesper Dangaard Brouer wrote:
> > (Topic proposal for MM-summit)
> > 
> > Network Interface Cards (NIC) drivers, and increasing speeds stress
> > the page-allocator (and DMA APIs).  A number of driver specific
> > open-coded approaches exists that work-around these bottlenecks in
> > the
> > page allocator and DMA APIs. E.g. open-coded recycle mechanisms, and
> > allocating larger pages and handing-out page "fragments".
> > 
> > I'm proposing a generic page-pool recycle facility, that can cover
> > the
> > driver use-cases, increase performance and open up for zero-copy RX.  
> 
> Is this based on the page recycle stuff from ixgbe that used to be in
> the driver?  If so I'd really like to be part of the discussion.

Okay, so it is not part of the driver any-longer?  I've studied the
current ixgbe driver (and other NIC drivers) closely.  Do you have some
code pointers, to this older code?

The likely-fastest recycle code I've see is in the bnx2x driver.  If
you are interested see: bnx2x_reuse_rx_data().  Again is it a bit
open-coded produce/consumer ring queue (which would be nice to also
cleanup).


To amortize the cost of allocating a single page, most other drivers
use the trick of allocating a larger (compound) page, and partition
this page into smaller "fragments".  Which also amortize the cost of
dma_map/unmap (important on non-x86).

This is actually problematic performance wise, because packet-data
(in these page fragments) only get DMA_sync'ed, and is thus considered
"read-only".  As netstack need to write packet headers, yet-another
(writable) memory area is allocated per packet (plus the SKB meta-data
struct).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 2/9] rxrpc: Disable a debugging statement that has been left enabled.
From: David Miller @ 2016-04-07 20:36 UTC (permalink / raw)
  To: dhowells; +Cc: joe, linux-afs, netdev, linux-kernel
In-Reply-To: <1091.1460058308@warthog.procyon.org.uk>

From: David Howells <dhowells@redhat.com>
Date: Thu, 07 Apr 2016 20:45:08 +0100

> David Miller <davem@davemloft.net> wrote:
> 
>> Yeah this custom stuff is really inappropriate given the excellent
>> infrastructure we have these days...
> 
> Excellent only if I can get at it to find out why the something went wrong.
> If it's lost because the machine panics, then it is worthless.

If you're ok with these kenter things spewing into the logs with the
current facility, you can run the function tracer and have it record
into the logs all the time too.

I don't see any argument which is appropriate for keeping this stuff
around.

^ permalink raw reply

* Re: [PATCH 2/9] rxrpc: Disable a debugging statement that has been left enabled.
From: Joe Perches @ 2016-04-07 20:21 UTC (permalink / raw)
  To: David Howells; +Cc: linux-afs, netdev, linux-kernel
In-Reply-To: <981.1460058224@warthog.procyon.org.uk>

On Thu, 2016-04-07 at 20:43 +0100, David Howells wrote:
> Joe Perches <joe@perches.com> wrote:
> > > Joe Perches <joe@perches.com> wrote:
> > > > It might be better to remove kenter and _enter
> > > > altogether and use function tracing instead.
> > > Possibly - but not at this time.
> > Swell.
> I didn't say I wouldn't do it - it's just that I'm trying to fix other stuff
> at the moment and don't particularly want to add that to the list just now.
> kenter, _enter and co. are serving me very well.

No worries, I didn't mean to imply you wouldn't.
It's your code, do what and when you want.

> > > Besides, isn't the function tracing log lost
> > > if the machine crashes?
> > I believe yes, but would it matter?
> Let's see...  If the machine panics whilst I'm developing stuff (quite likely
> if something goes wrong in BH context), how do I get at the function tracing
> log to find out why it panicked if the log is then lost?  With the serial
> console, at least I automatically capture the output of kenter and co..

But you also don't get a bunch of other useful stuff.

btw: there is ftrace_dump_on_oops

https://lwn.net/Articles/366796/

^ permalink raw reply

* Re: [PATCH V3] net: emac: emac gigabit ethernet controller driver
From: Andrew Lunn @ 2016-04-07 20:10 UTC (permalink / raw)
  To: Timur Tabi
  Cc: Rob Herring, Gilad Avidov, netdev,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-arm-msm,
	Sagar Dharia, shankerd-sgV2jX0FEOL9JmXXK+q4OQ, Greg Kroah-Hartman,
	vikrams-sgV2jX0FEOL9JmXXK+q4OQ, Christopher Covington
In-Reply-To: <5706B4EF.2050600-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>

> I'm back to working on this driver, and I need some more help with
> how to handle the phy.  mdio-gpio.txt doesn't really tell me much.
> I'm actually working on an ACPI system and not DT.

I can help you with DT, but not ACPI.

The MDIO bus can be a separate Linux device. Since you have GPIO lines
for the MDIO bus, it makes sense for this to be a mdio-gpio device. So
in DT, you would have:

mdio0: mdio {
        compatible = "virtual,mdio-gpio";
        #address-cells = <1>;
        #size-cells = <0>;
        gpios = <&qcomgpio 123 0
                 &qcomgpio 124 0>;

	phy0: ethernet-phy@8 {
        	reg = <9>;
	};
};

Here i've assumed the PHY is using address 8 on the bus. Change as
needed.

In your MAC DT node, you then have phy-handle pointing to this phy:

  emac0: qcom,emac@feb20000 {
  	cell-index = <0>;
	compatible = "qcom,emac";
	reg-names = "base", "csr", "ptp", "sgmii";
	reg =   <0xfeb20000 0x10000>,
	    	<0xfeb36000 0x1000>,
		<0xfeb3c000 0x4000>,
		<0xfeb38000 0x400>;
	#address-cells = <0>;
	interrupt-parent = <&emac0>;
	#interrupt-cells = <1>;
	interrupts = <0 1>;
	interrupt-map-mask = <0xffffffff>;
	interrupt-map = <0 &intc 0 76 0
	 	         1 &intc 0 80 0>;
       	interrupt-names = "emac_core0", "sgmii_irq";
  	qcom,emac-tstamp-en;
	qcom,emac-ptp-frac-ns-adj = <125000000 1>;

	phy-handle = <&phy0>
}

In the driver, you need to connect the PHY to the MAC. You do this
using something like:

         if (dev->of_node) {
                 phy_np = of_parse_phandle(dev->of_node, "phy-handle", 0);
                 if (!phy_np) {
                         netdev_dbg(ndev, "No phy-handle found in DT\n");
                         return -ENODEV;
                 }
 
                 phy_dev = of_phy_connect(ndev, phy_np, &xxxx_enet_adjust_link,
                                          0, pdata->phy_mode);
                 if (!phy_dev) {
                         netdev_err(ndev, "Could not connect to PHY\n");
                         return -ENODEV;
                 }

Do you have an ACPI table describing this hardware? What does it look
like?

	Andrew
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [Lsf] [LSF/MM TOPIC] Generic page-pool recycle facility?
From: Waskiewicz, PJ @ 2016-04-07 19:48 UTC (permalink / raw)
  To: lsf@lists.linux-foundation.org, linux-mm@kvack.org,
	brouer@redhat.com
  Cc: netdev@vger.kernel.org, bblanco@plumgrid.com,
	alexei.starovoitov@gmail.com,
	James.Bottomley@HansenPartnership.com, tom@herbertland.com,
	lsf-pc@lists.linux-foundation.org
In-Reply-To: <20160407161715.52635cac@redhat.com>

On Thu, 2016-04-07 at 16:17 +0200, Jesper Dangaard Brouer wrote:
> (Topic proposal for MM-summit)
> 
> Network Interface Cards (NIC) drivers, and increasing speeds stress
> the page-allocator (and DMA APIs).  A number of driver specific
> open-coded approaches exists that work-around these bottlenecks in
> the
> page allocator and DMA APIs. E.g. open-coded recycle mechanisms, and
> allocating larger pages and handing-out page "fragments".
> 
> I'm proposing a generic page-pool recycle facility, that can cover
> the
> driver use-cases, increase performance and open up for zero-copy RX.

Is this based on the page recycle stuff from ixgbe that used to be in
the driver?  If so I'd really like to be part of the discussion.

-PJ


-- 
PJ Waskiewicz
Principal Engineer, NetApp
e: pj.waskiewicz@netapp.com
d: 503.961.3705

^ permalink raw reply

* Re: [PATCH V2 6/8] net: mediatek: fix TX locking
From: Sergei Shtylyov @ 2016-04-07 19:46 UTC (permalink / raw)
  To: John Crispin, David S. Miller
  Cc: Felix Fietkau, Matthias Brugger,
	Sean Wang (王志亘), netdev, linux-mediatek,
	linux-kernel
In-Reply-To: <1460057210-55786-7-git-send-email-blogic@openwrt.org>

Hello.

On 04/07/2016 10:26 PM, John Crispin wrote:

> Inside the TX path there is a lock inside the tx_map function. This is
> however too late. The patch moves the lock to the start of the xmit
> function right before the free count check of the DMA ring happens.
> If we do not do this, the code becomes racy leading to TX stalls and
> dropped packets. This happens as there are 2 netdevs running on the
> same physical DMA ring.
>
> Signed-off-by: John Crispin <blogic@openwrt.org>
> ---
>   drivers/net/ethernet/mediatek/mtk_eth_soc.c |   20 ++++++++++----------
>   1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> index 60b66ab..8434355 100644
> --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
[...]
> @@ -712,14 +702,22 @@ static int mtk_start_xmit(struct sk_buff *skb, struct net_device *dev)
>   	struct mtk_eth *eth = mac->hw;
>   	struct mtk_tx_ring *ring = &eth->tx_ring;
>   	struct net_device_stats *stats = &dev->stats;
> +	unsigned long flags;
>   	bool gso = false;
>   	int tx_num;
>
> +	/* normally we can rely on the stack not calling this more than once,
> +	 * however we have 2 queues running ont he same ring so we need to lock

    s/ont he/ on the/, perhaps a good chance to fix the comment?

> +	 * the ring access
> +	 */
> +	spin_lock_irqsave(&eth->page_lock, flags);
> +
>   	tx_num = mtk_cal_txd_req(skb);
>   	if (unlikely(atomic_read(&ring->free_count) <= tx_num)) {
>   		mtk_stop_queue(eth);
>   		netif_err(eth, tx_queued, dev,
>   			  "Tx Ring full when queue awake!\n");
> +		spin_unlock_irqrestore(&eth->page_lock, flags);
>   		return NETDEV_TX_BUSY;
>   	}
>
[...]

MBR, Sergei

^ permalink raw reply

* Re: [PATCH 2/9] rxrpc: Disable a debugging statement that has been left enabled.
From: David Howells @ 2016-04-07 19:45 UTC (permalink / raw)
  To: David Miller; +Cc: dhowells, joe, linux-afs, netdev, linux-kernel
In-Reply-To: <20160407.133843.156901424652264502.davem@davemloft.net>

David Miller <davem@davemloft.net> wrote:

> Yeah this custom stuff is really inappropriate given the excellent
> infrastructure we have these days...

Excellent only if I can get at it to find out why the something went wrong.
If it's lost because the machine panics, then it is worthless.

David

^ permalink raw reply

* Re: [PATCH 2/9] rxrpc: Disable a debugging statement that has been left enabled.
From: David Howells @ 2016-04-07 19:43 UTC (permalink / raw)
  To: Joe Perches; +Cc: dhowells, linux-afs, netdev, linux-kernel
In-Reply-To: <1460049545.6715.98.camel@perches.com>

Joe Perches <joe@perches.com> wrote:

> > Joe Perches <joe@perches.com> wrote:
> > > It might be better to remove kenter and _enter
> > > altogether and use function tracing instead.
> > Possibly - but not at this time.
> 
> Swell.

I didn't say I wouldn't do it - it's just that I'm trying to fix other stuff
at the moment and don't particularly want to add that to the list just now.
kenter, _enter and co. are serving me very well.

> > Besides, isn't the function tracing log lost
> > if the machine crashes?
> 
> I believe yes, but would it matter?

Let's see...  If the machine panics whilst I'm developing stuff (quite likely
if something goes wrong in BH context), how do I get at the function tracing
log to find out why it panicked if the log is then lost?  With the serial
console, at least I automatically capture the output of kenter and co..

David

^ permalink raw reply

* Re: [Lsf] [Lsf-pc] [LSF/MM TOPIC] Generic page-pool recycle facility?
From: Jesper Dangaard Brouer @ 2016-04-07 19:43 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Chuck Lever, Christoph Hellwig, James Bottomley, Tom Herbert,
	Brenden Blanco, lsf, linux-mm, netdev@vger.kernel.org, lsf-pc,
	Alexei Starovoitov, brouer
In-Reply-To: <1460045640.30063.3.camel@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 1010 bytes --]


On Thu, 07 Apr 2016 12:14:00 -0400 Rik van Riel <riel@redhat.com> wrote:

> On Thu, 2016-04-07 at 08:48 -0700, Chuck Lever wrote:
> > > 
> > > On Apr 7, 2016, at 7:38 AM, Christoph Hellwig <hch@infradead.org>
> > > wrote:
> > > 
> > > This is also very interesting for storage targets, which face the
> > > same issue.  SCST has a mode where it caches some fully constructed
> > > SGLs, which is probably very similar to what NICs want to do.  
> >
> > +1 for NFS server.  
> 
> I have swapped around my slot (into the MM track)
> with Jesper's slot (now a plenary session), since
> there seems to be a fair amount of interest in
> Jesper's proposal from IO and FS people, and my
> topic is more MM specific.

Wow - I'm impressed. I didn't expect such a good slot!
Glad to see the interest!
Thanks!

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 213 bytes --]

^ permalink raw reply

* Re: optimizations to sk_buff handling in rds_tcp_data_ready
From: Sowmini Varadhan @ 2016-04-07 19:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: alexei.starovoitov, tom, netdev
In-Reply-To: <1460038560.6473.397.camel@edumazet-glaptop3.roam.corp.google.com>

On (04/07/16 07:16), Eric Dumazet wrote:
> Use skb split like TCP in output path ?

That almost looks like what I want, but skb_split modifies both
skb and skb1, and I want to leave skb untouched (otherwise 
I will mess up the book-keeping in tcp_read_sock). But skb_split 
is a good template- I think it could even be extended to avoid copying
the frags that we'd later trim off in rds_tcp_data_recv anyway, 
let me see what I can come up with, based on that code.

> Really, pskb_expand_head() is not supposed to copy payload ;)

That would make my world very easy! But it has callers from all
over the kernel, e.g., skb_realloc_headroom, and changing it is
obviously risky

^ permalink raw reply

* Re: [PATCH V3] net: emac: emac gigabit ethernet controller driver
From: Timur Tabi @ 2016-04-07 19:28 UTC (permalink / raw)
  To: Rob Herring
  Cc: Gilad Avidov, netdev, linux-kernel@vger.kernel.org,
	devicetree@vger.kernel.org, linux-arm-msm, Sagar Dharia, shankerd,
	Greg Kroah-Hartman, vikrams, Christopher Covington
In-Reply-To: <CAL_JsqLnJt+mdL==-qP-EJhMa2rA6i-AyDESj69wNVt7D5JJUQ@mail.gmail.com>

Rob Herring wrote:

>>>> >>>+- reg : Offset and length of the register regions for the device
>>>> >>>+- reg-names : Register region names referenced in 'reg' above.
>>>> >>>+       Required register resource entries are:
>>>> >>>+       "base"   : EMAC controller base register block.
>>>> >>>+       "csr"    : EMAC wrapper register block.
>>>> >>>+       Optional register resource entries are:
>>>> >>>+       "ptp"    : EMAC PTP (1588) register block.
>>>> >>>+                  Required if 'qcom,emac-tstamp-en' is present.
>>>> >>>+       "sgmii"  : EMAC SGMII PHY register block.
>>>> >>>+- interrupts : Interrupt numbers used by this controller
>>>> >>>+- interrupt-names : Interrupt resource names referenced in 'interrupts'
>>>> >>>above.
>>>> >>>+       Required interrupt resource entries are:
>>>> >>>+       "emac_core0"   : EMAC core0 interrupt.
>>>> >>>+       "sgmii_irq"   : EMAC SGMII interrupt.
>>>> >>>+- qcom,emac-gpio-mdc  : GPIO pin number of the MDC line of MDIO bus.
>>>> >>>+- qcom,emac-gpio-mdio : GPIO pin number of the MDIO line of MDIO bus.
>>> >>
>>> >>
>>> >>Use the standard binding for GPIO controlled MDIO bus.
>> >
>> >
>> >I'm not familiar with that one.  Are you talking about
>> >bindings/net/mdio-gpio.txt?

> Yes.
>
>
>>>> >>>+- phy-addr            : Specifies phy address on MDIO bus.
>>>> >>>+                       Required if the optional property
>>>> >>>"qcom,no-external-phy"
>>>> >>>+                       is not specified.
>>> >>
>>> >>
>>> >>Don't you think you will need to know the specific phy device or other
>>> >>properties of the phy?
>> >
>> >
>> >That, I can't answer.  Aren't all MDIO devices basically the same?  It's
>> >been a while since I've worked on them.

> No. There was some discussion just this week about needing to require
> phy devices to have compatible strings.

I'm back to working on this driver, and I need some more help with how 
to handle the phy.  mdio-gpio.txt doesn't really tell me much. I'm 
actually working on an ACPI system and not DT.  I don't want to strip 
out the DT code, but I can't really test it.  I want to keep changes to 
Gilad's patch to a minimum.

Part of the problem with the Emac (and Gilad tried to explain this) is 
that it has an internal phy.  Technically, you can connect this internal 
phy directly to another internal phy on another SOC, and use this as an 
SOC interconnect.  However, I don't know of anyone actually doing that.

Instead, most systems have the internal phy connect to an external phy. 
  This connection is how the Emac receives packets from the external phy.

So I don't understand how I'm supposed to use the binding in 
mdio-gpio.txt.  For one thing, there is no such binding on ACPI systems. 
  On my ACPI system, firmware has set up the GPIOs.  The driver never 
actually makes any gpio_xxx calls.  At this point, I'm tempted to just 
remove all the GPIO stuff from Gilad's patch, if I can't help enough 
help to figure out how to modify the driver the way you think I should.

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation collaborative project.

^ permalink raw reply

* [PATCH V2] net: mediatek: update the IRQ part of the binding document
From: John Crispin @ 2016-04-07 19:28 UTC (permalink / raw)
  To: David S. Miller
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA, Felix Fietkau,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	Sean Wang (王志亘),
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Matthias Brugger,
	John Crispin

The current binding document only describes a single interrupt. Update the
document by adding the 2 other interrupts.

The driver currently only uses a single interrupt. The HW is however able
to using IRQ grouping to split TX and RX onto separate GIC irqs.

Signed-off-by: John Crispin <blogic-p3rKhJxN3npAfugRpC6u6w@public.gmane.org>
Cc: devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
---
Changes in V2:
* split this patch out of the series that fixes tx stalls in the driver

 Documentation/devicetree/bindings/net/mediatek-net.txt |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt b/Documentation/devicetree/bindings/net/mediatek-net.txt
index 5ca7929..2f142be 100644
--- a/Documentation/devicetree/bindings/net/mediatek-net.txt
+++ b/Documentation/devicetree/bindings/net/mediatek-net.txt
@@ -9,7 +9,7 @@ have dual GMAC each represented by a child node..
 Required properties:
 - compatible: Should be "mediatek,mt7623-eth"
 - reg: Address and length of the register set for the device
-- interrupts: Should contain the frame engines interrupt
+- interrupts: Should contain the three frame engines interrupts
 - clocks: the clock used by the core
 - clock-names: the names of the clock listed in the clocks property. These are
 	"ethif", "esw", "gp2", "gp1"
@@ -42,7 +42,9 @@ eth: ethernet@1b100000 {
 		 <&ethsys CLK_ETHSYS_GP2>,
 		 <&ethsys CLK_ETHSYS_GP1>;
 	clock-names = "ethif", "esw", "gp2", "gp1";
-	interrupts = <GIC_SPI 200 IRQ_TYPE_LEVEL_LOW>;
+	interrupts = <GIC_SPI 200 IRQ_TYPE_LEVEL_LOW
+		      GIC_SPI 199 IRQ_TYPE_LEVEL_LOW
+		      GIC_SPI 198 IRQ_TYPE_LEVEL_LOW>;
 	power-domains = <&scpsys MT2701_POWER_DOMAIN_ETH>;
 	resets = <&ethsys MT2701_ETHSYS_ETH_RST>;
 	reset-names = "eth";
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH V2 8/8] net: mediatek: do not set the QID field in the TX DMA descriptors
From: John Crispin @ 2016-04-07 19:26 UTC (permalink / raw)
  To: David S. Miller
  Cc: Felix Fietkau, Matthias Brugger,
	Sean Wang (王志亘), netdev, linux-mediatek,
	linux-kernel, John Crispin
In-Reply-To: <1460057210-55786-1-git-send-email-blogic@openwrt.org>

The QID field gets set to the mac id. This made the DMA linked list queue
the traffic of each MAC on a different internal queue. However during long
term testing we found that this will cause traffic stalls as the multi
queue setup requires a more complete initialisation which is not part of
the upstream driver yet.

This patch removes the code setting the QID field, resulting in all
traffic ending up in queue 0 which works without any special setup.

Signed-off-by: John Crispin <blogic@openwrt.org>
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index f9f8851..8163047 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -603,8 +603,7 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 			WRITE_ONCE(txd->txd1, mapped_addr);
 			WRITE_ONCE(txd->txd3, (TX_DMA_SWC |
 					       TX_DMA_PLEN0(frag_map_size) |
-					       last_frag * TX_DMA_LS0) |
-					       mac->id);
+					       last_frag * TX_DMA_LS0));
 			WRITE_ONCE(txd->txd4, 0);
 
 			tx_buf->skb = (struct sk_buff *)MTK_DMA_DUMMY_DESC;
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH V2 7/8] net: mediatek: move the pending_work struct to the device generic struct
From: John Crispin @ 2016-04-07 19:26 UTC (permalink / raw)
  To: David S. Miller
  Cc: Felix Fietkau, Matthias Brugger,
	Sean Wang (王志亘), netdev, linux-mediatek,
	linux-kernel, John Crispin
In-Reply-To: <1460057210-55786-1-git-send-email-blogic@openwrt.org>

The worker always touches both netdevs. It is ethernet core and not MAC
specific. We only need one worker, which belongs into the ethernets core
struct.

Signed-off-by: John Crispin <blogic@openwrt.org>
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c |   10 ++++------
 drivers/net/ethernet/mediatek/mtk_eth_soc.h |    4 ++--
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 8434355..f9f8851 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1193,7 +1193,7 @@ static void mtk_tx_timeout(struct net_device *dev)
 	eth->netdev[mac->id]->stats.tx_errors++;
 	netif_err(eth, tx_err, dev,
 		  "transmit timed out\n");
-	schedule_work(&mac->pending_work);
+	schedule_work(&eth->pending_work);
 }
 
 static irqreturn_t mtk_handle_irq(int irq, void *_eth)
@@ -1438,7 +1438,7 @@ static void mtk_pending_work(struct work_struct *work)
 
 	/* stop all devices to make sure that dma is properly shut down */
 	for (i = 0; i < MTK_MAC_COUNT; i++) {
-		if (!netif_oper_up(eth->netdev[i]))
+		if (!eth->netdev[i])
 			continue;
 		mtk_stop(eth->netdev[i]);
 		__set_bit(i, &restart);
@@ -1463,15 +1463,13 @@ static int mtk_cleanup(struct mtk_eth *eth)
 	int i;
 
 	for (i = 0; i < MTK_MAC_COUNT; i++) {
-		struct mtk_mac *mac = netdev_priv(eth->netdev[i]);
-
 		if (!eth->netdev[i])
 			continue;
 
 		unregister_netdev(eth->netdev[i]);
 		free_netdev(eth->netdev[i]);
-		cancel_work_sync(&mac->pending_work);
 	}
+	cancel_work_sync(&eth->pending_work);
 
 	return 0;
 }
@@ -1659,7 +1657,6 @@ static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np)
 	mac->id = id;
 	mac->hw = eth;
 	mac->of_node = np;
-	INIT_WORK(&mac->pending_work, mtk_pending_work);
 
 	mac->hw_stats = devm_kzalloc(eth->dev,
 				     sizeof(*mac->hw_stats),
@@ -1761,6 +1758,7 @@ static int mtk_probe(struct platform_device *pdev)
 
 	eth->dev = &pdev->dev;
 	eth->msg_enable = netif_msg_init(mtk_msg_level, MTK_DEFAULT_MSG_ENABLE);
+	INIT_WORK(&eth->pending_work, mtk_pending_work);
 
 	err = mtk_hw_init(eth);
 	if (err)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 48a5292..eed626d 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -363,6 +363,7 @@ struct mtk_rx_ring {
  * @clk_gp1:		The gmac1 clock
  * @clk_gp2:		The gmac2 clock
  * @mii_bus:		If there is a bus we need to create an instance for it
+ * @pending_work:	The workqueue used to reset the dma ring
  */
 
 struct mtk_eth {
@@ -389,6 +390,7 @@ struct mtk_eth {
 	struct clk			*clk_gp1;
 	struct clk			*clk_gp2;
 	struct mii_bus			*mii_bus;
+	struct work_struct		pending_work;
 };
 
 /* struct mtk_mac -	the structure that holds the info about the MACs of the
@@ -398,7 +400,6 @@ struct mtk_eth {
  * @hw:			Backpointer to our main datastruture
  * @hw_stats:		Packet statistics counter
  * @phy_dev:		The attached PHY if available
- * @pending_work:	The workqueue used to reset the dma ring
  */
 struct mtk_mac {
 	int				id;
@@ -406,7 +407,6 @@ struct mtk_mac {
 	struct mtk_eth			*hw;
 	struct mtk_hw_stats		*hw_stats;
 	struct phy_device		*phy_dev;
-	struct work_struct		pending_work;
 };
 
 /* the struct describing the SoC. these are declared in the soc_xyz.c files */
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH V2 6/8] net: mediatek: fix TX locking
From: John Crispin @ 2016-04-07 19:26 UTC (permalink / raw)
  To: David S. Miller
  Cc: Felix Fietkau, netdev-u79uwXL29TY76Z2rM5mHXA,
	Sean Wang (王志亘),
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Matthias Brugger,
	John Crispin
In-Reply-To: <1460057210-55786-1-git-send-email-blogic-p3rKhJxN3npAfugRpC6u6w@public.gmane.org>

Inside the TX path there is a lock inside the tx_map function. This is
however too late. The patch moves the lock to the start of the xmit
function right before the free count check of the DMA ring happens.
If we do not do this, the code becomes racy leading to TX stalls and
dropped packets. This happens as there are 2 netdevs running on the
same physical DMA ring.

Signed-off-by: John Crispin <blogic-p3rKhJxN3npAfugRpC6u6w@public.gmane.org>
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c |   20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 60b66ab..8434355 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -536,7 +536,6 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 	struct mtk_eth *eth = mac->hw;
 	struct mtk_tx_dma *itxd, *txd;
 	struct mtk_tx_buf *tx_buf;
-	unsigned long flags;
 	dma_addr_t mapped_addr;
 	unsigned int nr_frags;
 	int i, n_desc = 1;
@@ -568,11 +567,6 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 	if (unlikely(dma_mapping_error(&dev->dev, mapped_addr)))
 		return -ENOMEM;
 
-	/* normally we can rely on the stack not calling this more than once,
-	 * however we have 2 queues running ont he same ring so we need to lock
-	 * the ring access
-	 */
-	spin_lock_irqsave(&eth->page_lock, flags);
 	WRITE_ONCE(itxd->txd1, mapped_addr);
 	tx_buf->flags |= MTK_TX_FLAGS_SINGLE0;
 	dma_unmap_addr_set(tx_buf, dma_addr0, mapped_addr);
@@ -632,8 +626,6 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 	WRITE_ONCE(itxd->txd3, (TX_DMA_SWC | TX_DMA_PLEN0(skb_headlen(skb)) |
 				(!nr_frags * TX_DMA_LS0)));
 
-	spin_unlock_irqrestore(&eth->page_lock, flags);
-
 	netdev_sent_queue(dev, skb->len);
 	skb_tx_timestamp(skb);
 
@@ -661,8 +653,6 @@ err_dma:
 		itxd = mtk_qdma_phys_to_virt(ring, itxd->txd2);
 	} while (itxd != txd);
 
-	spin_unlock_irqrestore(&eth->page_lock, flags);
-
 	return -ENOMEM;
 }
 
@@ -712,14 +702,22 @@ static int mtk_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct mtk_eth *eth = mac->hw;
 	struct mtk_tx_ring *ring = &eth->tx_ring;
 	struct net_device_stats *stats = &dev->stats;
+	unsigned long flags;
 	bool gso = false;
 	int tx_num;
 
+	/* normally we can rely on the stack not calling this more than once,
+	 * however we have 2 queues running ont he same ring so we need to lock
+	 * the ring access
+	 */
+	spin_lock_irqsave(&eth->page_lock, flags);
+
 	tx_num = mtk_cal_txd_req(skb);
 	if (unlikely(atomic_read(&ring->free_count) <= tx_num)) {
 		mtk_stop_queue(eth);
 		netif_err(eth, tx_queued, dev,
 			  "Tx Ring full when queue awake!\n");
+		spin_unlock_irqrestore(&eth->page_lock, flags);
 		return NETDEV_TX_BUSY;
 	}
 
@@ -747,10 +745,12 @@ static int mtk_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			     ring->thresh))
 			mtk_wake_queue(eth);
 	}
+	spin_unlock_irqrestore(&eth->page_lock, flags);
 
 	return NETDEV_TX_OK;
 
 drop:
+	spin_unlock_irqrestore(&eth->page_lock, flags);
 	stats->tx_dropped++;
 	dev_kfree_skb(skb);
 	return NETDEV_TX_OK;
-- 
1.7.10.4

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox