Netdev List

* Re: [PATCH 2/2] ARM: dts: Add the ethernet and ethernet PHY to the cygnus core DT.
From: Sergei Shtylyov @ 2017-04-25  9:40 UTC
  To: Eric Anholt, Florian Fainelli, Vivien Didelot, Andrew Lunn,
	netdev, Rob Herring, Mark Rutland, devicetree
  Cc: linux-arm-kernel, linux-kernel, bcm-kernel-feedback-list, Ray Jui,
	Scott Branden, Jon Mason
In-Reply-To: <20170424215022.30382-3-eric@anholt.net>

Hello.

On 4/25/2017 12:50 AM, Eric Anholt wrote:

> Cygnus has a single amac controller connected to the B53 switch with 2
> PHYs.  On the BCM911360_EP platform, those two PHYs are connected to
> the external ethernet jacks.

    My spell checker trips on "amac" and "ethernet" -- perhaps they need 
capitalization?

> Signed-off-by: Eric Anholt <eric@anholt.net>
> ---
>  arch/arm/boot/dts/bcm-cygnus.dtsi      | 60 ++++++++++++++++++++++++++++++++++
>  arch/arm/boot/dts/bcm911360_entphn.dts |  8 +++++
>  2 files changed, 68 insertions(+)
>
> diff --git a/arch/arm/boot/dts/bcm-cygnus.dtsi b/arch/arm/boot/dts/bcm-cygnus.dtsi
> index 009f1346b817..318899df9972 100644
> --- a/arch/arm/boot/dts/bcm-cygnus.dtsi
> +++ b/arch/arm/boot/dts/bcm-cygnus.dtsi
> @@ -142,6 +142,56 @@
>  			interrupts = <0>;
>  		};
>
> +		mdio: mdio@18002000 {
> +			compatible = "brcm,iproc-mdio";
> +			reg = <0x18002000 0x8>;
> +			#size-cells = <1>;
> +			#address-cells = <0>;
> +
> +			gphy0: eth-gphy@0 {

    The node names must be generic; the DT spec has standardized the
"ethernet-phy" name for this case.

> +				reg = <0>;
> +				max-speed = <1000>;
> +			};
> +
> +			gphy1: eth-gphy@1 {
> +				reg = <1>;
> +				max-speed = <1000>;
> +			};
> +		};
[...]
> @@ -295,6 +345,16 @@
>  			status = "disabled";
>  		};
>
> +		eth0: enet@18042000 {
> +			compatible = "brcm,amac";
> +			reg = <0x18042000 0x1000>,
> +			      <0x18110000 0x1000>;
> +			reg-names = "amac_base", "idm_base";

    I don't think "_base" suffixes are necessary here.

[...]

MBR, Sergei

* Re: [PATCH 1/2] net: dsa: b53: Add compatible strings for the Cygnus-family BCM11360.
From: Sergei Shtylyov @ 2017-04-25  9:35 UTC
  To: Eric Anholt, Florian Fainelli, Vivien Didelot, Andrew Lunn,
	netdev, Rob Herring, Mark Rutland, devicetree
  Cc: Scott Branden, Jon Mason, Ray Jui, linux-kernel,
	bcm-kernel-feedback-list, linux-arm-kernel
In-Reply-To: <20170424215022.30382-2-eric@anholt.net>

Hello!

On 4/25/2017 12:50 AM, Eric Anholt wrote:

> Cygnus is a small family of SoCs, of which we currently have
> devicetree for BCM11360 and BCM58300.  The 11360's B53 is mostly the
> same as 58xx, just requiring a tiny bit of setup that was previously
> missing.
>
> Signed-off-by: Eric Anholt <eric@anholt.net>
> ---
>  Documentation/devicetree/bindings/net/dsa/b53.txt | 3 +++
>  drivers/net/dsa/b53/b53_srab.c                    | 2 ++
>  2 files changed, 5 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/net/dsa/b53.txt b/Documentation/devicetree/bindings/net/dsa/b53.txt
> index d6c6e41648d4..49c93d3c0839 100644
> --- a/Documentation/devicetree/bindings/net/dsa/b53.txt
> +++ b/Documentation/devicetree/bindings/net/dsa/b53.txt
> @@ -29,6 +29,9 @@ Required properties:
>        "brcm,bcm58625-srab"
>        "brcm,bcm88312-srab" and the mandatory "brcm,nsp-srab string
>
> +  For the BCM11360 SoC, must be:
> +      "brcm,bcm11360-srab" and the mandatory "brcm,cygnus-srab string

     Missing closing quote here and above?

[...]

MBR, Sergei

* Re: xdp_redirect ifindex vs port. Was: best API for returning/setting egress port?
From: Jesper Dangaard Brouer @ 2017-04-25  9:34 UTC
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Andy Gospodarek, Daniel Borkmann,
	Alexei Starovoitov, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org, John Fastabend, brouer
In-Reply-To: <20170420171006.GA97067@ast-mbp.thefacebook.com>

On Thu, 20 Apr 2017 10:10:08 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Thu, Apr 20, 2017 at 08:10:51AM +0200, Jesper Dangaard Brouer wrote:
> > On Wed, 19 Apr 2017 19:56:13 -0700
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >   
> > > On Thu, Apr 20, 2017 at 12:51:31AM +0200, Daniel Borkmann wrote:  
> > > > 
> > > > Is there a concrete reason that all the proposed future cases like sockets
> > > > have to be handled within the very same XDP_REDIRECT return code? F.e. why
> > > > not XDP_TX_NIC that only assumes ifindex as proposed in the patch, and future
> > > > ones would get a different return code f.e. XDP_TX_SK only handling sockets
> > > > when we get there implementation-wise?    
> > > 
> > > yeah. let's keep redirect to sockets, tunnels, crypto and exotic things
> > > out of this discussion.
> > > XDP_REDIRECT should assume L2 raw packet is being redirected to another L2 netdev.
> > > If we make it too generic it will lose performance.
> > > 
> > > For cls_bpf the ifindex concept is symmetric. The program can access it as
> > > skb->ifindex on receive and can redirect to another ifindex via bpf_redirect() helper.
> > > Since netdev is not locked, it's actually big plus, since container management
> > > control plane can simply delete netns+veth and it goes away. The program can
> > > have dangling ifindex (if control plane is buggy and didn't update the bpf side),
> > > but it's harmless. Packets that redirect to non-existing ifindex are dropped.
> > > This approach already understood and works well, so for XDP I suggest to use
> > > the same approach initially before starting to reinvent the wheel.
> > > struct xdp_md needs ifindex field and xdp_redirect() helper that redirects
> > > to L2 netdev only. That's it. Simple and easy.
> > > I think the main use cases in John's and Jesper's minds is something like
> > > xdpswitch where packets are coming from VMs and from physical eths and
> > > being redirected to either physical eth or to VM via upcoming vhost+xdp support.
> > > This xdp_md->ifindex + xdp_redirect(to_ifindex) will solve it just fine.
> > > 
> > > Once we have vhost+xdp and all other bits implemented, we must come back
> > > to this discussion about having port mapping table. As I mentioned
> > > during netconf I think it's very useful, but I don't think we should
> > > gate vhost+xdp and xdp_redirect work on this discussion.
> > > As far as this port mapping table we would need 'port' field in xdp_md as well
> > > and xdp_redirect_port() done via helper or via extra 'out_port' field in xdp_md
> > > plus another XDP_REDIRECT_PORT action code.  
> > 
> > Guess it would be easier to talk about if we name it "ingress_port" and
> > "egress_port".
> >   
> > > The actual port table (array) should be populated by user space with netdevs
> > > and these netdev will have their refcnt incremented. Then we'll have discussion
> > > what to do with netdev_unregister notifiers, whether they should be auto-removed
> > > from port table or bpf should have a chance to be notified and act on it.
> > > Such port mapping will allow us to optimize inevitable call
> > > dev_get_by_index_rcu(dev_net(skb->dev), ri->ifindex);
> > > away, since netdevs will be stored in the port table and direct deref
> > > port_map_array[xdp_md->out_port] will give us target netdev quickly.  
> > 
> > I agree with the above paragraph, and am happy that you can see that this
> > will actually be faster than using ifindex'es.
> >   
> > > It's nice optimization and there are other more powerful optimizations we
> > > can do with such port table (since we will know in advance which netdevs
> > > the program will be redirecting to), but I still think we should do
> > > ifindex based xdp_redirect first and only add this port table later.  
> > 
> > No, we cannot first do an ifindex based xdp_redirect. The point of the
> > port table is to sandbox which ports XDP can use.  
> 
> hmm. port table cannot sandbox the ports. The only thing it does
> from 'safety' point of view is moving the checks from run-time into
> static insertion time.
> So the checks that we would do on netdev after looking it up
> based on ifindex are the same checks we will do at insertion time
> into port table. The user space will insert/delete them live
> from that port table, so from program point of view it's exactly
> the same as ifindex. The ports can disappear and can be added
> while the program is running.

I agree that, from the eBPF program's point of view, using an ifindex or a
port number is the same.  And I do like this model, where this is just a
number seen from bpf.  It provides a clean separation between the
kernel and the eBPF program world.


> Note the very first bpf patchset years ago contained the port table
> abstraction. ovs has concept of vports as well. These two very
> different projects needed port table to provide a layer of
> indirection between ifindex==netdev and virtual port number.
> This is still the case and I'd like to see this port table to be
> implemented for both cls_bpf and xdp. In that sense xdp is not
> special.

Glad to hear you want to see this implemented; I will start coding on
this then.  Good point about cls_bpf: I was planning to make this port
table strongly connected to XDP, but I guess I should also think of cls_bpf.


> > XDP is different than TC/cls_bpf, as it does "bypass", there is no
> > other layer that can stop or inspect these packets. The TC hooks
> > redirect into the network stack, which have all the usual facilities
> > available for filtering, inspection and debugging what is going on
> > (e.g. tcpdump works for TC redirect).  
> 
> not true. when bpf_redirect() drops the packet due to incorrect ifindex
> that packet disappears without a trace. No tcpdump and no counter.
> And this is fine. We can add tracepoint there for debugging,
> but it wasn't a problem for anyone who's using it today, so it's
> 'nice to have', but certainly not mandatory.
 
I'm not worried about the DROP case; I agree that is fine (as you also
say).  The problem is unintentionally sending a packet to the wrong
ifindex.  This is clearly an eBPF program error, BUT with XDP it
becomes a very hard-to-debug program error.  With TC-redirect/cls_bpf
we can tcpdump the packets; with XDP there is no visibility into this
happening (the NSA is going to love this "feature").  Maybe we could add
yet another tracepoint to allow debugging this.  My proposal is that we
simply remove the possibility of such program errors by, as you say,
moving the validation from run-time to static insertion-time, via a
port table.
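To make the ifindex-vs-port-table trade-off concrete, here is a minimal userspace model in C. It is not kernel or BPF code: `struct fake_netdev`, `port_table_insert()` and the other names are hypothetical. The ifindex path resolves the target on every packet and drops it if the device is gone; the port-table path validates the device and takes a reference once, at insertion time, so the per-packet work is a bounds check plus an array dereference.

```c
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical, simplified stand-in for struct net_device. */
struct fake_netdev {
	int ifindex;
	const char *name;
	bool xdp_capable;   /* illustrative property checked before use */
	int refcnt;
};

#define MAX_DEVS 8
#define PORT_MAX 4

/* All netdevs known to the "kernel" in this model. */
static struct fake_netdev *all_devs[MAX_DEVS];

/* ifindex model: run-time lookup standing in for dev_get_by_index_rcu()
 * (a hash lookup in the real kernel, a linear scan here). */
static struct fake_netdev *lookup_by_ifindex(int ifindex)
{
	for (size_t i = 0; i < MAX_DEVS; i++)
		if (all_devs[i] && all_devs[i]->ifindex == ifindex)
			return all_devs[i];
	return NULL;  /* caller drops the packet */
}

/* Port-table model: user space inserts devices; checks happen here. */
static struct fake_netdev *port_table[PORT_MAX];

static int port_table_insert(unsigned int port, int ifindex)
{
	struct fake_netdev *dev;

	if (port >= PORT_MAX)
		return -1;
	dev = lookup_by_ifindex(ifindex);
	if (!dev || !dev->xdp_capable)   /* validation at insertion time */
		return -1;
	dev->refcnt++;                   /* the table holds a reference */
	port_table[port] = dev;
	return 0;
}

static void port_table_delete(unsigned int port)
{
	if (port < PORT_MAX && port_table[port]) {
		port_table[port]->refcnt--;
		port_table[port] = NULL;
	}
}

/* Per-packet path: bounds check and direct dereference only. */
static struct fake_netdev *port_table_lookup(unsigned int port)
{
	return port < PORT_MAX ? port_table[port] : NULL;
}
```

In this sketch a program that computes a wrong port number can only reach devices user space deliberately placed in the table; the netdev_unregister question raised in the thread is left open.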

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

* [PATCH v3] brcmfmac: Make skb header writable before use
From: James Hughes @ 2017-04-25  9:15 UTC
  To: Arend van Spriel, Franky Lin, Hante Meuleman, Kalle Valo,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	brcm80211-dev-list.pdl-dY08KVG/lbpWk0Htik3J/w,
	netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: James Hughes

The driver was making changes to the skb header without
ensuring it was writable (i.e. uncloned).
This patch also removes some boilerplate header size
checking/adjustment code, as that is also handled by the
skb_cow_head() function used to make the header writable.
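As a rough illustration of what "make the header writable" means, the following userspace C model (hypothetical `struct pkt` and `pkt_cow_head()`, not the real `struct sk_buff` API) copies the data area when it is shared with a clone or lacks the requested headroom, and leaves it alone otherwise. Folding both conditions into one check is why a separate headroom-reallocation path becomes redundant.

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Hypothetical, heavily simplified packet buffer (not struct sk_buff). */
struct buf_data {
	int users;            /* how many packets share this data area */
	size_t size;
	unsigned char bytes[];
};

struct pkt {
	struct buf_data *data;
	size_t head;          /* offset where the packet starts (= headroom) */
	size_t len;
};

/* Ensure the data area is private (not shared with a clone) and has at
 * least `headroom` bytes in front of the packet; copy it otherwise.
 * Returns 0 on success, -1 on allocation failure. */
static int pkt_cow_head(struct pkt *p, size_t headroom)
{
	struct buf_data *nd;

	if (p->data->users == 1 && p->head >= headroom)
		return 0;                      /* already writable, enough room */

	nd = malloc(sizeof(*nd) + headroom + p->len);
	if (!nd)
		return -1;
	nd->users = 1;
	nd->size = headroom + p->len;
	memcpy(nd->bytes + headroom, p->data->bytes + p->head, p->len);

	if (--p->data->users == 0)         /* drop our share of the old data */
		free(p->data);
	p->data = nd;
	p->head = headroom;
	return 0;
}
```

After a successful call the caller may push and edit header bytes without affecting any clone that still points at the old data.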

Please apply to 4.12, important fix.

This patch depends on
brcmfmac: Ensure pointer correctly set if skb data location changes

Signed-off-by: James Hughes <james.hughes-FnsA7b+Nu9XbIbC87yuRow@public.gmane.org>
Acked-by: Arend van Spriel <arend.vanspriel-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
---

Changes in v2
   Makes the _cow_ call at the point where the skb enters the
   stack, so it only needs to be done once, and error handling
   is easier.
   Split a minor bug fix off into a separate patch (which
   this patch depends on)

Changes in v3
   Minor change to the 'if' logic to reduce patch size as per
   maintainer's request.
   Flagged as important fix for 4.12 in commit message

 .../net/wireless/broadcom/brcm80211/brcmfmac/core.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
index 9b7c19a508ac..433f2c8408e9 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
@@ -211,22 +211,13 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
 		goto done;
 	}

-	/* Make sure there's enough room for any header */
-	if (skb_headroom(skb) < drvr->hdrlen) {
-		struct sk_buff *skb2;

* Re: [RFC 1/4] netlink: make extended ACK setting NULL-friendly
From: Daniel Borkmann @ 2017-04-25  9:12 UTC
  To: Jakub Kicinski, netdev
  Cc: davem, johannes, dsa, alexei.starovoitov, bblanco, john.fastabend,
	kubakici, oss-drivers
In-Reply-To: <20170425080644.122536-2-jakub.kicinski@netronome.com>

On 04/25/2017 10:06 AM, Jakub Kicinski wrote:
> As we propagate extended ack reporting throughout various paths in
> the kernel it may happen that the same function is called with the
> extended ack parameter passed as NULL.  Make the NL_SET_ERR_MSG()
> macro simply print the message to the logs if that happens.
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
>   include/linux/netlink.h | 12 ++++++++----
>   1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/netlink.h b/include/linux/netlink.h
> index 8d2a8924705c..b59cfbf2e2c7 100644
> --- a/include/linux/netlink.h
> +++ b/include/linux/netlink.h
> @@ -86,10 +86,14 @@ struct netlink_ext_ack {
>    * Currently string formatting is not supported (due
>    * to the lack of an output buffer.)
>    */
> -#define NL_SET_ERR_MSG(extack, msg) do {	\
> -	static const char _msg[] = (msg);	\
> -						\
> -	(extack)->_msg = _msg;			\
> +#define NL_SET_ERR_MSG(extack, msg) do {		\
> +	struct netlink_ext_ack *_extack = (extack);	\
> +	static const char _msg[] = (msg);		\
> +							\
> +	if (_extack)					\
> +		_extack->_msg = _msg;			\
> +	else						\
> +		pr_info("%s\n", _msg);			\
>   } while (0)
>
>   extern void netlink_kernel_release(struct sock *sk);

Probably makes sense to have a NL_MOD_SET_ERR_MSG(), which
then also prepends a KBUILD_MODNAME ": " string to the
message (similar to pr_*()), so that it's easier to identify
whether the error came from a specific driver or rather
common core code?
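A sketch of the suggested variant, built on userspace stand-ins (`pr_info` mapped to `fprintf`, `KBUILD_MODNAME` defined by hand, a one-field `struct netlink_ext_ack`). `NL_MOD_SET_ERR_MSG` is only the name proposed above, not an existing macro; it relies on adjacent string literals being concatenated at compile time. The kernel macro writes the initializer as `(msg)`; the parentheses are dropped here because a parenthesized string initializer is a GNU extension.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Userspace stand-ins; in the kernel these come from <linux/netlink.h>,
 * printk and Kbuild (which normally defines KBUILD_MODNAME). */
#ifndef KBUILD_MODNAME
#define KBUILD_MODNAME "nfp"
#endif
#define pr_info(...) fprintf(stderr, __VA_ARGS__)

struct netlink_ext_ack {
	const char *_msg;   /* simplified: the real struct has more fields */
};

/* NULL-friendly setter in the style of the patch. */
#define NL_SET_ERR_MSG(extack, msg) do {		\
	struct netlink_ext_ack *_extack = (extack);	\
	static const char _msg[] = msg;			\
							\
	if (_extack)					\
		_extack->_msg = _msg;			\
	else						\
		pr_info("%s\n", _msg);			\
} while (0)

/* Hypothetical module-prefixed variant: literal concatenation yields
 * "<module>: <msg>" at compile time, similar to what pr_fmt() does. */
#define NL_MOD_SET_ERR_MSG(extack, msg) \
	NL_SET_ERR_MSG((extack), KBUILD_MODNAME ": " msg)
```

Because the prefix is added at compile time, the message still lives in a static array and no output buffer is needed, which keeps the "no string formatting" constraint from the comment in netlink.h.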

* Re: [PATCH net v3] bridge: ebtables: fix reception of frames DNAT-ed to bridge device/port
From: Pablo Neira Ayuso @ 2017-04-25  9:10 UTC
  To: Linus Lüssing
  Cc: netdev, bridge, linux-kernel, coreteam, netfilter-devel,
	Jozsef Kadlecsik, David S . Miller
In-Reply-To: <20170419194733.19006-1-linus.luessing@c0d3.blue>

On Wed, Apr 19, 2017 at 09:47:33PM +0200, Linus Lüssing wrote:
> When trying to redirect bridged frames to the bridge device itself or
> a bridge port (brouting) via the dnat target then this currently fails:
> 
> The ethernet destination of the frame is dnat'ed to the MAC address of
> the bridge device or port just fine. However, the IP code drops it in
> the beginning of ip_input.c/ip_rcv() as the dnat target left
> the skb->pkt_type as PACKET_OTHERHOST.
> 
> Fixing this by resetting skb->pkt_type to an appropriate type after
> dnat'ing.

Applied, thanks.

One comment below.
> @@ -18,11 +19,32 @@ static unsigned int
>  ebt_dnat_tg(struct sk_buff *skb, const struct xt_action_param *par)
>  {
>  	const struct ebt_nat_info *info = par->targinfo;
> +	struct net_device *dev;
>  
>  	if (!skb_make_writable(skb, 0))
>  		return EBT_DROP;
>  
>  	ether_addr_copy(eth_hdr(skb)->h_dest, info->mac);
> +
> +	if (is_multicast_ether_addr(info->mac)) {
> +		if (is_broadcast_ether_addr(info->mac))
> +			skb->pkt_type = PACKET_BROADCAST;
> +		else
> +			skb->pkt_type = PACKET_MULTICAST;
> +	} else {
> +		rcu_read_lock();

I'm going to manually remove this explicit rcu_read_lock() here, no
need to resend. We're guaranteed to run from the packet path with the
RCU read-side lock already held by the netfilter hooks, so we just save
some cycles by avoiding this unnecessary nesting.

Let me know if I'm missing anything. Thanks!
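The pkt_type selection in the patch can be modeled outside the kernel as below. The multicast/broadcast branches follow the quoted hunk; the unicast branch is cut off in the quote, so comparing against the bridge device's own MAC to choose PACKET_HOST vs PACKET_OTHERHOST is an assumption based on the commit message. The helpers mirror the kernel's `is_*_ether_addr()` names but are local reimplementations.

```c
#include <stdbool.h>
#include <string.h>
#include <assert.h>

/* Packet types with the values used in <linux/if_packet.h>. */
enum { PACKET_HOST = 0, PACKET_BROADCAST = 1, PACKET_MULTICAST = 2, PACKET_OTHERHOST = 3 };

static bool is_multicast_ether_addr(const unsigned char *a)
{
	return a[0] & 0x01;   /* I/G bit of the first octet */
}

static bool is_broadcast_ether_addr(const unsigned char *a)
{
	static const unsigned char bcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

	return memcmp(a, bcast, 6) == 0;
}

/* Choose pkt_type for a frame whose destination was rewritten to `mac`.
 * `bridge_mac` is the local bridge device address (assumed lookup; in the
 * kernel it would happen under the RCU lock held by the netfilter hooks). */
static int dnat_pkt_type(const unsigned char *mac, const unsigned char *bridge_mac)
{
	if (is_multicast_ether_addr(mac))
		return is_broadcast_ether_addr(mac) ? PACKET_BROADCAST : PACKET_MULTICAST;
	return memcmp(mac, bridge_mac, 6) == 0 ? PACKET_HOST : PACKET_OTHERHOST;
}
```

Only frames that end up as PACKET_HOST are accepted by ip_rcv() as addressed to this host, which is the drop the patch fixes.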

* Re: [Intel-wired-lan] [PATCH 1/2] e1000e: Don't return uninitialized stats
From: Jeff Kirsher @ 2017-04-25  9:07 UTC
  To: Brown, Aaron F, Benjamin Poirier, Neftin, Sasha, David S Miller,
	stephen
  Cc: netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	Kirsher@f1.synalogic.ca, Stefan Priebe
In-Reply-To: <309B89C4C689E141A5FF6A0C5FB2118B8C5EF4F7@ORSMSX101.amr.corp.intel.com>

On Tue, 2017-04-25 at 07:10 +0000, Brown, Aaron F wrote:
> > From: Intel-wired-lan [mailto:intel-wired-lan-bounces@lists.osuosl.org] On Behalf Of Benjamin Poirier
> > Sent: Monday, April 24, 2017 12:10 PM
> > To: Neftin, Sasha <sasha.neftin@intel.com>
> > Cc: Kirsher@f1.synalogic.ca; Stefan Priebe <s.priebe@profihost.ag>; netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org
> > Subject: Re: [Intel-wired-lan] [PATCH 1/2] e1000e: Don't return uninitialized stats
> > 
> > Sasha, please use reply-all to keep everyone in cc (including
> > me...).
> > 
> > On 2017/04/24 11:17, Neftin, Sasha wrote:
> > > On 4/23/2017 15:53, Neftin, Sasha wrote:
> > > > -----Original Message-----
> > > > From: Intel-wired-lan [mailto:intel-wired-lan-bounces@lists.osuosl.org] On Behalf Of Benjamin Poirier
> > > > Sent: Saturday, April 22, 2017 00:20
> > > > To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> > > > Cc: netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org; Stefan Priebe <s.priebe@profihost.ag>
> > > > Subject: [Intel-wired-lan] [PATCH 1/2] e1000e: Don't return uninitialized stats
> > > > 
> > > > Some statistics passed to ethtool are garbage because
> > > > e1000e_get_stats64() doesn't write them, for example:
> > > > tx_heartbeat_errors. This leaks kernel memory to userspace and
> > > > confuses users.
> > > > 
> > > > Do like ixgbe and use dev_get_stats() which first zeroes out
> > > > rtnl_link_stats64.
> > > > 
> > > > Reported-by: Stefan Priebe <s.priebe@profihost.ag>
> > > > Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> > > > ---
> > > >    drivers/net/ethernet/intel/e1000e/ethtool.c | 2 +-
> > > >    1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c
> > > > index 7aff68a4a4df..f117b90cdc2f 100644
> > > > --- a/drivers/net/ethernet/intel/e1000e/ethtool.c
> > > > +++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
> > > > @@ -2063,7 +2063,7 @@ static void e1000_get_ethtool_stats(struct net_device *netdev,
> > > >            pm_runtime_get_sync(netdev->dev.parent);
> > > > - e1000e_get_stats64(netdev, &net_stats);
> > > > + dev_get_stats(netdev, &net_stats);
> > > >            pm_runtime_put_sync(netdev->dev.parent);
> > > > --
> > > > 2.12.2
> > > > 
> > > > _______________________________________________
> > > > Intel-wired-lan mailing list
> > > > Intel-wired-lan@lists.osuosl.org
> > > > http://lists.osuosl.org/mailman/listinfo/intel-wired-lan
> > > 
> > > Hello,
> > > 
> > > We would like to not accept this patch. Suggested generic method
> > > '*dev_get_stats' (net/core/dev.c) calls 'ops->ndo_get_stats64' method which
> > > eventually calls e1000e_get_stats64 (netdev.c) - so there is same
> > > functionality. Also, see that 'e1000e_get_stats64' method in netdev.c (line
> > 
> > No, it's not the same functionality because dev_get_stats() does a
> > memset on the rtnl_link_stats64 struct.
> > 
> > > 5928) calls 'memset' with 0's before update statistics.  Local sanity check
> > 
> > I don't see any memset in e1000e_get_stats64(). What kernel version are
> > you looking at?
> 
> The call to memset was removed from the upstream kernel with:
> ------------------------------------------------------------------------------------
> commit 5944701df90d9577658e2354cc27c4ceaeca30fe
> Author: stephen hemminger <stephen@networkplumber.org>
> Date:   Fri Jan 6 19:12:53 2017 -0800
> 
>     net: remove useless memset's in drivers get_stats64
> 
>     In dev_get_stats() the statistic structure storage has already been
>     zeroed. Therefore network drivers do not need to call memset() again.
> ...
> < changes to other drivers snipped out >
> ...
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/int
> index 723025b..79651eb 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -5925,7 +5925,6 @@ void e1000e_get_stats64(struct net_device *netdev,
>  {
>         struct e1000_adapter *adapter = netdev_priv(netdev);
> 
> -       memset(stats, 0, sizeof(struct rtnl_link_stats64));
>         spin_lock(&adapter->stats64_lock);
>         e1000e_update_stats(adapter);
>         /* Fill out the OS statistics structure */
> ------------------------------------------------------------------------------------
> 
> This also is where the bad counters start to show up for e1000e for
> my test systems.  From this driver on I see (very) large values for
> tx_dropped, rx_over_errors and tx_fifo_errors on driver load (even
> before bringing the interface up).  It seems the memset is not so
> useless for this driver after all.  Would simply reverting the e1000e
> portion of this patch resolve the issue?

Looks like Aaron beat me to the punch on pointing out that we had this
very code in there before.  It appears that Stephen's
assertion/assumption was incorrect about the stats structure being
zero'd out, which is why we are seeing the issue.

I have no issue with reverting Stephen's earlier patch, or do we want to
pursue why the stats structure is not zero'd out and resolve that
instead?  Either way, I just want to make sure we are all on the same
page as to the right solution so that we do not repeat this
in the future.
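A small userspace model of the failure mode, under the assumption (suggested by the quoted ethtool.c hunk) that `e1000_get_ethtool_stats()` passes its own `rtnl_link_stats64` straight to `e1000e_get_stats64()` instead of going through `dev_get_stats()`. The struct and functions below are simplified, hypothetical stand-ins; the 0xAA fill simulates uninitialized stack memory.

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Cut-down stand-in for struct rtnl_link_stats64. */
struct link_stats {
	uint64_t rx_packets;
	uint64_t tx_packets;
	uint64_t tx_heartbeat_errors;  /* never written by the driver below */
};

/* Driver callback in the style of e1000e_get_stats64() after the memset
 * was removed: it fills in only the counters it maintains. */
static void drv_get_stats64(struct link_stats *s)
{
	s->rx_packets = 100;
	s->tx_packets = 42;
}

/* Core wrapper in the style of dev_get_stats(): zero first, then call. */
static void core_get_stats(struct link_stats *s)
{
	memset(s, 0, sizeof(*s));
	drv_get_stats64(s);
}
```

Only the caller that goes through the zeroing wrapper sees 0 for the untouched field; a direct caller sees whatever was on the stack, which is consistent with the large tx_dropped/rx_over_errors values reported above.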

* Re: [PATCH v2] cpsw: ethtool: add support for getting/setting EEE registers
From: Niklas Cassel @ 2017-04-25  9:06 UTC
  To: Florian Fainelli, Giuseppe CAVALLARO, Andrew Lunn, Yegor Yefremov
  Cc: netdev, linux-omap@vger.kernel.org, Grygorii Strashko,
	N, Mugunthan V, Rami Rosen, Fabrice GASNIER, rmk+kernel
In-Reply-To: <68e1a232-2fde-da42-d49f-20ea1fec6dbf@gmail.com>



On 04/18/2017 06:40 PM, Florian Fainelli wrote:
> On 04/18/2017 06:23 AM, Niklas Cassel wrote:
>> On 01/04/2017 03:33 PM, Florian Fainelli wrote:
>>> On 12/02/2016 09:48 AM, Florian Fainelli wrote:
>>>>>> Peppe, any thoughts on this?
>>>>>
>>>>> I share what you say.
>>>>>
>>>>> In sum, the EEE management inside the stmmac is:
>>>>>
>>>>> - the driver looks at own HW cap register if EEE is supported
>>>>>
>>>>>     (indeed the user could keep disable EEE if bugged on some HW
>>>>>      + Alex, Fabrice: we had some patches for this to propose where we
>>>>>              called the phy_ethtool_set_eee to disable feature at phy
>>>>>              level
>>>>>
>>>>> - then the stmmac asks PHY layer to understand if transceiver and
>>>>>   partners are EEE capable.
>>>>>
>>>>> - If all matches the EEE is actually initialized.
>>>>>
>>>>> the logic above should be respected when use ethtool, hmm, I will
>>>>> check the stmmac_ethtool_op_set_eee asap.
>>>>>
>>>>> Hoping this is useful
>>>>
>>>> This is definitively useful, the only part that I am struggling to
>>>> understand in phy_init_eee() is this:
>>>>
>>>>                 eee_adv = phy_read_mmd_indirect(phydev, MDIO_AN_EEE_ADV,
>>>>                                                 MDIO_MMD_AN);
>>>>                 if (eee_adv <= 0)
>>>>                         goto eee_exit_err;
>>>>
>>>> if we are not already advertising EEE in the PHY's MDIO_MMD_AN page, by
>>>> the time we call phy_init_eee(), then we cannot complete the EEE
>>>> configuration at the PHY level, and presumably we should abort the EEE
>>>> configuration at the MAC level.
>>>>
>>>> While this condition makes sense if e.g: you are re-negotiating the link
>>>> with your partner for instance and if EEE was already advertised, the
>>>> very first time this function is called, it seems to be like we should
>>>> skip the check, because phy_init_eee() should actually tell us if, as a
>>>> result of a successful check, we should be setting EEE as something we
>>>> advertise?
>>>>
>>>> Do you remember what was the logic behind this check when you added it?
>>>
>>> Peppe, can you remember why phy_init_eee() was written in a way that you
>>> need to have already locally advertised EEE for the function to
>>> successfully return? Thank you!
>>>
>>
>> I'm curious about this as well.
>>
>> I can get EEE to work with stmmac, but to be able to turn EEE on,
>> I need to set eee advertise via ethtool first.
>> (Tested with 2 different PHYs from different vendors, with their
>> PHY specific driver enabled.)
>>
>> Is this the same for all PHYs or are there certain PHYs/PHY drivers
>> that actually advertise eee by default?
> 
> It depends on whether the PHY driver takes care of the EEE advertisement
> part for you or not, most drivers probably don't do that.
> 
>> (From reading this mail thread there seems to be a suggestion that
>> the broadcom PHY driver might advertise eee by default.)
> 
> As written before, some (not all) Broadcom PHY drivers (cygnus, 7xxx) do
> advertise EEE by default in order to validate the first check done in
> phy_init_eee(), but that's the only reason really.
> 
> Since we have not been able to get a straight answer from Peppe about
> why there is this initial check, I think the cleanest path moving
> forward is the following:
> 
> - rename phy_init_eee() into something like: phy_can_do_eee() and remove
> the first check on whether EEE is already advertised because that's
> precisely what we are trying to determine with this function
> 
> - Ethernet MAC drivers keep calling phy_can_do_eee() (formerly
> phy_init_eee()) during their adjust_link callback in order to
> re-negotiate EEE with their link partner, just like they should call
> phy_ethtool_set_eee() to really enable EEE the first time they want to
> enable EEE with the link partner
> 
> - remove the part from phy_init_eee() that tries to stop the PHY TX
> clock and provide a set of helpers: phy_can_stop_tx_clk() and
> phy_set_stop_tx_clk() which will take care of that
> 
> Does that look reasonable?


Sounds very reasonable to me.

However, if I look specifically at the stmmac driver,
stmmac_eee_init() is called from adjust_link callback.

If we replace phy_init_eee() with a phy_can_do_eee()
in stmmac_eee_init(), then the driver will enable
EEE in the IP, and setup timers etc.


If I understand you correctly, the code in the adjust_link
callback should call phy_can_do_eee() so that the PHY
re-negotiates EEE with the link partner.

You will still need to use ethtool to actually enable it in the
PHY (call the new phy_init_eee()).
(Which sounds good, since we probably do not want to suddenly
enable EEE by default in a lot of drivers.)


The issue that I see is that we probably do not want to
set up timers, etc. in the adjust_link callback before
EEE has actually been enabled, so it might not be as
easy as just replacing phy_init_eee() with phy_can_do_eee()
in some drivers.
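To compare the current check with the proposed one, here is a userspace sketch with hypothetical function names (the real code reads the registers through phy_read_mmd_indirect()). Bits follow the IEEE 802.3 clause 45 EEE capability/advertisement layout (100BASE-TX in bit 1, 1000BASE-T in bit 2). Using the capability register in place of the current advertisement is one reading of "remove the first check", not a confirmed design, and the duplex check is omitted.

```c
#include <stdbool.h>
#include <assert.h>

/* EEE bits as laid out in the clause 45 EEE ability/advertisement registers. */
#define EEE_100TX (1u << 1)
#define EEE_1000T (1u << 2)

static unsigned int eee_bit_for_speed(int speed)
{
	switch (speed) {
	case 100:  return EEE_100TX;
	case 1000: return EEE_1000T;
	default:   return 0;        /* no EEE mode modeled for this speed */
	}
}

/* Current behaviour as described in the thread: give up unless EEE is
 * already advertised locally, then require the partner to match. */
static bool eee_check_current(unsigned int local_adv, unsigned int lp_adv, int speed)
{
	if (local_adv == 0)
		return false;           /* the "eee_adv <= 0" early exit */
	return (local_adv & lp_adv & eee_bit_for_speed(speed)) != 0;
}

/* Hypothetical phy_can_do_eee(): ask what the PHY is capable of,
 * not what it already advertises. */
static bool eee_check_can_do(unsigned int local_cap, unsigned int lp_adv, int speed)
{
	return (local_cap & lp_adv & eee_bit_for_speed(speed)) != 0;
}
```

With nothing advertised locally, the current check fails even when both ends are capable, which matches the need to set advertisement via ethtool first.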

* Re: [RFC 0/4] xdp: use netlink extended ACK reporting
From: Daniel Borkmann @ 2017-04-25  9:05 UTC
  To: Jakub Kicinski, netdev
  Cc: davem, johannes, dsa, alexei.starovoitov, bblanco, john.fastabend,
	kubakici, oss-drivers
In-Reply-To: <20170425080644.122536-1-jakub.kicinski@netronome.com>

On 04/25/2017 10:06 AM, Jakub Kicinski wrote:
> Hi!
>
> This series is an attempt to make XDP more user friendly by
> enabling exploiting the recently added netlink extended ACK
> reporting to carry messages to user space.
>
> I made iproute2 parse the extended messages and have it showing
> the errors like this:
>
> # ip link set dev p4p1 xdp obj ipip_prepend.o sec ".text"
> RTNETLINK answers: Invalid argument (MTU too large w/ XDP enabled)
>
> Where the message is coming directly from the driver.  There could
> still be a bit of a leap for a complete novice from the message
> above to the right settings.  I wonder if it would be worthwhile

But still 100x better than the current situation. ;) I really
like the series, thanks for working on this!

> adding #defines for the most common configuration conflicts?
> Sharing the messages verbatim between drivers could make them easier
> to google.

Makes sense; once more drivers adopt this reporting, these
messages could be consolidated.
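If drivers did share the text, it could be as simple as a common #define. The header, macro name, driver functions and MTU limits below are hypothetical; only the message string comes from the iproute2 output quoted above.

```c
#include <errno.h>
#include <string.h>
#include <assert.h>

/* Hypothetical shared definition: every driver reports the same
 * (and therefore searchable) text for the same conflict. */
#define XDP_ERR_MTU_TOO_LARGE "MTU too large w/ XDP enabled"

struct netlink_ext_ack {        /* simplified stand-in */
	const char *_msg;
};

/* Two hypothetical drivers with different (illustrative) MTU limits. */
static int drv_a_xdp_setup(unsigned int mtu, struct netlink_ext_ack *extack)
{
	if (mtu > 1500) {
		extack->_msg = XDP_ERR_MTU_TOO_LARGE;
		return -EINVAL;
	}
	return 0;
}

static int drv_b_xdp_setup(unsigned int mtu, struct netlink_ext_ack *extack)
{
	if (mtu > 4000) {
		extack->_msg = XDP_ERR_MTU_TOO_LARGE;
		return -EINVAL;
	}
	return 0;
}
```

The limits differ per driver, but the user-visible message stays identical.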

> Also - is anyone working on adding proper extack support to iproute2?
> The code I have right now is a bit of a hack...
>
> Jakub Kicinski (4):
>    netlink: make extended ACK setting NULL-friendly
>    xdp: propagate extended ack to XDP setup
>    nfp: make use of extended ack message reporting
>    virtio_net: make use of extended ack message reporting
>
>   drivers/net/ethernet/netronome/nfp/nfp_net.h       |  3 ++-
>   .../net/ethernet/netronome/nfp/nfp_net_common.c    | 22 +++++++++++++---------
>   .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  4 ++--
>   drivers/net/virtio_net.c                           | 11 +++++++----
>   include/linux/netdevice.h                          | 10 ++++++++--
>   include/linux/netlink.h                            | 12 ++++++++----
>   net/core/dev.c                                     |  5 ++++-
>   net/core/rtnetlink.c                               | 13 ++++++++-----
>   8 files changed, 52 insertions(+), 28 deletions(-)
>

* Re: [PATCH] stmmac: Add support for SIMATIC IOT2000 platform
From: Jan Kiszka @ 2017-04-25  9:00 UTC
  To: Andy Shevchenko
  Cc: Giuseppe Cavallaro, Alexandre Torgue, netdev,
	Linux Kernel Mailing List, Sascha Weisenberger
In-Reply-To: <CAHp75VcTzBL4D6nobfDAPQOfEEp-RzBp4BANKYDSaaEtGCyj9A@mail.gmail.com>

On 2017-04-25 09:30, Andy Shevchenko wrote:
> On Tue, Apr 25, 2017 at 8:44 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2017-04-24 23:27, Andy Shevchenko wrote:
>>> On Mon, Apr 24, 2017 at 10:27 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>> The IOT2000 is industrial controller platform, derived from the Intel
>>>> Galileo Gen2 board. The variant IOT2020 comes with one LAN port, the
>>>> IOT2040 has two of them. They can be told apart based on the board asset
>>>> tag in the DMI table.
> 
>>>> +       const char *asset_tag;
>>>
>>> I guess this is redundant. See below.
>>>
>>>> +       {
>>>> +               .name = "SIMATIC IOT2000",
>>>> +               .asset_tag = "6ES7647-0AA00-0YA2",
>>>> +               .func = 6,
>>>> +               .phy_addr = 1,
>>>> +       },
>>>
>>> The below has same definition disregard on asset_tag.
>>>
>>
>> There is a small difference in the asset tag, just not at the last digit
>> where one may expect it, look:
>>
>> ...-0YA2 -> IOT2020
>> ...-1YA2 -> IOT2040
> 
> Yes. And how does it change my statement? You may use one record here
> instead of two.

How? Please be more verbose in your comments.

> 
>>
>>>> +       {
>>>> +               .name = "SIMATIC IOT2000",
>>>> +               .asset_tag = "6ES7647-0AA00-1YA2",
>>>> +               .func = 6,
>>>> +               .phy_addr = 1,
>>>> +       },
> 
>>>> +       {
>>>> +               .name = "SIMATIC IOT2000",
>>>> +               .asset_tag = "6ES7647-0AA00-1YA2",
>>>> +               .func = 7,
>>>> +               .phy_addr = 1,
>>>> +       },
>>>
>>> How is this supposed to work if phy_addr is the same?
>> That address space is MAC-local, and we have two different MACs here.
> 
> Got it, though asset_tag here is redundant as well.
> 

It's not, as it is the only differentiating criterion to tell the
two-port variant apart from the one-port variant (and to avoid confusing
it with any potential future variant). We could leave out the name, but
I kept it for documentation purposes.
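A small user-space sketch shows why the asset tag is the distinguishing key. It is not the stmmac platform code. If entries are matched on name and PCI function alone, func 6 hits two entries, and a func 7 MAC would also be accepted on the one-port IOT2020. The struct layout and the find_entry()/count_ports() helpers are hypothetical. Only the name, asset tags, func and phy_addr values come from the quoted table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the quoted table entries (not the driver's struct). */
struct platform_entry {
	const char *name;
	const char *asset_tag;
	int func;      /* PCI function of the MAC */
	int phy_addr;  /* MAC-local MDIO address, so it may repeat across MACs */
};

static const struct platform_entry entries[] = {
	{ "SIMATIC IOT2000", "6ES7647-0AA00-0YA2", 6, 1 }, /* IOT2020 */
	{ "SIMATIC IOT2000", "6ES7647-0AA00-1YA2", 6, 1 }, /* IOT2040, port 1 */
	{ "SIMATIC IOT2000", "6ES7647-0AA00-1YA2", 7, 1 }, /* IOT2040, port 2 */
};

#define N_ENTRIES (sizeof(entries) / sizeof(entries[0]))

/* Return the entry matching name, asset tag and PCI function, or NULL. */
static const struct platform_entry *
find_entry(const char *name, const char *asset_tag, int func)
{
	for (size_t i = 0; i < N_ENTRIES; i++) {
		const struct platform_entry *e = &entries[i];

		if (strcmp(e->name, name) == 0 &&
		    strcmp(e->asset_tag, asset_tag) == 0 &&
		    e->func == func)
			return e;
	}
	return NULL;
}

/* Count how many MACs the table describes for one board variant. */
static int count_ports(const char *name, const char *asset_tag)
{
	int n = 0;

	for (size_t i = 0; i < N_ENTRIES; i++)
		if (strcmp(entries[i].name, name) == 0 &&
		    strcmp(entries[i].asset_tag, asset_tag) == 0)
			n++;
	return n;
}
```

With the asset tag in the key, the one-port board resolves to a single MAC and the two-port board to two MACs. Both two-port entries can share phy_addr 1 because each address belongs to a different MAC.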

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux

* Re: [PATCH 0/8] Fix and complete CAN namespace support
From: Marc Kleine-Budde @ 2017-04-25  8:49 UTC (permalink / raw)
  To: Oliver Hartkopp, linux-can, davem; +Cc: dev, netdev
In-Reply-To: <d852c2e7-f543-ecea-01f0-9f66d8741476@hartkopp.net>


On 04/25/2017 10:48 AM, Oliver Hartkopp wrote:
> On 04/25/2017 10:45 AM, Marc Kleine-Budde wrote:
> 
>> FYI:
>> This series is included in the latest pull request:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git
>> tags/linux-can-next-for-4.12-20170425
>>
> 
> Ok, thanks!
> 
> Sorry for pushing a bit harder this time. But I wanted to make sure that 
> we don't get incomplete stuff into the merge window.

np :D

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


* Re: [PATCH net-next] can: ti_hecc: fix return value check in ti_hecc_probe()
From: Marc Kleine-Budde @ 2017-04-25  8:49 UTC (permalink / raw)
  To: Wei Yongjun, Wolfgang Grandegger, Anton Glukhov, Yegor Yefremov
  Cc: Wei Yongjun, linux-can, netdev
In-Reply-To: <20170425064405.3964-1-weiyj.lk@gmail.com>


On 04/25/2017 08:44 AM, Wei Yongjun wrote:
> From: Wei Yongjun <weiyongjun1@huawei.com>
> 
> In case of error, the function devm_ioremap_resource() returns ERR_PTR()
> and never returns NULL. The NULL test in the return value check should
> be replaced with IS_ERR().
> 
> Fixes: dabf54dd1c63 ("can: ti_hecc: Convert TI HECC driver to DT only driver")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
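The bug class fixed here is easy to reproduce outside the kernel. The sketch below re-creates the ERR_PTR()/IS_ERR()/PTR_ERR() encoding in user space. It is simplified from include/linux/err.h and assumes an LP64 target. A hypothetical fake_ioremap() stands in for devm_ioremap_resource(), and the -ENOMEM it returns is an arbitrary choice. A NULL test never fires on an error pointer, while IS_ERR() does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Errors are encoded as pointers into the last MAX_ERRNO bytes of the
 * address space, so they are never NULL. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for devm_ioremap_resource(): ERR_PTR() on failure. */
static void *fake_ioremap(bool fail)
{
	static int regs;

	return fail ? ERR_PTR(-ENOMEM) : (void *)&regs;
}

/* Buggy pattern: the NULL test never fires, so the error pointer is kept. */
static int probe_buggy(bool fail)
{
	void *base = fake_ioremap(fail);

	if (!base)
		return -ENOMEM;
	return 0;
}

/* Pattern described in the patch: test with IS_ERR(), return PTR_ERR(). */
static int probe_fixed(bool fail)
{
	void *base = fake_ioremap(fail);

	if (IS_ERR(base))
		return (int)PTR_ERR(base);
	return 0;
}
```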

Added to linux-can-next.

Tnx,
Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


* Re: [PATCH 0/8] Fix and complete CAN namespace support
From: Oliver Hartkopp @ 2017-04-25  8:48 UTC (permalink / raw)
  To: Marc Kleine-Budde, linux-can, davem; +Cc: dev, netdev
In-Reply-To: <9a88aee4-ba8c-00e7-2525-e300d905e593@pengutronix.de>

On 04/25/2017 10:45 AM, Marc Kleine-Budde wrote:

> FYI:
> This series is included in the latest pull request:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git
> tags/linux-can-next-for-4.12-20170425
>

Ok, thanks!

Sorry for pushing a bit harder this time. But I wanted to make sure that 
we don't get incomplete stuff into the merge window.

Regards,
Oliver


* Re: [PATCH 0/8] Fix and complete CAN namespace support
From: Marc Kleine-Budde @ 2017-04-25  8:45 UTC (permalink / raw)
  To: Oliver Hartkopp, linux-can, davem; +Cc: dev, netdev
In-Reply-To: <8fc17f07-f165-970b-f96a-1d920302fd00@pengutronix.de>


On 04/25/2017 08:48 AM, Marc Kleine-Budde wrote:
> On 04/25/2017 08:19 AM, Oliver Hartkopp wrote:
>> Hello Dave,
>>
>> unfortunately the initial network namespace support by Mario Kicherer
>> (8e8cda6d737d) slipped into net-next without further review and Marc pushed
>> the code without my Acked-by. Due to the fact that this code was in net-next
>> now I spent some nights to fix, clean up, finalize and test the missing pieces
>> for the namespace support for the CAN subsystem in net/can.
>>
>> As Marc is currently *VERY* unresponsive on the mailing list due to his 'real'
> 
> ... and holidays :) But I'm back in the office now.

FYI:
This series is included in the latest pull request:

git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git
tags/linux-can-next-for-4.12-20170425

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


* pull-request: can-next 2017-04-25
From: Marc Kleine-Budde @ 2017-04-25  8:44 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, kernel@pengutronix.de, linux-can@vger.kernel.org


Hello David,

this is a pull request of 21 patches for net-next/master.

There are 4 patches by Stephane Grosjean for the PEAK PCAN-PCIe FD
CAN-FD boards. The next 7 patches are by Mario Huettel, which add
support for M_CAN IP version >= v3.1.x to the m_can driver. A patch by
Remigiusz Kołłątaj adds support for the Microchip CAN BUS Analyzer. 8
patches by Oliver Hartkopp complete the initial CAN network namespace
support. Wei Yongjun's patch for the ti_hecc driver fixes the return
value check in the probe function.

Marc

---

The following changes since commit 14933dc8d9964e46f1d5bd2a4dfe3d3be8e420e0:

  sparc64: Improve 64-bit constant loading in eBPF JIT. (2017-04-24 20:32:15 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git tags/linux-can-next-for-4.12-20170425

for you to fetch changes up to b655f0e96d4061eac42dea2dccd37a3348d1f3f3:

  can: ti_hecc: fix return value check in ti_hecc_probe() (2017-04-25 10:03:40 +0200)

----------------------------------------------------------------
linux-can-next-for-4.12-20170425

----------------------------------------------------------------
Mario Huettel (7):
      can: m_can: Disabled Interrupt Line 1
      can: m_can: Removed initialization of FIFO water marks
      can: m_can: Removed virtual address from print
      can: m_can: Updated register defines to newest version
      can: m_can: Enable M_CAN version dependent initialization
      can: m_can: Configuration for TX and TX event FIFOs
      can: m_can: Enable TX FIFO Handling for M_CAN IP version >= v3.1.x

Oliver Hartkopp (8):
      can: fix memory leak in initial namespace support
      can: remove obsolete pernet_operations definitions
      can: remove obsolete definitions
      can: complete initial namespace support
      can: network namespace support for CAN_BCM protocol
      can: network namespace support for CAN gateway
      can: add Virtual CAN Tunnel driver (vxcan)
      can: enable module auto loading for virtual CAN interfaces

Remigiusz Kołłątaj (1):
      can: mcba_usb: Add support for Microchip CAN BUS Analyzer

Stephane Grosjean (4):
      can: peak: fix usage of usb specific data type
      can: peak: fix usage of const qualifier in pointers args
      can: peak: move header file to new can common subdir
      can: peak: add support for PEAK PCAN-PCIe FD CAN-FD boards

Wei Yongjun (1):
      can: ti_hecc: fix return value check in ti_hecc_probe()

 drivers/net/can/Kconfig                            |  19 +
 drivers/net/can/Makefile                           |   2 +
 drivers/net/can/m_can/m_can.c                      | 752 +++++++++++++----
 drivers/net/can/peak_canfd/Kconfig                 |  13 +
 drivers/net/can/peak_canfd/Makefile                |   5 +
 drivers/net/can/peak_canfd/peak_canfd.c            | 801 ++++++++++++++++++
 drivers/net/can/peak_canfd/peak_canfd_user.h       |  55 ++
 drivers/net/can/peak_canfd/peak_pciefd_main.c      | 842 +++++++++++++++++++
 drivers/net/can/ti_hecc.c                          |  12 +-
 drivers/net/can/usb/Kconfig                        |   6 +
 drivers/net/can/usb/Makefile                       |   1 +
 drivers/net/can/usb/mcba_usb.c                     | 904 +++++++++++++++++++++
 drivers/net/can/usb/peak_usb/pcan_usb_fd.c         |  25 +-
 drivers/net/can/vcan.c                             |   7 +-
 drivers/net/can/vxcan.c                            | 316 +++++++
 include/linux/can/core.h                           |   4 +-
 .../linux/can/dev/peak_canfd.h                     |  86 +-
 include/net/netns/can.h                            |   9 +
 include/uapi/linux/can/vxcan.h                     |  12 +
 net/can/af_can.c                                   |  77 +-
 net/can/af_can.h                                   |   9 -
 net/can/bcm.c                                      |  90 +-
 net/can/gw.c                                       |  72 +-
 net/can/proc.c                                     | 141 ++--
 24 files changed, 3888 insertions(+), 372 deletions(-)
 create mode 100644 drivers/net/can/peak_canfd/Kconfig
 create mode 100644 drivers/net/can/peak_canfd/Makefile
 create mode 100644 drivers/net/can/peak_canfd/peak_canfd.c
 create mode 100644 drivers/net/can/peak_canfd/peak_canfd_user.h
 create mode 100644 drivers/net/can/peak_canfd/peak_pciefd_main.c
 create mode 100644 drivers/net/can/usb/mcba_usb.c
 create mode 100644 drivers/net/can/vxcan.c
 rename drivers/net/can/usb/peak_usb/pcan_ucan.h => include/linux/can/dev/peak_canfd.h (73%)
 create mode 100644 include/uapi/linux/can/vxcan.h

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |




* Re: [PATCH 0/8] Fix and complete CAN namespace support
From: Oliver Hartkopp @ 2017-04-25  8:43 UTC (permalink / raw)
  To: Marc Kleine-Budde, linux-can, davem; +Cc: dev, netdev
In-Reply-To: <8fc17f07-f165-970b-f96a-1d920302fd00@pengutronix.de>



On 04/25/2017 08:48 AM, Marc Kleine-Budde wrote:
> On 04/25/2017 08:19 AM, Oliver Hartkopp wrote:
>> Hello Dave,
>>
>> unfortunately the initial network namespace support by Mario Kicherer
>> (8e8cda6d737d) slipped into net-next without further review and Marc pushed
>> the code without my Acked-by. Due to the fact that this code was in net-next
>> now I spent some nights to fix, clean up, finalize and test the missing pieces
>> for the namespace support for the CAN subsystem in net/can.
>>
>> As Marc is currently *VERY* unresponsive on the mailing list due to his 'real'
>
> ... and holidays :) But I'm back in the office now.

I created a cleaned-up patch series to provide more detailed and proper 
patches. These patches are based on Dave's net-next, as you had not 
pushed your latest changes in mkl/linux-can-next to the public one week ago.

If you manage to get this series integrated today, that would be OK 
for me. Otherwise Dave should take this series directly.

Thanks,
Oliver

* Re: [PATCH net-next v2 2/5] virtio-net: transmit napi
From: Jason Wang @ 2017-04-25  8:39 UTC (permalink / raw)
  To: Michael S. Tsirkin, Willem de Bruijn
  Cc: Network Development, Willem de Bruijn, David Miller,
	virtualization
In-Reply-To: <20170424194015-mutt-send-email-mst@kernel.org>



On 2017年04月25日 00:40, Michael S. Tsirkin wrote:
> On Fri, Apr 21, 2017 at 10:50:12AM -0400, Willem de Bruijn wrote:
>>>>> Maybe I was wrong, but according to Michael's comment it looks like he
>>>>> wants to check affinity_hint_set just for speculative tx polling on rx
>>>>> napi instead of disabling it altogether.
>>>>>
>>>>> And I'm not convinced this is really needed: the driver only provides an
>>>>> affinity hint instead of affinity, so it's not guaranteed that tx and rx
>>>>> interrupts are on the same vcpus.
>>>> You're right. I made the restriction broader than the request, to really
>>>> err on the side of caution for the initial merge of napi tx. And enabling
>>>> the optimization is always a win over keeping it off, even without irq
>>>> affinity.
>>>>
>>>> The cycle cost is significant without affinity regardless of whether the
>>>> optimization is used.
>>>
>>> Yes, I noticed this in the past too.
>>>
>>>> Though this is not limited to napi-tx, it is more
>>>> pronounced in that mode than without napi.
>>>>
>>>> 1x TCP_RR for affinity configuration {process, rx_irq, tx_irq}:
>>>>
>>>> upstream:
>>>>
>>>> 1,1,1: 28985 Mbps, 278 Gcyc
>>>> 1,0,2: 30067 Mbps, 402 Gcyc
>>>>
>>>> napi tx:
>>>>
>>>> 1,1,1: 34492 Mbps, 269 Gcyc
>>>> 1,0,2: 36527 Mbps, 537 Gcyc (!)
>>>> 1,0,1: 36269 Mbps, 394 Gcyc
>>>> 1,0,0: 34674 Mbps, 402 Gcyc
>>>>
>>>> This is a particularly strong example. It is also representative
>>>> of most RR tests. It is less pronounced in other streaming tests.
>>>> 10x TCP_RR, for instance:
>>>>
>>>> upstream:
>>>>
>>>> 1,1,1: 42267 Mbps, 301 Gcyc
>>>> 1,0,2: 40663 Mbps, 445 Gcyc
>>>>
>>>> napi tx:
>>>>
>>>> 1,1,1: 42420 Mbps, 303 Gcyc
>>>> 1,0,2: 42267 Mbps, 431 Gcyc
>>>>
>>>> These numbers were obtained with the virtqueue_enable_cb_delayed
>>>> optimization after xmit_skb, btw. It turns out that moving that before
>>>> increases 1x TCP_RR further to ~39 Gbps, at the cost of reducing
>>>> 100x TCP_RR a bit.
>>>
>>> I see, so I think we can leave the affinity hint optimization/check for
>>> future investigation:
>>>
>>> - to avoid endless optimization in this series (e.g. we may want to
>>> share a single vector/napi for tx/rx queue pairs in the future).
>>> - tx napi is disabled by default which means we can do optimization on top.
>> Okay. I'll drop the vi->affinity_hint_set from the patch set for now.
> I kind of like it, let's be conservative. But I'd prefer a comment
> near it explaining why it's there.
>

Another issue with affinity_hint_set is that it can change when 
channels are set. I think we're already conservative enough (e.g. tx 
napi is disabled by default).

Thanks

* Re: [PATCH net-next v3 2/5] virtio-net: transmit napi
From: Jason Wang @ 2017-04-25  8:36 UTC (permalink / raw)
  To: Willem de Bruijn, netdev; +Cc: mst, virtualization, davem, Willem de Bruijn
In-Reply-To: <20170424174930.82623-3-willemdebruijn.kernel@gmail.com>



On 2017年04月25日 01:49, Willem de Bruijn wrote:
> @@ -1371,8 +1419,10 @@ static int virtnet_close(struct net_device *dev)
>   	/* Make sure refill_work doesn't re-enable napi! */
>   	cancel_delayed_work_sync(&vi->refill);
>   
> -	for (i = 0; i < vi->max_queue_pairs; i++)
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
>   		napi_disable(&vi->rq[i].napi);
> +		napi_disable(&vi->sq[i].napi);
> +	}

It looks like this will wait forever if napi_tx is false: we never 
enable the TX NAPI, so napi_disable() waits for NAPI_STATE_SCHED to be 
cleared, and nothing ever clears it.
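The hang can be illustrated with a simplified, single-threaded user-space model of the NAPI_STATE_SCHED handshake:

* netif_napi_add() leaves SCHED set.
* napi_enable() clears it.
* napi_disable() loops until it can set SCHED itself.

An instance that was added but never enabled is therefore never released. Everything below is a hypothetical sketch, not virtio_net code. That includes the guard keyed on a napi_tx flag, which is one possible way to avoid the wait.

```c
#include <stdbool.h>

/* Models only the NAPI_STATE_SCHED bit of a NAPI instance. */
struct napi_model {
	bool sched;
};

static void model_napi_add(struct napi_model *n)
{
	n->sched = true;            /* netif_napi_add() sets SCHED */
}

static void model_napi_enable(struct napi_model *n)
{
	n->sched = false;           /* napi_enable() clears SCHED */
}

/* napi_disable() retries test_and_set_bit(SCHED) with msleep(1) forever;
 * this model gives up after max_tries so the hang becomes a return value. */
static bool model_napi_disable(struct napi_model *n, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (!n->sched) {
			n->sched = true;    /* we own SCHED now: NAPI is disabled */
			return true;
		}
		/* For an enabled instance a finishing poll would clear SCHED;
		 * for a never-enabled instance nothing ever does. */
	}
	return false;
}

/* Hypothetical guard: skip TX NAPI instances that were never enabled. */
static bool model_tx_napi_disable(struct napi_model *n, bool napi_tx,
				  int max_tries)
{
	if (!napi_tx)
		return true;
	return model_napi_disable(n, max_tries);
}
```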

Thanks

* Network cooling device and how to control NIC speed on thermal condition
From: Waldemar Rymarkiewicz @ 2017-04-25  8:36 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel

Hi,

I am not very familiar with the Linux networking architecture, so I'd
like to ask before I start digging into the code. I'd appreciate any
feedback.

I am looking at the Linux thermal framework and at how to cool down the
system effectively when it hits a thermal condition. The existing
cooling methods, cpu_cooling and clock_cooling, are good. However, I
want to go further and also dynamically control switch ports' speed
based on the thermal condition. Lower speed means less power, and less
power means lower temperature.

Is there any in-kernel interface to configure a switch port/NIC from
another driver?

Is there any power-saving mechanism for ports/interfaces that are not
really used (little or no data traffic) built into the networking
stack, or is that a task for the NIC driver itself?

I was thinking of creating a net_cooling device, similar to the
cpu_cooling device, which cools down the system by scaling down the CPU
frequency. net_cooling could lower the interface speed (or tune other
parameters to achieve the same effect). Do you think this could work
from the networking stack's perspective?
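In the kernel, cooling devices register a struct thermal_cooling_device_ops that provides get_max_state(), get_cur_state() and set_cur_state() callbacks. Below is a minimal user-space sketch of how a hypothetical net_cooling device could map cooling states to link speeds. The speed ladder and all names are assumptions. Actually applying the speed to a PHY or switch port is left out.

```c
#include <errno.h>

/* Hypothetical speed ladder in Mb/s; state 0 means no throttling. */
static const unsigned int net_cooling_speeds[] = { 1000, 100, 10 };
#define NET_COOLING_NSTATES \
	(sizeof(net_cooling_speeds) / sizeof(net_cooling_speeds[0]))

struct net_cooling_model {
	unsigned long cur_state;
	unsigned int link_speed;    /* speed that would be requested */
};

/* Same shape as get_max_state(): report the highest valid state. */
static int net_cooling_get_max_state(unsigned long *state)
{
	*state = NET_COOLING_NSTATES - 1;
	return 0;
}

static int net_cooling_get_cur_state(const struct net_cooling_model *c,
				     unsigned long *state)
{
	*state = c->cur_state;
	return 0;
}

/* Same shape as set_cur_state(): a higher state selects a lower speed. */
static int net_cooling_set_cur_state(struct net_cooling_model *c,
				     unsigned long state)
{
	if (state >= NET_COOLING_NSTATES)
		return -EINVAL;
	c->cur_state = state;
	c->link_speed = net_cooling_speeds[state];
	return 0;
}
```

The thermal governor would then only ever deal in abstract states. The policy of which speed each state means would stay in the cooling device.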

Any pointers to code or documentation would be highly appreciated.

Thanks,
/Waldek

* Re: Blogpost evaluation this [PATCH v4 net-next RFC] net: Generic XDP
From: Jesper Dangaard Brouer @ 2017-04-25  8:28 UTC (permalink / raw)
  To: David Miller; +Cc: xdp-newbies, netdev, andy, brouer
In-Reply-To: <20170424.182643.485613135674690555.davem@davemloft.net>

On Mon, 24 Apr 2017 18:26:43 -0400 (EDT)
David Miller <davem@davemloft.net> wrote:

> From: Jesper Dangaard Brouer <brouer@redhat.com>
> Date: Mon, 24 Apr 2017 16:24:05 +0200
> 
> > I've done a very detailed evaluation of this patch, and I've created a
> > blogpost like report here:
> > 
> >  https://prototype-kernel.readthedocs.io/en/latest/blogposts/xdp25_eval_generic_xdp_tx.html  
> 
> Thanks for doing this Jesper.
> 
> > I didn't evaluate the adjust_head part, so I hope Andy is still
> > planning to validate that part?  
> 
> I was hoping he would post some results today as well.
> 
> Andy, how goes it? :)
> 
> Once the basic patch is ready and integrated in we can try to do
> xmit_more in generic XDP and see what that does for XDP_TX
> performance.
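The gain Dave is hinting at comes from telling the driver that more packets follow, via skb->xmit_more or the `more` argument of netdev_start_xmit(). The driver can then defer the expensive doorbell write and ring it once per batch. The toy model below only counts doorbells. It does not model generic_xdp_tx() or dev_hard_start_xmit(), and every name in it is hypothetical.

```c
#include <stdbool.h>

/* Toy TX ring: counts queued packets and doorbell (MMIO) writes. */
struct tx_ring_model {
	unsigned int queued;
	unsigned int doorbells;
};

/* Driver xmit: queue the packet, ring the doorbell only when the caller
 * does not announce more packets (the role of the xmit_more hint). */
static void model_xmit(struct tx_ring_model *r, bool more)
{
	r->queued++;
	if (!more)
		r->doorbells++;
}

/* Each packet handed over without a "more" hint. */
static void send_unbatched(struct tx_ring_model *r, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		model_xmit(r, false);
}

/* Packets handed over as one batch: only the last clears "more". */
static void send_batched(struct tx_ring_model *r, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		model_xmit(r, i + 1 < n);
}
```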

I agree, we can do xmit_more for generic-XDP later, and it should not
be that hard... basically replacing netdev_start_xmit() with
dev_hard_start_xmit() in generic_xdp_tx(), and finding some place to
store an XDP-skb-pointer (functioning as the skb-list) that will be
"flushed" like __kfree_skb_flush().

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

* Re: [PATCH net v2 1/3] net: hns: support deferred probe when can not obtain irq
From: lipeng (Y) @ 2017-04-25  8:21 UTC (permalink / raw)
  To: Matthias Brugger, Yankejian, davem, salil.mehta, yisen.zhuang,
	huangdaode, zhouhuiru
  Cc: netdev, charles.chenxin, linuxarm
In-Reply-To: <377d3197-9b88-b95a-ae79-4348d1448f92@gmail.com>



On 2017/4/24 22:47, Matthias Brugger wrote:
>
>
> On 24/04/17 13:43, lipeng (Y) wrote:
>>
>>
>> On 2017/4/24 18:28, Matthias Brugger wrote:
>>> On 21/04/17 09:44, Yankejian wrote:
>>>> From: lipeng <lipeng321@huawei.com>
>>>>
>>>> In the hip06 and hip07 SoCs, the interrupt lines from the
>>>> DSAF controllers are connected to mbigen hw module.
>>>> The mbigen module is probed with module_init, and, as such,
>>>> is not guaranteed to probe before the HNS driver. So we need
>>>> to support deferred probe.
>>>>
>>>> We check for probe deferral in the hw layer probe, so that we do not
>>>> probe into the main layer and allocate memory, etc., only to learn
>>>> later that we need to defer the probe.
>>>>
>>>
>>> Why? This looks like a hack.
>>> From what I see, we can handle EPROBE_DEFER easily inside hns_ppe_init
>>> by checking the return value of hns_rcb_get_cfg, like you do in 2/3 of
>>> this series.
>>>
>>> Regards,
>>> Matthias
>> Hi Matthias,
>>
>> MDIO and PHY are not a necessary condition: a port can work well as
>> port + SFP (without MDIO and PHY).
>>
>> But the IRQ is a necessary condition: a port cannot work without its IRQ.
>>
>> So I check the IRQ first and do not probe the DSAF if the IRQ cannot be
>> obtained (1/3 of this series), and check MDIO only when there is a PHY
>> (2/3 of this series).
>>
>> And thanks for your review.
>
> I think I didn't explain myself well enough.
> I was suggesting the following (not even compile tested):
>
> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
> index eba406bea52f..be38d47bc399 100644
> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
> @@ -510,7 +510,9 @@ int hns_ppe_init(struct dsaf_device *dsaf_dev)
>
>                 hns_ppe_get_cfg(dsaf_dev->ppe_common[i]);
>
> -               hns_rcb_get_cfg(dsaf_dev->rcb_common[i]);
> +               ret = hns_rcb_get_cfg(dsaf_dev->rcb_common[i]);
> +               if (ret < 0)
> +                       goto get_cfg_fail;
>         }
>
>         for (i = 0; i < HNS_PPE_COM_NUM; i++)
> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
> index c20a0f4f8f02..c7e801d0c3b7 100644
> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
> @@ -492,7 +492,7 @@ static int hns_rcb_get_base_irq_idx(struct rcb_common_cb *rcb_common)
>   *hns_rcb_get_cfg - get rcb config
>   *@rcb_common: rcb common device
>   */
> -void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
> +int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
>  {
>         struct ring_pair_cb *ring_pair_cb;
>         u32 i;
> @@ -517,10 +517,18 @@ void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
>                 ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] =
>                 is_ver1 ? platform_get_irq(pdev, base_irq_idx + i * 2 + 1) :
>                           platform_get_irq(pdev, base_irq_idx + i * 3);
> +
> +               if ((ring_pair_cb->virq[HNS_RCB_IRQ_IDX_TX] == -EPROBE_DEFER) ||
> +                   (ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] == -EPROBE_DEFER)) {
> +                       return -EPROBE_DEFER;
> +               }
> +
>                 ring_pair_cb->q.phy_base =
>                         RCB_COMM_BASE_TO_RING_BASE(rcb_common->phy_base, i);
>                 hns_rcb_ring_pair_get_cfg(ring_pair_cb);
>         }
> +
> +       return 0;
>  }
>
>  /**
>
>
> Regards,
> Matthias
Thanks,  I will take your advice and test it.
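Matthias's proposal has hns_rcb_get_cfg() return an error instead of void, and hns_ppe_init() abort on that error. That control flow can be sketched in user space as follows. platform_get_irq() is replaced by a stub and the ring count is made up. EPROBE_DEFER is defined locally with the kernel's value (517), because it is a kernel-internal errno that user space does not see.

```c
#include <stdbool.h>

#define EPROBE_DEFER 517   /* kernel-internal errno, include/linux/errno.h */
#define MODEL_RINGS 4      /* hypothetical ring-pair count */

/* Stub for platform_get_irq(): defers until the interrupt provider
 * (mbigen in the hip06/hip07 case) has probed. */
static bool irq_provider_ready;

static int stub_get_irq(int idx)
{
	return irq_provider_ready ? 100 + idx : -EPROBE_DEFER;
}

/* Model of hns_rcb_get_cfg() after the suggested change: int, not void. */
static int model_rcb_get_cfg(int virq[MODEL_RINGS][2])
{
	for (int i = 0; i < MODEL_RINGS; i++) {
		virq[i][0] = stub_get_irq(i * 2);       /* TX */
		virq[i][1] = stub_get_irq(i * 2 + 1);   /* RX */
		if (virq[i][0] == -EPROBE_DEFER || virq[i][1] == -EPROBE_DEFER)
			return -EPROBE_DEFER;
	}
	return 0;
}

/* Model of hns_ppe_init(): propagate the error instead of ignoring it,
 * so the driver core can retry the probe later. */
static int model_ppe_init(void)
{
	int virq[MODEL_RINGS][2];
	int ret = model_rcb_get_cfg(virq);

	if (ret < 0)
		return ret;
	return 0;
}
```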

>
>>
>> lipeng
>>
>>>
>>>> Signed-off-by: lipeng <lipeng321@huawei.com>
>>>> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
>>>> ---
>>>>  drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 12 ++++++++++++
>>>>  1 file changed, 12 insertions(+)
>>>>
>>>> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
>>>> index 403ea9d..2da5b42 100644
>>>> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
>>>> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
>>>> @@ -2971,6 +2971,18 @@ static int hns_dsaf_probe(struct platform_device *pdev)
>>>>      struct dsaf_device *dsaf_dev;
>>>>      int ret;
>>>>
>>>> +    /*
>>>> +     * Check if we should defer the probe before we probe the
>>>> +     * dsaf, as it's hard to defer later on.
>>>> +     */
>>>> +    ret = platform_get_irq(pdev, 0);
>>>> +    if (ret < 0) {
>>>> +        if (ret != -EPROBE_DEFER)
>>>> +            dev_err(&pdev->dev, "Cannot obtain irq\n");
>>>> +
>>>> +        return ret;
>>>> +    }
>>>> +
>>>>      dsaf_dev = hns_dsaf_alloc_dev(&pdev->dev, sizeof(struct dsaf_drv_priv));
>>>>      if (IS_ERR(dsaf_dev)) {
>>>>          ret = PTR_ERR(dsaf_dev);
>>>>
>>>
>>> .
>>>
>>
> .
>

* Re: [RFC 1/4] netlink: make extended ACK setting NULL-friendly
From: Johannes Berg @ 2017-04-25  8:13 UTC (permalink / raw)
  To: Jakub Kicinski, netdev
  Cc: davem, dsa, daniel, alexei.starovoitov, bblanco, john.fastabend,
	kubakici, oss-drivers
In-Reply-To: <20170425080644.122536-2-jakub.kicinski@netronome.com>

On Tue, 2017-04-25 at 01:06 -0700, Jakub Kicinski wrote:

> +#define NL_SET_ERR_MSG(extack, msg) do {		\
> +	struct netlink_ext_ack *_extack = (extack);	\
> +	static const char _msg[] = (msg);		\
> +							\
> +	if (_extack)					\
> +		_extack->_msg = _msg;			\
> +	else						\
> +		pr_info("%s\n", _msg);			\
>  } while (0)

That's a good point, I used it only for genetlink so far where it was
guaranteed non-NULL.

I'm not really sure about the printing though - I'd rather people not
start relying on that, only for the message to disappear from the log
once we convert callers to pass a non-NULL extack ...
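The NULL-friendly behaviour of the quoted macro can be modelled in user space, and the model makes the trade-off visible. With a NULL extack the message only reaches the log. Once a caller passes a real extack, the message moves silently from the log into the netlink reply. Below, struct netlink_ext_ack is reduced to its message field, fprintf() stands in for pr_info(), and check_mtu() is a hypothetical caller.

```c
#include <errno.h>
#include <stdio.h>

/* Minimal stand-in for struct netlink_ext_ack: only the message field. */
struct netlink_ext_ack {
	const char *_msg;
};

/* Model of the RFC macro: store the message when an extack exists,
 * otherwise fall back to printing it (pr_info() in the RFC). */
#define NL_SET_ERR_MSG(extack, msg) do {		\
	struct netlink_ext_ack *_extack = (extack);	\
	static const char _msg[] = msg;			\
							\
	if (_extack)					\
		_extack->_msg = _msg;			\
	else						\
		fprintf(stderr, "%s\n", _msg);		\
} while (0)

/* Hypothetical caller in the style of the driver checks in this series. */
static int check_mtu(unsigned int mtu, unsigned int max,
		     struct netlink_ext_ack *extack)
{
	if (mtu > max) {
		NL_SET_ERR_MSG(extack, "MTU too large to enable XDP");
		return -EINVAL;
	}
	return 0;
}
```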

johannes

* [RFC 4/4] virtio_net: make use of extended ack message reporting
From: Jakub Kicinski @ 2017-04-25  8:06 UTC (permalink / raw)
  To: netdev
  Cc: davem, johannes, dsa, daniel, alexei.starovoitov, bblanco,
	john.fastabend, kubakici, oss-drivers, Jakub Kicinski
In-Reply-To: <20170425080644.122536-1-jakub.kicinski@netronome.com>

Try to carry error messages to the user via the netlink extended
ack message attribute.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/virtio_net.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 666ada6130ab..96c5bb31f0af 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1788,7 +1788,8 @@ static int virtnet_reset(struct virtnet_info *vi, int curr_qp, int xdp_qp)
 	return ret;
 }
 
-static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
+static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
+			   struct netlink_ext_ack *extack)
 {
 	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
 	struct virtnet_info *vi = netdev_priv(dev);
@@ -1800,16 +1801,17 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
 	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
 	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO)) {
-		netdev_warn(dev, "can't set XDP while host is implementing LRO, disable LRO first\n");
+		NL_SET_ERR_MSG(extack, "can't set XDP while host is implementing LRO, disable LRO first");
 		return -EOPNOTSUPP;
 	}
 
 	if (vi->mergeable_rx_bufs && !vi->any_header_sg) {
-		netdev_warn(dev, "XDP expects header/data in single page, any_header_sg required\n");
+		NL_SET_ERR_MSG(extack, "XDP expects header/data in single page, any_header_sg required");
 		return -EINVAL;
 	}
 
 	if (dev->mtu > max_sz) {
+		NL_SET_ERR_MSG(extack, "MTU too large to enable XDP");
 		netdev_warn(dev, "XDP requires MTU less than %lu\n", max_sz);
 		return -EINVAL;
 	}
@@ -1820,6 +1822,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 
 	/* XDP requires extra queues for XDP_TX */
 	if (curr_qp + xdp_qp > vi->max_queue_pairs) {
+		NL_SET_ERR_MSG(extack, "Too few free TX rings available");
 		netdev_warn(dev, "request %i queues but max is %i\n",
 			    curr_qp + xdp_qp, vi->max_queue_pairs);
 		return -ENOMEM;
@@ -1881,7 +1884,7 @@ static int virtnet_xdp(struct net_device *dev, struct netdev_xdp *xdp)
 {
 	switch (xdp->command) {
 	case XDP_SETUP_PROG:
-		return virtnet_xdp_set(dev, xdp->prog);
+		return virtnet_xdp_set(dev, xdp->prog, xdp->extack);
 	case XDP_QUERY_PROG:
 		xdp->prog_attached = virtnet_xdp_query(dev);
 		return 0;
-- 
2.11.0

* [RFC 3/4] nfp: make use of extended ack message reporting
From: Jakub Kicinski @ 2017-04-25  8:06 UTC (permalink / raw)
  To: netdev
  Cc: davem, johannes, dsa, daniel, alexei.starovoitov, bblanco,
	john.fastabend, kubakici, oss-drivers, Jakub Kicinski
In-Reply-To: <20170425080644.122536-1-jakub.kicinski@netronome.com>

Try to carry error messages to the user via the netlink extended
ack message attribute.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h       |  3 ++-
 .../net/ethernet/netronome/nfp/nfp_net_common.c    | 22 +++++++++++++---------
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  4 ++--
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 8f20fdef0754..48b4a2742233 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -816,7 +816,8 @@ nfp_net_irqs_assign(struct nfp_net *nn, struct msix_entry *irq_entries,
 		    unsigned int n);
 
 struct nfp_net_dp *nfp_net_clone_dp(struct nfp_net *nn);
-int nfp_net_ring_reconfig(struct nfp_net *nn, struct nfp_net_dp *new);
+int nfp_net_ring_reconfig(struct nfp_net *nn, struct nfp_net_dp *new,
+			  struct netlink_ext_ack *extack);
 
 bool nfp_net_link_changed_read_clear(struct nfp_net *nn);
 int nfp_net_refresh_eth_port(struct nfp_net *nn);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 8a9b74305493..ff66898375cc 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2496,24 +2496,27 @@ struct nfp_net_dp *nfp_net_clone_dp(struct nfp_net *nn)
 	return new;
 }
 
-static int nfp_net_check_config(struct nfp_net *nn, struct nfp_net_dp *dp)
+static int
+nfp_net_check_config(struct nfp_net *nn, struct nfp_net_dp *dp,
+		     struct netlink_ext_ack *extack)
 {
 	/* XDP-enabled tests */
 	if (!dp->xdp_prog)
 		return 0;
 	if (dp->fl_bufsz > PAGE_SIZE) {
-		nn_warn(nn, "MTU too large w/ XDP enabled\n");
+		NL_SET_ERR_MSG(extack, "MTU too large w/ XDP enabled");
 		return -EINVAL;
 	}
 	if (dp->num_tx_rings > nn->max_tx_rings) {
-		nn_warn(nn, "Insufficient number of TX rings w/ XDP enabled\n");
+		NL_SET_ERR_MSG(extack, "Insufficient number of TX rings w/ XDP enabled");
 		return -EINVAL;
 	}
 
 	return 0;
 }
 
-int nfp_net_ring_reconfig(struct nfp_net *nn, struct nfp_net_dp *dp)
+int nfp_net_ring_reconfig(struct nfp_net *nn, struct nfp_net_dp *dp,
+			  struct netlink_ext_ack *extack)
 {
 	int r, err;
 
@@ -2525,7 +2528,7 @@ int nfp_net_ring_reconfig(struct nfp_net *nn, struct nfp_net_dp *dp)
 
 	dp->num_r_vecs = max(dp->num_rx_rings, dp->num_stack_tx_rings);
 
-	err = nfp_net_check_config(nn, dp);
+	err = nfp_net_check_config(nn, dp, extack);
 	if (err)
 		goto exit_free_dp;
 
@@ -2600,7 +2603,7 @@ static int nfp_net_change_mtu(struct net_device *netdev, int new_mtu)
 
 	dp->mtu = new_mtu;
 
-	return nfp_net_ring_reconfig(nn, dp);
+	return nfp_net_ring_reconfig(nn, dp, NULL);
 }
 
 static void nfp_net_stat64(struct net_device *netdev,
@@ -2916,9 +2919,10 @@ static int nfp_net_xdp_offload(struct nfp_net *nn, struct bpf_prog *prog)
 	return ret;
 }
 
-static int nfp_net_xdp_setup(struct nfp_net *nn, struct bpf_prog *prog)
+static int nfp_net_xdp_setup(struct nfp_net *nn, struct netdev_xdp *xdp)
 {
 	struct bpf_prog *old_prog = nn->dp.xdp_prog;
+	struct bpf_prog *prog = xdp->prog;
 	struct nfp_net_dp *dp;
 	int err;
 
@@ -2945,7 +2949,7 @@ static int nfp_net_xdp_setup(struct nfp_net *nn, struct bpf_prog *prog)
 		dp->rx_dma_off = 0;
 
 	/* We need RX reconfig to remap the buffers (BIDIR vs FROM_DEV) */
-	err = nfp_net_ring_reconfig(nn, dp);
+	err = nfp_net_ring_reconfig(nn, dp, xdp->extack);
 	if (err)
 		return err;
 
@@ -2963,7 +2967,7 @@ static int nfp_net_xdp(struct net_device *netdev, struct netdev_xdp *xdp)
 
 	switch (xdp->command) {
 	case XDP_SETUP_PROG:
-		return nfp_net_xdp_setup(nn, xdp->prog);
+		return nfp_net_xdp_setup(nn, xdp);
 	case XDP_QUERY_PROG:
 		xdp->prog_attached = !!nn->dp.xdp_prog;
 		return 0;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index 6e27d1281425..4d41639b9b03 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -309,7 +309,7 @@ static int nfp_net_set_ring_size(struct nfp_net *nn, u32 rxd_cnt, u32 txd_cnt)
 	dp->rxd_cnt = rxd_cnt;
 	dp->txd_cnt = txd_cnt;
 
-	return nfp_net_ring_reconfig(nn, dp);
+	return nfp_net_ring_reconfig(nn, dp, NULL);
 }
 
 static int nfp_net_set_ringparam(struct net_device *netdev,
@@ -880,7 +880,7 @@ static int nfp_net_set_num_rings(struct nfp_net *nn, unsigned int total_rx,
 	if (dp->xdp_prog)
 		dp->num_tx_rings += total_rx;
 
-	return nfp_net_ring_reconfig(nn, dp);
+	return nfp_net_ring_reconfig(nn, dp, NULL);
 }
 
 static int nfp_net_set_channels(struct net_device *netdev,
-- 
2.11.0
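The nfp hunks above thread an optional extack through nfp_net_ring_reconfig(). The XDP path forwards the extack it received, while callers that have no netlink request behind them (MTU change, ethtool ring and channel changes) pass NULL. Below is a minimal user-space sketch of that pattern; it is not the nfp code. struct netlink_ext_ack and NL_SET_ERR_MSG() are simplified stand-ins for the kernel definitions. Every demo_* name and the MTU limit are hypothetical. The validate-a-copy-then-commit structure is only loosely modelled on how the driver works on a copy of its datapath config.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct netlink_ext_ack. */
struct netlink_ext_ack {
	const char *_msg;
};

/* Stand-in for NL_SET_ERR_MSG(): silently does nothing on a NULL extack. */
#define NL_SET_ERR_MSG(extack, msg) do {			\
	struct netlink_ext_ack *nl_ack_ = (extack);		\
	if (nl_ack_)						\
		nl_ack_->_msg = (msg);				\
} while (0)

/* Hypothetical, heavily reduced datapath config (not struct nfp_net_dp). */
struct demo_dp {
	unsigned int mtu;
	int xdp_enabled;
};

#define DEMO_XDP_MAX_MTU 1500u	/* hypothetical limit */

/* Validation step: reports a reason only if a caller supplied an extack. */
static int demo_check_config(const struct demo_dp *dp,
			     struct netlink_ext_ack *extack)
{
	if (dp->xdp_enabled && dp->mtu > DEMO_XDP_MAX_MTU) {
		NL_SET_ERR_MSG(extack, "MTU too large for XDP");
		return -EINVAL;
	}
	return 0;
}

/* Validates a candidate config and commits it only on success. */
static int demo_ring_reconfig(struct demo_dp *live, const struct demo_dp *cand,
			      struct netlink_ext_ack *extack)
{
	int err = demo_check_config(cand, extack);

	if (err)
		return err;
	*live = *cand;	/* a real driver would also swap rings here */
	return 0;
}

/* No netlink request behind an MTU change, so pass NULL. */
static int demo_change_mtu(struct demo_dp *dp, unsigned int new_mtu)
{
	struct demo_dp cand = *dp;

	cand.mtu = new_mtu;
	return demo_ring_reconfig(dp, &cand, NULL);
}

/* XDP setup arrives via netlink, so forward its extack. */
static int demo_xdp_setup(struct demo_dp *dp, struct netlink_ext_ack *extack)
{
	struct demo_dp cand = *dp;

	cand.xdp_enabled = 1;
	return demo_ring_reconfig(dp, &cand, extack);
}
```

Because the macro checks for NULL, the same check function can serve both kinds of caller. Callers without a netlink request lose only the message, not the error code.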


* [RFC 2/4] xdp: propagate extended ack to XDP setup
From: Jakub Kicinski @ 2017-04-25  8:06 UTC (permalink / raw)
  To: netdev
  Cc: davem, johannes, dsa, daniel, alexei.starovoitov, bblanco,
	john.fastabend, kubakici, oss-drivers, Jakub Kicinski
In-Reply-To: <20170425080644.122536-1-jakub.kicinski@netronome.com>

Drivers usually impose a number of restrictions on running XDP,
the most common being buffer sizes, LRO and the number of rings.
Even though some drivers try to be helpful and print error
messages, experience shows that users rarely consult kernel
logs on netlink errors.  Use the new extended ack mechanism to
carry the message back to user space.
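The restrictions listed above (buffer size, LRO, ring count) can each carry their own message instead of a bare errno. Below is a hedged user-space sketch of such a driver-side check. The netlink_ext_ack/NL_SET_ERR_MSG() definitions are simplified stand-ins for the kernel ones. struct demo_cfg and demo_xdp_check() are hypothetical. The "one extra TX ring per RX ring" rule is an assumption borrowed from the nfp set_num_rings hunk earlier in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct netlink_ext_ack. */
struct netlink_ext_ack {
	const char *_msg;
};

/* Stand-in for NL_SET_ERR_MSG(): no-op when extack is NULL. */
#define NL_SET_ERR_MSG(extack, msg) do {			\
	struct netlink_ext_ack *nl_ack_ = (extack);		\
	if (nl_ack_)						\
		nl_ack_->_msg = (msg);				\
} while (0)

/* Hypothetical snapshot of the settings an XDP check cares about. */
struct demo_cfg {
	unsigned int mtu;
	unsigned int rx_buf_size;
	bool lro;
	unsigned int rx_rings;
	unsigned int tx_rings;
	unsigned int max_tx_rings;
};

/* Returns 0 if XDP can be enabled, else a negative errno plus a message. */
static int demo_xdp_check(const struct demo_cfg *cfg,
			  struct netlink_ext_ack *extack)
{
	/* Buffer size: assume each frame must fit in one RX buffer. */
	if (cfg->mtu > cfg->rx_buf_size) {
		NL_SET_ERR_MSG(extack, "MTU too large for XDP RX buffers");
		return -EINVAL;
	}
	/* LRO: this sketch refuses XDP while LRO is on. */
	if (cfg->lro) {
		NL_SET_ERR_MSG(extack, "XDP cannot be used with LRO enabled");
		return -EOPNOTSUPP;
	}
	/* Rings: assume one extra TX ring per RX ring for XDP_TX. */
	if (cfg->tx_rings + cfg->rx_rings > cfg->max_tx_rings) {
		NL_SET_ERR_MSG(extack, "not enough TX rings for XDP");
		return -EINVAL;
	}
	return 0;
}
```

The checks stop at the first failing restriction. The caller therefore receives exactly one specific reason rather than a generic -EINVAL.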

For now, the extack is only set for XDP_SETUP_PROG; adding it
to dump/XDP_QUERY_PROG didn't make much sense.
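The netdev_xdp layout in this patch explains why extack is tied to XDP_SETUP_PROG. prog and extack sit in an anonymous struct that shares a union with prog_attached, so a query request overlays the same storage. The sketch below copies that layout into user space with opaque stand-in types. demo_fill_setup() and demo_fill_query() are hypothetical helpers that mirror how dev_change_xdp_fd() clears and fills the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Opaque stand-ins for kernel types; only pointers to them are used. */
struct bpf_prog;
struct netlink_ext_ack;

enum xdp_netdev_command {
	XDP_SETUP_PROG,
	XDP_QUERY_PROG,
};

/* Same shape as the patched struct netdev_xdp. */
struct netdev_xdp {
	enum xdp_netdev_command command;
	union {
		/* XDP_SETUP_PROG */
		struct {
			struct bpf_prog *prog;
			struct netlink_ext_ack *extack;
		};
		/* XDP_QUERY_PROG */
		bool prog_attached;
	};
};

/* Hypothetical helper: zero the request, then fill the SETUP_PROG arm. */
static void demo_fill_setup(struct netdev_xdp *xdp, struct bpf_prog *prog,
			    struct netlink_ext_ack *extack)
{
	memset(xdp, 0, sizeof(*xdp));
	xdp->command = XDP_SETUP_PROG;
	xdp->extack = extack;
	xdp->prog = prog;
}

/* Hypothetical helper: a query uses the prog_attached arm instead. */
static void demo_fill_query(struct netdev_xdp *xdp)
{
	memset(xdp, 0, sizeof(*xdp));
	xdp->command = XDP_QUERY_PROG;
}
```

Because prog_attached shares storage with prog, a driver should read extack (and prog) only when command is XDP_SETUP_PROG.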

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/linux/netdevice.h | 10 ++++++++--
 net/core/dev.c            |  5 ++++-
 net/core/rtnetlink.c      | 13 ++++++++-----
 3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5d5267febd56..41667f4238d2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -813,11 +813,16 @@ enum xdp_netdev_command {
 	XDP_QUERY_PROG,
 };
 
+struct netlink_ext_ack;
+
 struct netdev_xdp {
 	enum xdp_netdev_command command;
 	union {
 		/* XDP_SETUP_PROG */
-		struct bpf_prog *prog;
+		struct {
+			struct bpf_prog *prog;
+			struct netlink_ext_ack *extack;
+		};
 		/* XDP_QUERY_PROG */
 		bool prog_attached;
 	};
@@ -3283,7 +3288,8 @@ int dev_get_phys_port_id(struct net_device *dev,
 int dev_get_phys_port_name(struct net_device *dev,
 			   char *name, size_t len);
 int dev_change_proto_down(struct net_device *dev, bool proto_down);
-int dev_change_xdp_fd(struct net_device *dev, int fd, u32 flags);
+int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
+		      int fd, u32 flags);
 struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev);
 struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				    struct netdev_queue *txq, int *ret);
diff --git a/net/core/dev.c b/net/core/dev.c
index db6e31564d06..ca4633af5448 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6716,12 +6716,14 @@ EXPORT_SYMBOL(dev_change_proto_down);
 /**
  *	dev_change_xdp_fd - set or clear a bpf program for a device rx path
  *	@dev: device
+ *	@extack: netlink extended ack
  *	@fd: new program fd or negative value to clear
  *	@flags: xdp-related flags
  *
  *	Set or clear a bpf program for a device
  */
-int dev_change_xdp_fd(struct net_device *dev, int fd, u32 flags)
+int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
+		      int fd, u32 flags)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 	struct bpf_prog *prog = NULL;
@@ -6751,6 +6753,7 @@ int dev_change_xdp_fd(struct net_device *dev, int fd, u32 flags)
 
 	memset(&xdp, 0, sizeof(xdp));
 	xdp.command = XDP_SETUP_PROG;
+	xdp.extack = extack;
 	xdp.prog = prog;
 
 	err = ops->ndo_xdp(dev, &xdp);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 088f9c8b4196..1723dbb9e3dd 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1909,6 +1909,7 @@ static int do_set_master(struct net_device *dev, int ifindex)
 #define DO_SETLINK_NOTIFY	0x03
 static int do_setlink(const struct sk_buff *skb,
 		      struct net_device *dev, struct ifinfomsg *ifm,
+		      struct netlink_ext_ack *extack,
 		      struct nlattr **tb, char *ifname, int status)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
@@ -2191,7 +2192,7 @@ static int do_setlink(const struct sk_buff *skb,
 		}
 
 		if (xdp[IFLA_XDP_FD]) {
-			err = dev_change_xdp_fd(dev,
+			err = dev_change_xdp_fd(dev, extack,
 						nla_get_s32(xdp[IFLA_XDP_FD]),
 						xdp_flags);
 			if (err)
@@ -2251,7 +2252,7 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (err < 0)
 		goto errout;
 
-	err = do_setlink(skb, dev, ifm, tb, ifname, 0);
+	err = do_setlink(skb, dev, ifm, extack, tb, ifname, 0);
 errout:
 	return err;
 }
@@ -2413,6 +2414,7 @@ EXPORT_SYMBOL(rtnl_create_link);
 static int rtnl_group_changelink(const struct sk_buff *skb,
 		struct net *net, int group,
 		struct ifinfomsg *ifm,
+		struct netlink_ext_ack *extack,
 		struct nlattr **tb)
 {
 	struct net_device *dev, *aux;
@@ -2420,7 +2422,7 @@ static int rtnl_group_changelink(const struct sk_buff *skb,
 
 	for_each_netdev_safe(net, dev, aux) {
 		if (dev->group == group) {
-			err = do_setlink(skb, dev, ifm, tb, NULL, 0);
+			err = do_setlink(skb, dev, ifm, extack, tb, NULL, 0);
 			if (err < 0)
 				return err;
 		}
@@ -2566,14 +2568,15 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 				status |= DO_SETLINK_NOTIFY;
 			}
 
-			return do_setlink(skb, dev, ifm, tb, ifname, status);
+			return do_setlink(skb, dev, ifm, extack, tb, ifname,
+					  status);
 		}
 
 		if (!(nlh->nlmsg_flags & NLM_F_CREATE)) {
 			if (ifm->ifi_index == 0 && tb[IFLA_GROUP])
 				return rtnl_group_changelink(skb, net,
 						nla_get_u32(tb[IFLA_GROUP]),
-						ifm, tb);
+						ifm, extack, tb);
 			return -ENODEV;
 		}
 
-- 
2.11.0
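End to end, this patch threads extack from rtnl_setlink(), rtnl_group_changelink() and rtnl_newlink() through do_setlink() and dev_change_xdp_fd() into the netdev_xdp request handed to ndo_xdp. The user-space model below shows that a message set at the bottom of that chain is visible to the top-level caller, and that a NULL extack is harmless. All demo_* functions, struct demo_xdp_req and the rejecting driver are hypothetical, and the ext_ack type and macro are simplified stand-ins. In the kernel, the netlink core then attaches the message to the error reply; user space typically has to opt in with the NETLINK_EXT_ACK socket option to receive it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct netlink_ext_ack. */
struct netlink_ext_ack {
	const char *_msg;
};

/* Stand-in for NL_SET_ERR_MSG(): no-op when extack is NULL. */
#define NL_SET_ERR_MSG(extack, msg) do {			\
	struct netlink_ext_ack *nl_ack_ = (extack);		\
	if (nl_ack_)						\
		nl_ack_->_msg = (msg);				\
} while (0)

/* Hypothetical, reduced request (the real one is struct netdev_xdp). */
struct demo_xdp_req {
	int fd;
	struct netlink_ext_ack *extack;
};

/* Hypothetical driver ndo_xdp: refuses attach, allows detach (fd < 0). */
static int demo_ndo_xdp(struct demo_xdp_req *xdp)
{
	if (xdp->fd >= 0) {
		NL_SET_ERR_MSG(xdp->extack,
			       "demo driver: XDP not supported with current config");
		return -EOPNOTSUPP;
	}
	return 0;
}

/* Core layer: like dev_change_xdp_fd(), store extack in the request. */
static int demo_change_xdp_fd(struct netlink_ext_ack *extack, int fd)
{
	struct demo_xdp_req xdp;

	memset(&xdp, 0, sizeof(xdp));
	xdp.extack = extack;
	xdp.fd = fd;
	return demo_ndo_xdp(&xdp);
}

/* rtnetlink layer: like do_setlink(), pass the caller's extack down. */
static int demo_setlink(struct netlink_ext_ack *extack, int xdp_fd)
{
	return demo_change_xdp_fd(extack, xdp_fd);
}
```

Each layer only passes the pointer along. The driver is the single place that decides what the message says.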

