Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net 2/2] vxlan: do not output confusing error message
From: Jiri Benc @ 2017-04-27 19:24 UTC (permalink / raw)
  To: netdev; +Cc: Marcelo Ricardo Leitner
In-Reply-To: <cover.1493320999.git.jbenc@redhat.com>

The message "Cannot bind port X, err=Y" creates only confusion. In metadata
based mode, failure of IPv6 socket creation is okay if IPv6 is disabled and
no error message should be printed. But when IPv6 tunnel was requested, such
failure is fatal. The vxlan_socket_create does not know when the error is
harmless and when it's not.

Instead of passing such information down to vxlan_socket_create, remove the
message completely. It's not useful. We propagate the error code up to the
user space and the port number comes from the user space. There's nothing in
the message that the process creating vxlan interface does not know.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 drivers/net/vxlan.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 118e508f1889..27cba699d83a 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2754,8 +2754,6 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, bool ipv6,

 	sock = vxlan_create_sock(net, ipv6, port, flags);
 	if (IS_ERR(sock)) {
-		pr_info("Cannot bind port %d, err=%ld\n", ntohs(port),
-			PTR_ERR(sock));
 		kfree(vs);
 		return ERR_CAST(sock);
 	}
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net 1/2] vxlan: correctly handle ipv6.disable module parameter
From: Jiri Benc @ 2017-04-27 19:24 UTC (permalink / raw)
  To: netdev; +Cc: Marcelo Ricardo Leitner
In-Reply-To: <cover.1493320999.git.jbenc@redhat.com>

When IPv6 is compiled but disabled at runtime, __vxlan_sock_add returns
-EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole
operation of bringing up the tunnel.

Ignore failure of IPv6 socket creation for metadata based tunnels caused by
IPv6 not being available.

Fixes: b1be00a6c39f ("vxlan: support both IPv4 and IPv6 sockets in a single vxlan device")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 drivers/net/vxlan.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index bdb6ae16d4a8..118e508f1889 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2818,17 +2818,21 @@ static int __vxlan_sock_add(struct vxlan_dev *vxlan, bool ipv6)
 
 static int vxlan_sock_add(struct vxlan_dev *vxlan)
 {
-	bool ipv6 = vxlan->flags & VXLAN_F_IPV6;
 	bool metadata = vxlan->flags & VXLAN_F_COLLECT_METADATA;
+	bool ipv6 = vxlan->flags & VXLAN_F_IPV6 || metadata;
+	bool ipv4 = !ipv6 || metadata;
 	int ret = 0;
 
 	RCU_INIT_POINTER(vxlan->vn4_sock, NULL);
 #if IS_ENABLED(CONFIG_IPV6)
 	RCU_INIT_POINTER(vxlan->vn6_sock, NULL);
-	if (ipv6 || metadata)
+	if (ipv6) {
 		ret = __vxlan_sock_add(vxlan, true);
+		if (ret < 0 && ret != -EAFNOSUPPORT)
+			ipv4 = false;
+	}
 #endif
-	if (!ret && (!ipv6 || metadata))
+	if (ipv4)
 		ret = __vxlan_sock_add(vxlan, false);
 	if (ret < 0)
 		vxlan_sock_release(vxlan);
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net 0/2] vxlan: do not error out on disabled IPv6
From: Jiri Benc @ 2017-04-27 19:24 UTC (permalink / raw)
  To: netdev; +Cc: Marcelo Ricardo Leitner

This patchset fixes a bug with metadata based tunnels when booted with
ipv6.disable=1.


Jiri Benc (2):
  vxlan: correctly handle ipv6.disable module parameter
  vxlan: do not output confusing error message

 drivers/net/vxlan.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* Re: [PATCH net-next 1/2] rtnetlink: Disable notification for NETDEV_NAMECHANGE event
From: Vlad Yasevich @ 2017-04-27 19:26 UTC (permalink / raw)
  To: David Ahern, Vladislav Yasevich, netdev
In-Reply-To: <becfb3d4-5e9d-800b-fc67-1e99f4b14db9@cumulusnetworks.com>

On 04/21/2017 02:08 PM, David Ahern wrote:
> On 4/21/17 11:31 AM, Vladislav Yasevich wrote:
>> The data signaling name change is already provided at
>> the end of do_setlink().  This event handler just generates
>> a duplicate announcement.  Disable it.
>>
>> CC: David Ahern <dsa@cumulusnetworks.com>
>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>> ---
>>  net/core/rtnetlink.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
>> index 0ee5479..e8e6816 100644
>> --- a/net/core/rtnetlink.c
>> +++ b/net/core/rtnetlink.c
>> @@ -4123,7 +4123,6 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi
>>  
>>  	switch (event) {
>>  	case NETDEV_REBOOT:
>> -	case NETDEV_CHANGENAME:
>>  	case NETDEV_FEAT_CHANGE:
>>  	case NETDEV_BONDING_FAILOVER:
>>  	case NETDEV_NOTIFY_PEERS:
>>
> 
> 
> I only see one using the ip monitor.
> 
> $ ip li set foobar name fubar
> 
> generates these 3 messages:
> 
> [LINK]12: fubar: <BROADCAST,NOARP> mtu 1500 qdisc noqueue state DOWN
> group default
>     link/ether 76:cd:72:dd:2a:cb brd ff:ff:ff:ff:ff:ff
> Unknown message: type=0x00000051(81) flags=0x00000000(0)len=0x0000001c(28)
> [NETCONF]ipv4 dev dummy2 forwarding on rp_filter off mc_forwarding off
> proxy_neigh off ignore_routes_with_linkdown off
> Unknown message: type=0x00000051(81) flags=0x00000000(0)len=0x0000001c(28)
> [NETCONF]ipv6 dev dummy2 forwarding on mc_forwarding off proxy_neigh off
> ignore_routes_with_linkdown off
> 
> do_setlink only sets DO_SETLINK_MODIFIED so a name change alone will not
> generate 2 messages.
> 

Actually, it has nothing to do with above flag.  Setting DO_SETLINK_MODIFIED
will still generate notifications, but only if the device is UP.  However,
it looks like link name change can only be done when the link is down.  As
a result, netdev_state_change will not report it, so we only see the 'event'
one.

So this is patch isn't needed, but only as a kind-of side-effect..

-vlad

^ permalink raw reply

* Re: [PATCH net-next 0/9] support unique MAC addresses for slave devices
From: Mahesh Bandewar (महेश बंडेवार) @ 2017-04-27 19:21 UTC (permalink / raw)
  To: Marco Chiappero
  Cc: linux-netdev, David S . Miller, Jeff Kirsher, Alexander Duyck,
	Sainath Grandhi
In-Reply-To: <20170427145142.15830-1-marco.chiappero@intel.com>

On Thu, Apr 27, 2017 at 7:51 AM, Marco Chiappero
<marco.chiappero@intel.com> wrote:
> Currently every slave device gets assigned the same MAC address, by
> having it copied from the master interface. Since some code paths
> depend on this identity, changing the MAC address on slave interfaces
> is not supported. However identical MAC addresses can pose problems to
> management and orchestration software that correctly expect network
> interfaces on the same segment to have unique addresses.
>
Please understand that there are two distinct drivers IPvlan and
MACvlan. They both exist together for good reasons and are trying to
cater for different needs. I would love to combine them together if we
don't mess / miss the goodies each of them have to offer... otherwise
*NO*! Having said that if management / orchestration software has
problems then clearly you should not use IPvlan for that use case.

> Patches 1-8 include style fixes and refactoring (patch 9 depends upon)
> that improve the overal quality and make the intruduction of the
> feature straightforward.
>
Lots of this fall into I-say-potato-you-say-... category. My way of
thinking / organizing code is different than yours and you don't have
to like mine and I don't have to like yours.

> Patch 9 enables slave devices to own unique MAC addresses and change
> such addresses live, fixing lack of support and a related bug, as
> MAC address changes on master were not propagated to slave devices.
> In order to preserve the main peculiarity of this driver, that is
> exposing only a single MAC address for outbound traffic, frames
> egressing from master are now effectively masquerated when working in
> L2 mode.
>
This enhancement is, however, coming via packet-header rewrite for
every Tx/Rx packet which defeats the purpose. The only good thing that
came in light is the mac-addr change propagation from master issue;
but if the fix is coming as a side-effect of header rewrite then it's
not an acceptable fix either. This can be simply fixed by changing a
line in ipvlan_hard_header().

> Marco Chiappero (9):
>   ipvlan: fix coding style for the ipvlan tree
>   ipvlan: refactor ipvlan_process_multicast for readability
>   ipvlan: replace ipvlan_rcv_frame
>   ipvlan: rework the IP lookup function
>   ipvlan: improve and uniform naming
>   ipvlan: reposition three functions
>   ipvlan: relocate ipvlan_skb_crossing_ns calls
>   ipvlan: improve compiler hints
>   ipvlan: introduce individual MAC addresses
>
>  drivers/net/ipvlan/ipvlan.h      |   2 +-
>  drivers/net/ipvlan/ipvlan_core.c | 592 ++++++++++++++++++++-------------------
>  drivers/net/ipvlan/ipvlan_main.c |  49 ++--
>  drivers/net/ipvlan/ipvtap.c      |   1 +
>  4 files changed, 333 insertions(+), 311 deletions(-)
>
> --
> 2.9.3
>
> --------------------------------------------------------------
> Intel Research and Development Ireland Limited
> Registered in Ireland
> Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
> Registered Number: 308263
>
>
> This e-mail and any attachments may contain confidential material for the sole
> use of the intended recipient(s). Any review or distribution by others is
> strictly prohibited. If you are not the intended recipient, please contact the
> sender and delete all copies.
>

^ permalink raw reply

* Re: [PATCH net-next 09/18] net: dsa: mv88e6xxx: move VTU Data accessors
From: Andrew Lunn @ 2017-04-27 19:19 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-10-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:27AM -0400, Vivien Didelot wrote:
> The code to access the VTU Data registers currently only supports the
> 88E6185 family and alike: 2-bit membership adjacent to 2-bit port state.
> 
> Even though the 88E6352 family introduced an indirect table to program
> the VLAN Spanning Tree states, the usage of the VTU Data registers
> remains the same regardless the VTU or STU operation.
> 
> Now that the mv88e6xxx_vtu_entry structure contains both port membership
> and states data, factorize the code to access them in global1_vtu.c.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* [PATCH net-next] igb: mark PM functions as __maybe_unused
From: Arnd Bergmann @ 2017-04-27 19:09 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: Arnd Bergmann, Alexander Duyck, David S. Miller, Jacob Keller,
	Todd Fujinaka, Yury Kylulin, intel-wired-lan, netdev,
	linux-kernel

The new wake function is only used by the suspend/resume handlers that
are defined in inside of an #ifdef, which can cause this harmless
warning:

drivers/net/ethernet/intel/igb/igb_main.c:7988:13: warning: 'igb_deliver_wake_packet' defined but not used [-Wunused-function]

Removing the #ifdef, instead using a __maybe_unused annotation
simplifies the code and avoids the warning.

Fixes: b90fa8763560 ("igb: Enable reading of wake up packet")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 1cf74aa4ebd9..2d5bdb1fd37d 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -191,10 +191,7 @@ static int igb_disable_sriov(struct pci_dev *dev);
 static int igb_pci_disable_sriov(struct pci_dev *dev);
 #endif
 
-#ifdef CONFIG_PM
-#ifdef CONFIG_PM_SLEEP
 static int igb_suspend(struct device *);
-#endif
 static int igb_resume(struct device *);
 static int igb_runtime_suspend(struct device *dev);
 static int igb_runtime_resume(struct device *dev);
@@ -204,7 +201,6 @@ static const struct dev_pm_ops igb_pm_ops = {
 	SET_RUNTIME_PM_OPS(igb_runtime_suspend, igb_runtime_resume,
 			igb_runtime_idle)
 };
-#endif
 static void igb_shutdown(struct pci_dev *);
 static int igb_pci_sriov_configure(struct pci_dev *dev, int num_vfs);
 #ifdef CONFIG_IGB_DCA
@@ -8015,9 +8011,7 @@ static void igb_deliver_wake_packet(struct net_device *netdev)
 	netif_rx(skb);
 }
 
-#ifdef CONFIG_PM
-#ifdef CONFIG_PM_SLEEP
-static int igb_suspend(struct device *dev)
+static int __maybe_unused igb_suspend(struct device *dev)
 {
 	int retval;
 	bool wake;
@@ -8036,9 +8030,8 @@ static int igb_suspend(struct device *dev)
 
 	return 0;
 }
-#endif /* CONFIG_PM_SLEEP */
 
-static int igb_resume(struct device *dev)
+static int __maybe_unused igb_resume(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct net_device *netdev = pci_get_drvdata(pdev);
@@ -8092,7 +8085,7 @@ static int igb_resume(struct device *dev)
 	return err;
 }
 
-static int igb_runtime_idle(struct device *dev)
+static int __maybe_unused igb_runtime_idle(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct net_device *netdev = pci_get_drvdata(pdev);
@@ -8104,7 +8097,7 @@ static int igb_runtime_idle(struct device *dev)
 	return -EBUSY;
 }
 
-static int igb_runtime_suspend(struct device *dev)
+static int __maybe_unused igb_runtime_suspend(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	int retval;
@@ -8124,11 +8117,10 @@ static int igb_runtime_suspend(struct device *dev)
 	return 0;
 }
 
-static int igb_runtime_resume(struct device *dev)
+static int __maybe_unused igb_runtime_resume(struct device *dev)
 {
 	return igb_resume(dev);
 }
-#endif /* CONFIG_PM */
 
 static void igb_shutdown(struct pci_dev *pdev)
 {
-- 
2.9.0

^ permalink raw reply related

* Re: [PATCH net-next 1/4] ixgbe: sparc: rename the ARCH_WANT_RELAX_ORDER to IXGBE_ALLOW_RELAXED_ORDER
From: Casey Leedom @ 2017-04-27 19:00 UTC (permalink / raw)
  To: Bjorn Helgaas, Alexander Duyck
  Cc: Ding Tianhong, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	linux-pci@vger.kernel.org, Catalin Marinas, Will Deacon, LinuxArm,
	David Laight, jeffrey.t.kirsher@intel.com, netdev@vger.kernel.org,
	Robin Murphy, davem@davemloft.net,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20170427171938.GA10705@bhelgaas-glaptop.roam.corp.google.com>

  Thanks for adding me to the Cc list Bjorn.  Hopefully my message will make
it out to the netdev and linux-pci lists -- I'm not currently subscribed to
them.  I've explicitly marked this message to be sent in plain text but
modern email agents suck with respect to this. (sigh) I have officially
become a curmudgeon. 

  So, officially, Relaxed Ordering should be a Semantic Noop as far as PCIe
transfers are concerned, as long as you don't care what order the PCIe
Transaction Layer Packets are processed in by the target PCIe Fabric End
Point.

  Basically, if you have some number of back-to-back PCIe TLPs between two
Fabric End Points {A} -> {B} which have the Relaxed Ordering Attribute set,
the End Point {B} receiving these RO TLPs may process them in any order it
likes.  When a TLP without Relaxed Ordering is sent {A} -> {B}, all
preceding TLPs with Relaxed Ordering set must be processed by {B} prior to
processing the TLP without Relaxed Ordering set.  In this sense, a TLP
without Relaxed Ordering set is something akin to a "memory barrier".

  All of this is covered in Section 2.4.1 of the PCIe 3.0 Specification (PCI
Express(r) Base Specification Revision 3.0 November 10, 2010).

  The advantage of using Relaxed Ordering (which is heavily used when
sending data to Graphics Cards as I understand it), is that the PCIe
Endpoint can potentially optimize the processing order of RO TLPs with
things like a local multi-channel Memory Controller in order to achieve the
highest transfer bandwidth possible.

  However, we have discovered at least two PCIe 3.0 Root Complex
implementations which have problems with TLPs directed at them with the
Relaxed Ordering Attribute set and I'm in the process of working up a Linux
Kernel PCI "Quirk" to allow those PCIe End Points to be marked as "not being
advisable to send RO TLPs to".  These problems range from "mere" Performance
Problems to outright Data Corruption.  I'm working with the vendors of these
...  "problematic" Root Complex implementations and hope to have this patch
submitted to the linux-pci list by tomorrow.

  By the way, it's important to note that just because, say, a Root Complex
has problems with RO TLPs directed at it, that doesn't mean that you want to
avoid all use of Relaxed Ordering within the PCIe Fabric.  For instance,
with the vendor whose Root Complex has a Performance Problem with RO TLPs
directed at it, it's perfectly reasonable -- and desired -- to use Relaxed
Ordering in Peer-to-Peer traffic.  Say for instance, with an NVMe <->
Ethernet application.

Casey

^ permalink raw reply

* Re: [PATCH net-next 08/18] net: dsa: mv88e6xxx: move generic VTU GetNext
From: Andrew Lunn @ 2017-04-27 18:59 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-9-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:26AM -0400, Vivien Didelot wrote:
> Even though every switch model has a different way to access the VTU
> Data bits, the base implementation of the VTU GetNext operation remains
> the same: wait, write the first VID to iterate from, start the
> operation, and read the next VID.
> 
> Move this generic implementation into global_vtu.c and abstract the

global1_vtu.c

> +int mv88e6xxx_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
> +			     struct mv88e6xxx_vtu_entry *entry)
> +{
> +	int err;
> +
> +	err = mv88e6xxx_g1_vtu_op_wait(chip);
> +	if (err)
> +		return err;
> +
> +	/* Write the VID to iterate from only once */
> +	if (!entry->valid) {
> +		err = mv88e6xxx_g1_vtu_vid_write(chip, entry);
> +		if (err)
> +			return err;
> +	}

Please could you add a bigger comment. It is not clear why you write
it, when it is invalid. That just seems wrong, and needs a good
comment to explain why it is correct, more than what you currently
have as a comment.

Thanks
	Andrew

^ permalink raw reply

* Re: [PATCH net-next 07/18] net: dsa: mv88e6xxx: move VTU VID accessors
From: Andrew Lunn @ 2017-04-27 18:51 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-8-vivien.didelot@savoirfairelinux.com>

> @@ -1464,13 +1457,16 @@ static int _mv88e6xxx_vtu_loadpurge(struct mv88e6xxx_chip *chip,
>  				    struct mv88e6xxx_vtu_entry *entry)
>  {
>  	u16 op = GLOBAL_VTU_OP_VTU_LOAD_PURGE;
> -	u16 reg = 0;
>  	int err;
>  
>  	err = mv88e6xxx_g1_vtu_op_wait(chip);
>  	if (err)
>  		return err;
>  
> +	err = mv88e6xxx_g1_vtu_vid_write(chip, entry);
> +	if (err)
> +		return err;
> +
>  	if (!entry->valid)
>  		goto loadpurge;
>  
> @@ -1496,14 +1492,7 @@ static int _mv88e6xxx_vtu_loadpurge(struct mv88e6xxx_chip *chip,
>  		op |= (entry->fid & 0xf0) << 8;
>  		op |= entry->fid & 0xf;
>  	}
> -
> -	reg = GLOBAL_VTU_VID_VALID;
>  loadpurge:
> -	reg |= entry->vid & GLOBAL_VTU_VID_MASK;
> -	err = mv88e6xxx_g1_write(chip, GLOBAL_VTU_VID, reg);
> -	if (err)
> -		return err;
> -
>  	return mv88e6xxx_g1_vtu_op(chip, op);
>  }

This is not obvious, why do the vtu_vid_write() at the beginning,
rather than at the end? Especially before the if (!entry->valid).

However, when you look at the rest of the patch, it is O.K.

It might of been better to do this in two patches, to make it clearer
what is going on.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net] net: adjust skb->truesize in ___pskb_trim()
From: Willem de Bruijn @ 2017-04-27 18:43 UTC (permalink / raw)
  To: Andrey Konovalov; +Cc: Eric Dumazet, David Miller, netdev, Willem de Bruijn
In-Reply-To: <CAAeHK+xTtA-L7VJsP9GXHpVD=o-WOEJ+xD_sJ4b4O-F_aZx_aw@mail.gmail.com>

On Wed, Apr 26, 2017 at 1:08 PM, Andrey Konovalov <andreyknvl@google.com> wrote:
> On Wed, Apr 26, 2017 at 6:07 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
>> skb_try_coalesce() using syzkaller and a filter attached to a TCP
>> socket.
>>
>> As we did recently in commit 158f323b9868 ("net: adjust skb->truesize in
>> pskb_expand_head()") we can adjust skb->truesize from ___pskb_trim(),
>> via a call to skb_condense().
>>
>> If all frags were freed, then skb->truesize can be recomputed.
>>
>> This call can be done if skb is not yet owned, or destructor is
>> sock_edemux().
>
> Hi Eric,
>
> I still see the warning even with your patch.

Can this happen if sk_trim_filter_cap trims the skb to free some,
but not all, of the frags? If skb->data_len remains larger than
skb->end - skb->tail, skb_condense will not adjust the truesize.

^ permalink raw reply

* Re: [PATCH net-next 06/18] net: dsa: mv88e6xxx: move VTU SID accessors
From: Andrew Lunn @ 2017-04-27 18:37 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-7-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:24AM -0400, Vivien Didelot wrote:
> Add helpers to access the VTU SID register in the global1_vtu.c file.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next 05/18] net: dsa: mv88e6xxx: move VTU FID accessors
From: Andrew Lunn @ 2017-04-27 18:35 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-6-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:23AM -0400, Vivien Didelot wrote:
> Add helpers to access the VTU FID register in the global1_vtu.c file.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next 04/18] net: dsa: mv88e6xxx: move VTU flush
From: Andrew Lunn @ 2017-04-27 18:34 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-5-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:22AM -0400, Vivien Didelot wrote:
> Move the VTU flush operation to global1_vtu.c and call it from a
> mv88e6xxx_vtu_setup helper, similarly to the ATU and PVT setup.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next 03/18] net: dsa: mv88e6xxx: move VTU Operation accessors
From: Andrew Lunn @ 2017-04-27 18:33 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-4-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:21AM -0400, Vivien Didelot wrote:
> Move the helper functions to access the Global 1 VTU Operation register
> to a new global1_vtu.c file, and get rid of the old underscore prefix
> naming convention. This file will be extended will all VTU/STU related
> code.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next 03/18] net: dsa: mv88e6xxx: move VTU Operation accessors
From: Andrew Lunn @ 2017-04-27 18:32 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-4-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:21AM -0400, Vivien Didelot wrote:
> Move the helper functions to access the Global 1 VTU Operation register
> to a new global1_vtu.c file, and get rid of the old underscore prefix
> naming convention. This file will be extended will all VTU/STU related
> code.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
> ---
>  drivers/net/dsa/mv88e6xxx/Makefile      |  1 +
>  drivers/net/dsa/mv88e6xxx/chip.c        | 39 +++++++++------------------------
>  drivers/net/dsa/mv88e6xxx/global1.h     |  3 +++
>  drivers/net/dsa/mv88e6xxx/global1_vtu.c | 33 ++++++++++++++++++++++++++++
>  4 files changed, 47 insertions(+), 29 deletions(-)
>  create mode 100644 drivers/net/dsa/mv88e6xxx/global1_vtu.c
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/Makefile b/drivers/net/dsa/mv88e6xxx/Makefile
> index 31d37a90cec7..6edd869c8d6f 100644
> --- a/drivers/net/dsa/mv88e6xxx/Makefile
> +++ b/drivers/net/dsa/mv88e6xxx/Makefile
> @@ -2,5 +2,6 @@ obj-$(CONFIG_NET_DSA_MV88E6XXX) += mv88e6xxx.o
>  mv88e6xxx-objs := chip.o
>  mv88e6xxx-objs += global1.o
>  mv88e6xxx-objs += global1_atu.o
> +mv88e6xxx-objs += global1_vtu.o
>  mv88e6xxx-$(CONFIG_NET_DSA_MV88E6XXX_GLOBAL2) += global2.o
>  mv88e6xxx-objs += port.o
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index f025d3c22dba..bf0350432337 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -3,9 +3,6 @@
>   *
>   * Copyright (c) 2008 Marvell Semiconductor
>   *
> - * Copyright (c) 2015 CMC Electronics, Inc.
> - *	Added support for VLAN Table Unit operations
> - *
>   * Copyright (c) 2016 Andrew Lunn <andrew@lunn.ch>
>   *
>   * Copyright (c) 2016-2017 Savoir-faire Linux Inc.
> @@ -1266,31 +1263,15 @@ static void mv88e6xxx_port_fast_age(struct dsa_switch *ds, int port)
>  		netdev_err(ds->ports[port].netdev, "failed to flush ATU\n");
>  }
>  
> -static int _mv88e6xxx_vtu_wait(struct mv88e6xxx_chip *chip)
> -{
> -	return mv88e6xxx_g1_wait(chip, GLOBAL_VTU_OP, GLOBAL_VTU_OP_BUSY);
> -}
> -
> -static int _mv88e6xxx_vtu_cmd(struct mv88e6xxx_chip *chip, u16 op)
> -{
> -	int err;
> -
> -	err = mv88e6xxx_g1_write(chip, GLOBAL_VTU_OP, op);
> -	if (err)
> -		return err;
> -
> -	return _mv88e6xxx_vtu_wait(chip);
> -}
> -
>  static int _mv88e6xxx_vtu_stu_flush(struct mv88e6xxx_chip *chip)
>  {
>  	int ret;
>  
> -	ret = _mv88e6xxx_vtu_wait(chip);
> +	ret = mv88e6xxx_g1_vtu_op_wait(chip);
>  	if (ret < 0)
>  		return ret;
>  
> -	return _mv88e6xxx_vtu_cmd(chip, GLOBAL_VTU_OP_FLUSH_ALL);
> +	return mv88e6xxx_g1_vtu_op(chip, GLOBAL_VTU_OP_FLUSH_ALL);
>  }
>  
>  static int _mv88e6xxx_vtu_stu_data_read(struct mv88e6xxx_chip *chip,
> @@ -1380,11 +1361,11 @@ static int _mv88e6xxx_vtu_getnext(struct mv88e6xxx_chip *chip,
>  	u16 val;
>  	int err;
>  
> -	err = _mv88e6xxx_vtu_wait(chip);
> +	err = mv88e6xxx_g1_vtu_op_wait(chip);
>  	if (err)
>  		return err;
>  
> -	err = _mv88e6xxx_vtu_cmd(chip, GLOBAL_VTU_OP_VTU_GET_NEXT);
> +	err = mv88e6xxx_g1_vtu_op(chip, GLOBAL_VTU_OP_VTU_GET_NEXT);
>  	if (err)
>  		return err;

Hi Vivien

This patch got me for a while. It initially looks like a bad idea. It
is not until you read all the patches, you can see why it is actually
good. It is a simple mechanical translation, which is easy to
review. But it seemed odd to be using these names in chip.c. But later
patches solve that, by moving the code in global1_vtu.c.

	Andrew

^ permalink raw reply

* Re: [PATCH net-next 02/18] net: dsa: mv88e6xxx: split VTU entry data member
From: Andrew Lunn @ 2017-04-27 18:25 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20170426155336.5937-3-vivien.didelot@savoirfairelinux.com>

On Wed, Apr 26, 2017 at 11:53:20AM -0400, Vivien Didelot wrote:
> VLAN aware Marvell chips can program 802.1Q VLAN membership as well as
> 802.1s per VLAN Spanning Tree state using the same 3 VTU Data registers.
> 
> Some chips such as 88E6185 use different Data registers offsets for
> ports state and membership, and program them in a single operation.
> 
> Other chips such as 88E6352 use the same register layout but program
> them in distinct operations (an indirect table is used for 802.1s.)
> 
> Newer chips such as 88E6390 use the same offsets for both state and
> membership in distinct operations, thus require multiple data accesses.
> 
> To correctly abstract this, split the "data" structure member of
> mv88e6xxx_vtu_entry in two "state" and "member" members, before adding
> VTU support for newer chips.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* [PATCH] igb: make a few local functions static
From: Colin King @ 2017-04-27 17:59 UTC (permalink / raw)
  To: Jeff Kirsher, intel-wired-lan, netdev; +Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

clean up a few sparse warnings, these following
functions can be made static:

drivers/net/ethernet/intel/igb/igb_main.c: warning: symbol
  'igb_add_mac_filter' was not declared. Should it be static?
drivers/net/ethernet/intel/igb/igb_main.c: warning: symbol
  'igb_del_mac_filter' was not declared. Should it be static?
drivers/net/ethernet/intel/igb/igb_main.c: warning: symbol
  'igb_set_vf_mac_filter' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 1cf74aa4ebd9..b0021819b9d0 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6457,8 +6457,8 @@ static void igb_set_default_mac_filter(struct igb_adapter *adapter)
 	igb_rar_set_index(adapter, 0);
 }
 
-int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr,
-		       const u8 queue)
+static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr,
+			      const u8 queue)
 {
 	struct e1000_hw *hw = &adapter->hw;
 	int rar_entries = hw->mac.rar_entry_count -
@@ -6487,8 +6487,8 @@ int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr,
 	return -ENOSPC;
 }
 
-int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr,
-		       const u8 queue)
+static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr,
+			      const u8 queue)
 {
 	struct e1000_hw *hw = &adapter->hw;
 	int rar_entries = hw->mac.rar_entry_count -
@@ -6540,8 +6540,8 @@ static int igb_uc_unsync(struct net_device *netdev, const unsigned char *addr)
 	return 0;
 }
 
-int igb_set_vf_mac_filter(struct igb_adapter *adapter, const int vf,
-			  const u32 info, const u8 *addr)
+static int igb_set_vf_mac_filter(struct igb_adapter *adapter, const int vf,
+				 const u32 info, const u8 *addr)
 {
 	struct pci_dev *pdev = adapter->pdev;
 	struct vf_data_storage *vf_data = &adapter->vf_data[vf];
-- 
2.11.0


^ permalink raw reply related

* Re: [patch net-next 00/10] net: sched: introduce multichain support for filters
From: Cong Wang @ 2017-04-27 17:46 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	David Ahern, Eric Dumazet, Stephen Hemminger, Daniel Borkmann,
	Alexander Duyck, mlxsw, Simon Horman
In-Reply-To: <1493291540-2119-1-git-send-email-jiri@resnulli.us>

On Thu, Apr 27, 2017 at 4:12 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Simple example:
> $ tc qdisc add dev eth0 ingress
> $ tc filter add dev eth0 parent ffff: protocol ip pref 33 flower dst_mac 52:54:00:3d:c7:6d action goto chain 11
> $ tc filter add dev eth0 parent ffff: protocol ip pref 22 chain 11 flower dst_ip 192.168.40.1 action drop
> $ tc filter show dev eth0 root

Interesting.

I don't look into the code yet. If I understand the concepts correctly,
so with your patchset we can mark either filter with a chain No. to
choose which chain it belongs to _logically_ even though
_physically_ it is still in the old-fashion chain (prio, proto)?

If so, you have to ensure proto is same since the protocol of
the packet does not change dynamically? And the original
priority becomes pointless with chains since we can just to
any other chain in any order?

By default, without any chain No., you use 0 for all the chains,
so the old-fashion chain still works.

Thanks.

^ permalink raw reply

* Re: [PATCH net v4 3/3] net: hns: fixed bug that skb used after kfree
From: Florian Fainelli @ 2017-04-27 17:38 UTC (permalink / raw)
  To: Yankejian, davem, salil.mehta, yisen.zhuang, matthias.bgg,
	lipeng321, zhouhuiru, huangdaode
  Cc: netdev, linuxarm
In-Reply-To: <1493261053-68197-4-git-send-email-yankejian@huawei.com>

On 04/26/2017 07:44 PM, Yankejian wrote:
>  	struct hns_nic_priv *priv = netdev_priv(ndev);
>  	struct hnae_ring *ring = ring_data->ring;
> @@ -361,6 +361,10 @@ int hns_nic_net_xmit_hw(struct net_device *ndev,
>  	dev_queue = netdev_get_tx_queue(ndev, skb->queue_mapping);
>  	netdev_tx_sent_queue(dev_queue, skb->len);
>  
> +	netif_trans_update(ndev);
> +	ndev->stats.tx_bytes += skb->len;
> +	ndev->stats.tx_packets++;

This is still wrong though, you should not update your TX statistics
until you get a TX completion interrupt that confirms these packets were
actually transmitted. This has the advantage of not causing use after
free in your ndo_start_xmit() function (current bug), and also allows
feeding information into BQL where it is appropriate, and in a central
location: the TX completion handler.
-- 
Florian

^ permalink raw reply

* [PATCH net] bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal
From: Paolo Abeni @ 2017-04-27 17:29 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: Jay Vosburgh, David S. Miller, Honggang LI,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On slave list updates, the bonding driver computes its hard_header_len
as the maximum of all enslaved devices's hard_header_len.
If the slave list is empty, e.g. on last enslaved device removal,
ETH_HLEN is used.

Since the bonding header_ops are set only when the first enslaved
device is attached, the above can lead to header_ops->create()
being called with the wrong skb headroom in place.

If bond0 is configured on top of ipoib devices, with the
following commands:

ifup bond0
for slave in $BOND_SLAVES_LIST; do
	ip link set dev $slave nomaster
done
ping -c 1 <ip on bond0 subnet>

we will obtain a skb_under_panic() with a similar call trace:
	skb_push+0x3d/0x40
	push_pseudo_header+0x17/0x30 [ib_ipoib]
	ipoib_hard_header+0x4e/0x80 [ib_ipoib]
	arp_create+0x12f/0x220
	arp_send_dst.part.19+0x28/0x50
	arp_solicit+0x115/0x290
	neigh_probe+0x4d/0x70
	__neigh_event_send+0xa7/0x230
	neigh_resolve_output+0x12e/0x1c0
	ip_finish_output2+0x14b/0x390
	ip_finish_output+0x136/0x1e0
	ip_output+0x76/0xe0
	ip_local_out+0x35/0x40
	ip_send_skb+0x19/0x40
	ip_push_pending_frames+0x33/0x40
	raw_sendmsg+0x7d3/0xb50
	inet_sendmsg+0x31/0xb0
	sock_sendmsg+0x38/0x50
	SYSC_sendto+0x102/0x190
	SyS_sendto+0xe/0x10
	do_syscall_64+0x67/0x180
	entry_SYSCALL64_slow_path+0x25/0x25

This change addresses the issue avoiding updating the bonding device
hard_header_len when the slaves list become empty, forbidding to
shrink it below the value used by header_ops->create().

The bug is there since commit 54ef31371407 ("[PATCH] bonding: Handle large
hard_header_len") but the panic can be triggered only since
commit fc791b633515 ("IB/ipoib: move back IB LL address into the hard
header").

Reported-by: Norbert P <noe-PRwTpj6vllL463JZfw7VRw@public.gmane.org>
Fixes: 54ef31371407 ("[PATCH] bonding: Handle large hard_header_len")
Fixes: fc791b633515 ("IB/ipoib: move back IB LL address into the hard header")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Signed-off-by: Paolo Abeni <pabeni-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/net/bonding/bond_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 8a4ba8b..34481c9 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1104,11 +1104,11 @@ static void bond_compute_features(struct bonding *bond)
 		gso_max_size = min(gso_max_size, slave->dev->gso_max_size);
 		gso_max_segs = min(gso_max_segs, slave->dev->gso_max_segs);
 	}
+	bond_dev->hard_header_len = max_hard_header_len;
 
 done:
 	bond_dev->vlan_features = vlan_features;
 	bond_dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL;
-	bond_dev->hard_header_len = max_hard_header_len;
 	bond_dev->gso_max_segs = gso_max_segs;
 	netif_set_gso_max_size(bond_dev, gso_max_size);
 
-- 
2.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH net-next 1/4] ixgbe: sparc: rename the ARCH_WANT_RELAX_ORDER to IXGBE_ALLOW_RELAXED_ORDER
From: Bjorn Helgaas @ 2017-04-27 17:19 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Ding Tianhong, Mark Rutland, Amir Ancel, Gabriele Paoloni,
	linux-pci@vger.kernel.org, Catalin Marinas, Will Deacon, LinuxArm,
	David Laight, jeffrey.t.kirsher@intel.com, netdev@vger.kernel.org,
	Robin Murphy, davem@davemloft.net,
	linux-arm-kernel@lists.infradead.org, Casey Leedom
In-Reply-To: <CAKgT0Uc4L=GgYbpO-Fm9OfN+_fLypbDP1c+X4T_ta90ecQiyGQ@mail.gmail.com>

[+cc Casey]

On Wed, Apr 26, 2017 at 09:18:33AM -0700, Alexander Duyck wrote:
> On Wed, Apr 26, 2017 at 2:26 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
> > Hi Amir:
> >
> > It is really glad to hear that the mlx5 will support RO mode this year, if so, do you agree that enable it dynamic by ethtool -s xxx,
> > we have try it several month ago but there was only one drivers would use it at that time so the maintainer against it, it mlx5 would support RO,
> > we could try to restart this solution, what do you think about it. :)
> >
> > Thanks
> > Ding
> 
> Hi Ding,
> 
> Enabing relaxed ordering really doesn't have any place in ethtool. It
> is a PCIe attribute that you are essentially wanting to enable.
> 
> It might be worth while to take a look at updating the PCIe code path
> to handle this. Really what we should probably do is guarantee that
> the architectures that need relaxed ordering are setting it in the
> PCIe Device Control register and that the ones that don't are clearing
> the bit. It's possible that this is already occurring, but I don't
> know the state of handling those bits is in the kernel. Once we can
> guarantee that we could use that to have the drivers determine their
> behavior in regards to relaxed ordering. For example in the case of
> igb/ixgbe we could probably change the behavior so that it will bey
> default try to use relaxed ordering but if it is not enabled in PCIe
> Device Control register the hardware should not request to use it. It
> would simplify things in the drivers and allow for each architecture
> to control things as needed in their PCIe code.

I thought Relaxed Ordering was an optimization.  Are there cases where
it is actually required for correct behavior?

The PCI core doesn't currently do anything with Relaxed Ordering.
Some drivers enable/disable it directly.  I think it would probably be
better if the core provided an interface for this.  One reason is
because I think Casey has identified some systems where Relaxed
Ordering doesn't work correctly, and I'd rather deal with them once in
the core than in every driver.

Are you hinting that the PCI core or arch code could actually *enable*
Relaxed Ordering without the driver doing anything?  Is it safe to do
that?  Is there such a thing as a device that is capable of using RO,
but where the driver must be aware of it being enabled, so it programs
the device appropriately?

Bjorn

^ permalink raw reply

* Re: [PATCH 5/7] IB/hfi1: use pcie_flr instead of duplicating it
From: Bjorn Helgaas @ 2017-04-27 16:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Byczkowski, Jakub, Bjorn Helgaas, Cabiddu, Giovanni,
	Benedetto, Salvatore, Marciniszyn, Mike, Dalessandro, Dennis,
	Derek Chickles, Satanand Burla, Felix Manlunas, Raghu Vatsavayi,
	Kirsher, Jeffrey T, linux-pci@vger.kernel.org, qat-linux,
	linux-crypto@vger.kernel.org, linux-rdma@vger.kernel.org,
	"net
In-Reply-To: <20170427064758.GA20614@lst.de>

On Thu, Apr 27, 2017 at 08:47:58AM +0200, Christoph Hellwig wrote:
> On Tue, Apr 25, 2017 at 02:39:55PM -0500, Bjorn Helgaas wrote:
> > This still leaves these:
> > 
> >   [PATCH 4/7] ixgbe: use pcie_flr instead of duplicating it
> >   [PATCH 6/7] crypto: qat: use pcie_flr instead of duplicating it
> >   [PATCH 7/7] liquidio: use pcie_flr instead of duplicating it
> > 
> > I haven't seen any response to 4 and 6.  Felix reported an unused
> > variable in 7.  Let me know if you'd like me to do anything with
> > these.
> 
> Now that Jeff ACKed 4 it might be worth to add it to the pci tree last
> minute.  I'll resend liquidio and qat to the respective maintainers for
> the next merge window.

I applied 4 with Jeff's ack to pci/virtualization for v4.12, thanks!

^ permalink raw reply

* Re: [PATCH net-next] samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel
From: David Miller @ 2017-04-27 16:49 UTC (permalink / raw)
  To: ast; +Cc: dsa, netdev, daniel
In-Reply-To: <085b8b0f-7751-0be4-d3b7-3c06f2cc602b@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Thu, 27 Apr 2017 09:38:37 -0700

> On 4/27/17 9:11 AM, David Ahern wrote:
>> Add option to xdp1 and xdp_tx_iptunnel to insert xdp program in
>> SKB_MODE:
>>  - update set_link_xdp_fd to take a flags argument that is added to the
>>    RTM_SETLINK message
>>
>>  - Add -S option to xdp1 and xdp_tx_iptunnel user code. When passed in
>>    XDP_FLAGS_SKB_MODE is set in the flags arg passed to set_link_xdp_fd
>>
>> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> 
> awesome. thanks!
> Acked-by: Alexei Starovoitov <ast@kernel.org>

Indeed, very awesome.

Applied!

^ permalink raw reply

* Re: [PATCH v1 net-next 5/6] net: allow simultaneous SW and HW transmit timestamping
From: Willem de Bruijn @ 2017-04-27 16:48 UTC (permalink / raw)
  To: Miroslav Lichvar
  Cc: Network Development, Richard Cochran, Willem de Bruijn,
	Soheil Hassas Yeganeh, Keller, Jacob E, Denny Page, Jiri Benc
In-Reply-To: <20170427163911.GC3401@localhost>

On Thu, Apr 27, 2017 at 12:39 PM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> On Thu, Apr 27, 2017 at 12:21:00PM -0400, Willem de Bruijn wrote:
>> >> > @@ -720,6 +720,7 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
>> >> >                 empty = 0;
>> >> >         if (shhwtstamps &&
>> >> >             (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
>> >> > +           (empty || !skb_is_err_queue(skb)) &&
>> >> >             ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2)) {
>> >>
>> >> I find skb->tstamp == 0 easier to understand than the condition on empty.
>> >>
>> >> Indeed, this is so non-obvious that I would suggest another helper function
>> >> skb_is_hwtx_tstamp with a concise comment about the race condition
>> >> between tx software and hardware timestamps (as in the last sentence of
>> >> the commit message).
>> >
>> > Should it include also the skb_is_err_queue() check? If it returned
>> > true for both TX and RX HW timestamps, maybe it could be called
>> > skb_has_hw_tstamp?
>>
>> For the purpose of documenting why this complex condition exists,
>> I would call the skb_is_err_queue in that helper function and make
>> it tx + hw specific.
>
> Hm, like this?
>
>         if (shhwtstamps &&
>             (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
> +           (skb_is_hwtx_tstamp(skb) || !skb_is_err_queue(skb)) &&
>             ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2)) {
>
> where skb_is_hwtx_tstamp() has
>         return skb->tstamp == 0 && skb_is_err_queue(skb);
>
> I was just not sure about the unnecessary skb_is_err_queue() call.

Oh, good point. If the condition is

  (skb_is_err_queue(skb) && !skb->tstamp) || !skb_is_err_queue(skb)

then it makes more sense to just use the simpler expression

  (!skb_is_err_queue(skb)) || (!skb->tstamp)

This cannot be called skb_is_hwtx_tstamp, as this does not imply
skb_hwtstamps(skb). Perhaps instead define

  /* On transmit, software and hardware timestamps are returned independently.
   * Do not return hardware timestamps even if skb_hwtstamps(skb) is true if
   * skb->tstamp is set
   */
  static bool skb_is_swtx_tstamp(skb) {
    return skb_is_err_queue(skb) && skb->tstamp;
  }

and use !skb_is_swtx_tstamp(skb) in this condition. The comment is
just a quick first try, can probably be improved.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox