Netdev List
* Re: Deleting child qdisc doesn't reset parent to default qdisc?
From: Eric Dumazet @ 2016-04-14 16:40 UTC (permalink / raw)
  To: Phil Sutter; +Cc: Jiri Kosina, Jamal Hadi Salim, netdev, linux-kernel
In-Reply-To: <20160414162229.GF3715@orbyte.nwl.cc>

On Thu, 2016-04-14 at 18:22 +0200, Phil Sutter wrote:

> And those being invisible can be overridden using 'tc qd add', right?
> AFAIR they're not listed because they don't properly register, so the
> system doesn't care to override them. In this case we could change all
> classful qdiscs to restore the default qdisc if a leaf qdisc is being
> deleted instead of noop (which is probably not what the user wants
> anyway).

Even if they properly register, they are not visible.

Take a look at
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=95dc19299f741c986227ec33e23cbf9b3321f812

for some context.

When a default pfifo is created on, say, an HTB class, you do not see it by
default in a dump.

If you have 100 HTB classes, HTB created 100 pfifo just fine, and it
works, unless an admin tries to delete them maybe ;)


* Re: [PATCH 1/2] [v4] net: emac: emac gigabit ethernet controller driver
From: Timur Tabi @ 2016-04-14 16:47 UTC (permalink / raw)
  To: Rob Herring
  Cc: netdev, linux-kernel, devicetree, linux-arm-msm, sdharia,
	Shanker Donthineni, Greg Kroah-Hartman, vikrams, cov, gavidov,
	andrew, bjorn.andersson, Mark Langsdorf, Jon Masters, Andy Gross,
	David S. Miller
In-Reply-To: <20160414163240.GB15303@rob-hp-laptop>

Rob Herring wrote:

>> @@ -0,0 +1,65 @@
>> +Qualcomm EMAC Gigabit Ethernet Controller
>> +
>> +Required properties:
>> +- compatible : Should be "qcom,emac".
>
> Come on... Can you guess what I'm going to say here.

Ooops, I missed that one.

>
>> +- reg : Offset and length of the register regions for the device
>> +- reg-names : Register region names referenced in 'reg' above.
>> +	Required register resource entries are:
>> +	"base"   : EMAC controller base register block.
>> +	"csr"    : EMAC wrapper register block.
>> +	Optional register resource entries are:
>> +	"ptp"    : EMAC PTP (1588) register block.
>> +		   Required if 'qcom,emac-tstamp-en' is present.
>> +	"sgmii"  : EMAC SGMII PHY register block.
>> +- interrupts : Interrupt numbers used by this controller
>> +- interrupt-names : Interrupt resource names referenced in 'interrupts' above.
>> +	Required interrupt resource entries are:
>> +	"emac_core0"   : EMAC core0 interrupt.
>> +	"sgmii_irq"   : EMAC SGMII interrupt.
>> +- phy-addr            : Specifies phy address on MDIO bus.
>> +			Required if the optional property "qcom,no-external-phy"
>> +			is not specified.
>
> As I mentioned in the last version, you should still describe this with
> a standard MDIO bus binding even if you can't use the generic code.

You mean like this?

	phy0: ethernet-phy@0 {
		compatible = "qcom,fsm9900-emac-phy";
		reg = <4>;
	};

>> +Optional properties:
>> +- qcom,emac-tstamp-en       : Enables the PTP (1588) timestamping feature.
>> +			      Include this only if PTP (1588) timestamping
>> +			      feature is needed. If included, "ptp" register
>> +			      base should be specified.
>> +- mac-address               : The 6-byte MAC address. If present, it is the
>> +			      default MAC address.
>> +- qcom,no-external-phy      : Indicates there is no external PHY connected to
>> +			      EMAC. Include this only if the EMAC is directly
>> +			      connected to the peer end without EPHY.
>> +Example:
>> +	emac0: qcom,emac@feb20000 {
>
> ethernet@
>
>> +		compatible = "qcom,fsm9900-emac";
>
> Ah, I see you fixed it here...

and in the code, I just missed it at the top of the file.  I'll fix it
everywhere in v5.

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation collaborative project.


* Re: [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible
From: Marcelo Ricardo Leitner @ 2016-04-14 17:00 UTC (permalink / raw)
  To: Neil Horman, David Miller
  Cc: netdev, vyasevich, linux-sctp, David.Laight, jkbs
In-Reply-To: <20160414130324.GA6806@hmsreliant.think-freely.org>

Em 14-04-2016 10:03, Neil Horman escreveu:
> On Wed, Apr 13, 2016 at 11:05:32PM -0400, David Miller wrote:
>> From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>> Date: Fri,  8 Apr 2016 16:41:26 -0300
>>
>>> 1st patch is a preparation for the 2nd. The idea is to not call
>>> ->sk_data_ready() for every data chunk processed while processing
>>> packets but only once before releasing the socket.
>>>
>>> v2: patchset re-checked, small changelog fixes
>>> v3: on patch 2, make use of local vars to make it more readable
>>
>> Applied to net-next, but isn't this reduced overhead coming at the
>> expense of latency?  What if that lower latency is important to the
>> application and/or consumer?
> That's a fair point, but I'd make the counter argument that, as it currently
> stands, any latency introduced (or removed) is an artifact of our
> implementation rather than a designed feature of it.  That is to say, we make no
> guarantees at the application level regarding how long it takes to signal data
> readiness from the time we get data off the wire, so I would rather see our
> throughput raised if we can, as that's been sctp's more pressing Achilles heel.
>
>
> That's not to say I wouldn't like to enable lower latency, but I'd rather have
> this now, and start pondering how to design that in.  Perhaps we can convert the
> pending flag to a counter to count the number of events we enqueue, and call
> sk_data_ready every time we reach a sysctl-defined threshold.

That, and also that there is no chance of the application reading the
first chunks before all current ToDo's are performed by either the bh or
backlog handlers for that packet. The socket lock won't be cycled in between
chunks, so the application is going to wait for all the processing one way or
another.

Thanks,
Marcelo
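
A minimal userspace sketch of the counter idea quoted above: count queued events, signal readiness each time a threshold is reached, and flush once before the socket would be released. Everything here (struct demo_sock, the demo_* functions, the threshold field) is hypothetical and only models the bookkeeping; it is not the sctp code or the applied patch.

```c
#include <assert.h>

/* Hypothetical stand-in for a socket; not the kernel's struct sock. */
struct demo_sock {
	unsigned int pending;   /* events queued since the last wakeup */
	unsigned int threshold; /* wake the reader every N events (assumed tunable) */
	unsigned int wakeups;   /* how many times readiness was signalled */
};

/* Stand-in for ->sk_data_ready(): only counts the call. */
void demo_data_ready(struct demo_sock *sk)
{
	sk->wakeups++;
	sk->pending = 0;
}

/* Called once per data chunk queued to the socket. */
void demo_enqueue_chunk(struct demo_sock *sk)
{
	assert(sk->threshold > 0);
	if (++sk->pending >= sk->threshold)
		demo_data_ready(sk);
}

/* Called once before the socket lock would be released. */
void demo_flush(struct demo_sock *sk)
{
	if (sk->pending)
		demo_data_ready(sk);
}

/* Process n chunks of one packet and return the number of wakeups. */
unsigned int demo_process_packet(unsigned int n, unsigned int threshold)
{
	struct demo_sock sk = { .pending = 0, .threshold = threshold, .wakeups = 0 };

	for (unsigned int i = 0; i < n; i++)
		demo_enqueue_chunk(&sk);
	demo_flush(&sk);
	return sk.wakeups;
}
```

With a threshold of 1 this degenerates to one wakeup per chunk; with a threshold larger than the packet it degenerates to a single wakeup at the end, which is the trade-off being discussed in this thread.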


* Re: [PATCH 1/2] [v4] net: emac: emac gigabit ethernet controller driver
From: Rob Herring @ 2016-04-14 17:18 UTC (permalink / raw)
  To: Timur Tabi
  Cc: netdev, linux-kernel@vger.kernel.org, devicetree@vger.kernel.org,
	linux-arm-msm, Sagar Dharia, Shanker Donthineni,
	Greg Kroah-Hartman, vikrams, Christopher Covington, Gilad Avidov,
	Andrew Lunn, Bjorn Andersson, Mark Langsdorf, Jon Masters,
	Andy Gross, David S. Miller
In-Reply-To: <570FC987.80304@codeaurora.org>

On Thu, Apr 14, 2016 at 11:47 AM, Timur Tabi <timur@codeaurora.org> wrote:
> Rob Herring wrote:
>
>>> @@ -0,0 +1,65 @@
>>> +Qualcomm EMAC Gigabit Ethernet Controller
>>> +
>>> +Required properties:
>>> +- compatible : Should be "qcom,emac".
>>
>>
>> Come on... Can you guess what I'm going to say here.
>
>
> Ooops, I missed that one.
>
>>
>>> +- reg : Offset and length of the register regions for the device
>>> +- reg-names : Register region names referenced in 'reg' above.
>>> +       Required register resource entries are:
>>> +       "base"   : EMAC controller base register block.
>>> +       "csr"    : EMAC wrapper register block.
>>> +       Optional register resource entries are:
>>> +       "ptp"    : EMAC PTP (1588) register block.
>>> +                  Required if 'qcom,emac-tstamp-en' is present.
>>> +       "sgmii"  : EMAC SGMII PHY register block.
>>> +- interrupts : Interrupt numbers used by this controller
>>> +- interrupt-names : Interrupt resource names referenced in 'interrupts'
>>> above.
>>> +       Required interrupt resource entries are:
>>> +       "emac_core0"   : EMAC core0 interrupt.
>>> +       "sgmii_irq"   : EMAC SGMII interrupt.
>>> +- phy-addr            : Specifies phy address on MDIO bus.
>>> +                       Required if the optional property
>>> "qcom,no-external-phy"
>>> +                       is not specified.
>>
>>
>> As I mentioned in the last version, you should still describe this with
>> a standard MDIO bus binding even if you can't use the generic code.
>
>
> You mean like this?
>
>         phy0: ethernet-phy@0 {
>                 compatible = "qcom,fsm9900-emac-phy";
>                 reg = <4>;

Yes, but the unit address and reg must match: you mean either 0 here or 4 for
the unit address.


* [net][PATCH v2 1/2] RDS: fix endianness for dp_ack_seq
From: Santosh Shilimkar @ 2016-04-14 17:43 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel, Santosh Shilimkar
In-Reply-To: <1460655807-26236-1-git-send-email-santosh.shilimkar@oracle.com>

From: Qing Huang <qing.huang@oracle.com>

dp->dp_ack_seq is used in big endian format. We need to do the
big endianness conversion when we assign a value in host format
to it.

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/ib_cm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index 8764970..310cabc 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -194,7 +194,7 @@ static void rds_ib_cm_fill_conn_param(struct rds_connection *conn,
 		dp->dp_protocol_major = RDS_PROTOCOL_MAJOR(protocol_version);
 		dp->dp_protocol_minor = RDS_PROTOCOL_MINOR(protocol_version);
 		dp->dp_protocol_minor_mask = cpu_to_be16(RDS_IB_SUPPORTED_PROTOCOLS);
-		dp->dp_ack_seq = rds_ib_piggyb_ack(ic);
+		dp->dp_ack_seq = cpu_to_be64(rds_ib_piggyb_ack(ic));
 
 		/* Advertise flow control */
 		if (ic->i_flowctl) {
-- 
1.9.1
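
For readers unfamiliar with the conversion in the patch above, here is a small userspace sketch of what storing a 64-bit value in big-endian order means. The demo_* helpers are hypothetical and written by hand for illustration; the kernel patch itself uses cpu_to_be64().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a host-order 64-bit value as 8 big-endian (network order) bytes. */
void demo_put_be64(uint8_t out[8], uint64_t host)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(host >> (56 - 8 * i));
}

/* Read 8 big-endian bytes back into a host-order value. */
uint64_t demo_get_be64(const uint8_t in[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | in[i];
	return v;
}

/*
 * What a plain assignment effectively does: the bytes land in CPU order,
 * which matches big-endian only on a big-endian host.
 */
void demo_put_raw(uint8_t out[8], uint64_t host)
{
	memcpy(out, &host, sizeof(host));
}
```

On a little-endian machine, demo_put_raw() followed by demo_get_be64() returns a byte-swapped value, which is the class of bug the patch addresses.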


* [net][PATCH v2 2/2] RDS: Fix the atomicity for congestion map update
From: Santosh Shilimkar @ 2016-04-14 17:43 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel, Santosh Shilimkar
In-Reply-To: <1460655807-26236-1-git-send-email-santosh.shilimkar@oracle.com>

Two different threads with different rds sockets may be in
rds_recv_rcvbuf_delta() via receive path. If their ports
both map to the same word in the congestion map, then
using non-atomic ops to update it could cause the map to
be incorrect. Let's use atomics to avoid such an issue.

Full credit to Wengang <wen.gang.wang@oracle.com> for
finding the issue, analysing it and also pointing out
the offending code with a spin-lock-based fix.

Reviewed-by: Leon Romanovsky <leon@leon.nu>
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/cong.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/rds/cong.c b/net/rds/cong.c
index e6144b8..6641bcf 100644
--- a/net/rds/cong.c
+++ b/net/rds/cong.c
@@ -299,7 +299,7 @@ void rds_cong_set_bit(struct rds_cong_map *map, __be16 port)
 	i = be16_to_cpu(port) / RDS_CONG_MAP_PAGE_BITS;
 	off = be16_to_cpu(port) % RDS_CONG_MAP_PAGE_BITS;
 
-	__set_bit_le(off, (void *)map->m_page_addrs[i]);
+	set_bit_le(off, (void *)map->m_page_addrs[i]);
 }
 
 void rds_cong_clear_bit(struct rds_cong_map *map, __be16 port)
@@ -313,7 +313,7 @@ void rds_cong_clear_bit(struct rds_cong_map *map, __be16 port)
 	i = be16_to_cpu(port) / RDS_CONG_MAP_PAGE_BITS;
 	off = be16_to_cpu(port) % RDS_CONG_MAP_PAGE_BITS;
 
-	__clear_bit_le(off, (void *)map->m_page_addrs[i]);
+	clear_bit_le(off, (void *)map->m_page_addrs[i]);
 }
 
 static int rds_cong_test_bit(struct rds_cong_map *map, __be16 port)
-- 
1.9.1
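
A userspace sketch of the idea behind the patch above, using C11 atomics: the read-modify-write on a shared word is done as one atomic operation, so two updaters touching different bits of the same word cannot lose each other's update. The demo_* names are hypothetical, and the sketch uses native bit numbering rather than the little-endian numbering of set_bit_le().

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit `off` in a bitmap made of atomic words. */
void demo_set_bit(atomic_ulong *map, unsigned int off)
{
	atomic_fetch_or(&map[off / DEMO_BITS_PER_WORD],
			1UL << (off % DEMO_BITS_PER_WORD));
}

/* Atomically clear bit `off`. */
void demo_clear_bit(atomic_ulong *map, unsigned int off)
{
	atomic_fetch_and(&map[off / DEMO_BITS_PER_WORD],
			 ~(1UL << (off % DEMO_BITS_PER_WORD)));
}

/* Read bit `off`. */
bool demo_test_bit(atomic_ulong *map, unsigned int off)
{
	unsigned long word = atomic_load(&map[off / DEMO_BITS_PER_WORD]);

	return (word >> (off % DEMO_BITS_PER_WORD)) & 1UL;
}
```

A non-atomic version would load the word, modify it and store it back as three separate steps, and a concurrent update to a neighbouring bit could be overwritten between the load and the store.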


* [net][PATCH v2 0/2] RDS: couple of fixes for 4.6
From: Santosh Shilimkar @ 2016-04-14 17:43 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel

v2:
Rebased fixes against 'net' instead of 'net-next'. Patches are also
available at the git tree below.

The following changes since commit e013b7780c41b471c4269ac9ccafb65ba7c9ec86:

  Merge branch 'dsa-voidify-ops' (2016-04-08 16:51:15 -0400)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux.git for_4.6/net/rds-fixes

for you to fetch changes up to e9155afb1902380938ca83ba8504aaa2d7ee5210:

  RDS: Fix the atomicity for congestion map update (2016-04-08 15:08:13 -0700)

----------------------------------------------------------------
Qing Huang (1):
      RDS: fix endianness for dp_ack_seq

Santosh Shilimkar (1):
      RDS: Fix the atomicity for congestion map update

 net/rds/cong.c  | 4 ++--
 net/rds/ib_cm.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

Regards,
Santosh


* Re: [net-next][PATCH 0/2] RDS: couple of fixes for 4.6
From: santosh.shilimkar @ 2016-04-14 17:44 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel
In-Reply-To: <20160413.233637.2047659923049596108.davem@davemloft.net>

On 4/13/16 8:36 PM, David Miller wrote:
> From: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> Date: Fri,  8 Apr 2016 15:26:38 -0700
>
>> Patches are also available at below git tree.
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux.git for_4.6/net-next/rds-fixes
>
> "Bug fixes for 4.6" do not get targeted at the net-next tree, that's
> for 4.7 development.
>
Sorry. Should have based it against 'net'. Just posted re-based version.
Thanks !!

Regards,
Santosh


* Re: Deleting child qdisc doesn't reset parent to default qdisc?
From: Eric Dumazet @ 2016-04-14 17:49 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Phil Sutter, Jamal Hadi Salim, netdev, linux-kernel
In-Reply-To: <alpine.LNX.2.00.1604141807350.27368@cbobk.fhfr.pm>

On Thu, 2016-04-14 at 18:08 +0200, Jiri Kosina wrote:
> On Thu, 14 Apr 2016, Phil Sutter wrote:
> 
> > > > I've come across the behavior where adding a child qdisc and then deleting
> > > > it again makes the networking dysfunctional (I guess that's because all of
> > > > a sudden there is absolutely no working qdisc on the device, although
> > > > there originally was a default one in the parent).
> > > > 
> > > > In a nutshell, is this expected behavior or bug?
> > > 
> > > This is the expected behavior.
> > 
> > OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> > one upon deletion instead of noop_qdisc, hence I would describe
> > the situation using the words 'inconsistent' and 'accident' rather than
> > 'expected'. :)
> 
> Would a patch that'd unify this, in the sense that all qdiscs would assign
> the default one upon deletion, be acceptable?
> 

And what would be the chosen behavior ?

Relying on TBF installing a bfifo for you at delete would be hazardous.

For example, CBQ handles it differently than HFSC.

If qdisc_create_dflt() fails in CBQ, we fail the 'delete', while HFSC
falls back to noop_qdisc, without warning the user :(

At least always using noop_qdisc is consistent. No magic there.

Doing 'unification' right now would break existing scripts.

This is too late, I am afraid.
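
To make the three behaviors discussed above concrete, here is a toy model in C. It is only a sketch: the demo_* types and functions are hypothetical, the create_ok flag stands in for whether creating a default qdisc succeeds, and the error code is chosen for illustration rather than taken from any qdisc's source.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum demo_leaf { DEMO_LEAF_USER, DEMO_LEAF_DEFAULT, DEMO_LEAF_NOOP };

struct demo_class {
	enum demo_leaf leaf;
};

/* "Fail the delete" policy: keep the old leaf and report an error. */
int demo_delete_strict(struct demo_class *cl, bool create_ok)
{
	if (!create_ok)
		return -ENOBUFS; /* illustrative error code */
	cl->leaf = DEMO_LEAF_DEFAULT;
	return 0;
}

/* "Silent fallback" policy: succeed either way, possibly ending up with noop. */
int demo_delete_fallback(struct demo_class *cl, bool create_ok)
{
	cl->leaf = create_ok ? DEMO_LEAF_DEFAULT : DEMO_LEAF_NOOP;
	return 0;
}

/* "Always noop" policy: no allocation, so no failure path to disagree on. */
int demo_delete_noop(struct demo_class *cl)
{
	cl->leaf = DEMO_LEAF_NOOP;
	return 0;
}
```

The first two policies only differ when the allocation fails, which is exactly the case a caller is least likely to test; the third has a single outcome.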


* [PATCH pci] pci: Add helper function to set VPD size
From: Hariprasad Shenai @ 2016-04-14 18:12 UTC (permalink / raw)
  To: bhelgaas, davem
  Cc: linux-pci, netdev, leedom, swise, nirranjan, santosh,
	Hariprasad Shenai

commit 104daa71b396 ("PCI: Determine actual VPD size on first access")
introduced a regression in the cxgb4 driver, which now fails in PCI probe.

The problem stems from the fact that the Chelsio adapters actually
have two VPD structures stored in the VPD: an abbreviated one at Offset 0x0
and the complete VPD at Offset 0x400. The abbreviated one only contains
the PN, SN and EC Keywords, while the complete VPD contains those plus
various adapter constants contained in V0, V1, etc. It also contains
the Base Ethernet MAC Address in the "NA" Keyword, which the cxgb4 driver
needs when it can't contact the adapter firmware. (We don't have the "NA"
Keyword in the VPD Structure at Offset 0x0 because that's not an allowed
VPD Keyword in the PCI-E 3.0 specification.)

With the new code, the computed size of the VPD is 0x200, so our efforts
to read the VPD at Offset 0x400 silently fail. We check the result of the
read looking for a signature 0x82 byte, but we're checking against random
stack garbage.

The fix is to add a PCI helper function to set the VPD size, so the
driver can explicitly set the exact size of the VPD.

Fixes: 104daa71b396 ("PCI: Determine actual VPD size on first access")

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 10 +++++++
 drivers/pci/access.c                       | 42 ++++++++++++++++++++++++++++++
 drivers/pci/pci.h                          |  1 +
 include/linux/pci.h                        |  1 +
 4 files changed, 54 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index cc1736bece0f..2033159e26a5 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -2557,6 +2557,7 @@ void t4_get_regs(struct adapter *adap, void *buf, size_t buf_size)
 }
 
 #define EEPROM_STAT_ADDR   0x7bfc
+#define VPD_SIZE           0x800
 #define VPD_BASE           0x400
 #define VPD_BASE_OLD       0
 #define VPD_LEN            1024
@@ -2594,6 +2595,15 @@ int t4_get_raw_vpd_params(struct adapter *adapter, struct vpd_params *p)
 	if (!vpd)
 		return -ENOMEM;
 
+	/* We have two VPD data structures stored in the adapter VPD area.
+	 * By default, Linux calculates the size of the VPD area by traversing
+	 * the first VPD area at offset 0x0, so we need to tell the OS what
+	 * our real VPD size is.
+	 */
+	ret = pci_set_size_vpd(adapter->pdev, VPD_SIZE);
+	if (ret < 0)
+		goto out;
+
 	/* Card information normally starts at VPD_BASE but early cards had
 	 * it at 0.
 	 */
diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 01b9d0a00abc..e69b3877bd37 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -275,6 +275,19 @@ ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void
 }
 EXPORT_SYMBOL(pci_write_vpd);
 
+/**
+ * pci_set_size_vpd - Set size of Vital Product Data space
+ * @dev:	pci device struct
+ * @len:	size of vpd space
+ */
+ssize_t pci_set_size_vpd(struct pci_dev *dev, size_t len)
+{
+	if (!dev->vpd || !dev->vpd->ops)
+		return -ENODEV;
+	return dev->vpd->ops->set_size(dev, len);
+}
+EXPORT_SYMBOL(pci_set_size_vpd);
+
 #define PCI_VPD_MAX_SIZE (PCI_VPD_ADDR_MASK + 1)
 
 /**
@@ -498,9 +511,23 @@ out:
 	return ret ? ret : count;
 }
 
+static ssize_t pci_vpd_set_size(struct pci_dev *dev, size_t len)
+{
+	struct pci_vpd *vpd = dev->vpd;
+
+	if (len == 0 || len > PCI_VPD_MAX_SIZE)
+		return -EIO;
+
+	vpd->valid = 1;
+	vpd->len = len;
+
+	return 0;
+}
+
 static const struct pci_vpd_ops pci_vpd_ops = {
 	.read = pci_vpd_read,
 	.write = pci_vpd_write,
+	.set_size = pci_vpd_set_size,
 };
 
 static ssize_t pci_vpd_f0_read(struct pci_dev *dev, loff_t pos, size_t count,
@@ -533,9 +560,24 @@ static ssize_t pci_vpd_f0_write(struct pci_dev *dev, loff_t pos, size_t count,
 	return ret;
 }
 
+static ssize_t pci_vpd_f0_set_size(struct pci_dev *dev, size_t len)
+{
+	struct pci_dev *tdev = pci_get_slot(dev->bus,
+					    PCI_DEVFN(PCI_SLOT(dev->devfn), 0));
+	ssize_t ret;
+
+	if (!tdev)
+		return -ENODEV;
+
+	ret = pci_set_size_vpd(tdev, len);
+	pci_dev_put(tdev);
+	return ret;
+}
+
 static const struct pci_vpd_ops pci_vpd_f0_ops = {
 	.read = pci_vpd_f0_read,
 	.write = pci_vpd_f0_write,
+	.set_size = pci_vpd_f0_set_size,
 };
 
 int pci_vpd_init(struct pci_dev *dev)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d0fb93481573..8239d186f1ed 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -97,6 +97,7 @@ static inline bool pci_has_subordinate(struct pci_dev *pci_dev)
 struct pci_vpd_ops {
 	ssize_t (*read)(struct pci_dev *dev, loff_t pos, size_t count, void *buf);
 	ssize_t (*write)(struct pci_dev *dev, loff_t pos, size_t count, const void *buf);
+	ssize_t (*set_size)(struct pci_dev *dev, size_t len);
 };
 
 struct pci_vpd {
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 004b8133417d..1ab1b7458a8b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1111,6 +1111,7 @@ void pci_unlock_rescan_remove(void);
 /* Vital product data routines */
 ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf);
 ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void *buf);
+ssize_t pci_set_size_vpd(struct pci_dev *dev, size_t len);
 
 /* Helper functions for low-level code (drivers/pci/setup-[bus,res].c) */
 resource_size_t pcibios_retrieve_fw_addr(struct pci_dev *dev, int idx);
-- 
2.3.4
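
The following userspace sketch illustrates why a size computed by walking tags from offset 0 stops short of a second structure stored further in. It is a simplified model written for this note, not the kernel's implementation: demo_vpd_size() is a hypothetical helper that follows large and small resource tags until the first End tag (0x78).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_VPD_END_TAG 0x78 /* small resource "End" tag with zero length */

/*
 * Walk resource tags from offset 0 and return the number of bytes up to
 * and including the first End tag, or 0 if no End tag is found in `max`.
 */
size_t demo_vpd_size(const uint8_t *buf, size_t max)
{
	size_t off = 0;

	while (off < max) {
		uint8_t tag = buf[off];

		if (tag & 0x80) {
			/* Large resource: tag byte plus 2-byte little-endian length. */
			if (off + 3 > max)
				return 0;
			off += 3 + ((size_t)buf[off + 1] |
				    ((size_t)buf[off + 2] << 8));
		} else {
			/* Small resource: length lives in the low 3 bits. */
			if (tag == DEMO_VPD_END_TAG)
				return off + 1;
			off += 1 + (size_t)(tag & 0x07);
		}
	}
	return 0;
}
```

Anything placed after the first End tag, such as a second structure at 0x400, lies beyond the size this walk reports, which is why the driver needs a way to state the real size.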


* [PATCH iproute2] ip: neigh: Fix leftover attributes message during flush
From: Jeff Harris @ 2016-04-14 18:15 UTC (permalink / raw)
  To: netdev; +Cc: Jeff Harris

Use the same rtnl_dump_request_n call as the show path.  rtnl_wilddump_request
assumes the type uses an ifinfomsg, which is not the case for the neighbor
table.

Signed-off-by: Jeff Harris <jefftharris@gmail.com>
---
 ip/ipneigh.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/ip/ipneigh.c b/ip/ipneigh.c
index c49fb4e..4ddb747 100644
--- a/ip/ipneigh.c
+++ b/ip/ipneigh.c
@@ -430,6 +430,8 @@ static int do_show_or_flush(int argc, char **argv, int flush)
 		addattr32(&req.n, sizeof(req), NDA_IFINDEX, filter.index);
 	}
 
+	req.ndm.ndm_family = filter.family;
+
 	if (flush) {
 		int round = 0;
 		char flushb[4096-512];
@@ -440,7 +442,7 @@ static int do_show_or_flush(int argc, char **argv, int flush)
 		filter.state &= ~NUD_FAILED;
 
 		while (round < MAX_ROUNDS) {
-			if (rtnl_wilddump_request(&rth, filter.family, RTM_GETNEIGH) < 0) {
+			if (rtnl_dump_request_n(&rth, &req.n) < 0) {
 				perror("Cannot send dump request");
 				exit(1);
 			}
@@ -472,8 +474,6 @@ static int do_show_or_flush(int argc, char **argv, int flush)
 		return 1;
 	}
 
-	req.ndm.ndm_family = filter.family;
-
 	if (rtnl_dump_request_n(&rth, &req.n) < 0) {
 		perror("Cannot send dump request");
 		exit(1);
-- 
1.7.9.5
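
A small sketch of the header mismatch described above. The two structs are local mirrors of the link and neighbor netlink family headers, declared here only so the example is self-contained; how the receiver treats the surplus bytes is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the link family header (struct ifinfomsg). */
struct demo_ifinfomsg {
	uint8_t  ifi_family;
	uint8_t  ifi_pad;
	uint16_t ifi_type;
	int32_t  ifi_index;
	uint32_t ifi_flags;
	uint32_t ifi_change;
};

/* Local mirror of the neighbor family header (struct ndmsg). */
struct demo_ndmsg {
	uint8_t  ndm_family;
	uint8_t  ndm_pad1;
	uint16_t ndm_pad2;
	int32_t  ndm_ifindex;
	uint16_t ndm_state;
	uint8_t  ndm_flags;
	uint8_t  ndm_type;
};

/* Bytes a link-style header carries beyond what a neighbor header occupies. */
size_t demo_header_mismatch(void)
{
	return sizeof(struct demo_ifinfomsg) - sizeof(struct demo_ndmsg);
}
```

A request built around the larger header therefore carries bytes that a parser expecting the neighbor header does not account for as header data.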


* RE: [PATCH pci] pci: Add helper function to set VPD size
From: Steve Wise @ 2016-04-14 18:35 UTC (permalink / raw)
  To: 'Hariprasad Shenai', bhelgaas, davem
  Cc: linux-pci, netdev, leedom, nirranjan, santosh
In-Reply-To: <1460657525-17551-1-git-send-email-hariprasad@chelsio.com>

> The fix is to add a PCI helper function to set the VPD size, so the
> driver can explicitly set the exact size of the VPD.
> 
> Fixes commit 104daa71b396 ("PCI: Determine actual VPD size on first access")
> 
> Signed-off-by: Casey Leedom <leedom@chelsio.com>
> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>

Looks good!
  
Tested-by: Steve Wise <swise@opengridcomputing.com>


* [PATCH net 0/3] net: dsa: mv88e6xxx: fix hardware cross-chip bridging
From: Vivien Didelot @ 2016-04-14 18:42 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot

In order to accelerate cross-chip switching of frames with the hardware,
the DSA Tag ports, used to interconnect switch devices, must learn SA
and DA addresses, and share the same FDB with the user ports.

The first two patches restore address learning on DSA links. This fixes
hardware cross-chip bridging in a VLAN filtering enabled system, which
implements a bridge group as an 802.1Q VLAN and thus shares an isolated
address database between DSA and user ports.

The third patch changes the distinct default databases used for each
port, to the same address database. This fixes the hardware cross-chip
bridging in a VLAN filtering disabled system, where a bridge group gets
implemented only as a port-based VLAN.

Vivien Didelot (3):
  net: dsa: mv88e6xxx: unlock DSA and CPU ports
  net: dsa: mv88e6xxx: enable SA learning on DSA ports
  net: dsa: mv88e6xxx: share a default FDB

 drivers/net/dsa/mv88e6xxx.c | 34 +++++-----------------------------
 1 file changed, 5 insertions(+), 29 deletions(-)

-- 
2.8.0


* [PATCH net 1/3] net: dsa: mv88e6xxx: unlock DSA and CPU ports
From: Vivien Didelot @ 2016-04-14 18:42 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <1460659329-11473-1-git-send-email-vivien.didelot@savoirfairelinux.com>

Locking a port generates a hardware interrupt when a new SA address is
received. This enables CPU directed learning, which is needed for 802.1X
MAC authentication.

To disable automatic learning on a port, the only configuration needed
is to set its Port Association Vector to all zero.

Clear PAV when SA learning should be disabled instead of locking a port.

Fixes: 4c7ea3c0791e ("net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 drivers/net/dsa/mv88e6xxx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 9985a0c..7725e29 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2544,7 +2544,7 @@ static int mv88e6xxx_setup_port(struct dsa_switch *ds, int port)
 	reg = 1 << port;
 	/* Disable learning for DSA and CPU ports */
 	if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
-		reg = PORT_ASSOC_VECTOR_LOCKED_PORT;
+		reg = 0;
 
 	ret = _mv88e6xxx_reg_write(ds, REG_PORT(port), PORT_ASSOC_VECTOR, reg);
 	if (ret)
-- 
2.8.0


* [PATCH net 2/3] net: dsa: mv88e6xxx: enable SA learning on DSA ports
From: Vivien Didelot @ 2016-04-14 18:42 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <1460659329-11473-1-git-send-email-vivien.didelot@savoirfairelinux.com>

In multi-chip systems, DSA Tag ports must learn SA addresses in order to
correctly switch frames between interconnected chips.

This fixes cross-chip hardware bridging in a VLAN filtering aware
system, because a bridge group gets implemented as a hardware 802.1Q
VLAN and thus DSA and user ports share the same FDB.

Fixes: 4c7ea3c0791e ("net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 drivers/net/dsa/mv88e6xxx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 7725e29..46c9b4c 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2542,8 +2542,8 @@ static int mv88e6xxx_setup_port(struct dsa_switch *ds, int port)
 	 * the other bits clear.
 	 */
 	reg = 1 << port;
-	/* Disable learning for DSA and CPU ports */
-	if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
+	/* Disable learning for CPU port */
+	if (dsa_is_cpu_port(ds, port))
 		reg = 0;
 
 	ret = _mv88e6xxx_reg_write(ds, REG_PORT(port), PORT_ASSOC_VECTOR, reg);
-- 
2.8.0
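
The resulting register value after patches 1 and 2 can be summarized with a short sketch: a port learns source addresses against its own bit, and a CPU port gets an all-zero vector so it learns nothing. demo_port_assoc_vector() is a hypothetical helper for illustration; the 16-port bound is an assumption of the sketch, not a property taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the Port Association Vector written for a port:
 * all zero disables automatic learning, otherwise only the port's own bit
 * is set and the other bits stay clear.
 */
uint16_t demo_port_assoc_vector(unsigned int port, bool is_cpu_port)
{
	assert(port < 16); /* must fit in the 16-bit value used by this sketch */

	if (is_cpu_port)
		return 0;
	return (uint16_t)(1u << port);
}
```

For example, user or DSA port 3 yields 0x0008, while a CPU port yields 0 regardless of its number.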


* [PATCH net 3/3] net: dsa: mv88e6xxx: share the same default FDB
From: Vivien Didelot @ 2016-04-14 18:42 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <1460659329-11473-1-git-send-email-vivien.didelot@savoirfairelinux.com>

For hardware cross-chip bridging to work, user ports *and* DSA ports
need to share a common address database, in order to switch a frame to
the correct interconnected device.

This is currently working for VLAN filtering aware systems, since Linux
will implement a bridge group as an 802.1Q VLAN, which has its own FDB,
including DSA and CPU links as members.

However when the system doesn't support VLAN filtering, Linux only
relies on the port-based VLAN to implement a bridge group.

To fix hardware cross-chip bridging for such systems, set the same
default address database 0 for user and DSA ports, instead of giving
them all a different default database.

Note that the bridging code prevents frames from egressing between unbridged
ports, and flushes FDB entries of a port when changing its STP state.

Also note that the FID 0 is special and means "all" for ATU operations,
but it's OK since it is used as a default forwarding address database.

Fixes: 2db9ce1fd9a3 ("net: dsa: mv88e6xxx: assign default FDB to ports")
Fixes: 466dfa077022 ("net: dsa: mv88e6xxx: assign dynamic FDB to bridges")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 drivers/net/dsa/mv88e6xxx.c | 28 ++--------------------------
 1 file changed, 2 insertions(+), 26 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 46c9b4c..b76f870 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2246,27 +2246,10 @@ int mv88e6xxx_port_bridge_join(struct dsa_switch *ds, int port,
 			       struct net_device *bridge)
 {
 	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
-	u16 fid;
 	int i, err;
 
 	mutex_lock(&ps->smi_mutex);
 
-	/* Get or create the bridge FID and assign it to the port */
-	for (i = 0; i < ps->num_ports; ++i)
-		if (ps->ports[i].bridge_dev == bridge)
-			break;
-
-	if (i < ps->num_ports)
-		err = _mv88e6xxx_port_fid_get(ds, i, &fid);
-	else
-		err = _mv88e6xxx_fid_new(ds, &fid);
-	if (err)
-		goto unlock;
-
-	err = _mv88e6xxx_port_fid_set(ds, port, fid);
-	if (err)
-		goto unlock;
-
 	/* Assign the bridge and remap each port's VLANTable */
 	ps->ports[port].bridge_dev = bridge;
 
@@ -2278,7 +2261,6 @@ int mv88e6xxx_port_bridge_join(struct dsa_switch *ds, int port,
 		}
 	}
 
-unlock:
 	mutex_unlock(&ps->smi_mutex);
 
 	return err;
@@ -2288,16 +2270,10 @@ void mv88e6xxx_port_bridge_leave(struct dsa_switch *ds, int port)
 {
 	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
 	struct net_device *bridge = ps->ports[port].bridge_dev;
-	u16 fid;
 	int i;
 
 	mutex_lock(&ps->smi_mutex);
 
-	/* Give the port a fresh Filtering Information Database */
-	if (_mv88e6xxx_fid_new(ds, &fid) ||
-	    _mv88e6xxx_port_fid_set(ds, port, fid))
-		netdev_warn(ds->ports[port], "failed to assign a new FID\n");
-
 	/* Unassign the bridge and remap each port's VLANTable */
 	ps->ports[port].bridge_dev = NULL;
 
@@ -2624,11 +2600,11 @@ static int mv88e6xxx_setup_port(struct dsa_switch *ds, int port)
 	if (ret)
 		goto abort;
 
-	/* Port based VLAN map: give each port its own address
+	/* Port based VLAN map: give each port the same default address
 	 * database, and allow bidirectional communication between the
 	 * CPU and DSA port(s), and the other ports.
 	 */
-	ret = _mv88e6xxx_port_fid_set(ds, port, port + 1);
+	ret = _mv88e6xxx_port_fid_set(ds, port, 0);
 	if (ret)
 		goto abort;
 
-- 
2.8.0


* Re: [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible
From: David Miller @ 2016-04-14 18:59 UTC (permalink / raw)
  To: marcelo.leitner
  Cc: nhorman, netdev, vyasevich, linux-sctp, David.Laight, jkbs
In-Reply-To: <570FCCC1.6090504@gmail.com>

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Thu, 14 Apr 2016 14:00:49 -0300

> Em 14-04-2016 10:03, Neil Horman escreveu:
>> On Wed, Apr 13, 2016 at 11:05:32PM -0400, David Miller wrote:
>>> From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>> Date: Fri,  8 Apr 2016 16:41:26 -0300
>>>
>>>> 1st patch is a preparation for the 2nd. The idea is to not call
>>>> ->sk_data_ready() for every data chunk processed while processing
>>>> packets but only once before releasing the socket.
>>>>
>>>> v2: patchset re-checked, small changelog fixes
>>>> v3: on patch 2, make use of local vars to make it more readable
>>>
>>> Applied to net-next, but isn't this reduced overhead coming at the
>>> expense of latency?  What if that lower latency is important to the
>>> application and/or consumer?
>> That's a fair point, but I'd make the counter argument that, as it currently
>> stands, any latency introduced (or removed) is an artifact of our
>> implementation rather than a designed feature of it.  That is to say, we make
>> no guarantees at the application level regarding how long it takes to signal
>> data readiness from the time we get data off the wire, so I would rather see
>> our throughput raised if we can, as that's been sctp's more pressing
>> Achilles heel.
>>
>>
>> That's not to say I wouldn't like to enable lower latency, but I'd rather
>> have this now, and start pondering how to design that in.  Perhaps we can
>> convert the pending flag to a counter to count the number of events we
>> enqueue, and call sk_data_ready every time we reach a sysctl-defined
>> threshold.
> 
> That and also that there is no chance of the application reading the
> first chunks before all current ToDo's are performed by either the bh
> or backlog handlers for that packet. Socket lock won't be cycled in
> between chunks so the application is going to wait for all the processing
> one way or another.

But it takes time to signal the wakeup to the remote cpu the process
was running on, schedule out the current process on that cpu (if it
has in fact lost its timeslice), and then finally look at the socket
queue.

Of course this is all assuming the process was sleeping in the first
place, either in recv or more likely poll.

I really think signalling early helps performance.
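
The counter idea quoted above (signal the reader every N queued events
instead of once per chunk or once per packet) can be sketched in plain
user-space C. This is only an illustration under stated assumptions:
struct rx_batch, signal_reader(), chunk_queued(), packet_done() and the
threshold field are hypothetical names, not part of the SCTP code or of
any existing sysctl.

```c
/* Hypothetical stand-in for per-socket state; not the real struct sctp_sock. */
struct rx_batch {
	unsigned int pending;    /* events queued since the last wakeup */
	unsigned int threshold;  /* hypothetical sysctl-style limit, must be >= 1 */
	unsigned int wakeups;    /* how many times the reader was signalled */
};

/* Stand-in for sk->sk_data_ready(sk): here it only counts wakeups. */
static void signal_reader(struct rx_batch *b)
{
	b->wakeups++;
	b->pending = 0;
}

/* Called once per data chunk queued to the socket. */
static void chunk_queued(struct rx_batch *b)
{
	if (++b->pending >= b->threshold)
		signal_reader(b);
}

/* Called once when packet processing ends, before the socket is released. */
static void packet_done(struct rx_batch *b)
{
	if (b->pending)
		signal_reader(b);
}

/* Feed all chunks of one packet and return the total number of wakeups. */
static unsigned int process_packet(struct rx_batch *b, unsigned int nchunks)
{
	unsigned int i;

	for (i = 0; i < nchunks; i++)
		chunk_queued(b);
	packet_done(b);
	return b->wakeups;
}
```

With a threshold of 1 this degenerates to one wakeup per chunk (the
behaviour before the patchset); with a very large threshold it gives one
wakeup per packet (the behaviour after it). Values in between trade
wakeup overhead against how early the reader is signalled.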

^ permalink raw reply

* RE: [PATCH 1/1] hv_netvsc: Implement support for VF drivers on Hyper-V
From: KY Srinivasan @ 2016-04-14 19:08 UTC (permalink / raw)
  To: KY Srinivasan, davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, devel@linuxdriverproject.org,
	olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com
In-Reply-To: <1460591887-24671-1-git-send-email-kys@microsoft.com>



> -----Original Message-----
> From: K. Y. Srinivasan [mailto:kys@microsoft.com]
> Sent: Wednesday, April 13, 2016 4:58 PM
> To: davem@davemloft.net; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; devel@linuxdriverproject.org; olaf@aepfle.de;
> apw@canonical.com; jasowang@redhat.com
> Cc: KY Srinivasan <kys@microsoft.com>
> Subject: [PATCH 1/1] hv_netvsc: Implement support for VF drivers on Hyper-
> V
> 
> Support VF drivers on Hyper-V. On Hyper-V, each VF instance presented to
> the guest has an associated synthetic interface that shares the MAC address
> with the VF instance. Typically these are bonded together to support
> live migration. By default, the host delivers all the incoming packets
> on the synthetic interface. Once the VF is up, we need to explicitly switch
> the data path on the host to divert traffic onto the VF interface. Even after
> switching the data path, broadcast and multicast packets are always
> delivered
> on the synthetic interface and these will have to be injected back onto the
> VF interface (if VF is up).
> This patch implements the necessary support in netvsc to support Linux
> VF drivers.

David,

Please drop this patch. I just discovered a merge issue and I am going to resubmit this
patch.

Regards,

K. Y

^ permalink raw reply

* Re: [patch net-next 00/18] devlink + mlxsw: add support for config and control of shared buffers
From: David Miller @ 2016-04-14 19:21 UTC (permalink / raw)
  To: jiri
  Cc: netdev, idosch, eladr, yotamg, ogerlitz, roopa, nikolay, jhs,
	john.fastabend, rami.rosen, gospo, stephen, sfeldma
In-Reply-To: <1460650770-19382-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Thu, 14 Apr 2016 18:19:12 +0200

> ASICs implement shared buffer for packet forwarding purposes and enable
> flexible partitioning of the shared buffer for different flows and ports,
> enabling non-blocking progress of different flows as well as separation
> of lossy traffic from loss-less traffic when using Per-Priority Flow
> Control (PFC). The shared buffer optimizes the buffer utilization for better
> absorption of packet bursts.
> 
> This patchset implements API which is based on the model SAI uses. That is
> aligned with multiple ASIC vendors so this API should be vendor neutral.
> 
> Userspace counterpart patchset for devlink iproute2 tool can be found here:
> https://github.com/jpirko/iproute2_mlxsw/tree/devlink_sb
> 
> Couple of examples of usage:
 ...

This looks really nice to me, series applied, thanks Jiri.

^ permalink raw reply

* Re: [PATCH v2 net-next 2/5] qed/qede: Add VXLAN tunnel slowpath configuration support
From: David Miller @ 2016-04-14 19:26 UTC (permalink / raw)
  To: manish.chopra; +Cc: netdev, Ariel.Elior, Yuval.Mintz
In-Reply-To: <1460612313-20323-3-git-send-email-manish.chopra@qlogic.com>

From: Manish Chopra <manish.chopra@qlogic.com>
Date: Thu, 14 Apr 2016 01:38:30 -0400

> This patch enables VXLAN tunnel on the adapter and
> adds support for driver hooks to configure UDP ports
> for VXLAN tunnel offload to be performed by the adapter.
> 
> Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>

Why do you call all of these offloads the "slowpath"?

^ permalink raw reply

* Re: [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible
From: marcelo.leitner @ 2016-04-14 19:33 UTC (permalink / raw)
  To: David Miller; +Cc: nhorman, netdev, vyasevich, linux-sctp, David.Laight, jkbs
In-Reply-To: <20160414.145916.2286519059284215039.davem@davemloft.net>

On Thu, Apr 14, 2016 at 02:59:16PM -0400, David Miller wrote:
> From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Date: Thu, 14 Apr 2016 14:00:49 -0300
> 
> > Em 14-04-2016 10:03, Neil Horman escreveu:
> >> On Wed, Apr 13, 2016 at 11:05:32PM -0400, David Miller wrote:
> >>> From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> >>> Date: Fri,  8 Apr 2016 16:41:26 -0300
> >>>
> >>>> 1st patch is a preparation for the 2nd. The idea is to not call
> >>>> ->sk_data_ready() for every data chunk processed while processing
> >>>> packets but only once before releasing the socket.
> >>>>
> >>>> v2: patchset re-checked, small changelog fixes
> >>>> v3: on patch 2, make use of local vars to make it more readable
> >>>
> >>> Applied to net-next, but isn't this reduced overhead coming at the
> >>> expense of latency?  What if that lower latency is important to the
> >>> application and/or consumer?
> >> That's a fair point, but I'd make the counter argument that, as it
> >> currently stands, any latency introduced (or removed) is an artifact
> >> of our implementation rather than a designed feature of it.  That is
> >> to say, we make no guarantees at the application level regarding how
> >> long it takes to signal data readiness from the time we get data off
> >> the wire, so I would rather see our throughput raised if we can, as
> >> that's been sctp's more pressing Achilles heel.
> >>
> >> That's not to say I wouldn't like to enable lower latency, but I'd
> >> rather have this now, and start pondering how to design that in.
> >> Perhaps we can convert the pending flag to a counter to count the
> >> number of events we enqueue, and call sk_data_ready every time we
> >> reach a sysctl-defined threshold.
> > 
> > That and also that there is no chance of the application reading the
> > first chunks before all current ToDo's are performed by either the bh
> > or backlog handlers for that packet. Socket lock won't be cycled in
> > between chunks so the application is going to wait for all the processing
> > one way or another.
> 
> But it takes time to signal the wakeup to the remote cpu the process
> was running on, schedule out the current process on that cpu (if it
> has in fact lost its timeslice), and then finally look at the socket
> queue.
> 
> Of course this is all assuming the process was sleeping in the first
> place, either in recv or more likely poll.
> 
> I really think signalling early helps performance.

I see. Okay, I'll revisit this, thanks.

  Marcelo

^ permalink raw reply

* [net-next PATCH 0/5] Add support for offloads with IPv6 GRE tunnels
From: Alexander Duyck @ 2016-04-14 19:33 UTC (permalink / raw)
  To: jesse, netdev, davem, alexander.duyck, tom

This patch series enables the use of segmentation and checksum offloads
with IPv6 based GRE tunnels.

In order to enable this series I had to make a change to
iptunnel_handle_offloads so that it would no longer free the skb.  This was
necessary as there were multiple paths in the IPv6 GRE code that required
the skb to still be present so it could be freed.  As it turned out I
believe this actually fixes a bug that was present in FOU/GUE based tunnels
anyway.
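
The ownership change described above can be illustrated with a small
user-space sketch. It is not the kernel code: struct buf, buf_free(),
handle_offloads() and xmit() are hypothetical stand-ins for struct
sk_buff, kfree_skb(), iptunnel_handle_offloads() and a tunnel transmit
routine. The point it tries to show is that once the helper only reports
an error and never frees, the caller can free the buffer at exactly one
label, so a second free cannot happen on the error path.

```c
#include <errno.h>

/* Hypothetical stand-in for struct sk_buff; counts frees to expose double frees. */
struct buf {
	int free_count;     /* number of times buf_free() was called */
	int fail_offload;   /* nonzero forces the offload helper to fail */
};

/* Stand-in for kfree_skb(): only records that a free happened. */
static void buf_free(struct buf *b)
{
	b->free_count++;
}

/* New-style helper: returns 0 or a negative errno and never frees the buffer. */
static int handle_offloads(struct buf *b)
{
	if (b->fail_offload)
		return -ENOMEM;
	return 0;
}

/* The caller keeps ownership on every error path and frees at a single label. */
static int xmit(struct buf *b)
{
	int err;

	err = handle_offloads(b);
	if (err)
		goto tx_error;

	/* ... build headers and hand the buffer to the lower layer ... */
	return 0;

tx_error:
	buf_free(b);
	return err;
}
```

With the old convention (helper frees and returns an ERR_PTR), a caller
that also jumped to its own freeing label would have freed the buffer
twice; here the failing case frees it exactly once.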

Below is a quick breakdown of the performance gains seen with a simple
netperf test passing traffic through a ip6gretap tunnel and then an i40e
interface:

Throughput  Throughput  Local CPU  Local Service  Result
            Units       Util (%)   Demand         Tag
3544.93     10^6bits/s  6.30       4.656          "before"
13081.75    10^6bits/s  3.75       0.752          "after"

---

Alexander Duyck (5):
      ip_tunnel_core: iptunnel_handle_offloads returns int and doesn't free skb
      ip6gretap: Fix MTU to allow for Ethernet header
      ip6gre: Add support for basic offloads excluding GSO
      GRE: Add support for GRO/GSO of IPv6 GRE traffic
      ip6gre: Add support for GSO


 drivers/net/geneve.c            |   32 ++++++---------
 drivers/net/vxlan.c             |    6 +--
 include/net/ip_tunnels.h        |    2 -
 include/net/udp_tunnel.h        |    3 -
 net/ipv4/fou.c                  |   16 ++++----
 net/ipv4/gre_offload.c          |   14 ++++++-
 net/ipv4/ip_gre.c               |   20 +++-------
 net/ipv4/ip_tunnel_core.c       |   13 ++----
 net/ipv4/ipip.c                 |    7 +--
 net/ipv6/ip6_gre.c              |   81 +++++++++++++++++++++++++++------------
 net/ipv6/sit.c                  |   14 +++----
 net/netfilter/ipvs/ip_vs_xmit.c |    6 +--
 12 files changed, 116 insertions(+), 98 deletions(-)

^ permalink raw reply

* [net-next PATCH 1/5] ip_tunnel_core: iptunnel_handle_offloads returns int and doesn't free skb
From: Alexander Duyck @ 2016-04-14 19:33 UTC (permalink / raw)
  To: jesse, netdev, davem, alexander.duyck, tom
In-Reply-To: <20160414192709.12934.82858.stgit@ahduyck-xeon-server>

This patch updates the IP tunnel core function iptunnel_handle_offloads so
that we return an int and do not free the skb inside the function.  This
actually allows us to clean up several paths in several tunnels so that we
can free the skb at one point in the path without having to have a
secondary path if we are supporting tunnel offloads.

In addition it should resolve some double-free issues I have found in the
tunnel paths, as I believe it is possible for us to end up triggering such
an event in the case of fou or gue.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
 drivers/net/geneve.c            |   32 ++++++++++++--------------------
 drivers/net/vxlan.c             |    6 +++---
 include/net/ip_tunnels.h        |    2 +-
 include/net/udp_tunnel.h        |    3 +--
 net/ipv4/fou.c                  |   16 ++++++++--------
 net/ipv4/ip_gre.c               |   20 ++++++--------------
 net/ipv4/ip_tunnel_core.c       |   13 +++++--------
 net/ipv4/ipip.c                 |    7 +++----
 net/ipv6/sit.c                  |   14 ++++++--------
 net/netfilter/ipvs/ip_vs_xmit.c |    6 ++----
 10 files changed, 47 insertions(+), 72 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index a9fbf17eb256..efbc7ceedc3a 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -696,16 +696,12 @@ static int geneve_build_skb(struct rtable *rt, struct sk_buff *skb,
 	min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
 			+ GENEVE_BASE_HLEN + opt_len + sizeof(struct iphdr);
 	err = skb_cow_head(skb, min_headroom);
-	if (unlikely(err)) {
-		kfree_skb(skb);
+	if (unlikely(err))
 		goto free_rt;
-	}
 
-	skb = udp_tunnel_handle_offloads(skb, udp_sum);
-	if (IS_ERR(skb)) {
-		err = PTR_ERR(skb);
+	err = udp_tunnel_handle_offloads(skb, udp_sum);
+	if (err)
 		goto free_rt;
-	}
 
 	gnvh = (struct genevehdr *)__skb_push(skb, sizeof(*gnvh) + opt_len);
 	geneve_build_header(gnvh, tun_flags, vni, opt_len, opt);
@@ -733,16 +729,12 @@ static int geneve6_build_skb(struct dst_entry *dst, struct sk_buff *skb,
 	min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len
 			+ GENEVE_BASE_HLEN + opt_len + sizeof(struct ipv6hdr);
 	err = skb_cow_head(skb, min_headroom);
-	if (unlikely(err)) {
-		kfree_skb(skb);
+	if (unlikely(err))
 		goto free_dst;
-	}
 
-	skb = udp_tunnel_handle_offloads(skb, udp_sum);
-	if (IS_ERR(skb)) {
-		err = PTR_ERR(skb);
+	err = udp_tunnel_handle_offloads(skb, udp_sum);
+	if (err)
 		goto free_dst;
-	}
 
 	gnvh = (struct genevehdr *)__skb_push(skb, sizeof(*gnvh) + opt_len);
 	geneve_build_header(gnvh, tun_flags, vni, opt_len, opt);
@@ -937,7 +929,7 @@ static netdev_tx_t geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 		err = geneve_build_skb(rt, skb, key->tun_flags, vni,
 				       info->options_len, opts, flags, xnet);
 		if (unlikely(err))
-			goto err;
+			goto tx_error;
 
 		tos = ip_tunnel_ecn_encap(key->tos, iip, skb);
 		ttl = key->ttl;
@@ -946,7 +938,7 @@ static netdev_tx_t geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 		err = geneve_build_skb(rt, skb, 0, geneve->vni,
 				       0, NULL, flags, xnet);
 		if (unlikely(err))
-			goto err;
+			goto tx_error;
 
 		tos = ip_tunnel_ecn_encap(fl4.flowi4_tos, iip, skb);
 		ttl = geneve->ttl;
@@ -964,7 +956,7 @@ static netdev_tx_t geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 
 tx_error:
 	dev_kfree_skb(skb);
-err:
+
 	if (err == -ELOOP)
 		dev->stats.collisions++;
 	else if (err == -ENETUNREACH)
@@ -1026,7 +1018,7 @@ static netdev_tx_t geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 					info->options_len, opts,
 					flags, xnet);
 		if (unlikely(err))
-			goto err;
+			goto tx_error;
 
 		prio = ip_tunnel_ecn_encap(key->tos, iip, skb);
 		ttl = key->ttl;
@@ -1035,7 +1027,7 @@ static netdev_tx_t geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 		err = geneve6_build_skb(dst, skb, 0, geneve->vni,
 					0, NULL, flags, xnet);
 		if (unlikely(err))
-			goto err;
+			goto tx_error;
 
 		prio = ip_tunnel_ecn_encap(ip6_tclass(fl6.flowlabel),
 					   iip, skb);
@@ -1054,7 +1046,7 @@ static netdev_tx_t geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
 
 tx_error:
 	dev_kfree_skb(skb);
-err:
+
 	if (err == -ELOOP)
 		dev->stats.collisions++;
 	else if (err == -ENETUNREACH)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 7f697a3f00a4..a3bd67dce0ce 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1784,9 +1784,9 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
 	if (WARN_ON(!skb))
 		return -ENOMEM;
 
-	skb = iptunnel_handle_offloads(skb, type);
-	if (IS_ERR(skb))
-		return PTR_ERR(skb);
+	err = iptunnel_handle_offloads(skb, type);
+	if (err)
+		goto out_free;
 
 	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
 	vxh->vx_flags = VXLAN_HF_VNI;
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 9ae9fbbccd67..6d790910ebdf 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -309,7 +309,7 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 struct metadata_dst *iptunnel_metadata_reply(struct metadata_dst *md,
 					     gfp_t flags);
 
-struct sk_buff *iptunnel_handle_offloads(struct sk_buff *skb, int gso_type_mask);
+int iptunnel_handle_offloads(struct sk_buff *skb, int gso_type_mask);
 
 static inline int iptunnel_pull_offloads(struct sk_buff *skb)
 {
diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index 2dcf1de948ac..4f543262dd81 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -105,8 +105,7 @@ struct metadata_dst *udp_tun_rx_dst(struct sk_buff *skb, unsigned short family,
 				    __be16 flags, __be64 tunnel_id,
 				    int md_size);
 
-static inline struct sk_buff *udp_tunnel_handle_offloads(struct sk_buff *skb,
-							 bool udp_csum)
+static inline int udp_tunnel_handle_offloads(struct sk_buff *skb, bool udp_csum)
 {
 	int type = udp_csum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
 
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index d039f8fff57f..7ac5ec87b004 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -802,11 +802,11 @@ int fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
 	int type = e->flags & TUNNEL_ENCAP_FLAG_CSUM ? SKB_GSO_UDP_TUNNEL_CSUM :
 						       SKB_GSO_UDP_TUNNEL;
 	__be16 sport;
+	int err;
 
-	skb = iptunnel_handle_offloads(skb, type);
-
-	if (IS_ERR(skb))
-		return PTR_ERR(skb);
+	err = iptunnel_handle_offloads(skb, type);
+	if (err)
+		return err;
 
 	sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
 					       skb, 0, 0, false);
@@ -826,6 +826,7 @@ int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
 	__be16 sport;
 	void *data;
 	bool need_priv = false;
+	int err;
 
 	if ((e->flags & TUNNEL_ENCAP_FLAG_REMCSUM) &&
 	    skb->ip_summed == CHECKSUM_PARTIAL) {
@@ -836,10 +837,9 @@ int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
 
 	optlen += need_priv ? GUE_LEN_PRIV : 0;
 
-	skb = iptunnel_handle_offloads(skb, type);
-
-	if (IS_ERR(skb))
-		return PTR_ERR(skb);
+	err = iptunnel_handle_offloads(skb, type);
+	if (err)
+		return err;
 
 	/* Get source port (based on flow hash) before skb_push */
 	sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index af5d1f38217f..eedd829a2f87 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -500,8 +500,7 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
 	ip_tunnel_xmit(skb, dev, tnl_params, tnl_params->protocol);
 }
 
-static struct sk_buff *gre_handle_offloads(struct sk_buff *skb,
-					   bool csum)
+static int gre_handle_offloads(struct sk_buff *skb, bool csum)
 {
 	return iptunnel_handle_offloads(skb, csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE);
 }
@@ -568,11 +567,8 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	/* Push Tunnel header. */
-	skb = gre_handle_offloads(skb, !!(tun_info->key.tun_flags & TUNNEL_CSUM));
-	if (IS_ERR(skb)) {
-		skb = NULL;
+	if (gre_handle_offloads(skb, !!(tun_info->key.tun_flags & TUNNEL_CSUM)))
 		goto err_free_rt;
-	}
 
 	flags = tun_info->key.tun_flags & (TUNNEL_CSUM | TUNNEL_KEY);
 	build_header(skb, tunnel_hlen, flags, htons(ETH_P_TEB),
@@ -640,16 +636,14 @@ static netdev_tx_t ipgre_xmit(struct sk_buff *skb,
 		tnl_params = &tunnel->parms.iph;
 	}
 
-	skb = gre_handle_offloads(skb, !!(tunnel->parms.o_flags&TUNNEL_CSUM));
-	if (IS_ERR(skb))
-		goto out;
+	if (gre_handle_offloads(skb, !!(tunnel->parms.o_flags & TUNNEL_CSUM)))
+		goto free_skb;
 
 	__gre_xmit(skb, dev, tnl_params, skb->protocol);
 	return NETDEV_TX_OK;
 
 free_skb:
 	kfree_skb(skb);
-out:
 	dev->stats.tx_dropped++;
 	return NETDEV_TX_OK;
 }
@@ -664,9 +658,8 @@ static netdev_tx_t gre_tap_xmit(struct sk_buff *skb,
 		return NETDEV_TX_OK;
 	}
 
-	skb = gre_handle_offloads(skb, !!(tunnel->parms.o_flags&TUNNEL_CSUM));
-	if (IS_ERR(skb))
-		goto out;
+	if (gre_handle_offloads(skb, !!(tunnel->parms.o_flags & TUNNEL_CSUM)))
+		goto free_skb;
 
 	if (skb_cow_head(skb, dev->needed_headroom))
 		goto free_skb;
@@ -676,7 +669,6 @@ static netdev_tx_t gre_tap_xmit(struct sk_buff *skb,
 
 free_skb:
 	kfree_skb(skb);
-out:
 	dev->stats.tx_dropped++;
 	return NETDEV_TX_OK;
 }
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 43445df61efd..f46c5c873831 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -146,8 +146,8 @@ struct metadata_dst *iptunnel_metadata_reply(struct metadata_dst *md,
 }
 EXPORT_SYMBOL_GPL(iptunnel_metadata_reply);
 
-struct sk_buff *iptunnel_handle_offloads(struct sk_buff *skb,
-					 int gso_type_mask)
+int iptunnel_handle_offloads(struct sk_buff *skb,
+			     int gso_type_mask)
 {
 	int err;
 
@@ -159,9 +159,9 @@ struct sk_buff *iptunnel_handle_offloads(struct sk_buff *skb,
 	if (skb_is_gso(skb)) {
 		err = skb_unclone(skb, GFP_ATOMIC);
 		if (unlikely(err))
-			goto error;
+			return err;
 		skb_shinfo(skb)->gso_type |= gso_type_mask;
-		return skb;
+		return 0;
 	}
 
 	if (skb->ip_summed != CHECKSUM_PARTIAL) {
@@ -174,10 +174,7 @@ struct sk_buff *iptunnel_handle_offloads(struct sk_buff *skb,
 		skb->encapsulation = 0;
 	}
 
-	return skb;
-error:
-	kfree_skb(skb);
-	return ERR_PTR(err);
+	return 0;
 }
 EXPORT_SYMBOL_GPL(iptunnel_handle_offloads);
 
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index ec51d02166de..92827483ee3d 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -219,9 +219,8 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (unlikely(skb->protocol != htons(ETH_P_IP)))
 		goto tx_error;
 
-	skb = iptunnel_handle_offloads(skb, SKB_GSO_IPIP);
-	if (IS_ERR(skb))
-		goto out;
+	if (iptunnel_handle_offloads(skb, SKB_GSO_IPIP))
+		goto tx_error;
 
 	skb_set_inner_ipproto(skb, IPPROTO_IPIP);
 
@@ -230,7 +229,7 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 
 tx_error:
 	kfree_skb(skb);
-out:
+
 	dev->stats.tx_errors++;
 	return NETDEV_TX_OK;
 }
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 83384308d032..a13d8c114ccb 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -913,10 +913,9 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 		goto tx_error;
 	}
 
-	skb = iptunnel_handle_offloads(skb, SKB_GSO_SIT);
-	if (IS_ERR(skb)) {
+	if (iptunnel_handle_offloads(skb, SKB_GSO_SIT)) {
 		ip_rt_put(rt);
-		goto out;
+		goto tx_error;
 	}
 
 	if (df) {
@@ -992,7 +991,6 @@ tx_error_icmp:
 	dst_link_failure(skb);
 tx_error:
 	kfree_skb(skb);
-out:
 	dev->stats.tx_errors++;
 	return NETDEV_TX_OK;
 }
@@ -1002,15 +1000,15 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct ip_tunnel *tunnel = netdev_priv(dev);
 	const struct iphdr  *tiph = &tunnel->parms.iph;
 
-	skb = iptunnel_handle_offloads(skb, SKB_GSO_IPIP);
-	if (IS_ERR(skb))
-		goto out;
+	if (iptunnel_handle_offloads(skb, SKB_GSO_IPIP))
+		goto tx_error;
 
 	skb_set_inner_ipproto(skb, IPPROTO_IPIP);
 
 	ip_tunnel_xmit(skb, dev, tiph, IPPROTO_IPIP);
 	return NETDEV_TX_OK;
-out:
+tx_error:
+	kfree_skb(skb);
 	dev->stats.tx_errors++;
 	return NETDEV_TX_OK;
 }
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index dc196a0f501d..6d19d2eeaa60 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -1013,8 +1013,7 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	if (IS_ERR(skb))
 		goto tx_error;
 
-	skb = iptunnel_handle_offloads(skb, __tun_gso_type_mask(AF_INET, cp->af));
-	if (IS_ERR(skb))
+	if (iptunnel_handle_offloads(skb, __tun_gso_type_mask(AF_INET, cp->af)))
 		goto tx_error;
 
 	skb->transport_header = skb->network_header;
@@ -1105,8 +1104,7 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	if (IS_ERR(skb))
 		goto tx_error;
 
-	skb = iptunnel_handle_offloads(skb, __tun_gso_type_mask(AF_INET6, cp->af));
-	if (IS_ERR(skb))
+	if (iptunnel_handle_offloads(skb, __tun_gso_type_mask(AF_INET6, cp->af)))
 		goto tx_error;
 
 	skb->transport_header = skb->network_header;

^ permalink raw reply related

* [net-next PATCH 2/5] ip6gretap: Fix MTU to allow for Ethernet header
From: Alexander Duyck @ 2016-04-14 19:33 UTC (permalink / raw)
  To: jesse, netdev, davem, alexander.duyck, tom
In-Reply-To: <20160414192709.12934.82858.stgit@ahduyck-xeon-server>

When we were creating an ip6gretap interface the MTU was about 6 bytes
short of what was needed.  It turns out we were not taking the Ethernet
header into account and as a result we were eating into the 8 bytes
reserved for the encap limit.
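
The MTU arithmetic behind this fix can be worked through in a small
stand-alone sketch. The constants and the function name ip6gretap_mtu()
are illustrative; in particular, the assumption that "addend" is the
IPv6 header plus the GRE header (with its optional fields) is not shown
in the hunk below and is only an assumption here.

```c
#define IPV6_HDR_LEN     40   /* size of the outer IPv6 header */
#define GRE_BASE_LEN      4   /* GRE header without optional fields */
#define ENCAP_LIMIT_LEN   8   /* destination option carrying the encap limit */
#define ETH_HDR_LEN      14   /* same value as ETH_HLEN */
#define MIN_IPV6_MTU   1280   /* same value as IPV6_MIN_MTU */

/*
 * Sketch of the tunnel MTU calculation.
 * lower_mtu:       MTU of the underlying device
 * gre_opt_len:     bytes of optional GRE fields (checksum, key, sequence)
 * use_encap_limit: nonzero unless the encap limit is ignored
 * is_ether:        nonzero for a tap (Ethernet) tunnel device
 */
static int ip6gretap_mtu(int lower_mtu, int gre_opt_len,
			 int use_encap_limit, int is_ether)
{
	int mtu = lower_mtu - (IPV6_HDR_LEN + GRE_BASE_LEN + gre_opt_len);

	if (use_encap_limit)
		mtu -= ENCAP_LIMIT_LEN;
	if (is_ether)
		mtu -= ETH_HDR_LEN;   /* the step this patch adds */
	if (mtu < MIN_IPV6_MTU)
		mtu = MIN_IPV6_MTU;
	return mtu;
}
```

Under these assumptions a 1500-byte lower device gives 1448 without the
Ethernet adjustment and 1434 with it. The 14-byte inner Ethernet header
exceeds the 8 bytes reserved for the encap limit by 6 bytes, which is
consistent with the "about 6 bytes short" observation.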

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
 net/ipv6/ip6_gre.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 4e636e60a360..2be66e7b4a78 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -987,6 +987,8 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu)
 				dev->mtu = rt->dst.dev->mtu - addend;
 				if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
 					dev->mtu -= 8;
+				if (dev->type == ARPHRD_ETHER)
+					dev->mtu -= ETH_HLEN;
 
 				if (dev->mtu < IPV6_MIN_MTU)
 					dev->mtu = IPV6_MIN_MTU;

^ permalink raw reply related

* [net-next PATCH 3/5] ip6gre: Add support for basic offloads excluding GSO
From: Alexander Duyck @ 2016-04-14 19:33 UTC (permalink / raw)
  To: jesse, netdev, davem, alexander.duyck, tom
In-Reply-To: <20160414192709.12934.82858.stgit@ahduyck-xeon-server>

This patch adds support for the basic offloads we support on most devices.
Specifically with this patch set we can support checksum offload, basic
scatter-gather, and highdma.
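
For readers unfamiliar with the checksum being offloaded, the software
fallback (sum the payload, then fold) can be sketched in portable C.
This is not the kernel implementation: csum_partial_sketch() and
csum_fold_sketch() are hypothetical names that only mirror the idea
behind skb_checksum() and csum_fold(), and the CHECKSUM_PARTIAL path
using lco_csum() is not modelled at all.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Add the data as 16-bit big-endian words into a 32-bit accumulator.
 * A trailing odd byte is treated as the high byte of a final word.
 * Intended for buffers of at most 64 KiB so the accumulator cannot overflow.
 */
static uint32_t csum_partial_sketch(const uint8_t *data, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;
	return sum;
}

/* Fold the carries back into 16 bits and return the ones' complement. */
static uint16_t csum_fold_sketch(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

A receiver can verify data protected this way by running the same two
steps over the bytes with the checksum field filled in; a result of 0
means the checksum matches.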

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
 net/ipv6/ip6_gre.c |   23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 2be66e7b4a78..1a5ad143be40 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -598,6 +598,18 @@ static void init_tel_txopt(struct ipv6_tel_txoption *opt, __u8 encap_limit)
 	opt->ops.opt_nflen = 8;
 }
 
+static __sum16 gre6_checksum(struct sk_buff *skb)
+{
+	__wsum csum;
+
+	if (skb->ip_summed == CHECKSUM_PARTIAL)
+		csum = lco_csum(skb);
+	else
+		csum = skb_checksum(skb, sizeof(struct ipv6hdr),
+				    skb->len - sizeof(struct ipv6hdr), 0);
+	return csum_fold(csum);
+}
+
 static netdev_tx_t ip6gre_xmit2(struct sk_buff *skb,
 			 struct net_device *dev,
 			 __u8 dsfield,
@@ -750,8 +762,7 @@ static netdev_tx_t ip6gre_xmit2(struct sk_buff *skb,
 		}
 		if (tunnel->parms.o_flags&GRE_CSUM) {
 			*ptr = 0;
-			*(__sum16 *)ptr = ip_compute_csum((void *)(ipv6h+1),
-				skb->len - sizeof(struct ipv6hdr));
+			*(__sum16 *)ptr = gre6_checksum(skb);
 		}
 	}
 
@@ -1507,6 +1518,11 @@ static const struct net_device_ops ip6gre_tap_netdev_ops = {
 	.ndo_get_iflink = ip6_tnl_get_iflink,
 };
 
+#define GRE6_FEATURES (NETIF_F_SG |		\
+		       NETIF_F_FRAGLIST |	\
+		       NETIF_F_HIGHDMA |		\
+		       NETIF_F_HW_CSUM)
+
 static void ip6gre_tap_setup(struct net_device *dev)
 {
 
@@ -1540,6 +1556,9 @@ static int ip6gre_newlink(struct net *src_net, struct net_device *dev,
 	nt->net = dev_net(dev);
 	ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]);
 
+	dev->features		|= GRE6_FEATURES;
+	dev->hw_features	|= GRE6_FEATURES;
+
 	/* Can use a lockless transmit, unless we generate output sequences */
 	if (!(nt->parms.o_flags & GRE_SEQ))
 		dev->features |= NETIF_F_LLTX;

^ permalink raw reply related

