Netdev List
 help / color / mirror / Atom feed
* M_CAN message RAM initialization AppNote  - was: Re: [PATCH V3 3/3] can: m_can: workaround for transmit data less than 4 bytes
From: Oliver Hartkopp @ 2014-11-05 18:15 UTC (permalink / raw)
  To: Marc Kleine-Budde, Dong Aisheng, linux-can
  Cc: wg, varkabhadram, netdev, linux-arm-kernel
In-Reply-To: <545A3451.2090302@pengutronix.de>

Hi all,

just to close this application note relevant point ...

I got an answer from Florian Hartwich (Mr. CAN) from Bosch regarding the bit 
error detection found by Dong Aisheng.

The relevant interrupts IR.BEU or IR.BEC monitor the message RAM:

Bit 21 BEU: Bit Error Uncorrected
Message RAM bit error detected, uncorrected. Controlled by input signal 
m_can_aeim_berr[1] generated by an optional external parity / ECC logic 
attached to the Message RAM. An uncorrected Message RAM bit error sets 
CCCR.INIT to ‘1’. This is done to avoid transmission of corrupted data.

0= No bit error detected when reading from Message RAM
1= Bit error detected, uncorrected (e.g. parity logic)

Bit 20 BEC: Bit Error Corrected
Message RAM bit error detected and corrected. Controlled by input signal 
m_can_aeim_berr[0] generated by an optional external parity / ECC logic 
attached to the Message RAM.

0= No bit error detected when reading from Message RAM
1= Bit error detected and corrected (e.g. ECC)

---

The Message RAM is usually equipped with a parity or ECC functionality.
But RAM cells suffer a hardware reset and can therefore hold arbitrary content 
at startup - including parity and/or ECC bits.

So when you write only the CAN ID and the first four bytes the last four bytes 
remain untouched. Then the M_CAN starts to read in 32bit words from the start 
of the Tx Message element. So it is very likely to trigger the message RAM 
error when reading the uninitialized 32bit word from the last four bytes.

Finally it turns out that an initial writing (with any kind of data) to the 
entire message RAM is mandatory to create valid parity/ECC checksums.

That's it.

Regards,
Oliver


^ permalink raw reply

* Re: Can ndo_select_queue save data in skb->cb?
From: Eric Dumazet @ 2014-11-05 18:16 UTC (permalink / raw)
  To: James Yonan; +Cc: netdev
In-Reply-To: <545A64EC.309@openvpn.net>

On Wed, 2014-11-05 at 10:57 -0700, James Yonan wrote:
> Is it permissible for a net driver's ndo_select_queue method to save 
> data in skb->cb for later use in ndo_start_xmit?
> 
> Also, is it necessary for users of skb->cb to zero out their private 
> data after use to prevent it from being misinterpreted by other layers? 
>   I noticed some commits in the log (such as 462fb2) are zeroing out the 
> skb->cb area for this reason.

Its ok to use skb->cb[] from ndo_select_queue()

Look at bond_select_queue() for such a case.

You do not need to cleanup skb->cb[] to zero before giving skb to the
driver.

^ permalink raw reply

* Re: [PATCH] net: mv643xx_eth: reclaim TX skbs only when released by the HW
From: Karl Beldan @ 2014-11-05 18:31 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: David Miller, Ian Campbell, Karl Beldan, netdev, Eric Dumazet,
	Sebastian Hesselbarth
In-Reply-To: <20141105150521.GA25354@magnum.frso.rivierawaves.com>

On Wed, Nov 05, 2014 at 04:05:21PM +0100, Karl Beldan wrote:
> On Wed, Nov 05, 2014 at 11:46:16AM -0300, Ezequiel Garcia wrote:
> > Hi Karl,
> > 
> > On 11/05/2014 11:32 AM, Karl Beldan wrote:> From: Karl Beldan <karl.beldan@rivierawaves.com>
> > > 
> > > ATM, txq_reclaim will dequeue and free an skb for each tx desc released
> > > by the hw that has TX_LAST_DESC set. However, in case of TSO, each
> > > hw desc embedding the last part of a segment has TX_LAST_DESC set,
> > > losing the one-to-one 'last skb frag'/'TX_LAST_DESC set' correspondance,
> > > which causes data corruption.
> > > 
> > > Fix this by checking TX_ENABLE_INTERRUPT instead of TX_LAST_DESC, and
> > > warn when trying to dequeue from an empty txq (which can be symptomatic
> > > of releasing skbs prematurely).
> > > 
> > > Fixes: 3ae8f4e0b98 ('net: mv643xx_eth: Implement software TSO')
> > 
> > Although your change makes sense, this isn't fixing the issue for me,
> > neither did the previous one.
> > 
> This change fixes a serious issue.
> On my side I can now trigger misc NFS and md5sums errors very easily,
> which I haven't detected so far with it applied.
> Are you running little endian ? Do you have the tso alignment fix
> a63ba13e (I don't expect it to be required but I don't know what SoC you
> are using) ? I suppose you are running with all 3 fixes applied.
>  
Also, I haven't checked SMP issues and I only have one core, if you are
using SMP it might be worth looking into that, maybe try running on one
core only (I only have an MV78200).
 
Karl

^ permalink raw reply

* Re: Can ndo_select_queue save data in skb->cb?
From: Cong Wang @ 2014-11-05 18:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: James Yonan, netdev
In-Reply-To: <1415211391.13896.10.camel@edumazet-glaptop2.roam.corp.google.com>

On Wed, Nov 5, 2014 at 10:16 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2014-11-05 at 10:57 -0700, James Yonan wrote:
>> Is it permissible for a net driver's ndo_select_queue method to save
>> data in skb->cb for later use in ndo_start_xmit?
>>
>> Also, is it necessary for users of skb->cb to zero out their private
>> data after use to prevent it from being misinterpreted by other layers?
>>   I noticed some commits in the log (such as 462fb2) are zeroing out the
>> skb->cb area for this reason.
>
> Its ok to use skb->cb[] from ndo_select_queue()
>
> Look at bond_select_queue() for such a case.
>
> You do not need to cleanup skb->cb[] to zero before giving skb to the
> driver.
>

That is only because qdisc layer saves data for bond,
which means if you have more data to save, you have to put
more into qdisc CB, it is just ugly.

^ permalink raw reply

* guten Tag,
From: Lubben Hendrik @ 2014-11-05 17:39 UTC (permalink / raw)
  To: netdev

guten Tag,

Mein Name ist Lubben Hendrik und ich arbeite mit der Finanz- Haus hier in der Niederlande. 
Ich fand Ihre Adresse durch meinen Ländern internationale Web Verzeichnis. Bei unserem letzten Treffen und Prüfung der Bankkonten
hier in den Niederlanden gefunden meine Abteilung ein ruhendes Konto mit einem enorme Summe von US $6.500.000,00
( sechs Millionen fünfhunderttausend USDollar ), die von verstorbenen Herrn Williams aus England abgelagert wurde .

Vor seinem Tod übernahm er die Summe von US $6.500.000,00 ( sechs Millionen fünfhunderttausend USDollar)an eine Bank hier in den Niederlanden. von unserem
Untersuchung hatte er keinen Empfänger oder nächsten Angehörigen , diese Mittel zu erreichen.
Aufgrund unserer finanziellen Hausordnung nur ein Ausländer kann als Stand-nächsten Verwandten oder nächsten Angehörigen . Der Antrag eines
Ausländers als nächsten Angehörigen ist die Basis auf der Tatsache, dass der Einleger ein Ausländer und jemand
in der war Niederlande kann nicht als die nächsten Angehörigen zu stehen.


Ich brauche Ihre Erlaubnis als nächster Verwandter oder nächsten Angehörigen unserer Verstorbenen
Kunden , so dass die Mittel freigegeben werden kann und Transfer zu Ihrem Konto , am Ende der Transaktion 40% wird für Sie sein und 60 % werden
für mich und meine Kollegen.

Wir brauchen ein ausländisches Konto . Ich immer noch auf die finanzielle Haus zu arbeiten , und das ist
der eigentliche Grund , dass ich eine zweite Partei oder Person brauchen ,zu stehen und zu arbeiten
mit mir und gelten für die Bank hier in den Niederlanden , wie die nächsten Angehörigen.
Ich habe in meinem Besitz alle notwendigen Unterlagen zu haben, diese Transaktion erfolgreich durchgeführt .


Weitere Informationen werden nach dem Eingang Ihrer Aufforderung zur Verfügung gestellt werden
Reaktion und ich möchte , dass Sie wissen , dass es kein Risiko. ich werde müssen uns zusammen zu arbeiten , 
wenn Sie interessiert sind , und ich versichere Ihnen, dass ich werden alle nützlichen Informationen und Dokumentation wie diese
Unternehmen zu schaffen braucht dringend Aufmerksamkeit,da es keine viel Zeit zu verlieren.
Bitte schreiben Sie mir direkt mit Ihrem Namen , Adresse , Telefon-und Faxnummer
auf diese E-Mail(lubbenhendrik1@aol.com)so kann ich erklären die Verfahren .

Grüße

Lubben Hendrik

^ permalink raw reply

* [PATCH] net: dsa: slave: Fix autoneg for phys on switch MDIO bus
From: Andrew Lunn @ 2014-11-05 18:47 UTC (permalink / raw)
  To: davem, f.fainelli; +Cc: netdev, Andrew Lunn

When the ports phys are connected to the switches internal MDIO bus,
we need to connect the phy to the slave netdev, otherwise
auto-negotiation etc, does not work.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---

Hi Florian

Is this the right fix?

What i found is that ports on mv88E6171 we coming up as 10/half.  If i
forced a renegotiation with ethtool -r lan0, the phy would then goto
1000/full.

The code for phys on internal switch MDIO buses never connect the phy
to the device, only phys listed in DT get connected via
of_phy_connect(). By connecting the phy, the state machine is active
and does an autoneg when the slave interface is opened.

net/dsa/slave.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 6d1817449c36..ab03e00ffe8f 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -489,11 +489,14 @@ static void dsa_slave_phy_setup(struct dsa_slave_priv *p,
 	/* We could not connect to a designated PHY, so use the switch internal
 	 * MDIO bus instead
 	 */
-	if (!p->phy)
+	if (!p->phy) {
 		p->phy = ds->slave_mii_bus->phy_map[p->port];
-	else
+		phy_connect_direct(slave_dev, p->phy, dsa_slave_adjust_link,
+				   p->phy_interface);
+	} else {
 		pr_info("attached PHY at address %d [%s]\n",
 			p->phy->addr, p->phy->drv->name);
+	}
 }
 
 int dsa_slave_suspend(struct net_device *slave_dev)
-- 
2.1.3

^ permalink raw reply related

* Re: getting rid of ->splice_write?
From: Al Viro @ 2014-11-05 18:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Miklos Szeredi, linux-fsdevel, netdev
In-Reply-To: <20140922173053.GA13109@infradead.org>

On Mon, Sep 22, 2014 at 10:30:53AM -0700, Christoph Hellwig wrote:
> Currently only /dev/null, fusedev and the socket code have a
> splice_write implementation that isn't iter_file_splice_write, and
> it seems like these three could easily be switched over if they
> implemented a ->write_iter.

Not really.  A minor nitpick is that you've missed port_fops_splice_write(),
but the real bitch isn't that and not even the sockets (see the fun with
iov_iter sendmsg/recvmsg work getting resurrected).  It's that NULL
->splice_write() means default_file_splice_write.  IOW, you'd need either
->write_iter() for _everything_ (with support of bvec-backed ones included,
since that's what iter_file_splice_write() will feed to ->write_iter()),
or you need to have do_splice_from() check if ->write_iter is NULL and
go for default_file_splice_write() instead of iter_file_splice_write().

The latter might be doable, but the former is really over the top - for
that we'd need to touch every driver instance of ->write() out there.
You want to do that, it's your funeral...

> Similarly it seems to be like we could kill ->splice_read by
> implementing an equivalent iteration over ->read_iter.

Hard to do.  I agree that we want to, but it'll take quite a bit of work
on iov_iter primitives, I'm afraid.  At the very least, we want a variant
of iov_iter that could steal pages.  Don't forget that a large part of
the rationale behind splice_read was the ability to play zero-copy games.

I'm not sure if it will happen this cycle; there's more than enough fun
on the net/* side.  _If_ that ends up done faster than I expect it to be,
->splice_read() is the obvious next target.

^ permalink raw reply

* Re: Can ndo_select_queue save data in skb->cb?
From: Eric Dumazet @ 2014-11-05 18:53 UTC (permalink / raw)
  To: Cong Wang; +Cc: James Yonan, netdev
In-Reply-To: <CAHA+R7M2k8_9C8=kdAJT1nSFDhKfV2HwvTo0nX-tVDJQvXjy+w@mail.gmail.com>

On Wed, 2014-11-05 at 10:38 -0800, Cong Wang wrote:
> That is only because qdisc layer saves data for bond,
> which means if you have more data to save, you have to put
> more into qdisc CB, it is just ugly.

Right, using skb->cb[] is ugly, dangerous, and all this.

If you can, avoiding it is safer.

Note that I pointed to the bonding case for a legal case,
I had no time to do a full explanation, feel free to do so ;)

^ permalink raw reply

* Payment
From: Finance Department @ 2014-11-05 18:49 UTC (permalink / raw)


Dear Recipient,

You have been awarded the sum of  8,000,000.00 (Eight Million Pounds sterling) with reference number 77100146 by office of the ministry of finance UK.Send us your personal details to deliver your funds.

Gloria Peter

^ permalink raw reply

* [PATCH 0/3] Add support for switch on Armanda 370 RD
From: Andrew Lunn @ 2014-11-05 19:01 UTC (permalink / raw)
  To: davem, Jason Cooper
  Cc: netdev, linux ARM, Thomas Petazzoni, tawfik, maxime.ripard,
	Andrew Lunn

This patchset adds support for the switch chip found on the Marvell
370 RD board. The first patch extends the current mv88e6171 driver to
also support the mv88e6172. The second patch described the switch in
the device tree of the board, and the last patch enables require build
options in mvebu_v7_defconfig.

Davem should take patch #1
Jason Cooper should take patch #2 and #3.

Andrew Lunn (3):
  dsa: mv88e6171: Add support for mv88e6172
  mvebu: 370 RD: Add support for the switch
  mvebu: defconfig: Enable the mv88E6171 switch driver

 arch/arm/boot/dts/armada-370-rd.dts  | 49 ++++++++++++++++++++++++++++++++----
 arch/arm/boot/dts/armada-370-xp.dtsi |  2 +-
 arch/arm/configs/mvebu_v7_defconfig  |  2 ++
 drivers/net/dsa/Kconfig              |  6 ++---
 drivers/net/dsa/mv88e6171.c          |  5 +++-
 5 files changed, 54 insertions(+), 10 deletions(-)

-- 
2.1.3

^ permalink raw reply

* [PATCH 2/3] mvebu: 370 RD: Add support for the switch
From: Andrew Lunn @ 2014-11-05 19:02 UTC (permalink / raw)
  To: davem, Jason Cooper
  Cc: netdev, linux ARM, Thomas Petazzoni, tawfik, maxime.ripard,
	Andrew Lunn
In-Reply-To: <1415214121-29286-1-git-send-email-andrew@lunn.ch>

The 370 rd has a 7 port, mv88E6182 switch, connected to eth1.  Add a
fixed-link subnode to the ethernet device tree node, to force
gigabit/full duplex.  Add a dsa node, with describing the four used
ports. This requires adding an alias to the mdio node, so it can be
referenced as a phandle.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 arch/arm/boot/dts/armada-370-rd.dts  | 49 ++++++++++++++++++++++++++++++++----
 arch/arm/boot/dts/armada-370-xp.dtsi |  2 +-
 2 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/arch/arm/boot/dts/armada-370-rd.dts b/arch/arm/boot/dts/armada-370-rd.dts
index f57a8f841498..2bb671a778db 100644
--- a/arch/arm/boot/dts/armada-370-rd.dts
+++ b/arch/arm/boot/dts/armada-370-rd.dts
@@ -85,10 +85,6 @@
 				phy0: ethernet-phy@0 {
 					reg = <0>;
 				};
-
-				phy1: ethernet-phy@1 {
-					reg = <1>;
-				};
 			};
 
 			ethernet@70000 {
@@ -100,8 +96,11 @@
 				pinctrl-0 = <&ge1_rgmii_pins>;
 				pinctrl-names = "default";
 				status = "okay";
-				phy = <&phy1>;
 				phy-mode = "rgmii-id";
+				fixed-link {
+					   speed = <1000>;
+					   full-duplex;
+				};
 			};
 
 			mvsdio@d4000 {
@@ -173,4 +172,44 @@
 			};
 		};
 	};
+
+	dsa@0 {
+		compatible = "marvell,dsa";
+		#address-cells = <2>;
+		#size-cells = <0>;
+
+		dsa,ethernet = <&eth1>;
+		dsa,mii-bus = <&mdio>;
+
+		switch@0 {
+			#address-cells = <1>;
+			#size-cells = <0>;
+			reg = <0x10 0>;	/* MDIO address 16, switch 0 in tree */
+
+			port@0 {
+				reg = <0>;
+				label = "lan0";
+			};
+
+			port@1 {
+			       reg = <1>;
+			       label = "lan1";
+			};
+
+			port@2 {
+			       reg = <2>;
+			       label = "lan2";
+			};
+
+			port@3 {
+			       reg = <3>;
+			       label = "lan3";
+			};
+
+			port@5 {
+			      reg = <5>;
+			      label = "cpu";
+			};
+		};
+	 };
  };
diff --git a/arch/arm/boot/dts/armada-370-xp.dtsi b/arch/arm/boot/dts/armada-370-xp.dtsi
index 83286ec9702c..4d84ca981fe0 100644
--- a/arch/arm/boot/dts/armada-370-xp.dtsi
+++ b/arch/arm/boot/dts/armada-370-xp.dtsi
@@ -232,7 +232,7 @@
 				status = "disabled";
 			};
 
-			mdio {
+			mdio: mdio {
 				#address-cells = <1>;
 				#size-cells = <0>;
 				compatible = "marvell,orion-mdio";
-- 
2.1.3

^ permalink raw reply related

* [PATCH 1/3] dsa: mv88e6171: Add support for mv88e6172
From: Andrew Lunn @ 2014-11-05 19:01 UTC (permalink / raw)
  To: davem, Jason Cooper
  Cc: netdev, linux ARM, Thomas Petazzoni, tawfik, maxime.ripard,
	Andrew Lunn
In-Reply-To: <1415214121-29286-1-git-send-email-andrew@lunn.ch>

The mv88e6172 is very similar to the mv88e6171.  So extend the
mv88e6171 driver to support it.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/dsa/Kconfig     | 6 +++---
 drivers/net/dsa/mv88e6171.c | 5 ++++-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/dsa/Kconfig b/drivers/net/dsa/Kconfig
index 9234d808cbb3..69bed5b2f078 100644
--- a/drivers/net/dsa/Kconfig
+++ b/drivers/net/dsa/Kconfig
@@ -37,13 +37,13 @@ config NET_DSA_MV88E6123_61_65
 	  ethernet switch chips.
 
 config NET_DSA_MV88E6171
-	tristate "Marvell 88E6171 ethernet switch chip support"
+	tristate "Marvell 88E6171/6172 ethernet switch chip support"
 	select NET_DSA
 	select NET_DSA_MV88E6XXX
 	select NET_DSA_TAG_EDSA
 	---help---
-	  This enables support for the Marvell 88E6171 ethernet switch
-	  chip.
+	  This enables support for the Marvell 88E6171/6172 ethernet switch
+	  chips.
 
 config NET_DSA_BCM_SF2
 	tristate "Broadcom Starfighter 2 Ethernet switch support"
diff --git a/drivers/net/dsa/mv88e6171.c b/drivers/net/dsa/mv88e6171.c
index 1020a7af67cf..d9919ce2b005 100644
--- a/drivers/net/dsa/mv88e6171.c
+++ b/drivers/net/dsa/mv88e6171.c
@@ -1,4 +1,4 @@
-/* net/dsa/mv88e6171.c - Marvell 88e6171 switch chip support
+/* net/dsa/mv88e6171.c - Marvell 88e6171/8826172 switch chip support
  * Copyright (c) 2008-2009 Marvell Semiconductor
  * Copyright (c) 2014 Claudio Leite <leitec@staticky.com>
  *
@@ -29,6 +29,8 @@ static char *mv88e6171_probe(struct device *host_dev, int sw_addr)
 	if (ret >= 0) {
 		if ((ret & 0xfff0) == 0x1710)
 			return "Marvell 88E6171";
+		if ((ret & 0xfff0) == 0x1720)
+			return "Marvell 88E6172";
 	}
 
 	return NULL;
@@ -409,3 +411,4 @@ struct dsa_switch_driver mv88e6171_switch_driver = {
 };
 
 MODULE_ALIAS("platform:mv88e6171");
+MODULE_ALIAS("platform:mv88e6172");
-- 
2.1.3

^ permalink raw reply related

* [PATCH 3/3] mvebu: defconfig: Enable the mv88E6171 switch driver
From: Andrew Lunn @ 2014-11-05 19:02 UTC (permalink / raw)
  To: davem, Jason Cooper
  Cc: netdev, linux ARM, Thomas Petazzoni, tawfik, maxime.ripard,
	Andrew Lunn
In-Reply-To: <1415214121-29286-1-git-send-email-andrew@lunn.ch>

This switch is used by the 370-rd. Enable it and support for
fixed-link phy configuration.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 arch/arm/configs/mvebu_v7_defconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/configs/mvebu_v7_defconfig b/arch/arm/configs/mvebu_v7_defconfig
index ed0a0d1be0f3..8548eee5cbeb 100644
--- a/arch/arm/configs/mvebu_v7_defconfig
+++ b/arch/arm/configs/mvebu_v7_defconfig
@@ -59,10 +59,12 @@ CONFIG_ATA=y
 CONFIG_AHCI_MVEBU=y
 CONFIG_SATA_MV=y
 CONFIG_NETDEVICES=y
+CONFIG_NET_DSA_MV88E6171=y
 CONFIG_MV643XX_ETH=y
 CONFIG_MVNETA=y
 CONFIG_MVPP2=y
 CONFIG_MARVELL_PHY=y
+CONFIG_FIXED_PHY=y
 CONFIG_MWIFIEX=y
 CONFIG_MWIFIEX_SDIO=y
 CONFIG_INPUT_EVDEV=y
-- 
2.1.3

^ permalink raw reply related

* Re: [PATCH] net: mv643xx_eth: reclaim TX skbs only when released by the HW
From: Ian Campbell @ 2014-11-05 19:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ezequiel Garcia, Karl Beldan, David Miller, Karl Beldan, netdev,
	Sebastian Hesselbarth
In-Reply-To: <1415202075.12206.0.camel@edumazet-glaptop2.roam.corp.google.com>

On Wed, 2014-11-05 at 07:41 -0800, Eric Dumazet wrote:
> On Wed, 2014-11-05 at 11:46 -0300, Ezequiel Garcia wrote:
> > Hi Karl,
> > 
> > On 11/05/2014 11:32 AM, Karl Beldan wrote:> From: Karl Beldan <karl.beldan@rivierawaves.com>
> > > 
> > > ATM, txq_reclaim will dequeue and free an skb for each tx desc released
> > > by the hw that has TX_LAST_DESC set. However, in case of TSO, each
> > > hw desc embedding the last part of a segment has TX_LAST_DESC set,
> > > losing the one-to-one 'last skb frag'/'TX_LAST_DESC set' correspondance,
> > > which causes data corruption.
> > > 
> > > Fix this by checking TX_ENABLE_INTERRUPT instead of TX_LAST_DESC, and
> > > warn when trying to dequeue from an empty txq (which can be symptomatic
> > > of releasing skbs prematurely).
> > > 
> > > Fixes: 3ae8f4e0b98 ('net: mv643xx_eth: Implement software TSO')
> > 
> > Although your change makes sense, this isn't fixing the issue for me,
> > neither did the previous one.
> > 
> > Ian: Can you double check that you have corruption *without* the patch,
> > and that the patch fixes the issue?

Yes, doing md5sum on an NFS mount with 18 files in it I see 8-9
corrupted ones without any patch applied and none with Karl's previous
one from <20141104142020.GA6728@magnum.frso.rivierawaves.com> in place.

This was consistent over repeated invocations of md5sum (mounting and
unmounting around each one).

I've just confirmed this again to be sure.

The system is a QNAP TS-41x (armel, Feroceon 88FR131)

> Have you also applied my patch ?

I've only applied that one patch from Karl onto the 3.16.7 kernel which
is currently in Debian. Debian hasn't got any other mv643xx patches
applied.

I'm building now with Karl's latest patch from
<1415197979-1702-1-git-send-email-karl.beldan@gmail.com> instead.

Ian.

^ permalink raw reply

* RE: Kernel Oops in __inet_twsk_kill()
From: Charley (Hao Chuan) Chu @ 2014-11-05 19:16 UTC (permalink / raw)
  To: Cong Wang, Daniel Borkmann; +Cc: netdev
In-Reply-To: <CAHA+R7PAdVS=TBd8ZDAW3wdadjNEVjiyemrs54wNp8fdtCMvZQ@mail.gmail.com>

Thanks Daniel and Cong,

The problem has been fixed. It is introduced by a third party patch, which decreases the refcnt of timewait socket. 

Charley

-----Original Message-----
From: Cong Wang [mailto:cwang@twopensource.com] 
Sent: Wednesday, November 05, 2014 10:00 AM
To: Daniel Borkmann
Cc: Charley (Hao Chuan) Chu; netdev
Subject: Re: Kernel Oops in __inet_twsk_kill()

On Wed, Nov 5, 2014 at 8:00 AM, Daniel Borkmann <borkmann@iogearbox.net> wrote:
> [ moving to netdev ]
>
> -------- Original Message --------
> Subject: Kernel Oops in __inet_twsk_kill()
> Date: Tue, 4 Nov 2014 23:47:18 +0000
> From: Charley (Hao Chuan) Chu <charley.chu@broadcom.com>
> To: linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>
>
> We have situation on our system. It brings the network interface up and down
> every
> a few seconds. Eventually, it brings down the system - the kernel crashed
> due to BUG
> on in __inet_twsk_kill(). The debug message show following call flow.
>
> 1) time-wait socket is created by tcp_time_wait() when the socket gets into
> "TIME_WAIT" state.
>     inet_twsk_alloc()               - refcnt= 0
>     inet_twsk_hashdance()  - refcnt = 3
>     inet_twsk_schedule()      - refcnt = 4
>     inet_twsk_put()                 - refcnt = 3
> 2) tcp_v4_timewait_ack() is called when sync is received
>     inet_twsk_put()                  - refcnt= 2      <== where we thing the
> problem is
>     occasionally, second sync is received, so the inet_twsk_put is called
> twice - refcnt = 1
> 3) twdr_do_twkill_work() is called when timed out
>     call __inet_twsk_kill - BUG_ON!!! as refcnt=2 (supposed to be 3).
>     call inet_twsk_put()
>
> In a normal case, the callflow only has step 1 and step 3.  Our
> understanding is
> the time-wait socket has three references - ehash, bhash and timer death
> row. In
> step 2, none of them are touched. Can anyone here explain to us why the
> inet_twsk_put()
> is called in tcp_v4_timewait_ack()?
>

It has been there for a rather long time, but this doesn't mean it is
correct. Its caller calls inet_twsk_put() on error path, so smells wrong
to call it on non-error path. But I don't look into this.

^ permalink raw reply

* Re: Can ndo_select_queue save data in skb->cb?
From: John Fastabend @ 2014-11-05 19:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Cong Wang, James Yonan, netdev
In-Reply-To: <1415213589.13896.14.camel@edumazet-glaptop2.roam.corp.google.com>

On 11/05/2014 10:53 AM, Eric Dumazet wrote:
> On Wed, 2014-11-05 at 10:38 -0800, Cong Wang wrote:
>> That is only because qdisc layer saves data for bond,
>> which means if you have more data to save, you have to put
>> more into qdisc CB, it is just ugly.
>
> Right, using skb->cb[] is ugly, dangerous, and all this.
>
> If you can, avoiding it is safer.
>
> Note that I pointed to the bonding case for a legal case,
> I had no time to do a full explanation, feel free to do so ;)
>

I would prefer to get rid of select_queue() versus expand
it. It mostly seems to be used for hacks around the stack
for FCoE/DCB/whatever. The wireless drivers and bonding/team
drivers are two examples where its not clear how to remove
it. See xen-netfront.c for an example where I'm not sure
why its needed at all.

James, why is your driver feeding itself data via cb? Maybe
we could solve the problem some other way.

Thanks,
John

>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
John Fastabend         Intel Corporation

^ permalink raw reply

* [PATCH net v4] ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs
From: Daniel Borkmann @ 2014-11-05 19:27 UTC (permalink / raw)
  To: davem
  Cc: lw1a2.jing, netdev, Eric Dumazet, Hannes Frederic Sowa,
	David L Stevens

It has been reported that generating an MLD listener report on
devices with large MTUs (e.g. 9000) and a high number of IPv6
addresses can trigger a skb_over_panic():

skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
dev:port1
 ------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:100!
invalid opcode: 0000 [#1] SMP
Modules linked in: ixgbe(O)
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
[...]
Call Trace:
 <IRQ>
 [<ffffffff80578226>] ? skb_put+0x3a/0x3b
 [<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
 [<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
 [<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
 [<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
 [<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
 [<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
 [<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
 [<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
 [<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
 [<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70

mld_newpack() skb allocations are usually requested with dev->mtu
in size, since commit 72e09ad107e7 ("ipv6: avoid high order allocations")
we have changed the limit in order to be less likely to fail.

However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
macros, which determine if we may end up doing an skb_put() for
adding another record. To avoid possible fragmentation, we check
the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
assumption as the actual max allocation size can be much smaller.

The IGMP case doesn't have this issue as commit 57e1ab6eaddc
("igmp: refine skb allocations") stores the allocation size in
the cb[].

Set a reserved_tailroom to make it fit into the MTU and use
skb_availroom() helper instead. This also allows to get rid of
igmp_skb_size().

Reported-by: Wei Liu <lw1a2.jing@gmail.com>
Fixes: 72e09ad107e7 ("ipv6: avoid high order allocations")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: David L Stevens <david.stevens@oracle.com>
---
 v3->v4:
  - Reduced noise from previous one as discussed
 v2->v3:
  - Still had a discussion w/ Hannes and improved the code a bit as
    suggested to make it more clear to read
 v1->v2:
  - Don't introduce skb_nofrag_tailroom(), but reuse skb_availroom()
    as suggested by Eric

 net/ipv4/igmp.c  | 11 +++++------
 net/ipv6/mcast.c |  9 +++++----
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index fb70e3e..bb15d0e 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -318,9 +318,7 @@ igmp_scount(struct ip_mc_list *pmc, int type, int gdeleted, int sdeleted)
 	return scount;
 }
 
-#define igmp_skb_size(skb) (*(unsigned int *)((skb)->cb))
-
-static struct sk_buff *igmpv3_newpack(struct net_device *dev, int size)
+static struct sk_buff *igmpv3_newpack(struct net_device *dev, unsigned int mtu)
 {
 	struct sk_buff *skb;
 	struct rtable *rt;
@@ -330,6 +328,7 @@ static struct sk_buff *igmpv3_newpack(struct net_device *dev, int size)
 	struct flowi4 fl4;
 	int hlen = LL_RESERVED_SPACE(dev);
 	int tlen = dev->needed_tailroom;
+	unsigned int size = mtu;
 
 	while (1) {
 		skb = alloc_skb(size + hlen + tlen,
@@ -341,7 +340,6 @@ static struct sk_buff *igmpv3_newpack(struct net_device *dev, int size)
 			return NULL;
 	}
 	skb->priority = TC_PRIO_CONTROL;
-	igmp_skb_size(skb) = size;
 
 	rt = ip_route_output_ports(net, &fl4, NULL, IGMPV3_ALL_MCR, 0,
 				   0, 0,
@@ -354,6 +352,8 @@ static struct sk_buff *igmpv3_newpack(struct net_device *dev, int size)
 	skb_dst_set(skb, &rt->dst);
 	skb->dev = dev;
 
+	skb->reserved_tailroom = skb_end_offset(skb) -
+				 min(mtu, skb_end_offset(skb));
 	skb_reserve(skb, hlen);
 
 	skb_reset_network_header(skb);
@@ -423,8 +423,7 @@ static struct sk_buff *add_grhead(struct sk_buff *skb, struct ip_mc_list *pmc,
 	return skb;
 }
 
-#define AVAILABLE(skb) ((skb) ? ((skb)->dev ? igmp_skb_size(skb) - (skb)->len : \
-	skb_tailroom(skb)) : 0)
+#define AVAILABLE(skb)	((skb) ? skb_availroom(skb) : 0)
 
 static struct sk_buff *add_grec(struct sk_buff *skb, struct ip_mc_list *pmc,
 	int type, int gdeleted, int sdeleted)
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 9648de2..ed2c4e4 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1550,7 +1550,7 @@ static void ip6_mc_hdr(struct sock *sk, struct sk_buff *skb,
 	hdr->daddr = *daddr;
 }
 
-static struct sk_buff *mld_newpack(struct inet6_dev *idev, int size)
+static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu)
 {
 	struct net_device *dev = idev->dev;
 	struct net *net = dev_net(dev);
@@ -1561,13 +1561,13 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, int size)
 	const struct in6_addr *saddr;
 	int hlen = LL_RESERVED_SPACE(dev);
 	int tlen = dev->needed_tailroom;
+	unsigned int size = mtu + hlen + tlen;
 	int err;
 	u8 ra[8] = { IPPROTO_ICMPV6, 0,
 		     IPV6_TLV_ROUTERALERT, 2, 0, 0,
 		     IPV6_TLV_PADN, 0 };
 
 	/* we assume size > sizeof(ra) here */
-	size += hlen + tlen;
 	/* limit our allocations to order-0 page */
 	size = min_t(int, size, SKB_MAX_ORDER(0, 0));
 	skb = sock_alloc_send_skb(sk, size, 1, &err);
@@ -1576,6 +1576,8 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, int size)
 		return NULL;
 
 	skb->priority = TC_PRIO_CONTROL;
+	skb->reserved_tailroom = skb_end_offset(skb) -
+				 min(mtu, skb_end_offset(skb));
 	skb_reserve(skb, hlen);
 
 	if (__ipv6_get_lladdr(idev, &addr_buf, IFA_F_TENTATIVE)) {
@@ -1690,8 +1692,7 @@ static struct sk_buff *add_grhead(struct sk_buff *skb, struct ifmcaddr6 *pmc,
 	return skb;
 }
 
-#define AVAILABLE(skb) ((skb) ? ((skb)->dev ? (skb)->dev->mtu - (skb)->len : \
-	skb_tailroom(skb)) : 0)
+#define AVAILABLE(skb)	((skb) ? skb_availroom(skb) : 0)
 
 static struct sk_buff *add_grec(struct sk_buff *skb, struct ifmcaddr6 *pmc,
 	int type, int gdeleted, int sdeleted, int crsend)
-- 
1.7.11.7

^ permalink raw reply related

* Re: [linux-nics] [PATCH net 3/5] fm10k: Implement ndo_gso_check()
From: Jeff Kirsher @ 2014-11-05 19:36 UTC (permalink / raw)
  To: Joe Stringer, Matthew Vick
  Cc: linux.nics, shahed.shaikh, sathya.perla, Linux Netdev List, Vadai,
	Linux Kernel, Tom Herbert, dept-gelinuxnicdev, Or Gerlitz, Amir
In-Reply-To: <CANr6G5xBehKGVozOAM=m8CXY7Q8_kPu-s5zgwho4=npv=bQsrg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1870 bytes --]

On Wed, 2014-11-05 at 10:26 -0800, Joe Stringer wrote:
> On 5 November 2014 04:47, Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
> >
> > On Wed, 2014-11-05 at 14:44 +0200, Or Gerlitz wrote:
> > > On Wed, Nov 5, 2014 at 2:34 PM, Jeff Kirsher
> > > <jeffrey.t.kirsher@intel.com> wrote:
> > > > On Tue, 2014-11-04 at 13:56 -0800, Joe Stringer wrote:
> > > >> ndo_gso_check() was recently introduced to allow NICs to report the
> > > >> offloading support that they have on a per-skb basis. Add an
> > > >> implementation for this driver which checks for something that looks
> > > >> like VXLAN.
> > > >>
> > > >> Implementation shamelessly stolen from Tom Herbert:
> > > >> http://thread.gmane.org/gmane.linux.network/332428/focus=333111
> > > >>
> > > >> Signed-off-by: Joe Stringer <joestringer@nicira.com>
> > > >> ---
> > > >> Should this driver report support for GSO on packets with tunnel
> > > >> headers
> > > >> up to 64B like the i40e driver does?
> > > >> ---
> > > >>  drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |   12 ++++++++++++
> > > >>  1 file changed, 12 insertions(+)
> > > >
> > > > Thanks Joe, I will add your patch to my queue.
> > >
> > > Hi Jeff, please see my comment on patch 0/5, we're essentially
> > > replicating the same helper four different times (fm10k, mlx4, benet,
> > > qlgc) - I don't see the point in doing so. I asked Joe to come up with
> > > one generic helper and then to pick it up by the four drivers, makes
> > > sense?
> >
> > Yeah, I just saw your reply Or.  Ok, I will await an update to Joe's
> > series, thanks!
> 
> Thanks Or/Jeff.
> 
> There is also the question in the commit message above, perhaps fm10k
> support is a bit different - wasn't sure who to ask regarding that.

Matthew Vick is the fm10k maintainer now and can answer any fm10k
questions you may have.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [stable request <= 3.11] net/mlx4_en: Fix BlueFlame race
From: Vinson Lee @ 2014-11-05 19:38 UTC (permalink / raw)
  To: stable
  Cc: cwang, ben, Amir Vadai, Or Gerlitz, Jack Morgenstein,
	Eugenia Emantayev, Matan Barak, netdev, David S. Miller
In-Reply-To: <20141103.150917.1015521239616746570.davem@davemloft.net>

On Mon, Nov 3, 2014 at 12:09 PM, David Miller <davem@davemloft.net> wrote:
> From: Cong Wang <cwang@twopensource.com>
> Date: Mon, 3 Nov 2014 09:22:18 -0800
>
>> On Sat, Nov 1, 2014 at 10:41 AM, David Miller <davem@davemloft.net> wrote:
>>>
>>> There is no documented way nor do I wish to state anything so strictly.
>>> I want maximum flexibility for such a time consuming task.
>>>
>>> I tend to go back 3 or 4 releases at most, and it really depends upon
>>> the difficulty of the backports and my own time constraints.
>>
>> You should really offload to developers, otherwise too much work for you. :)
>
> That's exactly what I am doing by having the -stable maintainers for older
> releases deal with the backports and other pains.

+ stable

Please backport upstream commit
2d4b646613d6b12175b017aca18113945af1faf3 "net/mlx4_en: Fix BlueFlame
race" to stable kernels. David is no longer supporting 3.10 and older.

Please refer to http://marc.info/?l=linux-netdev&m=141366689811844 for
previous comments in thread.

Cheers,
Vinson

^ permalink raw reply

* Re: Kernel Oops in __inet_twsk_kill()
From: David Miller @ 2014-11-05 19:39 UTC (permalink / raw)
  To: charley.chu; +Cc: cwang, borkmann, netdev
In-Reply-To: <FB8A4655DFD2B34DB16AE06DDDD6C0E231A71CE3@SJEXCHMB12.corp.ad.broadcom.com>

From: "Charley (Hao Chuan) Chu" <charley.chu@broadcom.com>
Date: Wed, 5 Nov 2014 19:16:09 +0000

> The problem has been fixed. It is introduced by a third party patch,
> which decreases the refcnt of timewait socket.

This is something that really drives me nuts, seriously?

Never report problems to us when you have such patches applied.

It is an even larger crime to not mention this at all from the
beginning, because we might start to investigate and waste our
precious time doing so.

^ permalink raw reply

* [patch net-next] sched: fix act file names in header comment
From: Jiri Pirko @ 2014-11-05 19:51 UTC (permalink / raw)
  To: netdev; +Cc: davem, jhs

Fixes: 4bba3925 ("[PKT_SCHED]: Prefix tc actions with act_")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/sched/act_gact.c   | 2 +-
 net/sched/act_ipt.c    | 2 +-
 net/sched/act_mirred.c | 2 +-
 net/sched/act_pedit.c  | 2 +-
 net/sched/act_police.c | 2 +-
 net/sched/act_simple.c | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index d6bcbd9..7fffc22 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -1,5 +1,5 @@
 /*
- * net/sched/gact.c	Generic actions
+ * net/sched/act_gact.c		Generic actions
  *
  *		This program is free software; you can redistribute it and/or
  *		modify it under the terms of the GNU General Public License
diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
index 8a64a07..cbc8dd7 100644
--- a/net/sched/act_ipt.c
+++ b/net/sched/act_ipt.c
@@ -1,5 +1,5 @@
 /*
- * net/sched/ipt.c     iptables target interface
+ * net/sched/act_ipt.c		iptables target interface
  *
  *TODO: Add other tables. For now we only support the ipv4 table targets
  *
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index eb48306..5953517 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -1,5 +1,5 @@
 /*
- * net/sched/mirred.c	packet mirroring and redirect actions
+ * net/sched/act_mirred.c	packet mirroring and redirect actions
  *
  *		This program is free software; you can redistribute it and/or
  *		modify it under the terms of the GNU General Public License
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index 5f9bcb2..59649d5 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -1,5 +1,5 @@
 /*
- * net/sched/pedit.c	Generic packet editor
+ * net/sched/act_pedit.c	Generic packet editor
  *
  *		This program is free software; you can redistribute it and/or
  *		modify it under the terms of the GNU General Public License
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 69791ca..9a1c42a 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -1,5 +1,5 @@
 /*
- * net/sched/police.c	Input police filter.
+ * net/sched/act_police.c	Input police filter
  *
  *		This program is free software; you can redistribute it and/or
  *		modify it under the terms of the GNU General Public License
diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c
index 992c231..6a8d948 100644
--- a/net/sched/act_simple.c
+++ b/net/sched/act_simple.c
@@ -1,5 +1,5 @@
 /*
- * net/sched/simp.c	Simple example of an action
+ * net/sched/act_simple.c	Simple example of an action
  *
  *		This program is free software; you can redistribute it and/or
  *		modify it under the terms of the GNU General Public License
-- 
1.9.3

^ permalink raw reply related

* Re: [PATCHv2 net-next 0/3] RDMA/cxgb4,cxgb4vf,cxgb4i,csiostor: Cleanup macros
From: David Miller @ 2014-11-05 19:54 UTC (permalink / raw)
  To: hariprasad-ut6Up61K2wZBDgjK7y7TUQ
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, roland-BHEL68pLQRGGvPXPguhicg,
	JBottomley-bzQdu9zFT3WakBO8gow8eQ, hch-wEGCiKHe2LqWVfeAwA7xHQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	leedom-ut6Up61K2wZBDgjK7y7TUQ, praveenm-ut6Up61K2wZBDgjK7y7TUQ,
	anish-ut6Up61K2wZBDgjK7y7TUQ, nirranjan-ut6Up61K2wZBDgjK7y7TUQ,
	kumaras-ut6Up61K2wZBDgjK7y7TUQ
In-Reply-To: <1415069457-22277-1-git-send-email-hariprasad-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

From: Hariprasad Shenai <hariprasad-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Date: Tue,  4 Nov 2014 08:20:54 +0530

> It's not really the "hardware" which generates these hardware constant symbolic
> macros/register defines of course, it's scripts developed by the hardware team.
> Various patches have ended up changing the style of the symbolic macros/register
> defines and some of them used the macros/register defines that matches the
> output of the script from the hardware team.

We've told you that we don't care what format your internal whatever uses
for these macros.

We have standards, tastes, and desires and reasons for naming macros
in a certain way in upstream kernel code.

I consider it flat out unacceptable to use macros with one letter
prefixes like "S_".  You simply should not do this.

Why?

Because you are polluting the global name space, that's why.

You should only define macros with very-likely-to-be-unique prefixes
otherwise you and some arch/* header file are going to define the same
thing.

This is not just theory, it has happened in the past.

I'm not applying this series, fixup your macros properly.  And to
repeat, complaints about how your internal tools generate things
will fall on deaf ears.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [GIT net-next] Open vSwitch
From: Pravin B Shelar @ 2014-11-04  6:00 UTC (permalink / raw)
  To: davem; +Cc: netdev

First two patches are related to OVS MPLS support. Rest of patches
are various refactoring and minor improvements to openvswitch.

----------------------------------------------------------------

The following changes since commit 30349bdbc4da5ecf0efa25556e3caff9c9b8c5f7:

  net: phy: spi_ks8995: remove sysfs bin file by registered attribute (2014-11-04 17:18:45 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch.git net_next_ovs

for you to fetch changes up to fb90c8d8d5169d4dfbe5896721367e1904638a91:

  openvswitch: Avoid NULL mask check while building mask (2014-11-04 22:20:33 -0800)

----------------------------------------------------------------
Andy Zhou (2):
      openvswitch: refactor do_output() to move NULL check out of fast path
      openvswitch: Refactor get_dp() function into multiple access APIs.

Chunhe Li (1):
      openvswitch: Drop packets when interdev is not up

Jarno Rajahalme (1):
      openvswitch: Fix the type of struct ovs_key_nd nd_target field.

Jesse Gross (1):
      openvswitch: Additional logging for -EINVAL on flow setups.

Joe Stringer (3):
      openvswitch: Remove redundant tcp_flags code.
      openvswitch: Refactor ovs_flow_cmd_fill_info().
      openvswitch: Move key_attr_size() to flow_netlink.h.

Lorand Jakab (1):
      openvswitch: Remove flow member from struct ovs_skb_cb

Pravin B Shelar (4):
      net: Remove MPLS GSO feature.
      openvswitch: Move table destroy to dp-rcu callback.
      openvswitch: Refactor action alloc and copy api.
      openvswitch: Avoid NULL mask check while building mask

Simon Horman (1):
      openvswitch: Add basic MPLS support to kernel

 include/linux/netdev_features.h      |   7 +-
 include/linux/netdevice.h            |   1 -
 include/linux/skbuff.h               |   3 -
 include/net/mpls.h                   |  39 +++++
 include/uapi/linux/openvswitch.h     |  38 ++++-
 net/core/dev.c                       |   3 +-
 net/core/ethtool.c                   |   1 -
 net/ipv4/af_inet.c                   |   1 -
 net/ipv4/tcp_offload.c               |   1 -
 net/ipv4/udp_offload.c               |   3 +-
 net/ipv6/ip6_offload.c               |   1 -
 net/ipv6/udp_offload.c               |   3 +-
 net/mpls/mpls_gso.c                  |   3 +-
 net/openvswitch/Kconfig              |   1 +
 net/openvswitch/actions.c            | 136 ++++++++++++---
 net/openvswitch/datapath.c           | 215 ++++++++++++-----------
 net/openvswitch/datapath.h           |   4 +-
 net/openvswitch/flow.c               |  30 ++++
 net/openvswitch/flow.h               |  17 +-
 net/openvswitch/flow_netlink.c       | 322 +++++++++++++++++++++++++----------
 net/openvswitch/flow_netlink.h       |   5 +-
 net/openvswitch/flow_table.c         |  11 +-
 net/openvswitch/flow_table.h         |   2 +-
 net/openvswitch/vport-internal_dev.c |   5 +
 24 files changed, 606 insertions(+), 246 deletions(-)
 create mode 100644 include/net/mpls.h

^ permalink raw reply

* [PATCH net-next 01/14] net: Remove MPLS GSO feature.
From: Pravin B Shelar @ 2014-11-04  6:00 UTC (permalink / raw)
  To: davem; +Cc: netdev, Pravin B Shelar

Device can export MPLS GSO support in dev->mpls_features same way
it export vlan features in dev->vlan_features. So it is safe to
remove this redundant flag.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 include/linux/netdev_features.h | 7 ++-----
 include/linux/netdevice.h       | 1 -
 include/linux/skbuff.h          | 3 ---
 net/core/ethtool.c              | 1 -
 net/ipv4/af_inet.c              | 1 -
 net/ipv4/tcp_offload.c          | 1 -
 net/ipv4/udp_offload.c          | 3 +--
 net/ipv6/ip6_offload.c          | 1 -
 net/ipv6/udp_offload.c          | 3 +--
 net/mpls/mpls_gso.c             | 3 +--
 10 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index dcfdecb..b7e52ba 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -47,9 +47,8 @@ enum {
 	NETIF_F_GSO_SIT_BIT,		/* ... SIT tunnel with TSO */
 	NETIF_F_GSO_UDP_TUNNEL_BIT,	/* ... UDP TUNNEL with TSO */
 	NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT,/* ... UDP TUNNEL with TSO & CSUM */
-	NETIF_F_GSO_MPLS_BIT,		/* ... MPLS segmentation */
 	/**/NETIF_F_GSO_LAST =		/* last bit, see GSO_MASK */
-		NETIF_F_GSO_MPLS_BIT,
+		NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT,
 
 	NETIF_F_FCOE_CRC_BIT,		/* FCoE CRC32 */
 	NETIF_F_SCTP_CSUM_BIT,		/* SCTP checksum offload */
@@ -118,7 +117,6 @@ enum {
 #define NETIF_F_GSO_SIT		__NETIF_F(GSO_SIT)
 #define NETIF_F_GSO_UDP_TUNNEL	__NETIF_F(GSO_UDP_TUNNEL)
 #define NETIF_F_GSO_UDP_TUNNEL_CSUM __NETIF_F(GSO_UDP_TUNNEL_CSUM)
-#define NETIF_F_GSO_MPLS	__NETIF_F(GSO_MPLS)
 #define NETIF_F_HW_VLAN_STAG_FILTER __NETIF_F(HW_VLAN_STAG_FILTER)
 #define NETIF_F_HW_VLAN_STAG_RX	__NETIF_F(HW_VLAN_STAG_RX)
 #define NETIF_F_HW_VLAN_STAG_TX	__NETIF_F(HW_VLAN_STAG_TX)
@@ -181,7 +179,6 @@ enum {
 				 NETIF_F_GSO_IPIP |			\
 				 NETIF_F_GSO_SIT |			\
 				 NETIF_F_GSO_UDP_TUNNEL |		\
-				 NETIF_F_GSO_UDP_TUNNEL_CSUM |		\
-				 NETIF_F_GSO_MPLS)
+				 NETIF_F_GSO_UDP_TUNNEL_CSUM)
 
 #endif	/* _LINUX_NETDEV_FEATURES_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5ed05bd..a64c023 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3583,7 +3583,6 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
 	BUILD_BUG_ON(SKB_GSO_SIT     != (NETIF_F_GSO_SIT >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL != (NETIF_F_GSO_UDP_TUNNEL >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL_CSUM != (NETIF_F_GSO_UDP_TUNNEL_CSUM >> NETIF_F_GSO_SHIFT));
-	BUILD_BUG_ON(SKB_GSO_MPLS    != (NETIF_F_GSO_MPLS >> NETIF_F_GSO_SHIFT));
 
 	return (features & feature) == feature;
 }
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 5ad9675..03edc07 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -370,9 +370,6 @@ enum {
 	SKB_GSO_UDP_TUNNEL = 1 << 10,
 
 	SKB_GSO_UDP_TUNNEL_CSUM = 1 << 11,
-
-	SKB_GSO_MPLS = 1 << 12,
-
 };
 
 #if BITS_PER_LONG > 32
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 06dfb29..b0f84f5 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -84,7 +84,6 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
 	[NETIF_F_GSO_IPIP_BIT] =	 "tx-ipip-segmentation",
 	[NETIF_F_GSO_SIT_BIT] =		 "tx-sit-segmentation",
 	[NETIF_F_GSO_UDP_TUNNEL_BIT] =	 "tx-udp_tnl-segmentation",
-	[NETIF_F_GSO_MPLS_BIT] =	 "tx-mpls-segmentation",
 
 	[NETIF_F_FCOE_CRC_BIT] =         "tx-checksum-fcoe-crc",
 	[NETIF_F_SCTP_CSUM_BIT] =        "tx-checksum-sctp",
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 8b7fe5b..b7340a3 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1222,7 +1222,6 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 		       SKB_GSO_TCPV6 |
 		       SKB_GSO_UDP_TUNNEL |
 		       SKB_GSO_UDP_TUNNEL_CSUM |
-		       SKB_GSO_MPLS |
 		       0)))
 		goto out;
 
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 5b90f2f..7902bbe 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -94,7 +94,6 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 			       SKB_GSO_GRE_CSUM |
 			       SKB_GSO_IPIP |
 			       SKB_GSO_SIT |
-			       SKB_GSO_MPLS |
 			       SKB_GSO_UDP_TUNNEL |
 			       SKB_GSO_UDP_TUNNEL_CSUM |
 			       0) ||
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 6480cea..7da2a70 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -173,8 +173,7 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
 				      SKB_GSO_UDP_TUNNEL |
 				      SKB_GSO_UDP_TUNNEL_CSUM |
 				      SKB_GSO_IPIP |
-				      SKB_GSO_GRE | SKB_GSO_GRE_CSUM |
-				      SKB_GSO_MPLS) ||
+				      SKB_GSO_GRE | SKB_GSO_GRE_CSUM) ||
 			     !(type & (SKB_GSO_UDP))))
 			goto out;
 
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index a071563..27f3ea4 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -78,7 +78,6 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 		       SKB_GSO_SIT |
 		       SKB_GSO_UDP_TUNNEL |
 		       SKB_GSO_UDP_TUNNEL_CSUM |
-		       SKB_GSO_MPLS |
 		       SKB_GSO_TCPV6 |
 		       0)))
 		goto out;
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 6b8f543..f1c4ab4 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -45,8 +45,7 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
 				      SKB_GSO_GRE |
 				      SKB_GSO_GRE_CSUM |
 				      SKB_GSO_IPIP |
-				      SKB_GSO_SIT |
-				      SKB_GSO_MPLS) ||
+				      SKB_GSO_SIT) ||
 			     !(type & (SKB_GSO_UDP))))
 			goto out;
 
diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
index e3545f2..ca27837 100644
--- a/net/mpls/mpls_gso.c
+++ b/net/mpls/mpls_gso.c
@@ -34,8 +34,7 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
 				  SKB_GSO_TCP_ECN |
 				  SKB_GSO_GRE |
 				  SKB_GSO_GRE_CSUM |
-				  SKB_GSO_IPIP |
-				  SKB_GSO_MPLS)))
+				  SKB_GSO_IPIP)))
 		goto out;
 
 	/* Setup inner SKB. */
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next 02/14] openvswitch: Add basic MPLS support to kernel
From: Pravin B Shelar @ 2014-11-04  6:00 UTC (permalink / raw)
  To: davem
  Cc: netdev, Simon Horman, Ravi K, Leo Alterman, Isaku Yamahata,
	Joe Stringer, Jesse Gross, Pravin B Shelar

From: Simon Horman <horms@verge.net.au>

Allow datapath to recognize and extract MPLS labels into flow keys
and execute actions which push, pop, and set labels on packets.

Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.

Cc: Ravi K <rkerur@gmail.com>
Cc: Leo Alterman <lalterman@nicira.com>
Cc: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 include/net/mpls.h               |  39 +++++++++++
 include/uapi/linux/openvswitch.h |  32 +++++++++
 net/core/dev.c                   |   3 +-
 net/openvswitch/Kconfig          |   1 +
 net/openvswitch/actions.c        | 106 ++++++++++++++++++++++++++++-
 net/openvswitch/datapath.c       |   6 +-
 net/openvswitch/flow.c           |  30 +++++++++
 net/openvswitch/flow.h           |  17 +++--
 net/openvswitch/flow_netlink.c   | 139 ++++++++++++++++++++++++++++++++++-----
 net/openvswitch/flow_netlink.h   |   2 +-
 10 files changed, 345 insertions(+), 30 deletions(-)
 create mode 100644 include/net/mpls.h

diff --git a/include/net/mpls.h b/include/net/mpls.h
new file mode 100644
index 0000000..5b3b5ad
--- /dev/null
+++ b/include/net/mpls.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright (c) 2014 Nicira, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#ifndef _NET_MPLS_H
+#define _NET_MPLS_H 1
+
+#include <linux/if_ether.h>
+#include <linux/netdevice.h>
+
+#define MPLS_HLEN 4
+
+static inline bool eth_p_mpls(__be16 eth_type)
+{
+	return eth_type == htons(ETH_P_MPLS_UC) ||
+		eth_type == htons(ETH_P_MPLS_MC);
+}
+
+/*
+ * For non-MPLS skbs this will correspond to the network header.
+ * For MPLS skbs it will be before the network_header as the MPLS
+ * label stack lies between the end of the mac header and the network
+ * header. That is, for MPLS skbs the end of the mac header
+ * is the top of the MPLS label stack.
+ */
+static inline unsigned char *skb_mpls_header(struct sk_buff *skb)
+{
+	return skb_mac_header(skb) + skb->mac_len;
+}
+#endif
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 435eabc..631056b 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -293,6 +293,9 @@ enum ovs_key_attr {
 	OVS_KEY_ATTR_DP_HASH,      /* u32 hash value. Value 0 indicates the hash
 				   is not computed by the datapath. */
 	OVS_KEY_ATTR_RECIRC_ID, /* u32 recirc id */
+	OVS_KEY_ATTR_MPLS,      /* array of struct ovs_key_mpls.
+				 * The implementation may restrict
+				 * the accepted length of the array. */
 
 #ifdef __KERNEL__
 	OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ovs_tunnel_info */
@@ -340,6 +343,10 @@ struct ovs_key_ethernet {
 	__u8	 eth_dst[ETH_ALEN];
 };
 
+struct ovs_key_mpls {
+	__be32 mpls_lse;
+};
+
 struct ovs_key_ipv4 {
 	__be32 ipv4_src;
 	__be32 ipv4_dst;
@@ -484,6 +491,19 @@ enum ovs_userspace_attr {
 #define OVS_USERSPACE_ATTR_MAX (__OVS_USERSPACE_ATTR_MAX - 1)
 
 /**
+ * struct ovs_action_push_mpls - %OVS_ACTION_ATTR_PUSH_MPLS action argument.
+ * @mpls_lse: MPLS label stack entry to push.
+ * @mpls_ethertype: Ethertype to set in the encapsulating ethernet frame.
+ *
+ * The only values @mpls_ethertype should ever be given are %ETH_P_MPLS_UC and
+ * %ETH_P_MPLS_MC, indicating MPLS unicast or multicast. Other are rejected.
+ */
+struct ovs_action_push_mpls {
+	__be32 mpls_lse;
+	__be16 mpls_ethertype; /* Either %ETH_P_MPLS_UC or %ETH_P_MPLS_MC */
+};
+
+/**
  * struct ovs_action_push_vlan - %OVS_ACTION_ATTR_PUSH_VLAN action argument.
  * @vlan_tpid: Tag protocol identifier (TPID) to push.
  * @vlan_tci: Tag control identifier (TCI) to push.  The CFI bit must be set
@@ -534,6 +554,15 @@ struct ovs_action_hash {
  * @OVS_ACTION_ATTR_POP_VLAN: Pop the outermost 802.1Q header off the packet.
  * @OVS_ACTION_ATTR_SAMPLE: Probabilitically executes actions, as specified in
  * the nested %OVS_SAMPLE_ATTR_* attributes.
+ * @OVS_ACTION_ATTR_PUSH_MPLS: Push a new MPLS label stack entry onto the
+ * top of the packets MPLS label stack.  Set the ethertype of the
+ * encapsulating frame to either %ETH_P_MPLS_UC or %ETH_P_MPLS_MC to
+ * indicate the new packet contents.
+ * @OVS_ACTION_ATTR_POP_MPLS: Pop an MPLS label stack entry off of the
+ * packet's MPLS label stack.  Set the encapsulating frame's ethertype to
+ * indicate the new packet contents. This could potentially still be
+ * %ETH_P_MPLS if the resulting MPLS label stack is not empty.  If there
+ * is no MPLS label stack, as determined by ethertype, no action is taken.
  *
  * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
  * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
@@ -550,6 +579,9 @@ enum ovs_action_attr {
 	OVS_ACTION_ATTR_SAMPLE,       /* Nested OVS_SAMPLE_ATTR_*. */
 	OVS_ACTION_ATTR_RECIRC,       /* u32 recirc_id. */
 	OVS_ACTION_ATTR_HASH,	      /* struct ovs_action_hash. */
+	OVS_ACTION_ATTR_PUSH_MPLS,    /* struct ovs_action_push_mpls. */
+	OVS_ACTION_ATTR_POP_MPLS,     /* __be16 ethertype. */
+
 	__OVS_ACTION_ATTR_MAX
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 40be481..70bb609 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -118,6 +118,7 @@
 #include <linux/if_vlan.h>
 #include <linux/ip.h>
 #include <net/ip.h>
+#include <net/mpls.h>
 #include <linux/ipv6.h>
 #include <linux/in.h>
 #include <linux/jhash.h>
@@ -2530,7 +2531,7 @@ static netdev_features_t net_mpls_features(struct sk_buff *skb,
 					   netdev_features_t features,
 					   __be16 type)
 {
-	if (type == htons(ETH_P_MPLS_UC) || type == htons(ETH_P_MPLS_MC))
+	if (eth_p_mpls(type))
 		features &= skb->dev->mpls_features;
 
 	return features;
diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index 2a9673e..454ce12 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -30,6 +30,7 @@ config OPENVSWITCH
 
 config OPENVSWITCH_GRE
 	tristate "Open vSwitch GRE tunneling support"
+	select NET_MPLS_GSO
 	depends on INET
 	depends on OPENVSWITCH
 	depends on NET_IPGRE_DEMUX
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 922c133..930b1b6 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -28,10 +28,12 @@
 #include <linux/in6.h>
 #include <linux/if_arp.h>
 #include <linux/if_vlan.h>
+
 #include <net/ip.h>
 #include <net/ipv6.h>
 #include <net/checksum.h>
 #include <net/dsfield.h>
+#include <net/mpls.h>
 #include <net/sctp/checksum.h>
 
 #include "datapath.h"
@@ -118,6 +120,92 @@ static int make_writable(struct sk_buff *skb, int write_len)
 	return pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
 }
 
+static int push_mpls(struct sk_buff *skb,
+		     const struct ovs_action_push_mpls *mpls)
+{
+	__be32 *new_mpls_lse;
+	struct ethhdr *hdr;
+
+	/* Networking stack do not allow simultaneous Tunnel and MPLS GSO. */
+	if (skb->encapsulation)
+		return -ENOTSUPP;
+
+	if (skb_cow_head(skb, MPLS_HLEN) < 0)
+		return -ENOMEM;
+
+	skb_push(skb, MPLS_HLEN);
+	memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
+		skb->mac_len);
+	skb_reset_mac_header(skb);
+
+	new_mpls_lse = (__be32 *)skb_mpls_header(skb);
+	*new_mpls_lse = mpls->mpls_lse;
+
+	if (skb->ip_summed == CHECKSUM_COMPLETE)
+		skb->csum = csum_add(skb->csum, csum_partial(new_mpls_lse,
+							     MPLS_HLEN, 0));
+
+	hdr = eth_hdr(skb);
+	hdr->h_proto = mpls->mpls_ethertype;
+
+	skb_set_inner_protocol(skb, skb->protocol);
+	skb->protocol = mpls->mpls_ethertype;
+
+	return 0;
+}
+
+static int pop_mpls(struct sk_buff *skb, const __be16 ethertype)
+{
+	struct ethhdr *hdr;
+	int err;
+
+	err = make_writable(skb, skb->mac_len + MPLS_HLEN);
+	if (unlikely(err))
+		return err;
+
+	if (skb->ip_summed == CHECKSUM_COMPLETE)
+		skb->csum = csum_sub(skb->csum,
+				     csum_partial(skb_mpls_header(skb),
+						  MPLS_HLEN, 0));
+
+	memmove(skb_mac_header(skb) + MPLS_HLEN, skb_mac_header(skb),
+		skb->mac_len);
+
+	__skb_pull(skb, MPLS_HLEN);
+	skb_reset_mac_header(skb);
+
+	/* skb_mpls_header() is used to locate the ethertype
+	 * field correctly in the presence of VLAN tags.
+	 */
+	hdr = (struct ethhdr *)(skb_mpls_header(skb) - ETH_HLEN);
+	hdr->h_proto = ethertype;
+	if (eth_p_mpls(skb->protocol))
+		skb->protocol = ethertype;
+	return 0;
+}
+
+static int set_mpls(struct sk_buff *skb, const __be32 *mpls_lse)
+{
+	__be32 *stack;
+	int err;
+
+	err = make_writable(skb, skb->mac_len + MPLS_HLEN);
+	if (unlikely(err))
+		return err;
+
+	stack = (__be32 *)skb_mpls_header(skb);
+	if (skb->ip_summed == CHECKSUM_COMPLETE) {
+		__be32 diff[] = { ~(*stack), *mpls_lse };
+
+		skb->csum = ~csum_partial((char *)diff, sizeof(diff),
+					  ~skb->csum);
+	}
+
+	*stack = *mpls_lse;
+
+	return 0;
+}
+
 /* remove VLAN header from packet and update csum accordingly. */
 static int __pop_vlan_tci(struct sk_buff *skb, __be16 *current_tci)
 {
@@ -140,10 +228,12 @@ static int __pop_vlan_tci(struct sk_buff *skb, __be16 *current_tci)
 
 	vlan_set_encap_proto(skb, vhdr);
 	skb->mac_header += VLAN_HLEN;
+
 	if (skb_network_offset(skb) < ETH_HLEN)
 		skb_set_network_header(skb, ETH_HLEN);
-	skb_reset_mac_len(skb);
 
+	/* Update mac_len for subsequent MPLS actions */
+	skb_reset_mac_len(skb);
 	return 0;
 }
 
@@ -186,6 +276,8 @@ static int push_vlan(struct sk_buff *skb, const struct ovs_action_push_vlan *vla
 
 		if (!__vlan_put_tag(skb, skb->vlan_proto, current_tag))
 			return -ENOMEM;
+		/* Update mac_len for subsequent MPLS actions */
+		skb->mac_len += VLAN_HLEN;
 
 		if (skb->ip_summed == CHECKSUM_COMPLETE)
 			skb->csum = csum_add(skb->csum, csum_partial(skb->data
@@ -612,6 +704,10 @@ static int execute_set_action(struct sk_buff *skb,
 	case OVS_KEY_ATTR_SCTP:
 		err = set_sctp(skb, nla_data(nested_attr));
 		break;
+
+	case OVS_KEY_ATTR_MPLS:
+		err = set_mpls(skb, nla_data(nested_attr));
+		break;
 	}
 
 	return err;
@@ -690,6 +786,14 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 			execute_hash(skb, key, a);
 			break;
 
+		case OVS_ACTION_ATTR_PUSH_MPLS:
+			err = push_mpls(skb, nla_data(a));
+			break;
+
+		case OVS_ACTION_ATTR_POP_MPLS:
+			err = pop_mpls(skb, nla_get_be16(a));
+			break;
+
 		case OVS_ACTION_ATTR_PUSH_VLAN:
 			err = push_vlan(skb, nla_data(a));
 			if (unlikely(err)) /* skb already freed. */
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index f18302f..688cb9b 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -560,7 +560,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 		goto err_flow_free;
 
 	err = ovs_nla_copy_actions(a[OVS_PACKET_ATTR_ACTIONS],
-				   &flow->key, 0, &acts);
+				   &flow->key, &acts);
 	if (err)
 		goto err_flow_free;
 
@@ -846,7 +846,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		goto err_kfree_flow;
 
 	error = ovs_nla_copy_actions(a[OVS_FLOW_ATTR_ACTIONS], &new_flow->key,
-				     0, &acts);
+				     &acts);
 	if (error) {
 		OVS_NLERR("Flow actions may not be safe on all matching packets.\n");
 		goto err_kfree_acts;
@@ -953,7 +953,7 @@ static struct sw_flow_actions *get_flow_actions(const struct nlattr *a,
 		return acts;
 
 	ovs_flow_mask_key(&masked_key, key, mask);
-	error = ovs_nla_copy_actions(a, &masked_key, 0, &acts);
+	error = ovs_nla_copy_actions(a, &masked_key, &acts);
 	if (error) {
 		OVS_NLERR("Flow actions may not be safe on all matching packets.\n");
 		kfree(acts);
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 2b78789..90a2101 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -32,6 +32,7 @@
 #include <linux/if_arp.h>
 #include <linux/ip.h>
 #include <linux/ipv6.h>
+#include <linux/mpls.h>
 #include <linux/sctp.h>
 #include <linux/smp.h>
 #include <linux/tcp.h>
@@ -42,6 +43,7 @@
 #include <net/ip.h>
 #include <net/ip_tunnels.h>
 #include <net/ipv6.h>
+#include <net/mpls.h>
 #include <net/ndisc.h>
 
 #include "datapath.h"
@@ -480,6 +482,7 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 		return -ENOMEM;
 
 	skb_reset_network_header(skb);
+	skb_reset_mac_len(skb);
 	__skb_push(skb, skb->data - skb_mac_header(skb));
 
 	/* Network layer. */
@@ -584,6 +587,33 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 			memset(&key->ip, 0, sizeof(key->ip));
 			memset(&key->ipv4, 0, sizeof(key->ipv4));
 		}
+	} else if (eth_p_mpls(key->eth.type)) {
+		size_t stack_len = MPLS_HLEN;
+
+		/* In the presence of an MPLS label stack the end of the L2
+		 * header and the beginning of the L3 header differ.
+		 *
+		 * Advance network_header to the beginning of the L3
+		 * header. mac_len corresponds to the end of the L2 header.
+		 */
+		while (1) {
+			__be32 lse;
+
+			error = check_header(skb, skb->mac_len + stack_len);
+			if (unlikely(error))
+				return 0;
+
+			memcpy(&lse, skb_network_header(skb), MPLS_HLEN);
+
+			if (stack_len == MPLS_HLEN)
+				memcpy(&key->mpls.top_lse, &lse, MPLS_HLEN);
+
+			skb_set_network_header(skb, skb->mac_len + stack_len);
+			if (lse & htonl(MPLS_LS_S_MASK))
+				break;
+
+			stack_len += MPLS_HLEN;
+		}
 	} else if (key->eth.type == htons(ETH_P_IPV6)) {
 		int nh_len;             /* IPv6 Header + Extensions */
 
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 7181331..4962bee 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -102,12 +102,17 @@ struct sw_flow_key {
 		__be16 tci;		/* 0 if no VLAN, VLAN_TAG_PRESENT set otherwise. */
 		__be16 type;		/* Ethernet frame type. */
 	} eth;
-	struct {
-		u8     proto;		/* IP protocol or lower 8 bits of ARP opcode. */
-		u8     tos;		/* IP ToS. */
-		u8     ttl;		/* IP TTL/hop limit. */
-		u8     frag;		/* One of OVS_FRAG_TYPE_*. */
-	} ip;
+	union {
+		struct {
+			__be32 top_lse;	/* top label stack entry */
+		} mpls;
+		struct {
+			u8     proto;	/* IP protocol or lower 8 bits of ARP opcode. */
+			u8     tos;	    /* IP ToS. */
+			u8     ttl;	    /* IP TTL/hop limit. */
+			u8     frag;	/* One of OVS_FRAG_TYPE_*. */
+		} ip;
+	};
 	struct {
 		__be16 src;		/* TCP/UDP/SCTP source port. */
 		__be16 dst;		/* TCP/UDP/SCTP destination port. */
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 939bcb3..569309c 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -46,6 +46,7 @@
 #include <net/ip.h>
 #include <net/ipv6.h>
 #include <net/ndisc.h>
+#include <net/mpls.h>
 
 #include "flow_netlink.h"
 
@@ -134,7 +135,8 @@ static bool match_validate(const struct sw_flow_match *match,
 			| (1 << OVS_KEY_ATTR_ICMP)
 			| (1 << OVS_KEY_ATTR_ICMPV6)
 			| (1 << OVS_KEY_ATTR_ARP)
-			| (1 << OVS_KEY_ATTR_ND));
+			| (1 << OVS_KEY_ATTR_ND)
+			| (1 << OVS_KEY_ATTR_MPLS));
 
 	/* Always allowed mask fields. */
 	mask_allowed |= ((1 << OVS_KEY_ATTR_TUNNEL)
@@ -149,6 +151,12 @@ static bool match_validate(const struct sw_flow_match *match,
 			mask_allowed |= 1 << OVS_KEY_ATTR_ARP;
 	}
 
+	if (eth_p_mpls(match->key->eth.type)) {
+		key_expected |= 1 << OVS_KEY_ATTR_MPLS;
+		if (match->mask && (match->mask->key.eth.type == htons(0xffff)))
+			mask_allowed |= 1 << OVS_KEY_ATTR_MPLS;
+	}
+
 	if (match->key->eth.type == htons(ETH_P_IP)) {
 		key_expected |= 1 << OVS_KEY_ATTR_IPV4;
 		if (match->mask && (match->mask->key.eth.type == htons(0xffff)))
@@ -266,6 +274,7 @@ static const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
 	[OVS_KEY_ATTR_RECIRC_ID] = sizeof(u32),
 	[OVS_KEY_ATTR_DP_HASH] = sizeof(u32),
 	[OVS_KEY_ATTR_TUNNEL] = -1,
+	[OVS_KEY_ATTR_MPLS] = sizeof(struct ovs_key_mpls),
 };
 
 static bool is_all_zero(const u8 *fp, size_t size)
@@ -735,6 +744,16 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match, u64 attrs,
 		attrs &= ~(1 << OVS_KEY_ATTR_ARP);
 	}
 
+	if (attrs & (1 << OVS_KEY_ATTR_MPLS)) {
+		const struct ovs_key_mpls *mpls_key;
+
+		mpls_key = nla_data(a[OVS_KEY_ATTR_MPLS]);
+		SW_FLOW_KEY_PUT(match, mpls.top_lse,
+				mpls_key->mpls_lse, is_mask);
+
+		attrs &= ~(1 << OVS_KEY_ATTR_MPLS);
+	 }
+
 	if (attrs & (1 << OVS_KEY_ATTR_TCP)) {
 		const struct ovs_key_tcp *tcp_key;
 
@@ -1140,6 +1159,14 @@ int ovs_nla_put_flow(const struct sw_flow_key *swkey,
 		arp_key->arp_op = htons(output->ip.proto);
 		ether_addr_copy(arp_key->arp_sha, output->ipv4.arp.sha);
 		ether_addr_copy(arp_key->arp_tha, output->ipv4.arp.tha);
+	} else if (eth_p_mpls(swkey->eth.type)) {
+		struct ovs_key_mpls *mpls_key;
+
+		nla = nla_reserve(skb, OVS_KEY_ATTR_MPLS, sizeof(*mpls_key));
+		if (!nla)
+			goto nla_put_failure;
+		mpls_key = nla_data(nla);
+		mpls_key->mpls_lse = output->mpls.top_lse;
 	}
 
 	if ((swkey->eth.type == htons(ETH_P_IP) ||
@@ -1336,9 +1363,15 @@ static inline void add_nested_action_end(struct sw_flow_actions *sfa,
 	a->nla_len = sfa->actions_len - st_offset;
 }
 
+static int ovs_nla_copy_actions__(const struct nlattr *attr,
+				  const struct sw_flow_key *key,
+				  int depth, struct sw_flow_actions **sfa,
+				  __be16 eth_type, __be16 vlan_tci);
+
 static int validate_and_copy_sample(const struct nlattr *attr,
 				    const struct sw_flow_key *key, int depth,
-				    struct sw_flow_actions **sfa)
+				    struct sw_flow_actions **sfa,
+				    __be16 eth_type, __be16 vlan_tci)
 {
 	const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
 	const struct nlattr *probability, *actions;
@@ -1375,7 +1408,8 @@ static int validate_and_copy_sample(const struct nlattr *attr,
 	if (st_acts < 0)
 		return st_acts;
 
-	err = ovs_nla_copy_actions(actions, key, depth + 1, sfa);
+	err = ovs_nla_copy_actions__(actions, key, depth + 1, sfa,
+				     eth_type, vlan_tci);
 	if (err)
 		return err;
 
@@ -1385,10 +1419,10 @@ static int validate_and_copy_sample(const struct nlattr *attr,
 	return 0;
 }
 
-static int validate_tp_port(const struct sw_flow_key *flow_key)
+static int validate_tp_port(const struct sw_flow_key *flow_key,
+			    __be16 eth_type)
 {
-	if ((flow_key->eth.type == htons(ETH_P_IP) ||
-	     flow_key->eth.type == htons(ETH_P_IPV6)) &&
+	if ((eth_type == htons(ETH_P_IP) || eth_type == htons(ETH_P_IPV6)) &&
 	    (flow_key->tp.src || flow_key->tp.dst))
 		return 0;
 
@@ -1483,7 +1517,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 static int validate_set(const struct nlattr *a,
 			const struct sw_flow_key *flow_key,
 			struct sw_flow_actions **sfa,
-			bool *set_tun)
+			bool *set_tun, __be16 eth_type)
 {
 	const struct nlattr *ovs_key = nla_data(a);
 	int key_type = nla_type(ovs_key);
@@ -1508,6 +1542,9 @@ static int validate_set(const struct nlattr *a,
 		break;
 
 	case OVS_KEY_ATTR_TUNNEL:
+		if (eth_p_mpls(eth_type))
+			return -EINVAL;
+
 		*set_tun = true;
 		err = validate_and_copy_set_tun(a, sfa);
 		if (err)
@@ -1515,7 +1552,7 @@ static int validate_set(const struct nlattr *a,
 		break;
 
 	case OVS_KEY_ATTR_IPV4:
-		if (flow_key->eth.type != htons(ETH_P_IP))
+		if (eth_type != htons(ETH_P_IP))
 			return -EINVAL;
 
 		if (!flow_key->ip.proto)
@@ -1531,7 +1568,7 @@ static int validate_set(const struct nlattr *a,
 		break;
 
 	case OVS_KEY_ATTR_IPV6:
-		if (flow_key->eth.type != htons(ETH_P_IPV6))
+		if (eth_type != htons(ETH_P_IPV6))
 			return -EINVAL;
 
 		if (!flow_key->ip.proto)
@@ -1553,19 +1590,24 @@ static int validate_set(const struct nlattr *a,
 		if (flow_key->ip.proto != IPPROTO_TCP)
 			return -EINVAL;
 
-		return validate_tp_port(flow_key);
+		return validate_tp_port(flow_key, eth_type);
 
 	case OVS_KEY_ATTR_UDP:
 		if (flow_key->ip.proto != IPPROTO_UDP)
 			return -EINVAL;
 
-		return validate_tp_port(flow_key);
+		return validate_tp_port(flow_key, eth_type);
+
+	case OVS_KEY_ATTR_MPLS:
+		if (!eth_p_mpls(eth_type))
+			return -EINVAL;
+		break;
 
 	case OVS_KEY_ATTR_SCTP:
 		if (flow_key->ip.proto != IPPROTO_SCTP)
 			return -EINVAL;
 
-		return validate_tp_port(flow_key);
+		return validate_tp_port(flow_key, eth_type);
 
 	default:
 		return -EINVAL;
@@ -1609,12 +1651,13 @@ static int copy_action(const struct nlattr *from,
 	return 0;
 }
 
-int ovs_nla_copy_actions(const struct nlattr *attr,
-			 const struct sw_flow_key *key,
-			 int depth,
-			 struct sw_flow_actions **sfa)
+static int ovs_nla_copy_actions__(const struct nlattr *attr,
+				  const struct sw_flow_key *key,
+				  int depth, struct sw_flow_actions **sfa,
+				  __be16 eth_type, __be16 vlan_tci)
 {
 	const struct nlattr *a;
+	bool out_tnl_port = false;
 	int rem, err;
 
 	if (depth >= SAMPLE_ACTION_DEPTH)
@@ -1626,6 +1669,8 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 			[OVS_ACTION_ATTR_OUTPUT] = sizeof(u32),
 			[OVS_ACTION_ATTR_RECIRC] = sizeof(u32),
 			[OVS_ACTION_ATTR_USERSPACE] = (u32)-1,
+			[OVS_ACTION_ATTR_PUSH_MPLS] = sizeof(struct ovs_action_push_mpls),
+			[OVS_ACTION_ATTR_POP_MPLS] = sizeof(__be16),
 			[OVS_ACTION_ATTR_PUSH_VLAN] = sizeof(struct ovs_action_push_vlan),
 			[OVS_ACTION_ATTR_POP_VLAN] = 0,
 			[OVS_ACTION_ATTR_SET] = (u32)-1,
@@ -1655,6 +1700,8 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 		case OVS_ACTION_ATTR_OUTPUT:
 			if (nla_get_u32(a) >= DP_MAX_PORTS)
 				return -EINVAL;
+			out_tnl_port = false;
+
 			break;
 
 		case OVS_ACTION_ATTR_HASH: {
@@ -1671,6 +1718,7 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 		}
 
 		case OVS_ACTION_ATTR_POP_VLAN:
+			vlan_tci = htons(0);
 			break;
 
 		case OVS_ACTION_ATTR_PUSH_VLAN:
@@ -1679,19 +1727,66 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 				return -EINVAL;
 			if (!(vlan->vlan_tci & htons(VLAN_TAG_PRESENT)))
 				return -EINVAL;
+			vlan_tci = vlan->vlan_tci;
 			break;
 
 		case OVS_ACTION_ATTR_RECIRC:
 			break;
 
+		case OVS_ACTION_ATTR_PUSH_MPLS: {
+			const struct ovs_action_push_mpls *mpls = nla_data(a);
+
+			/* Networking stack do not allow simultaneous Tunnel
+			 * and MPLS GSO.
+			 */
+			if (out_tnl_port)
+				return -EINVAL;
+
+			if (!eth_p_mpls(mpls->mpls_ethertype))
+				return -EINVAL;
+			/* Prohibit push MPLS other than to a white list
+			 * for packets that have a known tag order.
+			 */
+			if (vlan_tci & htons(VLAN_TAG_PRESENT) ||
+			    (eth_type != htons(ETH_P_IP) &&
+			     eth_type != htons(ETH_P_IPV6) &&
+			     eth_type != htons(ETH_P_ARP) &&
+			     eth_type != htons(ETH_P_RARP) &&
+			     !eth_p_mpls(eth_type)))
+				return -EINVAL;
+			eth_type = mpls->mpls_ethertype;
+			break;
+		}
+
+		case OVS_ACTION_ATTR_POP_MPLS:
+			if (vlan_tci & htons(VLAN_TAG_PRESENT) ||
+			    !eth_p_mpls(eth_type))
+				return -EINVAL;
+
+			/* Disallow subsequent L2.5+ set and mpls_pop actions
+			 * as there is no check here to ensure that the new
+			 * eth_type is valid and thus set actions could
+			 * write off the end of the packet or otherwise
+			 * corrupt it.
+			 *
+			 * Support for these actions is planned using packet
+			 * recirculation.
+			 */
+			eth_type = htons(0);
+			break;
+
 		case OVS_ACTION_ATTR_SET:
-			err = validate_set(a, key, sfa, &skip_copy);
+			err = validate_set(a, key, sfa,
+					   &out_tnl_port, eth_type);
 			if (err)
 				return err;
+
+			skip_copy = out_tnl_port;
 			break;
 
 		case OVS_ACTION_ATTR_SAMPLE:
-			err = validate_and_copy_sample(a, key, depth, sfa);
+			err = validate_and_copy_sample(a, key, depth, sfa,
+						       eth_type, vlan_tci);
 			if (err)
 				return err;
 			skip_copy = true;
@@ -1713,6 +1808,14 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
 	return 0;
 }
 
+int ovs_nla_copy_actions(const struct nlattr *attr,
+			 const struct sw_flow_key *key,
+			 struct sw_flow_actions **sfa)
+{
+	return ovs_nla_copy_actions__(attr, key, 0, sfa, key->eth.type,
+				      key->eth.tci);
+}
+
 static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff *skb)
 {
 	const struct nlattr *a;
diff --git a/net/openvswitch/flow_netlink.h b/net/openvswitch/flow_netlink.h
index 206e45a..6355b1d 100644
--- a/net/openvswitch/flow_netlink.h
+++ b/net/openvswitch/flow_netlink.h
@@ -49,7 +49,7 @@ int ovs_nla_get_match(struct sw_flow_match *match,
 		      const struct nlattr *);
 
 int ovs_nla_copy_actions(const struct nlattr *attr,
-			 const struct sw_flow_key *key, int depth,
+			 const struct sw_flow_key *key,
 			 struct sw_flow_actions **sfa);
 int ovs_nla_put_actions(const struct nlattr *attr,
 			int len, struct sk_buff *skb);
-- 
1.9.3

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox