Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: M_CAN message RAM initialization AppNote  - was: Re: [PATCH V3 3/3] can: m_can: workaround for transmit data less than 4 bytes
From: Marc Kleine-Budde @ 2014-11-06 12:47 UTC (permalink / raw)
  To: Oliver Hartkopp, Dong Aisheng
  Cc: linux-can, wg, varkabhadram, netdev, linux-arm-kernel
In-Reply-To: <545B6AB4.70003@hartkopp.net>

[-- Attachment #1: Type: text/plain, Size: 964 bytes --]

On 11/06/2014 01:33 PM, Oliver Hartkopp wrote:
>> So i'm not sure memset() the entire TX FIFO element is neccesary...
> 
> It's no big deal - so we should be defensive here.
> And memset() is not working as Marc pointed out in another mail.
> 
> So we would need to loop with
> 
>     m_can_fifo_write(priv, 0, M_CAN_FIFO_DATA(i), 0x0);
> 
>>
>> Do you think we could keep the current solution firstly and updated later
>> if needed?
> 
> No :-)
> 
> I would like to have all data bytes to be written at startup.

Me, too. As this happens only once during ifconfig up it should not hurt
performance, either send an incremental or new patch. I'll sort it out.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: "asix: Don't reset PHY on if_up for ASIX 88772" breaks net on arndale platform
From: Charles Keepax @ 2014-11-06 12:46 UTC (permalink / raw)
  To: Stam, Michel [FINT]
  Cc: Riku Voipio, freddy, davem, linux-usb, netdev, linux-kernel,
	linux-samsung-soc
In-Reply-To: <C89EFD3CD56F64468D3D206D683A8D22039FFE16@ldam-msx2.fugro-nl.local>

On Thu, Nov 06, 2014 at 01:39:07PM +0100, Stam, Michel [FINT] wrote:
> Hello Riku and Charles,
> 
> I tried this with my original patch and the suggested patch applied,
> this seems to work for me too.
> 
> One thing that bothers me, is the suspend / resume situation; usbnet.c
> seems to call the bind( ) on probe( ). Suspend / resume do not seem to
> call bind( ) directly.
> 
> As Riku pointed out, the original patch I reverted was because of
> suspend/resume issues. I wonder if this will still work? 

I seem to remember I had a few issues with Arndale suspend and
resume last time I tried it anyways, so not sure I will be able
to test for that. But I will have a go. But in either case I
guess a fix for that is probably orthogonal to my suggested fix
here, as it looks pretty clear we intended to fully reset the
device in bind and were appartently not doing that. So this
patch is probably a needed fix anyway. Even if there are
seperate issues relating to suspend and resume.

Thanks,
Charles

^ permalink raw reply

* RE: "asix: Don't reset PHY on if_up for ASIX 88772" breaks net on arndale platform
From: Stam, Michel [FINT] @ 2014-11-06 12:39 UTC (permalink / raw)
  To: Riku Voipio, Charles Keepax
  Cc: freddy, davem, linux-usb, netdev, linux-kernel, linux-samsung-soc
In-Reply-To: <20141106120416.GA20162@afflict.kos.to>

Hello Riku and Charles,

I tried this with my original patch and the suggested patch applied,
this seems to work for me too.

One thing that bothers me, is the suspend / resume situation; usbnet.c
seems to call the bind( ) on probe( ). Suspend / resume do not seem to
call bind( ) directly.

As Riku pointed out, the original patch I reverted was because of
suspend/resume issues. I wonder if this will still work? 

Kind regards,

Michel Stam

-----Original Message-----
From: Riku Voipio [mailto:riku.voipio@iki.fi] 
Sent: Thursday, November 06, 2014 13:04 PM
To: Charles Keepax
Cc: Riku Voipio; Stam, Michel [FINT]; freddy@asix.com.tw;
davem@davemloft.net; linux-usb@vger.kernel.org; netdev@vger.kernel.org;
linux-kernel@vger.kernel.org; linux-samsung-soc@vger.kernel.org
Subject: Re: "asix: Don't reset PHY on if_up for ASIX 88772" breaks net
on arndale platform

On Thu, Nov 06, 2014 at 10:01:04AM +0000, Charles Keepax wrote:
> On Thu, Nov 06, 2014 at 11:06:51AM +0200, Riku Voipio wrote:
> > The asix on arndale comes semi-configured from u-boot, which I guess

> > is not the state kernel expects it to come in. At least in my case 
> > where I use tftp from u-boot to load my kernel.
> > 
> > So probably the full reset is needed here to make the asix chip come

> > to a truly pristine state.
> > 
> > The commit that Michel partially reverted (by returning to use 
> > ax88772_link_reset instead of ax88772_reset), indicates that a 
> > strong reset is needed for suspend/resume as well:
 
> Ok I think I have cracked this one. I am pretty sure you are right 
> that the USB comes to us in a strange state and needs a full reset, 
> but that only needs to happen once when the driver is bound in. So 
> there is some code in ax88772_bind that appears to try to reset the 
> device but does a lot less than ax88772_reset and I think that must be

> the problem. Applying the following on top of the patch we have been 
> debating I think will make everything work for all of us:

The patch below on top of 3.18-rc3 fixes arndale network for me.

Tested-by: Riku Voipio <riku.voipio@linaro.org>

> --- a/drivers/net/usb/asix_devices.c
> +++ b/drivers/net/usb/asix_devices.c
> @@ -465,19 +465,7 @@ static int ax88772_bind(struct usbnet *dev, 
> struct usb_interface *in
>                 return ret;
>         }
> 
> -       ret = asix_sw_reset(dev, AX_SWRESET_IPPD | AX_SWRESET_PRL);
> -       if (ret < 0)
> -               return ret;
> -
> -       msleep(150);
> -
> -       ret = asix_sw_reset(dev, AX_SWRESET_CLEAR);
> -       if (ret < 0)
> -               return ret;
> -
> -       msleep(150);
> -
> -       ret = asix_sw_reset(dev, embd_phy ? AX_SWRESET_IPRL :
AX_SWRESET_PRTE);
> +       ax88772_reset(dev);
> 
> If you guys could test that and let me know how you get on I will send

> in a proper patch if it looks good.
> 
> Thanks,
> Charles

^ permalink raw reply

* Re: M_CAN message RAM initialization AppNote  - was: Re: [PATCH V3 3/3] can: m_can: workaround for transmit data less than 4 bytes
From: Oliver Hartkopp @ 2014-11-06 12:33 UTC (permalink / raw)
  To: Dong Aisheng
  Cc: Marc Kleine-Budde, linux-can, wg, varkabhadram, netdev,
	linux-arm-kernel
In-Reply-To: <20141106080940.GA22964@shlinux1.ap.freescale.net>

On 06.11.2014 09:09, Dong Aisheng wrote:
> On Thu, Nov 06, 2014 at 08:04:17AM +0100, Oliver Hartkopp wrote:


>> To prevent the M_CAN controller detecting checksum errors when
>> reading potentially uninitialized TX message RAM content to transmit
>> CAN frames the TX message RAM has to be written with (any kind of)
>> initial data.
>>
>
> The key point of the issue is that why M_CAN will read potentially uninitialized
> TX message RAM content which should not happen?
> e.g. for our case of the issue, if we sending a no data frame or a less
> than 4 bytes frame, why m_can will read extra 4 bytes uninitialized/unset
> data which seems not reasonable?
>
>  From IP code logic, it will read full 8 bytes of data no matter how many data
> actually to be transfered which is strange.

Yes.

>
> For sending data over 4 bytes, since the Message RAM content will be filled
> with the real data to be transfered so there's no such issue.

E.g. think about the transfer of a CAN FD frame with 32 byte.
When you only fill up content until 28 byte the last four bytes still remain 
uninitialized.

Did you try this 28 byte use-case with an uninitialized TX message RAM ?

cansend can0 123##1001122334566778899AABBCCDDEEFF001122334566778899AABB

To me it looks too risky when we only initialize the first 8 byte.

>
>> ---
>>
>> Then the code should memset() the entire TX FIFO element - and not
>> only the 8 data bytes we are addressing now.
>>
>
> Our IC guy told me the issue only happened on transferring a data size
> of less than 4 bytes and my test also proved that.

'less than'?

So you might try to use 26 bytes too:

cansend can0 123##1001122334566778899AABBCCDDEEFF001122334566778899


> So i'm not sure memset() the entire TX FIFO element is neccesary...

It's no big deal - so we should be defensive here.
And memset() is not working as Marc pointed out in another mail.

So we would need to loop with

	m_can_fifo_write(priv, 0, M_CAN_FIFO_DATA(i), 0x0);

>
> Do you think we could keep the current solution firstly and updated later
> if needed?

No :-)

I would like to have all data bytes to be written at startup.

Regards,
Oliver


^ permalink raw reply

* [PATCH RFC net-next 5/5] bonding: Introduce 56Gbps AD link speed
From: Xie Jianhua @ 2014-11-06 11:52 UTC (permalink / raw)
  To: netdev
  Cc: Jianhua Xie, Jianhua Xie, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, David S. Miller
In-Reply-To: <1415274772-8029-1-git-send-email-Jianhua.Xie@freescale.com>

From: Jianhua Xie <Jianhua.Xie@freescale.com>

This patch inserts 56Gbps bitmask, fixes aggregated bandwidth
calculation based on 56Gbps slave links.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: David S. Miller <davem@davemloft.net>

Signed-off-by: Jianhua Xie <jianhua.xie@freescale.com>
---
 drivers/net/bonding/bond_3ad.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 065c4cc..32513dc 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -91,6 +91,7 @@
 #define     AD_LINK_SPEED_BITMASK_10000MBPS   0x20
 #define     AD_LINK_SPEED_BITMASK_20000MBPS   0x40
 #define     AD_LINK_SPEED_BITMASK_40000MBPS   0x80
+#define     AD_LINK_SPEED_BITMASK_56000MBPS   0x100
 
 /* compare MAC addresses */
 #define MAC_ADDRESS_EQUAL(A, B)	\
@@ -252,6 +253,7 @@ static inline int __check_agg_selection_timer(struct port *port)
  *     %AD_LINK_SPEED_BITMASK_10000MBPS
  *     %AD_LINK_SPEED_BITMASK_20000MBPS
  *     %AD_LINK_SPEED_BITMASK_40000MBPS
+ *     %AD_LINK_SPEED_BITMASK_56000MBPS
  */
 static u16 __get_link_speed(struct port *port)
 {
@@ -295,6 +297,10 @@ static u16 __get_link_speed(struct port *port)
 			speed = AD_LINK_SPEED_BITMASK_40000MBPS;
 			break;
 
+		case SPEED_56000:
+			speed = AD_LINK_SPEED_BITMASK_56000MBPS;
+			break;
+
 		default:
 			/* unknown speed value from ethtool. shouldn't happen */
 			speed = 0;
@@ -667,6 +673,9 @@ static u32 __get_agg_bandwidth(struct aggregator *aggregator)
 		case AD_LINK_SPEED_BITMASK_40000MBPS:
 			bandwidth = aggregator->num_of_ports * 40000;
 			break;
+		case AD_LINK_SPEED_BITMASK_56000MBPS:
+			bandwidth = aggregator->num_of_ports * 56000;
+			break;
 		default:
 			bandwidth = 0; /* to silence the compiler */
 		}
-- 
2.1.0.27.g96db324

^ permalink raw reply related

* [PATCH RFC net-next 4/5] bonding: Introduce 40Gbps AD link speed
From: Xie Jianhua @ 2014-11-06 11:52 UTC (permalink / raw)
  To: netdev
  Cc: Jianhua Xie, Jianhua Xie, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, David S. Miller
In-Reply-To: <1415274772-8029-1-git-send-email-Jianhua.Xie@freescale.com>

From: Jianhua Xie <Jianhua.Xie@freescale.com>

This patch inserts 40Gbps bitmask, fixes aggregated bandwidth
calculation based on 40Gbps slave links.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: David S. Miller <davem@davemloft.net>

Signed-off-by: Jianhua Xie <jianhua.xie@freescale.com>
---
 drivers/net/bonding/bond_3ad.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index a1acc0e..065c4cc 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -90,6 +90,7 @@
 #define     AD_LINK_SPEED_BITMASK_2500MBPS    0x10
 #define     AD_LINK_SPEED_BITMASK_10000MBPS   0x20
 #define     AD_LINK_SPEED_BITMASK_20000MBPS   0x40
+#define     AD_LINK_SPEED_BITMASK_40000MBPS   0x80
 
 /* compare MAC addresses */
 #define MAC_ADDRESS_EQUAL(A, B)	\
@@ -250,6 +251,7 @@ static inline int __check_agg_selection_timer(struct port *port)
  *     %AD_LINK_SPEED_BITMASK_2500MBPS,
  *     %AD_LINK_SPEED_BITMASK_10000MBPS
  *     %AD_LINK_SPEED_BITMASK_20000MBPS
+ *     %AD_LINK_SPEED_BITMASK_40000MBPS
  */
 static u16 __get_link_speed(struct port *port)
 {
@@ -289,6 +291,10 @@ static u16 __get_link_speed(struct port *port)
 			speed = AD_LINK_SPEED_BITMASK_20000MBPS;
 			break;
 
+		case SPEED_40000:
+			speed = AD_LINK_SPEED_BITMASK_40000MBPS;
+			break;
+
 		default:
 			/* unknown speed value from ethtool. shouldn't happen */
 			speed = 0;
@@ -658,6 +664,9 @@ static u32 __get_agg_bandwidth(struct aggregator *aggregator)
 		case AD_LINK_SPEED_BITMASK_20000MBPS:
 			bandwidth = aggregator->num_of_ports * 20000;
 			break;
+		case AD_LINK_SPEED_BITMASK_40000MBPS:
+			bandwidth = aggregator->num_of_ports * 40000;
+			break;
 		default:
 			bandwidth = 0; /* to silence the compiler */
 		}
-- 
2.1.0.27.g96db324

^ permalink raw reply related

* [PATCH RFC net-next 1/5] bonding: Expand speed type for AD Port Key
From: Xie Jianhua @ 2014-11-06 11:52 UTC (permalink / raw)
  To: netdev
  Cc: Jianhua Xie, Jianhua Xie, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, David S. Miller
In-Reply-To: <1415274772-8029-1-git-send-email-Jianhua.Xie@freescale.com>

From: Jianhua Xie <Jianhua.Xie@freescale.com>

Port Key was determined as 16 bits according to the link speed,
duplex and user key (which is yet not supported), in which key
speed was 5 bits for 1Mbps/10Mbps/100Mbps/1Gbps/10Gbps as below:
--------------------------------------------------------------
Port key :|	User key	| Speed		|	Duplex|
--------------------------------------------------------------
16			6		1		0
This patch is expanding speed type from 5 bits to 9 bits for other
speed 2.5Gbps/20Gbps/40Gbps/56Gbps and shrinking user key from 10
bits to 6 bits. New Port Key looks like below:
--------------------------------------------------------------
Port key :|	User key	| Speed		|	Duplex|
--------------------------------------------------------------
16			10		1		0

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: David S. Miller <davem@davemloft.net>

Signed-off-by: Jianhua Xie <jianhua.xie@freescale.com>
---
 drivers/net/bonding/bond_3ad.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 2110215f..15821f3 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -77,11 +77,11 @@
  * --------------------------------------------------------------
  * Port key :	| User key	| Speed		| Duplex	|
  * --------------------------------------------------------------
- * 16		  6		  1		  0
+ * 16		  	10		  1		  0
  */
 #define  AD_DUPLEX_KEY_BITS    0x1
-#define  AD_SPEED_KEY_BITS     0x3E
-#define  AD_USER_KEY_BITS      0xFFC0
+#define  AD_SPEED_KEY_BITS     0x3FE
+#define  AD_USER_KEY_BITS      0xFC00
 
 #define     AD_LINK_SPEED_BITMASK_1MBPS       0x1
 #define     AD_LINK_SPEED_BITMASK_10MBPS      0x2
-- 
2.1.0.27.g96db324

^ permalink raw reply related

* [PATCH RFC net-next 2/5] bonding: Introduce 2.5Gbps AD link speed
From: Xie Jianhua @ 2014-11-06 11:52 UTC (permalink / raw)
  To: netdev
  Cc: Jianhua Xie, Jianhua Xie, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, David S. Miller
In-Reply-To: <1415274772-8029-1-git-send-email-Jianhua.Xie@freescale.com>

From: Jianhua Xie <Jianhua.Xie@freescale.com>

This patch inserts 2500Mbps bitmask, fixes aggregated bandwidth
calculation based on 2.5Gbps slave links.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: David S. Miller <davem@davemloft.net>

Signed-off-by: Jianhua Xie <jianhua.xie@freescale.com>
---
 drivers/net/bonding/bond_3ad.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 15821f3..250232e 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -87,7 +87,8 @@
 #define     AD_LINK_SPEED_BITMASK_10MBPS      0x2
 #define     AD_LINK_SPEED_BITMASK_100MBPS     0x4
 #define     AD_LINK_SPEED_BITMASK_1000MBPS    0x8
-#define     AD_LINK_SPEED_BITMASK_10000MBPS   0x10
+#define     AD_LINK_SPEED_BITMASK_2500MBPS    0x10
+#define     AD_LINK_SPEED_BITMASK_10000MBPS   0x20
 
 /* compare MAC addresses */
 #define MAC_ADDRESS_EQUAL(A, B)	\
@@ -245,6 +246,7 @@ static inline int __check_agg_selection_timer(struct port *port)
  *     %AD_LINK_SPEED_BITMASK_10MBPS,
  *     %AD_LINK_SPEED_BITMASK_100MBPS,
  *     %AD_LINK_SPEED_BITMASK_1000MBPS,
+ *     %AD_LINK_SPEED_BITMASK_2500MBPS,
  *     %AD_LINK_SPEED_BITMASK_10000MBPS
  */
 static u16 __get_link_speed(struct port *port)
@@ -273,6 +275,10 @@ static u16 __get_link_speed(struct port *port)
 			speed = AD_LINK_SPEED_BITMASK_1000MBPS;
 			break;
 
+		case SPEED_2500:
+			speed = AD_LINK_SPEED_BITMASK_2500MBPS;
+			break;
+
 		case SPEED_10000:
 			speed = AD_LINK_SPEED_BITMASK_10000MBPS;
 			break;
@@ -637,6 +643,9 @@ static u32 __get_agg_bandwidth(struct aggregator *aggregator)
 		case AD_LINK_SPEED_BITMASK_1000MBPS:
 			bandwidth = aggregator->num_of_ports * 1000;
 			break;
+		case AD_LINK_SPEED_BITMASK_2500MBPS:
+			bandwidth = aggregator->num_of_ports * 2500;
+			break;
 		case AD_LINK_SPEED_BITMASK_10000MBPS:
 			bandwidth = aggregator->num_of_ports * 10000;
 			break;
-- 
2.1.0.27.g96db324

^ permalink raw reply related

* Re: "asix: Don't reset PHY on if_up for ASIX 88772" breaks net on arndale platform
From: Riku Voipio @ 2014-11-06 12:04 UTC (permalink / raw)
  To: Charles Keepax
  Cc: Riku Voipio, Stam, Michel [FINT], freddy, davem, linux-usb,
	netdev, linux-kernel, linux-samsung-soc
In-Reply-To: <20141106100104.GS23178@opensource.wolfsonmicro.com>

On Thu, Nov 06, 2014 at 10:01:04AM +0000, Charles Keepax wrote:
> On Thu, Nov 06, 2014 at 11:06:51AM +0200, Riku Voipio wrote:
> > The asix on arndale comes semi-configured from u-boot, which I guess is
> > not the state kernel expects it to come in. At least in my case where
> > I use tftp from u-boot to load my kernel.
> > 
> > So probably the full reset is needed here to make the asix chip come
> > to a truly pristine state.
> > 
> > The commit that Michel partially reverted (by returning to use
> > ax88772_link_reset instead of ax88772_reset), indicates that a strong reset
> > is needed for suspend/resume as well:
 
> Ok I think I have cracked this one. I am pretty sure you are
> right that the USB comes to us in a strange state and needs
> a full reset, but that only needs to happen once when the driver
> is bound in. So there is some code in ax88772_bind that appears
> to try to reset the device but does a lot less than ax88772_reset
> and I think that must be the problem. Applying the following on
> top of the patch we have been debating I think will make
> everything work for all of us:

The patch below on top of 3.18-rc3 fixes arndale network for me.

Tested-by: Riku Voipio <riku.voipio@linaro.org>

> --- a/drivers/net/usb/asix_devices.c
> +++ b/drivers/net/usb/asix_devices.c
> @@ -465,19 +465,7 @@ static int ax88772_bind(struct usbnet *dev,
> struct usb_interface *in
>                 return ret;
>         }
> 
> -       ret = asix_sw_reset(dev, AX_SWRESET_IPPD | AX_SWRESET_PRL);
> -       if (ret < 0)
> -               return ret;
> -
> -       msleep(150);
> -
> -       ret = asix_sw_reset(dev, AX_SWRESET_CLEAR);
> -       if (ret < 0)
> -               return ret;
> -
> -       msleep(150);
> -
> -       ret = asix_sw_reset(dev, embd_phy ? AX_SWRESET_IPRL : AX_SWRESET_PRTE);
> +       ax88772_reset(dev);
> 
> If you guys could test that and let me know how you get on I will
> send in a proper patch if it looks good.
> 
> Thanks,
> Charles

^ permalink raw reply

* [PATCH RFC net-next 3/5] bonding: Introduce 20Gbps AD link speed
From: Xie Jianhua @ 2014-11-06 11:52 UTC (permalink / raw)
  To: netdev
  Cc: Jianhua Xie, Jianhua Xie, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek, David S. Miller
In-Reply-To: <1415274772-8029-1-git-send-email-Jianhua.Xie@freescale.com>

From: Jianhua Xie <Jianhua.Xie@freescale.com>

This patch inserts 20Gbps bitmask, fixes aggregated bandwidth
calculation based on 20Gbps slave links.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: David S. Miller <davem@davemloft.net>

Signed-off-by: Jianhua Xie <jianhua.xie@freescale.com>
---
 drivers/net/bonding/bond_3ad.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 250232e..a1acc0e 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -89,6 +89,7 @@
 #define     AD_LINK_SPEED_BITMASK_1000MBPS    0x8
 #define     AD_LINK_SPEED_BITMASK_2500MBPS    0x10
 #define     AD_LINK_SPEED_BITMASK_10000MBPS   0x20
+#define     AD_LINK_SPEED_BITMASK_20000MBPS   0x40
 
 /* compare MAC addresses */
 #define MAC_ADDRESS_EQUAL(A, B)	\
@@ -248,6 +249,7 @@ static inline int __check_agg_selection_timer(struct port *port)
  *     %AD_LINK_SPEED_BITMASK_1000MBPS,
  *     %AD_LINK_SPEED_BITMASK_2500MBPS,
  *     %AD_LINK_SPEED_BITMASK_10000MBPS
+ *     %AD_LINK_SPEED_BITMASK_20000MBPS
  */
 static u16 __get_link_speed(struct port *port)
 {
@@ -283,6 +285,10 @@ static u16 __get_link_speed(struct port *port)
 			speed = AD_LINK_SPEED_BITMASK_10000MBPS;
 			break;
 
+		case SPEED_20000:
+			speed = AD_LINK_SPEED_BITMASK_20000MBPS;
+			break;
+
 		default:
 			/* unknown speed value from ethtool. shouldn't happen */
 			speed = 0;
@@ -649,6 +655,9 @@ static u32 __get_agg_bandwidth(struct aggregator *aggregator)
 		case AD_LINK_SPEED_BITMASK_10000MBPS:
 			bandwidth = aggregator->num_of_ports * 10000;
 			break;
+		case AD_LINK_SPEED_BITMASK_20000MBPS:
+			bandwidth = aggregator->num_of_ports * 20000;
+			break;
 		default:
 			bandwidth = 0; /* to silence the compiler */
 		}
-- 
2.1.0.27.g96db324

^ permalink raw reply related

* [PATCH RFC net-next 0/5] *** Introduce 4 AD link speed ***
From: Xie Jianhua @ 2014-11-06 11:52 UTC (permalink / raw)
  To: netdev; +Cc: Jianhua Xie, Jianhua Xie

From: Jianhua Xie <Jianhua.Xie@freescale.com>

User Key bits in Port key of AD mode are yet not used.  This series
expands speed type bits and shrinks unused user Key bits in AD mode
Port Key, introduces 4 AD link speed: 2.5G/20G/40G/56G, and fixes
aggregated link bandwidth calculation based on new link speed.

Jianhua Xie (5):
  bonding: Expand speed type for AD Port Key
  bonding: Introduce 2.5Gbps AD link speed
  bonding: Introduce 20Gbps AD link speed
  bonding: Introduce 40Gbps AD link speed
  bonding: Introduce 56Gbps AD link speed

 drivers/net/bonding/bond_3ad.c | 44 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 40 insertions(+), 4 deletions(-)

-- 
2.1.0.27.g96db324

^ permalink raw reply

* RE: [PATCH] bridge: missing null bridge device check causing null pointer dereference (bugfix)
From: 박수현 @ 2014-11-06 11:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Toshiaki Makita, Stephen Hemminger, David S. Miller,
	bridge@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <1415273711.13896.67.camel@edumazet-glaptop2.roam.corp.google.com>

My appologies,

I was working on kernel 3.2.30 when I hit the crash. I only looked at the up-to-date kernel for br_handle_frame function where I still found "p->state" reference.

Please disregard my patch.

Thanks,
Su-Hyun Park

-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com] 
Sent: Thursday, November 06, 2014 8:35 PM
To: 박수현
Cc: Toshiaki Makita; Stephen Hemminger; David S. Miller; bridge@lists.linux-foundation.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] bridge: missing null bridge device check causing null pointer dereference (bugfix)

On Thu, 2014-11-06 at 07:58 +0000, 박수현 wrote:
> >-----Original Message-----
> >From: Toshiaki Makita [mailto:makita.toshiaki@lab.ntt.co.jp]
> >Sent: Thursday, November 06, 2014 4:07 PM
> >To: 박수현; Stephen Hemminger; David S. Miller
> >Cc: bridge@lists.linux-foundation.org; netdev@vger.kernel.org; linux- 
> >kernel@vger.kernel.org
> >Subject: Re: [PATCH] bridge: missing null bridge device check causing 
> >null pointer dereference (bugfix)
> >
> >On 2014/11/06 15:26, Su-Hyun Park wrote:
> >> the bridge device can be null if the bridge is being deleted while 
> >> processing the packet, which causes the null pointer dereference in
> >switch statement.
> >
> >How can this happen??
> >It is guarded by rcu.
> >netdev_rx_handler_unregister() ensures rx_handler_data is non NULL.
> >
> 
> The RCU protect rx_handler_data, not the bridge member port. It can be NULL according to below code.
> 

Where do you find this 'below code' ?

Are you sending a patch for an old linux kernel ?

> static inline struct net_bridge_port *br_port_get_rcu(const struct net_device *dev) {
> 	struct net_bridge_port *port = rcu_dereference(dev->rx_handler_data);
> 	return br_port_exists(dev) ? port : NULL; }

Actual code is :

static inline struct net_bridge_port *br_port_get_rcu(const struct net_device *dev) {
	return rcu_dereference(dev->rx_handler_data);
}


> 
> The crash happens at the below switch statement in br_handle_frame, where p is NULL.
> 
> 	switch (p->state)

Is your tree really including the fix we already did to fix this issue ?

(commit 716ec052d2280d511e10e90ad54a86f5b5d4dcc2 )
bridge: fix NULL pointer deref of br_port_get_rcu





^ permalink raw reply

* Re: How to make stack send broadcast ARP request when entry is STALE?
From: Sowmini Varadhan @ 2014-11-06 11:48 UTC (permalink / raw)
  To: Ulf samuelsson; +Cc: Netdev
In-Reply-To: <C4AF677B-55AC-4CE0-9BFB-EBB2E0C3BF74@emagii.com>

On Wed, Nov 5, 2014 at 1:48 AM, Ulf samuelsson <netdev@emagii.com> wrote:
> Have a problem with an HP router at a certain location, which
> is configured to only answer to broadcast ARP requests.
> That cannot be changed.
>
> The first ARP request the kernel sends out, is a broadcast request,
> which is fine, but after the reply, the kernel sends unicast requests,
> which will not get any replies.

I'm afraid that's a problem with your HP router. There is nothing
in RFC 826 that says that the ARP request MUST be broadcast,
and many OS'es  will follow the IPv6 ND model of sending
unicast ARP requests to re-verify cached arp information.
Looks like the "kernel" you refer to above is following this
state machine, which is a good and healthy thing- you dont
want to be needlessly broadcasting arp reverification requests.

thus even if you have access to the kernel source code (as with linux)
it's not recommended to broadcast the ARP reverification.

> The ARP entry will after some time enter STALE state,
> and if nothing is done it will time out, and be removed.
 :

> I think the recommended behaviour in IPv6 is to send out 3 unicasts
> and if all fails, to send out broadcasts.

Be careful about the distinction between IPv4 ARP and IPv6 ND.
The base state-machine for IPv6 is explicitly specified in RFC 4861 (with
ongoing work to optimize some of this in various ietf working groups)

The territory is less clearly defined for ARP. The base protocol comes from
RFC 726, which does not address DAD  at all- that only comes from rfc 5227.
Some OS-es (solaris, microsoft?) implelement the same state-machine
for both IPv4 and IPv6, while also support 5227.

>
> Anyone know any good literature on how the ARP + neigh state machine works
> in the kernel.
>
> I read in Herberts book about the Linux TCP/IP stack and it only discuss how to reply to
> ARP requests and not anything on how to generate ARP requests.

Hope that helps.

--Sowmini

^ permalink raw reply

* Re: [PATCH] rtlwifi: Add more checks for get_btc_status callback
From: Murilo Opsfelder Araujo @ 2014-11-06 11:40 UTC (permalink / raw)
  To: Larry Finger, Mike Galbraith
  Cc: linux-kernel, linux-wireless, netdev, Chaoming Li,
	John W. Linville, Thadeu Cascardo, troy_tan
In-Reply-To: <545A6894.7040506@lwfinger.net>

On 11/05/2014 04:12 PM, Larry Finger wrote:
> On 11/05/2014 03:16 AM, Mike Galbraith wrote:
>> On Wed, 2014-10-29 at 23:30 -0500, Larry Finger wrote:
>>> On 10/29/2014 06:28 PM, Murilo Opsfelder Araujo wrote:
>>>> This is a complement of commit 08054200117a95afc14c3d2ed3a38bf4e345bf78
>>>> "rtlwifi: Add check for get_btc_status callback".
>>>>
>>>> With this patch, next-20141029 at least does not panic with rtl8192se
>>>> device.
>>>>
>>>
>>> This patch is OK, but as noted it is not complete.
>>>
>>> I have patches to fix all the kernel panics for rtl8192se AND
>>> rtl8192ce. There
>>> are missing parts, but I would prefer submitting mine, which would
>>> conflict with
>>> this one. For that reason, NACK for this one, and please apply the
>>> set I am
>>> submitting now.
>>
>> It's all in there now, but my RTL8191SEvB is still dead.  Squabbling
>> with it isn't going all that well either.
>>
>> As soon as 38506ece rtlwifi: rtl_pci: Start modification for new drivers
>> is applied, explosions appear.  Subsequently applying...
>>
>> 08054200 rtlwifi: Add check for get_btc_status callback
>> c0386f15 rtlwifi: rtl8192ce: rtl8192de: rtl8192se: Fix handling for
>> missing get_btc_status
>> 50147969 rtlwifi: rtl8192se: Fix duplicate calls to
>> ieee80211_register_hw()
>> 30c5ccc6 rtlwifi: rtl8192se: Add missing section to read descriptor
>> setting
>> 75a916e1 rtlwifi: rtl8192se: Fix firmware loading
>>
>> ...fixes that mess up, but leaves the interface dead in the same manner
>> as if nothing has been reverted.  So it _seems_ the bustage lurks in
>> 38506ece somewhere.  Too bad it's non-dinky, and written in wifi-ese :)
> 
> Yes, I am aware that rtl8192se is failing, and now that I am back from
> vacation, I am working on the problem. If you want to use the driver
> with kernel 3.18, clone the repo at
> http://github.com/lwfinger/rtlwifi_new.git and build and install either
> the master or kernel_version branches. Both work.
> 
> I am in the process of trying to find what the crucial difference is
> between that repo and the kernel version.
> 
> Larry
> 
> 
I'm sending to everyone so others can jump in as well.

Here are the steps I've followed.

Installed and booted my kernel:

$ sudo dpkg -i linux-image-3.18.0-rc3-next-20141105-panda_3.18.0-rc3-next-20141105-panda-1_amd64.deb linux-headers-3.18.0-rc3-next-20141105-panda_3.18.0-rc3-next-20141105-panda-1_amd64.deb

Built modules from Larry's github repository.

$ cd rtlwifi_new
$ make
$ sudo make install

$ sudo modprobe -rv rtl8192se
rmmod rtl8192se
rmmod rtl_pci
rmmod rtlwifi
rmmod mac80211
rmmod cfg80211

The module does not load:

$ sudo modprobe -v rtl8192se
insmod /lib/modules/3.18.0-rc3-next-20141105-panda/kernel/net/wireless/cfg80211.ko
insmod /lib/modules/3.18.0-rc3-next-20141105-panda/kernel/net/mac80211/mac80211.ko
insmod /lib/modules/3.18.0-rc3-next-20141105-panda/kernel/drivers/net/wireless/rtlwifi/rtlwifi.ko
insmod /lib/modules/3.18.0-rc3-next-20141105-panda/kernel/drivers/net/wireless/rtlwifi/rtl_pci.ko
insmod /lib/modules/3.18.0-rc3-next-20141105-panda/extra/rtl8192se.ko
ERROR: could not insert 'rtl8192se': Invalid argument

And /var/log/messages showed:

Nov  5 22:28:01 laptop kernel: [  301.276806] rtl8192se: disagrees about version of symbol rtl_process_phyinfo
Nov  5 22:28:01 laptop kernel: [  301.276812] rtl8192se: Unknown symbol rtl_process_phyinfo (err -22)
Nov  5 22:28:01 laptop kernel: [  301.276864] rtl8192se: disagrees about version of symbol rtl_get_tcb_desc
Nov  5 22:28:01 laptop kernel: [  301.276866] rtl8192se: Unknown symbol rtl_get_tcb_desc (err -22)

-- 
Murilo

^ permalink raw reply

* Re: [PATCH] bridge: missing null bridge device check causing null pointer dereference (bugfix)
From: Eric Dumazet @ 2014-11-06 11:35 UTC (permalink / raw)
  To: 박수현
  Cc: netdev@vger.kernel.org, bridge@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, Stephen Hemminger, David S. Miller
In-Reply-To: <8D1F1238A24CE743B8F3CED0F137C69E408AA087@EXMB02.ahnbang.ahnlab.com>

On Thu, 2014-11-06 at 07:58 +0000, 박수현 wrote:
> >-----Original Message-----
> >From: Toshiaki Makita [mailto:makita.toshiaki@lab.ntt.co.jp]
> >Sent: Thursday, November 06, 2014 4:07 PM
> >To: 박수현; Stephen Hemminger; David S. Miller
> >Cc: bridge@lists.linux-foundation.org; netdev@vger.kernel.org; linux-
> >kernel@vger.kernel.org
> >Subject: Re: [PATCH] bridge: missing null bridge device check causing null
> >pointer dereference (bugfix)
> >
> >On 2014/11/06 15:26, Su-Hyun Park wrote:
> >> the bridge device can be null if the bridge is being deleted while
> >> processing the packet, which causes the null pointer dereference in
> >switch statement.
> >
> >How can this happen??
> >It is guarded by rcu.
> >netdev_rx_handler_unregister() ensures rx_handler_data is non NULL.
> >
> 
> The RCU protect rx_handler_data, not the bridge member port. It can be NULL according to below code.
> 

Where do you find this 'below code' ?

Are you sending a patch for an old linux kernel ?

> static inline struct net_bridge_port *br_port_get_rcu(const struct net_device *dev) {
> 	struct net_bridge_port *port = rcu_dereference(dev->rx_handler_data);
> 	return br_port_exists(dev) ? port : NULL; 
> }

Actual code is :

static inline struct net_bridge_port *br_port_get_rcu(const struct net_device *dev)
{
	return rcu_dereference(dev->rx_handler_data);
}


> 
> The crash happens at the below switch statement in br_handle_frame, where p is NULL.
> 
> 	switch (p->state)

Is your tree really including the fix we already did to fix this issue ?

(commit 716ec052d2280d511e10e90ad54a86f5b5d4dcc2 )
bridge: fix NULL pointer deref of br_port_get_rcu

^ permalink raw reply

* [PATCH net 2/2] net/mlx5_core: Fix race on driver load
From: Eli Cohen @ 2014-11-06 10:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-rdma, ogerlitz, Eli Cohen
In-Reply-To: <1415271082-7644-1-git-send-email-eli@mellanox.com>

When events arrive at driver load, the event handler gets called even before
the spinlock and list are initialized. Fix this by moving the initialization
before EQs creation.

Signed-off-by: Eli Cohen <eli@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 88b2ffa0edfb..ecc6341e728a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -855,14 +855,14 @@ static int init_one(struct pci_dev *pdev,
 	dev->profile = &profile[prof_sel];
 	dev->event = mlx5_core_event;
 
+	INIT_LIST_HEAD(&priv->ctx_list);
+	spin_lock_init(&priv->ctx_lock);
 	err = mlx5_dev_init(dev, pdev);
 	if (err) {
 		dev_err(&pdev->dev, "mlx5_dev_init failed %d\n", err);
 		goto out;
 	}
 
-	INIT_LIST_HEAD(&priv->ctx_list);
-	spin_lock_init(&priv->ctx_lock);
 	err = mlx5_register_device(dev);
 	if (err) {
 		dev_err(&pdev->dev, "mlx5_register_device failed %d\n", err);
-- 
2.1.2

^ permalink raw reply related

* [PATCH net 1/2] net/mlx5_core: Fix race in create EQ
From: Eli Cohen @ 2014-11-06 10:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-rdma, ogerlitz, Eli Cohen
In-Reply-To: <1415271082-7644-1-git-send-email-eli@mellanox.com>

After the EQ is created, it can possibly generate interrupts and the interrupt
handler is referencing eq->dev. It is therefore required to set eq->dev before
calling request_irq() so if an event is generated before request_irq() returns,
we will have a valid eq->dev field.

Signed-off-by: Eli Cohen <eli@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index a278238a2db6..ad2c96a02a53 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -374,15 +374,14 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
 	snprintf(eq->name, MLX5_MAX_EQ_NAME, "%s@pci:%s",
 		 name, pci_name(dev->pdev));
 	eq->eqn = out.eq_number;
+	eq->irqn = vecidx;
+	eq->dev = dev;
+	eq->doorbell = uar->map + MLX5_EQ_DOORBEL_OFFSET;
 	err = request_irq(table->msix_arr[vecidx].vector, mlx5_msix_handler, 0,
 			  eq->name, eq);
 	if (err)
 		goto err_eq;
 
-	eq->irqn = vecidx;
-	eq->dev = dev;
-	eq->doorbell = uar->map + MLX5_EQ_DOORBEL_OFFSET;
-
 	err = mlx5_debug_eq_add(dev, eq);
 	if (err)
 		goto err_irq;
-- 
2.1.2

^ permalink raw reply related

* [PATCH net 0/2] mlx5_core fixes for 3.18
From: Eli Cohen @ 2014-11-06 10:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-rdma, ogerlitz, Eli Cohen

Hi Dave,
the following two patches fix races to could lead to kernel panic in some cases.

Thanks,
Eli

Eli Cohen (2):
  net/mlx5_core: Fix race in create EQ
  net/mlx5_core: Fix race on driver load

 drivers/net/ethernet/mellanox/mlx5/core/eq.c   | 7 +++----
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ++--
 2 files changed, 5 insertions(+), 6 deletions(-)

-- 
2.1.2

^ permalink raw reply

* Re: "asix: Don't reset PHY on if_up for ASIX 88772" breaks net on arndale platform
From: Charles Keepax @ 2014-11-06 10:01 UTC (permalink / raw)
  To: Riku Voipio
  Cc: Stam, Michel [FINT], freddy, davem, linux-usb, netdev,
	linux-kernel, linux-samsung-soc
In-Reply-To: <20141106090651.GA19109@afflict.kos.to>

On Thu, Nov 06, 2014 at 11:06:51AM +0200, Riku Voipio wrote:
> On Wed, Nov 05, 2014 at 03:02:58PM +0000, Charles Keepax wrote:
> > On Wed, Nov 05, 2014 at 01:04:37PM +0100, Stam, Michel [FINT] wrote:
> > > Hello Charles,
> > > 
> > > After looking around I found the reset value for the 8772 chip, which
> > > seems to be 0x1E1 (ANAR register).
> > > 
> > > This equates to (according to include/uapi/linux/mii.h)
> > > ADVERTISE_ALL | ADVERTISE_CSMA.
> > > 
> > > The register only seems to become 0 if the software reset fails.
>  
> > Odd it definitely reads back as zero on Arndale. I am guessing
> > that the root of the problem here is that for some reason Arndale
> > POR of the ethernet is pants and it needs a full software reset
> > before it will work and the patch removes the full reset
> > callback.
> 
> The asix on arndale comes semi-configured from u-boot, which I guess is
> not the state kernel expects it to come in. At least in my case where
> I use tftp from u-boot to load my kernel.
> 
> So probably the full reset is needed here to make the asix chip come
> to a truly pristine state.
> 
> The commit that Michel partially reverted (by returning to use
> ax88772_link_reset instead of ax88772_reset), indicates that a strong reset
> is needed for suspend/resume as well:

Ok I think I have cracked this one. I am pretty sure you are
right that the USB comes to us in a strange state and needs
a full reset, but that only needs to happen once when the driver
is bound in. So there is some code in ax88772_bind that appears
to try to reset the device but does a lot less than ax88772_reset
and I think that must be the problem. Applying the following on
top of the patch we have been debating I think will make
everything work for all of us:

--- a/drivers/net/usb/asix_devices.c
+++ b/drivers/net/usb/asix_devices.c
@@ -465,19 +465,7 @@ static int ax88772_bind(struct usbnet *dev,
struct usb_interface *in
                return ret;
        }

-       ret = asix_sw_reset(dev, AX_SWRESET_IPPD | AX_SWRESET_PRL);
-       if (ret < 0)
-               return ret;
-
-       msleep(150);
-
-       ret = asix_sw_reset(dev, AX_SWRESET_CLEAR);
-       if (ret < 0)
-               return ret;
-
-       msleep(150);
-
-       ret = asix_sw_reset(dev, embd_phy ? AX_SWRESET_IPRL : AX_SWRESET_PRTE);
+       ax88772_reset(dev);

If you guys could test that and let me know how you get on I will
send in a proper patch if it looks good.

Thanks,
Charles

^ permalink raw reply

* RE: ipv4: Use standard iovec primitive in raw_probe_proto_opt
From: Jon Maloy @ 2014-11-06  9:55 UTC (permalink / raw)
  To: Al Viro, Herbert Xu
  Cc: David Miller, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, bcrl@kvack.org, Masahide Nakamura,
	Hideaki YOSHIFUJI
In-Reply-To: <20141106071109.GX7996@ZenIV.linux.org.uk>



> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Al Viro
> Sent: November-06-14 8:11 AM
> To: Herbert Xu
> Cc: David Miller; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> bcrl@kvack.org; Masahide Nakamura; Hideaki YOSHIFUJI
> Subject: Re: ipv4: Use standard iovec primitive in raw_probe_proto_opt
> 
> On Thu, Nov 06, 2014 at 02:46:29PM +0800, Herbert Xu wrote:
> > On Thu, Nov 06, 2014 at 06:43:18AM +0000, Al Viro wrote:
> > > On Thu, Nov 06, 2014 at 01:50:23PM +0800, Herbert Xu wrote:
> > > > +	/* We only need the first two bytes. */
> > > > +	err = memcpy_fromiovecend((void *)&icmph, msg->msg_iov, 0, 2);
> > > > +	if (err)
> > > > +		return err;
> > > > +
> > > > +	fl4->fl4_icmp_type = icmph.type;
> > > > +	fl4->fl4_icmp_code = icmph.code;
> > >
> > > That's more readable, but that exposes another problem in there - we
> > > read the same piece of userland data twice, with no promise
> > > whatsoever that we'll get the same value both times...
> >
> > Sure, but you have to be root anyway to write to raw sockets.
> 
> Point, but that might very well be a pattern to watch for - there's at least one
> more instance in TIPC (also not exploitable, according to TIPC folks) and such

I don't recall this, and I can't see where it would be either. Can you please
point to where it is?

///jon

> bugs are easily repeated...
> 
> BTW, I've picked the tun and macvtap related bits from another part of old
> queue; see vfs.git#untested-macvtap - it's on top of #iov_iter-net and it's
> really completely untested.  Back then I was mostly interested in killing as
> many ->aio_write() instances as I could, so it's only the "send" side of things.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
> of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net 2/2] enic: update desc properly in rx_copybreak
From: Govindarajulu Varadarajan @ 2014-11-06  9:51 UTC (permalink / raw)
  To: ssujith, netdev, davem; +Cc: Govindarajulu Varadarajan
In-Reply-To: <1415267499-25955-1-git-send-email-_govind@gmx.com>

When we reuse the rx buffer, we need to update the desc. If not hardware sees
stale value.

In the following crash, when mtu is changed, hardware sees old rx buffer value
and crashes on skb_put.

Fix this by using enic_queue_rq_desc helper function which updates the necessary
desc.

[   64.657376] skbuff: skb_over_panic: text:ffffffffa041f55d len:9010 put:9010 head:ffff8800d3ca9fc0 data:ffff8800d3caa000 tail:0x2372 end:0x640 dev:enp0s3
[   64.659965] ------------[ cut here ]------------
[   64.661322] kernel BUG at net/core/skbuff.c:100!
[   64.662644] invalid opcode: 0000 [#1] PREEMPT SMP
[   64.664001] Modules linked in: rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 cirrus ttm drm_kms_helper drm enic psmouse microcode evdev serio_raw syscopyarea sysfillrect sysimgblt i2c_piix4 i2c_core pcspkr nfs lockd grace sunrpc fscache ext4 crc16 mbcache jbd2 sd_mod ata_generic virtio_balloon ata_piix libata uhci_hcd virtio_pci virtio_ring usbcore usb_common virtio scsi_mod
[   64.664834] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W      3.17.0-netnext-10335-g942396b-dirty #273
[   64.664834] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   64.664834] task: ffffffff81a1d580 ti: ffffffff81a00000 task.ti: ffffffff81a00000
[   64.664834] RIP: 0010:[<ffffffff81392cf1>]  [<ffffffff81392cf1>] skb_panic+0x61/0x70
[   64.664834] RSP: 0018:ffff880210603d48  EFLAGS: 00010292
[   64.664834] RAX: 000000000000008c RBX: ffff88020b0f6930 RCX: 0000000000000000
[   64.664834] RDX: 000000000000008c RSI: ffffffff8178b288 RDI: 00000000ffffffff
[   64.664834] RBP: ffff880210603d68 R08: 0000000000000001 R09: 0000000000000001
[   64.664834] R10: 00000000000005ce R11: 0000000000000001 R12: ffff88020b1f0b40
[   64.664834] R13: 000000000000a332 R14: ffff880209a1a000 R15: 0000000000000001
[   64.664834] FS:  0000000000000000(0000) GS:ffff880210600000(0000) knlGS:0000000000000000
[   64.664834] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   64.664834] CR2: 00007f6752935e48 CR3: 0000000035743000 CR4: 00000000000006f0
[   64.664834] Stack:
[   64.664834]  ffff8800d3caa000 0000000000002372 0000000000000640 ffff88020b1f0000
[   64.664834]  ffff880210603d78 ffffffff81392d54 ffff880210603e08 ffffffffa041f55d
[   64.664834]  0000000000000296 ffffffff00000000 00008e7e00008e7e ffff880200002332
[   64.664834] Call Trace:
[   64.664834]  <IRQ>
[   64.664834]
[   64.664834]  [<ffffffff81392d54>] skb_put+0x54/0x60
[   64.664834]  [<ffffffffa041f55d>] enic_rq_service.constprop.47+0x3ad/0x730 [enic]
[   64.664834]  [<ffffffffa041fa79>] enic_poll_msix_rq+0x199/0x370 [enic]
[   64.664834]  [<ffffffff813a5499>] net_rx_action+0x139/0x210
[   64.664834]  [<ffffffff81290db3>] ? __this_cpu_preempt_check+0x13/0x20
[   64.664834]  [<ffffffff8106110e>] __do_softirq+0x14e/0x280
[   64.664834]  [<ffffffff8106152e>] irq_exit+0x8e/0xb0
[   64.664834]  [<ffffffff8100fd21>] do_IRQ+0x61/0x100
[   64.664834]  [<ffffffff814a2bf2>] common_interrupt+0x72/0x72

fixes: a03bb56e67c357980dae886683733dab5583dc14 ("enic: implement rx_copybreak")
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
---
 drivers/net/ethernet/cisco/enic/enic_main.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index cd254d1..73cf165 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -940,18 +940,8 @@ static int enic_rq_alloc_buf(struct vnic_rq *rq)
 	struct vnic_rq_buf *buf = rq->to_use;
 
 	if (buf->os_buf) {
-		buf = buf->next;
-		rq->to_use = buf;
-		rq->ring.desc_avail--;
-		if ((buf->index & VNIC_RQ_RETURN_RATE) == 0) {
-			/* Adding write memory barrier prevents compiler and/or
-			 * CPU reordering, thus avoiding descriptor posting
-			 * before descriptor is initialized. Otherwise, hardware
-			 * can read stale descriptor fields.
-			 */
-			wmb();
-			iowrite32(buf->index, &rq->ctrl->posted_index);
-		}
+		enic_queue_rq_desc(rq, buf->os_buf, os_buf_index, buf->dma_addr,
+				   buf->len);
 
 		return 0;
 	}
-- 
2.1.0

^ permalink raw reply related

* [PATCH net 1/2] enic: handle error condition properly in enic_rq_indicate_buf
From: Govindarajulu Varadarajan @ 2014-11-06  9:51 UTC (permalink / raw)
  To: ssujith, netdev, davem; +Cc: Govindarajulu Varadarajan

In case of error in rx path, we free the buf->os_buf but we do not make it NULL.
In next iteration we use the skb which is already freed. This causes the
following crash.

[  886.154772] general protection fault: 0000 [#1] PREEMPT SMP
[  886.154851] Modules linked in: rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 microcode evdev cirrus ttm drm_kms_helper drm enic syscopyarea sysfillrect sysimgblt psmouse i2c_piix4 serio_raw pcspkr i2c_core nfs lockd grace sunrpc fscache ext4 crc16 mbcache jbd2 sd_mod crc_t10dif crct10dif_common ata_generic ata_piix virtio_balloon libata scsi_mod uhci_hcd usbcore virtio_pci virtio_ring virtio usb_common
[  886.155199] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W      3.17.0-netnext-05668-g876bc7f #272
[  886.155263] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  886.155304] task: ffffffff81a1d580 ti: ffffffff81a00000 task.ti: ffffffff81a00000
[  886.155356] RIP: 0010:[<ffffffff81384030>]  [<ffffffff81384030>] kfree_skb_list+0x10/0x30
[  886.155418] RSP: 0018:ffff880210603d48  EFLAGS: 00010206
[  886.155456] RAX: 0000000000000020 RBX: 0000000000000000 RCX: 0000000000000000
[  886.155504] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 004500084e000017
[  886.155553] RBP: ffff880210603d50 R08: 00000000fe13d1b6 R09: 0000000000000001
[  886.155601] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880209ff2f00
[  886.155650] R13: ffff88020ac0fe40 R14: ffff880209ff2f00 R15: ffff8800da8e3a80
[  886.155699] FS:  0000000000000000(0000) GS:ffff880210600000(0000) knlGS:0000000000000000
[  886.155774] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  886.155814] CR2: 00007f0e0c925000 CR3: 0000000035e8b000 CR4: 00000000000006f0
[  886.155865] Stack:
[  886.155882]  0000000000000000 ffff880210603d78 ffffffff81383f79 ffff880209ff2f00
[  886.155942]  ffff88020b0c0b40 000000000000c000 ffff880210603d90 ffffffff81383faf
[  886.156001]  ffff880209ff2f00 ffff880210603da8 ffffffff8138406d ffff88020b1b08c0
[  886.156061] Call Trace:
[  886.156080]  <IRQ>
[  886.156095]
[  886.156112]  [<ffffffff81383f79>] skb_release_data+0xa9/0xc0
[  886.157656]  [<ffffffff81383faf>] skb_release_all+0x1f/0x30
[  886.159195]  [<ffffffff8138406d>] consume_skb+0x1d/0x40
[  886.160719]  [<ffffffff813942e5>] __dev_kfree_skb_any+0x35/0x40
[  886.162224]  [<ffffffffa02dc1d5>] enic_rq_service.constprop.47+0xe5/0x5a0 [enic]
[  886.163756]  [<ffffffffa02dc829>] enic_poll_msix_rq+0x199/0x370 [enic]
[  886.164730]  [<ffffffff81397e29>] net_rx_action+0x139/0x210
[  886.164730]  [<ffffffff8105fb2e>] __do_softirq+0x14e/0x280
[  886.164730]  [<ffffffff8105ff2e>] irq_exit+0x8e/0xb0
[  886.164730]  [<ffffffff8100fc1d>] do_IRQ+0x5d/0x100
[  886.164730]  [<ffffffff81496832>] common_interrupt+0x72/0x72

fixes: a03bb56e67c357980dae886683733dab5583dc14 ("enic: implement rx_copybreak")
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
---
 drivers/net/ethernet/cisco/enic/enic_main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index 180e53f..cd254d1 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -1037,7 +1037,10 @@ static void enic_rq_indicate_buf(struct vnic_rq *rq,
 				enic->rq_truncated_pkts++;
 		}
 
+		pci_unmap_single(enic->pdev, buf->dma_addr, buf->len,
+				 PCI_DMA_FROMDEVICE);
 		dev_kfree_skb_any(skb);
+		buf->os_buf = NULL;
 
 		return;
 	}
@@ -1088,7 +1091,10 @@ static void enic_rq_indicate_buf(struct vnic_rq *rq,
 		/* Buffer overflow
 		 */
 
+		pci_unmap_single(enic->pdev, buf->dma_addr, buf->len,
+				 PCI_DMA_FROMDEVICE);
 		dev_kfree_skb_any(skb);
+		buf->os_buf = NULL;
 	}
 }
 
-- 
2.1.0

^ permalink raw reply related

* RE: [PATCH 1/4] inet: Add skb_copy_datagram_iter
From: Jon Maloy @ 2014-11-06  9:50 UTC (permalink / raw)
  To: Al Viro, David Miller
  Cc: herbert@gondor.apana.org.au, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, bcrl@kvack.org
In-Reply-To: <20141106032533.GU7996@ZenIV.linux.org.uk>



> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Al Viro
> Sent: November-06-14 4:26 AM
> To: David Miller
> Cc: herbert@gondor.apana.org.au; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; bcrl@kvack.org
> Subject: Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter
> 
> On Wed, Nov 05, 2014 at 04:57:19PM -0500, David Miller wrote:
> > From: Al Viro <viro@ZenIV.linux.org.uk>
> > Date: Wed, 5 Nov 2014 21:07:45 +0000
> >
> > > Ping me when you put it there, OK?  I'll rebase the rest of old
> > > stuff on top of it (similar helpers, mostly).
> >
> > I just pushed it into net-next, thanks Al.
> 
> OK, I've taken the beginning of the old queue on top of net-next; it's in
> git://git.kernel.org//pub/scm/linux/kernel/git/viro/vfs.git iov_iter-net.
> 
> From the quick look at the remaining ->msg_iov users:
> 
> 	* I'll need to add several iov_iter primitives - counterparts of
> checksum.h stuff (copy_and_csum_{from,to}_iter(), maybe some more).
> Not a big deal, I'll do that tomorrow.  That will give us a clean iov_iter-based
> counterpart of skb_copy_and_csum_datagram_iovec().
> 
> 	* a new helper: zerocopy_sg_from_iter().  I have it, actually, but I'd
> rather not step on Herbert's toes - it's too close to the areas his series will
> touch, so that's probably for when his series goes in.
> It will be needed for complete macvtap conversion...
> 
> 	* why doesn't verify_iovec() use rw_copy_check_uvector()?  The
> only real differences I see is that (a) you do allocation in callers (same as
> rw_copy_check_uvector() would've done), (b) you return EMSGSIZE in case
> of too long vector, while rw_copy_check_uvector() returns EINVAL in that
> case and (c) you don't do access_ok().  The last one is described as
> optimization, but for iov_iter primitives it's a serious PITA - for iovec-backed
> instances they are using __copy_from_user()/__copy_to_user(), etc.
> 	It certainly would be nice to have the same code doing all copying of
> iovecs from userland - readv/writev/aio/sendmsg/recvmsg/etc.  Am I
> missing something subtle semantical difference in there?  EMSGSIZE vs
> EINVAL is trivial (we can lift that check into the callers, if nothing else), but I
> could miss something more interesting...
> 
> 	* various getfrag will need to grow iov_iter-based counterparts, but
> ip_append_output() needs no changes, AFAICS.
> 
> 	* crypto stuff will be easy to convert - iov_iter_get_pages() would
> suffice for a primitive
> 
> 	* there's some really weird stuff in there.  Just what is this static int
> raw_probe_proto_opt(struct flowi4 *fl4, struct msghdr *msg) {
>         struct iovec *iov;
>         u8 __user *type = NULL;
>         u8 __user *code = NULL;
>         int probed = 0;
>         unsigned int i;
> 
>         if (!msg->msg_iov)
>                 return 0;
> 
>         for (i = 0; i < msg->msg_iovlen; i++) {
>                 iov = &msg->msg_iov[i];
>                 if (!iov)
>                         continue;
> trying to do?  "If non-NULL pointer + i somehow happened to be NULL, skip it
> and try to use the same pointer + i + 1"?  Huh?  Had been that way since the
> function first went in back in 2004 ("[IPV4] XFRM: probe icmp type/code
> when sending packets via raw socket.", according to historical tree)...
> 
> 	* rds, bluetooth and vsock are doing something odd; need to RTFS
> some more.
> 
> 	* not sure I understand what TIPC is doing - does it prohibit too short
> first segment of ->msg_iov?  

Yes, that is the purpose. It was needed in early versions of TIPC, which was
using TIPC itself, instead of netlink, as carrier of configuration commands.
This option is long gone, and we can safely remove those checks.  I'll
post a patch.

///jon

net/tipc/socket.c:dest_name_check() looks
> odd _and_ potentially racy - we read the same data twice and hope our
> checks still apply.  I asked TIPC folks about that race back in April, but it looks
> like that fell through the cracks...
> 
> Overall, so far it looks more or less feasible - other than the missing csum
> primitives, current mm/iov_iter.c should suffice.  I have _not_ seriously
> looked into sendpage yet; that might very well require some more.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
> of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next 1/2] ethtool: Support for configurable RSS hash function
From: Eyal perry @ 2014-11-06  9:41 UTC (permalink / raw)
  To: Ben Hutchings, Amir Vadai
  Cc: David S. Miller, netdev, Or Gerlitz, Eyal Perry, Yevgeny Petrilin
In-Reply-To: <1415224298.3398.27.camel@decadent.org.uk>

On 11/5/2014 11:51 PM, Ben Hutchings wrote:
> On Wed, 2014-11-05 at 13:59 +0200, Amir Vadai wrote:
>> From: Eyal Perry <eyalpe@mellanox.com>
>>
>> This patch adds an RSS hash functions string-set, and two
>> ethtool-options for set/get current RSS hash function. User-kernel API is done
>> through the new hfunc mask field in the ethtool_rxfh struct. A bit set
>> in the hfunc is corresponding to an index in the string-set.
>>
>> Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
>> Signed-off-by: Amir Vadai <amirv@mellanox.com>
>> ---
>>  include/linux/ethtool.h      | 28 ++++++++++++++++++++++++
>>  include/uapi/linux/ethtool.h |  6 ++++-
>>  net/core/ethtool.c           | 52 ++++++++++++++++++++++++++++++--------------
>>  3 files changed, 69 insertions(+), 17 deletions(-)
>>
>> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
>> index c1a2d60..61003b1 100644
>> --- a/include/linux/ethtool.h
>> +++ b/include/linux/ethtool.h
>> @@ -59,6 +59,29 @@ enum ethtool_phys_id_state {
>>  	ETHTOOL_ID_OFF
>>  };
>>  
>> +enum {
>> +	RSS_HASH_TOP_BIT, /* Configurable RSS hash function - Toeplitz */
>> +	RSS_HASH_XOR_BIT, /* Configurable RSS hash function - Xor */
>> +
>> +	/*
>> +	 * Add your fresh new hash function bits above and remember to update
>> +	 * rss_hash_func_strings[] below
>> +	 */
>> +	RSS_HASH_FUNCS_COUNT
>> +};
>> +
>> +#define __RSS_HASH_BIT(bit)	((u32)1 << (bit))
>> +#define __RSS_HASH(name)	 __RSS_HASH_BIT(RSS_HASH_##name##_BIT)
>> +
>> +#define RSS_HASH_TOP		__RSS_HASH(TOP)
>> +#define RSS_HASH_XOR		__RSS_HASH(XOR)
> 
> I think #define RSS_HASH_UNKNOWN 0 might also be useful.
> 
> And I think all of these names should get an ETH_ prefix.
> 
I'll add it in V1.
>> +static const char
>> +rss_hash_func_strings[RSS_HASH_FUNCS_COUNT][ETH_GSTRING_LEN] = {
>> +	[RSS_HASH_TOP_BIT] =     "toeplitz",
>> +	[RSS_HASH_XOR_BIT] =     "xor",
>> +};
> 
> This belongs in net/core/ethtool.c.
> 
I agree, will be fixed in V1.
>>  struct net_device;
>>  
>>  /* Some generic methods drivers may use in their ethtool_ops */
>> @@ -158,6 +181,9 @@ static inline u32 ethtool_rxfh_indir_default(u32 index, u32 n_rx_rings)
>>   *	Returns zero if not supported for this specific device.
>>   * @get_rxfh_indir_size: Get the size of the RX flow hash indirection table.
>>   *	Returns zero if not supported for this specific device.
>> + * @get_rxfh_func: Get the hardware RX flow hash function.
>> + * @set_rxfh_func: Set the hardware RX flow hash function. Returns a negative
>> + *	error code or zero.
>>   * @get_rxfh: Get the contents of the RX flow hash indirection table and hash
>>   *	key.
>>   *	Will only be called if one or both of @get_rxfh_indir_size and
>> @@ -241,6 +267,8 @@ struct ethtool_ops {
>>  	int	(*reset)(struct net_device *, u32 *);
>>  	u32	(*get_rxfh_key_size)(struct net_device *);
>>  	u32	(*get_rxfh_indir_size)(struct net_device *);
>> +	u32	(*get_rxfh_func)(struct net_device *);
>> +	int	(*set_rxfh_func)(struct net_device *, u32);
> 
> Why not another parameter to get_rxfh/set_rxfh?  I know it's a pain to
> update all the implementations, but changing algorithm potentially
> changes the supported indirection table and key lengths.  They have to
> be validated together.
> 
OK, I'll do it for V1.
>>  	int	(*get_rxfh)(struct net_device *, u32 *indir, u8 *key);
>>  	int	(*set_rxfh)(struct net_device *, const u32 *indir,
>>  			    const u8 *key);
>> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
>> index eb2095b..eb91da4 100644
>> --- a/include/uapi/linux/ethtool.h
>> +++ b/include/uapi/linux/ethtool.h
>> @@ -534,6 +534,7 @@ struct ethtool_pauseparam {
>>   * @ETH_SS_NTUPLE_FILTERS: Previously used with %ETHTOOL_GRXNTUPLE;
>>   *	now deprecated
>>   * @ETH_SS_FEATURES: Device feature names
>> + * @ETH_SS_RSS_HASH_FUNCS: RSS hush function names
>>   */
>>  enum ethtool_stringset {
>>  	ETH_SS_TEST		= 0,
>> @@ -541,6 +542,7 @@ enum ethtool_stringset {
>>  	ETH_SS_PRIV_FLAGS,
>>  	ETH_SS_NTUPLE_FILTERS,
>>  	ETH_SS_FEATURES,
>> +	ETH_SS_RSS_HASH_FUNCS,
>>  };
>>  
>>  /**
>> @@ -900,7 +902,9 @@ struct ethtool_rxfh {
>>  	__u32	rss_context;
>>  	__u32   indir_size;
>>  	__u32   key_size;
>> -	__u32	rsvd[2];
>> +	__u8	hfunc;
> 
> Missing kernel-doc.  This needs to be very clear about what the valid
> values are.
> 
I'll add it in V1.
>> +	__u8	rsvd8[3];
>> +	__u32	rsvd32;
>>  	__u32   rss_config[0];
>>  };
>>  #define ETH_RXFH_INDIR_NO_CHANGE	0xffffffff
>> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
>> index 06dfb29..4791c17 100644
>> --- a/net/core/ethtool.c
>> +++ b/net/core/ethtool.c
>> @@ -185,6 +185,9 @@ static int __ethtool_get_sset_count(struct net_device *dev, int sset)
>>  	if (sset == ETH_SS_FEATURES)
>>  		return ARRAY_SIZE(netdev_features_strings);
>>  
>> +	if (sset == ETH_SS_RSS_HASH_FUNCS)
>> +		return ARRAY_SIZE(rss_hash_func_strings);
>> +
>>  	if (ops->get_sset_count && ops->get_strings)
>>  		return ops->get_sset_count(dev, sset);
>>  	else
> [...]
>> @@ -760,32 +769,43 @@ static noinline_for_stack int ethtool_set_rxfh(struct net_device *dev,
>>  	const struct ethtool_ops *ops = dev->ethtool_ops;
>>  	struct ethtool_rxnfc rx_rings;
>>  	struct ethtool_rxfh rxfh;
>> -	u32 dev_indir_size = 0, dev_key_size = 0, i;
>> +	u32 dev_indir_size = 0, dev_key_size = 0, dev_hfunc = 0, i;
>>  	u32 *indir = NULL, indir_bytes = 0;
>>  	u8 *hkey = NULL;
>>  	u8 *rss_config;
>>  	u32 rss_cfg_offset = offsetof(struct ethtool_rxfh, rss_config[0]);
>>  
>> -	if (!(ops->get_rxfh_indir_size || ops->get_rxfh_key_size) ||
>> -	    !ops->get_rxnfc || !ops->set_rxfh)
>> +	if (!(ops->get_rxfh_indir_size || ops->get_rxfh_key_size ||
>> +	      ops->get_rxfh_func) || !ops->get_rxnfc || !ops->set_rxfh)
>>  		return -EOPNOTSUPP;
>>  
>> +	if (ops->get_rxfh_func)
>> +		dev_hfunc = ops->get_rxfh_func(dev);
>>  	if (ops->get_rxfh_indir_size)
>>  		dev_indir_size = ops->get_rxfh_indir_size(dev);
>>  	if (ops->get_rxfh_key_size)
>>  		dev_key_size = dev->ethtool_ops->get_rxfh_key_size(dev);
>> -	if ((dev_key_size + dev_indir_size) == 0)
>> +	if ((dev_key_size + dev_indir_size + dev_hfunc) == 0)
>>  		return -EOPNOTSUPP;
>>  
>>  	if (copy_from_user(&rxfh, useraddr, sizeof(rxfh)))
>>  		return -EFAULT;
>>  
>>  	/* Check that reserved fields are 0 for now */
>> -	if (rxfh.rss_context || rxfh.rsvd[0] || rxfh.rsvd[1])
>> +	if (rxfh.rss_context || rxfh.rsvd8[0] || rxfh.rsvd8[1] ||
>> +	    rxfh.rsvd8[2] || rxfh.rsvd32)
>>  		return -EINVAL;
>>  
>> +	if (rxfh.hfunc != dev_hfunc) {
>> +		if (!ops->set_rxfh_func)
>> +			return -EOPNOTSUPP;
>> +		ret = ops->set_rxfh_func(dev, rxfh.hfunc);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +
>>  	/* If either indir or hash key is valid, proceed further.
>> -	 * It is not valid to request that both be unchanged.
>> +	 * Must request at least one change: indir size, hash key or function.
>>  	 */
>>  	if ((rxfh.indir_size &&
>>  	     rxfh.indir_size != ETH_RXFH_INDIR_NO_CHANGE &&
>> @@ -793,7 +813,7 @@ static noinline_for_stack int ethtool_set_rxfh(struct net_device *dev,
>>  	    (rxfh.key_size && (rxfh.key_size != dev_key_size)) ||
>>  	    (rxfh.indir_size == ETH_RXFH_INDIR_NO_CHANGE &&
>>  	     rxfh.key_size == 0))
>> -		return -EINVAL;
>> +		return rxfh.hfunc ? 0 : -EINVAL;
> 
> Shouldn't the condition be rxfh.hfunc != dev_hfunc ?
> 
> Ben.
>
On current implementation it's not necessary since we have already
checked that above. However, according to your suggestion, after I'll
remove the set/get_rxfh_func() callbacks I wouldn't be able to check
that condition at this point of the code.

Best Regards,
Eyal.

^ permalink raw reply

* [patch net-next 02/10] net: introduce generic switch devices support
From: Jiri Pirko @ 2014-11-06  9:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner, shrijeet,
	gospo, bcrl
In-Reply-To: <1415265610-9338-1-git-send-email-jiri@resnulli.us>

The goal of this is to provide a possibility to support various switch
chips. Drivers should implement relevant ndos to do so. Now there is
only one ndo defined:
- for getting physical switch id is in place.

Note that user can use random port netdevice to access the switch.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
 MAINTAINERS                            |  7 ++++
 include/linux/netdevice.h              | 10 ++++++
 include/net/switchdev.h                | 30 +++++++++++++++++
 net/Kconfig                            |  1 +
 net/Makefile                           |  3 ++
 net/switchdev/Kconfig                  | 13 ++++++++
 net/switchdev/Makefile                 |  5 +++
 net/switchdev/switchdev.c              | 33 +++++++++++++++++++
 9 files changed, 161 insertions(+)
 create mode 100644 Documentation/networking/switchdev.txt
 create mode 100644 include/net/switchdev.h
 create mode 100644 net/switchdev/Kconfig
 create mode 100644 net/switchdev/Makefile
 create mode 100644 net/switchdev/switchdev.c

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
new file mode 100644
index 0000000..98be76c
--- /dev/null
+++ b/Documentation/networking/switchdev.txt
@@ -0,0 +1,59 @@
+Switch (and switch-ish) device drivers HOWTO
+===========================
+
+Please note that the word "switch" is here used in very generic meaning.
+This include devices supporting L2/L3 but also various flow offloading chips,
+including switches embedded into SR-IOV NICs.
+
+Lets describe a topology a bit. Imagine the following example:
+
+       +----------------------------+    +---------------+
+       |     SOME switch chip       |    |      CPU      |
+       +----------------------------+    +---------------+
+       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
+         |     |     |     |     |       +---------------+
+        PHY   PHY    |     |     |         |  NIC0 NIC1
+                     |     |     |         |   |    |
+                     |     |     +- PCI-E -+   |    |
+                     |     +------- MII -------+    |
+                     +------------- MII ------------+
+
+In this example, there are two independent lines between the switch silicon
+and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
+separate from the switch driver. SOME switch chip is by managed by a driver
+via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
+connected to some other type of bus.
+
+Now, for the previous example show the representation in kernel:
+
+       +----------------------------+    +---------------+
+       |     SOME switch chip       |    |      CPU      |
+       +----------------------------+    +---------------+
+       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
+         |     |     |     |     |       +---------------+
+        PHY   PHY    |     |     |         |  eth0 eth1
+                     |     |     |         |   |    |
+                     |     |     +- PCI-E -+   |    |
+                     |     +------- MII -------+    |
+                     +------------- MII ------------+
+
+Lets call the example switch driver for SOME switch chip "SOMEswitch". This
+driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
+created for each port of a switch. These netdevices are instances
+of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
+of the switch chip. eth0 and eth1 are instances of some other existing driver.
+
+The only difference of the switch-port netdevice from the ordinary netdevice
+is that is implements couple more NDOs:
+
+	ndo_sw_parent_get_id - This returns the same ID for two port netdevices
+			       of the same physical switch chip. This is
+			       mandatory to be implemented by all switch drivers
+			       and serves the caller for recognition of a port
+			       netdevice.
+	ndo_sw_parent_* - Functions that serve for a manipulation of the switch
+			  chip itself (it can be though of as a "parent" of the
+			  port, therefore the name). They are not port-specific.
+			  Caller might use arbitrary port netdevice of the same
+			  switch and it will make no difference.
+	ndo_sw_port_* - Functions that serve for a port-specific manipulation.
diff --git a/MAINTAINERS b/MAINTAINERS
index 3a41fb0..776e078 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9003,6 +9003,13 @@ F:	lib/swiotlb.c
 F:	arch/*/kernel/pci-swiotlb.c
 F:	include/linux/swiotlb.h
 
+SWITCHDEV
+M:	Jiri Pirko <jiri@resnulli.us>
+L:	netdev@vger.kernel.org
+S:	Supported
+F:	net/switchdev/
+F:	include/net/switchdev.h
+
 SYNOPSYS ARC ARCHITECTURE
 M:	Vineet Gupta <vgupta@synopsys.com>
 S:	Supported
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index bbc730f..54b08a7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1017,6 +1017,12 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  *	performing GSO on a packet. The device returns true if it is
  *	able to GSO the packet, false otherwise. If the return value is
  *	false the stack will do software GSO.
+ *
+ * int (*ndo_sw_parent_id_get)(struct net_device *dev,
+ *			       struct netdev_phys_item_id *psid);
+ *	Called to get an ID of the switch chip this port is part of.
+ *	If driver implements this, it indicates that it represents a port
+ *	of a switch chip.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1168,6 +1174,10 @@ struct net_device_ops {
 	int			(*ndo_get_lock_subclass)(struct net_device *dev);
 	bool			(*ndo_gso_check) (struct sk_buff *skb,
 						  struct net_device *dev);
+#ifdef CONFIG_NET_SWITCHDEV
+	int			(*ndo_sw_parent_id_get)(struct net_device *dev,
+							struct netdev_phys_item_id *psid);
+#endif
 };
 
 /**
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
new file mode 100644
index 0000000..79bf9bd
--- /dev/null
+++ b/include/net/switchdev.h
@@ -0,0 +1,30 @@
+/*
+ * include/net/switchdev.h - Switch device API
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#ifndef _LINUX_SWITCHDEV_H_
+#define _LINUX_SWITCHDEV_H_
+
+#include <linux/netdevice.h>
+
+#ifdef CONFIG_NET_SWITCHDEV
+
+int netdev_sw_parent_id_get(struct net_device *dev,
+			    struct netdev_phys_item_id *psid);
+
+#else
+
+static inline int netdev_sw_parent_id_get(struct net_device *dev,
+					  struct netdev_phys_item_id *psid)
+{
+	return -EOPNOTSUPP;
+}
+
+#endif
+
+#endif /* _LINUX_SWITCHDEV_H_ */
diff --git a/net/Kconfig b/net/Kconfig
index 99815b5..ff9ffc1 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -228,6 +228,7 @@ source "net/vmw_vsock/Kconfig"
 source "net/netlink/Kconfig"
 source "net/mpls/Kconfig"
 source "net/hsr/Kconfig"
+source "net/switchdev/Kconfig"
 
 config RPS
 	boolean
diff --git a/net/Makefile b/net/Makefile
index 7ed1970..95fc694 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -73,3 +73,6 @@ obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
 obj-$(CONFIG_VSOCKETS)	+= vmw_vsock/
 obj-$(CONFIG_NET_MPLS_GSO)	+= mpls/
 obj-$(CONFIG_HSR)		+= hsr/
+ifneq ($(CONFIG_NET_SWITCHDEV),)
+obj-y				+= switchdev/
+endif
diff --git a/net/switchdev/Kconfig b/net/switchdev/Kconfig
new file mode 100644
index 0000000..1557545
--- /dev/null
+++ b/net/switchdev/Kconfig
@@ -0,0 +1,13 @@
+#
+# Configuration for Switch device support
+#
+
+config NET_SWITCHDEV
+	boolean "Switch (and switch-ish) device support (EXPERIMENTAL)"
+	depends on INET
+	---help---
+	  This module provides glue between core networking code and device
+	  drivers in order to support hardware switch chips in very generic
+	  meaning of the word "switch". This include devices supporting L2/L3 but
+	  also various flow offloading chips, including switches embedded into
+	  SR-IOV NICs.
diff --git a/net/switchdev/Makefile b/net/switchdev/Makefile
new file mode 100644
index 0000000..5ed63ed
--- /dev/null
+++ b/net/switchdev/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the Switch device API
+#
+
+obj-$(CONFIG_NET_SWITCHDEV) += switchdev.o
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
new file mode 100644
index 0000000..5010f646
--- /dev/null
+++ b/net/switchdev/switchdev.c
@@ -0,0 +1,33 @@
+/*
+ * net/switchdev/switchdev.c - Switch device API
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/netdevice.h>
+#include <net/switchdev.h>
+
+/**
+ *	netdev_sw_parent_id_get - Get ID of a switch
+ *	@dev: port device
+ *	@psid: switch ID
+ *
+ *	Get ID of a switch this port is part of.
+ */
+int netdev_sw_parent_id_get(struct net_device *dev,
+			    struct netdev_phys_item_id *psid)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (!ops->ndo_sw_parent_id_get)
+		return -EOPNOTSUPP;
+	return ops->ndo_sw_parent_id_get(dev, psid);
+}
+EXPORT_SYMBOL(netdev_sw_parent_id_get);
-- 
1.9.3

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox