Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: BCM4313 & brcmsmac & 3.12: only semi-working?
From: Maximilian Engelhardt @ 2014-10-08 22:19 UTC (permalink / raw)
  To: Arend van Spriel
  Cc: Michael Tokarev, Seth Forshee, brcm80211-dev-list, linux-wireless,
	netdev
In-Reply-To: <542A8092.9090602@broadcom.com>

[-- Attachment #1: Type: text/plain, Size: 6808 bytes --]

On Tuesday 30 September 2014 12:06:10 Arend van Spriel wrote:
> On 09/29/14 21:40, Maximilian Engelhardt wrote:
> > On Monday 29 September 2014 15:44:03 Arend van Spriel wrote:
> >> On 09/26/14 17:20, Michael Tokarev wrote:
> >>> I can send it your way, -- guess it will be quite a bit costly,
> >>> but I don't have any use for it anyway (short of throwing it
> >>> away), and since I already spent significantly more money due
> >>> to all this (whole thinkpad plus ssds and several wifi adaptors),
> >>> this additional cost is just a small noize.  But since that's
> >>> 2nd card in a row, maybe there's something else in there, the
> >>> prob is not in the card?
> >> 
> >> Could be. Maybe some BIOS issue. Can you make some hi-res pictures of
> >> the card and email them to me? If it is identical to what I already have
> >> over here there is not much sense in sending it.
> >> 
> >> Regards,
> >> Arend
> > 
> > Hi Arend,
> > 
> > I just saw this thread on linux-wireless and wanted to answer as it might
> > be of interest to you.
> > 
> > I also own a BCM4313 wireless network card. About a year ago I reported
> > some problems with reception strength which were then fixed after some
> > time, debugging and testing passed. At that time I also did some
> > throughput testing, but only had a 802.11g access-point to test. The
> > results were not ideal, but also not too bad. So at that time I thought
> > the issues were all more or less fixed and mostly fine. I also don't use
> > wireless very much, so as long as things do work somehow acceptable I
> > probably don't notice any problems immediately. So it comes that only
> > about some month ago I noticed that the throughput I measured with my 11g
> > access-point (about half the rate than with an atheros card on the same
> > ap) is about the same on a 11n access-point where it should be much
> > faster. I didn't experience any stalls, but that may also be that I
> > didn't use the card enough to really notice them.
> > 
> > I always wanted to report a bug because of the low throughout, but never
> > got to it because of lack of time. I didn't want to provide a report
> > saying just it doesn't work or it's slow without any data about it and a
> > description how to reproduce it, as I think without this information a
> > bug report is mostly useless.
> > 
> > I also had a look at the kernel changelog of the brcmsm driver and notices
> > there was little to no activity lately. Because of this I also wasn't sure
> > if there is still someone interesting in fixing bugs for this device.
> > 
> > As I was annoyed by the bad support for this card I decided it would be
> > more easy and much less time consuming to simply buy another card than
> > trying to get this fixed. So I bought a BCM43228 card, because it also
> > supports 5 GHz. Only after it arrived I noticed that it was only
> > supported by the 3.17+ kernel (not so much of a problem) and that it only
> > seems to work in 802.11g mode (only slow speeds and no 5 GHz). At least I
> > could use it in full 11g speeds, so it was a improvement.
> > 
> > So I still don't have a card that does simply work. As I hope the missing
> > support for my BCM43228 will hopefully be added some time in the future it
> > probably would still be worth fixing the BCM4313 card as other users will
> > also benefit from it.
> > 
> > A friend of mine also has the same laptop than me and the same (or at
> > least
> > very similar) wireless card. He has told me he has also problem with
> > stalls
> > like Michael reported (if I remember the history of the thread correctly).
> > 
> > So I'm not really sure where I should go from here. I can try to provide
> > some debugging information as time permits, but I don't know how much
> > time I will have for this in the future. Of course ideally I want to use
> > the BCM43228 card with full support, as it can work on 5 GHz.
> > 
> > Currently the BCM43228 card is plugged into my laptop, but I want to avoid
> > swapping the cards more that very few times because the antenna connectors
> > are only designed for very few (un)plug cycles.
> > 
> > If it's of any information, my card is labeled BCM-BCM94313HMGB on the
> > sticker, the laptop where it was originally is a ThinkPad Edge E135.
> 
> Thanks for taking time to chime in. This chipset is a pain in the....
> The label info does help. You have a 4313 with internal PA for which
> support was added later. The card that Michael has seems to have an
> external PA. The initial iPA support patch broke things for everyone
> with external PA so it was reverted. In the second round it was better,
> but it seems Michael still had issues. As he mentioned BT issues and his
> card shares the external PA between WLAN and BT I believe that there is
> a BT-coex issue.
> 
> What is causing your 4313 to seemingly do 11g rates is hard to tell
> without any debug info. I have that card over here, but in my cabled
> setup it is doing 72Mbps, ie. 11n rate. I can run a rate-vs-range test
> to see if there is an issue.
> 
> Thanks again and
> Regards,
> Arend

Hi Arend,

Today I changed back to the BC4313 card to verify the speed is still slow. At 
first it was slow as I remembered it (down about 3 Mbits/s, up about 1 Mbit/s, 
measured with iperf).
I then booted an Ubuntu Live image to try the wl driver, just to verify that 
it performs better and the hardware is still fine. First on the Ubuntu Live 
image the speed with the brcmsmac driver were the same as in my test above. I 
the installed the wl driver, unloaded the brcmsmac module and loaded the wl 
module. That however did not work as something in the kernel crashed and 
wireless didn't work at all. I rebootet the system to test again in case this 
was a random failure. What was interesting after this reboot was that the 
wireless connection was fast with the brcmsmac driver. I got about 35 Mbit/s 
up and down speed. I then tried the wl driver again but it still crashed.

After this I rebooted into my normal Debian system and the brcmsmac driver was 
still fast. So what was different now than before. I realized that I always did 
a reboot, so the Laptop was never really off between the boots. I then shut 
down the laptop and waited a few seconds before turning it on again. And after 
that the driver was back to the slow speed.

I did not verify this a second time, but for now it very much looks like the 
wl driver is settings something in the card that is necessary for getting 
faster speeds. This seems to be preserved on reboot but not on poweroff.

I hope this my help you finding the problem in the brcmsmac driver. Feel free 
to ask if you need any addition debug information from me.

And for the record, my atheros usb card is still much faster, it reaches 
130 Mbit/s.

Greetings,
Maxi

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH] igb: Indicate failure on vf reset for empty mac address
From: Jeff Kirsher @ 2014-10-08 22:29 UTC (permalink / raw)
  To: Alexander Graf
  Cc: David S. Miller, netdev, Mitch Williams, Andy Gospodarek,
	Stefan Assmann, Aaron Brown, Greg Rose, John Ronciak
In-Reply-To: <1412803407-10621-1-git-send-email-agraf@suse.de>

[-- Attachment #1: Type: text/plain, Size: 2513 bytes --]

On Wed, 2014-10-08 at 23:23 +0200, Alexander Graf wrote:
> Commit 5ac6f91d changed the igb driver to expose a zero (empty) mac
> address to the VF on reset rather than a random one.
> 
> However, that behavioral change also requires igbvf driver changes
> which can be hard especially when we want to talk to proprietary
> guest OSs.
> 
> Looking at the code previous to the commit in Linux that made igbvf
> work with empty mac addresses (8d56b6d), we can see that on reset
> failure the driver will try to generate a new mac address with both
> the old and the new code.
> 
> Furthermore, ixgbe does send reset failure when it detects an empty
> mac address (35055928c).
> 
> So I think it's safe to make igb behave the same. With this patch I
> can successfully run a Windows 8.1 guest with an empty mac address
> and an assigned igbvf device that has no mac address set by the host.
> 
> If anyone is aware of a guest driver that chokes on NACK returns of
> VF RESET commands, please speak up.
> 
> Signed-off-by: Alexander Graf <agraf@suse.de>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)

Actually after further review of this patch and the reported bug in
SuSE's bugzilla, we are NACK'ing this patch.

If the reset has not failed, why are we indicating that it has?  We
originally supplied the VF with a NULL MAC, so we should supply it
again.  That way, the VF can choose to either regenerate a new random
MAC or keep using the one that it had.

The current method was a fix that was requested by the community in the
first place, also we cannot take into account "proprietary guest OS's".

> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index cb14bbd..e8c53b6 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -6024,8 +6024,12 @@ static void igb_vf_reset_msg(struct igb_adapter *adapter, u32 vf)
>  	adapter->vf_data[vf].flags |= IGB_VF_FLAG_CTS;
>  
>  	/* reply to reset with ack and vf mac address */
> -	msgbuf[0] = E1000_VF_RESET | E1000_VT_MSGTYPE_ACK;
> -	memcpy(addr, vf_mac, ETH_ALEN);
> +	if (!is_zero_ether_addr(vf_mac)) {
> +		msgbuf[0] = E1000_VF_RESET | E1000_VT_MSGTYPE_ACK;
> +		memcpy(addr, vf_mac, ETH_ALEN);
> +	} else {
> +		msgbuf[0] = E1000_VF_RESET | E1000_VT_MSGTYPE_NACK;
> +	}
>  	igb_write_mbx(hw, msgbuf, 3, vf);
>  }
>  



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH] igb: Indicate failure on vf reset for empty mac address
From: Alexander Graf @ 2014-10-08 22:38 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: David S. Miller, netdev@vger.kernel.org, Mitch Williams,
	Andy Gospodarek, Stefan Assmann, Aaron Brown, Greg Rose,
	John Ronciak
In-Reply-To: <1412807355.2260.17.camel@jtkirshe-mobl.jf.intel.com>




> Am 09.10.2014 um 00:29 schrieb Jeff Kirsher <jeffrey.t.kirsher@intel.com>:
> 
>> On Wed, 2014-10-08 at 23:23 +0200, Alexander Graf wrote:
>> Commit 5ac6f91d changed the igb driver to expose a zero (empty) mac
>> address to the VF on reset rather than a random one.
>> 
>> However, that behavioral change also requires igbvf driver changes
>> which can be hard especially when we want to talk to proprietary
>> guest OSs.
>> 
>> Looking at the code previous to the commit in Linux that made igbvf
>> work with empty mac addresses (8d56b6d), we can see that on reset
>> failure the driver will try to generate a new mac address with both
>> the old and the new code.
>> 
>> Furthermore, ixgbe does send reset failure when it detects an empty
>> mac address (35055928c).
>> 
>> So I think it's safe to make igb behave the same. With this patch I
>> can successfully run a Windows 8.1 guest with an empty mac address
>> and an assigned igbvf device that has no mac address set by the host.
>> 
>> If anyone is aware of a guest driver that chokes on NACK returns of
>> VF RESET commands, please speak up.
>> 
>> Signed-off-by: Alexander Graf <agraf@suse.de>
>> ---
>> drivers/net/ethernet/intel/igb/igb_main.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
> 
> Actually after further review of this patch and the reported bug in
> SuSE's bugzilla, we are NACK'ing this patch.
> 
> If the reset has not failed, why are we indicating that it has?

Very good question. It's what the ixgbe driver does and older Linux, newer Linux as well as Windows seem to interpret it as "I need to generate a mac address".

I have no idea whether that's the intended meaning of a reset nack, but if it isn't, ixgbe probably shouldn't nack it either then.

Is the mbox message protocol documented anywhere? I could only find references to register layouts so far, but nothing about the comminucation on top of them.

> We
> originally supplied the VF with a NULL MAC, so we should supply it
> again.  That way, the VF can choose to either regenerate a new random
> MAC or keep using the one that it had.

Where is this documented? And why doesn't ixgbe adhere to this?

> The current method was a fix that was requested by the community in the
> first place, also we cannot take into account "proprietary guest OS's".

I don't think I understand this comment. "The community" is a very broad term in Linux ;). Also, this patch merely moves igb to behave identically to ixgbe in how it tells an igbvf driver that the mac address is empty. Nothing changes wrt the original change to not generate random mac addresses. That stays identical.


Alex

^ permalink raw reply

* Re: r8168 is needed to enter P-state: Package State 6 (pc6)onHaswell hardware: does the patch below against current kernel make a difference?
From: Ceriel Jacobs @ 2014-10-08 22:46 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Hayes Wang, nic_swsd, netdev@vger.kernel.org
In-Reply-To: <20141008221726.GA25515@electric-eye.fr.zoreil.com>

The 3.13 source was installed using "apt-get source linux-source-3.13.0"

The patch hunk#2 failing issue is something in the "heredoc" notation 
(me trying to apply the patch without storing the patch lines as 
temporary input file). I am now using the temp file without any issue.

The next newly blocking issue is this:

# make -C /lib/modules/$(uname -r)/build M=$(pwd) modules
make: Entering directory `/usr/src/linux-headers-3.17.0-031700rc7-generic'
   CC [M]  /root/linux-3.13.0/drivers/net/ethernet/realtek/8139cp.o
   CC [M]  /root/linux-3.13.0/drivers/net/ethernet/realtek/8139too.o
   CC [M]  /root/linux-3.13.0/drivers/net/ethernet/realtek/atp.o
   CC [M]  /root/linux-3.13.0/drivers/net/ethernet/realtek/r8169.o
/root/linux-3.13.0/drivers/net/ethernet/realtek/r8169.c: In function 
'rtl_init_one':
/root/linux-3.13.0/drivers/net/ethernet/realtek/r8169.c:7127:2: error: 
implicit declaration of function 'SET_ETHTOOL_OPS' 
[-Werror=implicit-function-declaration]
   SET_ETHTOOL_OPS(dev, &rtl8169_ethtool_ops);
   ^
cc1: some warnings being treated as errors
make[1]: *** [/root/linux-3.13.0/drivers/net/ethernet/realtek/r8169.o] 
Error 1
make: *** [_module_/root/linux-3.13.0/drivers/net/ethernet/realtek] Error 2
make: Leaving directory `/usr/src/linux-headers-3.17.0-031700rc7-generic'

Any new suggestion how to honour your patch/test request?

Francois Romieu schreef op 09-10-14 om 00:17:
> Ceriel Jacobs <linux-ide@crashplan.pro> :
> [...]
>> Francois, could you help me apply your patch?
>
> $ git describe
> v3.13
> $ patch -p1 --dry-run < /tmp/plop
> checking file drivers/net/ethernet/realtek/r8169.c
> Hunk #1 succeeded at 467 (offset -1 lines).
> Hunk #2 succeeded at 5275 (offset -5 lines).
>
> $ git rev-list v3.16 -- drivers/net/ethernet/realtek/r8169.c | while read x; do echo $x:$(git cat-file -p $x:drivers/net/ethernet/realtek/r8169.c | md5sum); done | grep 95056b56932b375f8b65a6379009f704
>
> -> oualou
>
> Where does your 3.13 source tree come from ?
>

^ permalink raw reply

* Re: r8168 is needed to enter P-state: Package State 6 (pc6)onHaswell hardware: does the patch below against current kernel make a difference?
From: Francois Romieu @ 2014-10-08 23:26 UTC (permalink / raw)
  To: Ceriel Jacobs; +Cc: Hayes Wang, nic_swsd, netdev@vger.kernel.org
In-Reply-To: <5435BEC3.6080200@crashplan.pro>

Ceriel Jacobs <linux-ide@crashplan.pro> :
> The 3.13 source was installed using "apt-get source linux-source-3.13.0"
> 
> The patch hunk#2 failing issue is something in the "heredoc" notation (me
> trying to apply the patch without storing the patch lines as temporary input
> file). I am now using the temp file without any issue.
> 
> The next newly blocking issue is this:
> 
> # make -C /lib/modules/$(uname -r)/build M=$(pwd) modules
> make: Entering directory `/usr/src/linux-headers-3.17.0-031700rc7-generic'

You are mixing 3.13 and 3.17. Try 'uname -r' alone. Got it ?

Please get some pristine 3.17 sources - they're available on www.kernel.org -
and try again.

-- 
Ueimor

^ permalink raw reply

* RE: [PATCH] igb: Indicate failure on vf reset for empty mac address
From: Williams, Mitch A @ 2014-10-08 23:33 UTC (permalink / raw)
  To: 'Alexander Graf', Kirsher, Jeffrey T
  Cc: David S. Miller, netdev@vger.kernel.org, Andy Gospodarek,
	Stefan Assmann, Brown, Aaron F, Rose, Gregory V, Ronciak, John
In-Reply-To: <4F8303E0-EA1B-4D93-B898-10ACD072F404@suse.de>

> -----Original Message-----
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: Wednesday, October 08, 2014 3:39 PM
> To: Kirsher, Jeffrey T
> Cc: David S. Miller; netdev@vger.kernel.org; Williams, Mitch A; Andy
> Gospodarek; Stefan Assmann; Brown, Aaron F; Rose, Gregory V; Ronciak, John
> Subject: Re: [PATCH] igb: Indicate failure on vf reset for empty mac address
> 
> 
> 
> 
> > Am 09.10.2014 um 00:29 schrieb Jeff Kirsher <jeffrey.t.kirsher@intel.com>:
> >
> >> On Wed, 2014-10-08 at 23:23 +0200, Alexander Graf wrote:
> >> Commit 5ac6f91d changed the igb driver to expose a zero (empty) mac
> >> address to the VF on reset rather than a random one.
> >>
> >> However, that behavioral change also requires igbvf driver changes
> >> which can be hard especially when we want to talk to proprietary
> >> guest OSs.
> >>
> >> Looking at the code previous to the commit in Linux that made igbvf
> >> work with empty mac addresses (8d56b6d), we can see that on reset
> >> failure the driver will try to generate a new mac address with both
> >> the old and the new code.
> >>
> >> Furthermore, ixgbe does send reset failure when it detects an empty
> >> mac address (35055928c).
> >>
> >> So I think it's safe to make igb behave the same. With this patch I
> >> can successfully run a Windows 8.1 guest with an empty mac address
> >> and an assigned igbvf device that has no mac address set by the host.
> >>
> >> If anyone is aware of a guest driver that chokes on NACK returns of
> >> VF RESET commands, please speak up.
> >>
> >> Signed-off-by: Alexander Graf <agraf@suse.de>
> >> ---
> >> drivers/net/ethernet/intel/igb/igb_main.c | 8 ++++++--
> >> 1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > Actually after further review of this patch and the reported bug in
> > SuSE's bugzilla, we are NACK'ing this patch.
> >
> > If the reset has not failed, why are we indicating that it has?
> 
> Very good question. It's what the ixgbe driver does and older Linux, newer
> Linux as well as Windows seem to interpret it as "I need to generate a mac
> address".
> 
> I have no idea whether that's the intended meaning of a reset nack, but if
> it isn't, ixgbe probably shouldn't nack it either then.
> 
> Is the mbox message protocol documented anywhere? I could only find
> references to register layouts so far, but nothing about the comminucation
> on top of them.
> 
> > We
> > originally supplied the VF with a NULL MAC, so we should supply it
> > again.  That way, the VF can choose to either regenerate a new random
> > MAC or keep using the one that it had.
> 
> Where is this documented? And why doesn't ixgbe adhere to this?
> 
> > The current method was a fix that was requested by the community in the
> > first place, also we cannot take into account "proprietary guest OS's".
> 
> I don't think I understand this comment. "The community" is a very broad
> term in Linux ;). Also, this patch merely moves igb to behave identically to
> ixgbe in how it tells an igbvf driver that the mac address is empty. Nothing
> changes wrt the original change to not generate random mac addresses. That
> stays identical.
> 
> 
> Alex

Alex, I can't speak for ixgbe - I've never worked on that driver. However, for igb and igbvf, I was the one who changed the MAC address generation code, at the request of the Linux networking community. In general, the PF driver provides a valid MAC to the VF driver if and only if the user has assigned a fixed MAC address for that VF on the command line. Otherwise, the VF driver gets all zeros, and it knows to generate its own random MAC address. This was done so that the VF would know when it was using a random MAC address and be able to inform the stack (and, eventually, udev). This kept udev from creating a new interface name each time the VF was booted.

On a reset, the PF should not generate a new MAC address for the VF driver. It shouldn't do anything with the VF MAC, except report what it knows to the VF driver - either all zeros, or the address assigned by the user. It's up to the VF driver to know what to do.

And the PF should not lie to the VF driver and tell it that reset failed. That's just incorrect.

Yes, there are some older Linux and Windows VF drivers that don't know what to do in this situation. But we made this change almost two years ago, and all of our current drivers (both Windows and Linux) do the right thing.

If you have a bug to report against a current Windows or Linux VF driver, please let us know. We will be happy to help. 

If you have another proprietary OS that doesn't work, then you need to fix that VF driver, not the PF driver. The Linux kernel community will not accept a change that is only made to support a non-Linux guest.

In any case, this patch is not correct, so we have responded with a NAK.

-Mitch Williams

^ permalink raw reply

* Re: [PATCH] igb: Indicate failure on vf reset for empty mac address
From: Alexander Graf @ 2014-10-08 23:45 UTC (permalink / raw)
  To: Williams, Mitch A, Kirsher, Jeffrey T
  Cc: David S. Miller, netdev@vger.kernel.org, Andy Gospodarek,
	Stefan Assmann, Brown, Aaron F, Rose, Gregory V, Ronciak, John
In-Reply-To: <AAEA33E297BCAC4B9BB20A7C2DF0AB8D65765AC1@FMSMSX113.amr.corp.intel.com>



On 09.10.14 01:33, Williams, Mitch A wrote:
> 
> 
>> -----Original Message-----
>> From: Alexander Graf [mailto:agraf@suse.de]
>> Sent: Wednesday, October 08, 2014 3:39 PM
>> To: Kirsher, Jeffrey T
>> Cc: David S. Miller; netdev@vger.kernel.org; Williams, Mitch A; Andy
>> Gospodarek; Stefan Assmann; Brown, Aaron F; Rose, Gregory V; Ronciak, John
>> Subject: Re: [PATCH] igb: Indicate failure on vf reset for empty mac address
>>
>>
>>
>>
>>> Am 09.10.2014 um 00:29 schrieb Jeff Kirsher <jeffrey.t.kirsher@intel.com>:
>>>
>>>> On Wed, 2014-10-08 at 23:23 +0200, Alexander Graf wrote:
>>>> Commit 5ac6f91d changed the igb driver to expose a zero (empty) mac
>>>> address to the VF on reset rather than a random one.
>>>>
>>>> However, that behavioral change also requires igbvf driver changes
>>>> which can be hard especially when we want to talk to proprietary
>>>> guest OSs.
>>>>
>>>> Looking at the code previous to the commit in Linux that made igbvf
>>>> work with empty mac addresses (8d56b6d), we can see that on reset
>>>> failure the driver will try to generate a new mac address with both
>>>> the old and the new code.
>>>>
>>>> Furthermore, ixgbe does send reset failure when it detects an empty
>>>> mac address (35055928c).
>>>>
>>>> So I think it's safe to make igb behave the same. With this patch I
>>>> can successfully run a Windows 8.1 guest with an empty mac address
>>>> and an assigned igbvf device that has no mac address set by the host.
>>>>
>>>> If anyone is aware of a guest driver that chokes on NACK returns of
>>>> VF RESET commands, please speak up.
>>>>
>>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>>> ---
>>>> drivers/net/ethernet/intel/igb/igb_main.c | 8 ++++++--
>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> Actually after further review of this patch and the reported bug in
>>> SuSE's bugzilla, we are NACK'ing this patch.
>>>
>>> If the reset has not failed, why are we indicating that it has?
>>
>> Very good question. It's what the ixgbe driver does and older Linux, newer
>> Linux as well as Windows seem to interpret it as "I need to generate a mac
>> address".
>>
>> I have no idea whether that's the intended meaning of a reset nack, but if
>> it isn't, ixgbe probably shouldn't nack it either then.
>>
>> Is the mbox message protocol documented anywhere? I could only find
>> references to register layouts so far, but nothing about the comminucation
>> on top of them.
>>
>>> We
>>> originally supplied the VF with a NULL MAC, so we should supply it
>>> again.  That way, the VF can choose to either regenerate a new random
>>> MAC or keep using the one that it had.
>>
>> Where is this documented? And why doesn't ixgbe adhere to this?
>>
>>> The current method was a fix that was requested by the community in the
>>> first place, also we cannot take into account "proprietary guest OS's".
>>
>> I don't think I understand this comment. "The community" is a very broad
>> term in Linux ;). Also, this patch merely moves igb to behave identically to
>> ixgbe in how it tells an igbvf driver that the mac address is empty. Nothing
>> changes wrt the original change to not generate random mac addresses. That
>> stays identical.
>>
>>
>> Alex
> 
> Alex, I can't speak for ixgbe - I've never worked on that driver. However, for igb and igbvf, I was the one who changed the MAC address generation code, at the request of the Linux networking community. In general, the PF driver provides a valid MAC to the VF driver if and only if the user has assigned a fixed MAC address for that VF on the command line. Otherwise, the VF driver gets all zeros, and it knows to generate its own random MAC address. This was done so that the VF would know when it was using a random MAC address and be able to inform the stack (and, eventually, udev). This kept udev from creating a new interface name each time the VF was booted.

Yup, makes a lot of sense.

> 
> On a reset, the PF should not generate a new MAC address for the VF driver. It shouldn't do anything with the VF MAC, except report what it knows to the VF driver - either all zeros, or the address assigned by the user. It's up to the VF driver to know what to do.

Well, semantically you want to tell the guest "I don't have a mac
address" or "this is your mac address". Whether the former is encoded in
an empty mac address is really an implementation detail I suppose.

> 
> And the PF should not lie to the VF driver and tell it that reset failed. That's just incorrect.
> 
> Yes, there are some older Linux and Windows VF drivers that don't know what to do in this situation. But we made this change almost two years ago, and all of our current drivers (both Windows and Linux) do the right thing.
> 
> If you have a bug to report against a current Windows or Linux VF driver, please let us know. We will be happy to help. 

Sure! So what I did was that I picked a recent Windows 8.1 installation
and ran it in a VM. Then I assigned the VF to it and it just borked at
me ;).

Apparently 2 years weren't enough for the driver to trickle into the
latest released version of Windows.

> 
> If you have another proprietary OS that doesn't work, then you need to fix that VF driver, not the PF driver. The Linux kernel community will not accept a change that is only made to support a non-Linux guest.

The interface in question is a guest interface. Breaking it in the first
place was probably not a very good decision. Instead, it would've been
better to enhance the protocol so that instead of sending a "1" command,
you would send another command that indicates to the PF driver that the
guest knows how to react to an empty mac address. For the old "1"
command you could've returned randomly generated mac addresses still, or
failed the request altogether (which coincidentally would've made it
work too). But it's probably too late for that :).

> In any case, this patch is not correct, so we have responded with a NAK.

So we're stuck in a nasty situation here. There are 2 problems that I
can see:

  1) User experience is horrible. There is no way mortals will find out
why their VFs don't work in VMs with Windows. It's good for our business
model of paid support, but probably not in Intel's best interest ;)

  2) Igb and ixgbe implement an incredibly similar protocol between VF
and PF, yet are inconsistent in how they react to the reset function. I
dislike inconsistencies :)

I agree that failing the reset sounds odd, but why does the ixgbe driver
do it? And why do the Linux and Windows drivers recover properly from
that failure? Someone must've thought of that case, no?


Alex

^ permalink raw reply

* Re: [PATCH/RFC] datapath: offload hooks
From: Simon Horman @ 2014-10-08 23:51 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, netdev, Pravin Shelar, Jesse Gross, Jiri Pirko, Thomas Graf,
	John Fastabend, Scott Feldman, Roopa Prabhu, Alexi Starovoitov,
	Florian Fainelli, Jamal Hadi Salim, Bert van Leeuwen
In-Reply-To: <20141007215036.65f1d439@urahara>

On Tue, Oct 07, 2014 at 09:50:36PM -0700, Stephen Hemminger wrote:
> On Wed,  8 Oct 2014 09:40:51 +0900
> Simon Horman <simon.horman@netronome.com> wrote:
> 
> > +struct ovs_offload_ops {
> > +	/* Flow offload functions  */
> > +	/* Called when a flow entry is added to the flow table */
> > +	void (*flow_new)(struct sw_flow *);
> > +	/* Called when a flow entry is modified */
> > +	void (*flow_set)(struct sw_flow *);
> > +	/* Called when a flow entry is removed from the flow table */
> > +	void (*flow_del)(struct sw_flow *);
> > +	/* Called when flow stats are queried */
> > +	void (*flow_stats_get)(const struct sw_flow *, struct ovs_flow_stats *,
> > +			      unsigned long *used, __be16 *tcp_flags);
> > +	/* Called when flow stats are removed */
> > +	void (*flow_stats_clear)(struct sw_flow *);
> > +
> > +	/* Port offload functions  */
> > +	/* Called when a vport is added to the datapath */
> > +	void (*vport_new)(struct sk_buff *, struct vport *,
> > +			  struct vport_parms *);
> > +	/* Called when a vport is modified */
> > +	void (*vport_set)(struct sk_buff *, struct vport *);
> > +	/* Called when a vport is removed from the datapath */
> > +	void (*vport_del)(struct sk_buff *, struct vport *);
> > +	/* Called when vport stats are queried */
> > +	void (*vport_stats_get)(struct vport *, struct ovs_vport_stats *);
> > +	/* Called when vport stats are set */
> > +	void (*vport_stats_set)(struct vport *, struct ovs_vport_stats *);
> > +
> > +	/* Datapath offload functions  */
> > +	/* Called when the datapath is created */
> > +	void (*dp_new)(struct datapath *);
> > +	/* Called when the datapath is modified */
> > +	void (*dp_set)(struct datapath *);
> > +	/* Called when the datapath is removed */
> > +	void (*dp_del)(struct datapath *);
> > +	/* Called when the datapath stats are queried */
> > +	void (*dp_stats_get)(struct datapath *, struct ovs_dp_stats *);
> > +}
> 
> For security, you should mark any ops type table const, so an
> attacker can't find a home to poke their favorite routine into.

Thanks, I'll get that fixed.

^ permalink raw reply

* Re: sunvnet and ->xmit_more
From: Sowmini Varadhan @ 2014-10-08 23:51 UTC (permalink / raw)
  To: davem, Raghuram.Kothakota; +Cc: netdev
In-Reply-To: <CACP96tSqmPzwYQKbAfVFc2YmyEOdyRtxeoCSuczx1EZTrF0JMA@mail.gmail.com>

One thought occurred to me about a way to use this..

It's true that in the case of sunvnet, we send the tx-trigger
at the *start* of a burst, thus we would do the same amount of work
whether we had 1 or N (>1) packets for a burst, until a STOPPED was
received, 

and if we tried playing games with this, for every sequence where
the game-playing around "when should I send START" helps, there would
always be a counter-example where it would hurt..

but there *is* one way in which it could be used- if the producer
could somehow signal the "xmit_more" info to the consumer, then
the consumer could factor that information in, when trying to
decide whether or not to declare STOPPED (recall the discussion 
around the fudge-factor that we had earlier in
http://www.spinics.net/lists/netdev/msg294549.html -  the skb_more
provides a better hint than merely guessing a udelay)

Raghuram just pointed out to me

" But this can only be used as a hint, as we cannot rely on the peer to
  send the next pack in specific time frame, the peer may be stuck in something
  else or switched out due to scheduling or some bug etc.

  To pass this hint, ..  we could use a bit in a descriptor in TxDring too."

That might be worth investigating (but I might not get to it till
I get the NAPI on its way).

--Sowmini

^ permalink raw reply

* Re: [PATCH/RFC] datapath: offload hooks
From: Simon Horman @ 2014-10-09  0:07 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, Florian Fainelli, Jiri Pirko,
	netdev-u79uwXL29TY76Z2rM5mHXA, Roopa Prabhu, Jamal Hadi Salim,
	John Fastabend, Bert van Leeuwen, Scott Feldman
In-Reply-To: <20141007215521.493233d2@urahara>

On Tue, Oct 07, 2014 at 09:55:21PM -0700, Stephen Hemminger wrote:
> On Wed,  8 Oct 2014 09:40:51 +0900
> Simon Horman <simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org> wrote:
> 
> > +
> > +struct ovs_offload_ops {
> > +	/* Flow offload functions  */
> > +	/* Called when a flow entry is added to the flow table */
> > +	void (*flow_new)(struct sw_flow *);
> > +	/* Called when a flow entry is modified */
> > +	void (*flow_set)(struct sw_flow *);
> > +	/* Called when a flow entry is removed from the flow table */
> > +	void (*flow_del)(struct sw_flow *);
> > +	/* Called when flow stats are queried */
> > +	void (*flow_stats_get)(const struct sw_flow *, struct ovs_flow_stats *,
> > +			      unsigned long *used, __be16 *tcp_flags);
> > +	/* Called when flow stats are removed */
> > +	void (*flow_stats_clear)(struct sw_flow *);
> > +
> > +	/* Port offload functions  */
> > +	/* Called when a vport is added to the datapath */
> > +	void (*vport_new)(struct sk_buff *, struct vport *,
> > +			  struct vport_parms *);
> > +	/* Called when a vport is modified */
> > +	void (*vport_set)(struct sk_buff *, struct vport *);
> > +	/* Called when a vport is removed from the datapath */
> > +	void (*vport_del)(struct sk_buff *, struct vport *);
> > +	/* Called when vport stats are queried */
> > +	void (*vport_stats_get)(struct vport *, struct ovs_vport_stats *);
> > +	/* Called when vport stats are set */
> > +	void (*vport_stats_set)(struct vport *, struct ovs_vport_stats *);
> > +
> > +	/* Datapath offload functions  */
> > +	/* Called when the datapath is created */
> > +	void (*dp_new)(struct datapath *);
> > +	/* Called when the datapath is modified */
> > +	void (*dp_set)(struct datapath *);
> > +	/* Called when the datapath is removed */
> > +	void (*dp_del)(struct datapath *);
> > +	/* Called when the datapath stats are queried */
> > +	void (*dp_stats_get)(struct datapath *, struct ovs_dp_stats *);
> > +};
> 
> What about using netlink and netlink notifiers for event type stuff?
> Much easier to extend than all the _ops stuff and you can provide
> hook for people that want to do it in user space.

Hi Stephen,

thanks, that is not an avenue that I had previously considered,
though on that point I'm not speaking for others at Netronome.

I wonder if you could expand a little on which of the above you
consider event type stuff. From my point of view the hooks seem
to fall into two categories:

1. Hooks that are called to obtain information about the datapath.
   That is the *stats* hooks.
2. Hooks that are called when reconfiguration occurs.
   That is the non *stats* hooks.

It seems to me that both categories are typically driven by requests from
user-space.

^ permalink raw reply

* [PATCH net-next 0/3] cxgb4/cxgb4vf: Misc fixes and 40G support for cxgb4vf
From: Hariprasad Shenai @ 2014-10-09  0:18 UTC (permalink / raw)
  To: netdev; +Cc: davem, leedom, kumaras, nirranjan, santosh, Hariprasad Shenai

Hi,

This patch series adds 40G support for cxgb4vf driver. Update the LSO length for
cxgb4vf, fix macro. Wait for device to get ready before reading PL_WHOAMI
register.

The patches series is created against 'net-next' tree.
And includes patches on cxgb4 and cxgb4vf driver.

We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.

Thanks

Hariprasad Shenai (3):
  cxgb4/cxgb4vf: Updated the LSO transfer length in CPL_TX_PKT_LSO for
    T5
  cxgb4vf: Add 40G support for cxgb4vf driver
  cxgb4: Wait for device to get ready before reading any register

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h         |    2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    |    6 +++++-
 drivers/net/ethernet/chelsio/cxgb4/sge.c           |    5 ++++-
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c         |   17 ++++++++++-------
 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h        |    1 +
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h       |    5 ++---
 .../net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c    |   12 ++++++++----
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c         |    5 ++++-
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h |    6 ++++++
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c     |   10 +++++++---
 10 files changed, 48 insertions(+), 21 deletions(-)

^ permalink raw reply

* [PATCH net-next 1/3] cxgb4/cxgb4vf: Updated the LSO transfer length in CPL_TX_PKT_LSO for T5
From: Hariprasad Shenai @ 2014-10-09  0:18 UTC (permalink / raw)
  To: netdev; +Cc: davem, leedom, kumaras, nirranjan, santosh, Hariprasad Shenai
In-Reply-To: <1412813927-24951-1-git-send-email-hariprasad@chelsio.com>

Update the lso length for T5 adapter and fix PIDX_T5 macro

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/sge.c     |    5 ++++-
 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h  |    1 +
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h |    5 ++---
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c   |    5 ++++-
 4 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index fab4c84..5e1b314 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -1123,7 +1123,10 @@ out_free:	dev_kfree_skb_any(skb);
 		lso->c.ipid_ofst = htons(0);
 		lso->c.mss = htons(ssi->gso_size);
 		lso->c.seqno_offset = htonl(0);
-		lso->c.len = htonl(skb->len);
+		if (is_t4(adap->params.chip))
+			lso->c.len = htonl(skb->len);
+		else
+			lso->c.len = htonl(LSO_T5_XFER_SIZE(skb->len));
 		cpl = (void *)(lso + 1);
 		cntrl = TXPKT_CSUM_TYPE(v6 ? TX_CSUM_TCPIP6 : TX_CSUM_TCPIP) |
 			TXPKT_IPHDR_LEN(l3hdr_len) |
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h b/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
index 52e0810..5f4db23 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
@@ -527,6 +527,7 @@ struct cpl_tx_pkt_lso_core {
 #define LSO_LAST_SLICE    (1 << 22)
 #define LSO_FIRST_SLICE   (1 << 23)
 #define LSO_OPCODE(x)     ((x) << 24)
+#define LSO_T5_XFER_SIZE(x) ((x) << 0)
 	__be16 ipid_ofst;
 	__be16 mss;
 	__be32 seqno_offset;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
index eee2728..a1024db 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
@@ -72,9 +72,8 @@
 #define  PIDX_MASK   0x00003fffU
 #define  PIDX_SHIFT  0
 #define  PIDX(x)     ((x) << PIDX_SHIFT)
-#define  S_PIDX_T5   0
-#define  M_PIDX_T5   0x1fffU
-#define  PIDX_T5(x)  (((x) >> S_PIDX_T5) & M_PIDX_T5)
+#define  PIDX_SHIFT_T5   0
+#define  PIDX_T5(x)  ((x) << PIDX_SHIFT_T5)
 
 
 #define SGE_TIMERREGS	6
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
index a5fb949..85036e6 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
@@ -1208,7 +1208,10 @@ int t4vf_eth_xmit(struct sk_buff *skb, struct net_device *dev)
 		lso->ipid_ofst = cpu_to_be16(0);
 		lso->mss = cpu_to_be16(ssi->gso_size);
 		lso->seqno_offset = cpu_to_be32(0);
-		lso->len = cpu_to_be32(skb->len);
+		if (is_t4(adapter->params.chip))
+			lso->len = cpu_to_be32(skb->len);
+		else
+			lso->len = cpu_to_be32(LSO_T5_XFER_SIZE(skb->len));
 
 		/*
 		 * Set up TX Packet CPL pointer, control word and perform
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 2/3] cxgb4vf: Add 40G support for cxgb4vf driver
From: Hariprasad Shenai @ 2014-10-09  0:18 UTC (permalink / raw)
  To: netdev; +Cc: davem, leedom, kumaras, nirranjan, santosh, Hariprasad Shenai
In-Reply-To: <1412813927-24951-1-git-send-email-hariprasad@chelsio.com>

Add 40G support for cxgb4vf driver. ethtool speed values are just numbers of
megabits and there is no SPEED_40000 in ethtool speed values. To be consistent,
use integer constants directly for all speeds.

Use is_x_10g_port()("is 10Gb/s or higher") in cfg_queues() instead of
is_10g_port() ("is exactly 10Gb/s"). Else we will end up using a single
"Queue Set" on 40Gb/s adapters.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
 .../net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c    |   12 ++++++++----
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h |    6 ++++++
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c     |   10 +++++++---
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index 8498a64..bfa398d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -163,15 +163,19 @@ void t4vf_os_link_changed(struct adapter *adapter, int pidx, int link_ok)
 		netif_carrier_on(dev);
 
 		switch (pi->link_cfg.speed) {
-		case SPEED_10000:
+		case 40000:
+			s = "40Gbps";
+			break;
+
+		case 10000:
 			s = "10Gbps";
 			break;
 
-		case SPEED_1000:
+		case 1000:
 			s = "1000Mbps";
 			break;
 
-		case SPEED_100:
+		case 100:
 			s = "100Mbps";
 			break;
 
@@ -2351,7 +2355,7 @@ static void cfg_queues(struct adapter *adapter)
 		struct port_info *pi = adap2pinfo(adapter, pidx);
 
 		pi->first_qset = qidx;
-		pi->nqsets = is_10g_port(&pi->link_cfg) ? q10g : 1;
+		pi->nqsets = is_x_10g_port(&pi->link_cfg) ? q10g : 1;
 		qidx += pi->nqsets;
 	}
 	s->ethqsets = qidx;
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h
index f412d0f..95df61d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h
@@ -228,6 +228,12 @@ static inline bool is_10g_port(const struct link_config *lc)
 	return (lc->supported & SUPPORTED_10000baseT_Full) != 0;
 }
 
+static inline bool is_x_10g_port(const struct link_config *lc)
+{
+	return (lc->supported & FW_PORT_CAP_SPEED_10G) != 0 ||
+		(lc->supported & FW_PORT_CAP_SPEED_40G) != 0;
+}
+
 static inline unsigned int core_ticks_per_usec(const struct adapter *adapter)
 {
 	return adapter->params.vpd.cclk / 1000;
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c
index 25dfeb8..e984fdc 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c
@@ -327,6 +327,8 @@ int t4vf_port_init(struct adapter *adapter, int pidx)
 		v |= SUPPORTED_1000baseT_Full;
 	if (word & FW_PORT_CAP_SPEED_10G)
 		v |= SUPPORTED_10000baseT_Full;
+	if (word & FW_PORT_CAP_SPEED_40G)
+		v |= SUPPORTED_40000baseSR4_Full;
 	if (word & FW_PORT_CAP_ANEG)
 		v |= SUPPORTED_Autoneg;
 	init_link_config(&pi->link_cfg, v);
@@ -1352,11 +1354,13 @@ int t4vf_handle_fw_rpl(struct adapter *adapter, const __be64 *rpl)
 		if (word & FW_PORT_CMD_TXPAUSE)
 			fc |= PAUSE_TX;
 		if (word & FW_PORT_CMD_LSPEED(FW_PORT_CAP_SPEED_100M))
-			speed = SPEED_100;
+			speed = 100;
 		else if (word & FW_PORT_CMD_LSPEED(FW_PORT_CAP_SPEED_1G))
-			speed = SPEED_1000;
+			speed = 1000;
 		else if (word & FW_PORT_CMD_LSPEED(FW_PORT_CAP_SPEED_10G))
-			speed = SPEED_10000;
+			speed = 10000;
+		else if (word & FW_PORT_CMD_LSPEED(FW_PORT_CAP_SPEED_40G))
+			speed = 40000;
 
 		/*
 		 * Scan all of our "ports" (Virtual Interfaces) looking for
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 3/3] cxgb4: Wait for device to get ready before reading any register
From: Hariprasad Shenai @ 2014-10-09  0:18 UTC (permalink / raw)
  To: netdev; +Cc: davem, leedom, kumaras, nirranjan, santosh, Hariprasad Shenai
In-Reply-To: <1412813927-24951-1-git-send-email-hariprasad@chelsio.com>

Call t4_wait_dev_ready() before attempting to read the PL_WHOAMI register
(to determine which function we have been attached to). This prevents us from
failing on that read if it comes right after a RESET.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h      |    2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |    6 +++++-
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c      |   17 ++++++++++-------
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 9b2c669..410ed58 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -968,7 +968,7 @@ void t4_intr_enable(struct adapter *adapter);
 void t4_intr_disable(struct adapter *adapter);
 int t4_slow_intr_handler(struct adapter *adapter);
 
-int t4_wait_dev_ready(struct adapter *adap);
+int t4_wait_dev_ready(void __iomem *regs);
 int t4_link_start(struct adapter *adap, unsigned int mbox, unsigned int port,
 		  struct link_config *lc);
 int t4_restart_aneg(struct adapter *adap, unsigned int mbox, unsigned int port);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 321f3d9..5b38e95 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -6137,7 +6137,7 @@ static pci_ers_result_t eeh_slot_reset(struct pci_dev *pdev)
 	pci_save_state(pdev);
 	pci_cleanup_aer_uncorrect_error_status(pdev);
 
-	if (t4_wait_dev_ready(adap) < 0)
+	if (t4_wait_dev_ready(adap->regs) < 0)
 		return PCI_ERS_RESULT_DISCONNECT;
 	if (t4_fw_hello(adap, adap->fn, adap->fn, MASTER_MUST, NULL) < 0)
 		return PCI_ERS_RESULT_DISCONNECT;
@@ -6530,6 +6530,10 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto out_disable_device;
 	}
 
+	err = t4_wait_dev_ready(regs);
+	if (err < 0)
+		goto out_unmap_bar0;
+
 	/* We control everything through one PF */
 	func = SOURCEPF_GET(readl(regs + PL_WHOAMI));
 	if (func != ent->driver_data) {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 22d7581..1fff149 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -3845,12 +3845,19 @@ static void init_link_config(struct link_config *lc, unsigned int caps)
 	}
 }
 
-int t4_wait_dev_ready(struct adapter *adap)
+#define CIM_PF_NOACCESS 0xeeeeeeee
+
+int t4_wait_dev_ready(void __iomem *regs)
 {
-	if (t4_read_reg(adap, PL_WHOAMI) != 0xffffffff)
+	u32 whoami;
+
+	whoami = readl(regs + PL_WHOAMI);
+	if (whoami != 0xffffffff && whoami != CIM_PF_NOACCESS)
 		return 0;
+
 	msleep(500);
-	return t4_read_reg(adap, PL_WHOAMI) != 0xffffffff ? 0 : -EIO;
+	whoami = readl(regs + PL_WHOAMI);
+	return (whoami != 0xffffffff && whoami != CIM_PF_NOACCESS ? 0 : -EIO);
 }
 
 struct flash_desc {
@@ -3919,10 +3926,6 @@ int t4_prep_adapter(struct adapter *adapter)
 	uint16_t device_id;
 	u32 pl_rev;
 
-	ret = t4_wait_dev_ready(adapter);
-	if (ret < 0)
-		return ret;
-
 	get_pci_mode(adapter, &adapter->params.pci);
 	pl_rev = G_REV(t4_read_reg(adapter, PL_REV));
 
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH net] bna: page allocation during interrupts to use a mempool.
From: Eric Wheeler @ 2014-10-09  0:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Shahed Shaikh, Stephen Hemminger, netdev, Rasesh Mody
In-Reply-To: <1412795092.11091.197.camel@edumazet-glaptop2.roam.corp.google.com>

>>> Further inspection of the driver told me that unmap->vector.len should
>>> be 16384 already. (same than PAGE_SIZE << 2)
>>> (set at line 304, drivers/net/ethernet/brocade/bna/bnad.c)
>>>
>>> So you might hit memory fragmentation issues.
>>>
>>> Do you have CONFIG_COMPACTION=y in your .config ?
>>
>> We're still having the backtrace.
>
> What is the output of
>
> free
> cat /proc/sys/vm/min_free_kbytes
> cat /proc/buddyinfo


[root@hv2 ~]# cat /proc/sys/vm/min_free_kbytes
262144

[root@hv2 ~]# cat /proc/buddyinfo
Node 0, zone      DMA      1      2      1      1      1      1      1      0      1      1      3
Node 0, zone    DMA32   2221  28306    771   2964    148      9      4      1      0      0      1
Node 0, zone   Normal 956207  99272      0      0      0      0      0      0      0      0      1

[root@hv2 ~]# free
              total       used       free     shared    buffers     cached
Mem:      33051552   28058192    4993360          0    3058040     158224
-/+ buffers/cache:   24841928    8209624
Swap:      8388604      21540    8367064

[root@hv2 ~]# free -m
              total       used       free     shared    buffers     cached
Mem:         32276      27401       4875          0       2986        154
-/+ buffers/cache:      24260       8016
Swap:         8191         21       8170

^ permalink raw reply

* Re: [PATCH] net: Add ndo_gso_check
From: Jesse Gross @ 2014-10-09  0:30 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Or Gerlitz, Alexander Duyck, John Fastabend, Jeff Kirsher,
	David Miller, Linux Netdev List, Thomas Graf, Pravin Shelar,
	Andy Zhou
In-Reply-To: <CA+mtBx811eEUub3juPpXpS1qAEfPMAtrp7CWjipMZwFC81GcYQ@mail.gmail.com>

On Mon, Oct 6, 2014 at 5:17 PM, Tom Herbert <therbert@google.com> wrote:
> On Mon, Oct 6, 2014 at 3:33 PM, Jesse Gross <jesse@nicira.com> wrote:
>> On Mon, Oct 6, 2014 at 10:59 AM, Tom Herbert <therbert@google.com> wrote:
>>> On Sun, Oct 5, 2014 at 12:13 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>> On Sun, Oct 5, 2014 at 9:49 PM, Tom Herbert <therbert@google.com> wrote:
>>>>> On Sun, Oct 5, 2014 at 7:04 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>>> On Thu, Oct 2, 2014 at 2:06 AM, Tom Herbert <therbert@google.com> wrote:
>>>>>>> On Wed, Oct 1, 2014 at 1:58 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>>>>> On Tue, Sep 30, 2014 at 6:34 PM, Tom Herbert <therbert@google.com> wrote:
>>>>>>>>> On Tue, Sep 30, 2014 at 7:30 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>>> [...]
>>>>>>> Solution #4: apply this patch and implement the check functions as
>>>>>>> needed in those 4 or 5 drivers. If a device can only do VXLAN/NVGRE
>>>>>>> then I believe the check function is something like:
>>>>>>>
>>>>>>> bool mydev_gso_check(struct sk_buff *skb, struct net_device *dev)
>>>>>>> {
>>>>>>>         if ((skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL) &&
>>>>>>>             ((skb->inner_protocol_type != ENCAP_TYPE_ETHER ||
>>>>>>>               skb->protocol != htons(ETH_P_TEB) ||
>>>>>>>               skb_inner_mac_header(skb) - skb_transport_header(skb) != 12)
>>>>>>>                 return false;
>>>>>>>
>>>>>>>         return true;
>>>>>>> }
>>>>>>
>>>>>> Yep, such helper can can be basically made to work and let the 4-5
>>>>>> drivers that can
>>>>>> do GSO offloading for vxlan but not for any FOU/GUE packets signal
>>>>>> that to the stack.
>>>>>>
>>>>>> Re the 12 constant, you were referring to the udp+vxlan headers? it's 8+8
>>>>>>
>>>>>> Also, we need a way for drivers that can support VXLAN or NVGRE but
>>>>>> not concurrently
>>>>>> on the same port @ the same time to only let vxlan packet to pass
>>>>>> successfully through the helper.
>>>>
>>>>> Or, there should be no difference in GSO processing between VXLAN and
>>>>> NVGRE. Can you explain why you feel you need to differentiate them for GSO?
>>>>
>>>>
>>>> RX wise, Linux tells the driver that UDP port X would be used for
>>>> VXLAN, right? and indeed, it's possible for some HW implementations
>>>> not to support RX offloading (checksum) for both VXLAN and NVGRE @ the
>>>> same time over the same port. But TX/GRO wise, you're probably
>>>> correct. The thing is that from the user POV they need solution that
>>>> works for both RX and TX offloading.
>>>
>>> I think from a user POV we want a solution that supports RX and TX
>>> offloading across the widest range of protocols. This is accomplished
>>> by implementing protocol agnostic mechanisms like CHECKSUM_COMPLETE
>>> and protocol agnostic UDP tunnel TSO like we've described. IMO, the
>>> fact that we have devices that implement protocol specific mechanisms
>>> for NVGRE and VXLAN should be considered legacy support in the stack,
>>> for new UDP encapsulation protocols we should not expose specifics in
>>> the stack in either by adding a GSO type for each protocol, nor
>>> ndo_add_foo_port for each protocol-- these things will not scale and
>>> unnecessarily complicate the core stack.
>>
>> It's not clear to me that allowing devices to know what protocols are
>> running on what ports actually complicates the stack. The part that is
>> complicated is usually the types of operations that are being
>> offloaded (checksum, TSO, etc.). In all of these tunnel cases, the
>> operations are same and if you have a clean registration mechanism
>> then nothing in the core has to see this - only the protocol doing the
>> registering and the driver that is supporting it.
>>
>
> We already have an ntuple filtering interface that allows configuring
> a device for special processing of RX packets. I don't see why that
> shouldn't apply to the use case protocol processing for specific ports
> in the encapsulation use case.

You mentioned this before but I guess I don't really understand it. I
suppose it is possible to express the port number and encapsulation as
a filter but it doesn't really seem all that natural and at the end of
the day it won't be mapped to a filter in the NIC. Can you explain it
some more?

>> I have no disagreement with trying to be generic across protocols. I'm
>> just not convinced that it is a realistic plan. It's obvious that it
>> is not doable today nor will be it be in the next generation of NICs
>> (which are guaranteed to add support for new protocols). Furthermore,
>> there will be more advanced stuff coming in the future that I think
>> will be difficult or impossible to make protocol agnostic. Rather than
>> pretending that this doesn't exist or will never happen, it's better
>> focus on how to integrating it cleanly.
>
> Sorry, but I don't understand how supporting a new protocols in a
> device for the purposes of returning CHECKSUM_UNNECESSARY is better or
> easier to implement than just returning CHECKSUM_COMPLETE. Same thing
> for trying to use NETIF_F_IP_CSUM with encapsulation rather than
> NETIF_F_HW_CSUM. I'm not a hardware guy, so it's possible I'm missing
> something obvious...
>
> Can you be more specific about this "advanced stuff"?

I think checksums are really the exception, not the rule. It's great
that they have this nice property of being additive and we should use
that where we can but that doesn't apply to other types of operation
(or even other types of checksums). Encryption or CRC32 carried inside
the tunnel header can't be accelerated without some additional
knowledge of the protocol. I think there were also a few other things
that came up along these lines when we talked about this in an earlier
thread - that's what I mean by "advanced stuff".

For the basic one's complement checksums, I have no objection to
CHECKSUM_COMPLETE. However, the reality is that this is not generally
implemented today and that is unlikely to change for a few years even
in the best case.

^ permalink raw reply

* RE: [PATCH net] bna: page allocation during interrupts to use a mempool.
From: Eric Wheeler @ 2014-10-09  0:50 UTC (permalink / raw)
  To: Rasesh Mody; +Cc: Eric Dumazet, Shahed Shaikh, Stephen Hemminger, netdev
In-Reply-To: <2552F74A0BCCBE4DBE2AD218C81B2811C6399B@avmb3.qlogic.org>

On Wed, 8 Oct 2014, Rasesh Mody wrote:

> Hi Eric Wheeler, Eric Dumazet,
>
> We'll try to reproduce the issue in-house to find more about the root 
> cause of the failure and work on possible solution.

Thanks Rasesh,

FYI, we're using the Ethernet part this card (no FCoE use here); I believe 
it is the 1020 version:

02:00.0 Fibre Channel: Brocade Communications Systems, Inc. 1010/1020/1007/1741 10Gbps CNA (rev 01)
02:00.1 Fibre Channel: Brocade Communications Systems, Inc. 1010/1020/1007/1741 10Gbps CNA (rev 01)
02:00.2 Ethernet controller: Brocade Communications Systems, Inc. 1010/1020/1007/1741 10Gbps CNA (rev 01)
02:00.3 Ethernet controller: Brocade Communications Systems, Inc. 1010/1020/1007/1741 10Gbps CNA (rev 01)


02:00.2 Ethernet controller: Brocade Communications Systems, Inc. 1010/1020/1007/1741 10Gbps CNA (rev 01)
         Subsystem: Brocade Communications Systems, Inc. 1010/1020/1007/1741 10Gbps CNA - LL
         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
         Latency: 0, Cache Line Size: 64 bytes
         Interrupt: pin A routed to IRQ 19
         Region 0: Memory at df540000 (64-bit, non-prefetchable) [size=256K]
         Region 2: Memory at df604000 (64-bit, non-prefetchable) [size=16K]
         Expansion ROM at df200000 [disabled] [size=1M]
         Capabilities: [40] Power Management version 3
                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
         Capabilities: [50] MSI-X: Enable+ Count=256 Masked-
                 Vector table: BAR=2 offset=00000000
                 PBA: BAR=2 offset=00002000
         Capabilities: [60] Express (v2) Endpoint, MSI 00
                 DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <256ns, L1 <64us
                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                         MaxPayload 128 bytes, MaxReadReq 512 bytes
                 DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                 LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Latency L0 <256ns, L1 <64us
                         ClockPM- Surprise- LLActRep- BwNot-
                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                 LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                 DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                 LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
         Capabilities: [a0] Vital Product Data
                 No end tag found
         Capabilities: [100 v1] Advanced Error Reporting
                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                 CESta:  RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                 AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
         Capabilities: [180 v1] Power Budgeting <?>
         Capabilities: [190 v1] Alternative Routing-ID Interpretation (ARI)
                 ARICap: MFVC- ACS-, Next Function: 3
                 ARICtl: MFVC- ACS-, Function Group: 0
         Kernel driver in use: bna
         Kernel modules: bna


--
Eric Wheeler, President           eWheeler, Inc. dba Global Linux Security
888-LINUX26 (888-546-8926)        Fax: 503-716-3878           PO Box 25107
www.GlobalLinuxSecurity.pro       Linux since 1996!     Portland, OR 97298


>
> Thanks,
> -Rasesh

^ permalink raw reply

* Re: [ovs-dev] [PATCH/RFC repost 7/8] ofproto: translate datapath select group action
From: Simon Horman @ 2014-10-09  1:14 UTC (permalink / raw)
  To: Ben Pfaff; +Cc: dev, netdev
In-Reply-To: <20140926235725.GB20493@nicira.com>

On Fri, Sep 26, 2014 at 04:57:25PM -0700, Ben Pfaff wrote:
> On Thu, Sep 18, 2014 at 10:55:10AM +0900, Simon Horman wrote:
> > This patch is a prototype and has several limitations:
> > 
> > * It assumes that no actions follow a select group action
> >   because the resulting packet after a select group action may
> >   differ depending on the bucket used. It may be possible
> >   to address this problem using recirculation. Or to not use
> >   the datapath select group in such situations. In any case
> >   this patch does not solve this problem or even prevent it
> >   from occurring.
> 
> It seems like this limitation in particular is a pretty big one.  Do
> you have a good plan in mind for how to resolve it?

Hi Ben,

it seems to me that this would be somewhat difficult to resolve in the
datapath so I propose not doing so. And I have two ideas on how to
resolve this problem outside of the datapath.

1. Recirculation

   It seems to me that it ought to be possible to handle this by
   recirculating if actions occur after an ODP select group action.

   This could be made slightly more selective by only recirculating
   if the execution different buckets may result in different packet
   contents and the actions after the ODP select group action rely on
   the packet contents (e.g. set actions do but output actions do not).

   My feeling is that this could be implemented by adding a small amount
   of extra state to action translation in ovs-vswitchd.

2. Fall back to selecting buckets in ovs-vswtichd

   The idea here is to detect cases where there would be a problem
   executing actions after an ODP select group action and in that
   case to select buckets in ovs-vswtichd: that is use the existing bucket
   translation code in ovs-vswtichd.

   Though this seems conceptually simpler than recirculation it
   seems to me that it would be somewhat more difficult to implement
   as it implies a two stage translation process: e.g. one stage to
   determine if an ODP select group may be used; and one to perform
   the translation.

   I seem to recall trying various two stage translation processes
   as part some earlier unrelated work. And my recollection is that
   the result of my previous efforts were not pretty.

Both of the above more or less negate any benefits of ODP select group
action. In particular lowering flow setup cost and potentially allowing
complete offload of select groups from the datapath to hardware. However I
think that this case is not a common one as it requires both of the
following. And I think they are both not usual use cases.

* Different buckets modifying packets in different ways
  - My expectation is that it is common for buckets to be homogeneous in
    regards to packet modifications. But perhaps this is naïve in the
    context of VLANs, MPLS, and similar tags that can be pushed and popped.
* Actions that rely on packet contents after
  - My expectation is that it is common to use a select group to output
    packets and that is the final action performed.

^ permalink raw reply

* Re: [ovs-dev] [PATCH/RFC repost 2/8] netlink: Allow suppression of warnings for duplicate attributes
From: Simon Horman @ 2014-10-09  1:18 UTC (permalink / raw)
  To: Ben Pfaff; +Cc: dev, netdev
In-Reply-To: <20140926235542.GA20493@nicira.com>

On Fri, Sep 26, 2014 at 04:55:42PM -0700, Ben Pfaff wrote:
> On Thu, Sep 18, 2014 at 10:55:05AM +0900, Simon Horman wrote:
> > Add a multiple field to struct nl_policy which if set suppresses
> > warning of duplicate attributes in nl_parse_nested().
> > 
> > As is the case without this patch only the last occurrence of an
> > attribute is stored in attrs by nl_parse_nested(). As such
> > if the multiple field of struct nl_policy is set then it
> > is up to the caller to parse the message to extract all the attributes.
> > 
> > This is in preparation for allowing multiple OVS_SELECT_GROUP_ATTR_BUCKET
> > attributes in a nested OVS_ACTION_ATTR_SELECT_GROUP attribute.
> > 
> > Signed-off-by: Simon Horman <simon.horman@netronome.com>
> 
> In the other case where we have duplicate attributes, it doesn't make
> sense to process them with the policy functions, because we want to
> see all of the instances of the duplicate attributes and policy
> doesn't allow us to do that.  I'm a little surprised that the new
> attributes work differently.  What's the idea?

My idea was to use the policy to obtain the attributes that
may not be duplicated. And then custom code to pick up all the
instances of attributes that may be duplicated.

I'm don't feel strongly about that approach and I'd be just has
happy to drop this patch and rework things a little so that
all the attributes are picked out by custom code. It sounds
like that would match the approach taken elsewhere. Sorry for
not noticing that earlier.

^ permalink raw reply

* Re: [PATCH] net: Add ndo_gso_check
From: Tom Herbert @ 2014-10-09  1:46 UTC (permalink / raw)
  To: Jesse Gross
  Cc: Or Gerlitz, Alexander Duyck, John Fastabend, Jeff Kirsher,
	David Miller, Linux Netdev List, Thomas Graf, Pravin Shelar,
	Andy Zhou
In-Reply-To: <CAEP_g=_sPBCeSO3nQfy-HJ68HqCgRU+CcA_09bKDL9fgehtveA@mail.gmail.com>

On Wed, Oct 8, 2014 at 5:30 PM, Jesse Gross <jesse@nicira.com> wrote:
> On Mon, Oct 6, 2014 at 5:17 PM, Tom Herbert <therbert@google.com> wrote:
>> On Mon, Oct 6, 2014 at 3:33 PM, Jesse Gross <jesse@nicira.com> wrote:
>>> On Mon, Oct 6, 2014 at 10:59 AM, Tom Herbert <therbert@google.com> wrote:
>>>> On Sun, Oct 5, 2014 at 12:13 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>> On Sun, Oct 5, 2014 at 9:49 PM, Tom Herbert <therbert@google.com> wrote:
>>>>>> On Sun, Oct 5, 2014 at 7:04 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>>>> On Thu, Oct 2, 2014 at 2:06 AM, Tom Herbert <therbert@google.com> wrote:
>>>>>>>> On Wed, Oct 1, 2014 at 1:58 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>>>>>> On Tue, Sep 30, 2014 at 6:34 PM, Tom Herbert <therbert@google.com> wrote:
>>>>>>>>>> On Tue, Sep 30, 2014 at 7:30 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>>>> [...]
>>>>>>>> Solution #4: apply this patch and implement the check functions as
>>>>>>>> needed in those 4 or 5 drivers. If a device can only do VXLAN/NVGRE
>>>>>>>> then I believe the check function is something like:
>>>>>>>>
>>>>>>>> bool mydev_gso_check(struct sk_buff *skb, struct net_device *dev)
>>>>>>>> {
>>>>>>>>         if ((skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL) &&
>>>>>>>>             ((skb->inner_protocol_type != ENCAP_TYPE_ETHER ||
>>>>>>>>               skb->protocol != htons(ETH_P_TEB) ||
>>>>>>>>               skb_inner_mac_header(skb) - skb_transport_header(skb) != 12)
>>>>>>>>                 return false;
>>>>>>>>
>>>>>>>>         return true;
>>>>>>>> }
>>>>>>>
>>>>>>> Yep, such helper can can be basically made to work and let the 4-5
>>>>>>> drivers that can
>>>>>>> do GSO offloading for vxlan but not for any FOU/GUE packets signal
>>>>>>> that to the stack.
>>>>>>>
>>>>>>> Re the 12 constant, you were referring to the udp+vxlan headers? it's 8+8
>>>>>>>
>>>>>>> Also, we need a way for drivers that can support VXLAN or NVGRE but
>>>>>>> not concurrently
>>>>>>> on the same port @ the same time to only let vxlan packet to pass
>>>>>>> successfully through the helper.
>>>>>
>>>>>> Or, there should be no difference in GSO processing between VXLAN and
>>>>>> NVGRE. Can you explain why you feel you need to differentiate them for GSO?
>>>>>
>>>>>
>>>>> RX wise, Linux tells the driver that UDP port X would be used for
>>>>> VXLAN, right? and indeed, it's possible for some HW implementations
>>>>> not to support RX offloading (checksum) for both VXLAN and NVGRE @ the
>>>>> same time over the same port. But TX/GRO wise, you're probably
>>>>> correct. The thing is that from the user POV they need solution that
>>>>> works for both RX and TX offloading.
>>>>
>>>> I think from a user POV we want a solution that supports RX and TX
>>>> offloading across the widest range of protocols. This is accomplished
>>>> by implementing protocol agnostic mechanisms like CHECKSUM_COMPLETE
>>>> and protocol agnostic UDP tunnel TSO like we've described. IMO, the
>>>> fact that we have devices that implement protocol specific mechanisms
>>>> for NVGRE and VXLAN should be considered legacy support in the stack,
>>>> for new UDP encapsulation protocols we should not expose specifics in
>>>> the stack in either by adding a GSO type for each protocol, nor
>>>> ndo_add_foo_port for each protocol-- these things will not scale and
>>>> unnecessarily complicate the core stack.
>>>
>>> It's not clear to me that allowing devices to know what protocols are
>>> running on what ports actually complicates the stack. The part that is
>>> complicated is usually the types of operations that are being
>>> offloaded (checksum, TSO, etc.). In all of these tunnel cases, the
>>> operations are same and if you have a clean registration mechanism
>>> then nothing in the core has to see this - only the protocol doing the
>>> registering and the driver that is supporting it.
>>>
>>
>> We already have an ntuple filtering interface that allows configuring
>> a device for special processing of RX packets. I don't see why that
>> shouldn't apply to the use case protocol processing for specific ports
>> in the encapsulation use case.
>
> You mentioned this before but I guess I don't really understand it. I
> suppose it is possible to express the port number and encapsulation as
> a filter but it doesn't really seem all that natural and at the end of
> the day it won't be mapped to a filter in the NIC. Can you explain it
> some more?
>
With n-tuple filters you should be able to configure a rule to match
packets (say by port) and assign an "action" which is understood by
the driver (say assume packets are VXLAN and return
CHECKSUM_UNNECESSARY). The interface is to the driver, so how this is
actually instantiated on the device is a private matter.

This is far more flexible and extensible model than trying to have the
stack do port->protocol registration. We can filter on much than just
destination port (this is actually a problem in UDP offloads which
only works with binding socket to INADDR_ANY). Also, this model can be
applied to many different scenarios not just encapsulation or those
protocols that are implemented by the kernel. I imagine someone will
want to do QUIC acceleration/steering in a device at some point, this
work even if the kernel doesn't implement any part of the protocol (a
design point of QUIC ;-) ). It would be interesting to see how the
super charged programmable protocol parsers Alexei described might be
integrated with RX filtering.

>>> I have no disagreement with trying to be generic across protocols. I'm
>>> just not convinced that it is a realistic plan. It's obvious that it
>>> is not doable today nor will be it be in the next generation of NICs
>>> (which are guaranteed to add support for new protocols). Furthermore,
>>> there will be more advanced stuff coming in the future that I think
>>> will be difficult or impossible to make protocol agnostic. Rather than
>>> pretending that this doesn't exist or will never happen, it's better
>>> focus on how to integrating it cleanly.
>>
>> Sorry, but I don't understand how supporting a new protocols in a
>> device for the purposes of returning CHECKSUM_UNNECESSARY is better or
>> easier to implement than just returning CHECKSUM_COMPLETE. Same thing
>> for trying to use NETIF_F_IP_CSUM with encapsulation rather than
>> NETIF_F_HW_CSUM. I'm not a hardware guy, so it's possible I'm missing
>> something obvious...
>>
>> Can you be more specific about this "advanced stuff"?
>
> I think checksums are really the exception, not the rule. It's great
> that they have this nice property of being additive and we should use
> that where we can but that doesn't apply to other types of operation
> (or even other types of checksums). Encryption or CRC32 carried inside
> the tunnel header can't be accelerated without some additional
> knowledge of the protocol. I think there were also a few other things
> that came up along these lines when we talked about this in an earlier
> thread - that's what I mean by "advanced stuff".
>
> For the basic one's complement checksums, I have no objection to
> CHECKSUM_COMPLETE. However, the reality is that this is not generally
> implemented today and that is unlikely to change for a few years even
> in the best case.

^ permalink raw reply

* RE: [PATCH v1 4/4] ARM: Documentation: Update fec dts binding doc
From: luwei.zhou @ 2014-10-09  2:19 UTC (permalink / raw)
  To: Richard Cochran
  Cc: davem@davemloft.net, netdev@vger.kernel.org, shawn.guo@linaro.org,
	bhutchings@solarflare.com, Fabio.Estevam@freescale.com,
	fugang.duan@freescale.com, Frank.Li@freescale.com,
	stephen@networkplumber.org
In-Reply-To: <20141008101309.GA15020@netboy>

On Wed, Oct 8, 2014 at 06:13:00PM, Richard Cochran wrote:
> -----Original Message-----
> From: Richard Cochran [mailto:richardcochran@gmail.com]
> Sent: Wednesday, October 08, 2014 6:13 PM
> To: Zhou Luwei-B45643
> Cc: davem@davemloft.net; netdev@vger.kernel.org; shawn.guo@linaro.org;
> bhutchings@solarflare.com; Estevam Fabio-R49496; Duan Fugang-B38611; Li
> Frank-B20596; stephen@networkplumber.org
> Subject: Re: [PATCH v1 4/4] ARM: Documentation: Update fec dts binding
> doc
> 
> On Wed, Oct 08, 2014 at 08:36:11AM +0000, luwei.zhou@freescale.com wrote:
> >
> > I am not expert in DT.
> 
> (Me neither ;)
> 
> > I didn't get your point of using timer resource in the DT.  Yes, the
> > timer resource is Part of FEC IP hardware. But if you want to choose
> > one channel for using PPS, you have to add another property to specify.
> Do you mean the current code already have the DT property to specify the
> channel?
> 
> I see some timer bindings in Documentation/devicetree/bindings/timer.
> So, you could and should add your SoC timer there and in the DTS files.
> 
> Then, you have something like
> 
> 	@fec {
> 		pps_timer = &timer;
> 	}
> 
> in your SoC's DTS file.
> 
> Thanks,
> Richard

I know what you meant now. If using similar way, it seems that we still need the new property in the timer node below.

@timer
{
	pps_channel = 0;
}
I didn't find out the exiting channel related property in the Documentation.

Since there are 4 channels in the PTP timer IP and none of 4 channels
Is now used by now. We can use hardcode to define the default PPS output channel not DT such as 
"#define PPS_COUTPUT_CHANNEL FEC_TIMER_CHANNEL0". 

Thanks
Luwei

^ permalink raw reply

* [PATCH v3 0/6] Add 10GbE support to APM X-Gene SoC ethernet driver
From: Iyappan Subramanian @ 2014-10-09  3:54 UTC (permalink / raw)
  To: davem, netdev, devicetree
  Cc: linux-arm-kernel, patches, kchudgar, Iyappan Subramanian

Adding 10GbE support to APM X-Gene SoC ethernet driver.

v3: Address comments from v2
* dtb: changed to use all-zeros for the mac address

v2: Address comments from v1
* created preparatory patch to review before adding new functionality
* dtb: updated to use tabs consistently

v1:
* Initial version
---

Iyappan Subramanian (6):
  MAINTAINERS: Update APM X-Gene section
  Documentation: dts: Update section header for APM X-Gene
  dtb: Add 10GbE node to APM X-Gene SoC device tree
  drivers: net: xgene: Preparing for adding 10GbE support
  drivers: net: xgene: Add 10GbE support
  drivers: net: xgene: Add 10GbE ethtool support

 .../devicetree/bindings/net/apm-xgene-enet.txt     |   4 +-
 MAINTAINERS                                        |   1 -
 arch/arm64/boot/dts/apm-mustang.dts                |   4 +
 arch/arm64/boot/dts/apm-storm.dtsi                 |  28 +-
 drivers/net/ethernet/apm/xgene/Makefile            |   3 +-
 .../net/ethernet/apm/xgene/xgene_enet_ethtool.c    |  28 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c     |  44 ++-
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h     |  30 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c   |  86 ++++--
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h   |  24 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c  | 331 +++++++++++++++++++++
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h  |  57 ++++
 12 files changed, 566 insertions(+), 74 deletions(-)
 create mode 100644 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
 create mode 100644 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h

-- 
1.9.1

^ permalink raw reply

* [PATCH v3 2/6] Documentation: dts: Update section header for APM X-Gene
From: Iyappan Subramanian @ 2014-10-09  3:54 UTC (permalink / raw)
  To: davem, netdev, devicetree
  Cc: linux-arm-kernel, patches, kchudgar, Iyappan Subramanian
In-Reply-To: <1412826861-32208-1-git-send-email-isubramanian@apm.com>

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
---
 Documentation/devicetree/bindings/net/apm-xgene-enet.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/apm-xgene-enet.txt b/Documentation/devicetree/bindings/net/apm-xgene-enet.txt
index ebcad25..cfcc527 100644
--- a/Documentation/devicetree/bindings/net/apm-xgene-enet.txt
+++ b/Documentation/devicetree/bindings/net/apm-xgene-enet.txt
@@ -3,7 +3,7 @@ APM X-Gene SoC Ethernet nodes
 Ethernet nodes are defined to describe on-chip ethernet interfaces in
 APM X-Gene SoC.
 
-Required properties:
+Required properties for all the ethernet interfaces:
 - compatible: Should be "apm,xgene-enet"
 - reg: Address and length of the register set for the device. It contains the
   information of registers in the same order as described by reg-names
@@ -15,6 +15,8 @@ Required properties:
 - clocks: Reference to the clock entry.
 - local-mac-address: MAC address assigned to this device
 - phy-connection-type: Interface type between ethernet device and PHY device
+
+Required properties for ethernet interfaces that have external PHY:
 - phy-handle: Reference to a PHY node connected to this device
 
 - mdio: Device tree subnode with the following required properties:
-- 
1.9.1

^ permalink raw reply related

* [PATCH v3 3/6] dtb: Add 10GbE node to APM X-Gene SoC device tree
From: Iyappan Subramanian @ 2014-10-09  3:54 UTC (permalink / raw)
  To: davem, netdev, devicetree
  Cc: linux-arm-kernel, patches, kchudgar, Iyappan Subramanian
In-Reply-To: <1412826861-32208-1-git-send-email-isubramanian@apm.com>

Added 10GbE interface and clock nodes.

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
---
 arch/arm64/boot/dts/apm-mustang.dts |  4 ++++
 arch/arm64/boot/dts/apm-storm.dtsi  | 28 +++++++++++++++++++++++++++-
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/apm-mustang.dts b/arch/arm64/boot/dts/apm-mustang.dts
index b2f5622..2ae782b 100644
--- a/arch/arm64/boot/dts/apm-mustang.dts
+++ b/arch/arm64/boot/dts/apm-mustang.dts
@@ -32,3 +32,7 @@
 &menet {
 	status = "ok";
 };
+
+&xgenet {
+	status = "ok";
+};
diff --git a/arch/arm64/boot/dts/apm-storm.dtsi b/arch/arm64/boot/dts/apm-storm.dtsi
index c0aceef..0a4dbf8 100644
--- a/arch/arm64/boot/dts/apm-storm.dtsi
+++ b/arch/arm64/boot/dts/apm-storm.dtsi
@@ -176,6 +176,16 @@
 				clock-output-names = "menetclk";
 			};
 
+			xge0clk: xge0clk@1f61c000 {
+				compatible = "apm,xgene-device-clock";
+				#clock-cells = <1>;
+				clocks = <&socplldiv2 0>;
+				reg = <0x0 0x1f61c000 0x0 0x1000>;
+				reg-names = "csr-reg";
+				csr-mask = <0x3>;
+				clock-output-names = "xge0clk";
+			};
+
 			sataphy1clk: sataphy1clk@1f21c000 {
 				compatible = "apm,xgene-device-clock";
 				#clock-cells = <1>;
@@ -407,7 +417,8 @@
 			interrupts = <0x0 0x3c 0x4>;
 			dma-coherent;
 			clocks = <&menetclk 0>;
-			local-mac-address = [00 01 73 00 00 01];
+			/* mac address will be overwritten by the bootloader */
+			local-mac-address = [00 00 00 00 00 00];
 			phy-connection-type = "rgmii";
 			phy-handle = <&menetphy>;
 			mdio {
@@ -421,5 +432,20 @@
 
 			};
 		};
+
+		xgenet: ethernet@1f610000 {
+			compatible = "apm,xgene-enet";
+			status = "disabled";
+			reg = <0x0 0x1f610000 0x0 0xd100>,
+			      <0x0 0x1f600000 0x0 0X400>,
+			      <0x0 0x18000000 0x0 0X200>;
+			reg-names = "enet_csr", "ring_csr", "ring_cmd";
+			interrupts = <0x0 0x60 0x4>;
+			dma-coherent;
+			clocks = <&xge0clk 0>;
+			/* mac address will be overwritten by the bootloader */
+			local-mac-address = [00 00 00 00 00 00];
+			phy-connection-type = "xgmii";
+		};
 	};
 };
-- 
1.9.1

^ permalink raw reply related

* [PATCH v3 1/6] MAINTAINERS: Update APM X-Gene section
From: Iyappan Subramanian @ 2014-10-09  3:54 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	patches-qTEPVZfXA3Y, kchudgar-qTEPVZfXA3Y, Iyappan Subramanian
In-Reply-To: <1412826861-32208-1-git-send-email-isubramanian-qTEPVZfXA3Y@public.gmane.org>

Updated APM X-Gene ethernet driver maintainers list.

Signed-off-by: Iyappan Subramanian <isubramanian-qTEPVZfXA3Y@public.gmane.org>
Signed-off-by: Keyur Chudgar <kchudgar-qTEPVZfXA3Y@public.gmane.org>
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 907de3d..5da45de 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -723,7 +723,6 @@ F:	net/appletalk/
 APPLIED MICRO (APM) X-GENE SOC ETHERNET DRIVER
 M:	Iyappan Subramanian <isubramanian-qTEPVZfXA3Y@public.gmane.org>
 M:	Keyur Chudgar <kchudgar-qTEPVZfXA3Y@public.gmane.org>
-M:	Ravi Patel <rapatel-qTEPVZfXA3Y@public.gmane.org>
 S:	Supported
 F:	drivers/net/ethernet/apm/xgene/
 F:	Documentation/devicetree/bindings/net/apm-xgene-enet.txt
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox