Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms
From: Vivien Didelot @ 2019-09-11 15:31 UTC (permalink / raw)
  To: Robert Beckett
  Cc: netdev, Andrew Lunn, Florian Fainelli, David S. Miller,
	bob.beckett
In-Reply-To: <3f265c5afcb2eea48410ec607d65e8f4e6a20373.camel@collabora.com>

Hi Robert,

On Wed, 11 Sep 2019 10:46:05 +0100, Robert Beckett <bob.beckett@collabora.com> wrote:
> > Feature series targeting netdev must be prefixed "PATCH net-next". As
> 
> Thanks for the info. Out of curiosity, where should I have gleaned this
> info from? This is my first contribution to netdev, so I wasnt familiar
> with the etiquette.
> 
> > this approach was a PoC, sending it as "RFC net-next" would be even
> > more
> > appropriate.

Netdev being a huge subsystem has specific rules for subject prefix or merge
window, which are described in Documentation/networking/netdev-FAQ.rst


Thank you,

	Vivien

^ permalink raw reply

* RE: [PATCH net-next 1/5] enetc: Fix if_mode extraction
From: Claudiu Manoil @ 2019-09-11 16:01 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: David S . Miller, Alexandru Marginean, netdev@vger.kernel.org
In-Reply-To: <20190910074412.GA31298@lunn.ch>

>-----Original Message-----
>From: Andrew Lunn <andrew@lunn.ch>
>Sent: Tuesday, September 10, 2019 10:44 AM
>To: Claudiu Manoil <claudiu.manoil@nxp.com>
>Cc: David S . Miller <davem@davemloft.net>; Alexandru Marginean
><alexandru.marginean@nxp.com>; netdev@vger.kernel.org
>Subject: Re: [PATCH net-next 1/5] enetc: Fix if_mode extraction
>
>On Mon, Sep 09, 2019 at 04:24:01PM +0000, Claudiu Manoil wrote:
[...]
>>
>> Hi Andrew,
>>
>> The MAC2MAC connections are defined as fixed-link too, but without
>> phy-mode/phy-connection-type properties.  We don't want to de-register
>> these links.  Initial code was bogus in this regard.
>
>Hi Claudiu
>
>This is what is not clear in the change log. That this code is removed
>because it is wrong. Please could you expand the explanation to make
>this clearer.
>

I agree, but I also need to modify the patch to handle both the error case
of invalid phy-mode for mdio and normal fixed link phy connections, and
the mac2mac connection case.  The mac2mac connection case can be also
deferred to a later patch, when the switch driver - Felix - will be available
(there's no use for it in the current enetc upstream driver).

>> Current proposal is:
>> 			ethernet@0,2 { /* SoC internal, connected to switch port 4 */
>> 				compatible = "fsl,enetc";
>> 				reg = <0x000200 0 0 0 0>;
>> 				fixed-link {
>> 					speed = <1000>;
>> 					full-duplex;
>> 				};
>> 			};
>> 			switch@0,5 {
>> 				compatible = "mscc,felix-switch";
>> 				[...]
>> 				ports {
>> 					#address-cells = <1>;
>> 					#size-cells = <0>;
>>
>> 					/* external ports */
>> 					[...]
>> 					/* internal SoC ports */
>> 					port@4 { /* connected to ENETC port2 */
>> 						reg = <4>;
>> 						fixed-link {
>> 							speed = <1000>;
>> 							full-duplex;
>> 						};
>> 					};
>
>So this connection between the SoC and the switch does not use tags?
>Can it use tags? Does the hardware allow you to have two CPU ports,
>and load balance over them?
>

Unfortunately the switch can handle only one port with tags.  There's only
one CPU port, switch port 4 is just like another front panel port.  On top of
that, the CPU port is not capable of flow control (pause frames don't work with
tagged traffic on the switch side).  So we may be forced to use port 4 to mitigate
this.  Note that the switch is inside the SoC.

>This second half is just standard DSA. This looks good.
>

Thanks for the confirmation and the rest of the review, all valid findings.

Regards,
Claudiu

^ permalink raw reply

* [PATCH] sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()'
From: Christophe JAILLET @ 2019-09-11 16:02 UTC (permalink / raw)
  To: davem, vyasevich, nhorman, marcelo.leitner
  Cc: linux-sctp, netdev, linux-kernel, kernel-janitors,
	Christophe JAILLET

The '.exit' functions from 'pernet_operations' structure should be marked
as __net_exit, not __net_init.

Fixes: 8e2d61e0aed2 ("sctp: fix race on protocol/netns initialization")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
 net/sctp/protocol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 2d47adcb4cbe..53746ffeeca3 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1336,7 +1336,7 @@ static int __net_init sctp_ctrlsock_init(struct net *net)
 	return status;
 }
 
-static void __net_init sctp_ctrlsock_exit(struct net *net)
+static void __net_exit sctp_ctrlsock_exit(struct net *net)
 {
 	/* Free the control endpoint.  */
 	inet_ctl_sock_destroy(net->sctp.ctl_sock);
-- 
2.20.1


^ permalink raw reply related

* Re: VRF Issue Since kernel 5
From: David Ahern @ 2019-09-11 16:09 UTC (permalink / raw)
  To: Gowen, Alexis Bauvin, mmanning@vyatta.att-mail.com; +Cc: netdev@vger.kernel.org
In-Reply-To: <CWLP265MB155424EF95E39E98C4502F86FDB10@CWLP265MB1554.GBRP265.PROD.OUTLOOK.COM>

On 9/11/19 3:01 PM, Gowen wrote:
> Hi all,
> 
> It looks like ip vrf exec checks /etc/resolv.conf (found with strace -e
> trace=file sudo ip vrf exec mgmt-vrf host www.google.co.uk &>
> ~/straceFileOfVrfHost.txt) , but as I'm on an Azure machine using
> netplan, this file isn't updated with DNS servers. I have added my DNS
> server to resolv.conf and now can update the cache with "sudo ip vrf
> exec sudo apt update", if I am correct (which I'm not sure about as not
> really my area) then this might be affecting more than just me.
> 
> Also still not able to fix the updating cache from global VRF - which
> would cause bother in prod environment to others as well so think it
> would be good to get an RCA for it?
> 
> thanks for your help so far, has been really interesting.
> 
> Gareth
> 
> 
> ------------------------------------------------------------------------
> *From:* Gowen <gowen@potatocomputing.co.uk>
> *Sent:* 11 September 2019 13:48
> *To:* David Ahern <dsahern@gmail.com>; Alexis Bauvin
> <abauvin@online.net>; mmanning@vyatta.att-mail.com
> <mmanning@vyatta.att-mail.com>
> *Cc:* netdev@vger.kernel.org <netdev@vger.kernel.org>
> *Subject:* Re: VRF Issue Since kernel 5
>  
> yep no problem:
> 
> Admin@NETM06:~$ sudo sysctl -a | grep l3mdev
> net.ipv4.raw_l3mdev_accept = 1
> net.ipv4.tcp_l3mdev_accept = 1
> net.ipv4.udp_l3mdev_accept = 1
> 
> 
> The source of the DNS issue in the vrf exec command is something to do
> with networkd managing the DNS servers, I can fix it by explicitly
> mentioning the DNS server:
> 
> systemd-resolve --status --no-page
> 
> <OUTPUT OMITTED>
> 
> Link 4 (mgmt-vrf)
>       Current Scopes: none
>        LLMNR setting: yes
> MulticastDNS setting: no
>       DNSSEC setting: no
>     DNSSEC supported: no
> 
> Link 3 (eth1)
>       Current Scopes: DNS
>        LLMNR setting: yes
> MulticastDNS setting: no
>       DNSSEC setting: no
>     DNSSEC supported: no
>          DNS Servers: 10.24.65.203
>                       10.24.65.204
>                       10.25.65.203
>                       10.25.65.204
>           DNS Domain: reddog.microsoft.com
> 
> Link 2 (eth0)
>       Current Scopes: DNS
>        LLMNR setting: yes
> MulticastDNS setting: no
>       DNSSEC setting: no
>     DNSSEC supported: no
>          DNS Servers: 10.24.65.203
>                       10.24.65.204
>                       10.25.65.203
>                       10.25.65.204
>           DNS Domain: reddog.microsoft.com
> 
> there is no DNS server when I use ip vrf exec command (tcpdump shows
> only loopback traffic when invoked without my DNS sever explicitly
> entered) - odd as mgmt-vrf isnt L3 device so thought it would pick up
> eth0 DNS servers?
> 
> I dont think this helps with my update cache traffic from global vrf
> though on port 80
> 

Let's back up a bit: your subject line says vrf issue since kernel 5.
Did you update / change the OS as well?

ie., the previous version that worked what is the OS and kernel version?
What is the OS and kernel version with the problem?

^ permalink raw reply

* Re: [PATCH] sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()'
From: Marcelo Ricardo Leitner @ 2019-09-11 16:23 UTC (permalink / raw)
  To: Christophe JAILLET
  Cc: davem, vyasevich, nhorman, linux-sctp, netdev, linux-kernel,
	kernel-janitors
In-Reply-To: <20190911160239.10734-1-christophe.jaillet@wanadoo.fr>

On Wed, Sep 11, 2019 at 06:02:39PM +0200, Christophe JAILLET wrote:
> The '.exit' functions from 'pernet_operations' structure should be marked
> as __net_exit, not __net_init.
> 
> Fixes: 8e2d61e0aed2 ("sctp: fix race on protocol/netns initialization")
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

^ permalink raw reply

* Re: [PATCH v2] vhost: block speculation of translated descriptors
From: Will Deacon @ 2019-09-11 16:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Jason Wang, kvm, virtualization, netdev, security
In-Reply-To: <20190911095147-mutt-send-email-mst@kernel.org>

On Wed, Sep 11, 2019 at 09:52:25AM -0400, Michael S. Tsirkin wrote:
> On Wed, Sep 11, 2019 at 08:10:00AM -0400, Michael S. Tsirkin wrote:
> > iovec addresses coming from vhost are assumed to be
> > pre-validated, but in fact can be speculated to a value
> > out of range.
> > 
> > Userspace address are later validated with array_index_nospec so we can
> > be sure kernel info does not leak through these addresses, but vhost
> > must also not leak userspace info outside the allowed memory table to
> > guests.
> > 
> > Following the defence in depth principle, make sure
> > the address is not validated out of node range.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > Acked-by: Jason Wang <jasowang@redhat.com>
> > Tested-by: Jason Wang <jasowang@redhat.com>
> > ---
> 
> Cc: security@kernel.org
> 
> Pls advise on whether you'd like me to merge this directly,
> Cc stable, or handle it in some other way.

I think you're fine taking it directly, with a cc stable and a Fixes: tag.

Cheers,

Will

^ permalink raw reply

* Re: [PATCH v3 0/5] Introduce variable length mdev alias
From: Cornelia Huck @ 2019-09-11 16:29 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Alex Williamson, Jiri Pirko, kwankhede@nvidia.com,
	davem@davemloft.net, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB48668DFF8E816F0D2D3041BFD1B10@AM0PR05MB4866.eurprd05.prod.outlook.com>

On Wed, 11 Sep 2019 15:30:40 +0000
Parav Pandit <parav@mellanox.com> wrote:

> Hi Alex,
> 
> > -----Original Message-----
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Wednesday, September 11, 2019 8:56 AM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Jiri Pirko <jiri@mellanox.com>; kwankhede@nvidia.com;
> > cohuck@redhat.com; davem@davemloft.net; kvm@vger.kernel.org; linux-
> > kernel@vger.kernel.org; netdev@vger.kernel.org
> > Subject: Re: [PATCH v3 0/5] Introduce variable length mdev alias
> > 
> > On Mon, 9 Sep 2019 20:42:32 +0000
> > Parav Pandit <parav@mellanox.com> wrote:
> >   
> > > Hi Alex,
> > >  
> > > > -----Original Message-----
> > > > From: Parav Pandit <parav@mellanox.com>
> > > > Sent: Sunday, September 1, 2019 11:25 PM
> > > > To: alex.williamson@redhat.com; Jiri Pirko <jiri@mellanox.com>;
> > > > kwankhede@nvidia.com; cohuck@redhat.com; davem@davemloft.net
> > > > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > netdev@vger.kernel.org; Parav Pandit <parav@mellanox.com>
> > > > Subject: [PATCH v3 0/5] Introduce variable length mdev alias
> > > >
> > > > To have consistent naming for the netdevice of a mdev and to have
> > > > consistent naming of the devlink port [1] of a mdev, which is formed
> > > > using phys_port_name of the devlink port, current UUID is not usable
> > > > because UUID is too long.
> > > >
> > > > UUID in string format is 36-characters long and in binary 128-bit.
> > > > Both formats are not able to fit within 15 characters limit of netdev  
> > name.  
> > > >
> > > > It is desired to have mdev device naming consistent using UUID.
> > > > So that widely used user space framework such as ovs [2] can make
> > > > use of mdev representor in similar way as PCIe SR-IOV VF and PF  
> > representors.  
> > > >
> > > > Hence,
> > > > (a) mdev alias is created which is derived using sha1 from the mdev  
> > name.  
> > > > (b) Vendor driver describes how long an alias should be for the
> > > > child mdev created for a given parent.
> > > > (c) Mdev aliases are unique at system level.
> > > > (d) alias is created optionally whenever parent requested.
> > > > This ensures that non networking mdev parents can function without
> > > > alias creation overhead.
> > > >
> > > > This design is discussed at [3].
> > > >
> > > > An example systemd/udev extension will have,
> > > >
> > > > 1. netdev name created using mdev alias available in sysfs.
> > > >
> > > > mdev UUID=83b8f4f2-509f-382f-3c1e-e6bfe0fa1001
> > > > mdev 12 character alias=cd5b146a80a5
> > > >
> > > > netdev name of this mdev = enmcd5b146a80a5 Here en = Ethernet link m
> > > > = mediated device
> > > >
> > > > 2. devlink port phys_port_name created using mdev alias.
> > > > devlink phys_port_name=pcd5b146a80a5
> > > >
> > > > This patchset enables mdev core to maintain unique alias for a mdev.
> > > >
> > > > Patch-1 Introduces mdev alias using sha1.
> > > > Patch-2 Ensures that mdev alias is unique in a system.
> > > > Patch-3 Exposes mdev alias in a sysfs hirerchy, update Documentation
> > > > Patch-4 Introduces mdev_alias() API.
> > > > Patch-5 Extends mtty driver to optionally provide alias generation.
> > > > This also enables to test UUID based sha1 collision and trigger
> > > > error handling for duplicate sha1 results.
> > > >
> > > > [1] http://man7.org/linux/man-pages/man8/devlink-port.8.html
> > > > [2] https://docs.openstack.org/os-vif/latest/user/plugins/ovs.html
> > > > [3] https://patchwork.kernel.org/cover/11084231/
> > > >
> > > > ---
> > > > Changelog:
> > > > v2->v3:
> > > >  - Addressed comment from Yunsheng Lin
> > > >  - Changed strcmp() ==0 to !strcmp()
> > > >  - Addressed comment from Cornelia Hunk
> > > >  - Merged sysfs Documentation patch with syfs patch
> > > >  - Added more description for alias return value  
> > >
> > > Did you get a chance review this updated series?
> > > I addressed Cornelia's and yours comment.
> > > I do not think allocating alias memory twice, once for comparison and
> > > once for storing is good idea or moving alias generation logic inside
> > > the mdev_list_lock(). So I didn't address that suggestion of Cornelia.  
> > 
> > Sorry, I'm at LPC this week.  I agree, I don't think the double allocation is
> > necessary, I thought the comment was sufficient to clarify null'ing the
> > variable.  It's awkward, but seems correct.

Not hot about it, but no real complaints.

However, please give me some more time, as I'm at LPC as well.

> > 
> > I'm not sure what we do with this patch series though, has the real
> > consumer of this even been proposed?  It feels optimistic to include at this
> > point.  We've used the sample driver as a placeholder in the past for
> > mdev_uuid(), but we arrived at that via a conversion rather than explicitly
> > adding the API.  Please let me know where the consumer patches stand,
> > perhaps it would make more sense for them to go in together rather than
> > risk adding an unused API.  Thanks,
> >   
> Given that consumer patch series is relatively large (around 15+ patches), I was considering to merge this one as pre-series to it.
> Its ok to combine this with consumer patch series.
> But wanted to have it reviewed beforehand, so that churn is less in actual consumer series which is more mlx5_core and devlink/netdev centric.
> So if you can add Review-by, it will be easier to combine with consumer series.
> 
> And if we merge it with consumer series, it will come through Dave Miller's tree instead of your tree.
> Would that work for you?

It would be easier to see what to do here if we could see the consumer
for this. If those patches are fine, we could maybe queue this series
via both trees?

^ permalink raw reply

* RE: [PATCH v3 0/5] Introduce variable length mdev alias
From: Parav Pandit @ 2019-09-11 16:38 UTC (permalink / raw)
  To: Parav Pandit, Alex Williamson
  Cc: Jiri Pirko, kwankhede@nvidia.com, cohuck@redhat.com,
	davem@davemloft.net, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB48668DFF8E816F0D2D3041BFD1B10@AM0PR05MB4866.eurprd05.prod.outlook.com>



> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org <linux-kernel-
> owner@vger.kernel.org> On Behalf Of Parav Pandit
> Sent: Wednesday, September 11, 2019 10:31 AM
> To: Alex Williamson <alex.williamson@redhat.com>
> Cc: Jiri Pirko <jiri@mellanox.com>; kwankhede@nvidia.com;
> cohuck@redhat.com; davem@davemloft.net; kvm@vger.kernel.org; linux-
> kernel@vger.kernel.org; netdev@vger.kernel.org
> Subject: RE: [PATCH v3 0/5] Introduce variable length mdev alias
> 
> Hi Alex,
> 
> > -----Original Message-----
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Wednesday, September 11, 2019 8:56 AM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Jiri Pirko <jiri@mellanox.com>; kwankhede@nvidia.com;
> > cohuck@redhat.com; davem@davemloft.net; kvm@vger.kernel.org; linux-
> > kernel@vger.kernel.org; netdev@vger.kernel.org
> > Subject: Re: [PATCH v3 0/5] Introduce variable length mdev alias
> >
> > On Mon, 9 Sep 2019 20:42:32 +0000
> > Parav Pandit <parav@mellanox.com> wrote:
> >
> > > Hi Alex,
> > >
> > > > -----Original Message-----
> > > > From: Parav Pandit <parav@mellanox.com>
> > > > Sent: Sunday, September 1, 2019 11:25 PM
> > > > To: alex.williamson@redhat.com; Jiri Pirko <jiri@mellanox.com>;
> > > > kwankhede@nvidia.com; cohuck@redhat.com; davem@davemloft.net
> > > > Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > netdev@vger.kernel.org; Parav Pandit <parav@mellanox.com>
> > > > Subject: [PATCH v3 0/5] Introduce variable length mdev alias
> > > >
> > > > To have consistent naming for the netdevice of a mdev and to have
> > > > consistent naming of the devlink port [1] of a mdev, which is
> > > > formed using phys_port_name of the devlink port, current UUID is
> > > > not usable because UUID is too long.
> > > >
> > > > UUID in string format is 36-characters long and in binary 128-bit.
> > > > Both formats are not able to fit within 15 characters limit of
> > > > netdev
> > name.
> > > >
> > > > It is desired to have mdev device naming consistent using UUID.
> > > > So that widely used user space framework such as ovs [2] can make
> > > > use of mdev representor in similar way as PCIe SR-IOV VF and PF
> > representors.
> > > >
> > > > Hence,
> > > > (a) mdev alias is created which is derived using sha1 from the
> > > > mdev
> > name.
> > > > (b) Vendor driver describes how long an alias should be for the
> > > > child mdev created for a given parent.
> > > > (c) Mdev aliases are unique at system level.
> > > > (d) alias is created optionally whenever parent requested.
> > > > This ensures that non networking mdev parents can function without
> > > > alias creation overhead.
> > > >
> > > > This design is discussed at [3].
> > > >
> > > > An example systemd/udev extension will have,
> > > >
> > > > 1. netdev name created using mdev alias available in sysfs.
> > > >
> > > > mdev UUID=83b8f4f2-509f-382f-3c1e-e6bfe0fa1001
> > > > mdev 12 character alias=cd5b146a80a5
> > > >
> > > > netdev name of this mdev = enmcd5b146a80a5 Here en = Ethernet link
> > > > m = mediated device
> > > >
> > > > 2. devlink port phys_port_name created using mdev alias.
> > > > devlink phys_port_name=pcd5b146a80a5
> > > >
> > > > This patchset enables mdev core to maintain unique alias for a mdev.
> > > >
> > > > Patch-1 Introduces mdev alias using sha1.
> > > > Patch-2 Ensures that mdev alias is unique in a system.
> > > > Patch-3 Exposes mdev alias in a sysfs hirerchy, update
> > > > Documentation
> > > > Patch-4 Introduces mdev_alias() API.
> > > > Patch-5 Extends mtty driver to optionally provide alias generation.
> > > > This also enables to test UUID based sha1 collision and trigger
> > > > error handling for duplicate sha1 results.
> > > >
> > > > [1] http://man7.org/linux/man-pages/man8/devlink-port.8.html
> > > > [2] https://docs.openstack.org/os-vif/latest/user/plugins/ovs.html
> > > > [3] https://patchwork.kernel.org/cover/11084231/
> > > >
> > > > ---
> > > > Changelog:
> > > > v2->v3:
> > > >  - Addressed comment from Yunsheng Lin
> > > >  - Changed strcmp() ==0 to !strcmp()
> > > >  - Addressed comment from Cornelia Hunk
> > > >  - Merged sysfs Documentation patch with syfs patch
> > > >  - Added more description for alias return value
> > >
> > > Did you get a chance review this updated series?
> > > I addressed Cornelia's and yours comment.
> > > I do not think allocating alias memory twice, once for comparison
> > > and once for storing is good idea or moving alias generation logic
> > > inside the mdev_list_lock(). So I didn't address that suggestion of
> Cornelia.
> >
> > Sorry, I'm at LPC this week.  I agree, I don't think the double
> > allocation is necessary, I thought the comment was sufficient to
> > clarify null'ing the variable.  It's awkward, but seems correct.
> >
> > I'm not sure what we do with this patch series though, has the real
> > consumer of this even been proposed?  

Jiri already acked to use mdev_alias() to generate phys_port_name several days back in the discussion we had in [1].
After concluding in the thread [1], I proceed with mdev_alias().
mlx5_core patches are not yet present on netdev mailing list, but we all agree to use it in mdev_alias() in devlink phys_port_name generation.
So we have collective agreement on how to proceed forward.
I wasn't probably clear enough in previous email reply about it, so adding link here.

[1] https://patchwork.kernel.org/cover/11084231/#22838955

> It feels optimistic to include
> > at this point.  We've used the sample driver as a placeholder in the
> > past for mdev_uuid(), but we arrived at that via a conversion rather
> > than explicitly adding the API.  Please let me know where the consumer
> > patches stand, perhaps it would make more sense for them to go in
> > together rather than risk adding an unused API.  Thanks,
> >
> Given that consumer patch series is relatively large (around 15+ patches), I
> was considering to merge this one as pre-series to it.
> Its ok to combine this with consumer patch series.
> But wanted to have it reviewed beforehand, so that churn is less in actual
> consumer series which is more mlx5_core and devlink/netdev centric.
> So if you can add Review-by, it will be easier to combine with consumer
> series.
> 
> And if we merge it with consumer series, it will come through Dave Miller's
> tree instead of your tree.
> Would that work for you?

^ permalink raw reply

* [net 0/2][pull request] Intel Wired LAN Driver Updates 2019-09-11
From: Jeff Kirsher @ 2019-09-11 16:49 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann

This series contains fixes to ixgbe.

Alex fixes up the adaptive ITR scheme for ixgbe which could result in a
value that was either 0 or something less than 10 which was causing
issues with hardware features, like RSC, that do not function well with
ITR values that low.

Ilya Maximets fixes the ixgbe driver to limit the number of transmit
descriptors to clean by the number of transmit descriptors used in the
transmit ring, so that the driver does not try to "double" clean the
same descriptors. 

The following are changes since commit f4b752a6b2708bfdf7fbe8a241082c8104f4ce05:
  mlx4: fix spelling mistake "veify" -> "verify"
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue 10GbE

Alexander Duyck (1):
  ixgbe: Prevent u8 wrapping of ITR value to something less than 10us

Ilya Maximets (1):
  ixgbe: fix double clean of Tx descriptors with xdp

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  4 ++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 29 +++++++------------
 2 files changed, 14 insertions(+), 19 deletions(-)

-- 
2.21.0


^ permalink raw reply

* [net 2/2] ixgbe: fix double clean of Tx descriptors with xdp
From: Jeff Kirsher @ 2019-09-11 16:49 UTC (permalink / raw)
  To: davem
  Cc: Ilya Maximets, netdev, nhorman, sassmann, stable, William Tu,
	Eelco Chaudron, Jeff Kirsher
In-Reply-To: <20190911164955.10644-1-jeffrey.t.kirsher@intel.com>

From: Ilya Maximets <i.maximets@samsung.com>

Tx code doesn't clear the descriptors' status after cleaning.
So, if the budget is larger than number of used elems in a ring, some
descriptors will be accounted twice and xsk_umem_complete_tx will move
prod_tail far beyond the prod_head breaking the completion queue ring.

Fix that by limiting the number of descriptors to clean by the number
of used descriptors in the Tx ring.

'ixgbe_clean_xdp_tx_irq()' function refactored to look more like
'ixgbe_xsk_clean_tx_ring()' since we're allowed to directly use
'next_to_clean' and 'next_to_use' indexes.

CC: stable@vger.kernel.org
Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: William Tu <u9012063@gmail.com>
Tested-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 29 ++++++++------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 6b609553329f..a3b6d8c89127 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -633,19 +633,17 @@ static void ixgbe_clean_xdp_tx_buffer(struct ixgbe_ring *tx_ring,
 bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
 			    struct ixgbe_ring *tx_ring, int napi_budget)
 {
+	u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
 	unsigned int total_packets = 0, total_bytes = 0;
-	u32 i = tx_ring->next_to_clean, xsk_frames = 0;
-	unsigned int budget = q_vector->tx.work_limit;
 	struct xdp_umem *umem = tx_ring->xsk_umem;
 	union ixgbe_adv_tx_desc *tx_desc;
 	struct ixgbe_tx_buffer *tx_bi;
-	bool xmit_done;
+	u32 xsk_frames = 0;
 
-	tx_bi = &tx_ring->tx_buffer_info[i];
-	tx_desc = IXGBE_TX_DESC(tx_ring, i);
-	i -= tx_ring->count;
+	tx_bi = &tx_ring->tx_buffer_info[ntc];
+	tx_desc = IXGBE_TX_DESC(tx_ring, ntc);
 
-	do {
+	while (ntc != ntu) {
 		if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
 			break;
 
@@ -661,22 +659,18 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
 
 		tx_bi++;
 		tx_desc++;
-		i++;
-		if (unlikely(!i)) {
-			i -= tx_ring->count;
+		ntc++;
+		if (unlikely(ntc == tx_ring->count)) {
+			ntc = 0;
 			tx_bi = tx_ring->tx_buffer_info;
 			tx_desc = IXGBE_TX_DESC(tx_ring, 0);
 		}
 
 		/* issue prefetch for next Tx descriptor */
 		prefetch(tx_desc);
+	}
 
-		/* update budget accounting */
-		budget--;
-	} while (likely(budget));
-
-	i += tx_ring->count;
-	tx_ring->next_to_clean = i;
+	tx_ring->next_to_clean = ntc;
 
 	u64_stats_update_begin(&tx_ring->syncp);
 	tx_ring->stats.bytes += total_bytes;
@@ -688,8 +682,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
 	if (xsk_frames)
 		xsk_umem_complete_tx(umem, xsk_frames);
 
-	xmit_done = ixgbe_xmit_zc(tx_ring, q_vector->tx.work_limit);
-	return budget > 0 && xmit_done;
+	return ixgbe_xmit_zc(tx_ring, q_vector->tx.work_limit);
 }
 
 int ixgbe_xsk_async_xmit(struct net_device *dev, u32 qid)
-- 
2.21.0


^ permalink raw reply related

* [net 1/2] ixgbe: Prevent u8 wrapping of ITR value to something less than 10us
From: Jeff Kirsher @ 2019-09-11 16:49 UTC (permalink / raw)
  To: davem
  Cc: Alexander Duyck, netdev, nhorman, sassmann, stable,
	Gregg Leventhal, Andrew Bowers, Jeff Kirsher
In-Reply-To: <20190911164955.10644-1-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@linux.intel.com>

There were a couple cases where the ITR value generated via the adaptive
ITR scheme could exceed 126. This resulted in the value becoming either 0
or something less than 10. Switching back and forth between a value less
than 10 and a value greater than 10 can cause issues as certain hardware
features such as RSC to not function well when the ITR value has dropped
that low.

CC: stable@vger.kernel.org
Fixes: b4ded8327fea ("ixgbe: Update adaptive ITR algorithm")
Reported-by: Gregg Leventhal <gleventhal@janestreet.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 7882148abb43..77ca9005dc41 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2621,7 +2621,7 @@ static void ixgbe_update_itr(struct ixgbe_q_vector *q_vector,
 		/* 16K ints/sec to 9.2K ints/sec */
 		avg_wire_size *= 15;
 		avg_wire_size += 11452;
-	} else if (avg_wire_size <= 1980) {
+	} else if (avg_wire_size < 1968) {
 		/* 9.2K ints/sec to 8K ints/sec */
 		avg_wire_size *= 5;
 		avg_wire_size += 22420;
@@ -2654,6 +2654,8 @@ static void ixgbe_update_itr(struct ixgbe_q_vector *q_vector,
 	case IXGBE_LINK_SPEED_2_5GB_FULL:
 	case IXGBE_LINK_SPEED_1GB_FULL:
 	case IXGBE_LINK_SPEED_10_FULL:
+		if (avg_wire_size > 8064)
+			avg_wire_size = 8064;
 		itr += DIV_ROUND_UP(avg_wire_size,
 				    IXGBE_ITR_ADAPTIVE_MIN_INC * 64) *
 		       IXGBE_ITR_ADAPTIVE_MIN_INC;
-- 
2.21.0


^ permalink raw reply related

* [net-next v2 01/13] ixgbe: fix memory leaks
From: Jeff Kirsher @ 2019-09-11 16:50 UTC (permalink / raw)
  To: davem; +Cc: Wenwen Wang, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190911165014.10742-1-jeffrey.t.kirsher@intel.com>

From: Wenwen Wang <wenwen@cs.uga.edu>

In ixgbe_configure_clsu32(), 'jump', 'input', and 'mask' are allocated
through kzalloc() respectively in a for loop body. Then,
ixgbe_clsu32_build_input() is invoked to build the input. If this process
fails, next iteration of the for loop will be executed. However, the
allocated 'jump', 'input', and 'mask' are not deallocated on this execution
path, leading to memory leaks.

Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 99df595abfba..95c0827dfd4c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9490,6 +9490,10 @@ static int ixgbe_configure_clsu32(struct ixgbe_adapter *adapter,
 				jump->mat = nexthdr[i].jump;
 				adapter->jump_tables[link_uhtid] = jump;
 				break;
+			} else {
+				kfree(mask);
+				kfree(input);
+				kfree(jump);
 			}
 		}
 		return 0;
-- 
2.21.0


^ permalink raw reply related

* [net-next v2 08/13] i40e: Fix message for other card without FEC.
From: Jeff Kirsher @ 2019-09-11 16:50 UTC (permalink / raw)
  To: davem
  Cc: Czeslaw Zagorski, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190911165014.10742-1-jeffrey.t.kirsher@intel.com>

From: Czeslaw Zagorski <czeslawx.zagorski@intel.com>

When variable "req_fec, fec, an" are empty,
dmesg shows log with "Requested FEC: , Negotiated FEC: , Autoneg:".
Add link dmesg log for cards without FEC.

Signed-off-by: Czeslaw Zagorski <czeslawx.zagorski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 700f38ec8e91..6031223eafab 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6594,11 +6594,15 @@ void i40e_print_link_message(struct i40e_vsi *vsi, bool isup)
 			else
 				req_fec = "CL74 FC-FEC/BASE-R";
 		}
+		netdev_info(vsi->netdev,
+			    "NIC Link is Up, %sbps Full Duplex, Requested FEC: %s, Negotiated FEC: %s, Autoneg: %s, Flow Control: %s\n",
+			    speed, req_fec, fec, an, fc);
+	} else {
+		netdev_info(vsi->netdev,
+			    "NIC Link is Up, %sbps Full Duplex, Flow Control: %s\n",
+			    speed, fc);
 	}
 
-	netdev_info(vsi->netdev,
-		    "NIC Link is Up, %sbps Full Duplex, Requested FEC: %s, Negotiated FEC: %s, Autoneg: %s, Flow Control: %s\n",
-		    speed, req_fec, fec, an, fc);
 }
 
 /**
-- 
2.21.0


^ permalink raw reply related

* [net-next v2 07/13] i40e: fix missed "Negotiated" string in i40e_print_link_message()
From: Jeff Kirsher @ 2019-09-11 16:50 UTC (permalink / raw)
  To: davem
  Cc: Aleksandr Loktionov, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190911165014.10742-1-jeffrey.t.kirsher@intel.com>

From: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

The "Negotiated" string in i40e_print_link_message() function was missed.
This string has been added to the dmesg and small refactoring done removing
common substrings and unifying link status message format.
Without this patch it was not clear that FEC is related to negotiated FEC.

Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 3e2e465f43f9..700f38ec8e91 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6569,19 +6569,19 @@ void i40e_print_link_message(struct i40e_vsi *vsi, bool isup)
 	}
 
 	if (pf->hw.phy.link_info.link_speed == I40E_LINK_SPEED_25GB) {
-		req_fec = ", Requested FEC: None";
-		fec = ", FEC: None";
-		an = ", Autoneg: False";
+		req_fec = "None";
+		fec = "None";
+		an = "False";
 
 		if (pf->hw.phy.link_info.an_info & I40E_AQ_AN_COMPLETED)
-			an = ", Autoneg: True";
+			an = "True";
 
 		if (pf->hw.phy.link_info.fec_info &
 		    I40E_AQ_CONFIG_FEC_KR_ENA)
-			fec = ", FEC: CL74 FC-FEC/BASE-R";
+			fec = "CL74 FC-FEC/BASE-R";
 		else if (pf->hw.phy.link_info.fec_info &
 			 I40E_AQ_CONFIG_FEC_RS_ENA)
-			fec = ", FEC: CL108 RS-FEC";
+			fec = "CL108 RS-FEC";
 
 		/* 'CL108 RS-FEC' should be displayed when RS is requested, or
 		 * both RS and FC are requested
@@ -6590,13 +6590,14 @@ void i40e_print_link_message(struct i40e_vsi *vsi, bool isup)
 		    (I40E_AQ_REQUEST_FEC_KR | I40E_AQ_REQUEST_FEC_RS)) {
 			if (vsi->back->hw.phy.link_info.req_fec_info &
 			    I40E_AQ_REQUEST_FEC_RS)
-				req_fec = ", Requested FEC: CL108 RS-FEC";
+				req_fec = "CL108 RS-FEC";
 			else
-				req_fec = ", Requested FEC: CL74 FC-FEC/BASE-R";
+				req_fec = "CL74 FC-FEC/BASE-R";
 		}
 	}
 
-	netdev_info(vsi->netdev, "NIC Link is Up, %sbps Full Duplex%s%s%s, Flow Control: %s\n",
+	netdev_info(vsi->netdev,
+		    "NIC Link is Up, %sbps Full Duplex, Requested FEC: %s, Negotiated FEC: %s, Autoneg: %s, Flow Control: %s\n",
 		    speed, req_fec, fec, an, fc);
 }
 
-- 
2.21.0


^ permalink raw reply related

* [net-next v2 06/13] i40e: mark additional missing bits as reserved
From: Jeff Kirsher @ 2019-09-11 16:50 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190911165014.10742-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Mark bits 0xD through 0xF for the command flags of a cloud filter as
reserved. These bits are not yet defined and are considered as reserved
in the data sheet.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index 7ff768761659..530613f31527 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -1394,6 +1394,9 @@ struct i40e_aqc_cloud_filters_element_data {
 #define I40E_AQC_ADD_CLOUD_FILTER_IMAC			0x000A
 #define I40E_AQC_ADD_CLOUD_FILTER_OMAC_TEN_ID_IMAC	0x000B
 #define I40E_AQC_ADD_CLOUD_FILTER_IIP			0x000C
+/* 0x000D reserved */
+/* 0x000E reserved */
+/* 0x000F reserved */
 /* 0x0010 to 0x0017 is for custom filters */
 #define I40E_AQC_ADD_CLOUD_FILTER_IP_PORT		0x0010 /* Dest IP + L4 Port */
 #define I40E_AQC_ADD_CLOUD_FILTER_MAC_PORT		0x0011 /* Dest MAC + L4 Port */
-- 
2.21.0


^ permalink raw reply related

* [net-next v2 13/13] i40e: fix potential RX buffer starvation for AF_XDP
From: Jeff Kirsher @ 2019-09-11 16:50 UTC (permalink / raw)
  To: davem
  Cc: Magnus Karlsson, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190911165014.10742-1-jeffrey.t.kirsher@intel.com>

From: Magnus Karlsson <magnus.karlsson@intel.com>

When the RX rings are created they are also populated with buffers
so that packets can be received. Usually these are kernel buffers,
but for AF_XDP in zero-copy mode, these are user-space buffers and
in this case the application might not have sent down any buffers
to the driver at this point. And if no buffers are allocated at ring
creation time, no packets can be received and no interrupts will be
generated so the NAPI poll function that allocates buffers to the
rings will never get executed.

To rectify this, we kick the NAPI context of any queue with an
attached AF_XDP zero-copy socket in two places in the code. Once
after an XDP program has loaded and once after the umem is registered.
This take care of both cases: XDP program gets loaded first then AF_XDP
socket is created, and the reverse, AF_XDP socket is created first,
then XDP program is loaded.

Fixes: 0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 0373bc6c7e61..feb5bd54d840 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -157,6 +157,11 @@ static int i40e_xsk_umem_disable(struct i40e_vsi *vsi, u16 qid)
 		err = i40e_queue_pair_enable(vsi, qid);
 		if (err)
 			return err;
+
+		/* Kick start the NAPI context so that receiving will start */
+		err = i40e_xsk_wakeup(vsi->netdev, qid, XDP_WAKEUP_RX);
+		if (err)
+			return err;
 	}
 
 	return 0;
-- 
2.21.0


^ permalink raw reply related

* [net-next v2 12/13] net/ixgbevf: make array api static const, makes object smaller
From: Jeff Kirsher @ 2019-09-11 16:50 UTC (permalink / raw)
  To: davem
  Cc: Colin Ian King, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190911165014.10742-1-jeffrey.t.kirsher@intel.com>

From: Colin Ian King <colin.king@canonical.com>

Don't populate the array API on the stack but instead make it
static const. Makes the object code smaller by 58 bytes.

Before:
   text	   data	    bss	    dec	    hex	filename
  82969	   9763	    256	  92988	  16b3c	ixgbevf/ixgbevf_main.o

After:
   text	   data	    bss	    dec	    hex	filename
  82815	   9859	    256	  92930	  16b02	ixgbevf/ixgbevf_main.o

(gcc version 9.2.1, amd64)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 75e849a64db7..75e93ce2ed99 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -2261,12 +2261,14 @@ static void ixgbevf_init_last_counter_stats(struct ixgbevf_adapter *adapter)
 static void ixgbevf_negotiate_api(struct ixgbevf_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
-	int api[] = { ixgbe_mbox_api_14,
-		      ixgbe_mbox_api_13,
-		      ixgbe_mbox_api_12,
-		      ixgbe_mbox_api_11,
-		      ixgbe_mbox_api_10,
-		      ixgbe_mbox_api_unknown };
+	static const int api[] = {
+		ixgbe_mbox_api_14,
+		ixgbe_mbox_api_13,
+		ixgbe_mbox_api_12,
+		ixgbe_mbox_api_11,
+		ixgbe_mbox_api_10,
+		ixgbe_mbox_api_unknown
+	};
 	int err, idx = 0;
 
 	spin_lock_bh(&adapter->mbx_lock);
-- 
2.21.0


^ permalink raw reply related

* [net-next v2 11/13] iavf: fix MAC address setting for VFs when filter is rejected
From: Jeff Kirsher @ 2019-09-11 16:50 UTC (permalink / raw)
  To: davem
  Cc: Stefan Assmann, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190911165014.10742-1-jeffrey.t.kirsher@intel.com>

From: Stefan Assmann <sassmann@kpanic.de>

Currently iavf unconditionally applies MAC address change requests. This
brings the VF in a state where it is no longer able to pass traffic if
the PF rejects a MAC filter change for the VF.
A typical scenario for a rejected MAC filter is for an untrusted VF to
request to change the MAC address when an administratively set MAC is
present.

To keep iavf working in this scenario the MAC filter handling in iavf
needs to act on the PF reply regarding the MAC filter change. In the
case of an ack the new MAC address gets set, whereas in the case of a
nack the previous MAC address needs to stay in place.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_main.c     | 1 -
 drivers/net/ethernet/intel/iavf/iavf_virtchnl.c | 7 +++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 07f5541a0f01..8f310e520b06 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -804,7 +804,6 @@ static int iavf_set_mac(struct net_device *netdev, void *p)
 
 	if (f) {
 		ether_addr_copy(hw->mac.addr, addr->sa_data);
-		ether_addr_copy(netdev->dev_addr, adapter->hw.mac.addr);
 	}
 
 	return (f == NULL) ? -ENOMEM : 0;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index d49d58a6de80..c46770eba320 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -1252,6 +1252,8 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 		case VIRTCHNL_OP_ADD_ETH_ADDR:
 			dev_err(&adapter->pdev->dev, "Failed to add MAC filter, error %s\n",
 				iavf_stat_str(&adapter->hw, v_retval));
+			/* restore administratively set MAC address */
+			ether_addr_copy(adapter->hw.mac.addr, netdev->dev_addr);
 			break;
 		case VIRTCHNL_OP_DEL_VLAN:
 			dev_err(&adapter->pdev->dev, "Failed to delete VLAN filter, error %s\n",
@@ -1319,6 +1321,11 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 		}
 	}
 	switch (v_opcode) {
+	case VIRTCHNL_OP_ADD_ETH_ADDR: {
+		if (!ether_addr_equal(netdev->dev_addr, adapter->hw.mac.addr))
+			ether_addr_copy(netdev->dev_addr, adapter->hw.mac.addr);
+		}
+		break;
 	case VIRTCHNL_OP_GET_STATS: {
 		struct iavf_eth_stats *stats =
 			(struct iavf_eth_stats *)msg;
-- 
2.21.0


^ permalink raw reply related

* [net-next v2 10/13] i40e: clear __I40E_VIRTCHNL_OP_PENDING on invalid min Tx rate
From: Jeff Kirsher @ 2019-09-11 16:50 UTC (permalink / raw)
  To: davem
  Cc: Stefan Assmann, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190911165014.10742-1-jeffrey.t.kirsher@intel.com>

From: Stefan Assmann <sassmann@kpanic.de>

In the case of an invalid min Tx rate being requested
i40e_ndo_set_vf_bw() immediately returns -EINVAL instead of releasing
__I40E_VIRTCHNL_OP_PENDING first.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index f8aa4deceb5e..3d2440838822 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -4263,7 +4263,8 @@ int i40e_ndo_set_vf_bw(struct net_device *netdev, int vf_id, int min_tx_rate,
 	if (min_tx_rate) {
 		dev_err(&pf->pdev->dev, "Invalid min tx rate (%d) (greater than 0) specified for VF %d.\n",
 			min_tx_rate, vf_id);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto error;
 	}
 
 	vf = &pf->vf[vf_id];
-- 
2.21.0


^ permalink raw reply related

* [net-next v2 09/13] i40e: use BIT macro to specify the cloud filter field flags
From: Jeff Kirsher @ 2019-09-11 16:50 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190911165014.10742-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

The macros used to specify the cloud filter fields are intended to be
individual bits. Declare them using the BIT() macro to make their
intention a little more clear.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index f1a1bd324b50..2af9f6308f84 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -243,11 +243,11 @@ struct i40e_fdir_filter {
 	u32 fd_id;
 };
 
-#define I40E_CLOUD_FIELD_OMAC	0x01
-#define I40E_CLOUD_FIELD_IMAC	0x02
-#define I40E_CLOUD_FIELD_IVLAN	0x04
-#define I40E_CLOUD_FIELD_TEN_ID	0x08
-#define I40E_CLOUD_FIELD_IIP	0x10
+#define I40E_CLOUD_FIELD_OMAC		BIT(0)
+#define I40E_CLOUD_FIELD_IMAC		BIT(1)
+#define I40E_CLOUD_FIELD_IVLAN		BIT(2)
+#define I40E_CLOUD_FIELD_TEN_ID		BIT(3)
+#define I40E_CLOUD_FIELD_IIP		BIT(4)
 
 #define I40E_CLOUD_FILTER_FLAGS_OMAC	I40E_CLOUD_FIELD_OMAC
 #define I40E_CLOUD_FILTER_FLAGS_IMAC	I40E_CLOUD_FIELD_IMAC
-- 
2.21.0


^ permalink raw reply related

* [net-next v2 03/13] ixgbe: use skb_get_queue_mapping in tx path
From: Jeff Kirsher @ 2019-09-11 16:50 UTC (permalink / raw)
  To: davem; +Cc: Tonghao Zhang, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190911165014.10742-1-jeffrey.t.kirsher@intel.com>

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Use the common api, and don't access queue_mapping directly.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 95c0827dfd4c..dc034f4e8cf6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8748,7 +8748,7 @@ static netdev_tx_t __ixgbe_xmit_frame(struct sk_buff *skb,
 	if (skb_put_padto(skb, 17))
 		return NETDEV_TX_OK;
 
-	tx_ring = ring ? ring : adapter->tx_ring[skb->queue_mapping];
+	tx_ring = ring ? ring : adapter->tx_ring[skb_get_queue_mapping(skb)];
 	if (unlikely(test_bit(__IXGBE_TX_DISABLED, &tx_ring->state)))
 		return NETDEV_TX_BUSY;
 
-- 
2.21.0


^ permalink raw reply related

* [net-next v2 05/13] i40e: remove I40E_AQC_ADD_CLOUD_FILTER_OIP
From: Jeff Kirsher @ 2019-09-11 16:50 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190911165014.10742-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

The bit 0x0001 used in the cloud filters adminq command is reserved, and
is not actually a valid type.

The Linux driver has never used this type, and it's not clear if any
driver ever has.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index 21cccec328e3..7ff768761659 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -1382,7 +1382,7 @@ struct i40e_aqc_cloud_filters_element_data {
 #define I40E_AQC_ADD_CLOUD_FILTER_MASK	(0x3F << \
 					I40E_AQC_ADD_CLOUD_FILTER_SHIFT)
 /* 0x0000 reserved */
-#define I40E_AQC_ADD_CLOUD_FILTER_OIP			0x0001
+/* 0x0001 reserved */
 /* 0x0002 reserved */
 #define I40E_AQC_ADD_CLOUD_FILTER_IMAC_IVLAN		0x0003
 #define I40E_AQC_ADD_CLOUD_FILTER_IMAC_IVLAN_TEN_ID	0x0004
-- 
2.21.0


^ permalink raw reply related

* [net-next v2 04/13] i40e: use ktime_get_real_ts64 instead of ktime_to_timespec64
From: Jeff Kirsher @ 2019-09-11 16:50 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190911165014.10742-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Remove a call to ktime_to_timespec64 by calling ktime_get_real_ts64
directly.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_ptp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
index 11394a52e21c..9bf1ad4319f5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
@@ -725,7 +725,7 @@ static long i40e_ptp_create_clock(struct i40e_pf *pf)
 	pf->tstamp_config.tx_type = HWTSTAMP_TX_OFF;
 
 	/* Set the previous "reset" time to the current Kernel clock time */
-	pf->ptp_prev_hw_time = ktime_to_timespec64(ktime_get_real());
+	ktime_get_real_ts64(&pf->ptp_prev_hw_time);
 	pf->ptp_reset_start = ktime_get();
 
 	return 0;
-- 
2.21.0


^ permalink raw reply related

* [net-next v2 02/13] i40e: check __I40E_VF_DISABLE bit in i40e_sync_filters_subtask
From: Jeff Kirsher @ 2019-09-11 16:50 UTC (permalink / raw)
  To: davem
  Cc: Stefan Assmann, netdev, nhorman, sassmann, stable, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190911165014.10742-1-jeffrey.t.kirsher@intel.com>

From: Stefan Assmann <sassmann@kpanic.de>

While testing VF spawn/destroy the following panic occurred.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000029
[...]
Workqueue: i40e i40e_service_task [i40e]
RIP: 0010:i40e_sync_vsi_filters+0x6fd/0xc60 [i40e]
[...]
Call Trace:
 ? __switch_to_asm+0x35/0x70
 ? __switch_to_asm+0x41/0x70
 ? __switch_to_asm+0x35/0x70
 ? _cond_resched+0x15/0x30
 i40e_sync_filters_subtask+0x56/0x70 [i40e]
 i40e_service_task+0x382/0x11b0 [i40e]
 ? __switch_to_asm+0x41/0x70
 ? __switch_to_asm+0x41/0x70
 process_one_work+0x1a7/0x3b0
 worker_thread+0x30/0x390
 ? create_worker+0x1a0/0x1a0
 kthread+0x112/0x130
 ? kthread_bind+0x30/0x30
 ret_from_fork+0x35/0x40

Investigation revealed a race where pf->vf[vsi->vf_id].trusted may get
accessed by the watchdog via i40e_sync_filters_subtask() although
i40e_free_vfs() already free'd pf->vf.
To avoid this the call to i40e_sync_vsi_filters() in
i40e_sync_filters_subtask() needs to be guarded by __I40E_VF_DISABLE,
which is also used by i40e_free_vfs().

Note: put the __I40E_VF_DISABLE check after the
__I40E_MACVLAN_SYNC_PENDING check as the latter is more likely to
trigger.

CC: stable@vger.kernel.org
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index e9f2f276bf27..3e2e465f43f9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2592,6 +2592,10 @@ static void i40e_sync_filters_subtask(struct i40e_pf *pf)
 		return;
 	if (!test_and_clear_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state))
 		return;
+	if (test_and_set_bit(__I40E_VF_DISABLE, pf->state)) {
+		set_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state);
+		return;
+	}
 
 	for (v = 0; v < pf->num_alloc_vsi; v++) {
 		if (pf->vsi[v] &&
@@ -2606,6 +2610,7 @@ static void i40e_sync_filters_subtask(struct i40e_pf *pf)
 			}
 		}
 	}
+	clear_bit(__I40E_VF_DISABLE, pf->state);
 }
 
 /**
-- 
2.21.0


^ permalink raw reply related

* [net-next v2 00/13][pull request] Intel Wired LAN Driver Updates 2019-09-11
From: Jeff Kirsher @ 2019-09-11 16:50 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann

This series contains updates to i40e, ixgbe/vf and iavf.

Wenwen Wang fixes a potential memory leak where 3 allocated variables
are not properly cleaned up on failure for ixgbe.

Stefan Assmann fixes a potential kernel panic found when repeatedly
spawning and destroying VFs in i40e when a NULL pointer is dereferenced
due to a race condition.  Fixed up the i40e driver to clear the
__I40E_VIRTCHNL_OP_PENDING bit before returning after an invalid
minimum transmit rate is requested.  Updates the iavf driver to only
apply the MAC address change when the PF ACK's the requested change.

Tonghao Zhang updates ixgbe to use the skb_get_queue_mapping() API call
instead of the driver accessing the queue mapping directly.

Jake updates i40e to use ktime_get_real_ts64() instead of
ktime_to_timespec64().  Removes the define for bit 0x0001 for cloud
filters, since it is a reserved bit and not a valid type.  Also added
code comments to clearly state which bits are reserved and should not be
used or defined for cloud filter adminq command.  Clarify the macros
used to specify the cloud filter fields are individual bits, so use the
BIT() macro.

Aleksandr fixes up the print_link_message() to include the "negotiated"
FEC status for i40e.

Czeslaw also adds additional log message for devices without FEC in the
print_link_message() for i40e.

Colin Ian King reduces the object code size by making the array API
static constant.

Magnus fixes a potential receive buffer starvation issue for AF_XDP by
kicking the NAPI context of any queue with an attached AF_XDP zero-copy
socket.

v2: Removed patch 11 from the original series (Alex Duyck's ITR fix), 
    so that it can be sent to the net tree.

The following are changes since commit c1609946b8b6485e1d405663004867ea9e92178a:
  Merge branch 'qed-Fix-series'
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Aleksandr Loktionov (1):
  i40e: fix missed "Negotiated" string in i40e_print_link_message()

Colin Ian King (1):
  net/ixgbevf: make array api static const, makes object smaller

Czeslaw Zagorski (1):
  i40e: Fix message for other card without FEC.

Jacob Keller (4):
  i40e: use ktime_get_real_ts64 instead of ktime_to_timespec64
  i40e: remove I40E_AQC_ADD_CLOUD_FILTER_OIP
  i40e: mark additional missing bits as reserved
  i40e: use BIT macro to specify the cloud filter field flags

Magnus Karlsson (1):
  i40e: fix potential RX buffer starvation for AF_XDP

Stefan Assmann (3):
  i40e: check __I40E_VF_DISABLE bit in i40e_sync_filters_subtask
  i40e: clear __I40E_VIRTCHNL_OP_PENDING on invalid min Tx rate
  iavf: fix MAC address setting for VFs when filter is rejected

Tonghao Zhang (1):
  ixgbe: use skb_get_queue_mapping in tx path

Wenwen Wang (1):
  ixgbe: fix memory leaks

 drivers/net/ethernet/intel/i40e/i40e.h        | 10 +++----
 .../net/ethernet/intel/i40e/i40e_adminq_cmd.h |  5 +++-
 drivers/net/ethernet/intel/i40e/i40e_main.c   | 30 ++++++++++++-------
 drivers/net/ethernet/intel/i40e/i40e_ptp.c    |  2 +-
 .../ethernet/intel/i40e/i40e_virtchnl_pf.c    |  3 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    |  5 ++++
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  1 -
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   |  7 +++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  6 +++-
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c | 14 +++++----
 10 files changed, 57 insertions(+), 26 deletions(-)

-- 
2.21.0


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox