* Re: [PATCH net-next v5 2/9] net: ensure unbound stream socket to be chosen when not in a VRF
From: David Ahern @ 2018-11-07 19:06 UTC (permalink / raw)
To: Mike Manning, netdev
In-Reply-To: <20181107153610.7526-3-mmanning@vyatta.att-mail.com>
On 11/7/18 8:36 AM, Mike Manning wrote:
> The commit a04a480d4392 ("net: Require exact match for TCP socket
> lookups if dif is l3mdev") only ensures that the correct socket is
> selected for packets in a VRF. However, there is no guarantee that
> the unbound socket will be selected for packets when not in a VRF.
> By checking for a device match in compute_score() also for the case
> when there is no bound device and attaching a score to this, the
> unbound socket is selected. And if a failure is returned when there
> is no device match, this ensures that bound sockets are never selected,
> even if there is no unbound socket.
>
> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
> ---
> include/net/inet_hashtables.h | 11 +++++++++++
> include/net/inet_sock.h | 8 ++++++++
> net/ipv4/inet_hashtables.c | 14 ++++++--------
> net/ipv6/inet6_hashtables.c | 14 ++++++--------
> 4 files changed, 31 insertions(+), 16 deletions(-)
>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
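
For readers following the series, the lookup rule described in the commit message can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: score_socket() and bound_dev_matches() are hypothetical names, the score values are illustrative, and the real compute_score() also weighs address, port and family, which are left out here. It assumes dif is the ingress ifindex as seen by the stack and that sdif is non-zero only when the packet arrived through a device enslaved to an l3mdev.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: does a socket's device binding accept this packet?
 * bound_dev_if == 0 means the socket is unbound.
 */
bool bound_dev_matches(bool l3mdev_accept, int bound_dev_if,
		       int dif, int sdif)
{
	if (!bound_dev_if)
		/* Unbound: only for packets outside a VRF, unless the
		 * l3mdev_accept setting allows cross-VRF delivery.
		 */
		return sdif == 0 || l3mdev_accept;

	return bound_dev_if == dif || bound_dev_if == sdif;
}

/* Returns -1 when the device does not match, so a bound socket is never
 * picked for a packet outside its VRF. Otherwise a bound socket outranks
 * an unbound one.
 */
int score_socket(bool l3mdev_accept, int bound_dev_if, int dif, int sdif)
{
	if (!bound_dev_matches(l3mdev_accept, bound_dev_if, dif, sdif))
		return -1;

	return bound_dev_if ? 2 : 1;
}
```

With this model, a socket bound to a VRF is rejected outright for a packet that arrives outside the VRF, rather than merely scoring lower, which is the "failure is returned" behaviour the commit message describes.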
* Re: [PATCH net-next v5 3/9] net: ensure unbound datagram socket to be chosen when not in a VRF
From: David Ahern @ 2018-11-07 19:06 UTC (permalink / raw)
To: Mike Manning, netdev
In-Reply-To: <20181107153610.7526-4-mmanning@vyatta.att-mail.com>
On 11/7/18 8:36 AM, Mike Manning wrote:
> Ensure an unbound datagram skt is chosen when not in a VRF. The check
> for a device match in compute_score() for UDP must also be performed
> when there is no bound device. For this, a failure is returned when there
> is no device match. This ensures that bound sockets are never selected,
> even if there is no unbound socket.
>
> Allow IPv6 packets to be sent over a datagram skt bound to a VRF. These
> packets are currently blocked, as flowi6_oif was set to that of the
> master vrf device, and the ipi6_ifindex is that of the slave device.
> Allow these packets to be sent by checking the device with ipi6_ifindex
> has the same L3 scope as that of the bound device of the skt, which is
> the master vrf device. Note that this check always succeeds if the skt
> is unbound.
>
> Even though the right datagram skt is now selected by compute_score(),
> a different skt is being returned that is bound to the wrong vrf. The
> difference between these and stream sockets is the handling of the skt
> option for SO_REUSEPORT. While the handling when adding a skt for reuse
> correctly checks that the bound device of the skt is a match, the skts
> in the hashslot are already incorrect. So for the same hash, a skt for
> the wrong vrf may be selected for the required port. The root cause is
> that the skt is immediately placed into a slot when it is created,
> but when the skt is then bound using SO_BINDTODEVICE, it remains in the
> same slot. The solution is to move the skt to the correct slot by
> forcing a rehash.
>
> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
> ---
> include/net/udp.h | 11 +++++++++++
> net/core/sock.c | 2 ++
> net/ipv4/udp.c | 15 ++++++---------
> net/ipv6/datagram.c | 10 +++++++---
> net/ipv6/udp.c | 14 +++++---------
> 5 files changed, 31 insertions(+), 21 deletions(-)
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
* Re: [PATCH net-next v5 4/9] net: provide a sysctl raw_l3mdev_accept for raw socket lookup with VRFs
From: David Ahern @ 2018-11-07 19:07 UTC (permalink / raw)
To: Mike Manning, netdev
In-Reply-To: <20181107153610.7526-5-mmanning@vyatta.att-mail.com>
On 11/7/18 8:36 AM, Mike Manning wrote:
> Add a sysctl raw_l3mdev_accept to control raw socket lookup in a manner
> similar to use of tcp_l3mdev_accept for stream and of udp_l3mdev_accept
> for datagram sockets. Have this default to enabled for reasons of
> backwards compatibility. This makes it possible to specify the output
> device with cmsg and IP_PKTINFO while using a socket not bound to the
> corresponding VRF. This allows e.g. older ping implementations to be
> run while specifying the device but without executing them in the VRF.
> If the option is disabled, packets received in a VRF context are only
> handled by a raw socket bound to the VRF, and correspondingly packets
> in the default VRF are only handled by a socket not bound to any VRF.
>
> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
> ---
> Documentation/networking/ip-sysctl.txt | 12 ++++++++++++
> Documentation/networking/vrf.txt | 13 +++++++++++++
> include/net/netns/ipv4.h | 3 +++
> include/net/raw.h | 1 +
> net/ipv4/af_inet.c | 2 ++
> net/ipv4/raw.c | 28 ++++++++++++++++++++++++++--
> net/ipv4/sysctl_net_ipv4.c | 11 +++++++++++
> 7 files changed, 68 insertions(+), 2 deletions(-)
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
* Re: [PATCH net-next v5 5/9] net: fix raw socket lookup device bind matching with VRFs
From: David Ahern @ 2018-11-07 19:07 UTC (permalink / raw)
To: Mike Manning, netdev; +Cc: Duncan Eastoe
In-Reply-To: <20181107153610.7526-6-mmanning@vyatta.att-mail.com>
On 11/7/18 8:36 AM, Mike Manning wrote:
> From: Duncan Eastoe <deastoe@vyatta.att-mail.com>
>
> When there exists a pair of raw sockets, one unbound and one bound
> to a VRF but equal in all other respects, and a packet is received
> in the VRF context, __raw_v4_lookup() matches on both sockets.
>
> This results in the packet being delivered over both sockets,
> instead of only the raw socket bound to the VRF. The bound device
> checks in __raw_v4_lookup() are replaced with a call to
> raw_sk_bound_dev_eq() which correctly handles whether the packet
> should be delivered over the unbound socket in such cases.
>
> In __raw_v6_lookup() the match on the device binding of the socket is
> similarly updated to use raw_sk_bound_dev_eq() which matches the
> handling in __raw_v4_lookup().
>
> Importantly raw_sk_bound_dev_eq() takes the raw_l3mdev_accept sysctl
> into account.
>
> Signed-off-by: Duncan Eastoe <deastoe@vyatta.att-mail.com>
> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
> ---
> include/net/raw.h | 13 ++++++++++++-
> net/ipv4/raw.c | 3 +--
> net/ipv6/raw.c | 5 ++---
> 3 files changed, 15 insertions(+), 6 deletions(-)
>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
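
The double delivery described in the commit message can be illustrated with a small userspace model. This is a sketch, not the kernel code: raw_dev_matches() and raw_receivers() are hypothetical names, sockets are reduced to their bound ifindex (0 meaning unbound), and the rule for the unbound case is an assumption based on the description of raw_l3mdev_accept in patch 4/9.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device check: an unbound socket only sees
 * packets from outside a VRF unless raw_l3mdev_accept is enabled.
 */
bool raw_dev_matches(bool raw_l3mdev_accept, int bound_dev_if,
		     int dif, int sdif)
{
	if (!bound_dev_if)
		return sdif == 0 || raw_l3mdev_accept;

	return bound_dev_if == dif || bound_dev_if == sdif;
}

/* Raw sockets deliver to every match, so count how many of the given
 * sockets would receive a packet with this dif/sdif pair.
 */
size_t raw_receivers(bool raw_l3mdev_accept, const int *bound, size_t n,
		     int dif, int sdif)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		if (raw_dev_matches(raw_l3mdev_accept, bound[i], dif, sdif))
			hits++;

	return hits;
}
```

For one unbound socket and one socket bound to VRF ifindex 10, a packet received in that VRF reaches only the bound socket when the sysctl is off, and both sockets when it is on (the backwards-compatible default).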
* Re: [PATCH net-next v5 6/9] vrf: mark skb for multicast or link-local as enslaved to VRF
From: David Ahern @ 2018-11-07 19:07 UTC (permalink / raw)
To: Mike Manning, netdev
In-Reply-To: <20181107153610.7526-7-mmanning@vyatta.att-mail.com>
On 11/7/18 8:36 AM, Mike Manning wrote:
> The skb for a packet that is multicast or to a link-local address is
> not marked as being enslaved to a VRF, if it is received on a socket
> bound to the VRF. This is needed for ND and it is preferable for the
> kernel not to have to deal with the additional use-cases if ll or mcast
> packets are handled as enslaved. However, this does not allow service
> instances listening on unbound and VRF-bound sockets to distinguish
> the VRF used, if packets are sent as multicast or to a link-local
> address. The fix is for the VRF driver to also mark these skbs as being
> enslaved to the VRF.
>
> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
> ---
> drivers/net/vrf.c | 19 +++++++++----------
> 1 file changed, 9 insertions(+), 10 deletions(-)
>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
* Re: [PATCH net-next v5 7/9] ipv6: allow ping to link-local address in VRF
From: David Ahern @ 2018-11-07 19:07 UTC (permalink / raw)
To: Mike Manning, netdev
In-Reply-To: <20181107153610.7526-8-mmanning@vyatta.att-mail.com>
On 11/7/18 8:36 AM, Mike Manning wrote:
> If link-local packets are marked as enslaved to a VRF, then to allow
> ping to the link-local from a vrf, the error handling for IPV6_PKTINFO
> needs to be relaxed to also allow the pkt ipi6_ifindex to be that of a
> slave device to the vrf.
>
> Note that the real device also needs to be retrieved in icmp6_iif()
> to set the ipv6 flow oif to this for icmp echo reply handling. The
> recent commit 24b711edfc34 ("net/ipv6: Fix linklocal to global address
> with VRF") takes care of this, so the sdif does not need checking here.
>
> This fix makes ping to link-local consistent with that to global
> addresses, in that this can now be done from within the same VRF that
> the address is in.
>
> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
> ---
> net/ipv6/ipv6_sockglue.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
* Re: [PATCH net-next v5 8/9] ipv6: handling of multicast packets received in VRF
From: David Ahern @ 2018-11-07 19:08 UTC (permalink / raw)
To: Mike Manning, netdev; +Cc: Dewi Morgan
In-Reply-To: <20181107153610.7526-9-mmanning@vyatta.att-mail.com>
On 11/7/18 8:36 AM, Mike Manning wrote:
> If an skb for a multicast packet marked as enslaved to a VRF is
> received, then the secondary device index should be used to obtain
> the real device. Also verify the multicast address against the
> enslaved rather than the l3mdev device.
>
> Signed-off-by: Dewi Morgan <morgand@vyatta.att-mail.com>
> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
> ---
> net/ipv6/ip6_input.c | 35 ++++++++++++++++++++++++++++++++---
> 1 file changed, 32 insertions(+), 3 deletions(-)
>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
* Re: [PATCH net-next v5 9/9] ipv6: do not drop vrf udp multicast packets
From: David Ahern @ 2018-11-07 19:08 UTC (permalink / raw)
To: Mike Manning, netdev; +Cc: Dewi Morgan
In-Reply-To: <20181107153610.7526-10-mmanning@vyatta.att-mail.com>
On 11/7/18 8:36 AM, Mike Manning wrote:
> From: Dewi Morgan <morgand@vyatta.att-mail.com>
>
> For bound udp sockets in a vrf, also check the sdif to get the index
> for ingress devices enslaved to an l3mdev.
>
> Signed-off-by: Dewi Morgan <morgand@vyatta.att-mail.com>
> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
> ---
> net/ipv6/udp.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
* [net 00/14][pull request] Intel Wired LAN Driver Updates 2018-11-07
From: Jeff Kirsher @ 2018-11-07 19:16 UTC (permalink / raw)
To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann
This series contains fixes to igb, i40e and ice drivers.
Anirudh fixes an issue during rebuild of the ice driver, where we need
to set the carrier state, as well as start or stop the queues all based
on the link status. Removed functions that were duplicating current
functionality in the VSI rebuild/replay framework.
Dave fixes a potential resource collision during the remove path by
adding a check to see if we are in the middle of a reset. Fixed the
remove path to ensure we call netif_napi_del() to free vectors before we
set vsi->netdev to NULL.
Akeem fixes an issue where setting the receive or transmit pause
parameter results in link loss on the interface. Fixed the spelling of
"Enabling" in error message.
Victor fixes a potential memory leak by also freeing the related VSI
contexts in the unload path.
Md Fahad fixes a flag during port VLAN insertion, which was not being
set properly.
Brett fixes a transmit timeout during stress caused by the hardware tail
and software tail being out of sync.
Miroslav Lichvar fixes the igb PHC timecounter update interval to be
sure the timecounter is updated in time.
Chinh fixes the req_speeds variable to be u16 instead of u8 so that it
can handle all the link speeds.
Jake fixes i40e to add back the missing feature flags, which was causing
IP-in-IP offloads to be reported as not supported.
The following are changes since commit 042cb56478152b31c50bea8a784fc826891eb38e:
net: phy: Allow BCM54616S PHY to setup internal TX/RX clock delay
and are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue 100GbE
Akeem G Abodunrin (1):
ice: Fix dead device link issue with flow control
Anirudh Venkataramanan (4):
ice: Set carrier state and start/stop queues in rebuild
ice: Check for reset in progress during remove
ice: Remove duplicate addition of VLANs in replay path
ice: Fix typo in error message
Brett Creeley (2):
ice: Fix tx_timeout in PF driver
ice: Fix the bytecount sent to netdev_tx_sent_queue
Chinh T Cao (1):
ice: Change req_speeds to be u16
Dave Ertman (1):
ice: Fix napi delete calls for remove
Jacob Keller (2):
i40e: restore NETIF_F_GSO_IPXIP[46] to netdev features
i40e: enable NETIF_F_NTUPLE and NETIF_F_HW_TC at driver load
Md Fahad Iqbal Polash (1):
ice: Fix flags for port VLAN
Miroslav Lichvar (1):
igb: shorten maximum PHC timecounter update interval
Victor Raj (1):
ice: Free VSI contexts during for unload
drivers/net/ethernet/intel/i40e/i40e_main.c | 8 +-
drivers/net/ethernet/intel/ice/ice.h | 4 +-
drivers/net/ethernet/intel/ice/ice_common.c | 3 +
drivers/net/ethernet/intel/ice/ice_ethtool.c | 7 +-
.../net/ethernet/intel/ice/ice_hw_autogen.h | 2 +
drivers/net/ethernet/intel/ice/ice_lib.c | 3 +-
drivers/net/ethernet/intel/ice/ice_main.c | 86 +++++++++++--------
drivers/net/ethernet/intel/ice/ice_switch.c | 12 +++
drivers/net/ethernet/intel/ice/ice_switch.h | 2 +
drivers/net/ethernet/intel/ice/ice_txrx.c | 11 +--
drivers/net/ethernet/intel/ice/ice_txrx.h | 17 +++-
drivers/net/ethernet/intel/ice/ice_type.h | 2 +-
.../net/ethernet/intel/ice/ice_virtchnl_pf.c | 4 +-
drivers/net/ethernet/intel/igb/igb_ptp.c | 12 +--
14 files changed, 113 insertions(+), 60 deletions(-)
--
2.19.1
* [net 02/14] ice: Check for reset in progress during remove
From: Jeff Kirsher @ 2018-11-07 19:16 UTC (permalink / raw)
To: davem
Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann, Dave Ertman,
Jeff Kirsher
In-Reply-To: <20181107191631.5072-1-jeffrey.t.kirsher@intel.com>
From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
The remove path does not currently check to see if a
reset is in progress before proceeding. This can cause
a resource collision resulting in various types of errors.
Check for reset in progress and wait for a reasonable
amount of time before allowing the remove to progress.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
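
Note on the wait budget: with ICE_MAX_RESET_WAIT at 20 and msleep(100) per iteration, the remove path waits for roughly 2 seconds at most before proceeding anyway (msleep() may sleep somewhat longer than requested). The bounded-poll pattern can be sketched in userspace C as follows. All names here are hypothetical, and the pause is passed in as a callback so the sketch stays independent of any sleep primitive.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RESET_WAIT 20	/* attempts, mirroring ICE_MAX_RESET_WAIT */

/* Poll busy() up to MAX_RESET_WAIT times, pausing between polls.
 * Returns the number of pauses taken. The caller carries on regardless
 * once the budget is spent, as the patch does.
 */
int wait_for_reset(bool (*busy)(void *), void (*pause_fn)(void), void *ctx)
{
	int i;

	for (i = 0; i < MAX_RESET_WAIT; i++) {
		if (!busy(ctx))
			break;
		pause_fn();
	}

	return i;
}

/* Example predicate: reports "busy" for the next *ctx polls. */
bool busy_countdown(void *ctx)
{
	int *remaining = ctx;

	if (*remaining <= 0)
		return false;
	(*remaining)--;
	return true;
}

/* Example pause hook that does nothing; real code would sleep here. */
void no_pause(void)
{
}
```

A reset that clears after three polls costs three pauses. One that never clears costs exactly MAX_RESET_WAIT pauses and then falls through.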
drivers/net/ethernet/intel/ice/ice.h | 2 ++
drivers/net/ethernet/intel/ice/ice_main.c | 6 ++++++
2 files changed, 8 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 4c4b5717a627..e5b37fa60884 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -76,6 +76,8 @@ extern const char ice_drv_ver[];
#define ICE_MIN_INTR_PER_VF (ICE_MIN_QS_PER_VF + 1)
#define ICE_DFLT_INTR_PER_VF (ICE_DFLT_QS_PER_VF + 1)
+#define ICE_MAX_RESET_WAIT 20
+
#define ICE_VSIQF_HKEY_ARRAY_SIZE ((VSIQF_HKEY_MAX_INDEX + 1) * 4)
#define ICE_DFLT_NETIF_M (NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_LINK)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 6d31ffb64940..aee22f11a41a 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2182,6 +2182,12 @@ static void ice_remove(struct pci_dev *pdev)
if (!pf)
return;
+ for (i = 0; i < ICE_MAX_RESET_WAIT; i++) {
+ if (!ice_is_reset_in_progress(pf->state))
+ break;
+ msleep(100);
+ }
+
set_bit(__ICE_DOWN, pf->state);
ice_service_task_stop(pf);
--
2.19.1
* [net 01/14] ice: Set carrier state and start/stop queues in rebuild
From: Jeff Kirsher @ 2018-11-07 19:16 UTC (permalink / raw)
To: davem; +Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181107191631.5072-1-jeffrey.t.kirsher@intel.com>
From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Set the carrier state post rebuild by querying the link status. Also
start/stop queues based on link status.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ice/ice_main.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 05993451147a..6d31ffb64940 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3296,7 +3296,7 @@ static void ice_rebuild(struct ice_pf *pf)
struct device *dev = &pf->pdev->dev;
struct ice_hw *hw = &pf->hw;
enum ice_status ret;
- int err;
+ int err, i;
if (test_bit(__ICE_DOWN, pf->state))
goto clear_recovery;
@@ -3370,6 +3370,22 @@ static void ice_rebuild(struct ice_pf *pf)
}
ice_reset_all_vfs(pf, true);
+
+ for (i = 0; i < pf->num_alloc_vsi; i++) {
+ bool link_up;
+
+ if (!pf->vsi[i] || pf->vsi[i]->type != ICE_VSI_PF)
+ continue;
+ ice_get_link_status(pf->vsi[i]->port_info, &link_up);
+ if (link_up) {
+ netif_carrier_on(pf->vsi[i]->netdev);
+ netif_tx_wake_all_queues(pf->vsi[i]->netdev);
+ } else {
+ netif_carrier_off(pf->vsi[i]->netdev);
+ netif_tx_stop_all_queues(pf->vsi[i]->netdev);
+ }
+ }
+
/* if we get here, reset flow is successful */
clear_bit(__ICE_RESET_FAILED, pf->state);
return;
--
2.19.1
* [net 03/14] ice: Fix dead device link issue with flow control
From: Jeff Kirsher @ 2018-11-07 19:16 UTC (permalink / raw)
To: davem
Cc: Akeem G Abodunrin, netdev, nhorman, sassmann,
Anirudh Venkataramanan, Jeff Kirsher
In-Reply-To: <20181107191631.5072-1-jeffrey.t.kirsher@intel.com>
From: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Setting Rx or Tx pause parameter currently results in link loss on the
interface, requiring the platform/host to be cold power cycled. Fix it.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ice/ice_ethtool.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 96923580f2a6..648acdb4c644 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -1517,10 +1517,15 @@ ice_set_pauseparam(struct net_device *netdev, struct ethtool_pauseparam *pause)
}
if (!test_bit(__ICE_DOWN, pf->state)) {
- /* Give it a little more time to try to come back */
+ /* Give it a little more time to try to come back. If still
+ * down, restart autoneg link or reinitialize the interface.
+ */
msleep(75);
if (!test_bit(__ICE_DOWN, pf->state))
return ice_nway_reset(netdev);
+
+ ice_down(vsi);
+ ice_up(vsi);
}
return err;
--
2.19.1
* [net 05/14] ice: Remove duplicate addition of VLANs in replay path
From: Jeff Kirsher @ 2018-11-07 19:16 UTC (permalink / raw)
To: davem; +Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181107191631.5072-1-jeffrey.t.kirsher@intel.com>
From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
ice_restore_vlan and active_vlans were originally put in place to
reprogram VLAN filters in the replay path. This is now done as part
of the much broader VSI rebuild/replay framework. So remove both
ice_restore_vlan and active_vlans.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ice/ice.h | 1 -
drivers/net/ethernet/intel/ice/ice_main.c | 42 +++----------------
.../net/ethernet/intel/ice/ice_virtchnl_pf.c | 2 -
3 files changed, 6 insertions(+), 39 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index e5b37fa60884..1639e955f158 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -191,7 +191,6 @@ struct ice_vsi {
u64 tx_linearize;
DECLARE_BITMAP(state, __ICE_STATE_NBITS);
DECLARE_BITMAP(flags, ICE_VSI_FLAG_NBITS);
- unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
unsigned int current_netdev_flags;
u32 tx_restart;
u32 tx_busy;
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index aee22f11a41a..338abb1b9233 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -1622,7 +1622,6 @@ static int ice_vlan_rx_add_vid(struct net_device *netdev,
{
struct ice_netdev_priv *np = netdev_priv(netdev);
struct ice_vsi *vsi = np->vsi;
- int ret;
if (vid >= VLAN_N_VID) {
netdev_err(netdev, "VLAN id requested %d is out of range %d\n",
@@ -1635,7 +1634,8 @@ static int ice_vlan_rx_add_vid(struct net_device *netdev,
/* Enable VLAN pruning when VLAN 0 is added */
if (unlikely(!vid)) {
- ret = ice_cfg_vlan_pruning(vsi, true);
+ int ret = ice_cfg_vlan_pruning(vsi, true);
+
if (ret)
return ret;
}
@@ -1644,12 +1644,7 @@ static int ice_vlan_rx_add_vid(struct net_device *netdev,
* needed to continue allowing all untagged packets since VLAN prune
* list is applied to all packets by the switch
*/
- ret = ice_vsi_add_vlan(vsi, vid);
-
- if (!ret)
- set_bit(vid, vsi->active_vlans);
-
- return ret;
+ return ice_vsi_add_vlan(vsi, vid);
}
/**
@@ -1677,8 +1672,6 @@ static int ice_vlan_rx_kill_vid(struct net_device *netdev,
if (status)
return status;
- clear_bit(vid, vsi->active_vlans);
-
/* Disable VLAN pruning when VLAN 0 is removed */
if (unlikely(!vid))
status = ice_cfg_vlan_pruning(vsi, false);
@@ -2515,31 +2508,6 @@ static int ice_vsi_vlan_setup(struct ice_vsi *vsi)
return ret;
}
-/**
- * ice_restore_vlan - Reinstate VLANs when vsi/netdev comes back up
- * @vsi: the VSI being brought back up
- */
-static int ice_restore_vlan(struct ice_vsi *vsi)
-{
- int err;
- u16 vid;
-
- if (!vsi->netdev)
- return -EINVAL;
-
- err = ice_vsi_vlan_setup(vsi);
- if (err)
- return err;
-
- for_each_set_bit(vid, vsi->active_vlans, VLAN_N_VID) {
- err = ice_vlan_rx_add_vid(vsi->netdev, htons(ETH_P_8021Q), vid);
- if (err)
- break;
- }
-
- return err;
-}
-
/**
* ice_vsi_cfg - Setup the VSI
* @vsi: the VSI being configured
@@ -2552,7 +2520,9 @@ static int ice_vsi_cfg(struct ice_vsi *vsi)
if (vsi->netdev) {
ice_set_rx_mode(vsi->netdev);
- err = ice_restore_vlan(vsi);
+
+ err = ice_vsi_vlan_setup(vsi);
+
if (err)
return err;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 45f10f8f01dc..9576b958622b 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -2171,7 +2171,6 @@ static int ice_vc_process_vlan_msg(struct ice_vf *vf, u8 *msg, bool add_v)
if (!ice_vsi_add_vlan(vsi, vid)) {
vf->num_vlan++;
- set_bit(vid, vsi->active_vlans);
/* Enable VLAN pruning when VLAN 0 is added */
if (unlikely(!vid))
@@ -2190,7 +2189,6 @@ static int ice_vc_process_vlan_msg(struct ice_vf *vf, u8 *msg, bool add_v)
*/
if (!ice_vsi_kill_vlan(vsi, vid)) {
vf->num_vlan--;
- clear_bit(vid, vsi->active_vlans);
/* Disable VLAN pruning when removing VLAN 0 */
if (unlikely(!vid))
--
2.19.1
* [net 06/14] ice: Fix flags for port VLAN
From: Jeff Kirsher @ 2018-11-07 19:16 UTC (permalink / raw)
To: davem
Cc: Md Fahad Iqbal Polash, netdev, nhorman, sassmann,
Anirudh Venkataramanan, Jeff Kirsher
In-Reply-To: <20181107191631.5072-1-jeffrey.t.kirsher@intel.com>
From: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com>
According to the spec, whenever the insert PVID field is set, the VLAN
driver insertion mode should be set to 01b, which isn't done currently.
Fix it.
Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 9576b958622b..e71065f9d391 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -348,7 +348,7 @@ static int ice_vsi_set_pvid(struct ice_vsi *vsi, u16 vid)
struct ice_vsi_ctx ctxt = { 0 };
enum ice_status status;
- ctxt.info.vlan_flags = ICE_AQ_VSI_VLAN_MODE_TAGGED |
+ ctxt.info.vlan_flags = ICE_AQ_VSI_VLAN_MODE_UNTAGGED |
ICE_AQ_VSI_PVLAN_INSERT_PVID |
ICE_AQ_VSI_VLAN_EMOD_STR;
ctxt.info.pvid = cpu_to_le16(vid);
--
2.19.1
* [net 07/14] ice: Fix typo in error message
From: Jeff Kirsher @ 2018-11-07 19:16 UTC (permalink / raw)
To: davem
Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann,
Akeem G Abodunrin, Jeff Kirsher
In-Reply-To: <20181107191631.5072-1-jeffrey.t.kirsher@intel.com>
From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Print should say "Enabling" instead of "Enaabling"
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ice/ice_lib.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 5bacad01f0c9..c604a44c8cfb 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -1997,7 +1997,7 @@ int ice_cfg_vlan_pruning(struct ice_vsi *vsi, bool ena)
status = ice_update_vsi(&vsi->back->hw, vsi->idx, ctxt, NULL);
if (status) {
netdev_err(vsi->netdev, "%sabling VLAN pruning on VSI handle: %d, VSI HW ID: %d failed, err = %d, aq_err = %d\n",
- ena ? "Ena" : "Dis", vsi->idx, vsi->vsi_num, status,
+ ena ? "En" : "Dis", vsi->idx, vsi->vsi_num, status,
vsi->back->hw.adminq.sq_last_status);
goto err_out;
}
--
2.19.1
* [net 04/14] ice: Free VSI contexts during for unload
From: Jeff Kirsher @ 2018-11-07 19:16 UTC (permalink / raw)
To: davem
Cc: Victor Raj, netdev, nhorman, sassmann, Anirudh Venkataramanan,
Jeff Kirsher
In-Reply-To: <20181107191631.5072-1-jeffrey.t.kirsher@intel.com>
From: Victor Raj <victor.raj@intel.com>
In the unload path, all VSIs are freed. Also free the related VSI
contexts to prevent memory leaks.
Signed-off-by: Victor Raj <victor.raj@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
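
The "clear if not already cleared" approach relies on the per-handle clear being safe to call on an empty slot, so the unload path can simply walk every handle. A minimal userspace sketch of that pattern follows. The table layout, MAX_CTX and the function names are hypothetical and only stand in for the driver's VSI context list and ICE_MAX_VSI.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CTX 8	/* hypothetical; stands in for ICE_MAX_VSI */

struct ctx_table {
	void *ctx[MAX_CTX];	/* NULL means the slot is already clear */
};

/* Free one slot if it is populated; harmless on an empty slot because
 * free(NULL) is a no-op.
 */
void clear_ctx(struct ctx_table *t, unsigned int handle)
{
	if (handle >= MAX_CTX)
		return;

	free(t->ctx[handle]);
	t->ctx[handle] = NULL;
}

/* Walk every handle so nothing allocated earlier is leaked on unload. */
void clear_all_ctx(struct ctx_table *t)
{
	for (unsigned int i = 0; i < MAX_CTX; i++)
		clear_ctx(t, i);
}
```

Because each slot is reset to NULL after being freed, calling clear_all_ctx() a second time is harmless, which is what lets the deinit path call it unconditionally.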
drivers/net/ethernet/intel/ice/ice_common.c | 3 +++
drivers/net/ethernet/intel/ice/ice_switch.c | 12 ++++++++++++
drivers/net/ethernet/intel/ice/ice_switch.h | 2 ++
3 files changed, 17 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 8cd6a2401fd9..554fd707a6d6 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -811,6 +811,9 @@ void ice_deinit_hw(struct ice_hw *hw)
/* Attempt to disable FW logging before shutting down control queues */
ice_cfg_fw_log(hw, false);
ice_shutdown_all_ctrlq(hw);
+
+ /* Clear VSI contexts if not already cleared */
+ ice_clear_all_vsi_ctx(hw);
}
/**
diff --git a/drivers/net/ethernet/intel/ice/ice_switch.c b/drivers/net/ethernet/intel/ice/ice_switch.c
index 33403f39f1b3..40c9c6558956 100644
--- a/drivers/net/ethernet/intel/ice/ice_switch.c
+++ b/drivers/net/ethernet/intel/ice/ice_switch.c
@@ -347,6 +347,18 @@ static void ice_clear_vsi_ctx(struct ice_hw *hw, u16 vsi_handle)
}
}
+/**
+ * ice_clear_all_vsi_ctx - clear all the VSI context entries
+ * @hw: pointer to the hw struct
+ */
+void ice_clear_all_vsi_ctx(struct ice_hw *hw)
+{
+ u16 i;
+
+ for (i = 0; i < ICE_MAX_VSI; i++)
+ ice_clear_vsi_ctx(hw, i);
+}
+
/**
* ice_add_vsi - add VSI context to the hardware and VSI handle list
* @hw: pointer to the hw struct
diff --git a/drivers/net/ethernet/intel/ice/ice_switch.h b/drivers/net/ethernet/intel/ice/ice_switch.h
index b88d96a1ef69..d5ef0bd58bf9 100644
--- a/drivers/net/ethernet/intel/ice/ice_switch.h
+++ b/drivers/net/ethernet/intel/ice/ice_switch.h
@@ -190,6 +190,8 @@ ice_update_vsi(struct ice_hw *hw, u16 vsi_handle, struct ice_vsi_ctx *vsi_ctx,
struct ice_sq_cd *cd);
bool ice_is_vsi_valid(struct ice_hw *hw, u16 vsi_handle);
struct ice_vsi_ctx *ice_get_vsi_ctx(struct ice_hw *hw, u16 vsi_handle);
+void ice_clear_all_vsi_ctx(struct ice_hw *hw);
+/* Switch config */
enum ice_status ice_get_initial_sw_cfg(struct ice_hw *hw);
/* Switch/bridge related commands */
--
2.19.1
* [net 08/14] ice: Fix napi delete calls for remove
From: Jeff Kirsher @ 2018-11-07 19:16 UTC (permalink / raw)
To: davem
Cc: Dave Ertman, netdev, nhorman, sassmann, Anirudh Venkataramanan,
Jeff Kirsher
In-Reply-To: <20181107191631.5072-1-jeffrey.t.kirsher@intel.com>
From: Dave Ertman <david.m.ertman@intel.com>
In the remove path, vsi->netdev is being set to NULL before the call
to free vectors. This causes the netif_napi_del call to never be made.
Add a call to ice_napi_del at the same location as the calls to
unregister_netdev, just prior to them. This mirrors, in reverse order,
the flow of the register and netif_napi_add calls.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ice/ice.h | 1 +
drivers/net/ethernet/intel/ice/ice_lib.c | 1 +
drivers/net/ethernet/intel/ice/ice_main.c | 2 +-
3 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 1639e955f158..b8548370f1c7 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -370,5 +370,6 @@ int ice_set_rss(struct ice_vsi *vsi, u8 *seed, u8 *lut, u16 lut_size);
int ice_get_rss(struct ice_vsi *vsi, u8 *seed, u8 *lut, u16 lut_size);
void ice_fill_rss_lut(u8 *lut, u16 rss_table_size, u16 rss_size);
void ice_print_link_msg(struct ice_vsi *vsi, bool isup);
+void ice_napi_del(struct ice_vsi *vsi);
#endif /* _ICE_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index c604a44c8cfb..1041fa2a7767 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2458,6 +2458,7 @@ int ice_vsi_release(struct ice_vsi *vsi)
* on this wq
*/
if (vsi->netdev && !ice_is_reset_in_progress(pf->state)) {
+ ice_napi_del(vsi);
unregister_netdev(vsi->netdev);
free_netdev(vsi->netdev);
vsi->netdev = NULL;
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 338abb1b9233..82f49dbd762c 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -1465,7 +1465,7 @@ static int ice_req_irq_msix_misc(struct ice_pf *pf)
* ice_napi_del - Remove NAPI handler for the VSI
* @vsi: VSI for which NAPI handler is to be removed
*/
-static void ice_napi_del(struct ice_vsi *vsi)
+void ice_napi_del(struct ice_vsi *vsi)
{
int v_idx;
--
2.19.1
^ permalink raw reply related
* [net 11/14] igb: shorten maximum PHC timecounter update interval
From: Jeff Kirsher @ 2018-11-07 19:16 UTC (permalink / raw)
To: davem
Cc: Miroslav Lichvar, netdev, nhorman, sassmann, Thomas Gleixner,
Jeff Kirsher
In-Reply-To: <20181107191631.5072-1-jeffrey.t.kirsher@intel.com>
From: Miroslav Lichvar <mlichvar@redhat.com>
The timecounter needs to be updated at least once per ~550 seconds in
order to prevent a 40-bit SYSTIM timestamp from being misinterpreted as
an old timestamp.
Since commit 500462a9de65 ("timers: Switch to a non-cascading wheel"),
scheduling of delayed work seems to be less accurate and a requested
delay of 540 seconds may actually be longer than 550 seconds. Also, the
PHC may be adjusted to run up to 6% faster than real time and the system
clock up to 10% slower. Shorten the delay to 360 seconds to be sure the
timecounter is updated in time.
This fixes an issue with HW timestamps on 82580/I350/I354 being off by
~1100 seconds for a few seconds every ~9 minutes.
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/igb/igb_ptp.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c
index 29ced6b74d36..2b95dc9c7a6a 100644
--- a/drivers/net/ethernet/intel/igb/igb_ptp.c
+++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
@@ -53,13 +53,15 @@
* 2^40 * 10^-9 / 60 = 18.3 minutes.
*
* SYSTIM is converted to real time using a timecounter. As
- * timecounter_cyc2time() allows old timestamps, the timecounter
- * needs to be updated at least once per half of the SYSTIM interval.
- * Scheduling of delayed work is not very accurate, so we aim for 8
- * minutes to be sure the actual interval is shorter than 9.16 minutes.
+ * timecounter_cyc2time() allows old timestamps, the timecounter needs
+ * to be updated at least once per half of the SYSTIM interval.
+ * Scheduling of delayed work is not very accurate, and also the NIC
+ * clock can be adjusted to run up to 6% faster and the system clock
+ * up to 10% slower, so we aim for 6 minutes to be sure the actual
+ * interval in the NIC time is shorter than 9.16 minutes.
*/
-#define IGB_SYSTIM_OVERFLOW_PERIOD (HZ * 60 * 8)
+#define IGB_SYSTIM_OVERFLOW_PERIOD (HZ * 60 * 6)
#define IGB_PTP_TX_TIMEOUT (HZ * 15)
#define INCPERIOD_82576 BIT(E1000_TIMINCA_16NS_SHIFT)
#define INCVALUE_82576_MASK GENMASK(E1000_TIMINCA_16NS_SHIFT - 1, 0)
--
2.19.1
^ permalink raw reply related
* [net 09/14] ice: Fix tx_timeout in PF driver
From: Jeff Kirsher @ 2018-11-07 19:16 UTC (permalink / raw)
To: davem
Cc: Brett Creeley, netdev, nhorman, sassmann, Anirudh Venkataramanan,
Jeff Kirsher
In-Reply-To: <20181107191631.5072-1-jeffrey.t.kirsher@intel.com>
From: Brett Creeley <brett.creeley@intel.com>
Prior to this commit the driver was running into tx_timeouts when a
queue was stressed enough. This was happening because the HW tail
and SW tail (NTU) were incorrectly out of sync. Consequently this was
causing the HW head to collide with the HW tail, which to the hardware
means that all descriptors posted for Tx have been processed.
Due to the Tx logic used in the driver SW tail and HW tail are allowed
to be out of sync. This is done as an optimization because it allows the
driver to write HW tail as infrequently as possible, while still
updating the SW tail index to keep track. However, there are situations
where this results in the tail never getting updated, resulting in Tx
timeouts.
Tx HW tail write condition:
if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more)
writel(sw_tail, tx_ring->tail);
An issue was found in the Tx logic that was causing the aforementioned
condition for updating HW tail to never happen, causing tx_timeouts.
In ice_xmit_frame_ring we calculate how many descriptors we need for the
Tx transaction based on the skb the kernel hands us. This is then passed
into ice_maybe_stop_tx along with some extra padding to determine if we
have enough descriptors available for this transaction. If we don't then
we return NETDEV_TX_BUSY to the stack, otherwise we move on and eventually
prepare the Tx descriptors accordingly in ice_tx_map and set
next_to_watch. In ice_tx_map we make another call to ice_maybe_stop_tx
with a value of MAX_SKB_FRAGS + 4. The key here is that this value is
possibly less than the value we sent in the first call to
ice_maybe_stop_tx in ice_xmit_frame_ring. Now, if the number of unused
descriptors is between MAX_SKB_FRAGS + 4 and the value used in the first
call to ice_maybe_stop_tx in ice_xmit_frame_ring then we do not update
the HW tail because of the "Tx HW tail write condition" above. This is
because ice_maybe_stop_tx returns success instead of calling
__ice_maybe_stop_tx and subsequently netif_stop_subqueue, which sets
the __QUEUE_STATE_DRV_XOFF bit. This bit is then checked in the
"Tx HW tail write condition" by calling netif_xmit_stopped, and HW tail
is only updated if the aforementioned bit is set.
In ice_clean_tx_irq, if next_to_watch is not NULL, we end up cleaning
the descriptors on which HW has set the DD bit, as long as we have
budget. The HW head will eventually run into the HW tail as a result of
the situation described in the paragraph above.
The next time through ice_xmit_frame_ring we make the initial call to
ice_maybe_stop_tx with another skb from the stack. This time we do not
have enough descriptors available and we return NETDEV_TX_BUSY to the
stack and end up setting next_to_watch to NULL.
This is where we are stuck. In ice_clean_tx_irq we never clean anything
because next_to_watch is always NULL and in ice_xmit_frame_ring we never
update HW tail because we already return NETDEV_TX_BUSY to the stack and
eventually we hit a tx_timeout.
This issue was fixed by making sure that the second call to
ice_maybe_stop_tx in ice_tx_map is passed a value that is >= the value
that was used on the initial call to ice_maybe_stop_tx in
ice_xmit_frame_ring. This was done by adding the following defines to
make the logic more clear and to reduce the chance of mucking this up
again:
ICE_CACHE_LINE_BYTES 64
ICE_DESCS_PER_CACHE_LINE (ICE_CACHE_LINE_BYTES / \
sizeof(struct ice_tx_desc))
ICE_DESCS_FOR_CTX_DESC 1
ICE_DESCS_FOR_SKB_DATA_PTR 1
The ICE_CACHE_LINE_BYTES being 64 is an assumption being made so we
don't have to figure this out on every pass through the Tx path. Instead
I added a sanity check in ice_probe to verify cache line size and print
a message if it's not 64 Bytes. This will make it easier to file issues
if they are seen when the cache line size is not 64 Bytes when reading
from the GLPCI_CNF2 register.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
.../net/ethernet/intel/ice/ice_hw_autogen.h | 2 ++
drivers/net/ethernet/intel/ice/ice_main.c | 18 ++++++++++++++++++
drivers/net/ethernet/intel/ice/ice_txrx.c | 9 +++++----
drivers/net/ethernet/intel/ice/ice_txrx.h | 17 +++++++++++++++--
4 files changed, 40 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index 5fdea6ec7675..596b9fb1c510 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -242,6 +242,8 @@
#define GLNVM_ULD 0x000B6008
#define GLNVM_ULD_CORER_DONE_M BIT(3)
#define GLNVM_ULD_GLOBR_DONE_M BIT(4)
+#define GLPCI_CNF2 0x000BE004
+#define GLPCI_CNF2_CACHELINE_SIZE_M BIT(1)
#define PF_FUNC_RID 0x0009E880
#define PF_FUNC_RID_FUNC_NUM_S 0
#define PF_FUNC_RID_FUNC_NUM_M ICE_M(0x7, 0)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 82f49dbd762c..333312a1d595 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -1994,6 +1994,22 @@ static int ice_init_interrupt_scheme(struct ice_pf *pf)
return 0;
}
+/**
+ * ice_verify_cacheline_size - verify driver's assumption of 64 Byte cache lines
+ * @pf: pointer to the PF structure
+ *
+ * There is no error returned here because the driver should be able to handle
+ * 128 Byte cache lines, so we only print a warning in case issues are seen,
+ * specifically with Tx.
+ */
+static void ice_verify_cacheline_size(struct ice_pf *pf)
+{
+ if (rd32(&pf->hw, GLPCI_CNF2) & GLPCI_CNF2_CACHELINE_SIZE_M)
+ dev_warn(&pf->pdev->dev,
+ "%d Byte cache line assumption is invalid, driver may have Tx timeouts!\n",
+ ICE_CACHE_LINE_BYTES);
+}
+
/**
* ice_probe - Device initialization routine
* @pdev: PCI device information struct
@@ -2144,6 +2160,8 @@ static int ice_probe(struct pci_dev *pdev,
/* since everything is good, start the service timer */
mod_timer(&pf->serv_tmr, round_jiffies(jiffies + pf->serv_tmr_period));
+ ice_verify_cacheline_size(pf);
+
return 0;
err_alloc_sw_unroll:
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 5dae968d853e..3387c67c848d 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1556,15 +1556,15 @@ int ice_tso(struct ice_tx_buf *first, struct ice_tx_offload_params *off)
* magnitude greater than our largest possible GSO size.
*
* This would then be implemented as:
- * return (((size >> 12) * 85) >> 8) + 1;
+ * return (((size >> 12) * 85) >> 8) + ICE_DESCS_FOR_SKB_DATA_PTR;
*
* Since multiplication and division are commutative, we can reorder
* operations into:
- * return ((size * 85) >> 20) + 1;
+ * return ((size * 85) >> 20) + ICE_DESCS_FOR_SKB_DATA_PTR;
*/
static unsigned int ice_txd_use_count(unsigned int size)
{
- return ((size * 85) >> 20) + 1;
+ return ((size * 85) >> 20) + ICE_DESCS_FOR_SKB_DATA_PTR;
}
/**
@@ -1706,7 +1706,8 @@ ice_xmit_frame_ring(struct sk_buff *skb, struct ice_ring *tx_ring)
* + 1 desc for context descriptor,
* otherwise try next time
*/
- if (ice_maybe_stop_tx(tx_ring, count + 4 + 1)) {
+ if (ice_maybe_stop_tx(tx_ring, count + ICE_DESCS_PER_CACHE_LINE +
+ ICE_DESCS_FOR_CTX_DESC)) {
tx_ring->tx_stats.tx_busy++;
return NETDEV_TX_BUSY;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index 1d0f58bd389b..75d0eaf6c9dd 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -22,8 +22,21 @@
#define ICE_RX_BUF_WRITE 16 /* Must be power of 2 */
#define ICE_MAX_TXQ_PER_TXQG 128
-/* Tx Descriptors needed, worst case */
-#define DESC_NEEDED (MAX_SKB_FRAGS + 4)
+/* We are assuming that the cache line is always 64 Bytes here for ice.
+ * In order to make sure that is a correct assumption there is a check in probe
+ * to print a warning if the read from GLPCI_CNF2 tells us that the cache line
+ * size is 128 bytes. We do it this way because we do not want to read the
+ * GLPCI_CNF2 register or a variable containing the value on every pass through
+ * the Tx path.
+ */
+#define ICE_CACHE_LINE_BYTES 64
+#define ICE_DESCS_PER_CACHE_LINE (ICE_CACHE_LINE_BYTES / \
+ sizeof(struct ice_tx_desc))
+#define ICE_DESCS_FOR_CTX_DESC 1
+#define ICE_DESCS_FOR_SKB_DATA_PTR 1
+/* Tx descriptors needed, worst case */
+#define DESC_NEEDED (MAX_SKB_FRAGS + ICE_DESCS_FOR_CTX_DESC + \
+ ICE_DESCS_PER_CACHE_LINE + ICE_DESCS_FOR_SKB_DATA_PTR)
#define ICE_DESC_UNUSED(R) \
((((R)->next_to_clean > (R)->next_to_use) ? 0 : (R)->count) + \
(R)->next_to_clean - (R)->next_to_use - 1)
--
2.19.1
^ permalink raw reply related
* [net 12/14] ice: Change req_speeds to be u16
From: Jeff Kirsher @ 2018-11-07 19:16 UTC (permalink / raw)
To: davem
Cc: Chinh T Cao, netdev, nhorman, sassmann, Anirudh Venkataramanan,
Jeff Kirsher
In-Reply-To: <20181107191631.5072-1-jeffrey.t.kirsher@intel.com>
From: Chinh T Cao <chinh.t.cao@intel.com>
Since the req_speeds field in struct ice_link_status is a u8,
req_speeds & ICE_AQ_LINK_SPEED_40GB always returns 0. This was caught
by a Coverity scan.
Fix this by changing req_speeds to be u16.
Reported-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ice/ice_type.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index 12f9432abf11..f4dbc81c1988 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -92,12 +92,12 @@ struct ice_link_status {
u64 phy_type_low;
u16 max_frame_size;
u16 link_speed;
+ u16 req_speeds;
u8 lse_ena; /* Link Status Event notification */
u8 link_info;
u8 an_info;
u8 ext_info;
u8 pacing;
- u8 req_speeds;
/* Refer to #define from module_type[ICE_MODULE_TYPE_TOTAL_BYTE] of
* ice_aqc_get_phy_caps structure
*/
--
2.19.1
^ permalink raw reply related
* [net 10/14] ice: Fix the bytecount sent to netdev_tx_sent_queue
From: Jeff Kirsher @ 2018-11-07 19:16 UTC (permalink / raw)
To: davem
Cc: Brett Creeley, netdev, nhorman, sassmann, Anirudh Venkataramanan,
Jeff Kirsher
In-Reply-To: <20181107191631.5072-1-jeffrey.t.kirsher@intel.com>
From: Brett Creeley <brett.creeley@intel.com>
Currently if the driver does a TSO offload the bytecount sent to
netdev_tx_sent_queue will be incorrect. This is because in ice_tso we
overwrite the initial value that we set in ice_tx_map. This creates a
mismatch between the Tx and Tx clean flow. In the Tx clean flow we
calculate the bytecount (called total_bytes) as we clean the
descriptors so the value used in the Tx clean path is correct. Fix this
by using += in ice_tso instead of =. This fixes the mismatch in
bytecount mentioned above.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ice/ice_txrx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 3387c67c848d..fe5bbabbb41e 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1520,7 +1520,7 @@ int ice_tso(struct ice_tx_buf *first, struct ice_tx_offload_params *off)
/* update gso_segs and bytecount */
first->gso_segs = skb_shinfo(skb)->gso_segs;
- first->bytecount = (first->gso_segs - 1) * off->header_len;
+ first->bytecount += (first->gso_segs - 1) * off->header_len;
cd_tso_len = skb->len - off->header_len;
cd_mss = skb_shinfo(skb)->gso_size;
--
2.19.1
^ permalink raw reply related
* [net 14/14] i40e: enable NETIF_F_NTUPLE and NETIF_F_HW_TC at driver load
From: Jeff Kirsher @ 2018-11-07 19:16 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181107191631.5072-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
The assignment of the feature flags NETIF_F_NTUPLE and NETIF_F_HW_TC
occurs prior to the initial setup of the local hw_features variable.
This means the features are set as user-changeable, but are not set in
the currently active feature list. This results in the features being
disabled at the driver's initial load.
Move the assignment after the initial assignment of hw_features, and
assign to the local variable. This ensures that NETIF_F_NTUPLE and
NETIF_F_HW_TC are marked as user-changeable, and also enables them by
default when the driver loads.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 3ff5ee49818b..21c2688d6308 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -12268,13 +12268,13 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
/* record features VLANs can make use of */
netdev->vlan_features |= hw_enc_features | NETIF_F_TSO_MANGLEID;
- if (!(pf->flags & I40E_FLAG_MFP_ENABLED))
- netdev->hw_features |= NETIF_F_NTUPLE | NETIF_F_HW_TC;
-
hw_features = hw_enc_features |
NETIF_F_HW_VLAN_CTAG_TX |
NETIF_F_HW_VLAN_CTAG_RX;
+ if (!(pf->flags & I40E_FLAG_MFP_ENABLED))
+ hw_features |= NETIF_F_NTUPLE | NETIF_F_HW_TC;
+
netdev->hw_features |= hw_features;
netdev->features |= hw_features | NETIF_F_HW_VLAN_CTAG_FILTER;
--
2.19.1
^ permalink raw reply related
* [net 13/14] i40e: restore NETIF_F_GSO_IPXIP[46] to netdev features
From: Jeff Kirsher @ 2018-11-07 19:16 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181107191631.5072-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
Since commit bacd75cfac8a ("i40e/i40evf: Add capability exchange for
outer checksum", 2017-04-06) the i40e driver has not reported support
for IP-in-IP offloads. This likely occurred due to a bad rebase, as the
commit extracts hw_enc_features into its own variable. As part of this
change, it dropped the NETIF_F_GSO_IPXIP flags from the
netdev->hw_enc_features. This was unfortunately not caught during code
review.
Fix this by adding back the missing feature flags.
For reference, NETIF_F_GSO_IPXIP4 was added in commit 7e13318daa4a
("net: define gso types for IPx over IPv4 and IPv6", 2016-05-20),
replacing NETIF_F_GSO_IPIP and NETIF_F_GSO_SIT.
NETIF_F_GSO_IPXIP6 was added in commit bf2d1df39502 ("intel: Add support
for IPv6 IP-in-IP offload", 2016-05-20).
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index bc71a21c1dc2..3ff5ee49818b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -12249,6 +12249,8 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
NETIF_F_GSO_GRE |
NETIF_F_GSO_GRE_CSUM |
NETIF_F_GSO_PARTIAL |
+ NETIF_F_GSO_IPXIP4 |
+ NETIF_F_GSO_IPXIP6 |
NETIF_F_GSO_UDP_TUNNEL |
NETIF_F_GSO_UDP_TUNNEL_CSUM |
NETIF_F_SCTP_CRC |
--
2.19.1
^ permalink raw reply related
* Re: [PATCH v4 1/3] net: emac: implement 802.1Q VLAN TX tagging support
From: David Miller @ 2018-11-07 19:23 UTC (permalink / raw)
To: chunkeey; +Cc: netdev, ivan, f.fainelli
In-Reply-To: <20181106.150954.2230387672447683694.davem@davemloft.net>
From: David Miller <davem@davemloft.net>
Date: Tue, 06 Nov 2018 15:09:54 -0800 (PST)
> From: Christian Lamparter <chunkeey@gmail.com>
> Date: Tue, 6 Nov 2018 23:27:49 +0100
>
>> @@ -1435,6 +1436,22 @@ static inline netdev_tx_t emac_xmit_finish(struct emac_instance *dev, int len)
>> return NETDEV_TX_OK;
>> }
>>
>> +static inline u16 emac_tx_vlan(struct emac_instance *dev, struct sk_buff *skb)
>> +{
>> + /* Handle VLAN TPID and TCI insert if this is a VLAN skb */
>> + if (emac_has_feature(dev, EMAC_FTR_HAS_VLAN_CTAG_TX) &&
>> + skb_vlan_tag_present(skb)) {
>> + struct emac_regs __iomem *p = dev->emacp;
>> +
>> + /* update the VLAN TCI */
>> + out_be32(&p->vtci, (u32)skb_vlan_tag_get(skb));
>
> Hmmm, how does this vtci register work?
I'm tossing your patches since you refuse to answer my question and
explain why this works properly.
Sorry.
^ permalink raw reply
* Re: [PATCH net-next 04/11] selftests: pmtu: Introduce tests for IPv4/IPv6 over VxLAN over IPv6
From: David Ahern @ 2018-11-07 19:28 UTC (permalink / raw)
To: Stefano Brivio, David S. Miller; +Cc: Sabrina Dubroca, Xin Long, netdev
In-Reply-To: <366b75ae560cc2d0b3a0f69b84d43b621c8fcce4.1541533786.git.sbrivio@redhat.com>
On 11/6/18 2:39 PM, Stefano Brivio wrote:
> Use a router between endpoints, implemented via namespaces, set a low MTU
> between router and destination endpoint, exceed it and check PMTU value in
> route exceptions.
>
> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
> This only introduces tests over VxLAN over IPv6 right now. I'll introduce
> tests over IPv4 (they can be added trivially) once DF configuration support
> is accepted into iproute2.
You can add them now, wrapped in a 'does ip support the df option'
check. That check is needed regardless of order (kernel vs iproute2).
^ permalink raw reply