Subject: [PATCH 0/2] l2 hardware accelerated macvlans
From:    John Fastabend
Date:    2013-11-04 17:15 UTC
To:      nhorman, alexander.h.duyck
Cc:      netdev, andy, davem, jeffrey.t.kirsher

This patch series adds support for offloading macvlan net_devices to the
hardware. With these patches, packets are pushed to the macvlan net_device
directly and do not pass through the lower dev.

The patches here have been through multiple iterations, each with a
slightly different focus. First I tried to push these as a new link type
called "VMDQ"; those patches are here:

    http://comments.gmane.org/gmane.linux.network/237617

Following that implementation I renamed the link type "VSI" and addressed
various comments. Finally, Neil Horman picked up the patches and integrated
the offload into the macvlan code itself:

    http://permalink.gmane.org/gmane.linux.network/285658

The attached series is a clean-up of his patches with a few fixes. I expect
Neil will add his Signed-off-by line, assuming I didn't mangle anything.

If folks find this series acceptable, there are a few items we can work on
next. First, with this series broadcast and multicast traffic uses the
hardware even for local delivery; it would be better (I think) to use the
software path for macvlan-to-macvlan traffic and save the PCIe bus. Second,
this series only allows layer 2 MAC forwarding, while some hardware
supports more interesting forwarding capabilities; integrating with OVS may
be useful here.

As always, any comments/feedback welcome. I'm going to continue testing
these on top of ixgbe, but I believe they are stable and wanted to get them
out to a wider audience. I've tested multiple offloaded macvlans with iperf
and netperf using multiple sessions of each and seen no issues. My basic
I/O test is below; I've also done link testing and other checks:

    #ip link add link eth2 numtxqueues 4 numrxqueues 4 txqueuelen 50 type macvlan
    #tc qdisc add dev macvlan0 root mq
    #iperf -c 10.0.0.1 -P 8 -t 5000 -i 10

Thanks,
John

---

John Fastabend (2):
      ixgbe: enable l2 forwarding acceleration for macvlans
      net: Add layer 2 hardware acceleration operations for macvlan devices

 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |   20 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |   12 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  465 +++++++++++++++++++++----
 drivers/net/macvlan.c                         |   36 ++
 include/linux/if_macvlan.h                    |    1
 include/linux/netdev_features.h               |    2
 include/linux/netdevice.h                     |   36 ++
 include/uapi/linux/if.h                       |    1
 net/core/dev.c                                |   18 +
 net/core/ethtool.c                            |    1
 net/sched/sch_generic.c                       |    2
 11 files changed, 506 insertions(+), 88 deletions(-)

--
Signature
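For anyone trying the series: patch 1 exposes the offload as a regular
ethtool feature flag on the lower device (the "l2-fwd-offload" string is
added in net/core/ethtool.c below), so a quick sanity check is possible
before creating macvlans. A sketch, where the eth2 device name and the
reported state are illustrative:

    #ethtool -k eth2 | grep l2-fwd-offload
    l2-fwd-offload: on

Because patch 2 also puts the bit in hw_features, it should be toggleable
with "ethtool -K eth2 l2-fwd-offload off" to force the software path.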
Subject: [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
From:    John Fastabend
Date:    2013-11-04 17:15 UTC
To:      nhorman, alexander.h.duyck
Cc:      netdev, andy, davem, jeffrey.t.kirsher

Add an operations structure that allows a network interface to export the
fact that it supports packet forwarding in hardware between physical
interfaces and other mac layer devices assigned to it (such as macvlans).
This operations structure can be used by virtual mac devices to bypass
software switching, so that forwarding can be done in hardware more
efficiently.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/macvlan.c           |   36 +++++++++++++++++++++++++++++++++++-
 include/linux/if_macvlan.h      |    1 +
 include/linux/netdev_features.h |    2 ++
 include/linux/netdevice.h       |   36 +++++++++++++++++++++++++++++++++++-
 include/uapi/linux/if.h         |    1 +
 net/core/dev.c                  |   18 +++++++++++++-----
 net/core/ethtool.c              |    1 +
 net/sched/sch_generic.c         |    2 +-
 8 files changed, 89 insertions(+), 8 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index cc9845e..eb68dd0 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -297,7 +297,13 @@ netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
 	int ret;
 	const struct macvlan_dev *vlan = netdev_priv(dev);
 
-	ret = macvlan_queue_xmit(skb, dev);
+	if (vlan->fwd_priv) {
+		skb->dev = vlan->lowerdev;
+		ret = dev_hard_start_xmit(skb, skb->dev, NULL, vlan->fwd_priv);
+	} else {
+		ret = macvlan_queue_xmit(skb, dev);
+	}
+
 	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
 		struct macvlan_pcpu_stats *pcpu_stats;
 
@@ -347,6 +353,21 @@ static int macvlan_open(struct net_device *dev)
 		goto hash_add;
 	}
 
+	if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD) {
+		vlan->fwd_priv =
+		      lowerdev->netdev_ops->ndo_dfwd_add_station(lowerdev, dev);
+
+		/* If we get a NULL pointer back, or if we get an error
+		 * then we should just fall through to the non accelerated path
+		 */
+		if (IS_ERR_OR_NULL(vlan->fwd_priv)) {
+			vlan->fwd_priv = NULL;
+		} else {
+			dev->features &= ~NETIF_F_LLTX;
+			return 0;
+		}
+	}
+
 	err = -EBUSY;
 	if (macvlan_addr_busy(vlan->port, dev->dev_addr))
 		goto out;
@@ -367,6 +388,11 @@ hash_add:
 del_unicast:
 	dev_uc_del(lowerdev, dev->dev_addr);
 out:
+	if (vlan->fwd_priv) {
+		lowerdev->netdev_ops->ndo_dfwd_del_station(lowerdev,
+							   vlan->fwd_priv);
+		vlan->fwd_priv = NULL;
+	}
 	return err;
 }
 
@@ -375,6 +401,13 @@ static int macvlan_stop(struct net_device *dev)
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct net_device *lowerdev = vlan->lowerdev;
 
+	if (vlan->fwd_priv) {
+		lowerdev->netdev_ops->ndo_dfwd_del_station(lowerdev,
+							   vlan->fwd_priv);
+		vlan->fwd_priv = NULL;
+		return 0;
+	}
+
 	dev_uc_unsync(lowerdev, dev);
 	dev_mc_unsync(lowerdev, dev);
 
@@ -833,6 +866,7 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 	if (err < 0)
 		goto destroy_port;
 
+	dev->priv_flags |= IFF_MACVLAN;
 	err = netdev_upper_dev_link(lowerdev, dev);
 	if (err)
 		goto destroy_port;
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index ddd33fd..c270285 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -61,6 +61,7 @@ struct macvlan_dev {
 	struct hlist_node	hlist;
 	struct macvlan_port	*port;
 	struct net_device	*lowerdev;
+	void			*fwd_priv;
 	struct macvlan_pcpu_stats __percpu *pcpu_stats;
 
 	DECLARE_BITMAP(mc_filter, MACVLAN_MC_FILTER_SZ);
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index b05a4b5..1005ebf 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -62,6 +62,7 @@ enum {
 	NETIF_F_HW_VLAN_STAG_TX_BIT,	/* Transmit VLAN STAG HW acceleration */
 	NETIF_F_HW_VLAN_STAG_RX_BIT,	/* Receive VLAN STAG HW acceleration */
 	NETIF_F_HW_VLAN_STAG_FILTER_BIT,/* Receive filtering on VLAN STAGs */
+	NETIF_F_HW_L2FW_DOFFLOAD_BIT,	/* Allow L2 Forwarding in Hardware */
 
 	/*
 	 * Add your fresh new feature above and remember to update
@@ -116,6 +117,7 @@ enum {
 #define NETIF_F_HW_VLAN_STAG_FILTER __NETIF_F(HW_VLAN_STAG_FILTER)
 #define NETIF_F_HW_VLAN_STAG_RX	__NETIF_F(HW_VLAN_STAG_RX)
 #define NETIF_F_HW_VLAN_STAG_TX	__NETIF_F(HW_VLAN_STAG_TX)
+#define NETIF_F_HW_L2FW_DOFFLOAD __NETIF_F(HW_L2FW_DOFFLOAD)
 
 /* Features valid for ethtool to change */
 /* = all defined minus driver/device-class-related */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e6353ca..d62c130 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -962,6 +962,25 @@ struct netdev_phys_port_id {
 *	Called by vxlan to notify the driver about a UDP port and socket
 *	address family that vxlan is not listening to anymore. The operation
 *	is protected by the vxlan_net->sock_lock.
+ *
+ * void* (*ndo_dfwd_add_station)(struct net_device *pdev,
+ *				 struct net_device *dev)
+ *	Called by upper layer devices to accelerate switching or other
+ *	station functionality into hardware. 'pdev' is the lowerdev to
+ *	use for the offload and 'dev' is the net device that will back
+ *	the offload. Returns a pointer to the private structure the
+ *	upper layer will maintain.
+ * void (*ndo_dfwd_del_station)(struct net_device *pdev, void *priv)
+ *	Called by upper layer device to delete the station created
+ *	by 'ndo_dfwd_add_station'. 'pdev' is the net device backing
+ *	the station and 'priv' is the structure returned by the add
+ *	operation.
+ * netdev_tx_t (*ndo_dfwd_start_xmit)(struct sk_buff *skb,
+ *				      struct net_device *dev,
+ *				      void *priv);
+ *	Callback to use for xmit over the accelerated station. This
+ *	is used in place of ndo_start_xmit on accelerated net
+ *	devices.
 */
struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1098,6 +1117,15 @@ struct net_device_ops {
 	void			(*ndo_del_vxlan_port)(struct  net_device *dev,
 						      sa_family_t sa_family,
 						      __be16 port);
+
+	void*			(*ndo_dfwd_add_station)(struct net_device *pdev,
+							struct net_device *dev);
+	void			(*ndo_dfwd_del_station)(struct net_device *pdev,
+							void *priv);
+
+	netdev_tx_t		(*ndo_dfwd_start_xmit) (struct sk_buff *skb,
+							struct net_device *dev,
+							void *priv);
 };
 
 /*
@@ -1195,6 +1223,7 @@ struct net_device {
 	/* Management operations */
 	const struct net_device_ops *netdev_ops;
 	const struct ethtool_ops *ethtool_ops;
+	const struct forwarding_accel_ops *fwd_ops;
 
 	/* Hardware header description */
 	const struct header_ops *header_ops;
@@ -2388,7 +2417,7 @@ int dev_change_carrier(struct net_device *, bool new_carrier);
 int dev_get_phys_port_id(struct net_device *dev,
 			 struct netdev_phys_port_id *ppid);
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
-			struct netdev_queue *txq);
+			struct netdev_queue *txq, void *accel_priv);
 int dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
 
 extern int		netdev_budget;
@@ -2967,6 +2996,11 @@ static inline void netif_set_gso_max_size(struct net_device *dev,
 	dev->gso_max_size = size;
 }
 
+static inline bool netif_is_macvlan(struct net_device *dev)
+{
+	return dev->priv_flags & IFF_MACVLAN;
+}
+
 static inline bool netif_is_bond_master(struct net_device *dev)
 {
 	return dev->flags & IFF_MASTER && dev->priv_flags & IFF_BONDING;
diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
index 1ec407b..d758163 100644
--- a/include/uapi/linux/if.h
+++ b/include/uapi/linux/if.h
@@ -83,6 +83,7 @@
 #define IFF_SUPP_NOFCS	0x80000		/* device supports sending custom FCS */
 #define IFF_LIVE_ADDR_CHANGE 0x100000	/* device supports hardware address
 					 * change when it's running */
+#define IFF_MACVLAN 0x200000		/* Macvlan device */
 
 #define IF_GET_IFACE	0x0001		/* for querying only */
diff --git a/net/core/dev.c b/net/core/dev.c
index 0e61365..8ffc52e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2538,7 +2538,7 @@ static inline int skb_needs_linearize(struct sk_buff *skb,
 }
 
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
-			struct netdev_queue *txq)
+			struct netdev_queue *txq, void *accel_priv)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 	int rc = NETDEV_TX_OK;
@@ -2604,9 +2604,13 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			dev_queue_xmit_nit(skb, dev);
 
 		skb_len = skb->len;
-		rc = ops->ndo_start_xmit(skb, dev);
+		if (accel_priv)
+			rc = ops->ndo_dfwd_start_xmit(skb, dev, accel_priv);
+		else
+			rc = ops->ndo_start_xmit(skb, dev);
+
 		trace_net_dev_xmit(skb, rc, dev, skb_len);
-		if (rc == NETDEV_TX_OK)
+		if (rc == NETDEV_TX_OK && txq)
 			txq_trans_update(txq);
 		return rc;
 	}
@@ -2622,7 +2626,10 @@ gso:
 			dev_queue_xmit_nit(nskb, dev);
 
 		skb_len = nskb->len;
-		rc = ops->ndo_start_xmit(nskb, dev);
+		if (accel_priv)
+			rc = ops->ndo_dfwd_start_xmit(nskb, dev, accel_priv);
+		else
+			rc = ops->ndo_start_xmit(nskb, dev);
 		trace_net_dev_xmit(nskb, rc, dev, skb_len);
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)
@@ -2647,6 +2654,7 @@ out_kfree_skb:
 out:
 	return rc;
 }
+EXPORT_SYMBOL_GPL(dev_hard_start_xmit);
 
 static void qdisc_pkt_len_init(struct sk_buff *skb)
 {
@@ -2854,7 +2862,7 @@ int dev_queue_xmit(struct sk_buff *skb)
 
 			if (!netif_xmit_stopped(txq)) {
 				__this_cpu_inc(xmit_recursion);
-				rc = dev_hard_start_xmit(skb, dev, txq);
+				rc = dev_hard_start_xmit(skb, dev, txq, NULL);
 				__this_cpu_dec(xmit_recursion);
 				if (dev_xmit_complete(rc)) {
 					HARD_TX_UNLOCK(dev, txq);
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 8629898..30071de 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -96,6 +96,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
 	[NETIF_F_LOOPBACK_BIT] =         "loopback",
 	[NETIF_F_RXFCS_BIT] =            "rx-fcs",
 	[NETIF_F_RXALL_BIT] =            "rx-all",
+	[NETIF_F_HW_L2FW_DOFFLOAD_BIT] = "l2-fwd-offload",
 };
 
 static int ethtool_get_features(struct net_device *dev, void __user *useraddr)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 7fc899a..922a094 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -126,7 +126,7 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 
 	HARD_TX_LOCK(dev, txq, smp_processor_id());
 	if (!netif_xmit_frozen_or_stopped(txq))
-		ret = dev_hard_start_xmit(skb, dev, txq);
+		ret = dev_hard_start_xmit(skb, dev, txq, NULL);
 
 	HARD_TX_UNLOCK(dev, txq);
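For driver writers, the contract above is small: advertise
NETIF_F_HW_L2FW_DOFFLOAD, hand back a private cookie from
ndo_dfwd_add_station() (or NULL/ERR_PTR to make macvlan fall back to
software), and transmit via ndo_dfwd_start_xmit() when dev_hard_start_xmit()
is called with a non-NULL accel_priv. A minimal sketch for a hypothetical
driver "foo" follows; the foo_* names and the foo_fwd_priv layout are
invented for illustration, only the hook signatures come from this patch:

#include <linux/etherdevice.h>
#include <linux/netdevice.h>
#include <linux/slab.h>

struct foo_adapter;	/* the driver's private adapter struct (hypothetical) */

static netdev_tx_t foo_xmit_frame_ring(struct sk_buff *skb,
				       struct foo_adapter *adapter,
				       unsigned int queue);

struct foo_fwd_priv {
	struct net_device *vdev;	/* macvlan backing this station */
	unsigned int base_queue;	/* first hardware queue of the pool */
};

static void *foo_dfwd_add_station(struct net_device *pdev,
				  struct net_device *vdev)
{
	struct foo_fwd_priv *fwd;

	fwd = kzalloc(sizeof(*fwd), GFP_KERNEL);
	if (!fwd)
		return ERR_PTR(-ENOMEM);	/* macvlan falls back to software */

	fwd->vdev = vdev;
	/* a real driver would reserve a queue pool and program a MAC
	 * filter for vdev->dev_addr here (cf. ixgbe_fwd_add in patch 2)
	 */
	fwd->base_queue = 0;
	return fwd;
}

static void foo_dfwd_del_station(struct net_device *pdev, void *priv)
{
	/* a real driver would release the queues and MAC filter here */
	kfree(priv);
}

static netdev_tx_t foo_dfwd_start_xmit(struct sk_buff *skb,
				       struct net_device *pdev, void *priv)
{
	struct foo_fwd_priv *fwd = priv;
	unsigned int queue = fwd->base_queue + skb_get_queue_mapping(skb);

	/* hand the skb to the ring reserved for this station, mirroring
	 * what ixgbe_fwd_xmit() does in patch 2
	 */
	return foo_xmit_frame_ring(skb, netdev_priv(pdev), queue);
}

static const struct net_device_ops foo_netdev_ops = {
	/* ... the driver's existing ndo_* callbacks ... */
	.ndo_dfwd_add_station	= foo_dfwd_add_station,
	.ndo_dfwd_del_station	= foo_dfwd_del_station,
	.ndo_dfwd_start_xmit	= foo_dfwd_start_xmit,
};

The driver would also set NETIF_F_HW_L2FW_DOFFLOAD in netdev->features (and
hw_features) at probe time, as the ixgbe patch below does.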
Subject: [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans
From:    John Fastabend
Date:    2013-11-04 17:15 UTC
To:      nhorman, alexander.h.duyck
Cc:      netdev, andy, davem, jeffrey.t.kirsher

Now that the l2 acceleration ops are in place from the prior patch, enable
ixgbe to take advantage of these operations. Allow it to allocate queues
for a macvlan so that when we transmit a frame, we can do the switching in
hardware inside the ixgbe card, rather than in software.

For now this patch limits the hardware to 8 offloaded macvlan ports. A
follow-on patch will remove this limitation, but to simplify
review/validation of the new macvlan offload ops we leave it at 8 for now.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |   20 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |   12 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  465 +++++++++++++++++++++----
 3 files changed, 417 insertions(+), 80 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 0914914..236d4fb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -223,6 +223,15 @@ enum ixgbe_ring_state_t {
 	__IXGBE_RX_FCOE,
 };
 
+struct ixgbe_fwd_adapter {
+	unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
+	struct net_device *netdev;
+	struct ixgbe_adapter *real_adapter;
+	unsigned int tx_base_queue;
+	unsigned int rx_base_queue;
+	int pool;
+};
+
 #define check_for_tx_hang(ring) \
 	test_bit(__IXGBE_TX_DETECT_HANG, &(ring)->state)
 #define set_check_for_tx_hang(ring) \
@@ -240,6 +249,7 @@ struct ixgbe_ring {
 	struct ixgbe_q_vector *q_vector; /* backpointer to host q_vector */
 	struct net_device *netdev;	/* netdev ring belongs to */
 	struct device *dev;		/* device for DMA mapping */
+	struct ixgbe_fwd_adapter *l2_accel_priv;
 	void *desc;			/* descriptor ring memory */
 	union {
 		struct ixgbe_tx_buffer *tx_buffer_info;
@@ -292,11 +302,15 @@ enum ixgbe_ring_f_enum {
 };
 
 #define IXGBE_MAX_RSS_INDICES  16
-#define IXGBE_MAX_VMDQ_INDICES 64
+#define IXGBE_MAX_VMDQ_INDICES 8
 #define IXGBE_MAX_FDIR_INDICES 63	/* based on q_vector limit */
 #define IXGBE_MAX_FCOE_INDICES  8
 #define MAX_RX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1)
 #define MAX_TX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1)
+#define IXGBE_MAX_L2A_QUEUES 4
+#define IXGBE_BAD_L2A_QUEUE 3
+
 struct ixgbe_ring_feature {
 	u16 limit;	/* upper limit on feature indices */
 	u16 indices;	/* current value of indices */
@@ -766,6 +780,7 @@ struct ixgbe_adapter {
 #endif /*CONFIG_DEBUG_FS*/
 
 	u8 default_up;
+	unsigned long fwd_bitmask; /* Bitmask indicating in use pools */
 };
 
 struct ixgbe_fdir_filter {
@@ -939,4 +954,7 @@ void ixgbe_ptp_check_pps_event(struct ixgbe_adapter *adapter, u32 eicr);
 void ixgbe_sriov_reinit(struct ixgbe_adapter *adapter);
 #endif
 
+netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
+				  struct ixgbe_adapter *adapter,
+				  struct ixgbe_ring *tx_ring);
 #endif /* _IXGBE_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index 90b4e10..5eae76a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -852,7 +852,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 
 		/* apply Tx specific ring traits */
 		ring->count = adapter->tx_ring_count;
-		ring->queue_index = txr_idx;
+		if (adapter->num_rx_pools > 1)
+			ring->queue_index =
+				txr_idx % adapter->num_rx_queues_per_pool;
+		else
+			ring->queue_index = txr_idx;
 
 		/* assign ring to adapter */
 		adapter->tx_ring[txr_idx] = ring;
@@ -895,7 +899,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 #endif /* IXGBE_FCOE */
 		/* apply Rx specific ring traits */
 		ring->count = adapter->rx_ring_count;
-		ring->queue_index = rxr_idx;
+		if (adapter->num_rx_pools > 1)
+			ring->queue_index =
+				rxr_idx % adapter->num_rx_queues_per_pool;
+		else
+			ring->queue_index = rxr_idx;
 
 		/* assign ring to adapter */
 		adapter->rx_ring[rxr_idx] = ring;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 5191b3c..fe04e86 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -44,6 +44,7 @@
 #include <linux/ethtool.h>
 #include <linux/if.h>
 #include <linux/if_vlan.h>
+#include <linux/if_macvlan.h>
 #include <linux/if_bridge.h>
 #include <linux/prefetch.h>
 #include <scsi/fc/fc_fcoe.h>
@@ -870,11 +871,18 @@ static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring)
 
 static u64 ixgbe_get_tx_pending(struct ixgbe_ring *ring)
 {
-	struct ixgbe_adapter *adapter = netdev_priv(ring->netdev);
-	struct ixgbe_hw *hw = &adapter->hw;
+	struct ixgbe_adapter *adapter;
+	struct ixgbe_hw *hw;
+	u32 head, tail;
+
+	if (ring->l2_accel_priv)
+		adapter = ring->l2_accel_priv->real_adapter;
+	else
+		adapter = netdev_priv(ring->netdev);
 
-	u32 head = IXGBE_READ_REG(hw, IXGBE_TDH(ring->reg_idx));
-	u32 tail = IXGBE_READ_REG(hw, IXGBE_TDT(ring->reg_idx));
+	hw = &adapter->hw;
+	head = IXGBE_READ_REG(hw, IXGBE_TDH(ring->reg_idx));
+	tail = IXGBE_READ_REG(hw, IXGBE_TDT(ring->reg_idx));
 
 	if (head != tail)
 		return (head < tail) ?
 			tail - head : (tail + ring->count - head);
@@ -3003,7 +3011,7 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 		struct ixgbe_q_vector *q_vector = ring->q_vector;
 
 		if (q_vector)
-			netif_set_xps_queue(adapter->netdev,
+			netif_set_xps_queue(ring->netdev,
 					    &q_vector->affinity_mask,
 					    ring->queue_index);
 	}
@@ -3393,7 +3401,7 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
 	int rss_i = adapter->ring_feature[RING_F_RSS].indices;
-	int p;
+	u16 pool;
 
 	/* PSRTYPE must be initialized in non 82598 adapters */
 	u32 psrtype = IXGBE_PSRTYPE_TCPHDR |
@@ -3410,9 +3418,8 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 	else if (rss_i > 1)
 		psrtype |= 1 << 29;
 
-	for (p = 0; p < adapter->num_rx_pools; p++)
-		IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(p)),
-				psrtype);
+	for_each_set_bit(pool, &adapter->fwd_bitmask, 32)
+		IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(pool)), psrtype);
 }
 
 static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
@@ -3681,7 +3688,11 @@ static void ixgbe_vlan_strip_disable(struct ixgbe_adapter *adapter)
 	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
 		for (i = 0; i < adapter->num_rx_queues; i++) {
-			j = adapter->rx_ring[i]->reg_idx;
+			struct ixgbe_ring *ring = adapter->rx_ring[i];
+
+			if (ring->l2_accel_priv)
+				continue;
+			j = ring->reg_idx;
 			vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j));
 			vlnctrl &= ~IXGBE_RXDCTL_VME;
 			IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(j), vlnctrl);
@@ -3711,7 +3722,11 @@ static void ixgbe_vlan_strip_enable(struct ixgbe_adapter *adapter)
 	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
 		for (i = 0; i < adapter->num_rx_queues; i++) {
-			j = adapter->rx_ring[i]->reg_idx;
+			struct ixgbe_ring *ring = adapter->rx_ring[i];
+
+			if (ring->l2_accel_priv)
+				continue;
+			j = ring->reg_idx;
 			vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j));
 			vlnctrl |= IXGBE_RXDCTL_VME;
 			IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(j), vlnctrl);
@@ -3748,7 +3763,7 @@ static int ixgbe_write_uc_addr_list(struct net_device *netdev)
 	unsigned int rar_entries = hw->mac.num_rar_entries - 1;
 	int count = 0;
 
-	/* In SR-IOV mode significantly less RAR entries are available */
+	/* In SR-IOV/VMDQ modes significantly less RAR entries are available */
 	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
 		rar_entries = IXGBE_MAX_PF_MACVLANS - 1;
 
@@ -4113,6 +4128,195 @@ static void ixgbe_fdir_filter_restore(struct ixgbe_adapter *adapter)
 	spin_unlock(&adapter->fdir_perfect_lock);
 }
 
+static void ixgbe_macvlan_set_rx_mode(struct net_device *dev, unsigned int pool,
+				      struct ixgbe_adapter *adapter)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	u32 vmolr;
+
+	/* No unicast promiscuous support for VMDQ devices. */
+	vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(pool));
+	vmolr |= (IXGBE_VMOLR_ROMPE | IXGBE_VMOLR_BAM | IXGBE_VMOLR_AUPE);
+
+	/* clear the affected bit */
+	vmolr &= ~IXGBE_VMOLR_MPE;
+
+	if (dev->flags & IFF_ALLMULTI) {
+		vmolr |= IXGBE_VMOLR_MPE;
+	} else {
+		vmolr |= IXGBE_VMOLR_ROMPE;
+		hw->mac.ops.update_mc_addr_list(hw, dev);
+	}
+	ixgbe_write_uc_addr_list(adapter->netdev);
+	IXGBE_WRITE_REG(hw, IXGBE_VMOLR(pool), vmolr);
+}
+
+static void ixgbe_add_mac_filter(struct ixgbe_adapter *adapter,
+				 u8 *addr, u16 pool)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	unsigned int entry;
+
+	entry = hw->mac.num_rar_entries - pool;
+	hw->mac.ops.set_rar(hw, entry, addr, VMDQ_P(pool), IXGBE_RAH_AV);
+}
+
+static void ixgbe_fwd_psrtype(struct ixgbe_fwd_adapter *vadapter)
+{
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	int rss_i = vadapter->netdev->real_num_rx_queues;
+	struct ixgbe_hw *hw = &adapter->hw;
+	u16 pool = vadapter->pool;
+	u32 psrtype = IXGBE_PSRTYPE_TCPHDR |
+		      IXGBE_PSRTYPE_UDPHDR |
+		      IXGBE_PSRTYPE_IPV4HDR |
+		      IXGBE_PSRTYPE_L2HDR |
+		      IXGBE_PSRTYPE_IPV6HDR;
+
+	if (hw->mac.type == ixgbe_mac_82598EB)
+		return;
+
+	if (rss_i > 3)
+		psrtype |= 2 << 29;
+	else if (rss_i > 1)
+		psrtype |= 1 << 29;
+
+	IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(pool)), psrtype);
+}
+
+/**
+ * ixgbe_clean_rx_ring - Free Rx Buffers per Queue
+ * @rx_ring: ring to free buffers from
+ **/
+static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
+{
+	struct device *dev = rx_ring->dev;
+	unsigned long size;
+	u16 i;
+
+	/* ring already cleared, nothing to do */
+	if (!rx_ring->rx_buffer_info)
+		return;
+
+	/* Free all the Rx ring sk_buffs */
+	for (i = 0; i < rx_ring->count; i++) {
+		struct ixgbe_rx_buffer *rx_buffer;
+
+		rx_buffer = &rx_ring->rx_buffer_info[i];
+		if (rx_buffer->skb) {
+			struct sk_buff *skb = rx_buffer->skb;
+			if (IXGBE_CB(skb)->page_released) {
+				dma_unmap_page(dev,
+					       IXGBE_CB(skb)->dma,
+					       ixgbe_rx_bufsz(rx_ring),
+					       DMA_FROM_DEVICE);
+				IXGBE_CB(skb)->page_released = false;
+			}
+			dev_kfree_skb(skb);
+		}
+		rx_buffer->skb = NULL;
+		if (rx_buffer->dma)
+			dma_unmap_page(dev, rx_buffer->dma,
+				       ixgbe_rx_pg_size(rx_ring),
+				       DMA_FROM_DEVICE);
+		rx_buffer->dma = 0;
+		if (rx_buffer->page)
+			__free_pages(rx_buffer->page,
+				     ixgbe_rx_pg_order(rx_ring));
+		rx_buffer->page = NULL;
+	}
+
+	size = sizeof(struct ixgbe_rx_buffer) * rx_ring->count;
+	memset(rx_ring->rx_buffer_info, 0, size);
+
+	/* Zero out the descriptor ring */
+	memset(rx_ring->desc, 0, rx_ring->size);
+
+	rx_ring->next_to_alloc = 0;
+	rx_ring->next_to_clean = 0;
+	rx_ring->next_to_use = 0;
+}
+
+static void ixgbe_disable_fwd_ring(struct ixgbe_fwd_adapter *vadapter,
+				   struct ixgbe_ring *rx_ring)
+{
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	int index = rx_ring->queue_index + vadapter->rx_base_queue;
+
+	/* shutdown specific queue receive and wait for dma to settle */
+	ixgbe_disable_rx_queue(adapter, rx_ring);
+	usleep_range(10000, 20000);
+	ixgbe_irq_disable_queues(adapter, ((u64)1 << index));
+	ixgbe_clean_rx_ring(rx_ring);
+	rx_ring->l2_accel_priv = NULL;
+}
+
+static int ixgbe_fwd_ring_up(struct net_device *vdev,
+			     struct ixgbe_fwd_adapter *accel)
+{
+	struct ixgbe_adapter *adapter = accel->real_adapter;
+	unsigned int rxbase;
+	unsigned int txbase;
+	int i, vmdq_pool, baseq;
+
+	/* Configure VSI adapter structure */
+	vmdq_pool = VMDQ_P(accel->pool);
+	baseq = vmdq_pool * adapter->num_rx_queues_per_pool;
+
+	netdev_dbg(vdev, "pool %i:%i queues %i:%i VSI bitmask %lx\n",
+		   accel->pool, adapter->num_rx_pools,
+		   baseq, baseq + adapter->num_rx_queues_per_pool,
+		   adapter->fwd_bitmask);
+
+	accel->netdev = vdev;
+	accel->rx_base_queue = rxbase = baseq;
+	accel->tx_base_queue = txbase = baseq;
+
+	for (i = 0; i < vdev->num_rx_queues; i++)
+		ixgbe_disable_fwd_ring(accel, adapter->rx_ring[rxbase + i]);
+
+	for (i = 0; i < vdev->num_rx_queues; i++) {
+		adapter->rx_ring[rxbase + i]->netdev = vdev;
+		adapter->rx_ring[rxbase + i]->l2_accel_priv = accel;
+		ixgbe_configure_rx_ring(adapter, adapter->rx_ring[rxbase + i]);
+	}
+
+	for (i = 0; i < vdev->num_tx_queues; i++) {
+		adapter->tx_ring[txbase + i]->netdev = vdev;
+		adapter->tx_ring[txbase + i]->l2_accel_priv = accel;
+	}
+
+	if (is_valid_ether_addr(vdev->dev_addr))
+		ixgbe_add_mac_filter(adapter, vdev->dev_addr, accel->pool);
+
+	ixgbe_fwd_psrtype(accel);
+	ixgbe_macvlan_set_rx_mode(vdev, accel->pool, adapter);
+	return 0;
+}
+
+static void ixgbe_configure_vsi(struct ixgbe_adapter *adapter)
+{
+	struct net_device *upper;
+	struct list_head *iter;
+	int err;
+
+	netdev_for_each_all_upper_dev_rcu(adapter->netdev, upper, iter) {
+		if (netif_is_macvlan(upper)) {
+			struct macvlan_dev *vlan = netdev_priv(upper);
+			struct ixgbe_fwd_adapter *vadapter = vlan->fwd_priv;
+
+			if (vlan->fwd_priv) {
+				err = ixgbe_fwd_ring_up(upper, vadapter);
+				if (err)
+					continue;
+				ixgbe_macvlan_set_rx_mode(upper,
+							  vadapter->pool,
+							  adapter);
+			}
+		}
+	}
+}
+
 static void ixgbe_configure(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
@@ -4164,6 +4368,7 @@ static void ixgbe_configure(struct ixgbe_adapter *adapter)
 #endif /* IXGBE_FCOE */
 	ixgbe_configure_tx(adapter);
 	ixgbe_configure_rx(adapter);
+	ixgbe_configure_vsi(adapter);
 }
 
 static inline bool ixgbe_is_sfp(struct ixgbe_hw *hw)
@@ -4317,6 +4522,8 @@ static void ixgbe_setup_gpie(struct ixgbe_adapter *adapter)
 static void ixgbe_up_complete(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct net_device *upper;
+	struct list_head *iter;
 	int err;
 	u32 ctrl_ext;
 
@@ -4360,6 +4567,16 @@ static void ixgbe_up_complete(struct ixgbe_adapter *adapter)
 	/* enable transmits */
 	netif_tx_start_all_queues(adapter->netdev);
 
+	/* enable any upper devices */
+	netdev_for_each_all_upper_dev_rcu(adapter->netdev, upper, iter) {
+		if (netif_is_macvlan(upper)) {
+			struct macvlan_dev *vlan = netdev_priv(upper);
+
+			if (vlan->fwd_priv)
+				netif_tx_start_all_queues(upper);
+		}
+	}
+
 	/* bring the link up in the watchdog, this could race with our first
 	 * link up interrupt but shouldn't be a problem */
 	adapter->flags |= IXGBE_FLAG_NEED_LINK_UPDATE;
@@ -4451,59 +4668,6 @@ void ixgbe_reset(struct ixgbe_adapter *adapter)
 }
 
 /**
- * ixgbe_clean_rx_ring - Free Rx Buffers per Queue
- * @rx_ring: ring to free buffers from
- **/
-static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
-{
-	struct device *dev = rx_ring->dev;
-	unsigned long size;
-	u16 i;
-
-	/* ring already cleared, nothing to do */
-	if (!rx_ring->rx_buffer_info)
-		return;
-
-	/* Free all the Rx ring sk_buffs */
-	for (i = 0; i < rx_ring->count; i++) {
-		struct ixgbe_rx_buffer *rx_buffer;
-
-		rx_buffer = &rx_ring->rx_buffer_info[i];
-		if (rx_buffer->skb) {
-			struct sk_buff *skb = rx_buffer->skb;
-			if (IXGBE_CB(skb)->page_released) {
-				dma_unmap_page(dev,
-					       IXGBE_CB(skb)->dma,
-					       ixgbe_rx_bufsz(rx_ring),
-					       DMA_FROM_DEVICE);
-				IXGBE_CB(skb)->page_released = false;
-			}
-			dev_kfree_skb(skb);
-		}
-		rx_buffer->skb = NULL;
-		if (rx_buffer->dma)
-			dma_unmap_page(dev, rx_buffer->dma,
-				       ixgbe_rx_pg_size(rx_ring),
-				       DMA_FROM_DEVICE);
-		rx_buffer->dma = 0;
-		if (rx_buffer->page)
-			__free_pages(rx_buffer->page,
-				     ixgbe_rx_pg_order(rx_ring));
-		rx_buffer->page = NULL;
-	}
-
-	size = sizeof(struct ixgbe_rx_buffer) * rx_ring->count;
-	memset(rx_ring->rx_buffer_info, 0, size);
-
-	/* Zero out the descriptor ring */
-	memset(rx_ring->desc, 0, rx_ring->size);
-
-	rx_ring->next_to_alloc = 0;
-	rx_ring->next_to_clean = 0;
-	rx_ring->next_to_use = 0;
-}
-
-/**
  * ixgbe_clean_tx_ring - Free Tx Buffers
  * @tx_ring: ring to be cleaned
  **/
@@ -4580,6 +4744,8 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct net_device *upper;
+	struct list_head *iter;
 	u32 rxctrl;
 	int i;
 
@@ -4603,6 +4769,19 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 	netif_carrier_off(netdev);
 	netif_tx_disable(netdev);
 
+	/* disable any upper devices */
+	netdev_for_each_all_upper_dev_rcu(adapter->netdev, upper, iter) {
+		if (netif_is_macvlan(upper)) {
+			struct macvlan_dev *vlan = netdev_priv(upper);
+
+			if (vlan->fwd_priv) {
+				netif_tx_stop_all_queues(upper);
+				netif_carrier_off(upper);
+				netif_tx_disable(upper);
+			}
+		}
+	}
+
 	ixgbe_irq_disable(adapter);
 
 	ixgbe_napi_disable_all(adapter);
@@ -4833,6 +5012,8 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter)
 		return -EIO;
 	}
 
+	/* PF holds first pool slot */
+	set_bit(0, &adapter->fwd_bitmask);
 	set_bit(__IXGBE_DOWN, &adapter->state);
 
 	return 0;
@@ -5138,7 +5319,7 @@ static int ixgbe_change_mtu(struct net_device *netdev, int new_mtu)
 static int ixgbe_open(struct net_device *netdev)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
-	int err;
+	int err, queues;
 
 	/* disallow open during test */
 	if (test_bit(__IXGBE_TESTING, &adapter->state))
@@ -5163,16 +5344,22 @@ static int ixgbe_open(struct net_device *netdev)
 		goto err_req_irq;
 
 	/* Notify the stack of the actual queue counts. */
-	err = netif_set_real_num_tx_queues(netdev,
-					   adapter->num_rx_pools > 1 ? 1 :
-					   adapter->num_tx_queues);
+	if (adapter->num_rx_pools > 1 &&
+	    adapter->num_tx_queues > IXGBE_MAX_L2A_QUEUES)
+		queues = IXGBE_MAX_L2A_QUEUES;
+	else
+		queues = adapter->num_tx_queues;
+
+	err = netif_set_real_num_tx_queues(netdev, queues);
 	if (err)
 		goto err_set_queues;
 
-	err = netif_set_real_num_rx_queues(netdev,
-					   adapter->num_rx_pools > 1 ? 1 :
-					   adapter->num_rx_queues);
+	if (adapter->num_rx_pools > 1 &&
+	    adapter->num_rx_queues > IXGBE_MAX_L2A_QUEUES)
+		queues = IXGBE_MAX_L2A_QUEUES;
+	else
+		queues = adapter->num_rx_queues;
+	err = netif_set_real_num_rx_queues(netdev, queues);
 	if (err)
 		goto err_set_queues;
 
@@ -6762,8 +6949,9 @@ out_drop:
 	return NETDEV_TX_OK;
 }
 
-static netdev_tx_t ixgbe_xmit_frame(struct sk_buff *skb,
-				    struct net_device *netdev)
+static netdev_tx_t __ixgbe_xmit_frame(struct sk_buff *skb,
+				      struct net_device *netdev,
+				      struct ixgbe_ring *ring)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_ring *tx_ring;
@@ -6779,10 +6967,17 @@ static netdev_tx_t ixgbe_xmit_frame(struct sk_buff *skb,
 		skb_set_tail_pointer(skb, 17);
 	}
 
-	tx_ring = adapter->tx_ring[skb->queue_mapping];
+	tx_ring = ring ? ring : adapter->tx_ring[skb->queue_mapping];
+
 	return ixgbe_xmit_frame_ring(skb, adapter, tx_ring);
 }
 
+static netdev_tx_t ixgbe_xmit_frame(struct sk_buff *skb,
+				    struct net_device *netdev)
+{
+	return __ixgbe_xmit_frame(skb, netdev, NULL);
+}
+
 /**
  * ixgbe_set_mac - Change the Ethernet Address of the NIC
  * @netdev: network interface device structure
@@ -7300,6 +7495,118 @@ static int ixgbe_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
 	return ndo_dflt_bridge_getlink(skb, pid, seq, dev, mode);
 }
 
+int ixgbe_fwd_ring_down(struct net_device *vdev,
+			struct ixgbe_fwd_adapter *accel)
+{
+	struct ixgbe_adapter *adapter = accel->real_adapter;
+	unsigned int rxbase = accel->rx_base_queue;
+	unsigned int txbase = accel->tx_base_queue;
+	int i;
+
+	netif_tx_stop_all_queues(vdev);
+
+	for (i = 0; i < vdev->num_rx_queues; i++) {
+		ixgbe_disable_fwd_ring(accel, adapter->rx_ring[rxbase + i]);
+		adapter->rx_ring[rxbase + i]->netdev = adapter->netdev;
+	}
+
+	for (i = 0; i < vdev->num_tx_queues; i++) {
+		adapter->tx_ring[txbase + i]->l2_accel_priv = NULL;
+		adapter->tx_ring[txbase + i]->netdev = adapter->netdev;
+	}
+
+	return 0;
+}
+
+static void *ixgbe_fwd_add(struct net_device *pdev, struct net_device *vdev)
+{
+	struct ixgbe_fwd_adapter *fwd_adapter = NULL;
+	struct ixgbe_adapter *adapter = netdev_priv(pdev);
+	int pool, err;
+
+	/* Check for hardware restriction on number of rx/tx queues */
+	if (vdev->num_rx_queues != vdev->num_tx_queues ||
+	    vdev->num_tx_queues > IXGBE_MAX_L2A_QUEUES ||
+	    vdev->num_tx_queues == IXGBE_BAD_L2A_QUEUE) {
+		netdev_info(pdev,
+			    "%s: Supports RX/TX Queue counts 1,2, and 4\n",
+			    pdev->name);
+		return ERR_PTR(-EINVAL);
+	}
+
+	if (adapter->num_rx_pools >= IXGBE_MAX_VMDQ_INDICES)
+		return ERR_PTR(-EBUSY);
+
+	fwd_adapter = kcalloc(1, sizeof(struct ixgbe_fwd_adapter), GFP_KERNEL);
+	if (!fwd_adapter)
+		return ERR_PTR(-ENOMEM);
+
+	pool = find_first_zero_bit(&adapter->fwd_bitmask, 32);
+	adapter->num_rx_pools++;
+	set_bit(pool, &adapter->fwd_bitmask);
+
+	/* Enable VMDq flag so device will be set in VM mode */
+	adapter->flags |= IXGBE_FLAG_VMDQ_ENABLED | IXGBE_FLAG_SRIOV_ENABLED;
+	adapter->ring_feature[RING_F_VMDQ].limit = adapter->num_rx_pools;
+	adapter->ring_feature[RING_F_VMDQ].offset = 0;
+	adapter->ring_feature[RING_F_RSS].limit = IXGBE_MAX_L2A_QUEUES;
+
+	/* Force reinit of ring allocation with VMDQ enabled */
+	ixgbe_setup_tc(pdev, netdev_get_num_tc(pdev));
+	fwd_adapter->pool = pool;
+	fwd_adapter->real_adapter = adapter;
+	err = ixgbe_fwd_ring_up(vdev, fwd_adapter);
+	if (err)
+		goto fwd_add_err;
+
+	err = netif_set_real_num_tx_queues(vdev, vdev->num_tx_queues);
+	if (err)
+		goto fwd_queue_err;
+	err = netif_set_real_num_rx_queues(vdev, vdev->num_rx_queues);
+	if (err)
+		goto fwd_queue_err;
+
+	netif_tx_start_all_queues(vdev);
+
+	return fwd_adapter;
+fwd_queue_err:
+	ixgbe_fwd_ring_down(vdev, fwd_adapter);
+fwd_add_err:
+	kfree(fwd_adapter);
+	return ERR_PTR(err);
+}
+
+static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
+{
+	struct ixgbe_fwd_adapter *fwd_adapter = priv;
+	struct ixgbe_adapter *adapter = fwd_adapter->real_adapter;
+
+	clear_bit(fwd_adapter->pool, &adapter->fwd_bitmask);
+	adapter->num_rx_pools--;
+
+	ixgbe_fwd_ring_down(fwd_adapter->netdev, fwd_adapter);
+
+	netdev_dbg(pdev, "pool %i:%i queues %i:%i VSI bitmask %lx\n",
+		   fwd_adapter->pool, adapter->num_rx_pools,
+		   fwd_adapter->rx_base_queue,
+		   fwd_adapter->rx_base_queue +
		   adapter->num_rx_queues_per_pool,
+		   adapter->fwd_bitmask);
+}
+
+static netdev_tx_t ixgbe_fwd_xmit(struct sk_buff *skb,
+				  struct net_device *dev,
+				  void *priv)
+{
+	struct ixgbe_fwd_adapter *fwd_adapter = priv;
+	unsigned int queue;
+	struct ixgbe_ring *tx_ring;
+
+	queue = skb->queue_mapping + fwd_adapter->tx_base_queue;
+	tx_ring = fwd_adapter->real_adapter->tx_ring[queue];
+
+	return __ixgbe_xmit_frame(skb, dev, tx_ring);
+}
+
 static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_open		= ixgbe_open,
 	.ndo_stop		= ixgbe_close,
@@ -7344,6 +7651,9 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_fdb_add		= ixgbe_ndo_fdb_add,
 	.ndo_bridge_setlink	= ixgbe_ndo_bridge_setlink,
 	.ndo_bridge_getlink	= ixgbe_ndo_bridge_getlink,
+	.ndo_dfwd_add_station	= ixgbe_fwd_add,
+	.ndo_dfwd_del_station	= ixgbe_fwd_del,
+	.ndo_dfwd_start_xmit	= ixgbe_fwd_xmit,
};
 
 /**
@@ -7645,7 +7955,8 @@ skip_sriov:
 			   NETIF_F_TSO |
 			   NETIF_F_TSO6 |
 			   NETIF_F_RXHASH |
-			   NETIF_F_RXCSUM;
+			   NETIF_F_RXCSUM |
+			   NETIF_F_HW_L2FW_DOFFLOAD;
 
 	netdev->hw_features = netdev->features;
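Given the queue-count check in ixgbe_fwd_add() above (RX/TX counts of 1, 2
or 4; 3 is IXGBE_BAD_L2A_QUEUE) and the pool cap from
IXGBE_MAX_VMDQ_INDICES, creating an offloaded macvlan on an ixgbe port
would look roughly like this; the device names are illustrative:

    # 1, 2 or 4 queue pairs are accepted for the hardware path
    #ip link add link eth2 name macvlan0 numtxqueues 2 numrxqueues 2 type macvlan
    #ip link set macvlan0 up

    # asymmetric or unsupported counts (e.g. 3) are rejected by
    # ixgbe_fwd_add() with -EINVAL at open time; the macvlan still comes
    # up, just on the software path
    #ip link add link eth2 name macvlan1 numtxqueues 3 numrxqueues 3 type macvlan
    #ip link set macvlan1 up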
Subject: [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration
From:    Neil Horman
Date:    2013-09-25 20:16 UTC
To:      netdev

John, et al.-
	As promised, here's my (very rough) first pass at an alternate
proposal for what you're trying to do with virtual station interfaces here.
It's completely untested, but it builds, and I'll be trying to run it over
the next few days (though I'm sure I got part of the hardware manipulation
wrong). I wanted to post it early though, so you could get a look at it and
see what you did and didn't like about it.

Some notes:

1) As discussed, the major effort here is to tie macvlans in with l2
forwarding acceleration, rather than creating a new vsi link type. That
should make management easier for admins (be it via ovs or some other
mechanism). It basically exposes a bit less to the user, which I think is
good.

2) I've separated out the l2 forwarding acceleration operations from the
net_device_ops structure. I'm not sure I like that yet, but I'm kind of
leaning that way. Since a limited set of hardware supports forwarding
acceleration, it makes for a nice easy way to group functionality without
polluting the net_device_ops structure. It also lets us group similar
functions together nicely (I can see a future l3_accel_ops structure if we
can do l3 flows in hardware). Anywho, it's a divergence from what we've
been doing, so I thought I would call attention to it (a sketch of the
grouped structure follows below).

3) I've included an l2_accel_xmit method in the accel_ops structure for
fast path forwarding, but I'm not sure I like that. It seems we should be
able to use ndo_start_xmit and key off some data to recognize that we
should be doing hardware forwarding. I'm not quite sure how to do that yet
though. Something to think about.

4) I've borrowed heavily from your vsi work, of course, just to get this
building. I think there's probably a lot of consolidation that can be done
in the code that I added to ixgbe_main.c to make it smaller. Again, I just
wanted to post this so you could speak up if you thought this was all crap
before I went too far down the rabbit hole.

Regards
Neil
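The grouped structure Neil describes in note (2) survives in the final
series only as the unused fwd_ops pointer that patch 1 adds to struct
net_device. A minimal sketch of what such a structure might have looked
like; the field names are invented for illustration, only the hook
signatures match the ops that eventually landed in net_device_ops:

#include <linux/netdevice.h>

/* Illustrative only: the posted series keeps the three hooks in
 * net_device_ops and leaves the fwd_ops pointer unused.
 */
struct forwarding_accel_ops {
	u32 flags;	/* capability bits, cf. the v2 change notes below */
	void *	(*fwd_add_station)(struct net_device *pdev,
				   struct net_device *dev);
	void	(*fwd_del_station)(struct net_device *pdev, void *priv);
	/* the fast-path xmit Neil is unsure about in note (3) */
	netdev_tx_t (*fwd_start_xmit)(struct sk_buff *skb,
				      struct net_device *dev, void *priv);
};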
Subject: [RFC PATCH 0/2 v2] net: alternate proposal for using macvlans with forwarding acceleration
From:    Neil Horman
Date:    2013-10-04 20:10 UTC
To:      netdev
Cc:      John Fastabend, Andy Gospodarek, David Miller

Hey all-
	Here's the next, updated version of the vsi/macvlan integration that
we've been discussing. Some change notes:

* Changes to the forwarding ops structure - Removed the priv_size field and
  added a flags field. Removal of the priv_size field was accomplished by
  just having the add method return a void * and using ERR_PTR and PTR_ERR
  checks, which also allows us to allocate memory for the acceleration path
  in the driver, which I like. I'm still not super happy with how I'm using
  the flags (currently only used to indicate support for feature sets), but
  at least we have the flags now, and they can be exposed to user space via
  iproute2 or ethtool if need be.

* Changes to the transmit path - Specifically, I'm using dev_queue_xmit to
  send frames now, which I like, as it makes the macvlan subject to the
  lowerdev's qdisc configuration.

* Changes to the acceleration fail-path behavior - Now if we don't/can't
  use acceleration, we just fall back to using the normal macvlan software
  switch strategy (see the sketch after this list).

* General cleanups (some renaming that I'm not super sure of, but I thought
  forwarding acceleration (fwd) would be a better prefix than l2
  acceleration).

Still a long way to go, I think, and lots of tweaking to do, but I didn't
want to keep you waiting, John. Anywho, take a look at what I'm doing and
feel free to rip it apart.

Thanks!
Neil
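The add/fallback pattern described in the first and third notes is the
shape that survives into macvlan_open() in the final series; roughly (an
excerpt-style sketch, using the names from patch 1 above):

	/* the driver allocates its own private state and reports failure
	 * with ERR_PTR(), so no priv_size field is needed in any ops struct
	 */
	vlan->fwd_priv =
		lowerdev->netdev_ops->ndo_dfwd_add_station(lowerdev, dev);

	if (IS_ERR_OR_NULL(vlan->fwd_priv)) {
		/* can't or won't accelerate: stay on the software switch */
		vlan->fwd_priv = NULL;
	} else {
		/* accelerated: frames bypass macvlan_queue_xmit() */
		dev->features &= ~NETIF_F_LLTX;
		return 0;
	}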
Subject: [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans
From:    Neil Horman
Date:    2013-10-04 20:10 UTC
To:      netdev
Cc:      John Fastabend, Andy Gospodarek, David Miller, Neil Horman

Now that the l2 acceleration ops are in place from the prior patch, enable
ixgbe to take advantage of these operations. Allow it to allocate queues
for a macvlan so that when we transmit a frame, we can do the switching in
hardware inside the ixgbe card, rather than in software.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: John Fastabend <john.r.fastabend@intel.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   33 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    4 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h     |   54 ++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c     |   15 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  373 +++++++++++++++++------
 5 files changed, 387 insertions(+), 92 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 0ac6b11..e924efa 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -219,6 +219,17 @@ enum ixgbe_ring_state_t {
 	__IXGBE_RX_FCOE,
 };
 
+struct ixgbe_fwd_adapter {
+	unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
+	struct net_device *netdev;
+	struct ixgbe_adapter *real_adapter;
+	unsigned int tx_base_queue;
+	unsigned int rx_base_queue;
+	struct net_device_stats net_stats;
+	int pool;
+	bool online;
+};
+
 #define check_for_tx_hang(ring) \
 	test_bit(__IXGBE_TX_DETECT_HANG, &(ring)->state)
 #define set_check_for_tx_hang(ring) \
@@ -236,6 +247,7 @@ struct ixgbe_ring {
 	struct ixgbe_q_vector *q_vector; /* backpointer to host q_vector */
 	struct net_device *netdev;	/* netdev ring belongs to */
 	struct device *dev;		/* device for DMA mapping */
+	struct ixgbe_fwd_adapter *l2_accel_priv;
 	void *desc;			/* descriptor ring memory */
 	union {
 		struct ixgbe_tx_buffer *tx_buffer_info;
@@ -244,6 +256,7 @@ struct ixgbe_ring {
 	unsigned long last_rx_timestamp;
 	unsigned long state;
 	u8 __iomem *tail;
+	struct net_device *vmdq_netdev;
 	dma_addr_t dma;			/* phys. address of descriptor ring */
 	unsigned int size;		/* length in bytes */
@@ -288,11 +301,15 @@ enum ixgbe_ring_f_enum {
 };
 
 #define IXGBE_MAX_RSS_INDICES  16
-#define IXGBE_MAX_VMDQ_INDICES 64
+#define IXGBE_MAX_VMDQ_INDICES 32
 #define IXGBE_MAX_FDIR_INDICES 63	/* based on q_vector limit */
 #define IXGBE_MAX_FCOE_INDICES  8
 #define MAX_RX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1)
 #define MAX_TX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1)
+#define IXGBE_MAX_L2A_QUEUES 4
+#define IXGBE_BAD_L2A_QUEUE 3
+
 struct ixgbe_ring_feature {
 	u16 limit;	/* upper limit on feature indices */
 	u16 indices;	/* current value of indices */
@@ -738,6 +755,7 @@ struct ixgbe_adapter {
 #endif /*CONFIG_DEBUG_FS*/
 
 	u8 default_up;
+	unsigned long fwd_bitmask; /* Bitmask indicating in use pools */
 };
 
 struct ixgbe_fdir_filter {
@@ -879,9 +897,14 @@ static inline void ixgbe_dbg_adapter_exit(struct ixgbe_adapter *adapter) {}
 static inline void ixgbe_dbg_init(void) {}
 static inline void ixgbe_dbg_exit(void) {}
 #endif /* CONFIG_DEBUG_FS */
+static inline struct net_device *netdev_ring(const struct ixgbe_ring *ring)
+{
+	return ring->vmdq_netdev ? ring->vmdq_netdev : ring->netdev;
+}
+
 static inline struct netdev_queue *txring_txq(const struct ixgbe_ring *ring)
 {
-	return netdev_get_tx_queue(ring->netdev, ring->queue_index);
+	return netdev_get_tx_queue(netdev_ring(ring), ring->queue_index);
 }
 
 extern void ixgbe_ptp_init(struct ixgbe_adapter *adapter);
@@ -915,4 +938,10 @@ extern void ixgbe_ptp_check_pps_event(struct ixgbe_adapter *adapter, u32 eicr);
 void ixgbe_sriov_reinit(struct ixgbe_adapter *adapter);
 #endif
 
+int ixgbe_get_settings(struct net_device *dev, struct ethtool_cmd *ecmd);
+int ixgbe_write_uc_addr_list(struct net_device *netdev);
+netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
+				  struct ixgbe_adapter *adapter,
+				  struct ixgbe_ring *tx_ring);
+void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring);
 #endif /* _IXGBE_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index e8649ab..277af14 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -150,8 +150,8 @@ static const char ixgbe_gstrings_test[][ETH_GSTRING_LEN] = {
 };
 #define IXGBE_TEST_LEN sizeof(ixgbe_gstrings_test) / ETH_GSTRING_LEN
 
-static int ixgbe_get_settings(struct net_device *netdev,
-			      struct ethtool_cmd *ecmd)
+int ixgbe_get_settings(struct net_device *netdev,
+		       struct ethtool_cmd *ecmd)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h
new file mode 100644
index 0000000..2f36584
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h
@@ -0,0 +1,54 @@
+/*******************************************************************************
+
+  Intel 10 Gigabit PCI Express Linux driver
+  Copyright(c) 1999 - 2013 Intel Corporation.
+
+  This program is free software; you can redistribute it and/or modify it
+  under the terms and conditions of the GNU General Public License,
+  version 2, as published by the Free Software Foundation.
+
+  This program is distributed in the hope it will be useful, but WITHOUT
+  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+  more details.
+
+  You should have received a copy of the GNU General Public License along with
+  this program; if not, write to the Free Software Foundation, Inc.,
+  51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+  The full GNU General Public License is included in this distribution in
+  the file called "COPYING".
+
+  Contact Information:
+  e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+  Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+#include "ixgbe.h"
+
+
+static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter,
+					   u64 qmask)
+{
+	u32 mask;
+	struct ixgbe_hw *hw = &adapter->hw;
+
+	switch (hw->mac.type) {
+	case ixgbe_mac_82598EB:
+		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
+		IXGBE_WRITE_REG(hw, IXGBE_EIMS, mask);
+		break;
+	case ixgbe_mac_82599EB:
+	case ixgbe_mac_X540:
+		mask = (qmask & 0xFFFFFFFF);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask);
+		mask = (qmask >> 32);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask);
+		break;
+	default:
+		break;
+	}
+	/* skip the flush */
+}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index 90b4e10..e2dd635 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -500,7 +500,8 @@ static bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter)
 #endif
 
 	/* only proceed if SR-IOV is enabled */
-	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
+	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) &&
+	    !(adapter->flags & IXGBE_FLAG_VMDQ_ENABLED))
 		return false;
 
 	/* Add starting offset to total pool count */
@@ -852,7 +853,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 
 		/* apply Tx specific ring traits */
 		ring->count = adapter->tx_ring_count;
-		ring->queue_index = txr_idx;
+		if (adapter->num_rx_pools > 1)
+			ring->queue_index =
+				txr_idx % adapter->num_rx_queues_per_pool;
+		else
+			ring->queue_index = txr_idx;
 
 		/* assign ring to adapter */
 		adapter->tx_ring[txr_idx] = ring;
@@ -895,7 +900,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 #endif /* IXGBE_FCOE */
 		/* apply Rx specific ring traits */
 		ring->count = adapter->rx_ring_count;
-		ring->queue_index = rxr_idx;
+		if (adapter->num_rx_pools > 1)
+			ring->queue_index =
+				rxr_idx % adapter->num_rx_queues_per_pool;
+		else
+			ring->queue_index = rxr_idx;
 
 		/* assign ring to adapter */
 		adapter->rx_ring[rxr_idx] = ring;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0ade0cd..5fa553f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -52,6 +52,7 @@
 #include "ixgbe_common.h"
 #include "ixgbe_dcb_82599.h"
 #include "ixgbe_sriov.h"
+#include "ixgbe_l2a.h"
 
 char ixgbe_driver_name[] = "ixgbe";
 static const char ixgbe_driver_string[] =
@@ -118,6 +119,8 @@ static DEFINE_PCI_DEVICE_TABLE(ixgbe_pci_tbl) = {
 };
 MODULE_DEVICE_TABLE(pci, ixgbe_pci_tbl);
 
+netdev_tx_t ixgbe_fwd_xmit_frame(struct sk_buff *skb, void *priv);
+
 #ifdef CONFIG_IXGBE_DCA
 static int ixgbe_notify_dca(struct notifier_block *, unsigned long event,
 			    void *p);
@@ -872,7 +875,8 @@ static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring)
 
 static u64 ixgbe_get_tx_pending(struct ixgbe_ring *ring)
 {
-	struct ixgbe_adapter *adapter = netdev_priv(ring->netdev);
+	struct net_device *dev = ring->netdev;
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
 	struct ixgbe_hw *hw = &adapter->hw;
 
 	u32 head = IXGBE_READ_REG(hw, IXGBE_TDH(ring->reg_idx));
@@ -1055,7 +1059,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 			tx_ring->next_to_use, i,
 			tx_ring->tx_buffer_info[i].time_stamp, jiffies);
 
-		netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+		netif_stop_subqueue(netdev_ring(tx_ring), tx_ring->queue_index);
 
 		e_info(probe,
 		       "tx hang %d detected on queue %d, resetting adapter\n",
@@ -1072,16 +1076,16 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 				  total_packets, total_bytes);
 
 #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
-	if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) &&
+	if (unlikely(total_packets && netif_carrier_ok(netdev_ring(tx_ring)) &&
 		     (ixgbe_desc_unused(tx_ring) >= TX_WAKE_THRESHOLD))) {
 		/* Make sure that anybody stopping the queue after this
 		 * sees the new next_to_clean.
 		 */
 		smp_mb();
-		if (__netif_subqueue_stopped(tx_ring->netdev,
+		if (__netif_subqueue_stopped(netdev_ring(tx_ring),
 					     tx_ring->queue_index) &&
 		    !test_bit(__IXGBE_DOWN, &adapter->state)) {
-			netif_wake_subqueue(tx_ring->netdev,
+			netif_wake_subqueue(netdev_ring(tx_ring),
 					    tx_ring->queue_index);
 			++tx_ring->tx_stats.restart_queue;
 		}
@@ -1226,7 +1230,7 @@ static inline void ixgbe_rx_hash(struct ixgbe_ring *ring,
 				 union ixgbe_adv_rx_desc *rx_desc,
 				 struct sk_buff *skb)
 {
-	if (ring->netdev->features & NETIF_F_RXHASH)
+	if (netdev_ring(ring)->features & NETIF_F_RXHASH)
 		skb->rxhash = le32_to_cpu(rx_desc->wb.lower.hi_dword.rss);
 }
 
@@ -1260,10 +1264,12 @@ static inline void ixgbe_rx_checksum(struct ixgbe_ring *ring,
 				     union ixgbe_adv_rx_desc *rx_desc,
 				     struct sk_buff *skb)
 {
+	struct net_device *dev = netdev_ring(ring);
+
 	skb_checksum_none_assert(skb);
 
 	/* Rx csum disabled */
-	if (!(ring->netdev->features & NETIF_F_RXCSUM))
+	if (!(dev->features & NETIF_F_RXCSUM))
 		return;
 
 	/* if IP and error */
@@ -1559,7 +1565,7 @@ static void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
 				     union ixgbe_adv_rx_desc *rx_desc,
 				     struct sk_buff *skb)
 {
-	struct net_device *dev = rx_ring->netdev;
+	struct net_device *dev = netdev_ring(rx_ring);
 
 	ixgbe_update_rsc_stats(rx_ring, skb);
 
@@ -1739,7 +1745,7 @@ static bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
 				  union ixgbe_adv_rx_desc *rx_desc,
 				  struct sk_buff *skb)
 {
-	struct net_device *netdev = rx_ring->netdev;
+	struct net_device *netdev = netdev_ring(rx_ring);
 
 	/* verify that the packet does not have any known errors */
 	if (unlikely(ixgbe_test_staterr(rx_desc,
@@ -1905,7 +1911,7 @@ static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
 #endif
 
 		/* allocate a skb to store the frags */
-		skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
+		skb = netdev_alloc_skb_ip_align(netdev_ring(rx_ring),
 						IXGBE_RX_HDR_SIZE);
 		if (unlikely(!skb)) {
 			rx_ring->rx_stats.alloc_rx_buff_failed++;
@@ -1986,6 +1992,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	int ddp_bytes;
 	unsigned int mss = 0;
+	struct net_device *netdev = netdev_ring(rx_ring);
 #endif /* IXGBE_FCOE */
 	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
 
@@ -1993,6 +2000,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		union ixgbe_adv_rx_desc *rx_desc;
 		struct sk_buff *skb;
 
+		if (rx_ring->l2_accel_priv) {
+			printk(KERN_CRIT "RECEIVING ON AN ACCELERATED QUEUE\n");
+		}
+
 		/* return some buffers to hardware, one at a time is too slow */
 		if (cleaned_count >= IXGBE_RX_BUFFER_WRITE) {
 			ixgbe_alloc_rx_buffers(rx_ring, cleaned_count);
@@ -2041,7 +2052,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			/* include DDPed FCoE data */
 			if (ddp_bytes > 0) {
 				if (!mss) {
-					mss = rx_ring->netdev->mtu -
+					mss = netdev->mtu -
 						sizeof(struct fcoe_hdr) -
 						sizeof(struct fc_frame_header) -
 						sizeof(struct fcoe_crc_eof);
@@ -2455,58 +2466,6 @@ static void ixgbe_check_lsc(struct ixgbe_adapter *adapter)
 	}
 }
 
-static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter,
-					   u64 qmask)
-{
-	u32 mask;
-	struct ixgbe_hw *hw = &adapter->hw;
-
-	switch (hw->mac.type) {
-	case ixgbe_mac_82598EB:
-		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
-		IXGBE_WRITE_REG(hw, IXGBE_EIMS, mask);
-		break;
-	case ixgbe_mac_82599EB:
-	case ixgbe_mac_X540:
-		mask = (qmask & 0xFFFFFFFF);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask);
-		mask = (qmask >> 32);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask);
-		break;
-	default:
-		break;
-	}
-	/* skip the flush */
-}
-
-static inline void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter,
-					    u64 qmask)
-{
-	u32 mask;
-	struct ixgbe_hw *hw = &adapter->hw;
-
-	switch (hw->mac.type) {
-	case ixgbe_mac_82598EB:
-		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
-		IXGBE_WRITE_REG(hw, IXGBE_EIMC, mask);
-		break;
-	case ixgbe_mac_82599EB:
-	case ixgbe_mac_X540:
-		mask = (qmask & 0xFFFFFFFF);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(0), mask);
-		mask = (qmask >> 32);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(1), mask);
-		break;
-	default:
-		break;
-	}
-	/* skip the flush */
-}
-
 /**
  * ixgbe_irq_enable - Enable default interrupt generation settings
  * @adapter: board private structure
@@ -2946,6 +2905,7 @@ static void ixgbe_configure_msi_and_legacy(struct ixgbe_adapter *adapter)
 void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 			     struct ixgbe_ring *ring)
 {
+	struct net_device *netdev = netdev_ring(ring);
 	struct ixgbe_hw *hw = &adapter->hw;
 	u64 tdba = ring->dma;
 	int wait_loop = 10;
@@ -3005,7 +2965,7 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 		struct ixgbe_q_vector *q_vector = ring->q_vector;
 
 		if (q_vector)
-			netif_set_xps_queue(adapter->netdev,
+			netif_set_xps_queue(netdev,
 					    &q_vector->affinity_mask,
 					    ring->queue_index);
 	}
@@ -3395,7 +3355,7 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
 	int rss_i = adapter->ring_feature[RING_F_RSS].indices;
-	int p;
+	u16 pool;
 
 	/* PSRTYPE must be initialized in non 82598 adapters */
 	u32 psrtype = IXGBE_PSRTYPE_TCPHDR |
@@ -3412,9 +3372,8 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 	else if (rss_i > 1)
 		psrtype |= 1 << 29;
 
-	for (p = 0; p < adapter->num_rx_pools; p++)
-		IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(p)),
-				psrtype);
+	for_each_set_bit(pool, &adapter->fwd_bitmask, 32)
+		IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(pool)), psrtype);
 }
 
 static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
@@ -3683,6 +3642,8 @@ static void ixgbe_vlan_strip_disable(struct ixgbe_adapter *adapter)
 	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
 		for (i = 0; i < adapter->num_rx_queues; i++) {
+			if (adapter->rx_ring[i]->vmdq_netdev)
+				continue;
 			j = adapter->rx_ring[i]->reg_idx;
 			vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j));
 			vlnctrl &= ~IXGBE_RXDCTL_VME;
@@ -3713,6 +3674,8 @@ static void ixgbe_vlan_strip_enable(struct ixgbe_adapter *adapter)
 	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
 		for (i = 0; i < adapter->num_rx_queues; i++) {
+			if (adapter->rx_ring[i]->vmdq_netdev)
+				continue;
 			j = adapter->rx_ring[i]->reg_idx;
 			vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j));
 			vlnctrl |= IXGBE_RXDCTL_VME;
@@ -3743,15 +3706,16 @@ static void ixgbe_restore_vlan(struct ixgbe_adapter *adapter)
  *                0 on no addresses written
  *                X on writing X addresses to the RAR table
  **/
-static int ixgbe_write_uc_addr_list(struct net_device *netdev)
+int ixgbe_write_uc_addr_list(struct net_device *netdev)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
 	unsigned int rar_entries = hw->mac.num_rar_entries - 1;
 	int count = 0;
 
-	/* In SR-IOV mode significantly less RAR entries are available */
-	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+	/* In SR-IOV/VMDQ modes significantly less RAR entries are available */
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED ||
+	    adapter->flags & IXGBE_FLAG_VMDQ_ENABLED)
 		rar_entries = IXGBE_MAX_PF_MACVLANS - 1;
 
 	/* return ENOMEM indicating insufficient memory for addresses */
@@ -3772,6 +3736,7 @@ static int ixgbe_write_uc_addr_list(struct net_device *netdev)
 			count++;
 		}
 	}
+	/* write the addresses in reverse order to avoid write combining */
 	for (; rar_entries > 0 ; rar_entries--)
 		hw->mac.ops.clear_rar(hw, rar_entries);
 
@@ -4133,6 +4098,7 @@ static void ixgbe_configure(struct ixgbe_adapter *adapter)
 	ixgbe_configure_virtualization(adapter);
 
 	ixgbe_set_rx_mode(adapter->netdev);
+
 	ixgbe_restore_vlan(adapter);
 
 	switch (hw->mac.type) {
@@ -4459,7 +4425,7 @@ void ixgbe_reset(struct ixgbe_adapter *adapter)
  * ixgbe_clean_rx_ring - Free Rx Buffers per Queue
  * @rx_ring: ring to free buffers from
  **/
-static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
+void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
 {
 	struct device *dev = rx_ring->dev;
 	unsigned long size;
@@ -4838,6 +4804,8 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter)
 		return -EIO;
 	}
 
+	/* PF holds first pool slot */
+	set_bit(0, &adapter->fwd_bitmask);
 	set_bit(__IXGBE_DOWN, &adapter->state);
 
 	return 0;
@@ -5143,7 +5111,7 @@ static int ixgbe_change_mtu(struct net_device *netdev, int new_mtu)
 static int ixgbe_open(struct net_device *netdev)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
-	int err;
+	int err, queues;
 
 	/* disallow open during test */
 	if (test_bit(__IXGBE_TESTING, &adapter->state))
@@ -5168,16 +5136,22 @@ static int ixgbe_open(struct net_device *netdev)
 		goto err_req_irq;
 
 	/* Notify the stack of the actual queue counts. */
-	err = netif_set_real_num_tx_queues(netdev,
-					   adapter->num_rx_pools > 1 ? 1 :
-					   adapter->num_tx_queues);
+	if (adapter->num_rx_pools > 1 &&
+	    adapter->num_tx_queues > IXGBE_MAX_L2A_QUEUES)
+		queues = IXGBE_MAX_L2A_QUEUES;
+	else
+		queues = adapter->num_tx_queues;
+
+	err = netif_set_real_num_tx_queues(netdev, queues);
 	if (err)
 		goto err_set_queues;
 
-	err = netif_set_real_num_rx_queues(netdev,
-					   adapter->num_rx_pools > 1 ? 1 :
-					   adapter->num_rx_queues);
+	if (adapter->num_rx_pools > 1 &&
+	    adapter->num_rx_queues > IXGBE_MAX_L2A_QUEUES)
+		queues = IXGBE_MAX_L2A_QUEUES;
+	else
+		queues = adapter->num_rx_queues;
+	err = netif_set_real_num_rx_queues(netdev, queues);
 	if (err)
 		goto err_set_queues;
 
@@ -5215,7 +5189,6 @@ static int ixgbe_close(struct net_device *netdev)
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 
 	ixgbe_ptp_stop(adapter);
-
 	ixgbe_down(adapter);
 	ixgbe_free_irq(adapter);
 
@@ -6576,7 +6549,7 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
 
 static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
 {
-	netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+	netif_stop_subqueue(netdev_ring(tx_ring), tx_ring->queue_index);
 	/* Herbert's original patch had:
 	 *  smp_mb__after_netif_stop_queue();
 	 * but since that doesn't exist yet, just open code it. */
@@ -6588,7 +6561,7 @@ static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
 		return -EBUSY;
 
 	/* A reprieve! - use start_queue because it doesn't call schedule */
-	netif_start_subqueue(tx_ring->netdev, tx_ring->queue_index);
+	netif_start_subqueue(netdev_ring(tx_ring), tx_ring->queue_index);
 	++tx_ring->tx_stats.restart_queue;
 	return 0;
 }
@@ -6639,6 +6612,9 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 			  struct ixgbe_ring *tx_ring)
 {
 	struct ixgbe_tx_buffer *first;
+#ifdef IXGBE_FCOE
+	struct net_device *dev;
+#endif
 	int tso;
 	u32 tx_flags = 0;
 	unsigned short f;
@@ -6730,9 +6706,10 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 	first->protocol = protocol;
 
 #ifdef IXGBE_FCOE
+	dev = netdev_ring(tx_ring);
 	/* setup tx offload for FCoE */
 	if ((protocol == __constant_htons(ETH_P_FCOE)) &&
-	    (tx_ring->netdev->features & (NETIF_F_FSO | NETIF_F_FCOE_CRC))) {
+	    (dev->features & (NETIF_F_FSO | NETIF_F_FCOE_CRC))) {
 		tso = ixgbe_fso(tx_ring, first, &hdr_len);
 		if (tso < 0)
 			goto out_drop;
@@ -6784,7 +6761,15 @@ static netdev_tx_t ixgbe_xmit_frame(struct sk_buff *skb,
 		skb_set_tail_pointer(skb, 17);
 	}
 
-	tx_ring = adapter->tx_ring[skb->queue_mapping];
+	if (skb->accel_priv) {
+		struct ixgbe_fwd_adapter *fwd_adapter = skb->accel_priv;
+		unsigned int queue;
+
+		queue = skb->queue_mapping + fwd_adapter->tx_base_queue;
+		tx_ring = fwd_adapter->real_adapter->tx_ring[queue];
+	} else
+		tx_ring = adapter->tx_ring[skb->queue_mapping];
+
 	return ixgbe_xmit_frame_ring(skb, adapter, tx_ring);
 }
 
@@ -7057,6 +7042,7 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 	 */
 	if (netif_running(dev))
 		ixgbe_close(dev);
+
 	ixgbe_clear_interrupt_scheme(adapter);
 
 #ifdef CONFIG_IXGBE_DCB
@@ -7305,6 +7291,217 @@ static int ixgbe_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
 	return ndo_dflt_bridge_getlink(skb, pid, seq, dev, mode);
 }
 
+static void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter, u64 qmask)
+{
+	u32 mask;
+	struct ixgbe_hw *hw = &adapter->hw;
+
+	switch (hw->mac.type) {
+	case ixgbe_mac_82598EB:
+		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
+		IXGBE_WRITE_REG(hw, IXGBE_EIMC, mask);
+		break;
+	case ixgbe_mac_82599EB:
+	case ixgbe_mac_X540:
+		mask = (qmask & 0xFFFFFFFF);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(0), mask);
+		mask = (qmask >> 32);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(1), mask);
+		break;
+	default:
+		break;
+	}
+}
+
+static void ixgbe_add_mac_filter(struct ixgbe_adapter *adapter,
+				 u8 *addr, u16 pool)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	unsigned int entry;
+
+	entry = hw->mac.num_rar_entries - pool;
+	hw->mac.ops.set_rar(hw, entry, addr, VMDQ_P(pool), IXGBE_RAH_AV);
+}
+
+static void ixgbe_fwd_psrtype(struct ixgbe_fwd_adapter *vadapter)
+{
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	int rss_i = vadapter->netdev->real_num_rx_queues;
+	struct ixgbe_hw *hw = &adapter->hw;
+	u16 pool = vadapter->pool;
+	u32 psrtype = IXGBE_PSRTYPE_TCPHDR |
+		      IXGBE_PSRTYPE_UDPHDR |
+		      IXGBE_PSRTYPE_IPV4HDR |
+		      IXGBE_PSRTYPE_L2HDR |
+		      IXGBE_PSRTYPE_IPV6HDR;
+
+	if (hw->mac.type == ixgbe_mac_82598EB)
+		return;
+
+	if (rss_i > 3)
+		psrtype |= 2 << 29;
+	else if (rss_i > 1)
+		psrtype |= 1 << 29;
+
+	IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(pool)), psrtype);
+}
+
+static void ixgbe_enable_fwd_ring(struct ixgbe_adapter *adapter,
+				  struct ixgbe_ring *rx_ring,
+				  struct ixgbe_fwd_adapter *accel)
+{
+	rx_ring->l2_accel_priv = accel;
+	ixgbe_configure_rx_ring(adapter, rx_ring);
+}
+
+static void ixgbe_disable_fwd_ring(struct ixgbe_fwd_adapter *vadapter,
+				   struct ixgbe_ring *rx_ring)
+{
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	int index =
rx_ring->queue_index + vadapter->rx_base_queue; + + /* shutdown specific queue receive and wait for dma to settle */ + ixgbe_disable_rx_queue(adapter, rx_ring); + usleep_range(10000, 20000); + ixgbe_irq_disable_queues(adapter, ((u64)1 << index)); + ixgbe_clean_rx_ring(rx_ring); + rx_ring->l2_accel_priv = NULL; +} + +int ixgbe_fwd_ring_up(struct net_device *vdev, struct ixgbe_fwd_adapter *accel) +{ + struct ixgbe_adapter *adapter = accel->real_adapter; + unsigned int rxbase = accel->pool * adapter->num_rx_queues_per_pool; + unsigned int txbase = accel->pool * adapter->num_rx_queues_per_pool; + int err, i; + + + accel->rx_base_queue = rxbase; + accel->tx_base_queue = txbase; + + for (i = 0; i < vdev->num_rx_queues; i++) + ixgbe_disable_fwd_ring(accel, adapter->rx_ring[rxbase + i]); + + for (i = 0; i < vdev->num_rx_queues; i++) { + adapter->rx_ring[rxbase + i]->vmdq_netdev = vdev; + ixgbe_enable_fwd_ring(adapter, adapter->rx_ring[rxbase + i], accel); + } + + for (i = 0; i < vdev->num_tx_queues; i++) + adapter->tx_ring[txbase + i]->vmdq_netdev = vdev; + + if (is_valid_ether_addr(vdev->dev_addr)) + ixgbe_add_mac_filter(adapter, vdev->dev_addr, accel->pool); + + err = netif_set_real_num_tx_queues(vdev, vdev->num_tx_queues); + if (err) + goto err_set_queues; + err = netif_set_real_num_rx_queues(vdev, vdev->num_rx_queues); + if (err) + goto err_set_queues; + + ixgbe_fwd_psrtype(accel); + netif_tx_start_all_queues(vdev); + return 0; +err_set_queues: + for (i = 0; i < vdev->num_rx_queues; i++) + ixgbe_disable_fwd_ring(accel, adapter->rx_ring[rxbase + i]); + return err; +} + +int ixgbe_fwd_ring_down(struct net_device *vdev, struct ixgbe_fwd_adapter *accel) +{ + struct ixgbe_adapter *adapter = accel->real_adapter; + unsigned int rxbase = accel->rx_base_queue; + int i; + + netif_tx_stop_all_queues(vdev); + + for (i = 0; i < vdev->num_rx_queues; i++) + ixgbe_disable_fwd_ring(accel, adapter->rx_ring[rxbase + i]); + + return 0; +} + +static void* ixgbe_fwd_add(struct net_device *pdev, struct net_device *vdev) +{ + struct ixgbe_fwd_adapter *fwd_adapter = NULL; + struct ixgbe_adapter *adapter = netdev_priv(pdev); + int pool, vmdq_pool, base_queue; + int err; + + /* Check for hardware restriction on number of rx/tx queues */ + if (vdev->num_rx_queues != vdev->num_tx_queues || + vdev->num_tx_queues > IXGBE_MAX_L2A_QUEUES || + vdev->num_tx_queues == IXGBE_BAD_L2A_QUEUE) { + netdev_info(pdev, "%s: Supports RX/TX Queue counts 1,2, and 4\n", + pdev->name); + return ERR_PTR(-EINVAL); + } + + if (adapter->num_rx_pools > IXGBE_MAX_VMDQ_INDICES) + return ERR_PTR(-EBUSY); + + fwd_adapter = kcalloc(1, sizeof(struct ixgbe_fwd_adapter), GFP_KERNEL); + if (!fwd_adapter) + return ERR_PTR(-ENOMEM); + + pool = find_first_zero_bit(&adapter->fwd_bitmask, 32); + adapter->num_rx_pools++; + set_bit(pool, &adapter->fwd_bitmask); + + /* Enable VMDq flag so device will be set in VM mode */ + adapter->flags |= IXGBE_FLAG_VMDQ_ENABLED | IXGBE_FLAG_SRIOV_ENABLED; + adapter->ring_feature[RING_F_VMDQ].limit = adapter->num_rx_pools; + adapter->ring_feature[RING_F_VMDQ].offset = 0; + adapter->ring_feature[RING_F_RSS].limit = IXGBE_MAX_L2A_QUEUES; + + /* Force reinit of ring allocation with VMDQ enabled */ + ixgbe_setup_tc(pdev, netdev_get_num_tc(pdev)); + + /* Configure VSI adapter structure */ + vmdq_pool = VMDQ_P(pool); + base_queue = vmdq_pool * adapter->num_rx_queues_per_pool; + + netdev_dbg(pdev, "pool %i:%i queues %i:%i VSI bitmask %lx\n", + pool, adapter->num_rx_pools, + base_queue, base_queue + adapter->num_rx_queues_per_pool, + 
adapter->fwd_bitmask); + + fwd_adapter->pool = pool; + fwd_adapter->netdev = vdev; + fwd_adapter->real_adapter = adapter; + fwd_adapter->rx_base_queue = base_queue; + fwd_adapter->tx_base_queue = base_queue; + + err = ixgbe_fwd_ring_up(vdev, fwd_adapter); + if (err) { + kfree(fwd_adapter); + return ERR_PTR(err); + } + return fwd_adapter; +} + +static void ixgbe_fwd_del(struct net_device *pdev, void *priv) +{ + struct ixgbe_fwd_adapter *fwd_adapter = priv; + struct ixgbe_adapter *adapter = fwd_adapter->real_adapter; + + clear_bit(fwd_adapter->pool, &adapter->fwd_bitmask); + adapter->num_rx_pools--; + + ixgbe_fwd_ring_down(fwd_adapter->netdev, fwd_adapter); + + netdev_dbg(pdev, "pool %i:%i queues %i:%i VSI bitmask %lx\n", + fwd_adapter->pool, adapter->num_rx_pools, + fwd_adapter->rx_base_queue, + fwd_adapter->rx_base_queue + adapter->num_rx_queues_per_pool, + adapter->fwd_bitmask); +} + static const struct net_device_ops ixgbe_netdev_ops = { .ndo_open = ixgbe_open, .ndo_stop = ixgbe_close, @@ -7351,6 +7548,11 @@ static const struct net_device_ops ixgbe_netdev_ops = { .ndo_bridge_getlink = ixgbe_ndo_bridge_getlink, }; +const struct forwarding_accel_ops ixgbe_fwd_ops = { + .fwd_accel_add_station = ixgbe_fwd_add, + .fwd_accel_del_station = ixgbe_fwd_del, +}; + /** * ixgbe_enumerate_functions - Get the number of ports this device has * @adapter: adapter structure @@ -7554,6 +7756,7 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent) } netdev->netdev_ops = &ixgbe_netdev_ops; + netdev->fwd_ops = &ixgbe_fwd_ops; ixgbe_set_ethtool_ops(netdev); netdev->watchdog_timeo = 5 * HZ; strncpy(netdev->name, pci_name(pdev), sizeof(netdev->name) - 1); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
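Taken together with the macvlan-side hooks in patch 1/2, the driver contract above (ixgbe_fwd_add()/ixgbe_fwd_del(), registered here through ixgbe_fwd_ops) is small: validate the macvlan's queue counts, claim a free pool, program a MAC filter, and return an opaque cookie, or an ERR_PTR, that is handed back verbatim at teardown. A minimal sketch of that contract for a hypothetical "foo" driver follows; every foo_* identifier and both pool limits are illustrative assumptions, not anything from the posted series:

/* Sketch only: the dfwd add/del station contract for an assumed
 * "foo" driver. Pool limits and helpers are invented for illustration.
 */
#include <linux/bitops.h>
#include <linux/err.h>
#include <linux/etherdevice.h>
#include <linux/netdevice.h>
#include <linux/slab.h>

#define FOO_MAX_POOLS		8	/* assumed hardware pool count */
#define FOO_MAX_POOL_QUEUES	4	/* assumed queues per pool */

struct foo_fwd_priv {
	struct net_device *vdev;	/* offloaded macvlan */
	int pool;			/* hardware pool backing it */
};

struct foo_adapter {
	unsigned long pool_bitmask;	/* in-use pools, bit 0 = PF */
};

/* stand-in for programming a unicast filter to steer addr to a pool */
static void foo_set_mac_filter(struct foo_adapter *adapter,
			       const unsigned char *addr, int pool)
{
	/* a real driver would write a RAR/filter register here */
}

static void *foo_dfwd_add_station(struct net_device *pdev,
				  struct net_device *vdev)
{
	struct foo_adapter *adapter = netdev_priv(pdev);
	struct foo_fwd_priv *fwd;
	int pool;

	/* mirror the hardware restriction checks: symmetric, small counts */
	if (vdev->num_rx_queues != vdev->num_tx_queues ||
	    vdev->num_tx_queues > FOO_MAX_POOL_QUEUES)
		return ERR_PTR(-EINVAL);

	pool = find_first_zero_bit(&adapter->pool_bitmask, FOO_MAX_POOLS);
	if (pool >= FOO_MAX_POOLS)
		return ERR_PTR(-EBUSY);

	fwd = kzalloc(sizeof(*fwd), GFP_KERNEL);
	if (!fwd)
		return ERR_PTR(-ENOMEM);

	set_bit(pool, &adapter->pool_bitmask);
	fwd->pool = pool;
	fwd->vdev = vdev;

	if (is_valid_ether_addr(vdev->dev_addr))
		foo_set_mac_filter(adapter, vdev->dev_addr, pool);

	return fwd;	/* cookie the stack passes back on teardown */
}

static void foo_dfwd_del_station(struct net_device *pdev, void *priv)
{
	struct foo_fwd_priv *fwd = priv;
	struct foo_adapter *adapter = netdev_priv(pdev);

	clear_bit(fwd->pool, &adapter->pool_bitmask);
	kfree(fwd);
}

Whether hooks like these land in net_device_ops as ndo_dfwd_add_station/ndo_dfwd_del_station or in a separate ops structure like ixgbe_fwd_ops above is exactly the wiring question the two postings in this thread differ on.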
* [RFC PATCH 0/2 v3] net: alternate proposal for using macvlans with forwarding acceleration 2013-09-25 20:16 [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration Neil Horman 2013-10-04 20:10 ` [RFC PATCH 0/2 v2] " Neil Horman @ 2013-10-11 18:43 ` Neil Horman 2013-10-11 18:43 ` [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans Neil Horman 1 sibling, 1 reply; 7+ messages in thread From: Neil Horman @ 2013-10-11 18:43 UTC (permalink / raw) To: netdev; +Cc: john.fastabend, Andy Gospodarek, David Miller Hey all- here's the next, updated version of the vsi/macvlan integration that we've been discussing. Change notes: * Moved the feature flag to netdev_features.h. No ethtool option for disabling it yet, but it's there now, and seems to fit fairly well. I was actually thinking about your comment, John, regarding the clumsiness in allowing sw and hw accel vlans on the same lowerdev, and it just occurred to me that we could use the same flag on the macvlan device directly - i.e. if we found that a lowerdev supported acceleration, then call ndo_dfwd_add_station, and, if successful, set the same feature flag in the macvlan device. Then we could use ethtool to control the enabling/disabling of acceleration at the macvlan device directly. Thoughts? * Moved the acceleration net device methods back into net_device_ops. Looks pretty good to me there. * Restored the use of a separate xmit routine so we weren't subject to the lowerdev's queue disciplines. I integrated its use with dev_hard_start_xmit, so we could share the use of the linearization code, etc. Let me know what you think. Best Neil ^ permalink raw reply [flat|nested] 7+ messages in thread
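The per-macvlan ethtool control sketched in the first change note would look roughly like this from userspace, assuming the flag is eventually registered under an ethtool feature string; the name "l2-fwd-offload" below is an assumption, since the posted series does not expose one yet:

  # check whether the lowerdev advertises the offload (assumed string name)
  ethtool -k eth2 | grep l2-fwd-offload
  # force the software path for macvlans stacked on eth2
  ethtool -K eth2 l2-fwd-offload off
  # with the flag propagated to the macvlan, control it per device instead
  ethtool -K macvlan0 l2-fwd-offload on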
* [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans 2013-10-11 18:43 ` [RFC PATCH 0/2 v3] net: alternate proposal for using macvlans with forwarding acceleration Neil Horman @ 2013-10-11 18:43 ` Neil Horman 2013-10-13 20:48 ` John Fastabend 0 siblings, 1 reply; 7+ messages in thread From: Neil Horman @ 2013-10-11 18:43 UTC (permalink / raw) To: netdev; +Cc: john.fastabend, Andy Gospodarek, David Miller, Neil Horman Now that l2 acceleration ops are in place from the prior patch, enable ixgbe to take advantage of these operations. Allow it to allocate queues for a macvlan so that when we transmit a frame, we can do the switching in hardware inside the ixgbe card, rather than in software. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: john.fastabend@gmail.com CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> --- drivers/net/ethernet/intel/ixgbe/ixgbe.h | 33 +- drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 +- drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h | 54 ++++ drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c | 15 +- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 388 ++++++++++++++++++----- 5 files changed, 400 insertions(+), 94 deletions(-) create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h index 0ac6b11..e924efa 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h @@ -219,6 +219,17 @@ enum ixgbe_ring_state_t { __IXGBE_RX_FCOE, }; +struct ixgbe_fwd_adapter { + unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)]; + struct net_device *netdev; + struct ixgbe_adapter *real_adapter; + unsigned int tx_base_queue; + unsigned int rx_base_queue; + struct net_device_stats net_stats; + int pool; + bool online; +}; + #define check_for_tx_hang(ring) \ test_bit(__IXGBE_TX_DETECT_HANG, &(ring)->state) #define set_check_for_tx_hang(ring) \ @@ -236,6 +247,7 @@ struct ixgbe_ring { struct ixgbe_q_vector *q_vector; /* backpointer to host q_vector */ struct net_device *netdev; /* netdev ring belongs to */ struct device *dev; /* device for DMA mapping */ + struct ixgbe_fwd_adapter *l2_accel_priv; void *desc; /* descriptor ring memory */ union { struct ixgbe_tx_buffer *tx_buffer_info; @@ -244,6 +256,7 @@ struct ixgbe_ring { unsigned long last_rx_timestamp; unsigned long state; u8 __iomem *tail; + struct net_device *vmdq_netdev; dma_addr_t dma; /* phys. 
address of descriptor ring */ unsigned int size; /* length in bytes */ @@ -288,11 +301,15 @@ enum ixgbe_ring_f_enum { }; #define IXGBE_MAX_RSS_INDICES 16 -#define IXGBE_MAX_VMDQ_INDICES 64 +#define IXGBE_MAX_VMDQ_INDICES 32 #define IXGBE_MAX_FDIR_INDICES 63 /* based on q_vector limit */ #define IXGBE_MAX_FCOE_INDICES 8 #define MAX_RX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1) #define MAX_TX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1) +#define IXGBE_MAX_L2A_QUEUES 4 +#define IXGBE_MAX_L2A_QUEUES 4 +#define IXGBE_BAD_L2A_QUEUE 3 + struct ixgbe_ring_feature { u16 limit; /* upper limit on feature indices */ u16 indices; /* current value of indices */ @@ -738,6 +755,7 @@ struct ixgbe_adapter { #endif /*CONFIG_DEBUG_FS*/ u8 default_up; + unsigned long fwd_bitmask; /* Bitmask indicating in use pools */ }; struct ixgbe_fdir_filter { @@ -879,9 +897,14 @@ static inline void ixgbe_dbg_adapter_exit(struct ixgbe_adapter *adapter) {} static inline void ixgbe_dbg_init(void) {} static inline void ixgbe_dbg_exit(void) {} #endif /* CONFIG_DEBUG_FS */ +static inline struct net_device *netdev_ring(const struct ixgbe_ring *ring) +{ + return ring->vmdq_netdev ? ring->vmdq_netdev : ring->netdev; +} + static inline struct netdev_queue *txring_txq(const struct ixgbe_ring *ring) { - return netdev_get_tx_queue(ring->netdev, ring->queue_index); + return netdev_get_tx_queue(netdev_ring(ring), ring->queue_index); } extern void ixgbe_ptp_init(struct ixgbe_adapter *adapter); @@ -915,4 +938,10 @@ extern void ixgbe_ptp_check_pps_event(struct ixgbe_adapter *adapter, u32 eicr); void ixgbe_sriov_reinit(struct ixgbe_adapter *adapter); #endif +int ixgbe_get_settings(struct net_device *dev, struct ethtool_cmd *ecmd); +int ixgbe_write_uc_addr_list(struct net_device *netdev); +netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb, + struct ixgbe_adapter *adapter, + struct ixgbe_ring *tx_ring); +void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring); #endif /* _IXGBE_H_ */ diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c index e8649ab..277af14 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c @@ -150,8 +150,8 @@ static const char ixgbe_gstrings_test[][ETH_GSTRING_LEN] = { }; #define IXGBE_TEST_LEN sizeof(ixgbe_gstrings_test) / ETH_GSTRING_LEN -static int ixgbe_get_settings(struct net_device *netdev, - struct ethtool_cmd *ecmd) +int ixgbe_get_settings(struct net_device *netdev, + struct ethtool_cmd *ecmd) { struct ixgbe_adapter *adapter = netdev_priv(netdev); struct ixgbe_hw *hw = &adapter->hw; diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h new file mode 100644 index 0000000..2f36584 --- /dev/null +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h @@ -0,0 +1,54 @@ +/******************************************************************************* + + Intel 10 Gigabit PCI Express Linux driver + Copyright(c) 1999 - 2013 Intel Corporation. + + This program is free software; you can redistribute it and/or modify it + under the terms and conditions of the GNU General Public License, + version 2, as published by the Free Software Foundation. + + This program is distributed in the hope it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + more details. 
+ + You should have received a copy of the GNU General Public License along with + this program; if not, write to the Free Software Foundation, Inc., + 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. + + The full GNU General Public License is included in this distribution in + the file called "COPYING". + + Contact Information: + e1000-devel Mailing List <e1000-devel@lists.sourceforge.net> + Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497 + +*******************************************************************************/ +#include "ixgbe.h" + + +static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter, + u64 qmask) +{ + u32 mask; + struct ixgbe_hw *hw = &adapter->hw; + + switch (hw->mac.type) { + case ixgbe_mac_82598EB: + mask = (IXGBE_EIMS_RTX_QUEUE & qmask); + IXGBE_WRITE_REG(hw, IXGBE_EIMS, mask); + break; + case ixgbe_mac_82599EB: + case ixgbe_mac_X540: + mask = (qmask & 0xFFFFFFFF); + if (mask) + IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask); + mask = (qmask >> 32); + if (mask) + IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask); + break; + default: + break; + } + /* skip the flush */ +} diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c index 90b4e10..e2dd635 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c @@ -500,7 +500,8 @@ static bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter) #endif /* only proceed if SR-IOV is enabled */ - if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)) + if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) && + !(adapter->flags & IXGBE_FLAG_VMDQ_ENABLED)) return false; /* Add starting offset to total pool count */ @@ -852,7 +853,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter, /* apply Tx specific ring traits */ ring->count = adapter->tx_ring_count; - ring->queue_index = txr_idx; + if (adapter->num_rx_pools > 1) + ring->queue_index = + txr_idx % adapter->num_rx_queues_per_pool; + else + ring->queue_index = txr_idx; /* assign ring to adapter */ adapter->tx_ring[txr_idx] = ring; @@ -895,7 +900,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter, #endif /* IXGBE_FCOE */ /* apply Rx specific ring traits */ ring->count = adapter->rx_ring_count; - ring->queue_index = rxr_idx; + if (adapter->num_rx_pools > 1) + ring->queue_index = + rxr_idx % adapter->num_rx_queues_per_pool; + else + ring->queue_index = rxr_idx; /* assign ring to adapter */ adapter->rx_ring[rxr_idx] = ring; diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 0ade0cd..582d30b 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -52,6 +52,7 @@ #include "ixgbe_common.h" #include "ixgbe_dcb_82599.h" #include "ixgbe_sriov.h" +#include "ixgbe_l2a.h" char ixgbe_driver_name[] = "ixgbe"; static const char ixgbe_driver_string[] = @@ -118,6 +119,8 @@ static DEFINE_PCI_DEVICE_TABLE(ixgbe_pci_tbl) = { }; MODULE_DEVICE_TABLE(pci, ixgbe_pci_tbl); +netdev_tx_t ixgbe_fwd_xmit_frame(struct sk_buff *skb, void *priv); + #ifdef CONFIG_IXGBE_DCA static int ixgbe_notify_dca(struct notifier_block *, unsigned long event, void *p); @@ -872,7 +875,8 @@ static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring) static u64 ixgbe_get_tx_pending(struct ixgbe_ring *ring) { - struct ixgbe_adapter *adapter = netdev_priv(ring->netdev); + struct net_device *dev = ring->netdev; + struct ixgbe_adapter *adapter = 
netdev_priv(dev); struct ixgbe_hw *hw = &adapter->hw; u32 head = IXGBE_READ_REG(hw, IXGBE_TDH(ring->reg_idx)); @@ -1055,7 +1059,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector, tx_ring->next_to_use, i, tx_ring->tx_buffer_info[i].time_stamp, jiffies); - netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index); + netif_stop_subqueue(netdev_ring(tx_ring), tx_ring->queue_index); e_info(probe, "tx hang %d detected on queue %d, resetting adapter\n", @@ -1072,16 +1076,16 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector, total_packets, total_bytes); #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2) - if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) && + if (unlikely(total_packets && netif_carrier_ok(netdev_ring(tx_ring)) && (ixgbe_desc_unused(tx_ring) >= TX_WAKE_THRESHOLD))) { /* Make sure that anybody stopping the queue after this * sees the new next_to_clean. */ smp_mb(); - if (__netif_subqueue_stopped(tx_ring->netdev, + if (__netif_subqueue_stopped(netdev_ring(tx_ring), tx_ring->queue_index) && !test_bit(__IXGBE_DOWN, &adapter->state)) { - netif_wake_subqueue(tx_ring->netdev, + netif_wake_subqueue(netdev_ring(tx_ring), tx_ring->queue_index); ++tx_ring->tx_stats.restart_queue; } @@ -1226,7 +1230,7 @@ static inline void ixgbe_rx_hash(struct ixgbe_ring *ring, union ixgbe_adv_rx_desc *rx_desc, struct sk_buff *skb) { - if (ring->netdev->features & NETIF_F_RXHASH) + if (netdev_ring(ring)->features & NETIF_F_RXHASH) skb->rxhash = le32_to_cpu(rx_desc->wb.lower.hi_dword.rss); } @@ -1260,10 +1264,12 @@ static inline void ixgbe_rx_checksum(struct ixgbe_ring *ring, union ixgbe_adv_rx_desc *rx_desc, struct sk_buff *skb) { + struct net_device *dev = netdev_ring(ring); + skb_checksum_none_assert(skb); /* Rx csum disabled */ - if (!(ring->netdev->features & NETIF_F_RXCSUM)) + if (!(dev->features & NETIF_F_RXCSUM)) return; /* if IP and error */ @@ -1559,7 +1565,7 @@ static void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring, union ixgbe_adv_rx_desc *rx_desc, struct sk_buff *skb) { - struct net_device *dev = rx_ring->netdev; + struct net_device *dev = netdev_ring(rx_ring); ixgbe_update_rsc_stats(rx_ring, skb); @@ -1739,7 +1745,7 @@ static bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring, union ixgbe_adv_rx_desc *rx_desc, struct sk_buff *skb) { - struct net_device *netdev = rx_ring->netdev; + struct net_device *netdev = netdev_ring(rx_ring); /* verify that the packet does not have any known errors */ if (unlikely(ixgbe_test_staterr(rx_desc, @@ -1905,7 +1911,7 @@ static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring, #endif /* allocate a skb to store the frags */ - skb = netdev_alloc_skb_ip_align(rx_ring->netdev, + skb = netdev_alloc_skb_ip_align(netdev_ring(rx_ring), IXGBE_RX_HDR_SIZE); if (unlikely(!skb)) { rx_ring->rx_stats.alloc_rx_buff_failed++; @@ -1986,6 +1992,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector, struct ixgbe_adapter *adapter = q_vector->adapter; int ddp_bytes; unsigned int mss = 0; + struct net_device *netdev = netdev_ring(rx_ring); #endif /* IXGBE_FCOE */ u16 cleaned_count = ixgbe_desc_unused(rx_ring); @@ -1993,6 +2000,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector, union ixgbe_adv_rx_desc *rx_desc; struct sk_buff *skb; + if (rx_ring->l2_accel_priv) { + printk(KERN_CRIT "RECEIVING ON AN ACCELERATED QUEUE\n"); + } + /* return some buffers to hardware, one at a time is too slow */ if (cleaned_count >= IXGBE_RX_BUFFER_WRITE) { ixgbe_alloc_rx_buffers(rx_ring, cleaned_count); @@ 
-2041,7 +2052,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector, /* include DDPed FCoE data */ if (ddp_bytes > 0) { if (!mss) { - mss = rx_ring->netdev->mtu - + mss = netdev->mtu - sizeof(struct fcoe_hdr) - sizeof(struct fc_frame_header) - sizeof(struct fcoe_crc_eof); @@ -2455,58 +2466,6 @@ static void ixgbe_check_lsc(struct ixgbe_adapter *adapter) } } -static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter, - u64 qmask) -{ - u32 mask; - struct ixgbe_hw *hw = &adapter->hw; - - switch (hw->mac.type) { - case ixgbe_mac_82598EB: - mask = (IXGBE_EIMS_RTX_QUEUE & qmask); - IXGBE_WRITE_REG(hw, IXGBE_EIMS, mask); - break; - case ixgbe_mac_82599EB: - case ixgbe_mac_X540: - mask = (qmask & 0xFFFFFFFF); - if (mask) - IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask); - mask = (qmask >> 32); - if (mask) - IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask); - break; - default: - break; - } - /* skip the flush */ -} - -static inline void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter, - u64 qmask) -{ - u32 mask; - struct ixgbe_hw *hw = &adapter->hw; - - switch (hw->mac.type) { - case ixgbe_mac_82598EB: - mask = (IXGBE_EIMS_RTX_QUEUE & qmask); - IXGBE_WRITE_REG(hw, IXGBE_EIMC, mask); - break; - case ixgbe_mac_82599EB: - case ixgbe_mac_X540: - mask = (qmask & 0xFFFFFFFF); - if (mask) - IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(0), mask); - mask = (qmask >> 32); - if (mask) - IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(1), mask); - break; - default: - break; - } - /* skip the flush */ -} - /** * ixgbe_irq_enable - Enable default interrupt generation settings * @adapter: board private structure @@ -2946,6 +2905,7 @@ static void ixgbe_configure_msi_and_legacy(struct ixgbe_adapter *adapter) void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter, struct ixgbe_ring *ring) { + struct net_device *netdev = netdev_ring(ring); struct ixgbe_hw *hw = &adapter->hw; u64 tdba = ring->dma; int wait_loop = 10; @@ -3005,7 +2965,7 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter, struct ixgbe_q_vector *q_vector = ring->q_vector; if (q_vector) - netif_set_xps_queue(adapter->netdev, + netif_set_xps_queue(netdev, &q_vector->affinity_mask, ring->queue_index); } @@ -3395,7 +3355,7 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter) { struct ixgbe_hw *hw = &adapter->hw; int rss_i = adapter->ring_feature[RING_F_RSS].indices; - int p; + u16 pool; /* PSRTYPE must be initialized in non 82598 adapters */ u32 psrtype = IXGBE_PSRTYPE_TCPHDR | @@ -3412,9 +3372,8 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter) else if (rss_i > 1) psrtype |= 1 << 29; - for (p = 0; p < adapter->num_rx_pools; p++) - IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(p)), - psrtype); + for_each_set_bit(pool, &adapter->fwd_bitmask, 32) + IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(pool)), psrtype); } static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter) @@ -3683,6 +3642,8 @@ static void ixgbe_vlan_strip_disable(struct ixgbe_adapter *adapter) case ixgbe_mac_82599EB: case ixgbe_mac_X540: for (i = 0; i < adapter->num_rx_queues; i++) { + if (adapter->rx_ring[i]->vmdq_netdev) + continue; j = adapter->rx_ring[i]->reg_idx; vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j)); vlnctrl &= ~IXGBE_RXDCTL_VME; @@ -3713,6 +3674,8 @@ static void ixgbe_vlan_strip_enable(struct ixgbe_adapter *adapter) case ixgbe_mac_82599EB: case ixgbe_mac_X540: for (i = 0; i < adapter->num_rx_queues; i++) { + if (adapter->rx_ring[i]->vmdq_netdev) + continue; j = adapter->rx_ring[i]->reg_idx; vlnctrl = IXGBE_READ_REG(hw, 
IXGBE_RXDCTL(j)); vlnctrl |= IXGBE_RXDCTL_VME; @@ -3743,15 +3706,16 @@ static void ixgbe_restore_vlan(struct ixgbe_adapter *adapter) * 0 on no addresses written * X on writing X addresses to the RAR table **/ -static int ixgbe_write_uc_addr_list(struct net_device *netdev) +int ixgbe_write_uc_addr_list(struct net_device *netdev) { struct ixgbe_adapter *adapter = netdev_priv(netdev); struct ixgbe_hw *hw = &adapter->hw; unsigned int rar_entries = hw->mac.num_rar_entries - 1; int count = 0; - /* In SR-IOV mode significantly less RAR entries are available */ - if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) + /* In SR-IOV/VMDQ modes significantly less RAR entries are available */ + if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED || + adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) rar_entries = IXGBE_MAX_PF_MACVLANS - 1; /* return ENOMEM indicating insufficient memory for addresses */ @@ -3772,6 +3736,7 @@ static int ixgbe_write_uc_addr_list(struct net_device *netdev) count++; } } + /* write the addresses in reverse order to avoid write combining */ for (; rar_entries > 0 ; rar_entries--) hw->mac.ops.clear_rar(hw, rar_entries); @@ -4133,6 +4098,7 @@ static void ixgbe_configure(struct ixgbe_adapter *adapter) ixgbe_configure_virtualization(adapter); ixgbe_set_rx_mode(adapter->netdev); + ixgbe_restore_vlan(adapter); switch (hw->mac.type) { @@ -4459,7 +4425,7 @@ void ixgbe_reset(struct ixgbe_adapter *adapter) * ixgbe_clean_rx_ring - Free Rx Buffers per Queue * @rx_ring: ring to free buffers from **/ -static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring) +void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring) { struct device *dev = rx_ring->dev; unsigned long size; @@ -4838,6 +4804,8 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter) return -EIO; } + /* PF holds first pool slot */ + set_bit(0, &adapter->fwd_bitmask); set_bit(__IXGBE_DOWN, &adapter->state); return 0; @@ -5143,7 +5111,7 @@ static int ixgbe_change_mtu(struct net_device *netdev, int new_mtu) static int ixgbe_open(struct net_device *netdev) { struct ixgbe_adapter *adapter = netdev_priv(netdev); - int err; + int err, queues; /* disallow open during test */ if (test_bit(__IXGBE_TESTING, &adapter->state)) @@ -5168,16 +5136,22 @@ static int ixgbe_open(struct net_device *netdev) goto err_req_irq; /* Notify the stack of the actual queue counts. */ - err = netif_set_real_num_tx_queues(netdev, - adapter->num_rx_pools > 1 ? 1 : - adapter->num_tx_queues); + if (adapter->num_rx_pools > 1 && + adapter->num_tx_queues > IXGBE_MAX_L2A_QUEUES) + queues = IXGBE_MAX_L2A_QUEUES; + else + queues = adapter->num_tx_queues; + + err = netif_set_real_num_tx_queues(netdev, queues); if (err) goto err_set_queues; - - err = netif_set_real_num_rx_queues(netdev, - adapter->num_rx_pools > 1 ? 
1 : - adapter->num_rx_queues); + if (adapter->num_rx_pools > 1 && + adapter->num_rx_queues > IXGBE_MAX_L2A_QUEUES) + queues = IXGBE_MAX_L2A_QUEUES; + else + queues = adapter->num_rx_queues; + err = netif_set_real_num_rx_queues(netdev, queues); if (err) goto err_set_queues; @@ -5215,7 +5189,6 @@ static int ixgbe_close(struct net_device *netdev) struct ixgbe_adapter *adapter = netdev_priv(netdev); ixgbe_ptp_stop(adapter); - ixgbe_down(adapter); ixgbe_free_irq(adapter); @@ -6576,7 +6549,7 @@ static void ixgbe_atr(struct ixgbe_ring *ring, static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size) { - netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index); + netif_stop_subqueue(netdev_ring(tx_ring), tx_ring->queue_index); /* Herbert's original patch had: * smp_mb__after_netif_stop_queue(); * but since that doesn't exist yet, just open code it. */ @@ -6588,7 +6561,7 @@ static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size) return -EBUSY; /* A reprieve! - use start_queue because it doesn't call schedule */ - netif_start_subqueue(tx_ring->netdev, tx_ring->queue_index); + netif_start_subqueue(netdev_ring(tx_ring), tx_ring->queue_index); ++tx_ring->tx_stats.restart_queue; return 0; } @@ -6639,6 +6612,9 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb, struct ixgbe_ring *tx_ring) { struct ixgbe_tx_buffer *first; +#ifdef IXGBE_FCOE + struct net_device *dev; +#endif int tso; u32 tx_flags = 0; unsigned short f; @@ -6730,9 +6706,10 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb, first->protocol = protocol; #ifdef IXGBE_FCOE + dev = netdev_ring(tx_ring); /* setup tx offload for FCoE */ if ((protocol == __constant_htons(ETH_P_FCOE)) && - (tx_ring->netdev->features & (NETIF_F_FSO | NETIF_F_FCOE_CRC))) { + (dev->features & (NETIF_F_FSO | NETIF_F_FCOE_CRC))) { tso = ixgbe_fso(tx_ring, first, &hdr_len); if (tso < 0) goto out_drop; @@ -6767,8 +6744,9 @@ out_drop: return NETDEV_TX_OK; } -static netdev_tx_t ixgbe_xmit_frame(struct sk_buff *skb, - struct net_device *netdev) +static netdev_tx_t __ixgbe_xmit_frame(struct sk_buff *skb, + struct net_device *netdev, + struct ixgbe_ring *ring) { struct ixgbe_adapter *adapter = netdev_priv(netdev); struct ixgbe_ring *tx_ring; @@ -6784,10 +6762,17 @@ static netdev_tx_t ixgbe_xmit_frame(struct sk_buff *skb, skb_set_tail_pointer(skb, 17); } - tx_ring = adapter->tx_ring[skb->queue_mapping]; + tx_ring = ring ? 
ring : adapter->tx_ring[skb->queue_mapping]; + return ixgbe_xmit_frame_ring(skb, adapter, tx_ring); } +static netdev_tx_t ixgbe_xmit_frame(struct sk_buff *skb, + struct net_device*netdev) +{ + return __ixgbe_xmit_frame(skb, netdev, NULL); +} + /** * ixgbe_set_mac - Change the Ethernet Address of the NIC * @netdev: network interface device structure @@ -7057,6 +7042,7 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc) */ if (netif_running(dev)) ixgbe_close(dev); + ixgbe_clear_interrupt_scheme(adapter); #ifdef CONFIG_IXGBE_DCB @@ -7305,6 +7291,231 @@ static int ixgbe_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq, return ndo_dflt_bridge_getlink(skb, pid, seq, dev, mode); } +static void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter, u64 qmask) +{ + u32 mask; + struct ixgbe_hw *hw = &adapter->hw; + + switch (hw->mac.type) { + case ixgbe_mac_82598EB: + mask = (IXGBE_EIMS_RTX_QUEUE & qmask); + IXGBE_WRITE_REG(hw, IXGBE_EIMC, mask); + break; + case ixgbe_mac_82599EB: + case ixgbe_mac_X540: + mask = (qmask & 0xFFFFFFFF); + if (mask) + IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(0), mask); + mask = (qmask >> 32); + if (mask) + IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(1), mask); + break; + default: + break; + } +} + +static void ixgbe_add_mac_filter(struct ixgbe_adapter *adapter, + u8 *addr, u16 pool) +{ + struct ixgbe_hw *hw = &adapter->hw; + unsigned int entry; + + entry = hw->mac.num_rar_entries - pool; + hw->mac.ops.set_rar(hw, entry, addr, VMDQ_P(pool), IXGBE_RAH_AV); +} + +static void ixgbe_fwd_psrtype(struct ixgbe_fwd_adapter *vadapter) +{ + struct ixgbe_adapter *adapter = vadapter->real_adapter; + int rss_i = vadapter->netdev->real_num_rx_queues; + struct ixgbe_hw *hw = &adapter->hw; + u16 pool = vadapter->pool; + u32 psrtype = IXGBE_PSRTYPE_TCPHDR | + IXGBE_PSRTYPE_UDPHDR | + IXGBE_PSRTYPE_IPV4HDR | + IXGBE_PSRTYPE_L2HDR | + IXGBE_PSRTYPE_IPV6HDR; + + if (hw->mac.type == ixgbe_mac_82598EB) + return; + + if (rss_i > 3) + psrtype |= 2 << 29; + else if (rss_i > 1) + psrtype |= 1 << 29; + + IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(pool)), psrtype); +} + +static void ixgbe_enable_fwd_ring(struct ixgbe_adapter *adapter, + struct ixgbe_ring *rx_ring, + struct ixgbe_fwd_adapter *accel) +{ + rx_ring->l2_accel_priv = accel; + ixgbe_configure_rx_ring(adapter, rx_ring); +} + + +static void ixgbe_disable_fwd_ring(struct ixgbe_fwd_adapter *vadapter, + struct ixgbe_ring *rx_ring) +{ + struct ixgbe_adapter *adapter = vadapter->real_adapter; + int index = rx_ring->queue_index + vadapter->rx_base_queue; + + /* shutdown specific queue receive and wait for dma to settle */ + ixgbe_disable_rx_queue(adapter, rx_ring); + usleep_range(10000, 20000); + ixgbe_irq_disable_queues(adapter, ((u64)1 << index)); + ixgbe_clean_rx_ring(rx_ring); + rx_ring->l2_accel_priv = NULL; +} + +int ixgbe_fwd_ring_up(struct net_device *vdev, struct ixgbe_fwd_adapter *accel) +{ + struct ixgbe_adapter *adapter = accel->real_adapter; + unsigned int rxbase = accel->pool * adapter->num_rx_queues_per_pool; + unsigned int txbase = accel->pool * adapter->num_rx_queues_per_pool; + int err, i; + + + accel->rx_base_queue = rxbase; + accel->tx_base_queue = txbase; + + for (i = 0; i < vdev->num_rx_queues; i++) + ixgbe_disable_fwd_ring(accel, adapter->rx_ring[rxbase + i]); + + for (i = 0; i < vdev->num_rx_queues; i++) { + adapter->rx_ring[rxbase + i]->vmdq_netdev = vdev; + ixgbe_enable_fwd_ring(adapter, adapter->rx_ring[rxbase + i], accel); + } + + for (i = 0; i < vdev->num_tx_queues; i++) + adapter->tx_ring[txbase + i]->vmdq_netdev = vdev; + + 
if (is_valid_ether_addr(vdev->dev_addr)) + ixgbe_add_mac_filter(adapter, vdev->dev_addr, accel->pool); + + err = netif_set_real_num_tx_queues(vdev, vdev->num_tx_queues); + if (err) + goto err_set_queues; + err = netif_set_real_num_rx_queues(vdev, vdev->num_rx_queues); + if (err) + goto err_set_queues; + + ixgbe_fwd_psrtype(accel); + netif_tx_start_all_queues(vdev); + return 0; +err_set_queues: + for (i = 0; i < vdev->num_rx_queues; i++) + ixgbe_disable_fwd_ring(accel, adapter->rx_ring[rxbase + i]); + return err; +} + +int ixgbe_fwd_ring_down(struct net_device *vdev, struct ixgbe_fwd_adapter *accel) +{ + struct ixgbe_adapter *adapter = accel->real_adapter; + unsigned int rxbase = accel->rx_base_queue; + int i; + + netif_tx_stop_all_queues(vdev); + + for (i = 0; i < vdev->num_rx_queues; i++) + ixgbe_disable_fwd_ring(accel, adapter->rx_ring[rxbase + i]); + + return 0; +} + +static void* ixgbe_fwd_add(struct net_device *pdev, struct net_device *vdev) +{ + struct ixgbe_fwd_adapter *fwd_adapter = NULL; + struct ixgbe_adapter *adapter = netdev_priv(pdev); + int pool, vmdq_pool, base_queue; + int err; + + /* Check for hardware restriction on number of rx/tx queues */ + if (vdev->num_rx_queues != vdev->num_tx_queues || + vdev->num_tx_queues > IXGBE_MAX_L2A_QUEUES || + vdev->num_tx_queues == IXGBE_BAD_L2A_QUEUE) { + netdev_info(pdev, "%s: Supports RX/TX Queue counts 1,2, and 4\n", + pdev->name); + return ERR_PTR(-EINVAL); + } + + if (adapter->num_rx_pools > IXGBE_MAX_VMDQ_INDICES) + return ERR_PTR(-EBUSY); + + fwd_adapter = kcalloc(1, sizeof(struct ixgbe_fwd_adapter), GFP_KERNEL); + if (!fwd_adapter) + return ERR_PTR(-ENOMEM); + + pool = find_first_zero_bit(&adapter->fwd_bitmask, 32); + adapter->num_rx_pools++; + set_bit(pool, &adapter->fwd_bitmask); + + /* Enable VMDq flag so device will be set in VM mode */ + adapter->flags |= IXGBE_FLAG_VMDQ_ENABLED | IXGBE_FLAG_SRIOV_ENABLED; + adapter->ring_feature[RING_F_VMDQ].limit = adapter->num_rx_pools; + adapter->ring_feature[RING_F_VMDQ].offset = 0; + adapter->ring_feature[RING_F_RSS].limit = IXGBE_MAX_L2A_QUEUES; + + /* Force reinit of ring allocation with VMDQ enabled */ + ixgbe_setup_tc(pdev, netdev_get_num_tc(pdev)); + + /* Configure VSI adapter structure */ + vmdq_pool = VMDQ_P(pool); + base_queue = vmdq_pool * adapter->num_rx_queues_per_pool; + + netdev_dbg(pdev, "pool %i:%i queues %i:%i VSI bitmask %lx\n", + pool, adapter->num_rx_pools, + base_queue, base_queue + adapter->num_rx_queues_per_pool, + adapter->fwd_bitmask); + + fwd_adapter->pool = pool; + fwd_adapter->netdev = vdev; + fwd_adapter->real_adapter = adapter; + fwd_adapter->rx_base_queue = base_queue; + fwd_adapter->tx_base_queue = base_queue; + + err = ixgbe_fwd_ring_up(vdev, fwd_adapter); + if (err) { + kfree(fwd_adapter); + return ERR_PTR(err); + } + return fwd_adapter; +} + +static void ixgbe_fwd_del(struct net_device *pdev, void *priv) +{ + struct ixgbe_fwd_adapter *fwd_adapter = priv; + struct ixgbe_adapter *adapter = fwd_adapter->real_adapter; + + clear_bit(fwd_adapter->pool, &adapter->fwd_bitmask); + adapter->num_rx_pools--; + + ixgbe_fwd_ring_down(fwd_adapter->netdev, fwd_adapter); + + netdev_dbg(pdev, "pool %i:%i queues %i:%i VSI bitmask %lx\n", + fwd_adapter->pool, adapter->num_rx_pools, + fwd_adapter->rx_base_queue, + fwd_adapter->rx_base_queue + adapter->num_rx_queues_per_pool, + adapter->fwd_bitmask); +} + +static netdev_tx_t ixgbe_fwd_xmit(struct sk_buff *skb, + struct net_device *dev, + void *priv) +{ + struct ixgbe_fwd_adapter *fwd_adapter = priv; + unsigned int queue; + 
struct ixgbe_ring *tx_ring; + + queue = skb->queue_mapping + fwd_adapter->tx_base_queue; + tx_ring = fwd_adapter->real_adapter->tx_ring[queue]; + + return __ixgbe_xmit_frame(skb, dev, tx_ring); +} + static const struct net_device_ops ixgbe_netdev_ops = { .ndo_open = ixgbe_open, .ndo_stop = ixgbe_close, @@ -7349,6 +7560,9 @@ static const struct net_device_ops ixgbe_netdev_ops = { .ndo_fdb_add = ixgbe_ndo_fdb_add, .ndo_bridge_setlink = ixgbe_ndo_bridge_setlink, .ndo_bridge_getlink = ixgbe_ndo_bridge_getlink, + .ndo_dfwd_add_station = ixgbe_fwd_add, + .ndo_dfwd_del_station = ixgbe_fwd_del, + .ndo_dfwd_start_xmit = ixgbe_fwd_xmit, }; /** -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
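Both versions of the patch share the same pool-to-queue arithmetic: ixgbe_fwd_ring_up() derives the rx/tx base queue as pool * num_rx_queues_per_pool, ixgbe_alloc_q_vector() makes each ring's queue_index pool-relative, and the forwarded transmit path adds skb->queue_mapping to tx_base_queue to pick the PF's real ring. A standalone illustration with assumed numbers (pool 2, four queues per pool):

/* Worked example of the queue arithmetic above; the pool number and
 * queues-per-pool count are assumed values, not read from hardware.
 */
#include <stdio.h>

int main(void)
{
	unsigned int queues_per_pool = 4;  /* adapter->num_rx_queues_per_pool */
	unsigned int pool = 2;             /* bit claimed in adapter->fwd_bitmask */
	unsigned int base = pool * queues_per_pool;  /* tx/rx base queue = 8 */
	unsigned int q;

	/* skb->queue_mapping stays 0..3 relative to the macvlan because
	 * the ring's queue_index was made pool-relative; the xmit hook
	 * only has to add tx_base_queue to reach the right PF ring.
	 */
	for (q = 0; q < queues_per_pool; q++)
		printf("macvlan queue %u -> PF tx_ring[%u]\n", q, q + base);

	return 0;
}

So a frame sent on the macvlan's queue 1 lands on tx_ring[9], already inside the pool the hardware uses to switch on the macvlan's MAC address.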
* Re: [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans 2013-10-11 18:43 ` [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans Neil Horman @ 2013-10-13 20:48 ` John Fastabend 2013-10-14 10:50 ` Neil Horman 0 siblings, 1 reply; 7+ messages in thread From: John Fastabend @ 2013-10-13 20:48 UTC (permalink / raw) To: Neil Horman; +Cc: netdev, Andy Gospodarek, David Miller On 10/11/2013 11:43 AM, Neil Horman wrote: > Now that l2 acceleration ops are in place from the prior patch, enable ixgbe to > take advantage of these operations. Allow it to allocate queues for a macvlan > so that when we transmit a frame, we can do the switching in hardware inside the > ixgbe card, rather than in software. > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > CC: john.fastabend@gmail.com > CC: Andy Gospodarek <andy@greyhouse.net> > CC: "David S. Miller" <davem@davemloft.net> > --- Neil, I'm fairly sure I can simplify this patch further. I'll be able to take a shot at it Tuesday if you don't mind waiting. I can resubmit the series then and preserve your signed-off on at least the first patch. Seem reasonable? .John -- John Fastabend Intel Corporation ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans 2013-10-13 20:48 ` John Fastabend @ 2013-10-14 10:50 ` Neil Horman 0 siblings, 0 replies; 7+ messages in thread From: Neil Horman @ 2013-10-14 10:50 UTC (permalink / raw) To: John Fastabend; +Cc: netdev, Andy Gospodarek, David Miller On Sun, Oct 13, 2013 at 01:48:33PM -0700, John Fastabend wrote: > On 10/11/2013 11:43 AM, Neil Horman wrote: > >Now that l2 acceleration ops are in place from the prior patch, enable ixgbe to > >take advantage of these operations. Allow it to allocate queues for a macvlan > >so that when we transmit a frame, we can do the switching in hardware inside the > >ixgbe card, rather than in software. > > > >Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > >CC: john.fastabend@gmail.com > >CC: Andy Gospodarek <andy@greyhouse.net> > >CC: "David S. Miller" <davem@davemloft.net> > >--- > > Neil, I'm fairly sure I can simplify this patch further. I'll be able > to take a shot at it Tuesday if you don't mind waiting. > > I can resubmit the series then and preserve your signed-off on at least > the first patch. Seem reasonable? > That sounds fine. Thanks! Neil > .John > > -- > John Fastabend Intel Corporation > ^ permalink raw reply [flat|nested] 7+ messages in thread
Thread overview: 7+ messages -- 2013-11-04 17:15 [PATCH 0/2] l2 hardware accelerated macvlans John Fastabend 2013-11-04 17:15 ` [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices John Fastabend 2013-11-04 17:15 ` [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans John Fastabend -- strict thread matches above, loose matches on Subject: below -- 2013-09-25 20:16 [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration Neil Horman 2013-10-04 20:10 ` [RFC PATCH 0/2 v2] " Neil Horman 2013-10-04 20:10 ` [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans Neil Horman 2013-10-11 18:43 ` [RFC PATCH 0/2 v3] net: alternate proposal for using macvlans with forwarding acceleration Neil Horman 2013-10-11 18:43 ` [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans Neil Horman 2013-10-13 20:48 ` John Fastabend 2013-10-14 10:50 ` Neil Horman