Netdev List
 help / color / mirror / Atom feed
* [PATCH net-next iproute2] ip: increase number of MPLS labels
From: David Ahern @ 2017-04-30  3:48 UTC (permalink / raw)
  To: netdev, stephen; +Cc: David Ahern

Kernel now supports more than 2 labels. Increase ip to
handle up to 16 labels.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/utils.h | 8 ++++----
 lib/utils.c     | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/utils.h b/include/utils.h
index 8c12e1e2a60c..a69e176c260d 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -54,6 +54,9 @@ void incomplete_command(void) __attribute__((noreturn));
 #define NEXT_ARG_FWD() do { argv++; argc--; } while(0)
 #define PREV_ARG() do { argv--; argc++; } while(0)
 
+/* Maximum number of labels the mpls helpers support */
+#define MPLS_MAX_LABELS 16
+
 typedef struct
 {
 	__u16 flags;
@@ -61,7 +64,7 @@ typedef struct
 	__s16 bitlen;
 	/* These next two fields match rtvia */
 	__u16 family;
-	__u32 data[8];
+	__u32 data[MPLS_MAX_LABELS];
 } inet_prefix;
 
 #define PREFIXLEN_SPECIFIED 1
@@ -88,9 +91,6 @@ struct ipx_addr {
 # define AF_MPLS 28
 #endif
 
-/* Maximum number of labels the mpls helpers support */
-#define MPLS_MAX_LABELS 8
-
 __u32 get_addr32(const char *name);
 int get_addr_1(inet_prefix *dst, const char *arg, int family);
 int get_prefix_1(inet_prefix *dst, char *arg, int family);
diff --git a/lib/utils.c b/lib/utils.c
index 6d5642f4f1f3..c23251067180 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -526,7 +526,7 @@ int get_addr_1(inet_prefix *addr, const char *name, int family)
 		addr->bytelen = 4;
 		addr->bitlen = 20;
 		/* How many bytes do I need? */
-		for (i = 0; i < 8; i++) {
+		for (i = 0; i < MPLS_MAX_LABELS; i++) {
 			if (ntohl(addr->data[i]) & MPLS_LS_S_MASK) {
 				addr->bytelen = (i + 1)*4;
 				break;
-- 
2.1.4

^ permalink raw reply related

* Re: [net-next v2 00/11][pull request] 10GbE Intel Wired LAN Driver Updates 2017-04-29
From: David Miller @ 2017-04-30  3:18 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <20170430030810.56415-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Sat, 29 Apr 2017 20:07:59 -0700

> This series contains updates to ixgbe and ixgbevf only, most notable is
> the addition of XDP support to our 10GbE drivers.

Awesome, pulled, thanks Jeff.

^ permalink raw reply

* [net-next v2 10/11] ixgbevf: Fix errors in retrieving RETA and RSS from PF
From: Jeff Kirsher @ 2017-04-30  3:08 UTC (permalink / raw)
  To: davem; +Cc: Tony Nguyen, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170430030810.56415-1-jeffrey.t.kirsher@intel.com>

From: Tony Nguyen <anthony.l.nguyen@intel.com>

Mailbox support for getting RETA and RSS is available for only 82599 and
x540; a previous patch reversed the logic and these adapters were
returning not supported.

Also, the NACK check in ixgbevf_get_rss_key_locked() was checking for the
command IXGBE_VF_GET_RETA instead of IXGBE_VF_GET_RSS_KEY.

This patch corrects both issues by correcting the logic and checking for
the right command.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/vf.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.c b/drivers/net/ethernet/intel/ixgbevf/vf.c
index 8a5db9d7219d..b6d0c01eab10 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.c
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.c
@@ -333,7 +333,7 @@ int ixgbevf_get_reta_locked(struct ixgbe_hw *hw, u32 *reta, int num_rx_queues)
 	switch (hw->api_version) {
 	case ixgbe_mbox_api_13:
 	case ixgbe_mbox_api_12:
-		if (hw->mac.type >= ixgbe_mac_X550_vf)
+		if (hw->mac.type < ixgbe_mac_X550_vf)
 			break;
 	default:
 		return -EOPNOTSUPP;
@@ -399,7 +399,7 @@ int ixgbevf_get_rss_key_locked(struct ixgbe_hw *hw, u8 *rss_key)
 	switch (hw->api_version) {
 	case ixgbe_mbox_api_13:
 	case ixgbe_mbox_api_12:
-		if (hw->mac.type >= ixgbe_mac_X550_vf)
+		if (hw->mac.type < ixgbe_mac_X550_vf)
 			break;
 	default:
 		return -EOPNOTSUPP;
@@ -419,7 +419,7 @@ int ixgbevf_get_rss_key_locked(struct ixgbe_hw *hw, u8 *rss_key)
 	msgbuf[0] &= ~IXGBE_VT_MSGTYPE_CTS;
 
 	/* If the operation has been refused by a PF return -EPERM */
-	if (msgbuf[0] == (IXGBE_VF_GET_RETA | IXGBE_VT_MSGTYPE_NACK))
+	if (msgbuf[0] == (IXGBE_VF_GET_RSS_KEY | IXGBE_VT_MSGTYPE_NACK))
 		return -EPERM;
 
 	/* If we didn't get an ACK there must have been
-- 
2.12.2

^ permalink raw reply related

* [net-next v2 09/11] ixgbe: Check for RSS key before setting value
From: Jeff Kirsher @ 2017-04-30  3:08 UTC (permalink / raw)
  To: davem; +Cc: Tony Nguyen, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170430030810.56415-1-jeffrey.t.kirsher@intel.com>

From: Tony Nguyen <anthony.l.nguyen@intel.com>

The RSS key is being repopulated every time the interface is brought up
regardless of whether there is an existing value. If the user sets the RSS
key and the interface is brought up (e.g. reset), the user specified RSS
key will be overwritten.

This patch changes the rss_key to a pointer so we can check to see if the
key has been populated and preserve it accordingly.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |  2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |  4 +---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 30 ++++++++++++++++++++++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c   |  2 +-
 4 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 85b1afb345e3..76263762bea1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -769,7 +769,7 @@ struct ixgbe_adapter {
 	u8 rss_indir_tbl[IXGBE_MAX_RETA_ENTRIES];
 
 #define IXGBE_RSS_KEY_SIZE     40  /* size of RSS Hash Key in bytes */
-	u32 rss_key[IXGBE_RSS_KEY_SIZE / sizeof(u32)];
+	u32 *rss_key;
 };
 
 static inline u8 ixgbe_max_rss_indices(struct ixgbe_adapter *adapter)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index b0fd2f58a69c..7e5e336d7dcc 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -2967,9 +2967,7 @@ static int ixgbe_rss_indir_tbl_max(struct ixgbe_adapter *adapter)
 
 static u32 ixgbe_get_rxfh_key_size(struct net_device *netdev)
 {
-	struct ixgbe_adapter *adapter = netdev_priv(netdev);
-
-	return sizeof(adapter->rss_key);
+	return IXGBE_RSS_KEY_SIZE;
 }
 
 static u32 ixgbe_rss_indir_size(struct net_device *netdev)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index f765a2a0ed4b..22a29df1d29e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3639,6 +3639,28 @@ void ixgbe_store_key(struct ixgbe_adapter *adapter)
 }
 
 /**
+ * ixgbe_init_rss_key - Initialize adapter RSS key
+ * @adapter: device handle
+ *
+ * Allocates and initializes the RSS key if it is not allocated.
+ **/
+static inline int ixgbe_init_rss_key(struct ixgbe_adapter *adapter)
+{
+	u32 *rss_key;
+
+	if (!adapter->rss_key) {
+		rss_key = kzalloc(IXGBE_RSS_KEY_SIZE, GFP_KERNEL);
+		if (unlikely(!rss_key))
+			return -ENOMEM;
+
+		netdev_rss_key_fill(rss_key, IXGBE_RSS_KEY_SIZE);
+		adapter->rss_key = rss_key;
+	}
+
+	return 0;
+}
+
+/**
  * ixgbe_store_reta - Write the RETA table to HW
  * @adapter: device handle
  *
@@ -3740,7 +3762,7 @@ static void ixgbe_setup_vfreta(struct ixgbe_adapter *adapter)
 	/* Fill out hash function seeds */
 	for (i = 0; i < 10; i++)
 		IXGBE_WRITE_REG(hw, IXGBE_PFVFRSSRK(i, pf_pool),
-				adapter->rss_key[i]);
+				*(adapter->rss_key + i));
 
 	/* Fill out the redirection table */
 	for (i = 0, j = 0; i < 64; i++, j++) {
@@ -3801,7 +3823,6 @@ static void ixgbe_setup_mrqc(struct ixgbe_adapter *adapter)
 	if (adapter->flags2 & IXGBE_FLAG2_RSS_FIELD_IPV6_UDP)
 		rss_field |= IXGBE_MRQC_RSS_FIELD_IPV6_UDP;
 
-	netdev_rss_key_fill(adapter->rss_key, sizeof(adapter->rss_key));
 	if ((hw->mac.type >= ixgbe_mac_X550) &&
 	    (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)) {
 		unsigned int pf_pool = adapter->num_vfs;
@@ -6015,6 +6036,9 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter,
 	if (!adapter->mac_table)
 		return -ENOMEM;
 
+	if (ixgbe_init_rss_key(adapter))
+		return -ENOMEM;
+
 	/* Set MAC specific capability flags and exceptions */
 	switch (hw->mac.type) {
 	case ixgbe_mac_82598EB:
@@ -10391,6 +10415,7 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	iounmap(adapter->io_addr);
 	kfree(adapter->jump_tables[0]);
 	kfree(adapter->mac_table);
+	kfree(adapter->rss_key);
 err_ioremap:
 	disable_dev = !test_and_set_bit(__IXGBE_DISABLED, &adapter->state);
 	free_netdev(netdev);
@@ -10475,6 +10500,7 @@ static void ixgbe_remove(struct pci_dev *pdev)
 	}
 
 	kfree(adapter->mac_table);
+	kfree(adapter->rss_key);
 	disable_dev = !test_and_set_bit(__IXGBE_DISABLED, &adapter->state);
 	free_netdev(netdev);
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 58897d97412e..8baf298a8516 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -1113,7 +1113,7 @@ static int ixgbe_get_vf_rss_key(struct ixgbe_adapter *adapter,
 		return -EOPNOTSUPP;
 	}
 
-	memcpy(rss_key, adapter->rss_key, sizeof(adapter->rss_key));
+	memcpy(rss_key, adapter->rss_key, IXGBE_RSS_KEY_SIZE);
 
 	return 0;
 }
-- 
2.12.2

^ permalink raw reply related

* [net-next v2 11/11] ixgbevf: Check for RSS key before setting value
From: Jeff Kirsher @ 2017-04-30  3:08 UTC (permalink / raw)
  To: davem; +Cc: Tony Nguyen, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170430030810.56415-1-jeffrey.t.kirsher@intel.com>

From: Tony Nguyen <anthony.l.nguyen@intel.com>

The RSS key is being repopulated every time the interface is brought up
regardless of whether there is an existing value. If the user sets the RSS
key and the interface is brought up (e.g. reset), the user specified RSS
key will be overwritten.

This patch changes the rss_key to a pointer so we can check to see if the
key has been populated and preserve it accordingly.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ethtool.c      |  3 ++-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |  2 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 33 +++++++++++++++++++++--
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
index 43b70cd55bc6..ff9d05f308ee 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
@@ -855,7 +855,8 @@ static int ixgbevf_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key,
 
 	if (adapter->hw.mac.type >= ixgbe_mac_X550_vf) {
 		if (key)
-			memcpy(key, adapter->rss_key, sizeof(adapter->rss_key));
+			memcpy(key, adapter->rss_key,
+			       ixgbevf_get_rxfh_key_size(netdev));
 
 		if (indir) {
 			int i;
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index a8cbc2dda0dd..581f44bbd7b3 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -319,7 +319,7 @@ struct ixgbevf_adapter {
 	spinlock_t mbx_lock;
 	unsigned long last_reset;
 
-	u32 rss_key[IXGBEVF_VFRSSRK_REGS];
+	u32 *rss_key;
 	u8 rss_indir_tbl[IXGBEVF_X550_VFRETA_SIZE];
 };
 
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 80bab261a0ec..eee29bddddc1 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1660,6 +1660,28 @@ static void ixgbevf_rx_desc_queue_enable(struct ixgbevf_adapter *adapter,
 		       reg_idx);
 }
 
+/**
+ * ixgbevf_init_rss_key - Initialize adapter RSS key
+ * @adapter: device handle
+ *
+ * Allocates and initializes the RSS key if it is not allocated.
+ **/
+static inline int ixgbevf_init_rss_key(struct ixgbevf_adapter *adapter)
+{
+	u32 *rss_key;
+
+	if (!adapter->rss_key) {
+		rss_key = kzalloc(IXGBEVF_RSS_HASH_KEY_SIZE, GFP_KERNEL);
+		if (unlikely(!rss_key))
+			return -ENOMEM;
+
+		netdev_rss_key_fill(rss_key, IXGBEVF_RSS_HASH_KEY_SIZE);
+		adapter->rss_key = rss_key;
+	}
+
+	return 0;
+}
+
 static void ixgbevf_setup_vfmrqc(struct ixgbevf_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
@@ -1668,9 +1690,8 @@ static void ixgbevf_setup_vfmrqc(struct ixgbevf_adapter *adapter)
 	u8 i, j;
 
 	/* Fill out hash function seeds */
-	netdev_rss_key_fill(adapter->rss_key, sizeof(adapter->rss_key));
 	for (i = 0; i < IXGBEVF_VFRSSRK_REGS; i++)
-		IXGBE_WRITE_REG(hw, IXGBE_VFRSSRK(i), adapter->rss_key[i]);
+		IXGBE_WRITE_REG(hw, IXGBE_VFRSSRK(i), *(adapter->rss_key + i));
 
 	for (i = 0, j = 0; i < IXGBEVF_X550_VFRETA_SIZE; i++, j++) {
 		if (j == rss_i)
@@ -2611,6 +2632,12 @@ static int ixgbevf_sw_init(struct ixgbevf_adapter *adapter)
 
 	hw->mbx.ops.init_params(hw);
 
+	if (hw->mac.type >= ixgbe_mac_X550_vf) {
+		err = ixgbevf_init_rss_key(adapter);
+		if (err)
+			goto out;
+	}
+
 	/* assume legacy case in which PF would only give VF 2 queues */
 	hw->mac.max_tx_queues = 2;
 	hw->mac.max_rx_queues = 2;
@@ -4127,6 +4154,7 @@ static int ixgbevf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 err_sw_init:
 	ixgbevf_reset_interrupt_capability(adapter);
 	iounmap(adapter->io_addr);
+	kfree(adapter->rss_key);
 err_ioremap:
 	disable_dev = !test_and_set_bit(__IXGBEVF_DISABLED, &adapter->state);
 	free_netdev(netdev);
@@ -4173,6 +4201,7 @@ static void ixgbevf_remove(struct pci_dev *pdev)
 
 	hw_dbg(&adapter->hw, "Remove complete\n");
 
+	kfree(adapter->rss_key);
 	disable_dev = !test_and_set_bit(__IXGBEVF_DISABLED, &adapter->state);
 	free_netdev(netdev);
 
-- 
2.12.2

^ permalink raw reply related

* [net-next v2 08/11] ixgbe: Add 1000Base-T device based on X550EM_X MAC
From: Jeff Kirsher @ 2017-04-30  3:08 UTC (permalink / raw)
  To: davem; +Cc: Paul Greenwalt, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170430030810.56415-1-jeffrey.t.kirsher@intel.com>

From: Paul Greenwalt <paul.greenwalt@intel.com>

Add support for new 1000Base-T device based on X550EM_X MAC
type. All PHY operations are disabled as the PHY is controlled
by FW.

Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |  2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_type.h |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 45 ++++++++++++++++++++++++++-
 4 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index e8cd4491f1fd..85b1afb345e3 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -826,6 +826,7 @@ enum ixgbe_boards {
 	board_X540,
 	board_X550,
 	board_X550EM_x,
+	board_x550em_x_fw,
 	board_x550em_a,
 	board_x550em_a_fw,
 };
@@ -835,6 +836,7 @@ extern const struct ixgbe_info ixgbe_82599_info;
 extern const struct ixgbe_info ixgbe_X540_info;
 extern const struct ixgbe_info ixgbe_X550_info;
 extern const struct ixgbe_info ixgbe_X550EM_x_info;
+extern const struct ixgbe_info ixgbe_x550em_x_fw_info;
 extern const struct ixgbe_info ixgbe_x550em_a_info;
 extern const struct ixgbe_info ixgbe_x550em_a_fw_info;
 #ifdef CONFIG_IXGBE_DCB
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 3d7b09100945..f765a2a0ed4b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -88,6 +88,7 @@ static const struct ixgbe_info *ixgbe_info_tbl[] = {
 	[board_X540]		= &ixgbe_X540_info,
 	[board_X550]		= &ixgbe_X550_info,
 	[board_X550EM_x]	= &ixgbe_X550EM_x_info,
+	[board_x550em_x_fw]	= &ixgbe_x550em_x_fw_info,
 	[board_x550em_a]	= &ixgbe_x550em_a_info,
 	[board_x550em_a_fw]	= &ixgbe_x550em_a_fw_info,
 };
@@ -138,6 +139,7 @@ static const struct pci_device_id ixgbe_pci_tbl[] = {
 	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_X550EM_X_KR), board_X550EM_x},
 	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_X550EM_X_10G_T), board_X550EM_x},
 	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_X550EM_X_SFP), board_X550EM_x},
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_X550EM_X_1G_T), board_x550em_x_fw},
 	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_X550EM_A_KR), board_x550em_a },
 	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_X550EM_A_KR_L), board_x550em_a },
 	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_X550EM_A_SFP_N), board_x550em_a },
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
index 2f06e4d9208d..9c2460c5ef1b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
@@ -3128,6 +3128,7 @@ enum ixgbe_phy_type {
 	ixgbe_phy_x550em_kx4,
 	ixgbe_phy_x550em_xfi,
 	ixgbe_phy_x550em_ext_t,
+	ixgbe_phy_ext_1g_t,
 	ixgbe_phy_cu_unknown,
 	ixgbe_phy_qt,
 	ixgbe_phy_xaui,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
index 58d3bcaca2b9..2ba024b575ea 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
@@ -49,6 +49,18 @@ static s32 ixgbe_get_invariants_X550_x(struct ixgbe_hw *hw)
 	return 0;
 }
 
+static s32 ixgbe_get_invariants_X550_x_fw(struct ixgbe_hw *hw)
+{
+	struct ixgbe_phy_info *phy = &hw->phy;
+
+	/* Start with X540 invariants, since so similar */
+	ixgbe_get_invariants_X540(hw);
+
+	phy->ops.set_phy_power = NULL;
+
+	return 0;
+}
+
 static s32 ixgbe_get_invariants_X550_a(struct ixgbe_hw *hw)
 {
 	struct ixgbe_mac_info *mac = &hw->mac;
@@ -334,9 +346,11 @@ static s32 ixgbe_identify_phy_x550em(struct ixgbe_hw *hw)
 		else
 			hw->phy.phy_semaphore_mask = IXGBE_GSSR_PHY0_SM;
 		/* Fallthrough */
-	case IXGBE_DEV_ID_X550EM_X_1G_T:
 	case IXGBE_DEV_ID_X550EM_X_10G_T:
 		return ixgbe_identify_phy_generic(hw);
+	case IXGBE_DEV_ID_X550EM_X_1G_T:
+		hw->phy.type = ixgbe_phy_ext_1g_t;
+		break;
 	case IXGBE_DEV_ID_X550EM_A_1G_T:
 	case IXGBE_DEV_ID_X550EM_A_1G_T_L:
 		hw->phy.type = ixgbe_phy_fw;
@@ -2158,6 +2172,8 @@ static void ixgbe_init_mac_link_ops_X550em(struct ixgbe_hw *hw)
 					ixgbe_set_soft_rate_select_speed;
 		break;
 	case ixgbe_media_type_copper:
+		if (hw->device_id == IXGBE_DEV_ID_X550EM_X_1G_T)
+			break;
 		mac->ops.setup_link = ixgbe_setup_mac_link_t_X550em;
 		mac->ops.setup_fc = ixgbe_setup_fc_generic;
 		mac->ops.check_link = ixgbe_check_link_t_X550em;
@@ -2238,6 +2254,7 @@ static s32 ixgbe_get_link_capabilities_X550em(struct ixgbe_hw *hw,
 			*speed = IXGBE_LINK_SPEED_1GB_FULL |
 				 IXGBE_LINK_SPEED_10GB_FULL;
 			break;
+		case ixgbe_phy_ext_1g_t:
 		case ixgbe_phy_sgmii:
 			*speed = IXGBE_LINK_SPEED_1GB_FULL;
 			break;
@@ -3185,6 +3202,11 @@ static s32 ixgbe_init_phy_ops_X550em(struct ixgbe_hw *hw)
 		phy->ops.setup_link = ixgbe_setup_fw_link;
 		phy->ops.reset = ixgbe_reset_phy_fw;
 		break;
+	case ixgbe_phy_ext_1g_t:
+		phy->ops.setup_link = NULL;
+		phy->ops.read_reg = NULL;
+		phy->ops.write_reg = NULL;
+		break;
 	default:
 		break;
 	}
@@ -3888,6 +3910,17 @@ static const struct ixgbe_phy_operations phy_ops_X550EM_x = {
 	.write_reg		= &ixgbe_write_phy_reg_generic,
 };
 
+static const struct ixgbe_phy_operations phy_ops_x550em_x_fw = {
+	X550_COMMON_PHY
+	.check_overtemp		= NULL,
+	.init			= ixgbe_init_phy_ops_X550em,
+	.identify		= ixgbe_identify_phy_x550em,
+	.read_reg		= NULL,
+	.write_reg		= NULL,
+	.read_reg_mdi		= NULL,
+	.write_reg_mdi		= NULL,
+};
+
 static const struct ixgbe_phy_operations phy_ops_x550em_a = {
 	X550_COMMON_PHY
 	.check_overtemp		= &ixgbe_tn_check_overtemp,
@@ -3950,6 +3983,16 @@ const struct ixgbe_info ixgbe_X550EM_x_info = {
 	.link_ops		= &link_ops_x550em_x,
 };
 
+const struct ixgbe_info ixgbe_x550em_x_fw_info = {
+	.mac			= ixgbe_mac_X550EM_x,
+	.get_invariants		= ixgbe_get_invariants_X550_x_fw,
+	.mac_ops		= &mac_ops_X550EM_x,
+	.eeprom_ops		= &eeprom_ops_X550EM_x,
+	.phy_ops		= &phy_ops_x550em_x_fw,
+	.mbx_ops		= &mbx_ops_generic,
+	.mvals			= ixgbe_mvals_X550EM_x,
+};
+
 const struct ixgbe_info ixgbe_x550em_a_info = {
 	.mac			= ixgbe_mac_x550em_a,
 	.get_invariants		= &ixgbe_get_invariants_X550_a,
-- 
2.12.2

^ permalink raw reply related

* [net-next v2 07/11] ixgbe: Allow setting zero MAC address for VF
From: Jeff Kirsher @ 2017-04-30  3:08 UTC (permalink / raw)
  To: davem; +Cc: Tony Nguyen, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170430030810.56415-1-jeffrey.t.kirsher@intel.com>

From: Tony Nguyen <anthony.l.nguyen@intel.com>

Currently, there is no logic that allows a VF's MAC address to be removed
from the RAR table.

Allow the user to specify a zero MAC address in order to clear the VF's
MAC address from the RAR table.  This functionality is also utilized by
libvirt when removing VFs.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 28 +++++++++++++++++---------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index c45de53300aa..58897d97412e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -1347,18 +1347,26 @@ void ixgbe_ping_all_vfs(struct ixgbe_adapter *adapter)
 int ixgbe_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
-	if (!is_valid_ether_addr(mac) || (vf >= adapter->num_vfs))
+
+	if (vf >= adapter->num_vfs)
+		return -EINVAL;
+
+	if (is_zero_ether_addr(mac)) {
+		adapter->vfinfo[vf].pf_set_mac = false;
+		dev_info(&adapter->pdev->dev, "removing MAC on VF %d\n", vf);
+	} else if (is_valid_ether_addr(mac)) {
+		adapter->vfinfo[vf].pf_set_mac = true;
+		dev_info(&adapter->pdev->dev, "setting MAC %pM on VF %d\n",
+			 mac, vf);
+		dev_info(&adapter->pdev->dev, "Reload the VF driver to make this change effective.");
+		if (test_bit(__IXGBE_DOWN, &adapter->state)) {
+			dev_warn(&adapter->pdev->dev, "The VF MAC address has been set, but the PF device is not up.\n");
+			dev_warn(&adapter->pdev->dev, "Bring the PF device up before attempting to use the VF device.\n");
+		}
+	} else {
 		return -EINVAL;
-	adapter->vfinfo[vf].pf_set_mac = true;
-	dev_info(&adapter->pdev->dev, "setting MAC %pM on VF %d\n", mac, vf);
-	dev_info(&adapter->pdev->dev, "Reload the VF driver to make this"
-				      " change effective.");
-	if (test_bit(__IXGBE_DOWN, &adapter->state)) {
-		dev_warn(&adapter->pdev->dev, "The VF MAC address has been set,"
-			 " but the PF device is not up.\n");
-		dev_warn(&adapter->pdev->dev, "Bring the PF device up before"
-			 " attempting to use the VF device.\n");
 	}
+
 	return ixgbe_set_vf_mac(adapter, vf, mac);
 }
 
-- 
2.12.2

^ permalink raw reply related

* [net-next v2 06/11] ixgbevf: fix size of queue stats length
From: Jeff Kirsher @ 2017-04-30  3:08 UTC (permalink / raw)
  To: davem; +Cc: Emil Tantilov, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170430030810.56415-1-jeffrey.t.kirsher@intel.com>

From: Emil Tantilov <emil.s.tantilov@intel.com>

IXGBEVF_QUEUE_STATS_LEN is based on ixgebvf_stats, not ixgbe_stats.

This change fixes a bug where ethtool -S displayed some empty fields.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
index 6bf740945260..43b70cd55bc6 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
@@ -80,7 +80,7 @@ static struct ixgbe_stats ixgbevf_gstrings_stats[] = {
 #define IXGBEVF_QUEUE_STATS_LEN ( \
 	(((struct ixgbevf_adapter *)netdev_priv(netdev))->num_tx_queues + \
 	 ((struct ixgbevf_adapter *)netdev_priv(netdev))->num_rx_queues) * \
-	 (sizeof(struct ixgbe_stats) / sizeof(u64)))
+	 (sizeof(struct ixgbevf_stats) / sizeof(u64)))
 #define IXGBEVF_GLOBAL_STATS_LEN ARRAY_SIZE(ixgbevf_gstrings_stats)
 
 #define IXGBEVF_STATS_LEN (IXGBEVF_GLOBAL_STATS_LEN + IXGBEVF_QUEUE_STATS_LEN)
-- 
2.12.2

^ permalink raw reply related

* [net-next v2 05/11] ixgbe: clean macvlan MAC filter table on VF reset
From: Jeff Kirsher @ 2017-04-30  3:08 UTC (permalink / raw)
  To: davem; +Cc: Emil Tantilov, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170430030810.56415-1-jeffrey.t.kirsher@intel.com>

From: Emil Tantilov <emil.s.tantilov@intel.com>

Flush the macvlan filters on VF reset to avoid conflict with other VFs that
may end up using the same MAC address.

The main change here is the call to ixgbe_set_vf_macvlan() with index 0.

Moved ixgbe_set_vf_macvlan() in front of ixgbe_vf_reset_event() to avoid
adding a prototype.

Reported-by: Sritej Kanakadandi Sritej Rama <skanakad@cisco.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 105 +++++++++++++------------
 1 file changed, 53 insertions(+), 52 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 102ca937ddb4..c45de53300aa 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -677,58 +677,6 @@ static void ixgbe_clear_vf_vlans(struct ixgbe_adapter *adapter, u32 vf)
 	}
 }
 
-static inline void ixgbe_vf_reset_event(struct ixgbe_adapter *adapter, u32 vf)
-{
-	struct ixgbe_hw *hw = &adapter->hw;
-	struct vf_data_storage *vfinfo = &adapter->vfinfo[vf];
-	u8 num_tcs = netdev_get_num_tc(adapter->netdev);
-
-	/* remove VLAN filters beloning to this VF */
-	ixgbe_clear_vf_vlans(adapter, vf);
-
-	/* add back PF assigned VLAN or VLAN 0 */
-	ixgbe_set_vf_vlan(adapter, true, vfinfo->pf_vlan, vf);
-
-	/* reset offloads to defaults */
-	ixgbe_set_vmolr(hw, vf, !vfinfo->pf_vlan);
-
-	/* set outgoing tags for VFs */
-	if (!vfinfo->pf_vlan && !vfinfo->pf_qos && !num_tcs) {
-		ixgbe_clear_vmvir(adapter, vf);
-	} else {
-		if (vfinfo->pf_qos || !num_tcs)
-			ixgbe_set_vmvir(adapter, vfinfo->pf_vlan,
-					vfinfo->pf_qos, vf);
-		else
-			ixgbe_set_vmvir(adapter, vfinfo->pf_vlan,
-					adapter->default_up, vf);
-
-		if (vfinfo->spoofchk_enabled)
-			hw->mac.ops.set_vlan_anti_spoofing(hw, true, vf);
-	}
-
-	/* reset multicast table array for vf */
-	adapter->vfinfo[vf].num_vf_mc_hashes = 0;
-
-	/* Flush and reset the mta with the new values */
-	ixgbe_set_rx_mode(adapter->netdev);
-
-	ixgbe_del_mac_filter(adapter, adapter->vfinfo[vf].vf_mac_addresses, vf);
-
-	/* reset VF api back to unknown */
-	adapter->vfinfo[vf].vf_api = ixgbe_mbox_api_10;
-}
-
-static int ixgbe_set_vf_mac(struct ixgbe_adapter *adapter,
-			    int vf, unsigned char *mac_addr)
-{
-	ixgbe_del_mac_filter(adapter, adapter->vfinfo[vf].vf_mac_addresses, vf);
-	memcpy(adapter->vfinfo[vf].vf_mac_addresses, mac_addr, ETH_ALEN);
-	ixgbe_add_mac_filter(adapter, adapter->vfinfo[vf].vf_mac_addresses, vf);
-
-	return 0;
-}
-
 static int ixgbe_set_vf_macvlan(struct ixgbe_adapter *adapter,
 				int vf, int index, unsigned char *mac_addr)
 {
@@ -784,6 +732,59 @@ static int ixgbe_set_vf_macvlan(struct ixgbe_adapter *adapter,
 	return 0;
 }
 
+static inline void ixgbe_vf_reset_event(struct ixgbe_adapter *adapter, u32 vf)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo = &adapter->vfinfo[vf];
+	u8 num_tcs = netdev_get_num_tc(adapter->netdev);
+
+	/* remove VLAN filters beloning to this VF */
+	ixgbe_clear_vf_vlans(adapter, vf);
+
+	/* add back PF assigned VLAN or VLAN 0 */
+	ixgbe_set_vf_vlan(adapter, true, vfinfo->pf_vlan, vf);
+
+	/* reset offloads to defaults */
+	ixgbe_set_vmolr(hw, vf, !vfinfo->pf_vlan);
+
+	/* set outgoing tags for VFs */
+	if (!vfinfo->pf_vlan && !vfinfo->pf_qos && !num_tcs) {
+		ixgbe_clear_vmvir(adapter, vf);
+	} else {
+		if (vfinfo->pf_qos || !num_tcs)
+			ixgbe_set_vmvir(adapter, vfinfo->pf_vlan,
+					vfinfo->pf_qos, vf);
+		else
+			ixgbe_set_vmvir(adapter, vfinfo->pf_vlan,
+					adapter->default_up, vf);
+
+		if (vfinfo->spoofchk_enabled)
+			hw->mac.ops.set_vlan_anti_spoofing(hw, true, vf);
+	}
+
+	/* reset multicast table array for vf */
+	adapter->vfinfo[vf].num_vf_mc_hashes = 0;
+
+	/* Flush and reset the mta with the new values */
+	ixgbe_set_rx_mode(adapter->netdev);
+
+	ixgbe_del_mac_filter(adapter, adapter->vfinfo[vf].vf_mac_addresses, vf);
+	ixgbe_set_vf_macvlan(adapter, vf, 0, NULL);
+
+	/* reset VF api back to unknown */
+	adapter->vfinfo[vf].vf_api = ixgbe_mbox_api_10;
+}
+
+static int ixgbe_set_vf_mac(struct ixgbe_adapter *adapter,
+			    int vf, unsigned char *mac_addr)
+{
+	ixgbe_del_mac_filter(adapter, adapter->vfinfo[vf].vf_mac_addresses, vf);
+	memcpy(adapter->vfinfo[vf].vf_mac_addresses, mac_addr, ETH_ALEN);
+	ixgbe_add_mac_filter(adapter, adapter->vfinfo[vf].vf_mac_addresses, vf);
+
+	return 0;
+}
+
 int ixgbe_vf_configuration(struct pci_dev *pdev, unsigned int event_mask)
 {
 	struct ixgbe_adapter *adapter = pci_get_drvdata(pdev);
-- 
2.12.2

^ permalink raw reply related

* [net-next v2 03/11] ixgbe: add support for XDP_TX action
From: Jeff Kirsher @ 2017-04-30  3:08 UTC (permalink / raw)
  To: davem; +Cc: John Fastabend, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170430030810.56415-1-jeffrey.t.kirsher@intel.com>

From: John Fastabend <john.r.fastabend@intel.com>

A couple design choices were made here. First I use a new ring
pointer structure xdp_ring[] in the adapter struct instead of
pushing the newly allocated XDP TX rings into the tx_ring[]
structure. This means we have to duplicate loops around rings
in places we want to initialize both TX rings and XDP rings.
But by making it explicit it is obvious when we are using XDP
rings and when we are using TX rings. Further we don't have
to do ring arithmatic which is error prone. As a proof point
for doing this my first patches used only a single ring structure
and introduced bugs in FCoE code and macvlan code paths.

Second I am aware this is not the most optimized version of
this code possible. I want to get baseline support in using
the most readable format possible and then once this series
is included I will optimize the TX path in another series
of patches.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |  19 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |  25 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c     |  75 +++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 282 +++++++++++++++++++----
 4 files changed, 348 insertions(+), 53 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index cb14813b0080..e8cd4491f1fd 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -235,7 +235,11 @@ struct vf_macvlans {
 struct ixgbe_tx_buffer {
 	union ixgbe_adv_tx_desc *next_to_watch;
 	unsigned long time_stamp;
-	struct sk_buff *skb;
+	union {
+		struct sk_buff *skb;
+		/* XDP uses address ptr on irq_clean */
+		void *data;
+	};
 	unsigned int bytecount;
 	unsigned short gso_segs;
 	__be16 protocol;
@@ -288,6 +292,7 @@ enum ixgbe_ring_state_t {
 	__IXGBE_TX_XPS_INIT_DONE,
 	__IXGBE_TX_DETECT_HANG,
 	__IXGBE_HANG_CHECK_ARMED,
+	__IXGBE_TX_XDP_RING,
 };
 
 #define ring_uses_build_skb(ring) \
@@ -314,6 +319,12 @@ struct ixgbe_fwd_adapter {
 	set_bit(__IXGBE_RX_RSC_ENABLED, &(ring)->state)
 #define clear_ring_rsc_enabled(ring) \
 	clear_bit(__IXGBE_RX_RSC_ENABLED, &(ring)->state)
+#define ring_is_xdp(ring) \
+	test_bit(__IXGBE_TX_XDP_RING, &(ring)->state)
+#define set_ring_xdp(ring) \
+	set_bit(__IXGBE_TX_XDP_RING, &(ring)->state)
+#define clear_ring_xdp(ring) \
+	clear_bit(__IXGBE_TX_XDP_RING, &(ring)->state)
 struct ixgbe_ring {
 	struct ixgbe_ring *next;	/* pointer to next ring in q_vector */
 	struct ixgbe_q_vector *q_vector; /* backpointer to host q_vector */
@@ -380,6 +391,7 @@ enum ixgbe_ring_f_enum {
 #define IXGBE_MAX_FCOE_INDICES		8
 #define MAX_RX_QUEUES			(IXGBE_MAX_FDIR_INDICES + 1)
 #define MAX_TX_QUEUES			(IXGBE_MAX_FDIR_INDICES + 1)
+#define MAX_XDP_QUEUES			(IXGBE_MAX_FDIR_INDICES + 1)
 #define IXGBE_MAX_L2A_QUEUES		4
 #define IXGBE_BAD_L2A_QUEUE		3
 #define IXGBE_MAX_MACVLANS		31
@@ -623,6 +635,10 @@ struct ixgbe_adapter {
 	__be16 vxlan_port;
 	__be16 geneve_port;
 
+	/* XDP */
+	int num_xdp_queues;
+	struct ixgbe_ring *xdp_ring[MAX_XDP_QUEUES];
+
 	/* TX */
 	struct ixgbe_ring *tx_ring[MAX_TX_QUEUES] ____cacheline_aligned_in_smp;
 
@@ -669,6 +685,7 @@ struct ixgbe_adapter {
 
 	u64 tx_busy;
 	unsigned int tx_ring_count;
+	unsigned int xdp_ring_count;
 	unsigned int rx_ring_count;
 
 	u32 link_speed;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 79a126d9e091..b0fd2f58a69c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -1071,15 +1071,19 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
 	if (!netif_running(adapter->netdev)) {
 		for (i = 0; i < adapter->num_tx_queues; i++)
 			adapter->tx_ring[i]->count = new_tx_count;
+		for (i = 0; i < adapter->num_xdp_queues; i++)
+			adapter->xdp_ring[i]->count = new_tx_count;
 		for (i = 0; i < adapter->num_rx_queues; i++)
 			adapter->rx_ring[i]->count = new_rx_count;
 		adapter->tx_ring_count = new_tx_count;
+		adapter->xdp_ring_count = new_tx_count;
 		adapter->rx_ring_count = new_rx_count;
 		goto clear_reset;
 	}
 
 	/* allocate temporary buffer to store rings in */
 	i = max_t(int, adapter->num_tx_queues, adapter->num_rx_queues);
+	i = max_t(int, i, adapter->num_xdp_queues);
 	temp_ring = vmalloc(i * sizeof(struct ixgbe_ring));
 
 	if (!temp_ring) {
@@ -1111,12 +1115,33 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
 			}
 		}
 
+		for (i = 0; i < adapter->num_xdp_queues; i++) {
+			memcpy(&temp_ring[i], adapter->xdp_ring[i],
+			       sizeof(struct ixgbe_ring));
+
+			temp_ring[i].count = new_tx_count;
+			err = ixgbe_setup_tx_resources(&temp_ring[i]);
+			if (err) {
+				while (i) {
+					i--;
+					ixgbe_free_tx_resources(&temp_ring[i]);
+				}
+				goto err_setup;
+			}
+		}
+
 		for (i = 0; i < adapter->num_tx_queues; i++) {
 			ixgbe_free_tx_resources(adapter->tx_ring[i]);
 
 			memcpy(adapter->tx_ring[i], &temp_ring[i],
 			       sizeof(struct ixgbe_ring));
 		}
+		for (i = 0; i < adapter->num_xdp_queues; i++) {
+			ixgbe_free_tx_resources(adapter->xdp_ring[i]);
+
+			memcpy(adapter->xdp_ring[i], &temp_ring[i],
+			       sizeof(struct ixgbe_ring));
+		}
 
 		adapter->tx_ring_count = new_tx_count;
 	}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index 1b8be7d813bd..b45fdc98033d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -267,12 +267,14 @@ static bool ixgbe_cache_ring_sriov(struct ixgbe_adapter *adapter)
  **/
 static bool ixgbe_cache_ring_rss(struct ixgbe_adapter *adapter)
 {
-	int i;
+	int i, reg_idx;
 
 	for (i = 0; i < adapter->num_rx_queues; i++)
 		adapter->rx_ring[i]->reg_idx = i;
-	for (i = 0; i < adapter->num_tx_queues; i++)
-		adapter->tx_ring[i]->reg_idx = i;
+	for (i = 0, reg_idx = 0; i < adapter->num_tx_queues; i++, reg_idx++)
+		adapter->tx_ring[i]->reg_idx = reg_idx;
+	for (i = 0; i < adapter->num_xdp_queues; i++, reg_idx++)
+		adapter->xdp_ring[i]->reg_idx = reg_idx;
 
 	return true;
 }
@@ -308,6 +310,11 @@ static void ixgbe_cache_ring_register(struct ixgbe_adapter *adapter)
 	ixgbe_cache_ring_rss(adapter);
 }
 
+static int ixgbe_xdp_queues(struct ixgbe_adapter *adapter)
+{
+	return adapter->xdp_prog ? nr_cpu_ids : 0;
+}
+
 #define IXGBE_RSS_64Q_MASK	0x3F
 #define IXGBE_RSS_16Q_MASK	0xF
 #define IXGBE_RSS_8Q_MASK	0x7
@@ -382,6 +389,7 @@ static bool ixgbe_set_dcb_sriov_queues(struct ixgbe_adapter *adapter)
 	adapter->num_rx_queues_per_pool = tcs;
 
 	adapter->num_tx_queues = vmdq_i * tcs;
+	adapter->num_xdp_queues = 0;
 	adapter->num_rx_queues = vmdq_i * tcs;
 
 #ifdef IXGBE_FCOE
@@ -479,6 +487,7 @@ static bool ixgbe_set_dcb_queues(struct ixgbe_adapter *adapter)
 		netdev_set_tc_queue(dev, i, rss_i, rss_i * i);
 
 	adapter->num_tx_queues = rss_i * tcs;
+	adapter->num_xdp_queues = 0;
 	adapter->num_rx_queues = rss_i * tcs;
 
 	return true;
@@ -549,6 +558,7 @@ static bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter)
 
 	adapter->num_rx_queues = vmdq_i * rss_i;
 	adapter->num_tx_queues = vmdq_i * rss_i;
+	adapter->num_xdp_queues = 0;
 
 	/* disable ATR as it is not supported when VMDq is enabled */
 	adapter->flags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE;
@@ -669,6 +679,7 @@ static bool ixgbe_set_rss_queues(struct ixgbe_adapter *adapter)
 #endif /* IXGBE_FCOE */
 	adapter->num_rx_queues = rss_i;
 	adapter->num_tx_queues = rss_i;
+	adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
 
 	return true;
 }
@@ -689,6 +700,7 @@ static void ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
 	/* Start with base case */
 	adapter->num_rx_queues = 1;
 	adapter->num_tx_queues = 1;
+	adapter->num_xdp_queues = 0;
 	adapter->num_rx_pools = adapter->num_rx_queues;
 	adapter->num_rx_queues_per_pool = 1;
 
@@ -719,8 +731,11 @@ static int ixgbe_acquire_msix_vectors(struct ixgbe_adapter *adapter)
 	struct ixgbe_hw *hw = &adapter->hw;
 	int i, vectors, vector_threshold;
 
-	/* We start by asking for one vector per queue pair */
+	/* We start by asking for one vector per queue pair with XDP queues
+	 * being stacked with TX queues.
+	 */
 	vectors = max(adapter->num_rx_queues, adapter->num_tx_queues);
+	vectors = max(vectors, adapter->num_xdp_queues);
 
 	/* It is easy to be greedy for MSI-X vectors. However, it really
 	 * doesn't do much good if we have a lot more vectors than CPUs. We'll
@@ -800,6 +815,8 @@ static void ixgbe_add_ring(struct ixgbe_ring *ring,
  * @v_idx: index of vector in adapter struct
  * @txr_count: total number of Tx rings to allocate
  * @txr_idx: index of first Tx ring to allocate
+ * @xdp_count: total number of XDP rings to allocate
+ * @xdp_idx: index of first XDP ring to allocate
  * @rxr_count: total number of Rx rings to allocate
  * @rxr_idx: index of first Rx ring to allocate
  *
@@ -808,6 +825,7 @@ static void ixgbe_add_ring(struct ixgbe_ring *ring,
 static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 				int v_count, int v_idx,
 				int txr_count, int txr_idx,
+				int xdp_count, int xdp_idx,
 				int rxr_count, int rxr_idx)
 {
 	struct ixgbe_q_vector *q_vector;
@@ -817,7 +835,7 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 	int ring_count, size;
 	u8 tcs = netdev_get_num_tc(adapter->netdev);
 
-	ring_count = txr_count + rxr_count;
+	ring_count = txr_count + rxr_count + xdp_count;
 	size = sizeof(struct ixgbe_q_vector) +
 	       (sizeof(struct ixgbe_ring) * ring_count);
 
@@ -909,6 +927,33 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 		ring++;
 	}
 
+	while (xdp_count) {
+		/* assign generic ring traits */
+		ring->dev = &adapter->pdev->dev;
+		ring->netdev = adapter->netdev;
+
+		/* configure backlink on ring */
+		ring->q_vector = q_vector;
+
+		/* update q_vector Tx values */
+		ixgbe_add_ring(ring, &q_vector->tx);
+
+		/* apply Tx specific ring traits */
+		ring->count = adapter->tx_ring_count;
+		ring->queue_index = xdp_idx;
+		set_ring_xdp(ring);
+
+		/* assign ring to adapter */
+		adapter->xdp_ring[xdp_idx] = ring;
+
+		/* update count and index */
+		xdp_count--;
+		xdp_idx++;
+
+		/* push pointer to next ring */
+		ring++;
+	}
+
 	while (rxr_count) {
 		/* assign generic ring traits */
 		ring->dev = &adapter->pdev->dev;
@@ -1002,17 +1047,18 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
 	int q_vectors = adapter->num_q_vectors;
 	int rxr_remaining = adapter->num_rx_queues;
 	int txr_remaining = adapter->num_tx_queues;
-	int rxr_idx = 0, txr_idx = 0, v_idx = 0;
+	int xdp_remaining = adapter->num_xdp_queues;
+	int rxr_idx = 0, txr_idx = 0, xdp_idx = 0, v_idx = 0;
 	int err;
 
 	/* only one q_vector if MSI-X is disabled. */
 	if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED))
 		q_vectors = 1;
 
-	if (q_vectors >= (rxr_remaining + txr_remaining)) {
+	if (q_vectors >= (rxr_remaining + txr_remaining + xdp_remaining)) {
 		for (; rxr_remaining; v_idx++) {
 			err = ixgbe_alloc_q_vector(adapter, q_vectors, v_idx,
-						   0, 0, 1, rxr_idx);
+						   0, 0, 0, 0, 1, rxr_idx);
 
 			if (err)
 				goto err_out;
@@ -1026,8 +1072,11 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
 	for (; v_idx < q_vectors; v_idx++) {
 		int rqpv = DIV_ROUND_UP(rxr_remaining, q_vectors - v_idx);
 		int tqpv = DIV_ROUND_UP(txr_remaining, q_vectors - v_idx);
+		int xqpv = DIV_ROUND_UP(xdp_remaining, q_vectors - v_idx);
+
 		err = ixgbe_alloc_q_vector(adapter, q_vectors, v_idx,
 					   tqpv, txr_idx,
+					   xqpv, xdp_idx,
 					   rqpv, rxr_idx);
 
 		if (err)
@@ -1036,14 +1085,17 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
 		/* update counts and index */
 		rxr_remaining -= rqpv;
 		txr_remaining -= tqpv;
+		xdp_remaining -= xqpv;
 		rxr_idx++;
 		txr_idx++;
+		xdp_idx += xqpv;
 	}
 
 	return 0;
 
 err_out:
 	adapter->num_tx_queues = 0;
+	adapter->num_xdp_queues = 0;
 	adapter->num_rx_queues = 0;
 	adapter->num_q_vectors = 0;
 
@@ -1066,6 +1118,7 @@ static void ixgbe_free_q_vectors(struct ixgbe_adapter *adapter)
 	int v_idx = adapter->num_q_vectors;
 
 	adapter->num_tx_queues = 0;
+	adapter->num_xdp_queues = 0;
 	adapter->num_rx_queues = 0;
 	adapter->num_q_vectors = 0;
 
@@ -1172,9 +1225,10 @@ int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter)
 
 	ixgbe_cache_ring_register(adapter);
 
-	e_dev_info("Multiqueue %s: Rx Queue count = %u, Tx Queue count = %u\n",
+	e_dev_info("Multiqueue %s: Rx Queue count = %u, Tx Queue count = %u XDP Queue count = %u\n",
 		   (adapter->num_rx_queues > 1) ? "Enabled" : "Disabled",
-		   adapter->num_rx_queues, adapter->num_tx_queues);
+		   adapter->num_rx_queues, adapter->num_tx_queues,
+		   adapter->num_xdp_queues);
 
 	set_bit(__IXGBE_DOWN, &adapter->state);
 
@@ -1195,6 +1249,7 @@ int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter)
 void ixgbe_clear_interrupt_scheme(struct ixgbe_adapter *adapter)
 {
 	adapter->num_tx_queues = 0;
+	adapter->num_xdp_queues = 0;
 	adapter->num_rx_queues = 0;
 
 	ixgbe_free_q_vectors(adapter);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 99b5357c3e00..cb5be7de2c91 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -593,6 +593,19 @@ static void ixgbe_regdump(struct ixgbe_hw *hw, struct ixgbe_reg_info *reginfo)
 
 }
 
+static void ixgbe_print_buffer(struct ixgbe_ring *ring, int n)
+{
+	struct ixgbe_tx_buffer *tx_buffer;
+
+	tx_buffer = &ring->tx_buffer_info[ring->next_to_clean];
+	pr_info(" %5d %5X %5X %016llX %08X %p %016llX\n",
+		n, ring->next_to_use, ring->next_to_clean,
+		(u64)dma_unmap_addr(tx_buffer, dma),
+		dma_unmap_len(tx_buffer, len),
+		tx_buffer->next_to_watch,
+		(u64)tx_buffer->time_stamp);
+}
+
 /*
  * ixgbe_dump - Print registers, tx-rings and rx-rings
  */
@@ -602,7 +615,7 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
 	struct ixgbe_hw *hw = &adapter->hw;
 	struct ixgbe_reg_info *reginfo;
 	int n = 0;
-	struct ixgbe_ring *tx_ring;
+	struct ixgbe_ring *ring;
 	struct ixgbe_tx_buffer *tx_buffer;
 	union ixgbe_adv_tx_desc *tx_desc;
 	struct my_u0 { u64 a; u64 b; } *u0;
@@ -642,14 +655,13 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
 		"Queue [NTU] [NTC] [bi(ntc)->dma  ]",
 		"leng", "ntw", "timestamp");
 	for (n = 0; n < adapter->num_tx_queues; n++) {
-		tx_ring = adapter->tx_ring[n];
-		tx_buffer = &tx_ring->tx_buffer_info[tx_ring->next_to_clean];
-		pr_info(" %5d %5X %5X %016llX %08X %p %016llX\n",
-			   n, tx_ring->next_to_use, tx_ring->next_to_clean,
-			   (u64)dma_unmap_addr(tx_buffer, dma),
-			   dma_unmap_len(tx_buffer, len),
-			   tx_buffer->next_to_watch,
-			   (u64)tx_buffer->time_stamp);
+		ring = adapter->tx_ring[n];
+		ixgbe_print_buffer(ring, n);
+	}
+
+	for (n = 0; n < adapter->num_xdp_queues; n++) {
+		ring = adapter->xdp_ring[n];
+		ixgbe_print_buffer(ring, n);
 	}
 
 	/* Print TX Rings */
@@ -694,28 +706,28 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
 	 */
 
 	for (n = 0; n < adapter->num_tx_queues; n++) {
-		tx_ring = adapter->tx_ring[n];
+		ring = adapter->tx_ring[n];
 		pr_info("------------------------------------\n");
-		pr_info("TX QUEUE INDEX = %d\n", tx_ring->queue_index);
+		pr_info("TX QUEUE INDEX = %d\n", ring->queue_index);
 		pr_info("------------------------------------\n");
 		pr_info("%s%s    %s              %s        %s          %s\n",
 			"T [desc]     [address 63:0  ] ",
 			"[PlPOIdStDDt Ln] [bi->dma       ] ",
 			"leng", "ntw", "timestamp", "bi->skb");
 
-		for (i = 0; tx_ring->desc && (i < tx_ring->count); i++) {
-			tx_desc = IXGBE_TX_DESC(tx_ring, i);
-			tx_buffer = &tx_ring->tx_buffer_info[i];
+		for (i = 0; ring->desc && (i < ring->count); i++) {
+			tx_desc = IXGBE_TX_DESC(ring, i);
+			tx_buffer = &ring->tx_buffer_info[i];
 			u0 = (struct my_u0 *)tx_desc;
 			if (dma_unmap_len(tx_buffer, len) > 0) {
 				const char *ring_desc;
 
-				if (i == tx_ring->next_to_use &&
-				    i == tx_ring->next_to_clean)
+				if (i == ring->next_to_use &&
+				    i == ring->next_to_clean)
 					ring_desc = " NTC/U";
-				else if (i == tx_ring->next_to_use)
+				else if (i == ring->next_to_use)
 					ring_desc = " NTU";
-				else if (i == tx_ring->next_to_clean)
+				else if (i == ring->next_to_clean)
 					ring_desc = " NTC";
 				else
 					ring_desc = "";
@@ -984,6 +996,10 @@ static void ixgbe_update_xoff_rx_lfc(struct ixgbe_adapter *adapter)
 	for (i = 0; i < adapter->num_tx_queues; i++)
 		clear_bit(__IXGBE_HANG_CHECK_ARMED,
 			  &adapter->tx_ring[i]->state);
+
+	for (i = 0; i < adapter->num_xdp_queues; i++)
+		clear_bit(__IXGBE_HANG_CHECK_ARMED,
+			  &adapter->xdp_ring[i]->state);
 }
 
 static void ixgbe_update_xoff_received(struct ixgbe_adapter *adapter)
@@ -1028,6 +1044,14 @@ static void ixgbe_update_xoff_received(struct ixgbe_adapter *adapter)
 		if (xoff[tc])
 			clear_bit(__IXGBE_HANG_CHECK_ARMED, &tx_ring->state);
 	}
+
+	for (i = 0; i < adapter->num_xdp_queues; i++) {
+		struct ixgbe_ring *xdp_ring = adapter->xdp_ring[i];
+
+		tc = xdp_ring->dcb_tc;
+		if (xoff[tc])
+			clear_bit(__IXGBE_HANG_CHECK_ARMED, &xdp_ring->state);
+	}
 }
 
 static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring)
@@ -1179,7 +1203,10 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 		total_packets += tx_buffer->gso_segs;
 
 		/* free the skb */
-		napi_consume_skb(tx_buffer->skb, napi_budget);
+		if (ring_is_xdp(tx_ring))
+			page_frag_free(tx_buffer->data);
+		else
+			napi_consume_skb(tx_buffer->skb, napi_budget);
 
 		/* unmap skb header data */
 		dma_unmap_single(tx_ring->dev,
@@ -1240,7 +1267,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 	if (check_for_tx_hang(tx_ring) && ixgbe_check_tx_hang(tx_ring)) {
 		/* schedule immediate reset if we believe we hung */
 		struct ixgbe_hw *hw = &adapter->hw;
-		e_err(drv, "Detected Tx Unit Hang\n"
+		e_err(drv, "Detected Tx Unit Hang %s\n"
 			"  Tx Queue             <%d>\n"
 			"  TDH, TDT             <%x>, <%x>\n"
 			"  next_to_use          <%x>\n"
@@ -1248,13 +1275,16 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 			"tx_buffer_info[next_to_clean]\n"
 			"  time_stamp           <%lx>\n"
 			"  jiffies              <%lx>\n",
+			ring_is_xdp(tx_ring) ? "(XDP)" : "",
 			tx_ring->queue_index,
 			IXGBE_READ_REG(hw, IXGBE_TDH(tx_ring->reg_idx)),
 			IXGBE_READ_REG(hw, IXGBE_TDT(tx_ring->reg_idx)),
 			tx_ring->next_to_use, i,
 			tx_ring->tx_buffer_info[i].time_stamp, jiffies);
 
-		netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+		if (!ring_is_xdp(tx_ring))
+			netif_stop_subqueue(tx_ring->netdev,
+					    tx_ring->queue_index);
 
 		e_info(probe,
 		       "tx hang %d detected on queue %d, resetting adapter\n",
@@ -1267,6 +1297,9 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 		return true;
 	}
 
+	if (ring_is_xdp(tx_ring))
+		return !!budget;
+
 	netdev_tx_completed_queue(txring_txq(tx_ring),
 				  total_packets, total_bytes);
 
@@ -2169,8 +2202,13 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
 
 #define IXGBE_XDP_PASS 0
 #define IXGBE_XDP_CONSUMED 1
+#define IXGBE_XDP_TX 2
 
-static struct sk_buff *ixgbe_run_xdp(struct ixgbe_ring  *rx_ring,
+static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
+			       struct xdp_buff *xdp);
+
+static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
+				     struct ixgbe_ring *rx_ring,
 				     struct xdp_buff *xdp)
 {
 	int result = IXGBE_XDP_PASS;
@@ -2187,9 +2225,11 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_ring  *rx_ring,
 	switch (act) {
 	case XDP_PASS:
 		break;
+	case XDP_TX:
+		result = ixgbe_xmit_xdp_ring(adapter, xdp);
+		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
-	case XDP_TX:
 	case XDP_ABORTED:
 		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
 		/* fallthrough -- handle aborts by dropping packet */
@@ -2202,6 +2242,23 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_ring  *rx_ring,
 	return ERR_PTR(-result);
 }
 
+static void ixgbe_rx_buffer_flip(struct ixgbe_ring *rx_ring,
+				 struct ixgbe_rx_buffer *rx_buffer,
+				 unsigned int size)
+{
+#if (PAGE_SIZE < 8192)
+	unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
+
+	rx_buffer->page_offset ^= truesize;
+#else
+	unsigned int truesize = ring_uses_build_skb(rx_ring) ?
+				SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) :
+				SKB_DATA_ALIGN(size);
+
+	rx_buffer->page_offset += truesize;
+#endif
+}
+
 /**
  * ixgbe_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
  * @q_vector: structure containing interrupt and ring information
@@ -2220,8 +2277,8 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			       const int budget)
 {
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
-#ifdef IXGBE_FCOE
 	struct ixgbe_adapter *adapter = q_vector->adapter;
+#ifdef IXGBE_FCOE
 	int ddp_bytes;
 	unsigned int mss = 0;
 #endif /* IXGBE_FCOE */
@@ -2261,13 +2318,16 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 					      ixgbe_rx_offset(rx_ring);
 			xdp.data_end = xdp.data + size;
 
-			skb = ixgbe_run_xdp(rx_ring, &xdp);
+			skb = ixgbe_run_xdp(adapter, rx_ring, &xdp);
 		}
 
 		if (IS_ERR(skb)) {
+			if (PTR_ERR(skb) == -IXGBE_XDP_TX)
+				ixgbe_rx_buffer_flip(rx_ring, rx_buffer, size);
+			else
+				rx_buffer->pagecnt_bias++;
 			total_rx_packets++;
 			total_rx_bytes += size;
-			rx_buffer->pagecnt_bias++;
 		} else if (skb) {
 			ixgbe_add_rx_frag(rx_ring, rx_buffer, skb, size);
 		} else if (ring_uses_build_skb(rx_ring)) {
@@ -3437,6 +3497,8 @@ static void ixgbe_configure_tx(struct ixgbe_adapter *adapter)
 	/* Setup the HW Tx Head and Tail descriptor pointers */
 	for (i = 0; i < adapter->num_tx_queues; i++)
 		ixgbe_configure_tx_ring(adapter, adapter->tx_ring[i]);
+	for (i = 0; i < adapter->num_xdp_queues; i++)
+		ixgbe_configure_tx_ring(adapter, adapter->xdp_ring[i]);
 }
 
 static void ixgbe_enable_rx_drop(struct ixgbe_adapter *adapter,
@@ -5578,7 +5640,10 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
 		union ixgbe_adv_tx_desc *eop_desc, *tx_desc;
 
 		/* Free all the Tx ring sk_buffs */
-		dev_kfree_skb_any(tx_buffer->skb);
+		if (ring_is_xdp(tx_ring))
+			page_frag_free(tx_buffer->data);
+		else
+			dev_kfree_skb_any(tx_buffer->skb);
 
 		/* unmap skb header data */
 		dma_unmap_single(tx_ring->dev,
@@ -5619,7 +5684,8 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
 	}
 
 	/* reset BQL for queue */
-	netdev_tx_reset_queue(txring_txq(tx_ring));
+	if (!ring_is_xdp(tx_ring))
+		netdev_tx_reset_queue(txring_txq(tx_ring));
 
 	/* reset next_to_use and next_to_clean */
 	tx_ring->next_to_use = 0;
@@ -5648,6 +5714,8 @@ static void ixgbe_clean_all_tx_rings(struct ixgbe_adapter *adapter)
 
 	for (i = 0; i < adapter->num_tx_queues; i++)
 		ixgbe_clean_tx_ring(adapter->tx_ring[i]);
+	for (i = 0; i < adapter->num_xdp_queues; i++)
+		ixgbe_clean_tx_ring(adapter->xdp_ring[i]);
 }
 
 static void ixgbe_fdir_filter_exit(struct ixgbe_adapter *adapter)
@@ -5742,6 +5810,11 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 		u8 reg_idx = adapter->tx_ring[i]->reg_idx;
 		IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), IXGBE_TXDCTL_SWFLSH);
 	}
+	for (i = 0; i < adapter->num_xdp_queues; i++) {
+		u8 reg_idx = adapter->xdp_ring[i]->reg_idx;
+
+		IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), IXGBE_TXDCTL_SWFLSH);
+	}
 
 	/* Disable the Tx DMA engine on 82599 and later MAC */
 	switch (hw->mac.type) {
@@ -6112,7 +6185,7 @@ int ixgbe_setup_tx_resources(struct ixgbe_ring *tx_ring)
  **/
 static int ixgbe_setup_all_tx_resources(struct ixgbe_adapter *adapter)
 {
-	int i, err = 0;
+	int i, j = 0, err = 0;
 
 	for (i = 0; i < adapter->num_tx_queues; i++) {
 		err = ixgbe_setup_tx_resources(adapter->tx_ring[i]);
@@ -6122,10 +6195,20 @@ static int ixgbe_setup_all_tx_resources(struct ixgbe_adapter *adapter)
 		e_err(probe, "Allocation for Tx Queue %u failed\n", i);
 		goto err_setup_tx;
 	}
+	for (j = 0; j < adapter->num_xdp_queues; j++) {
+		err = ixgbe_setup_tx_resources(adapter->xdp_ring[j]);
+		if (!err)
+			continue;
+
+		e_err(probe, "Allocation for Tx Queue %u failed\n", j);
+		goto err_setup_tx;
+	}
 
 	return 0;
 err_setup_tx:
 	/* rewind the index freeing the rings as we go */
+	while (j--)
+		ixgbe_free_tx_resources(adapter->xdp_ring[j]);
 	while (i--)
 		ixgbe_free_tx_resources(adapter->tx_ring[i]);
 	return err;
@@ -6258,6 +6341,9 @@ static void ixgbe_free_all_tx_resources(struct ixgbe_adapter *adapter)
 	for (i = 0; i < adapter->num_tx_queues; i++)
 		if (adapter->tx_ring[i]->desc)
 			ixgbe_free_tx_resources(adapter->tx_ring[i]);
+	for (i = 0; i < adapter->num_xdp_queues; i++)
+		if (adapter->xdp_ring[i]->desc)
+			ixgbe_free_tx_resources(adapter->xdp_ring[i]);
 }
 
 /**
@@ -6677,6 +6763,14 @@ void ixgbe_update_stats(struct ixgbe_adapter *adapter)
 		bytes += tx_ring->stats.bytes;
 		packets += tx_ring->stats.packets;
 	}
+	for (i = 0; i < adapter->num_xdp_queues; i++) {
+		struct ixgbe_ring *xdp_ring = adapter->xdp_ring[i];
+
+		restart_queue += xdp_ring->tx_stats.restart_queue;
+		tx_busy += xdp_ring->tx_stats.tx_busy;
+		bytes += xdp_ring->stats.bytes;
+		packets += xdp_ring->stats.packets;
+	}
 	adapter->restart_queue = restart_queue;
 	adapter->tx_busy = tx_busy;
 	netdev->stats.tx_bytes = bytes;
@@ -6870,6 +6964,9 @@ static void ixgbe_fdir_reinit_subtask(struct ixgbe_adapter *adapter)
 		for (i = 0; i < adapter->num_tx_queues; i++)
 			set_bit(__IXGBE_TX_FDIR_INIT_DONE,
 				&(adapter->tx_ring[i]->state));
+		for (i = 0; i < adapter->num_xdp_queues; i++)
+			set_bit(__IXGBE_TX_FDIR_INIT_DONE,
+				&adapter->xdp_ring[i]->state);
 		/* re-enable flow director interrupts */
 		IXGBE_WRITE_REG(hw, IXGBE_EIMS, IXGBE_EIMS_FLOW_DIR);
 	} else {
@@ -6903,6 +7000,8 @@ static void ixgbe_check_hang_subtask(struct ixgbe_adapter *adapter)
 	if (netif_carrier_ok(adapter->netdev)) {
 		for (i = 0; i < adapter->num_tx_queues; i++)
 			set_check_for_tx_hang(adapter->tx_ring[i]);
+		for (i = 0; i < adapter->num_xdp_queues; i++)
+			set_check_for_tx_hang(adapter->xdp_ring[i]);
 	}
 
 	if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED)) {
@@ -7133,6 +7232,13 @@ static bool ixgbe_ring_tx_pending(struct ixgbe_adapter *adapter)
 			return true;
 	}
 
+	for (i = 0; i < adapter->num_xdp_queues; i++) {
+		struct ixgbe_ring *ring = adapter->xdp_ring[i];
+
+		if (ring->next_to_use != ring->next_to_clean)
+			return true;
+	}
+
 	return false;
 }
 
@@ -8090,6 +8196,69 @@ static u16 ixgbe_select_queue(struct net_device *dev, struct sk_buff *skb,
 #endif
 }
 
+static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
+			       struct xdp_buff *xdp)
+{
+	struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
+	struct ixgbe_tx_buffer *tx_buffer;
+	union ixgbe_adv_tx_desc *tx_desc;
+	u32 len, cmd_type;
+	dma_addr_t dma;
+	u16 i;
+
+	len = xdp->data_end - xdp->data;
+
+	if (unlikely(!ixgbe_desc_unused(ring)))
+		return IXGBE_XDP_CONSUMED;
+
+	dma = dma_map_single(ring->dev, xdp->data, len, DMA_TO_DEVICE);
+	if (dma_mapping_error(ring->dev, dma))
+		return IXGBE_XDP_CONSUMED;
+
+	/* record the location of the first descriptor for this packet */
+	tx_buffer = &ring->tx_buffer_info[ring->next_to_use];
+	tx_buffer->bytecount = len;
+	tx_buffer->gso_segs = 1;
+	tx_buffer->protocol = 0;
+
+	i = ring->next_to_use;
+	tx_desc = IXGBE_TX_DESC(ring, i);
+
+	dma_unmap_len_set(tx_buffer, len, len);
+	dma_unmap_addr_set(tx_buffer, dma, dma);
+	tx_buffer->data = xdp->data;
+	tx_desc->read.buffer_addr = cpu_to_le64(dma);
+
+	/* put descriptor type bits */
+	cmd_type = IXGBE_ADVTXD_DTYP_DATA |
+		   IXGBE_ADVTXD_DCMD_DEXT |
+		   IXGBE_ADVTXD_DCMD_IFCS;
+	cmd_type |= len | IXGBE_TXD_CMD;
+	tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
+	tx_desc->read.olinfo_status =
+		cpu_to_le32(len << IXGBE_ADVTXD_PAYLEN_SHIFT);
+
+	/* Force memory writes to complete before letting h/w know there
+	 * are new descriptors to fetch.  (Only applicable for weak-ordered
+	 * memory model archs, such as IA-64).
+	 *
+	 * We also need this memory barrier to make certain all of the
+	 * status bits have been updated before next_to_watch is written.
+	 */
+	wmb();
+
+	/* set next_to_watch value indicating a packet is present */
+	i++;
+	if (i == ring->count)
+		i = 0;
+
+	tx_buffer->next_to_watch = tx_desc;
+	ring->next_to_use = i;
+
+	writel(i, ring->tail);
+	return IXGBE_XDP_TX;
+}
+
 netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 			  struct ixgbe_adapter *adapter,
 			  struct ixgbe_ring *tx_ring)
@@ -8381,6 +8550,23 @@ static void ixgbe_netpoll(struct net_device *netdev)
 
 #endif
 
+static void ixgbe_get_ring_stats64(struct rtnl_link_stats64 *stats,
+				   struct ixgbe_ring *ring)
+{
+	u64 bytes, packets;
+	unsigned int start;
+
+	if (ring) {
+		do {
+			start = u64_stats_fetch_begin_irq(&ring->syncp);
+			packets = ring->stats.packets;
+			bytes   = ring->stats.bytes;
+		} while (u64_stats_fetch_retry_irq(&ring->syncp, start));
+		stats->tx_packets += packets;
+		stats->tx_bytes   += bytes;
+	}
+}
+
 static void ixgbe_get_stats64(struct net_device *netdev,
 			      struct rtnl_link_stats64 *stats)
 {
@@ -8406,18 +8592,13 @@ static void ixgbe_get_stats64(struct net_device *netdev,
 
 	for (i = 0; i < adapter->num_tx_queues; i++) {
 		struct ixgbe_ring *ring = ACCESS_ONCE(adapter->tx_ring[i]);
-		u64 bytes, packets;
-		unsigned int start;
 
-		if (ring) {
-			do {
-				start = u64_stats_fetch_begin_irq(&ring->syncp);
-				packets = ring->stats.packets;
-				bytes   = ring->stats.bytes;
-			} while (u64_stats_fetch_retry_irq(&ring->syncp, start));
-			stats->tx_packets += packets;
-			stats->tx_bytes   += bytes;
-		}
+		ixgbe_get_ring_stats64(stats, ring);
+	}
+	for (i = 0; i < adapter->num_xdp_queues; i++) {
+		struct ixgbe_ring *ring = ACCESS_ONCE(adapter->xdp_ring[i]);
+
+		ixgbe_get_ring_stats64(stats, ring);
 	}
 	rcu_read_unlock();
 
@@ -9559,9 +9740,23 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
 			return -EINVAL;
 	}
 
+	if (nr_cpu_ids > MAX_XDP_QUEUES)
+		return -ENOMEM;
+
 	old_prog = xchg(&adapter->xdp_prog, prog);
-	for (i = 0; i < adapter->num_rx_queues; i++)
-		xchg(&adapter->rx_ring[i]->xdp_prog, adapter->xdp_prog);
+
+	/* If transitioning XDP modes reconfigure rings */
+	if (!!prog != !!old_prog) {
+		int err = ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
+
+		if (err) {
+			rcu_assign_pointer(adapter->xdp_prog, old_prog);
+			return -EINVAL;
+		}
+	} else {
+		for (i = 0; i < adapter->num_rx_queues; i++)
+			xchg(&adapter->rx_ring[i]->xdp_prog, adapter->xdp_prog);
+	}
 
 	if (old_prog)
 		bpf_prog_put(old_prog);
@@ -10060,6 +10255,9 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (err)
 		goto err_sw_init;
 
+	for (i = 0; i < adapter->num_xdp_queues; i++)
+		u64_stats_init(&adapter->xdp_ring[i]->syncp);
+
 	/* WOL not supported for all devices */
 	adapter->wol = 0;
 	hw->eeprom.ops.read(hw, 0x2c, &adapter->eeprom_cap);
-- 
2.12.2

^ permalink raw reply related

* [net-next v2 04/11] ixgbe: delay tail write to every 'n' packets
From: Jeff Kirsher @ 2017-04-30  3:08 UTC (permalink / raw)
  To: davem; +Cc: John Fastabend, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170430030810.56415-1-jeffrey.t.kirsher@intel.com>

From: John Fastabend <john.r.fastabend@intel.com>

Current XDP implementation hits the tail on every XDP_TX return
code. This patch changes driver behavior to only hit the tail after
packet processing is complete.

With this patch I can run XDP drop programs @ 14+Mpps and XDP_TX
programs are at ~13.5Mpps.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 28 ++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index cb5be7de2c91..3d7b09100945 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2283,6 +2283,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	unsigned int mss = 0;
 #endif /* IXGBE_FCOE */
 	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
+	bool xdp_xmit = false;
 
 	while (likely(total_rx_packets < budget)) {
 		union ixgbe_adv_rx_desc *rx_desc;
@@ -2322,10 +2323,12 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		}
 
 		if (IS_ERR(skb)) {
-			if (PTR_ERR(skb) == -IXGBE_XDP_TX)
+			if (PTR_ERR(skb) == -IXGBE_XDP_TX) {
+				xdp_xmit = true;
 				ixgbe_rx_buffer_flip(rx_ring, rx_buffer, size);
-			else
+			} else {
 				rx_buffer->pagecnt_bias++;
+			}
 			total_rx_packets++;
 			total_rx_bytes += size;
 		} else if (skb) {
@@ -2393,6 +2396,16 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		total_rx_packets++;
 	}
 
+	if (xdp_xmit) {
+		struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
+
+		/* Force memory writes to complete before letting h/w
+		 * know there are new descriptors to fetch.
+		 */
+		wmb();
+		writel(ring->next_to_use, ring->tail);
+	}
+
 	u64_stats_update_begin(&rx_ring->syncp);
 	rx_ring->stats.packets += total_rx_packets;
 	rx_ring->stats.bytes += total_rx_bytes;
@@ -8238,14 +8251,8 @@ static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
 	tx_desc->read.olinfo_status =
 		cpu_to_le32(len << IXGBE_ADVTXD_PAYLEN_SHIFT);
 
-	/* Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.  (Only applicable for weak-ordered
-	 * memory model archs, such as IA-64).
-	 *
-	 * We also need this memory barrier to make certain all of the
-	 * status bits have been updated before next_to_watch is written.
-	 */
-	wmb();
+	/* Avoid any potential race with xdp_xmit and cleanup */
+	smp_wmb();
 
 	/* set next_to_watch value indicating a packet is present */
 	i++;
@@ -8255,7 +8262,6 @@ static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
 	tx_buffer->next_to_watch = tx_desc;
 	ring->next_to_use = i;
 
-	writel(i, ring->tail);
 	return IXGBE_XDP_TX;
 }
 
-- 
2.12.2

^ permalink raw reply related

* [net-next v2 02/11] ixgbe: add XDP support for pass and drop actions
From: Jeff Kirsher @ 2017-04-30  3:08 UTC (permalink / raw)
  To: davem; +Cc: John Fastabend, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170430030810.56415-1-jeffrey.t.kirsher@intel.com>

From: John Fastabend <john.r.fastabend@intel.com>

Basic XDP drop support for ixgbe. Uses READ_ONCE/xchg semantics on XDP
programs instead of RCU primitives as suggested by Daniel Borkmann and
Alex Duyck.

v2: fix the build issues seen w/ XDP when page sizes are larger than 4K
    and made minor fixes based on feedback from Jakub Kicinski

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   4 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   4 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 169 +++++++++++++++++++----
 3 files changed, 148 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 656ca8f69768..cb14813b0080 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -318,6 +318,7 @@ struct ixgbe_ring {
 	struct ixgbe_ring *next;	/* pointer to next ring in q_vector */
 	struct ixgbe_q_vector *q_vector; /* backpointer to host q_vector */
 	struct net_device *netdev;	/* netdev ring belongs to */
+	struct bpf_prog *xdp_prog;
 	struct device *dev;		/* device for DMA mapping */
 	struct ixgbe_fwd_adapter *l2_accel_priv;
 	void *desc;			/* descriptor ring memory */
@@ -555,6 +556,7 @@ struct ixgbe_adapter {
 	unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
 	/* OS defined structs */
 	struct net_device *netdev;
+	struct bpf_prog *xdp_prog;
 	struct pci_dev *pdev;
 
 	unsigned long state;
@@ -835,7 +837,7 @@ void ixgbe_down(struct ixgbe_adapter *adapter);
 void ixgbe_reinit_locked(struct ixgbe_adapter *adapter);
 void ixgbe_reset(struct ixgbe_adapter *adapter);
 void ixgbe_set_ethtool_ops(struct net_device *netdev);
-int ixgbe_setup_rx_resources(struct ixgbe_ring *);
+int ixgbe_setup_rx_resources(struct ixgbe_adapter *, struct ixgbe_ring *);
 int ixgbe_setup_tx_resources(struct ixgbe_ring *);
 void ixgbe_free_rx_resources(struct ixgbe_ring *);
 void ixgbe_free_tx_resources(struct ixgbe_ring *);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 59730ede4746..79a126d9e091 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -1128,7 +1128,7 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
 			       sizeof(struct ixgbe_ring));
 
 			temp_ring[i].count = new_rx_count;
-			err = ixgbe_setup_rx_resources(&temp_ring[i]);
+			err = ixgbe_setup_rx_resources(adapter, &temp_ring[i]);
 			if (err) {
 				while (i) {
 					i--;
@@ -1761,7 +1761,7 @@ static int ixgbe_setup_desc_rings(struct ixgbe_adapter *adapter)
 	rx_ring->netdev = adapter->netdev;
 	rx_ring->reg_idx = adapter->rx_ring[0]->reg_idx;
 
-	err = ixgbe_setup_rx_resources(rx_ring);
+	err = ixgbe_setup_rx_resources(adapter, rx_ring);
 	if (err) {
 		ret_val = 4;
 		goto err_nomem;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index afff2ca7f8c0..99b5357c3e00 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -49,6 +49,9 @@
 #include <linux/if_macvlan.h>
 #include <linux/if_bridge.h>
 #include <linux/prefetch.h>
+#include <linux/bpf.h>
+#include <linux/bpf_trace.h>
+#include <linux/atomic.h>
 #include <scsi/fc/fc_fcoe.h>
 #include <net/udp_tunnel.h>
 #include <net/pkt_cls.h>
@@ -1855,6 +1858,10 @@ static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
  * @rx_desc: pointer to the EOP Rx descriptor
  * @skb: pointer to current skb being fixed
  *
+ * Check if the skb is valid in the XDP case it will be an error pointer.
+ * Return true in this case to abort processing and advance to next
+ * descriptor.
+ *
  * Check for corrupted packet headers caused by senders on the local L2
  * embedded NIC switch not setting up their Tx Descriptors right.  These
  * should be very rare.
@@ -1873,6 +1880,10 @@ static bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
 {
 	struct net_device *netdev = rx_ring->netdev;
 
+	/* XDP packets use error pointer so abort at this point */
+	if (IS_ERR(skb))
+		return true;
+
 	/* verify that the packet does not have any known errors */
 	if (unlikely(ixgbe_test_staterr(rx_desc,
 					IXGBE_RXDADV_ERR_FRAME_ERR_MASK) &&
@@ -2048,7 +2059,7 @@ static void ixgbe_put_rx_buffer(struct ixgbe_ring *rx_ring,
 		/* hand second half of page back to the ring */
 		ixgbe_reuse_rx_page(rx_ring, rx_buffer);
 	} else {
-		if (IXGBE_CB(skb)->dma == rx_buffer->dma) {
+		if (!IS_ERR(skb) && IXGBE_CB(skb)->dma == rx_buffer->dma) {
 			/* the page has been released from the ring */
 			IXGBE_CB(skb)->page_released = true;
 		} else {
@@ -2069,21 +2080,22 @@ static void ixgbe_put_rx_buffer(struct ixgbe_ring *rx_ring,
 
 static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
 					   struct ixgbe_rx_buffer *rx_buffer,
-					   union ixgbe_adv_rx_desc *rx_desc,
-					   unsigned int size)
+					   struct xdp_buff *xdp,
+					   union ixgbe_adv_rx_desc *rx_desc)
 {
-	void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
+	unsigned int size = xdp->data_end - xdp->data;
 #if (PAGE_SIZE < 8192)
 	unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
 #else
-	unsigned int truesize = SKB_DATA_ALIGN(size);
+	unsigned int truesize = SKB_DATA_ALIGN(xdp->data_end -
+					       xdp->data_hard_start);
 #endif
 	struct sk_buff *skb;
 
 	/* prefetch first cache line of first page */
-	prefetch(va);
+	prefetch(xdp->data);
 #if L1_CACHE_BYTES < 128
-	prefetch(va + L1_CACHE_BYTES);
+	prefetch(xdp->data + L1_CACHE_BYTES);
 #endif
 
 	/* allocate a skb to store the frags */
@@ -2096,7 +2108,7 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
 			IXGBE_CB(skb)->dma = rx_buffer->dma;
 
 		skb_add_rx_frag(skb, 0, rx_buffer->page,
-				rx_buffer->page_offset,
+				xdp->data - page_address(rx_buffer->page),
 				size, truesize);
 #if (PAGE_SIZE < 8192)
 		rx_buffer->page_offset ^= truesize;
@@ -2104,7 +2116,8 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
 		rx_buffer->page_offset += truesize;
 #endif
 	} else {
-		memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
+		memcpy(__skb_put(skb, size),
+		       xdp->data, ALIGN(size, sizeof(long)));
 		rx_buffer->pagecnt_bias++;
 	}
 
@@ -2113,32 +2126,32 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
 
 static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
 				       struct ixgbe_rx_buffer *rx_buffer,
-				       union ixgbe_adv_rx_desc *rx_desc,
-				       unsigned int size)
+				       struct xdp_buff *xdp,
+				       union ixgbe_adv_rx_desc *rx_desc)
 {
-	void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
 #if (PAGE_SIZE < 8192)
 	unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
 #else
 	unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
-				SKB_DATA_ALIGN(IXGBE_SKB_PAD + size);
+				SKB_DATA_ALIGN(xdp->data_end -
+					       xdp->data_hard_start);
 #endif
 	struct sk_buff *skb;
 
 	/* prefetch first cache line of first page */
-	prefetch(va);
+	prefetch(xdp->data);
 #if L1_CACHE_BYTES < 128
-	prefetch(va + L1_CACHE_BYTES);
+	prefetch(xdp->data + L1_CACHE_BYTES);
 #endif
 
-	/* build an skb around the page buffer */
-	skb = build_skb(va - IXGBE_SKB_PAD, truesize);
+	/* build an skb to around the page buffer */
+	skb = build_skb(xdp->data_hard_start, truesize);
 	if (unlikely(!skb))
 		return NULL;
 
 	/* update pointers within the skb to store the data */
-	skb_reserve(skb, IXGBE_SKB_PAD);
-	__skb_put(skb, size);
+	skb_reserve(skb, xdp->data - xdp->data_hard_start);
+	__skb_put(skb, xdp->data_end - xdp->data);
 
 	/* record DMA address if this is the start of a chain of buffers */
 	if (!ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_EOP))
@@ -2154,6 +2167,41 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
 	return skb;
 }
 
+#define IXGBE_XDP_PASS 0
+#define IXGBE_XDP_CONSUMED 1
+
+static struct sk_buff *ixgbe_run_xdp(struct ixgbe_ring  *rx_ring,
+				     struct xdp_buff *xdp)
+{
+	int result = IXGBE_XDP_PASS;
+	struct bpf_prog *xdp_prog;
+	u32 act;
+
+	rcu_read_lock();
+	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
+
+	if (!xdp_prog)
+		goto xdp_out;
+
+	act = bpf_prog_run_xdp(xdp_prog, xdp);
+	switch (act) {
+	case XDP_PASS:
+		break;
+	default:
+		bpf_warn_invalid_xdp_action(act);
+	case XDP_TX:
+	case XDP_ABORTED:
+		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
+		/* fallthrough -- handle aborts by dropping packet */
+	case XDP_DROP:
+		result = IXGBE_XDP_CONSUMED;
+		break;
+	}
+xdp_out:
+	rcu_read_unlock();
+	return ERR_PTR(-result);
+}
+
 /**
  * ixgbe_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
  * @q_vector: structure containing interrupt and ring information
@@ -2183,6 +2231,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		union ixgbe_adv_rx_desc *rx_desc;
 		struct ixgbe_rx_buffer *rx_buffer;
 		struct sk_buff *skb;
+		struct xdp_buff xdp;
 		unsigned int size;
 
 		/* return some buffers to hardware, one at a time is too slow */
@@ -2205,14 +2254,29 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		rx_buffer = ixgbe_get_rx_buffer(rx_ring, rx_desc, &skb, size);
 
 		/* retrieve a buffer from the ring */
-		if (skb)
+		if (!skb) {
+			xdp.data = page_address(rx_buffer->page) +
+				   rx_buffer->page_offset;
+			xdp.data_hard_start = xdp.data -
+					      ixgbe_rx_offset(rx_ring);
+			xdp.data_end = xdp.data + size;
+
+			skb = ixgbe_run_xdp(rx_ring, &xdp);
+		}
+
+		if (IS_ERR(skb)) {
+			total_rx_packets++;
+			total_rx_bytes += size;
+			rx_buffer->pagecnt_bias++;
+		} else if (skb) {
 			ixgbe_add_rx_frag(rx_ring, rx_buffer, skb, size);
-		else if (ring_uses_build_skb(rx_ring))
+		} else if (ring_uses_build_skb(rx_ring)) {
 			skb = ixgbe_build_skb(rx_ring, rx_buffer,
-					      rx_desc, size);
-		else
+					      &xdp, rx_desc);
+		} else {
 			skb = ixgbe_construct_skb(rx_ring, rx_buffer,
-						  rx_desc, size);
+						  &xdp, rx_desc);
+		}
 
 		/* exit if we failed to retrieve a buffer */
 		if (!skb) {
@@ -6073,7 +6137,8 @@ static int ixgbe_setup_all_tx_resources(struct ixgbe_adapter *adapter)
  *
  * Returns 0 on success, negative on failure
  **/
-int ixgbe_setup_rx_resources(struct ixgbe_ring *rx_ring)
+int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
+			     struct ixgbe_ring *rx_ring)
 {
 	struct device *dev = rx_ring->dev;
 	int orig_node = dev_to_node(dev);
@@ -6112,6 +6177,8 @@ int ixgbe_setup_rx_resources(struct ixgbe_ring *rx_ring)
 	rx_ring->next_to_clean = 0;
 	rx_ring->next_to_use = 0;
 
+	rx_ring->xdp_prog = adapter->xdp_prog;
+
 	return 0;
 err:
 	vfree(rx_ring->rx_buffer_info);
@@ -6135,7 +6202,7 @@ static int ixgbe_setup_all_rx_resources(struct ixgbe_adapter *adapter)
 	int i, err = 0;
 
 	for (i = 0; i < adapter->num_rx_queues; i++) {
-		err = ixgbe_setup_rx_resources(adapter->rx_ring[i]);
+		err = ixgbe_setup_rx_resources(adapter, adapter->rx_ring[i]);
 		if (!err)
 			continue;
 
@@ -6203,6 +6270,7 @@ void ixgbe_free_rx_resources(struct ixgbe_ring *rx_ring)
 {
 	ixgbe_clean_rx_ring(rx_ring);
 
+	rx_ring->xdp_prog = NULL;
 	vfree(rx_ring->rx_buffer_info);
 	rx_ring->rx_buffer_info = NULL;
 
@@ -9468,6 +9536,54 @@ ixgbe_features_check(struct sk_buff *skb, struct net_device *dev,
 	return features;
 }
 
+static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
+{
+	int i, frame_size = dev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+	struct bpf_prog *old_prog;
+
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+		return -EINVAL;
+
+	if (adapter->flags & IXGBE_FLAG_DCB_ENABLED)
+		return -EINVAL;
+
+	/* verify ixgbe ring attributes are sufficient for XDP */
+	for (i = 0; i < adapter->num_rx_queues; i++) {
+		struct ixgbe_ring *ring = adapter->rx_ring[i];
+
+		if (ring_is_rsc_enabled(ring))
+			return -EINVAL;
+
+		if (frame_size > ixgbe_rx_bufsz(ring))
+			return -EINVAL;
+	}
+
+	old_prog = xchg(&adapter->xdp_prog, prog);
+	for (i = 0; i < adapter->num_rx_queues; i++)
+		xchg(&adapter->rx_ring[i]->xdp_prog, adapter->xdp_prog);
+
+	if (old_prog)
+		bpf_prog_put(old_prog);
+
+	return 0;
+}
+
+static int ixgbe_xdp(struct net_device *dev, struct netdev_xdp *xdp)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+
+	switch (xdp->command) {
+	case XDP_SETUP_PROG:
+		return ixgbe_xdp_setup(dev, xdp->prog);
+	case XDP_QUERY_PROG:
+		xdp->prog_attached = !!(adapter->xdp_prog);
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
 static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_open		= ixgbe_open,
 	.ndo_stop		= ixgbe_close,
@@ -9513,6 +9629,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_udp_tunnel_add	= ixgbe_add_udp_tunnel_port,
 	.ndo_udp_tunnel_del	= ixgbe_del_udp_tunnel_port,
 	.ndo_features_check	= ixgbe_features_check,
+	.ndo_xdp		= ixgbe_xdp,
 };
 
 /**
-- 
2.12.2

^ permalink raw reply related

* [net-next v2 01/11] ixgbe: Acquire PHY semaphore before device reset
From: Jeff Kirsher @ 2017-04-30  3:08 UTC (permalink / raw)
  To: davem; +Cc: Paul Greenwalt, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170430030810.56415-1-jeffrey.t.kirsher@intel.com>

From: Paul Greenwalt <paul.greenwalt@intel.com>

A recent firmware change fixed an issue to acquire the PHY semaphore before
accessing PHY registers. This led to a case where  SW can issue a device
reset clearing the MDIO registers. This patch makes SW acquire the PHY
semaphore before issuing a device reset.

Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c | 8 ++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c
index 84a467a8ed3d..6ea0d6a5fb90 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c
@@ -95,6 +95,7 @@ s32 ixgbe_reset_hw_X540(struct ixgbe_hw *hw)
 {
 	s32 status;
 	u32 ctrl, i;
+	u32 swfw_mask = hw->phy.phy_semaphore_mask;
 
 	/* Call adapter stop to disable tx/rx and clear interrupts */
 	status = hw->mac.ops.stop_adapter(hw);
@@ -105,10 +106,17 @@ s32 ixgbe_reset_hw_X540(struct ixgbe_hw *hw)
 	ixgbe_clear_tx_pending(hw);
 
 mac_reset_top:
+	status = hw->mac.ops.acquire_swfw_sync(hw, swfw_mask);
+	if (status) {
+		hw_dbg(hw, "semaphore failed with %d", status);
+		return IXGBE_ERR_SWFW_SYNC;
+	}
+
 	ctrl = IXGBE_CTRL_RST;
 	ctrl |= IXGBE_READ_REG(hw, IXGBE_CTRL);
 	IXGBE_WRITE_REG(hw, IXGBE_CTRL, ctrl);
 	IXGBE_WRITE_FLUSH(hw);
+	hw->mac.ops.release_swfw_sync(hw, swfw_mask);
 	usleep_range(1000, 1200);
 
 	/* Poll for reset bit to self-clear indicating reset is complete */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
index 2658394599e4..58d3bcaca2b9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
@@ -3318,6 +3318,7 @@ static s32 ixgbe_reset_hw_X550em(struct ixgbe_hw *hw)
 	u32 ctrl = 0;
 	u32 i;
 	bool link_up = false;
+	u32 swfw_mask = hw->phy.phy_semaphore_mask;
 
 	/* Call adapter stop to disable Tx/Rx and clear interrupts */
 	status = hw->mac.ops.stop_adapter(hw);
@@ -3363,9 +3364,16 @@ static s32 ixgbe_reset_hw_X550em(struct ixgbe_hw *hw)
 			ctrl = IXGBE_CTRL_RST;
 	}
 
+	status = hw->mac.ops.acquire_swfw_sync(hw, swfw_mask);
+	if (status) {
+		hw_dbg(hw, "semaphore failed with %d", status);
+		return IXGBE_ERR_SWFW_SYNC;
+	}
+
 	ctrl |= IXGBE_READ_REG(hw, IXGBE_CTRL);
 	IXGBE_WRITE_REG(hw, IXGBE_CTRL, ctrl);
 	IXGBE_WRITE_FLUSH(hw);
+	hw->mac.ops.release_swfw_sync(hw, swfw_mask);
 	usleep_range(1000, 1200);
 
 	/* Poll for reset bit to self-clear meaning reset is complete */
-- 
2.12.2

^ permalink raw reply related

* [net-next v2 00/11][pull request] 10GbE Intel Wired LAN Driver Updates 2017-04-29
From: Jeff Kirsher @ 2017-04-30  3:07 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains updates to ixgbe and ixgbevf only, most notable is
the addition of XDP support to our 10GbE drivers.

Paul fixes ixgbe to acquire the PHY semaphore before accessing PHY
registers when issuing a device reset.

John adds XDP support (yeah!) for ixgbe.

Emil fixes an issue by flushing the MACVLAN filters on VF reset to avoid
conflicts with other VFs that may end up using the same MAC address.  Also
fixed a bug where ethtool -S displayed some empty fields for ixgbevf
because it was using ixgbe_stats instead ixgbevf_stats for
IXGBEVF_QUEUE_STATS_LEN.

Tony adds the ability to specify a zero MAC address in order to clear the
VF's MAC address from the RAR table.  Also adds support for a new
1000Base-T device based on x550EM_X MAC type.  Fixed an issue where the
RSS key specified by the user would be over-written with a pre-existing
value, so change the rss_key to a pointer so we can check to see if the
key has a value set before attempting to set it.  Fixed the logic for
mailbox support for getting RETA and RSS values, which are only supported
by 82599 and x540 devices.

v2: fixed up patches #2 and #3 based on feedback from Jakub and to
    address build issues when page sizes are larger than 4k

The following are changes since commit 4c042a80c1776477d8f13d0b214809a2eaedf01d:
  Merge branch 'bpf-misc-next'
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 10GbE

Emil Tantilov (2):
  ixgbe: clean macvlan MAC filter table on VF reset
  ixgbevf: fix size of queue stats length

John Fastabend (3):
  ixgbe: add XDP support for pass and drop actions
  ixgbe: add support for XDP_TX action
  ixgbe: delay tail write to every 'n' packets

Paul Greenwalt (2):
  ixgbe: Acquire PHY semaphore before device reset
  ixgbe: Add 1000Base-T device based on X550EM_X MAC

Tony Nguyen (4):
  ixgbe: Allow setting zero MAC address for VF
  ixgbe: Check for RSS key before setting value
  ixgbevf: Fix errors in retrieving RETA and RSS from PF
  ixgbevf: Check for RSS key before setting value

 drivers/net/ethernet/intel/ixgbe/ixgbe.h          |  27 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c  |  33 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c      |  75 +++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 477 +++++++++++++++++++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c    | 135 +++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_type.h     |   1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c     |   8 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c     |  53 ++-
 drivers/net/ethernet/intel/ixgbevf/ethtool.c      |   5 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |   2 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |  33 +-
 drivers/net/ethernet/intel/ixgbevf/vf.c           |   6 +-
 12 files changed, 701 insertions(+), 154 deletions(-)

-- 
2.12.2

^ permalink raw reply

* Re: [PATCH v3 binutils] Add BPF support to binutils...
From: David Miller @ 2017-04-30  2:37 UTC (permalink / raw)
  To: ast; +Cc: daniel, aconole, netdev, xdp-newbies
In-Reply-To: <20170429.222450.1300920007783667009.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Sat, 29 Apr 2017 22:24:50 -0400 (EDT)

> Some of your bugs should be fixed by this patch below, I'll add
> test cases soon:

Ok, here are all the local changes in my tree.  I made the relocs
match LLVM and I fixed some dwarf debugging stuff.

With this we are also down to one test case failure under binutils/
and it's something weird with merging 64-bit notes which I should be
able to fix soon.

I can fix these bugs fast, keep reporting.

BTW, should I just remove tailcall from the opcode table altogether?

diff --git a/binutils/readelf.c b/binutils/readelf.c
index 6c67d98..3931a3a 100644
--- a/binutils/readelf.c
+++ b/binutils/readelf.c
@@ -95,6 +95,7 @@
 #include "elf/arm.h"
 #include "elf/avr.h"
 #include "elf/bfin.h"
+#include "elf/bpf.h"
 #include "elf/cr16.h"
 #include "elf/cris.h"
 #include "elf/crx.h"
@@ -746,6 +747,7 @@ guess_is_rela (unsigned int e_machine)
     case EM_AVR:
     case EM_AVR_OLD:
     case EM_BLACKFIN:
+    case EM_BPF:
     case EM_CR16:
     case EM_CRIS:
     case EM_CRX:
@@ -1458,6 +1460,10 @@ dump_relocations (FILE * file,
 	  rtype = elf_bfin_reloc_type (type);
 	  break;
 
+	case EM_BPF:
+	  rtype = elf_bpf_reloc_type (type);
+	  break;
+
 	case EM_CYGNUS_MEP:
 	  rtype = elf_mep_reloc_type (type);
 	  break;
diff --git a/gas/config/tc-bpf.c b/gas/config/tc-bpf.c
index 0ba2afa..36393b7 100644
--- a/gas/config/tc-bpf.c
+++ b/gas/config/tc-bpf.c
@@ -288,6 +288,14 @@ md_assemble (char *str ATTRIBUTE_UNUSED)
 	  switch (*args)
 	    {
 	    case '+':
+	      if (*s == '+')
+		{
+		  ++s;
+		  continue;
+		}
+	      if (*s == '-')
+		continue;
+	      break;
 	    case ',':
 	    case '[':
 	    case ']':
@@ -494,6 +502,9 @@ md_apply_fix (fixS *fixP, valueT *valP ATTRIBUTE_UNUSED, segT segment ATTRIBUTE_
   offsetT val = * (offsetT *) valP;
 
   gas_assert (fixP->fx_r_type < BFD_RELOC_UNUSED);
+
+  fixP->fx_addnumber = val;	/* Remember value for emit_reloc.  */
+
   /* If this is a data relocation, just output VAL.  */
 
   if (fixP->fx_r_type == BFD_RELOC_8)
diff --git a/include/elf/bpf.h b/include/elf/bpf.h
index 5019b11..77463e3 100644
--- a/include/elf/bpf.h
+++ b/include/elf/bpf.h
@@ -26,14 +26,14 @@
 /* Relocation types.  */
 START_RELOC_NUMBERS (elf_bpf_reloc_type)
   RELOC_NUMBER (R_BPF_NONE, 0)
-  RELOC_NUMBER (R_BPF_INSN_16, 1)
-  RELOC_NUMBER (R_BPF_INSN_32, 2)
-  RELOC_NUMBER (R_BPF_INSN_64, 3)
-  RELOC_NUMBER (R_BPF_WDISP16, 4)
-  RELOC_NUMBER (R_BPF_DATA_8,  5)
-  RELOC_NUMBER (R_BPF_DATA_16, 6)
-  RELOC_NUMBER (R_BPF_DATA_32, 7)
-  RELOC_NUMBER (R_BPF_DATA_64, 8)
+  RELOC_NUMBER (R_BPF_DATA_64, 1)
+  RELOC_NUMBER (R_BPF_INSN_16, 2)
+  RELOC_NUMBER (R_BPF_INSN_32, 3)
+  RELOC_NUMBER (R_BPF_INSN_64, 4)
+  RELOC_NUMBER (R_BPF_WDISP16, 5)
+  RELOC_NUMBER (R_BPF_DATA_8,  6)
+  RELOC_NUMBER (R_BPF_DATA_16, 7)
+  RELOC_NUMBER (R_BPF_DATA_32, 10)
 END_RELOC_NUMBERS (R_BPF_max)
 
 #endif /* _ELF_BPF_H */
diff --git a/opcodes/bpf-dis.c b/opcodes/bpf-dis.c
index 92e29af..39656bf 100644
--- a/opcodes/bpf-dis.c
+++ b/opcodes/bpf-dis.c
@@ -49,7 +49,7 @@ print_insn_bpf (bfd_vma memaddr, disassemble_info *info)
   bpf_opcode_hash *op;
   int code, dest, src;
   bfd_byte buffer[8];
-  unsigned short off;
+  signed short off;
   int status, ret;
   signed int imm;
 
@@ -78,7 +78,7 @@ print_insn_bpf (bfd_vma memaddr, disassemble_info *info)
   else
     {
       getword = bfd_getl32;
-      gethalf = bfd_getl32;
+      gethalf = bfd_getl16;
     }  
 
   code = buffer[0];
@@ -128,7 +128,7 @@ print_insn_bpf (bfd_vma memaddr, disassemble_info *info)
 	      (*info->fprintf_func) (stream, "%d", imm);
 	      break;
 	    case 'O':
-	      (*info->fprintf_func) (stream, "%d", off);
+	      (*info->fprintf_func) (stream, "%d", (int) off);
 	      break;
 	    case 'L':
 	      info->target = memaddr + ((off - 1) * 8);

^ permalink raw reply related

* Re: prog ID and next steps. Was: [RFC net-next 0/2] Introduce bpf_prog ID and iteration
From: Hannes Frederic Sowa @ 2017-04-30  2:36 UTC (permalink / raw)
  To: Alexei Starovoitov, Martin KaFai Lau, netdev
  Cc: Daniel Borkmann, kernel-team, David S. Miller,
	Jesper Dangaard Brouer, John Fastabend, Thomas Graf
In-Reply-To: <b3dbda71-df4c-7e92-76aa-2c2252266cbc@fb.com>

Hi,

just quickly, because I am on a run:

On Sun, Apr 30, 2017, at 04:06, Alexei Starovoitov wrote:
> On 4/28/17 2:13 PM, Hannes Frederic Sowa wrote:
> >
> > Let's assume the following program with a constant key lookup and
> > different tables:
> >
> > action = bpf_map_lookup_elem(&actions, 0);
> > if (!*action)
> > 	return XDP_DROP;
> > else
> > 	return bpf_redirect(skb->ifindex, 0);
> >
> > It does something completely different depending on the map being used.
> > That is the reason why I see it makes sense to be specific which program
> > gets used if you try to analyze a program interactively.
> 
> Instead of two exactly the same programs accessing different
> maps it could have been one program accessing one map.
> The end result is the same.
> Depending on contents of the map and the input bytes of the packet
> the program will do arbitrary decisions. I still don't see how printing
> prog ID can help debugging.

Because during live inspection/debugging you get the relationship chain
in order:

n bpf-programs with same tag : loaded at m hooks : using z maps

You trace bpf_redirect and get back only the bpf tag.

You inspect all hooks and get back a bpf program identified by either or
all of filedescriptor, tag or prog_id. This doesn't solve the ambiguity
which program just actually was traced. Thus you have to consider all of
the programs. Each program can reference different maps.

So you extract all bpf programs and attributes and take the union of all
referenced maps, which you actually would have to inspect. If you had
the prog_id available, you could clearly figure out which maps are
referenced by the traced program.

I do see this as a prolem, because lot's of policy scripts might be
absolutely the same (thus also same tag) but might use different maps in
the end. We might not be able to correlate traces back to the maps.
Imagine a lot of bpf-lwt programs loaded, all the same, doing decisions
based on labels in the maps. It would be nice to see which specific map
also caused the behavior and to trace it back to user space and where
and why it was generated.

> >> The programs gets unloaded too and this 'perf record' and stack
> >> traces come from the past, hence the need for stable prog_tag.
> >
> > perf only stores addresses in perf.data. That said, if the program isn't
> > loaded, it won't give you any tag. If another program is reusing the
> > same address, if will give you any other random name for the function in
> > the calltrace.
> 
> This issue is well understood. We discussed it with Arnaldo and
> the plan is to emit PERF_RECORD_MMAP that will contain the address
> of JITed bpf program, so perf.data will contain all the info necessary
> for later analysis.

I think you meant to say "will contain the *name*" of the JITed bpf
program?

If so, that is cool.

> > If you store the perf script output or have kallsyms handy, certainly, yes.
> 
> right now 'perf script' doesn't work for short lived programs either.
> The stack trace IPs may be already gone from kallsyms.
> PERF_RECORD_MMAP is the right solution to that as well.

Absolutely ack.

> > Most of the time I was debugging interactively. Developers would
> > probably also enjoy to have a way to trace the program to the exact
> > identity. I have no problem keeping the tag in place and append just the
> > prog_id for the specific reason that the program might be loaded
> > multiple times with different tags in place. I was concerned about the
> > space for function names in kallsyms.
> 
> developers enjoy 'exact identity of the program' with program tag.
> Dynamic prog ID doesn't provide additional info to the user.
> Like when you write brand new kernel module, do you care what
> order it's loaded in lsmod ?
> Or if system crashed, do you care that veth0 had ifindex 10 or 20?
> Or PID of segfaulted program was 3256 ?

I do care which data structures my kernel module or program used,
because I might want to reference exactly those for debugging. ;)

> > I have to think more about it. Maybe there is a way to achieve both
> > without too much hassle.
> 
> It seems we agree on 80+% of topics discussed in this thread, so
> I suggest we implement, review and land these key pieces and later
> resolve the contentious points. By that time we will likely
> see each other arguments in different light :)

I agree with that. Maybe better ideas come up.

I was a bit pushy here, because I fear that kallsym name changes could
have uapi impact.

Thanks,
Hannes

^ permalink raw reply

* Re: [PATCH v3 binutils] Add BPF support to binutils...
From: David Miller @ 2017-04-30  2:24 UTC (permalink / raw)
  To: ast; +Cc: daniel, aconole, netdev, xdp-newbies
In-Reply-To: <9d5e3a87-15d0-01d7-6272-fa8d2bf0d076@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Sat, 29 Apr 2017 17:48:43 -0700

> $ bld/binutils/objdump -S test.o
> 
> test.o:     file format elf64-bpfbe
> 
> Disassembly of section .text:
> 
> 0000000000000000 <bpf_prog1>:
>    0:	18 10 00 00 83 98 47 39 	ldimm64	r1, 590618314553
>    8:	00 00 00 00 00 00 00 89
>   10:	7b a1 ff f8 00 00 00 00 	stdw	[r10+65528], r1
>   18:	79 1a ff f8 00 00 00 00 	lddw	r1, [r10+65528]
>   20:	07 10 00 00 8f ff 00 02 	add	r1, -1879113726
>   28:	79 01 00 00 00 00 00 00 	lddw	r0, [r1+0]
>   30:	95 00 00 00 00 00 00 00 	exit
> 
> looks good except negative offsets are reported as large positive.

Some of your bugs should be fixed by this patch below, I'll add
test cases soon:

diff --git a/gas/config/tc-bpf.c b/gas/config/tc-bpf.c
index 0ba2afa..36393b7 100644
--- a/gas/config/tc-bpf.c
+++ b/gas/config/tc-bpf.c
@@ -288,6 +288,14 @@ md_assemble (char *str ATTRIBUTE_UNUSED)
 	  switch (*args)
 	    {
 	    case '+':
+	      if (*s == '+')
+		{
+		  ++s;
+		  continue;
+		}
+	      if (*s == '-')
+		continue;
+	      break;
 	    case ',':
 	    case '[':
 	    case ']':
diff --git a/opcodes/bpf-dis.c b/opcodes/bpf-dis.c
index 92e29af..39656bf 100644
--- a/opcodes/bpf-dis.c
+++ b/opcodes/bpf-dis.c
@@ -49,7 +49,7 @@ print_insn_bpf (bfd_vma memaddr, disassemble_info *info)
   bpf_opcode_hash *op;
   int code, dest, src;
   bfd_byte buffer[8];
-  unsigned short off;
+  signed short off;
   int status, ret;
   signed int imm;
 
@@ -78,7 +78,7 @@ print_insn_bpf (bfd_vma memaddr, disassemble_info *info)
   else
     {
       getword = bfd_getl32;
-      gethalf = bfd_getl32;
+      gethalf = bfd_getl16;
     }  
 
   code = buffer[0];
@@ -128,7 +128,7 @@ print_insn_bpf (bfd_vma memaddr, disassemble_info *info)
 	      (*info->fprintf_func) (stream, "%d", imm);
 	      break;
 	    case 'O':
-	      (*info->fprintf_func) (stream, "%d", off);
+	      (*info->fprintf_func) (stream, "%d", (int) off);
 	      break;
 	    case 'L':
 	      info->target = memaddr + ((off - 1) * 8);

^ permalink raw reply related

* Re: [PATCH v3 binutils] Add BPF support to binutils...
From: Alexei Starovoitov @ 2017-04-30  2:24 UTC (permalink / raw)
  To: David Miller; +Cc: daniel, aconole, netdev, xdp-newbies
In-Reply-To: <20170429.221315.1605574480939856975.davem@davemloft.net>

On 4/29/17 7:13 PM, David Miller wrote:
> From: Alexei Starovoitov <ast@fb.com>
> Date: Sat, 29 Apr 2017 17:48:43 -0700
>
>> /w/binutils-gdb/bld/binutils/objdump: invalid relocation type 10
>> /w/binutils-gdb/bld/binutils/objdump: Dwarf Error: found address size
>
> I discussed this in another email, the relocation numbers I used in
> binutils do not match what is in LLVM currently.
>
> In fact, I thought you guys weren't using relocations in any capacity
> at all so just picked things from scratch :-)

yeah :) will reply in the other thread.
Too many public and internal discussions in the last week.
Weekend is the only time to reduce the backlog :)

 > Please use "--target=bpf-elf"

Thanks. That worked. Built the whole thing :)

objdump behaves the same way.
When compiled by clang with '-g'
(gdb) x/10i bpf_prog1
    0x0 <clang version 5.0.0 (trunk 301652) (llvm/trunk 301662)>: 
ldimm64	r0, 590618314553
    0x10 <clang version 5.0.0 (trunk 301652) (llvm/trunk 301662)+16>: 
stdw	[r1+65528], r10
    0x18 <clang version 5.0.0 (trunk 301652) (llvm/trunk 301662)+24>: 
lddw	r10, [r1+65528]
    0x20 <clang version 5.0.0 (trunk 301652) (llvm/trunk 301662)+32>: 
add	r0, -1879113726
    0x28 <clang version 5.0.0 (trunk 301652) (llvm/trunk 301662)+40>: 
lddw	r1, [r0+0]
    0x30 <clang version 5.0.0 (trunk 301652) (llvm/trunk 301662)+48>:	exit
    0x38:	Cannot access memory at address 0x38

Even without -g the last line 'Cannot access' is printed.
It seems gdb miscalculates the total func size?
The printing of 'clang version...' is due to '-g'.
Without -g it looks good:
(gdb) x/10i bpf_prog1
    0x0 <bpf_prog1>:	ldimm64	r1, 590618314553
    0x10 <bpf_prog1+16>:	stdw	[r10+65528], r1

Btw, I'm using this C file for testing:
int bpf_prog1(void *ign)
{
   volatile unsigned long t = 0x8983984739ull;
   return *(unsigned long *)((0xffffffff8fff0002ull) + t);
}

There was a bug in llvm backend with imm overflow which
was recently fixed.

^ permalink raw reply

* Re: [PATCH v3 binutils] Add BPF support to binutils...
From: David Miller @ 2017-04-30  2:13 UTC (permalink / raw)
  To: ast; +Cc: daniel, aconole, netdev, xdp-newbies
In-Reply-To: <9d5e3a87-15d0-01d7-6272-fa8d2bf0d076@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Sat, 29 Apr 2017 17:48:43 -0700

> /w/binutils-gdb/bld/binutils/objdump: invalid relocation type 10
> /w/binutils-gdb/bld/binutils/objdump: Dwarf Error: found address size

I discussed this in another email, the relocation numbers I used in
binutils do not match what is in LLVM currently.

In fact, I thought you guys weren't using relocations in any capacity
at all so just picked things from scratch :-)

^ permalink raw reply

* Re: [PATCH v3 binutils] Add BPF support to binutils...
From: David Miller @ 2017-04-30  2:11 UTC (permalink / raw)
  To: ast; +Cc: daniel, aconole, netdev, xdp-newbies
In-Reply-To: <9d5e3a87-15d0-01d7-6272-fa8d2bf0d076@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Sat, 29 Apr 2017 17:48:43 -0700

> On 4/28/17 1:33 PM, David Miller wrote:
>> New in this version:
>>
>> 1) All the relocation work I posted earlier today.
>> 2) Teach readelf about a few bpf relocs as needed
>> 3) Add a 'nop' instruction which facilitates the gas
>>    testsuite.  I used "mov r0,r0"
>>
>> The whole gas testsuite passes now. :-)  But that just
>> means we have to add more tests I guess....
>>
>> Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> it seems by default bpf target is not enabled,
> so I tried to build it with:
> ../configure --enable-targets=bpf,x86

Please use "--target=bpf-elf"

^ permalink raw reply

* Re: prog ID and next steps. Was: [RFC net-next 0/2] Introduce bpf_prog ID and iteration
From: Alexei Starovoitov @ 2017-04-30  2:06 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Martin KaFai Lau, netdev
  Cc: Daniel Borkmann, kernel-team, David S. Miller,
	Jesper Dangaard Brouer, John Fastabend, Thomas Graf
In-Reply-To: <cc962035-4374-b6c1-b350-c4dda9bf095c@stressinduktion.org>

On 4/28/17 2:13 PM, Hannes Frederic Sowa wrote:
>
> Let's assume the following program with a constant key lookup and
> different tables:
>
> action = bpf_map_lookup_elem(&actions, 0);
> if (!*action)
> 	return XDP_DROP;
> else
> 	return bpf_redirect(skb->ifindex, 0);
>
> It does something completely different depending on the map being used.
> That is the reason why I see it makes sense to be specific which program
> gets used if you try to analyze a program interactively.

Instead of two exactly the same programs accessing different
maps it could have been one program accessing one map.
The end result is the same.
Depending on contents of the map and the input bytes of the packet
the program will do arbitrary decisions. I still don't see how printing
prog ID can help debugging.

>> The programs gets unloaded too and this 'perf record' and stack
>> traces come from the past, hence the need for stable prog_tag.
>
> perf only stores addresses in perf.data. That said, if the program isn't
> loaded, it won't give you any tag. If another program is reusing the
> same address, if will give you any other random name for the function in
> the calltrace.

This issue is well understood. We discussed it with Arnaldo and
the plan is to emit PERF_RECORD_MMAP that will contain the address
of JITed bpf program, so perf.data will contain all the info necessary
for later analysis.

> If you store the perf script output or have kallsyms handy, certainly, yes.

right now 'perf script' doesn't work for short lived programs either.
The stack trace IPs may be already gone from kallsyms.
PERF_RECORD_MMAP is the right solution to that as well.

> Most of the time I was debugging interactively. Developers would
> probably also enjoy to have a way to trace the program to the exact
> identity. I have no problem keeping the tag in place and append just the
> prog_id for the specific reason that the program might be loaded
> multiple times with different tags in place. I was concerned about the
> space for function names in kallsyms.

developers enjoy 'exact identity of the program' with program tag.
Dynamic prog ID doesn't provide additional info to the user.
Like when you write brand new kernel module, do you care what
order it's loaded in lsmod ?
Or if system crashed, do you care that veth0 had ifindex 10 or 20?
Or PID of segfaulted program was 3256 ?

> I have to think more about it. Maybe there is a way to achieve both
> without too much hassle.

It seems we agree on 80+% of topics discussed in this thread, so
I suggest we implement, review and land these key pieces and later
resolve the contentious points. By that time we will likely
see each other arguments in different light :)

^ permalink raw reply

* Re: xdp_redirect ifindex vs port. Was: best API for returning/setting egress port?
From: Alexei Starovoitov @ 2017-04-30  1:35 UTC (permalink / raw)
  To: Hannes Frederic Sowa, John Fastabend, Jesper Dangaard Brouer,
	Andy Gospodarek
  Cc: Alexei Starovoitov, Daniel Borkmann, Daniel Borkmann,
	netdev@vger.kernel.org, xdp-newbies@vger.kernel.org
In-Reply-To: <dd965e88-b9fd-4d58-4272-07588b842252@stressinduktion.org>

On 4/28/17 12:43 PM, Hannes Frederic Sowa wrote:
> On 28.04.2017 07:30, Alexei Starovoitov wrote:
>> On 4/27/17 10:06 PM, John Fastabend wrote:
>>> That is more or less what I was thinking as well. The other question
>>> I have though is should we have a bpf_redirect() call for the simple
>>> case where I use the ifindex directly. This will be helpful for taking
>>> existing programs from tc_cls into xdp. I think it makes sense to have
>>> both bpf_tx_allports(), bpf_tx_port(), and bpf_redirect().
>>
>> I think so too.
>> Once netdevice is stored into netdev_array map the netdevice is pinned
>> and we need to figure out what to do if somebody tries to delete it.
>> Should we add a new netlink notifier that this netdev's refcnt is
>> almost zero and it's only in netdev_array(s) ?
>
> We basically do that automatically in netdev_wait_allrefs:
>
> pr_emerg("unregister_netdevice: waiting for %s to become free. Usage
> count = %d\n",
>          dev->name, refcnt);
>
> It is a very unpleasant warning and users probably think about a bug in
> the kernel at first.
>
> I don't think we should wait for user space to clean that up but have to
> do it automatically from the kernel. Maybe we can introduce a special
> value that basically NOPs the transmission. The hash table itself would
> install a netdevice notifier and would clean all tables. Could
> definitely cause some storm in the kernel, if a lot of keys are mapped
> to the same interface.
>
>> or should it be deleted from the array(s) automatically and
>> then user space will be notified post-deletion?
>> Both approaches have their pros and cons.
>
> I am leaning more towards deleting it automatically. But walking all
> tables and in there all keys might cause some unwanted load spikes.

yeah, when netdev is being unregistered we have to drop the ref
to avoid the warning.

Speaking of delete notifiers for maps.
Until recently all deletions were explicit and the program
could do extra perf_event_submit() if it needed to notify
user space of deletion.
But right now we have LRU map that deletes automatically and this
netdev_array will be deleting automatically too, so we probably
need some sort of generic map notifier for all map types.
It can be as simple as user space sleeping on read(map_fd, buf);
or waiting for epoll on map_fd and kernel wakes it up
when map element is deleted and pushes 'key/value' pair
into the buf. That should be generic for all map types,
but it needs some mechanism to avoid blocking, since number
of deletions can be huge. Something similar to perf's event record
'lost N records' should do the trick.

^ permalink raw reply

* Re: xdp_redirect ifindex vs port. Was: best API for returning/setting egress port?
From: Alexei Starovoitov @ 2017-04-30  1:04 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Andy Gospodarek, John Fastabend, Alexei Starovoitov,
	Daniel Borkmann, Daniel Borkmann, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org
In-Reply-To: <20170428125842.6b144d20@redhat.com>

On 4/28/17 3:58 AM, Jesper Dangaard Brouer wrote:
> On Thu, 27 Apr 2017 16:31:14 -0700
> Alexei Starovoitov <ast@fb.com> wrote:
>
>> On 4/27/17 1:41 AM, Jesper Dangaard Brouer wrote:
>>> When registering/attaching a XDP/bpf program, we would just send the
>>> file-descriptor for this port-map along (like we do with the bpf_prog
>>> FD). Plus, it own ingress-port number this program is in the port-map.
>>>
>>> It is not clear to me, in-which-data-structure on the kernel-side we
>>> store this reference to the port-map and ingress-port. As today we only
>>> have the "raw" struct bpf_prog pointer. I see several options:
>>>
>>> 1. Create a new xdp_prog struct that contains existing bpf_prog,
>>> a port-map pointer and ingress-port. (IMHO easiest solution)
>>>
>>> 2. Just create a new pointer to port-map and store it in driver rx-ring
>>> struct (like existing bpf_prog), but this create a race-challenge
>>> replacing (cmpxchg) the program (or perhaps it's not a problem as it
>>> runs under rcu and RTNL-lock).
>>>
>>> 3. Extend bpf_prog to store this port-map and ingress-port, and have a
>>> fast-way to access it.  I assume it will be accessible via
>>> bpf_prog->bpf_prog_aux->used_maps[X] but it will be too slow for XDP.
>>
>> I'm not sure I completely follow the 3 proposals.
>> Are you suggesting to have only one netdev_array per program?
>
> Yes, but I can see you have a more clever idea below.
>
>> Why not to allow any number like we do for tailcall+prog_array, etc?
>
>> We can teach verifier to allow new helper
>>  bpf_tx_port(netdev_array, port_num);
>> to only be used with netdev_array map type.
>> It will fetch netdevice pointer from netdev_array[port_num]
>> and will tx the packet into it.
>
> I love it.
>
> I just don't like the "netdev" part of the name "netdev_array" as one
> basic ideas of a port tabel, is that a port can be anything that can
> consume a XDP_buff packet.  This generalization allow us to move code
> out of the drivers.  We might be on the same page, as I do imagine that
> netdev_array or port_array is just a struct bpf_map pointer, and the
> bpf_map->map_type will tell us that this bpf_map contains net_device
> pointers.  Thus, when later introducing a new type of redirect (like to
> a socket or remote-CPU) then we just add a new bpf_map_type for this,
> without needing to change anything in the drivers, right?

In theory, yes, but in practice I doubt it will be so easy.
We probably shouldn't allow very different types of netdev
into the same netdev_array or port_array (whatever the name).
They need to be similar enough, otherwise we'd have to do run-time
checks. If they're all the same, these checks can be done at
insertion time instead.

> Do you imagine that bpf-side bpf_tx_port() returns XDP_REDIRECT?
> Or does it return if the call was successful (e.g validate port_num
> existed in map)?

don't know :)
we need to brainstorm pros and cons.

> On the kernel side, we need to receive this info "port_array" and
> "port_num", given you don't provide the call a xdp_buff/ctx, then I
> assume you want the per-CPU temp-store solution.  Then during the
> XDP_REDIRECT action we call a core redirect function that based on the
> bpf_map_type does a lookup, and find the net_device ptr.

hmm. didn't think that far either :)
indeed makes sense to pass 'ctx' into such helper as well,
so it's easier to deal with original netdev.

>> We can make it similar to bpf_tail_call(), so that program will
>> finish on successful bpf_tx_port() or
>> make it into 'delayed' tx which will be executed when program finishes.
>> Not sure which approach is better.
>
> I know you are talking about something slightly different, about
> delaying TX.
>
> But I want to mention (as I've done before) that it is important (for
> me) that we get bulking working/integrated.   I imagine the driver will
> call a function that will delay the TX/redirect action and at the end
> of the NAPI cycle have a function that flush packets, bulk per
> destination port.
>
> I was wondering where to store these delayed TX packets, but now that
> we have an associated bpf_map data-structure (netdev_array), I'm thinking
> about storing packets (ordered by port) inside that.  And then have a
> bpf_tx_flush(netdev_array) call in the driver (for every port-table-map
> seen, which will likely be small).

makes sense to me as well.
Ideally we should try to make an api such, that batching or no-batching
can be kernel choice. The program will just do
xdp_tx_port(...something here...)
and the kernel does the best for performance. If it needs to delay
the result to do batching the api should allow that transparently.
Like right now xdp program does 'return XDP_TX;' and
on the kernel side we can either do immediate xmit (like we do today)
or can change it to do batching and programs don't need to change.

>> We can also extend this netdev_array into broadcast/multicast. Like
>> bpf_tx_allports(&netdev_array);
>> call from the program will xmit the packet to all netdevices
>> in that 'netdev_array' map type.
>
> When broadcasting you often don't want to broadcast the packet out of
> the incoming interface.  How can you support this?
>
> Normally you would know your ingress port, and then excluded that port
> in the broadcast.  But with many netdev_array's how do the program know
> it's own ingress port.

absolutely!
bpf_tx_allports() should somehow exclude the port packet arrived on.
What you're proposing about passing 'ctx' into this helper,
should solve it, I guess.

> Thanks a lot for all this input, I got a much more clear picture of how
> I can/should implement this :-)

awesome :) Let's brainstorm more and get John's opinion on it as well,
since sounds like he'll be heavy user of such api.

^ permalink raw reply

* Re: [PATCH v3 binutils] Add BPF support to binutils...
From: Alexei Starovoitov @ 2017-04-30  0:48 UTC (permalink / raw)
  To: David Miller; +Cc: daniel, aconole, netdev, xdp-newbies
In-Reply-To: <20170428.163355.2067951664875385680.davem@davemloft.net>

On 4/28/17 1:33 PM, David Miller wrote:
> New in this version:
>
> 1) All the relocation work I posted earlier today.
> 2) Teach readelf about a few bpf relocs as needed
> 3) Add a 'nop' instruction which facilitates the gas
>    testsuite.  I used "mov r0,r0"
>
> The whole gas testsuite passes now. :-)  But that just
> means we have to add more tests I guess....
>
> Signed-off-by: David S. Miller <davem@davemloft.net>

it seems by default bpf target is not enabled,
so I tried to build it with:
../configure --enable-targets=bpf,x86
make -j40
but it failed to build :(

Then I did ../configure --target=bpf
and it went better. At least I could compile objdump,
but gas refused to be configured:
checking whether byte ordering is bigendian... no
This target is no longer supported in gas
make[1]: *** [configure-gas] Error 1

Not sure what I'm doing wrong.

At least i tested objdump:
$ clang -O2 -target bpfeb -c test.c
$ llvm-objdump -S test.o

test.o:	file format ELF64-BPF

Disassembly of section .text:
bpf_prog1:
        0:	18 10 00 00 83 98 47 39 00 00 00 00 00 00 00 89 	r1 = 
590618314553ll
        2:	7b a1 ff f8 00 00 00 00 	*(u64 *)(r10 - 8) = r1
        3:	79 1a ff f8 00 00 00 00 	r1 = *(u64 *)(r10 - 8)
        4:	07 10 00 00 8f ff 00 02 	r1 += -1879113726
        5:	79 01 00 00 00 00 00 00 	r0 = *(u64 *)(r1 + 0)
        6:	95 00 00 00 00 00 00 00 	exit

$ bld/binutils/objdump -S test.o

test.o:     file format elf64-bpfbe

Disassembly of section .text:

0000000000000000 <bpf_prog1>:
    0:	18 10 00 00 83 98 47 39 	ldimm64	r1, 590618314553
    8:	00 00 00 00 00 00 00 89
   10:	7b a1 ff f8 00 00 00 00 	stdw	[r10+65528], r1
   18:	79 1a ff f8 00 00 00 00 	lddw	r1, [r10+65528]
   20:	07 10 00 00 8f ff 00 02 	add	r1, -1879113726
   28:	79 01 00 00 00 00 00 00 	lddw	r0, [r1+0]
   30:	95 00 00 00 00 00 00 00 	exit

looks good except negative offsets are reported as large positive.

If compiled with clang -O2 -target bpfel and result is:
$ bld/binutils/objdump -S test.o

test.o:     file format elf64-bpfle

Disassembly of section .text:

0000000000000000 <bpf_prog1>:
    0:	18 01 00 00 39 47 98 83 	ldimm64	r0, 590618314553
    8:	00 00 00 00 89 00 00 00
   10:	7b 1a f8 ff 00 00 00 00 	stdw	[r1+65528], r10
   18:	79 a1 f8 ff 00 00 00 00 	lddw	r10, [r1+65528]

Now the src/dst registers are swapped :(
The bytes printed are correct though.

clang debug info is not recognized as well:
$ clang -O2 -g -target bpfel -c test.c
/w/binutils-gdb/bld/binutils/objdump -S test.o

test.o:     file format elf64-bpfle


Disassembly of section .text:

0000000000000000 <bpf_prog1>:
/w/binutils-gdb/bld/binutils/objdump: invalid relocation type 10
/w/binutils-gdb/bld/binutils/objdump: BFD (GNU Binutils) 
2.28.51.20170429 assertion fail ../../bfd/elf64-bpf.c:139
...
/w/binutils-gdb/bld/binutils/objdump: BFD (GNU Binutils) 
2.28.51.20170429 assertion fail ../../bfd/elf64-bpf.c:139
/w/binutils-gdb/bld/binutils/objdump: invalid relocation type 10
/w/binutils-gdb/bld/binutils/objdump: BFD (GNU Binutils) 
2.28.51.20170429 assertion fail ../../bfd/elf64-bpf.c:139
    0:	18 01 00 00 39 47 98 83 	ldimm64	r0, 590618314553
    8:	00 00 00 00 89 00 00 00
   10:	7b 1a f8 ff 00 00 00 00 	stdw	[r1+65528], r10

$ clang -O2 -g -target bpfeb -c test.c
$ /w/binutils-gdb/bld/binutils/objdump -S test.o

test.o:     file format elf64-bpfbe

Disassembly of section .text:

0000000000000000 <bpf_prog1>:
/w/binutils-gdb/bld/binutils/objdump: invalid relocation type 10
/w/binutils-gdb/bld/binutils/objdump: BFD (GNU Binutils) 
2.28.51.20170429 assertion fail ../../bfd/elf64-bpf.c:139
...
/w/binutils-gdb/bld/binutils/objdump: invalid relocation type 10
/w/binutils-gdb/bld/binutils/objdump: Dwarf Error: found address size 
'0', this reader can only handle address sizes '2', '4' and '8'.
    0:	18 10 00 00 83 98 47 39 	ldimm64	r1, 590618314553
    8:	00 00 00 00 00 00 00 89
   10:	7b a1 ff f8 00 00 00 00 	stdw	[r10+65528], r1
   18:	79 1a ff f8 00 00 00 00 	lddw	r1, [r10+65528]

With llvm it should be like this:
$ ./bin/clang -O2 -g -target bpfel -c test.c
$ ./bin/llvm-objdump -S test.o

test.o:	file format ELF64-BPF

Disassembly of section .text:
bpf_prog1:
; {
        0:	18 01 00 00 39 47 98 83 00 00 00 00 89 00 00 00 	r1 = 
590618314553ll
; volatile unsigned long t = 0x8983984739ull;
        2:	7b 1a f8 ff 00 00 00 00 	*(u64 *)(r10 - 8) = r1
; return *(unsigned long *)((0xffffffff8fff0002ull) + t);
        3:	79 a1 f8 ff 00 00 00 00 	r1 = *(u64 *)(r10 - 8)
        4:	07 01 00 00 02 00 ff 8f 	r1 += -1879113726
        5:	79 10 00 00 00 00 00 00 	r0 = *(u64 *)(r1 + 0)
        6:	95 00 00 00 00 00 00 00 	exit

The output of llvm with '-g -target bpfeb' is probably partially broken,
since 'llvm-objdump -S test.o' doesn't see any debug in there.
So gnu objdump's error:
  Dwarf Error: found address size '0'
is likely legit.

^ permalink raw reply

* Re: ipsec doesn't route TCP with 4.11 kernel
From: Don Bowman @ 2017-04-30  0:39 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Cong Wang, linux-kernel@vger.kernel.org, Herbert Xu,
	Linux Kernel Network Developers
In-Reply-To: <20170428071337.GG2649@secunet.com>

On 28 April 2017 at 03:13, Steffen Klassert
<steffen.klassert@secunet.com> wrote:
> On Thu, Apr 27, 2017 at 06:13:38PM -0400, Don Bowman wrote:
>> On 27 April 2017 at 04:42, Steffen Klassert <steffen.klassert@secunet.com>
>> wrote:
>> > On Wed, Apr 26, 2017 at 10:01:34PM -0700, Cong Wang wrote:
>> >> (Cc'ing netdev and IPSec maintainers)
>> >>
>> >> On Tue, Apr 25, 2017 at 6:08 PM, Don Bowman <db@donbowman.ca> wrote:
>>

<snip>

confirmed, with this patch in place that the tcp functions properly.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox