Netdev List
* [net-next 3/6] ixgbe: remove unneeded ipsec state free callback
From: Jeff Kirsher @ 2018-03-12 20:12 UTC
  To: davem; +Cc: Shannon Nelson, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180312201215.2854-1-jeffrey.t.kirsher@intel.com>

From: Shannon Nelson <shannon.nelson@oracle.com>

With commit 7f05b467a735 ("xfrm: check for xdo_dev_state_free")
we no longer need to add an empty callback function
to the driver, so now let's remove the useless code.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 8623013e517e..f2254528dcfc 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -724,23 +724,10 @@ static bool ixgbe_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs)
 	return true;
 }
 
-/**
- * ixgbe_ipsec_free - called by xfrm garbage collections
- * @xs: pointer to transformer state struct
- *
- * We don't have any garbage to collect, so we shouldn't bother
- * implementing this function, but the XFRM code doesn't check for
- * existence before calling the API callback.
- **/
-static void ixgbe_ipsec_free(struct xfrm_state *xs)
-{
-}
-
 static const struct xfrmdev_ops ixgbe_xfrmdev_ops = {
 	.xdo_dev_state_add = ixgbe_ipsec_add_sa,
 	.xdo_dev_state_delete = ixgbe_ipsec_del_sa,
 	.xdo_dev_offload_ok = ixgbe_ipsec_offload_ok,
-	.xdo_dev_state_free = ixgbe_ipsec_free,
 };
 
 /**
-- 
2.14.3
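
The reason the empty callback can go away is that the caller now checks whether the driver supplied one before invoking it. A minimal user-space sketch of that pattern follows; `demo_ops`, `demo_state_free()` and `demo_count_free()` are hypothetical names for illustration, not the kernel's xfrm API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ops table with an optional callback. */
struct demo_ops {
	void (*state_free)(int *counter);
};

/* Caller-side check: invoke the callback only when one was provided.
 * Returns 1 if the callback ran, 0 if it was absent.
 */
int demo_state_free(const struct demo_ops *ops, int *counter)
{
	assert(counter != NULL);
	if (ops && ops->state_free) {
		ops->state_free(counter);
		return 1;
	}
	return 0;
}

/* Example callback used to exercise the non-NULL path. */
void demo_count_free(int *counter)
{
	(*counter)++;
}
```

With the check on the caller side, a driver that has nothing to free can simply leave the pointer unset in its ops table.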


* [net-next 2/6] ixgbe: fix ipsec trailer length
From: Jeff Kirsher @ 2018-03-12 20:12 UTC
  To: davem; +Cc: Shannon Nelson, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180312201215.2854-1-jeffrey.t.kirsher@intel.com>

From: Shannon Nelson <shannon.nelson@oracle.com>

Fix up the Tx trailer length calculation.  We can't believe the
trailer len from the xstate information because it was calculated
before the packet was put together and padding added.  This bit
of code finds the padding value in the trailer, adds it to the
authentication length, and saves it so later we can put it into
the Tx descriptor to tell the device where to stop the checksum
calculation.

Fixes: 592594704761 ("ixgbe: process the Tx ipsec offload")
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 8b7dbc8057eb..8623013e517e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -789,11 +789,33 @@ int ixgbe_ipsec_tx(struct ixgbe_ring *tx_ring,
 
 	itd->flags = 0;
 	if (xs->id.proto == IPPROTO_ESP) {
+		struct sk_buff *skb = first->skb;
+		int ret, authlen, trailerlen;
+		u8 padlen;
+
 		itd->flags |= IXGBE_ADVTXD_TUCMD_IPSEC_TYPE_ESP |
 			      IXGBE_ADVTXD_TUCMD_L4T_TCP;
 		if (first->protocol == htons(ETH_P_IP))
 			itd->flags |= IXGBE_ADVTXD_TUCMD_IPV4;
-		itd->trailer_len = xs->props.trailer_len;
+
+		/* The actual trailer length is authlen (16 bytes) plus
+		 * 2 bytes for the proto and the padlen values, plus
+		 * padlen bytes of padding.  This ends up not the same
+		 * as the static value found in xs->props.trailer_len (21).
+		 *
+		 * The "correct" way to get the auth length would be to use
+		 *    authlen = crypto_aead_authsize(xs->data);
+		 * but since we know we only have one size to worry about
+		 * we can let the compiler use the constant and save us a
+		 * few CPU cycles.
+		 */
+		authlen = IXGBE_IPSEC_AUTH_BITS / 8;
+
+		ret = skb_copy_bits(skb, skb->len - (authlen + 2), &padlen, 1);
+		if (unlikely(ret))
+			return 0;
+		trailerlen = authlen + 2 + padlen;
+		itd->trailer_len = trailerlen;
 	}
 	if (tsa->encrypt)
 		itd->flags |= IXGBE_ADVTXD_TUCMD_IPSEC_ENCRYPT_EN;
-- 
2.14.3
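
The trailer arithmetic described in this patch can be modeled in user space on a flat buffer. This is only a sketch under stated assumptions: `demo_esp_trailer_len()` and `DEMO_AUTH_BITS` are hypothetical names, the packet is assumed to be linear and fully built, and the ICV is assumed to be 128 bits as in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_AUTH_BITS	128	/* assumed ICV size, mirroring the patch */

/* Compute the ESP trailer length of a fully built packet:
 * padding + pad-length byte + next-header byte + ICV.
 * Returns -1 if the packet is too short to hold the fixed trailer fields.
 */
int demo_esp_trailer_len(const uint8_t *pkt, size_t len)
{
	const size_t authlen = DEMO_AUTH_BITS / 8;
	uint8_t padlen;

	assert(pkt != NULL);
	if (len < authlen + 2)
		return -1;

	/* The pad-length byte sits just before the next-header byte,
	 * which in turn sits just before the ICV.
	 */
	padlen = pkt[len - (authlen + 2)];
	return (int)(authlen + 2 + padlen);
}
```

For a 40-byte packet whose pad-length byte is 6, the sketch yields 16 + 2 + 6 = 24 bytes of trailer, which is the kind of per-packet value a static `trailer_len` cannot capture.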


* [net-next 0/6][pull request] 10GbE Intel Wired LAN Driver Updates 2018-03-12
From: Jeff Kirsher @ 2018-03-12 20:12 UTC
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains updates to ixgbe and ixgbevf only.

Shannon Nelson provides three fixes to the ipsec portion of ixgbe.  The
first makes sure we are using 128-bit authentication, since it is the only
size supported for hardware offload.  The second fixes the transmit trailer
length calculation for ipsec by finding the padding value and adding it to
the authentication length, then saving the result so that we can put it in
the transmit descriptor to tell the device where to stop the checksum
calculation.  The last removes useless and dead code.

Tonghao Zhang adds an ethtool stat for receive length errors, since the
driver was already collecting this counter.

Arnd Bergmann fixed a warning about an unused variable by "rephrasing" the
code so that the compiler can see the use of the variable in question.

Paul Greenwalt fixes an issue where "HIDE_VLAN" was being cleared on VF
reset by making sure "HIDE_VLAN" is set during a VF reset when port VLAN
is enabled.

The following are changes since commit 8b4c6ed2ed0e3d53e93e20e3f20cadf0d1a47483:
  Merge branch 'hns3-next'
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 10GbE

Arnd Bergmann (1):
  ixgbevf: fix unused variable warning

Paul Greenwalt (1):
  ixgbe: fix disabling hide VLAN on VF reset

Shannon Nelson (3):
  ixgbe: check for 128-bit authentication
  ixgbe: fix ipsec trailer length
  ixgbe: remove unneeded ipsec state free callback

Tonghao Zhang (1):
  ixgbe: Add receive length error counter

 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c  |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c    | 53 +++++++++++++++--------
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h    |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c    |  6 ++-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 10 ++---
 5 files changed, 46 insertions(+), 25 deletions(-)

-- 
2.14.3


* [net-next 5/6] ixgbevf: fix unused variable warning
From: Jeff Kirsher @ 2018-03-12 20:12 UTC
  To: davem; +Cc: Arnd Bergmann, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180312201215.2854-1-jeffrey.t.kirsher@intel.com>

From: Arnd Bergmann <arnd@arndb.de>

The new ixgbevf_set_rx_buffer_len() function causes a harmless warning
in configurations with large page size:

drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c: In function 'ixgbevf_set_rx_buffer_len':
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:1758:15: error: unused variable 'max_frame' [-Werror=unused-variable]

This rephrases the code so that the compiler can see the use of that
variable, making it slightly easier to read in the process.

Fixes: f15c5ba5b6cd ("ixgbevf: add support for using order 1 pages to receive large frames")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index f37307131eb6..4da449e0a4ba 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1766,12 +1766,12 @@ static void ixgbevf_set_rx_buffer_len(struct ixgbevf_adapter *adapter,
 
 	set_ring_build_skb_enabled(rx_ring);
 
-#if (PAGE_SIZE < 8192)
-	if (max_frame <= IXGBEVF_MAX_FRAME_BUILD_SKB)
-		return;
+	if (PAGE_SIZE < 8192) {
+		if (max_frame <= IXGBEVF_MAX_FRAME_BUILD_SKB)
+			return;
 
-	set_ring_uses_large_buffer(rx_ring);
-#endif
+		set_ring_uses_large_buffer(rx_ring);
+	}
 }
 
 /**
-- 
2.14.3
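
The effect of replacing `#if` with a plain `if` on a compile-time constant can be seen in a small user-space sketch. The names and values here (`DEMO_PAGE_SIZE`, `DEMO_MAX_FRAME_BUILD`, `struct demo_ring`) are assumptions made for illustration and do not come from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE		4096	/* assumed page size for the sketch */
#define DEMO_MAX_FRAME_BUILD	1536	/* hypothetical build_skb frame limit */

/* Hypothetical ring flags standing in for the driver's ring state. */
struct demo_ring {
	bool build_skb;
	bool large_buffer;
};

/* Because the page-size test is an ordinary C condition rather than #if,
 * max_frame is referenced on every configuration, so it is never reported
 * as unused.  The compiler can still drop the branch when the condition
 * is a constant false.
 */
void demo_set_rx_buffer_len(struct demo_ring *ring, unsigned int max_frame)
{
	assert(ring != NULL);
	ring->build_skb = true;
	ring->large_buffer = false;

	if (DEMO_PAGE_SIZE < 8192) {
		if (max_frame <= DEMO_MAX_FRAME_BUILD)
			return;

		ring->large_buffer = true;
	}
}
```

Changing `DEMO_PAGE_SIZE` to 8192 or more makes the outer condition constant false, yet `max_frame` still appears in code the compiler parses, which is the point of the rewrite.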


* [net-next 4/6] ixgbe: Add receive length error counter
From: Jeff Kirsher @ 2018-03-12 20:12 UTC
  To: davem; +Cc: Tonghao Zhang, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180312201215.2854-1-jeffrey.t.kirsher@intel.com>

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

ixgbe already collects the rlec (receive length error) counter, and the
rx_error statistic already uses it.  Export the counter directly so that
it shows up in the output of ethtool -S ethX.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 89edb9fae6f0..c0e6ab42e0e1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -97,6 +97,7 @@ static const struct ixgbe_stats ixgbe_gstrings_stats[] = {
 	{"tx_heartbeat_errors", IXGBE_NETDEV_STAT(tx_heartbeat_errors)},
 	{"tx_timeout_count", IXGBE_STAT(tx_timeout_count)},
 	{"tx_restart_queue", IXGBE_STAT(restart_queue)},
+	{"rx_length_errors", IXGBE_STAT(stats.rlec)},
 	{"rx_long_length_errors", IXGBE_STAT(stats.roc)},
 	{"rx_short_length_errors", IXGBE_STAT(stats.ruc)},
 	{"tx_flow_control_xon", IXGBE_STAT(stats.lxontxc)},
-- 
2.14.3
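
A name-to-counter table of the kind this patch extends can be sketched in user space with `offsetof`. This is a simplified illustration only: `struct demo_hw_stats`, `DEMO_STAT()` and `demo_get_stat()` are hypothetical, and the sketch assumes every counter is a 64-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of hardware counters. */
struct demo_hw_stats {
	uint64_t rlec;	/* receive length errors */
	uint64_t roc;	/* receive oversize count */
	uint64_t ruc;	/* receive undersize count */
};

struct demo_stat {
	const char *name;
	size_t offset;
};

#define DEMO_STAT(field) offsetof(struct demo_hw_stats, field)

static const struct demo_stat demo_gstrings[] = {
	{"rx_length_errors", DEMO_STAT(rlec)},
	{"rx_long_length_errors", DEMO_STAT(roc)},
	{"rx_short_length_errors", DEMO_STAT(ruc)},
};

/* Look up a counter by its exported name; returns 0 on success, -1 if
 * no entry with that name exists.
 */
int demo_get_stat(const struct demo_hw_stats *hw, const char *name,
		  uint64_t *value)
{
	size_t i;

	assert(hw != NULL && name != NULL && value != NULL);
	for (i = 0; i < sizeof(demo_gstrings) / sizeof(demo_gstrings[0]); i++) {
		if (strcmp(demo_gstrings[i].name, name) == 0) {
			memcpy(value, (const char *)hw + demo_gstrings[i].offset,
			       sizeof(*value));
			return 0;
		}
	}
	return -1;
}
```

In a table-driven layout like this, exporting an already collected counter only takes one new table entry, which matches the one-line diff above.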


* [net-next 1/6] ixgbe: check for 128-bit authentication
From: Jeff Kirsher @ 2018-03-12 20:12 UTC
  To: davem; +Cc: Shannon Nelson, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180312201215.2854-1-jeffrey.t.kirsher@intel.com>

From: Shannon Nelson <shannon.nelson@oracle.com>

Make sure the Security Association is using
a 128-bit authentication, since that's the only
size that the hardware offload supports.

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 16 +++++++++++-----
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h |  1 +
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 93eacddb6704..8b7dbc8057eb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -423,15 +423,21 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
 	const char aes_gcm_name[] = "rfc4106(gcm(aes))";
 	int key_len;
 
-	if (xs->aead) {
-		key_data = &xs->aead->alg_key[0];
-		key_len = xs->aead->alg_key_len;
-		alg_name = xs->aead->alg_name;
-	} else {
+	if (!xs->aead) {
 		netdev_err(dev, "Unsupported IPsec algorithm\n");
 		return -EINVAL;
 	}
 
+	if (xs->aead->alg_icv_len != IXGBE_IPSEC_AUTH_BITS) {
+		netdev_err(dev, "IPsec offload requires %d bit authentication\n",
+			   IXGBE_IPSEC_AUTH_BITS);
+		return -EINVAL;
+	}
+
+	key_data = &xs->aead->alg_key[0];
+	key_len = xs->aead->alg_key_len;
+	alg_name = xs->aead->alg_name;
+
 	if (strcmp(alg_name, aes_gcm_name)) {
 		netdev_err(dev, "Unsupported IPsec algorithm - please use %s\n",
 			   aes_gcm_name);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h
index da3ce7849e85..87d2800b94ab 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.h
@@ -32,6 +32,7 @@
 #define IXGBE_IPSEC_MAX_RX_IP_COUNT	128
 #define IXGBE_IPSEC_BASE_RX_INDEX	0
 #define IXGBE_IPSEC_BASE_TX_INDEX	IXGBE_IPSEC_MAX_SA_COUNT
+#define IXGBE_IPSEC_AUTH_BITS		128
 
 #define IXGBE_RXTXIDX_IPS_EN		0x00000001
 #define IXGBE_RXIDX_TBL_SHIFT		1
-- 
2.14.3
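
The order of checks introduced by this patch (AEAD present, then ICV size, then algorithm name) can be sketched as a stand-alone function. `struct demo_aead` and `demo_check_aead()` are hypothetical and only loosely modeled on the fields used in the diff; lengths are assumed to be in bits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_AUTH_BITS	128	/* assumed supported ICV size in bits */

/* Hypothetical, trimmed-down view of an AEAD algorithm description. */
struct demo_aead {
	const char *alg_name;
	unsigned int alg_key_len;	/* in bits */
	unsigned int alg_icv_len;	/* in bits */
};

/* Validate in the same order as the patch: AEAD present, ICV size,
 * then algorithm name.  Returns 0 on success or -EINVAL.
 */
int demo_check_aead(const struct demo_aead *aead)
{
	static const char aes_gcm_name[] = "rfc4106(gcm(aes))";

	if (!aead)
		return -EINVAL;

	if (aead->alg_icv_len != DEMO_AUTH_BITS)
		return -EINVAL;

	assert(aead->alg_name != NULL);
	if (strcmp(aead->alg_name, aes_gcm_name))
		return -EINVAL;

	return 0;
}
```

Rejecting the unsupported ICV size up front means a security association that the hardware cannot handle is never programmed into the offload tables.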


* [net-next 6/6] ixgbe: fix disabling hide VLAN on VF reset
From: Jeff Kirsher @ 2018-03-12 20:12 UTC
  To: davem; +Cc: Paul Greenwalt, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180312201215.2854-1-jeffrey.t.kirsher@intel.com>

From: Paul Greenwalt <paul.greenwalt@intel.com>

If port VLAN is enabled, set PFQDE.HIDE_VLAN during VF reset.

Setting only PFQDE.PFQDE during VF reset was clearing PFQDE.HIDE_VLAN.

Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 27a70a52f3c9..008aa073a679 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -831,7 +831,11 @@ static int ixgbe_vf_reset_msg(struct ixgbe_adapter *adapter, u32 vf)
 	IXGBE_WRITE_REG(hw, IXGBE_VFTE(reg_offset), reg);
 
 	/* force drop enable for all VF Rx queues */
-	ixgbe_write_qde(adapter, vf, IXGBE_QDE_ENABLE);
+	reg = IXGBE_QDE_ENABLE;
+	if (adapter->vfinfo[vf].pf_vlan)
+		reg |= IXGBE_QDE_HIDE_VLAN;
+
+	ixgbe_write_qde(adapter, vf, reg);
 
 	/* enable receive for vf */
 	reg = IXGBE_READ_REG(hw, IXGBE_VFRE(reg_offset));
-- 
2.14.3
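
The register value composed in this patch can be illustrated with a short user-space sketch. The bit values and names (`DEMO_QDE_ENABLE`, `DEMO_QDE_HIDE_VLAN`, `struct demo_vf_info`) are assumptions for illustration, not values taken from the datasheet or the driver headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values are assumptions for this sketch. */
#define DEMO_QDE_ENABLE		0x1u
#define DEMO_QDE_HIDE_VLAN	0x2u

/* Hypothetical per-VF state; pf_vlan is non-zero when a port VLAN is set. */
struct demo_vf_info {
	uint16_t pf_vlan;
};

/* Build the value written on VF reset: always enable drop, and keep the
 * hide-VLAN bit when a port VLAN is configured, so that the write does
 * not clear it.
 */
uint32_t demo_vf_reset_qde(const struct demo_vf_info *vf)
{
	uint32_t reg = DEMO_QDE_ENABLE;

	assert(vf != NULL);
	if (vf->pf_vlan)
		reg |= DEMO_QDE_HIDE_VLAN;

	return reg;
}
```

Since the whole field is rewritten on reset, every bit that must survive has to be part of the value being written, which is why the flag is ORed in before the write.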


* Re: [PATCH net-next v2] ibmvnic: Bail from ibmvnic_open if driver is already open
From: Andrew Lunn @ 2018-03-12 20:10 UTC
  To: John Allen; +Cc: netdev, Thomas Falcon, Nathan Fontenot
In-Reply-To: <c7b3b39f-77d5-b6e8-1886-286422e72a47@linux.vnet.ibm.com>

> The problem here is that our routine to change the mtu does a full reset on
> the driver meaning that in the process we go from effectively "open" to
> "closed" to "open" again.
> 
> Consider the scenario where we change the mtu by running "ifdown <interface>",
> editing the ifcfg file with the new mtu, and finally running "ifup <interface>".
> In this case, we call ibmvnic_close from the ifdown and as a result of the ifup,
> we do the reset for the mtu change (which puts us back in the "open" state) and
> call ibmvnic_open. After the reset, we are already in the "open" state and the
> following call to open is redundant.

Hi John

So you are saying "ip link set mtu 4242 eth1" on a down interface will
put it up. That, I would say, is broken. You should be fixing this,
rather than papering over the cracks.

      Andrew


* Re: [PATCH net-next 2/5] bridge: propagate BR_ flags updates through sysfs to switchdev
From: Igor Mitsyanko @ 2018-03-12 20:07 UTC
  To: Andrew Lunn
  Cc: ivecera, jiri, netdev, bridge, sergey.matyukevich.os, ashevchenko,
	smaksimenko, dlebed
In-Reply-To: <20180310163838.GG29174@lunn.ch>

On 03/10/2018 08:38 AM, Andrew Lunn wrote:
> 
>> +             return 0;
>> +
>> +     err = br_switchdev_set_port_flag(p, flags, mask);
>> +     if (err)
>> +             return err;
> 
> You might want to consider the br_warn() in
> br_switchdev_set_port_flag(). Do we want to spam the kernel log?  Or
> should store_flag() do some validation before calling
> br_switchdev_set_port_flag()?
> 
>          Andrew
> 

Is there any convention for that in Linux? While I would agree that
simply returning an error code is sufficient in this case, another user
of br_switchdev_set_port_flag() is the netlink interface. Aren't the two
supposed to be equivalent? That is, if netlink prints to the kernel
log, shouldn't sysfs do that too?


* [RESEND PATCH net-next 1/1] tc-testing: updated gact tests with batch test cases
From: Roman Mashak @ 2018-03-12 20:07 UTC
  To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 .../tc-testing/tc-tests/actions/gact.json          | 73 +++++++++++++++++++++-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/gact.json b/tools/testing/selftests/tc-testing/tc-tests/actions/gact.json
index e2187b6e0b7a..ae96d0350d7e 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/gact.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/gact.json
@@ -465,5 +465,76 @@
         "teardown": [
             "$TC actions flush action gact"
         ]
+    },
+    {
+        "id": "1021",
+        "name": "Add batch of 32 gact pass actions",
+        "category": [
+            "actions",
+            "gact"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action gact",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "for i in `seq 1 32`; do cmd=\"action pass index $i \"; args=\"$args$cmd\"; done && $TC actions add $args",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action gact",
+        "matchPattern": "^[ \t]+index [0-9]+ ref",
+        "matchCount": "32",
+        "teardown": [
+            "$TC actions flush action gact"
+        ]
+    },
+    {
+        "id": "da7a",
+        "name": "Add batch of 32 gact continue actions with cookie",
+        "category": [
+            "actions",
+            "gact"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action gact",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "for i in `seq 1 32`; do cmd=\"action continue index $i cookie aabbccddeeff112233445566778800a1 \"; args=\"$args$cmd\"; done && $TC actions add $args",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action gact",
+        "matchPattern": "^[ \t]+index [0-9]+ ref",
+        "matchCount": "32",
+        "teardown": [
+            "$TC actions flush action gact"
+        ]
+    },
+    {
+        "id": "8aa3",
+        "name": "Delete batch of 32 gact continue actions",
+        "category": [
+            "actions",
+            "gact"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action gact",
+                0,
+                1,
+                255
+            ],
+            "for i in `seq 1 32`; do cmd=\"action continue index $i \"; args=\"$args$cmd\"; done && $TC actions add $args"
+        ],
+        "cmdUnderTest": "for i in `seq 1 32`; do cmd=\"action gact index $i \"; args=\"$args$cmd\"; done && $TC actions del $args",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action gact",
+        "matchPattern": "^[ \t]+index [0-9]+ ref",
+        "matchCount": "0",
+        "teardown": []
     }
-]
+]
\ No newline at end of file
-- 
2.7.4


* [RESEND PATCH net-next 1/1] tc-testing: add TC vlan action tests
From: Roman Mashak @ 2018-03-12 20:06 UTC
  To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 .../tc-testing/tc-tests/actions/vlan.json          | 410 +++++++++++++++++++++
 1 file changed, 410 insertions(+)
 create mode 100644 tools/testing/selftests/tc-testing/tc-tests/actions/vlan.json

diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/vlan.json b/tools/testing/selftests/tc-testing/tc-tests/actions/vlan.json
new file mode 100644
index 000000000000..4510ddfa6e54
--- /dev/null
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/vlan.json
@@ -0,0 +1,410 @@
+[
+    {
+        "id": "6f5a",
+        "name": "Add vlan pop action",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action vlan pop index 8",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action vlan",
+        "matchPattern": "action order [0-9]+: vlan.*pop.*index 8 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action vlan"
+        ]
+    },
+    {
+        "id": "ee6f",
+        "name": "Add vlan pop action with large index",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action vlan pop index 4294967295",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action vlan",
+        "matchPattern": "action order [0-9]+: vlan.*pop.*index 4294967295 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action vlan"
+        ]
+    },
+    {
+        "id": "b6b9",
+        "name": "Add vlan pop action with jump opcode",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action vlan pop jump 10 index 8",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action vlan",
+        "matchPattern": "action order [0-9]+: vlan.*jump 10.*index 8 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action vlan"
+        ]
+    },
+    {
+        "id": "87c3",
+        "name": "Add vlan pop action with trap opcode",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action vlan pop trap index 8",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action vlan",
+        "matchPattern": "action order [0-9]+: vlan.*pop trap.*index 8 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action vlan"
+        ]
+    },
+    {
+        "id": "2b91",
+        "name": "Add vlan invalid action",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action vlan bad_mode",
+        "expExitCode": "255",
+        "verifyCmd": "$TC actions list action vlan",
+        "matchPattern": "action order [0-9]+: vlan.*bad_mode",
+        "matchCount": "0",
+        "teardown": [
+            "$TC actions flush action vlan"
+        ]
+    },
+    {
+        "id": "57fc",
+        "name": "Add vlan action with invalid protocol type",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action vlan push protocol ABCD",
+        "expExitCode": "255",
+        "verifyCmd": "$TC actions list action vlan",
+        "matchPattern": "action order [0-9]+: vlan.*push",
+        "matchCount": "0",
+        "teardown": [
+            "$TC actions flush action vlan"
+        ]
+    },
+    {
+        "id": "3989",
+        "name": "Add vlan push action with default protocol and priority",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action vlan push id 123 index 18",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions get action vlan index 18",
+        "matchPattern": "action order [0-9]+: vlan.*push id 123 protocol 802.1Q priority 0 pipe.*index 18 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action vlan"
+        ]
+    },
+    {
+        "id": "79dc",
+        "name": "Add vlan push action with protocol 802.1Q and priority 3",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action vlan push id 77 protocol 802.1Q priority 3 continue index 734",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions get action vlan index 734",
+        "matchPattern": "action order [0-9]+: vlan.*push id 77 protocol 802.1Q priority 3 continue.*index 734 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action vlan"
+        ]
+    },
+    {
+        "id": "4d73",
+        "name": "Add vlan push action with protocol 802.1AD",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action vlan push id 1024 protocol 802.1AD pass index 10000",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions get action vlan index 10000",
+        "matchPattern": "action order [0-9]+: vlan.*push id 1024 protocol 802.1ad priority 0 pass.*index 10000 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action vlan"
+        ]
+    },
+    {
+        "id": "1f7b",
+        "name": "Add vlan push action with invalid vlan ID",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action vlan push id 5678 index 1",
+        "expExitCode": "255",
+        "verifyCmd": "$TC actions list action vlan",
+        "matchPattern": "action order [0-9]+: vlan.*push id 5678.*index 1 ref",
+        "matchCount": "0",
+        "teardown": [
+            "$TC actions flush action vlan"
+        ]
+    },
+    {
+        "id": "5d02",
+        "name": "Add vlan push action with invalid IEEE 802.1p priority",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action vlan push id 5 priority 10 index 1",
+        "expExitCode": "255",
+        "verifyCmd": "$TC actions list action vlan",
+        "matchPattern": "action order [0-9]+: vlan.*push id 5.*index 1 ref",
+        "matchCount": "0",
+        "teardown": [
+            "$TC actions flush action vlan"
+        ]
+    },
+    {
+        "id": "6812",
+        "name": "Add vlan modify action for protocol 802.1Q",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action vlan modify protocol 802.1Q id 5 index 100",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions get action vlan index 100",
+        "matchPattern": "action order [0-9]+: vlan.*modify id 5 protocol 802.1Q priority 0 pipe.*index 100 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action vlan"
+        ]
+    },
+    {
+        "id": "5a31",
+        "name": "Add vlan modify action for protocol 802.1AD",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action vlan modify protocol 802.1ad id 500 reclassify index 12",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions get action vlan index 12",
+        "matchPattern": "action order [0-9]+: vlan.*modify id 500 protocol 802.1ad priority 0 reclassify.*index 12 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action vlan"
+        ]
+    },
+    {
+        "id": "83a4",
+        "name": "Delete vlan pop action",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ],
+            "$TC actions add action vlan pop index 44"
+        ],
+        "cmdUnderTest": "$TC actions del action vlan index 44",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action vlan",
+        "matchPattern": "action order [0-9]+: vlan.*pop.*index 44 ref",
+        "matchCount": "0",
+        "teardown": []
+    },
+    {
+        "id": "ed1e",
+        "name": "Delete vlan push action for protocol 802.1Q",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ],
+            "$TC actions add action vlan push id 4094 protocol 802.1Q index 999"
+        ],
+        "cmdUnderTest": "$TC actions del action vlan index 999",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action vlan",
+        "matchPattern": "action order [0-9]+: vlan.*push id 4094 protocol 802.1Q priority 0 pipe.*index 999 ref",
+        "matchCount": "0",
+        "teardown": []
+    },
+    {
+        "id": "a2a3",
+        "name": "Flush vlan actions",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ],
+            "$TC actions add action vlan push id 4 protocol 802.1ad index 10",
+            "$TC actions add action vlan push id 4 protocol 802.1ad index 11",
+            "$TC actions add action vlan push id 4 protocol 802.1ad index 12",
+            "$TC actions add action vlan push id 4 protocol 802.1ad index 13"
+        ],
+        "cmdUnderTest": "$TC actions flush action vlan",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action vlan",
+        "matchPattern": "action order [0-9]+: vlan.*push id 4 protocol 802.1ad",
+        "matchCount": "0",
+        "teardown": []
+    },
+    {
+        "id": "1d78",
+        "name": "Add vlan action with cookie",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action vlan push id 4 cookie a0a0a0a0a0a0a0",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action vlan",
+        "matchPattern": "action order [0-9]+: vlan.*push id 4.*cookie a0a0a0a0a0a0a0",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action vlan"
+        ]
+    }
+]
-- 
2.7.4

^ permalink raw reply related

* Re: de-indirect TCP congestion control
From: David Miller @ 2018-03-12 20:05 UTC (permalink / raw)
  To: eric.dumazet; +Cc: stephen, netdev
In-Reply-To: <f3e43da6-e549-bad1-c75b-76bbf4fff216@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 12 Mar 2018 13:03:35 -0700

> 
> 
> On 03/12/2018 12:48 PM, Stephen Hemminger wrote:
>> On Mon, 12 Mar 2018 15:04:06 -0400 (EDT)
>> David Miller <davem@davemloft.net> wrote:
>> 
>>> From: Stephen Hemminger <stephen@networkplumber.org>
>>> Date: Mon, 12 Mar 2018 11:45:52 -0700
>>>
>>>> Since indirect calls are expensive, and now even more so, perhaps we
>>>> should figure out a way to make the default TCP congestion control
>>>> hooks into direct calls. 99% of the users just use the single CC
>>>> module compiled into the kernel.
>>>
>>> Who is this magic user with only one CC algorithm enabled in their
>>> kernel?  I want to know who this dude is?
>>>
>>> I don't think it's going to help much since people will have I think
>>> at least two algorithms compiled into nearly everyone's tree.
>>>
>>> Distributions will enable everything.
>>>
>>> Google is going to have at least two algorithms enabled.
>>>
>>> etc. etc. etc.
>>>
>>> Getting rid of indirect calls is a fine goal, but the precondition you
>>> are mentioning to achieve this doesn't seem practical at all.
>> What I meant is that in kernels with N congestion controls, almost all
>> traffic uses the default, so that path can be optimized. The example I
>> gave would have all the others doing the same indirect call.
> 
> I do not understand. What is default_tcp_ops anyway?
> 
> How will changes to /proc/sys/net/ipv4/tcp_congestion_control impact
> this?

I'm also confused about what exactly is being suggested and how this can
work. :-)

^ permalink raw reply

* Re: de-indirect TCP congestion control
From: Eric Dumazet @ 2018-03-12 20:03 UTC (permalink / raw)
  To: Stephen Hemminger, David Miller; +Cc: netdev
In-Reply-To: <20180312124813.3fa6b833@xeon-e3>



On 03/12/2018 12:48 PM, Stephen Hemminger wrote:
> On Mon, 12 Mar 2018 15:04:06 -0400 (EDT)
> David Miller <davem@davemloft.net> wrote:
> 
>> From: Stephen Hemminger <stephen@networkplumber.org>
>> Date: Mon, 12 Mar 2018 11:45:52 -0700
>>
>>> Since indirect calls are expensive, and now even more so, perhaps we should figure out
>>> a way to make the default TCP congestion control hooks into direct calls.
>>> 99% of the users just use the single CC module compiled into the kernel.
>>
>> Who is this magic user with only one CC algorithm enabled in their
>> kernel?  I want to know who this dude is?
>>
>> I don't think it's going to help much since people will have I think
>> at least two algorithms compiled into nearly everyone's tree.
>>
>> Distributions will enable everything.
>>
>> Google is going to have at least two algorithms enabled.
>>
>> etc. etc. etc.
>>
>> Getting rid of indirect calls is a fine goal, but the precondition you
>> are mentioning to achieve this doesn't seem practical at all.
> 
> What I meant is that in kernels with N congestion controls, almost all traffic
> uses the default, so that path can be optimized. The example I gave would
> have all the others doing the same indirect call.
> 

I do not understand. What is default_tcp_ops anyway?

How will changes to /proc/sys/net/ipv4/tcp_congestion_control impact this?

^ permalink raw reply

* Re: [PATCH 00/30] Netfilter/IPVS updates for net-next
From: David Miller @ 2018-03-12 20:01 UTC (permalink / raw)
  To: nbd; +Cc: pablo, netfilter-devel, netdev
In-Reply-To: <4521f7bd-c63a-9d2d-bdb3-5f4db58a7ba1@nbd.name>

From: Felix Fietkau <nbd@nbd.name>
Date: Mon, 12 Mar 2018 20:30:01 +0100

> It's not dead and useless. In its current state, it has a software fast
> path that significantly improves nftables routing/NAT throughput,
> especially on embedded devices.
> On some devices, I've seen "only" 20% throughput improvement (along with
> CPU usage reduction), on others it's quite a bit more. This is
> without any extra drivers or patches aside from what's posted.

I wonder if this software fast path has the exploitability problems that
things like the ipv4 routing cache and the per-cpu flow cache both had,
which is the reason both were removed.

I don't see how you can avoid this problem.

I'm willing to be shown otherwise :-)

^ permalink raw reply

* Re: [PATCH net-next v2] ibmvnic: Bail from ibmvnic_open if driver is already open
From: John Allen @ 2018-03-12 19:56 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Thomas Falcon, Nathan Fontenot
In-Reply-To: <20180312193351.GA31588@lunn.ch>

On 03/12/2018 02:33 PM, Andrew Lunn wrote:
> On Mon, Mar 12, 2018 at 02:19:52PM -0500, John Allen wrote:
>> If the driver is already in the "open" state, don't attempt the procedure
>> for opening the driver.
>>
>> Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
>> ---
>> v2: Unlock reset_lock mutex before returning.
>>
>> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
>> index 7be4b06..9a5e8ac 100644
>> --- a/drivers/net/ethernet/ibm/ibmvnic.c
>> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
>> @@ -1057,6 +1057,11 @@ static int ibmvnic_open(struct net_device *netdev)
>>
>>  	mutex_lock(&adapter->reset_lock);
>>
>> +	if (adapter->state == VNIC_OPEN) {
>> +		mutex_unlock(&adapter->reset_lock);
>> +		return 0;
>> +	}
> 
> Hi John
> 
> This looks better.
> 
> But how did you get ibmvnic_open() to be called twice? Is the actual
> issue that __ibmvnic_close() does not kill off the
> adapter->ibmvnic_reset workqueue, so that a reset can happen after
> close() has been called?

The problem here is that our routine to change the mtu does a full reset on
the driver meaning that in the process we go from effectively "open" to
"closed" to "open" again.

Consider the scenario where we change the mtu by running "ifdown <interface>",
editing the ifcfg file with the new mtu, and finally running "ifup <interface>".
In this case, we call ibmvnic_close from the ifdown and as a result of the ifup,
we do the reset for the mtu change (which puts us back in the "open" state) and
call ibmvnic_open. After the reset, we are already in the "open" state and the
following call to open is redundant. The check in ibmvnic_open to see if we're
in the open state is really just insurance to guard against any case like this
where we're in the open state from the driver perspective and something decides
to trigger the driver's open routine.

-John

> 
> 	   Andrew
> 
> 

^ permalink raw reply

* Re: de-indirect TCP congestion control
From: Stephen Hemminger @ 2018-03-12 19:48 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev
In-Reply-To: <20180312.150406.1432667769589045118.davem@davemloft.net>

On Mon, 12 Mar 2018 15:04:06 -0400 (EDT)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <stephen@networkplumber.org>
> Date: Mon, 12 Mar 2018 11:45:52 -0700
> 
> > Since indirect calls are expensive, and now even more so, perhaps we should figure out
> > a way to make the default TCP congestion control hooks into direct calls.
> > 99% of the users just use the single CC module compiled into the kernel.  
> 
> Who is this magic user with only one CC algorithm enabled in their
> kernel?  I want to know who this dude is?
> 
> I don't think it's going to help much since people will have I think
> at least two algorithms compiled into nearly everyone's tree.
> 
> Distributions will enable everything.
> 
> Google is going to have at least two algorithms enabled.
> 
> etc. etc. etc.
> 
> Getting rid of indirect calls is a fine goal, but the precondition you
> are mentioning to achieve this doesn't seem practical at all.

What I meant is that in kernels with N congestion controls, almost all traffic
uses the default, so that path can be optimized. The example I gave would
have all the others doing the same indirect call.

^ permalink raw reply

* linux-next 20180307 - UBSAN whine in lib/radix-tree.c
From: valdis.kletnieks @ 2018-03-12 19:35 UTC (permalink / raw)
  To: Andrew Morton, Matthew Wilcox; +Cc: linux-kernel, David S. Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 1428 bytes --]

Seen in the dmesg:

[    0.000000] ================================================================================
[    0.000000] UBSAN: Undefined behaviour in lib/radix-tree.c:123:14
[    0.000000] member access within null pointer of type 'const struct radix_tree_node'
[    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G                T 4.16.0-rc4-next-20180307-dirty #559
[    0.000000] Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A20 05/08/2017
[    0.000000] Call Trace:
[    0.000000]  dump_stack+0x83/0xca
[    0.000000]  ubsan_epilogue+0xd/0x3a
[    0.000000]  handle_null_ptr_deref+0x85/0x90
[    0.000000]  __ubsan_handle_type_mismatch_v1+0x5e/0x70
[    0.000000]  __radix_tree_replace+0x1e4/0x1f0
[    0.000000]  radix_tree_iter_replace+0x25/0x50
[    0.000000]  idr_alloc_u32+0x166/0x1f0
[    0.000000]  idr_alloc+0x7e/0xd0
[    0.000000]  worker_pool_assign_id+0x61/0xd0
[    0.000000]  ? mutex_lock_nested+0x1b/0x20
[    0.000000]  workqueue_init_early+0x58a/0xc3f
[    0.000000]  start_kernel+0x4f7/0x809
[    0.000000]  x86_64_start_reservations+0x40/0x61
[    0.000000]  x86_64_start_kernel+0x7b/0x9e
[    0.000000]  secondary_startup_64+0xa5/0xb0
[    0.000000] ================================================================================

not sure why a null 'parent' value got passed to get_slot_offset() in the first place, but it sounds
like something is missing an 'if (NULL)' test...

[-- Attachment #2: Type: application/pgp-signature, Size: 486 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v2] ibmvnic: Bail from ibmvnic_open if driver is already open
From: Andrew Lunn @ 2018-03-12 19:33 UTC (permalink / raw)
  To: John Allen; +Cc: netdev, Thomas Falcon, Nathan Fontenot
In-Reply-To: <36666f4d-6207-ab97-e7f6-4d7d0c6a1155@linux.vnet.ibm.com>

On Mon, Mar 12, 2018 at 02:19:52PM -0500, John Allen wrote:
> If the driver is already in the "open" state, don't attempt the procedure
> for opening the driver.
> 
> Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
> ---
> v2: Unlock reset_lock mutex before returning.
> 
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index 7be4b06..9a5e8ac 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -1057,6 +1057,11 @@ static int ibmvnic_open(struct net_device *netdev)
> 
>  	mutex_lock(&adapter->reset_lock);
> 
> +	if (adapter->state == VNIC_OPEN) {
> +		mutex_unlock(&adapter->reset_lock);
> +		return 0;
> +	}

Hi John

This looks better.

But how did you get ibmvnic_open() to be called twice? Is the actual
issue that __ibmvnic_close() does not kill off the
adapter->ibmvnic_reset workqueue, so that a reset can happen after
close() has been called?

	   Andrew

^ permalink raw reply

* Re: [PATCH] net/mlx4_en: Fix a memory leak in case of error in 'mlx4_en_init_netdev()'
From: Christophe Jaillet @ 2018-03-12 19:32 UTC (permalink / raw)
  To: Tariq Toukan; +Cc: netdev, linux-rdma, linux-kernel, kernel-janitors
In-Reply-To: <a61ed690-7df1-7292-4074-3f7d66c0b490@mellanox.com>

On 12/03/2018 at 09:42, Tariq Toukan wrote:
 >
 >
 > On 12/03/2018 12:45 AM, Christophe JAILLET wrote:
 >> If 'kzalloc' fails, we must free some memory before returning.
 >>
 >> Fixes: 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings
 >> scheme")
 >> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
 >> ---
 >>   drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 2 +-
 >>   1 file changed, 1 insertion(+), 1 deletion(-)
 >>
 >> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
 >> b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
 >> index 8fc51bc29003..f9db018e858f 100644
 >> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
 >> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
 >> @@ -3327,7 +3327,7 @@ int mlx4_en_init_netdev(struct mlx4_en_dev
 >> *mdev, int port,
 >>           if (!priv->tx_cq[t]) {
 >>               kfree(priv->tx_ring[t]);
 >>               err = -ENOMEM;
 >> -            goto out;
 >> +            goto err_free_tx;
 >>           }
 >>       }
 >>       priv->rx_ring_num = prof->rx_ring_num;
 >>
 >
 > Hi Christophe, thanks for spotting this.
 >
 > However, I think these err_free_tx label and loop are redundant.
 > Both tx_ring/tx_cq flows should just goto out, as resources are freed
 > later in mlx4_en_destroy_netdev() -> mlx4_en_free_resources().
 >

Hi,

I do not agree with you and I think that the patch is relevant.

If 'mlx4_en_init_netdev' fails, the only caller, 'mlx4_en_activate()', 
will set:
	mdev->pndev[i] = NULL
(see 
https://elixir.bootlin.com/linux/v4.16-rc5/source/drivers/net/ethernet/mellanox/mlx4/en_main.c#L254)

and 'mlx4_en_destroy_netdev()' is not called in this case.
(see 
https://elixir.bootlin.com/linux/v4.16-rc5/source/drivers/net/ethernet/mellanox/mlx4/en_main.c#L232)

My understanding is that 'mlx4_en_destroy_netdev()' will free resources 
in the normal case but that resources should be freed at allocation time 
if it does not fully succeed.

Best regards,
CJ


^ permalink raw reply

* Re: [PATCH 00/30] Netfilter/IPVS updates for net-next
From: Felix Fietkau @ 2018-03-12 19:30 UTC (permalink / raw)
  To: David Miller, pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <20180312.145843.1054152977291695095.davem@davemloft.net>

On 2018-03-12 19:58, David Miller wrote:
> From: Pablo Neira Ayuso <pablo@netfilter.org>
> Date: Mon, 12 Mar 2018 18:58:50 +0100
> 
>> The following patchset contains Netfilter/IPVS updates for your net-next
>> tree. This batch comes with more input sanitization for xtables to
>> address bug reports from fuzzers, preparation works to the flowtable
>> infrastructure and assorted updates. In no particular order, they are:
> 
> Sorry, I've seen enough.  I'm not pulling this.
> 
> What is the story with this flow table stuff?  I tried to ask you
> about this before, but the response I was given was extremely vague
> and did not answer my question at all.
> 
> This is a lot of code, and a lot of infrastructure, yet I see
> no device using the infrastructure to offload conntrack.
> 
> Nor can I see how this can possibly be even useful for such an
> application.  What conntrack offload needs are things completely
> outside of what the flow table stuff provides.  Mainly, they
> require that the SKB is completely abstracted away from all of
> the conntrack code paths, and that the conntrack infrastructure
> operates on an abstract packet metadata concept.
> 
> If you are targeting one specific piece of hardware with TCAMs
> that you are familiar with, I'd like you to stop right there.
> Because if that is all that this infrastructure can actually
> be used for, it is definitely designed wrong.
> 
> This, as has been the case in the past, is what is wrong with
> netfilter approach to supporting offloading.  We see all of this
> infrastructure before an actual working use case is provided for a
> specific piece of hardware for a specific driver in the tree.
> 
> Nobody can evaluate whether the approach is good or not without
> a clear driver change implementing support for it.
> 
> No other area of networking puts the cart before the horse like this.
> 
> I do not agree at all with the flow table infrastructure and I
> therefore do not want to pull any more flow table changes into my tree
> until there is an actual user of this stuff in that pull request which
> actually works in a way which is useful for people.  It is completely
> dead and useless code currently.
It's not dead and useless. In its current state, it has a software fast
path that significantly improves nftables routing/NAT throughput,
especially on embedded devices.
On some devices, I've seen "only" 20% throughput improvement (along with
CPU usage reduction), on others it's quite a bit more. This is
without any extra drivers or patches aside from what's posted.

Within OpenWrt, I'm working on a patch that makes the same available to
legacy netfilter as well. This is the reason for a lot of the core
refactoring that I did.

Hardware offload is still being worked on, not sure when we will have
the first driver ready. But as it stands now, the code is already very
useful and backported to OpenWrt for testing.

I think that in a couple of weeks this code will be ready to be enabled
by default in OpenWrt, which means that a lot of users' setups will get
a lot faster with no configuration change at all.

- Felix

^ permalink raw reply

* mmiowb() vs wmb() with write combined MMIO
From: Chopra, Manish @ 2018-03-12 19:25 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: john.fastabend@gmail.com, davem@davemloft.net, Eric Dumazet,
	Kalderon, Michal, Elior, Ariel, yuvalm@mellanox.com,
	Michal Schmidt

Hello Folks,

A series from John Fastabend [for lockless qdisc support, specifically commit c5ad119fb6c09b0297446be05bd66602fa564758]
has uncovered an issue in the qede NIC driver [/drivers/net/ethernet/qlogic/qede], and the driver is now seriously broken for basic L2.
Before John's series the driver was running fine, probably while hiding a serious bug that his series has now exposed.

The driver calls mmiowb() after writing the TX doorbell in qede_update_tx_producer() to synchronize the write between CPUs;
the driver maps that TX doorbell area as write-combined memory using ioremap_wc().

This causes an issue where the HW seems to read an unexpected producer value [not the one written to the doorbell, probably an older producer],
which leads to various NULL SKB conditions being hit while handling TX completions. This seems to indicate that mmiowb() is not synchronizing/flushing
writes here; at least on x86_64 SMP, it acts just like a compiler barrier [not really a hardware fence].

If we replace mmiowb() with wmb() we never hit the problem. wmb() seems to add some synchronization/flush mechanism here.
Moreover, if we map the doorbell with ioremap_nocache() we never hit the problem, even with mmiowb() in place.

I am quite sure this has to do with write-combined mapped memory, where mmiowb() doesn't seem to be effective, so we might want to replace
mmiowb() with wmb() here. A few vendors use wmb() after the doorbell write instead of mmiowb(); some use flush_wc()-style helpers.

Any suggestions or help regarding these barrier differences would be appreciated so that we can prepare a proper fix.

Thanks in Advance !!
Manish Chopra

^ permalink raw reply

* [bpf-next PATCH v2 18/18] bpf: sockmap test script
From: John Fastabend @ 2018-03-12 19:24 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev
In-Reply-To: <20180312192034.8039.70022.stgit@john-Precision-Tower-5810>

This adds the test script I am currently using to validate
the latest sockmap changes. Shortly sockmap will be ported
to selftests and these will be run from the infrastructure
there. Until then add the script here so we have a coverage
checklist when porting into selftests.
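
The script below repeats the same build/echo/run/sleep sequence for
every case. As a minimal sketch only, that pattern could be folded into
one helper. The names run_sockmap, SOCKMAP, CGROUP and DRY_RUN are
hypothetical and are not part of this patch; with DRY_RUN=1 the helper
only prints the command lines, so it can be tried without the sockmap
binary.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: wraps the repeated
# "build command, echo it, run it, sleep" sequence from the script.
SOCKMAP=./sockmap
CGROUP=/mnt/cgroup2/
DRY_RUN=1	# assumption: 1 = only print the command, 0 = also run it

# Usage: run_sockmap <t> <r> <i> <l> [extra sockmap options...]
run_sockmap() {
	cmd="$SOCKMAP --cgroup $CGROUP -t $1 -r $2 -i $3 -l $4"
	shift 4
	if [ $# -gt 0 ]; then
		cmd="$cmd $*"
	fi
	echo "$cmd"
	if [ "$DRY_RUN" -eq 0 ]; then
		# Unquoted on purpose so the string splits into arguments,
		# the same way the script relies on an unquoted $TEST.
		$cmd || return 1
		sleep 2
	fi
}

# Equivalent of the "Test apply with 1B" case.
for t in sendmsg sendpage; do
	run_sockmap "$t" 1 1024 1024 --txmsg_apply 1
done
```

Unlike the script, this sketch returns non-zero when a run fails, which
a caller could use to stop early; whether that is desirable for a
coverage checklist is a judgment call.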

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 samples/sockmap/sockmap_test.sh |  450 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 450 insertions(+)
 create mode 100755 samples/sockmap/sockmap_test.sh

diff --git a/samples/sockmap/sockmap_test.sh b/samples/sockmap/sockmap_test.sh
new file mode 100755
index 0000000..21c8598
--- /dev/null
+++ b/samples/sockmap/sockmap_test.sh
@@ -0,0 +1,450 @@
+#Test a bunch of positive cases to verify basic functionality
+for prog in "--txmsg" "--txmsg_redir" "--txmsg_drop"; do
+for t in "sendmsg" "sendpage"; do
+for r in 1 10 100; do
+	for i in 1 10 100; do
+		for l in 1 10 100; do
+			TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+			echo $TEST
+			$TEST
+			sleep 2
+		done
+	done
+done
+done
+done
+
+#Test max iov
+t="sendmsg"
+r=1
+i=1024
+l=1
+prog="--txmsg"
+
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+echo $TEST
+$TEST
+sleep 2
+prog="--txmsg_redir"
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+echo $TEST
+$TEST
+
+# Test max iov with 1k send
+
+t="sendmsg"
+r=1
+i=1024
+l=1024
+prog="--txmsg"
+
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+echo $TEST
+$TEST
+sleep 2
+prog="--txmsg_redir"
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+echo $TEST
+$TEST
+sleep 2
+
+# Test apply with 1B
+r=1
+i=1024
+l=1024
+prog="--txmsg_apply 1"
+
+for t in "sendmsg" "sendpage"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test apply with larger value than send
+r=1
+i=8
+l=1024
+prog="--txmsg_apply 2048"
+
+for t in "sendmsg" "sendpage"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test apply with apply that never reaches limit
+r=1024
+i=1
+l=1
+prog="--txmsg_apply 2048"
+
+for t in "sendmsg" "sendpage"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test apply and redirect with 1B
+r=1
+i=1024
+l=1024
+prog="--txmsg_redir --txmsg_apply 1"
+
+for t in "sendmsg" "sendpage"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test apply and redirect with larger value than send
+r=1
+i=8
+l=1024
+prog="--txmsg_redir --txmsg_apply 2048"
+
+for t in "sendmsg" "sendpage"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test apply and redirect with apply that never reaches limit
+r=1024
+i=1
+l=1
+prog="--txmsg_redir --txmsg_apply 2048"
+
+for t in "sendmsg" "sendpage"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test cork with 1B not really useful but test it anyways
+r=1
+i=1024
+l=1024
+prog="--txmsg_cork 1"
+
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test cork with a more reasonable 100B
+r=1
+i=1000
+l=1000
+prog="--txmsg_cork 100"
+
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test cork with larger value than send
+r=1
+i=8
+l=1024
+prog="--txmsg_cork 2048"
+
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test cork with cork that never reaches limit
+r=1024
+i=1
+l=1
+prog="--txmsg_cork 2048"
+
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+r=1
+i=1024
+l=1024
+prog="--txmsg_redir --txmsg_cork 1"
+
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test cork with a more reasonable 100B
+r=1
+i=1000
+l=1000
+prog="--txmsg_redir --txmsg_cork 100"
+
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test cork with larger value than send
+r=1
+i=8
+l=1024
+prog="--txmsg_redir --txmsg_cork 2048"
+
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test cork with cork that never reaches limit
+r=1024
+i=1
+l=1
+prog="--txmsg_redir --txmsg_cork 2048"
+
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+
+# mix and match cork and apply not really useful but valid programs
+
+# Test apply < cork
+r=100
+i=1
+l=5
+prog="--txmsg_apply 10 --txmsg_cork 100"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Try again with larger sizes so we hit overflow case
+r=100
+i=1000
+l=2048
+prog="--txmsg_apply 4096 --txmsg_cork 8096"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test apply > cork
+r=100
+i=1
+l=5
+prog="--txmsg_apply 100 --txmsg_cork 10"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Again with larger sizes so we hit overflow cases
+r=100
+i=1000
+l=2048
+prog="--txmsg_apply 8096 --txmsg_cork 4096"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+
+# Test apply = cork
+r=100
+i=1
+l=5
+prog="--txmsg_apply 10 --txmsg_cork 10"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+r=100
+i=1000
+l=2048
+prog="--txmsg_apply 4096 --txmsg_cork 4096"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test apply < cork
+r=100
+i=1
+l=5
+prog="--txmsg_redir --txmsg_apply 10 --txmsg_cork 100"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Try again with larger sizes so we hit overflow case
+r=100
+i=1000
+l=2048
+prog="--txmsg_redir --txmsg_apply 4096 --txmsg_cork 8096"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Test apply > cork
+r=100
+i=1
+l=5
+prog="--txmsg_redir --txmsg_apply 100 --txmsg_cork 10"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Again with larger sizes so we hit overflow cases
+r=100
+i=1000
+l=2048
+prog="--txmsg_redir --txmsg_apply 8096 --txmsg_cork 4096"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+
+# Test apply = cork
+r=100
+i=1
+l=5
+prog="--txmsg_redir --txmsg_apply 10 --txmsg_cork 10"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+r=100
+i=1000
+l=2048
+prog="--txmsg_redir --txmsg_apply 4096 --txmsg_cork 4096"
+for t in "sendpage" "sendmsg"; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+# Tests for bpf_msg_pull_data()
+for i in `seq 99 100 1600`; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
+		--txmsg --txmsg_start 0 --txmsg_end $i --txmsg_cork 1600"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+for i in `seq 199 100 1600`; do
+	TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
+		--txmsg --txmsg_start 100 --txmsg_end $i --txmsg_cork 1600"
+	echo $TEST
+	$TEST
+	sleep 2
+done
+
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
+	--txmsg --txmsg_start 1500 --txmsg_end 1600 --txmsg_cork 1600"
+echo $TEST
+$TEST
+sleep 2
+
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
+	--txmsg --txmsg_start 1111 --txmsg_end 1112 --txmsg_cork 1600"
+echo $TEST
+$TEST
+sleep 2
+
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
+	--txmsg --txmsg_start 1111 --txmsg_end 0 --txmsg_cork 1600"
+echo $TEST
+$TEST
+sleep 2
+
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
+	--txmsg --txmsg_start 0 --txmsg_end 1601 --txmsg_cork 1600"
+echo $TEST
+$TEST
+sleep 2
+
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100 \
+	--txmsg --txmsg_start 0 --txmsg_end 1601 --txmsg_cork 1602"
+echo $TEST
+$TEST
+sleep 2
+
+# Run through gamut again with start and end
+for prog in "--txmsg" "--txmsg_redir" "--txmsg_drop"; do
+for t in "sendmsg" "sendpage"; do
+for r in 1 10 100; do
+	for i in 1 10 100; do
+		for l in 1 10 100; do
+			TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog --txmsg_start 1 --txmsg_end 2"
+			echo $TEST
+		#	$TEST
+			sleep 2
+		done
+	done
+done
+done
+done
+
+# Some specific tests to cover specific code paths
+./sockmap --cgroup /mnt/cgroup2/ -t sendpage \
+	-r 5 -i 1 -l 1 --txmsg_redir --txmsg_cork 5 --txmsg_apply 3
+./sockmap --cgroup /mnt/cgroup2/ -t sendmsg \
+	-r 5 -i 1 -l 1 --txmsg_redir --txmsg_cork 5 --txmsg_apply 3
+./sockmap --cgroup /mnt/cgroup2/ -t sendpage \
+	-r 5 -i 1 -l 1 --txmsg_redir --txmsg_cork 5 --txmsg_apply 5
+./sockmap --cgroup /mnt/cgroup2/ -t sendmsg \
+	-r 5 -i 1 -l 1 --txmsg_redir --txmsg_cork 5 --txmsg_apply 5

^ permalink raw reply related

* [bpf-next PATCH v2 17/18] bpf: sockmap sample test for bpf_msg_pull_data
From: John Fastabend @ 2018-03-12 19:24 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev
In-Reply-To: <20180312192034.8039.70022.stgit@john-Precision-Tower-5810>

This adds an option to test the msg_pull_data helper. This
uses two options txmsg_start and txmsg_end to let the user
specify start and end bytes to pull.

The options can be used with txmsg_apply, txmsg_cork options
as well as with any of the basic tests, txmsg, txmsg_redir and
txmsg_drop (plus noisy variants) to run pull_data inline with
those tests. By giving the user direct control over the variables
we can easily do negative testing as well as positive tests.
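
As an illustration of how the two options combine, the sketch below
prints (without running) the kind of command lines that the test script
in patch 18/18 of this series uses for pull ranges. The flag names and
fixed values are copied from that script; the pull_cmd function is
hypothetical, and describing the last two ranges as negative cases is an
assumption based on their values rather than something the patch states.

```shell
#!/bin/sh
# Hypothetical helper: prints the sockmap command line for one pull
# range. Nothing is executed, so the sockmap binary is not needed.
pull_cmd() {
	echo "./sockmap --cgroup /mnt/cgroup2/ -t sendpage -r 16 -i 1 -l 100" \
		"--txmsg --txmsg_start $1 --txmsg_end $2 --txmsg_cork 1600"
}

# Sweep the end offset while holding the start at 0.
for end in $(seq 99 100 1600); do
	pull_cmd 0 "$end"
done

# Ranges that look like negative cases (assumption, see above):
pull_cmd 1111 0		# end before start
pull_cmd 0 1601		# end past the 1600-byte cork value
```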

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 samples/sockmap/sockmap_kern.c            |   79 ++++++++++++++++++++++++-----
 samples/sockmap/sockmap_user.c            |   32 ++++++++++++
 tools/testing/selftests/bpf/bpf_helpers.h |    2 +
 3 files changed, 99 insertions(+), 14 deletions(-)

diff --git a/samples/sockmap/sockmap_kern.c b/samples/sockmap/sockmap_kern.c
index 5842f1e..a8b555a 100644
--- a/samples/sockmap/sockmap_kern.c
+++ b/samples/sockmap/sockmap_kern.c
@@ -71,6 +71,14 @@ struct bpf_map_def SEC("maps") sock_cork_bytes = {
 	.max_entries = 1
 };
 
+struct bpf_map_def SEC("maps") sock_pull_bytes = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(int),
+	.value_size = sizeof(int),
+	.max_entries = 2
+};
+
+
 SEC("sk_skb1")
 int bpf_prog1(struct __sk_buff *skb)
 {
@@ -137,7 +145,8 @@ int bpf_sockmap(struct bpf_sock_ops *skops)
 SEC("sk_msg1")
 int bpf_prog4(struct sk_msg_md *msg)
 {
-	int *bytes, zero = 0;
+	int *bytes, zero = 0, one = 1;
+	int *start, *end;
 
 	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
 	if (bytes)
@@ -145,15 +154,18 @@ int bpf_prog4(struct sk_msg_md *msg)
 	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
 	if (bytes)
 		bpf_msg_cork_bytes(msg, *bytes);
+	start = bpf_map_lookup_elem(&sock_pull_bytes, &zero);
+	end = bpf_map_lookup_elem(&sock_pull_bytes, &one);
+	if (start && end)
+		bpf_msg_pull_data(msg, *start, *end, 0);
 	return SK_PASS;
 }
 
 SEC("sk_msg2")
 int bpf_prog5(struct sk_msg_md *msg)
 {
-	void *data_end = (void *)(long) msg->data_end;
-	void *data = (void *)(long) msg->data;
-	int *bytes, err1 = -1, err2 = -1, zero = 0;
+	int err1 = -1, err2 = -1, zero = 0, one = 1;
+	int *bytes, *start, *end, len1, len2;
 
 	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
 	if (bytes)
@@ -161,17 +173,32 @@ int bpf_prog5(struct sk_msg_md *msg)
 	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
 	if (bytes)
 		err2 = bpf_msg_cork_bytes(msg, *bytes);
+	len1 = (__u32)msg->data_end - (__u32)msg->data;
+	start = bpf_map_lookup_elem(&sock_pull_bytes, &zero);
+	end = bpf_map_lookup_elem(&sock_pull_bytes, &one);
+	if (start && end) {
+		int err;
+
+		bpf_printk("sk_msg2: pull(%i:%i)\n",
+			   start ? *start : 0, end ? *end : 0);
+		err = bpf_msg_pull_data(msg, *start, *end, 0);
+		if (err)
+			bpf_printk("sk_msg2: pull_data err %i\n",
+				   err);
+		len2 = (__u32)msg->data_end - (__u32)msg->data;
+		bpf_printk("sk_msg2: length update %i->%i\n",
+			   len1, len2);
+	}
 	bpf_printk("sk_msg2: data length %i err1 %i err2 %i\n",
-		   (__u32)data_end - (__u32)data, err1, err2);
+		   len1, err1, err2);
 	return SK_PASS;
 }
 
 SEC("sk_msg3")
 int bpf_prog6(struct sk_msg_md *msg)
 {
-	void *data_end = (void *)(long) msg->data_end;
-	void *data = (void *)(long) msg->data;
-	int *bytes, zero = 0;
+	int *bytes, zero = 0, one = 1;
+	int *start, *end;
 
 	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
 	if (bytes)
@@ -179,15 +206,18 @@ int bpf_prog6(struct sk_msg_md *msg)
 	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
 	if (bytes)
 		bpf_msg_cork_bytes(msg, *bytes);
+	start = bpf_map_lookup_elem(&sock_pull_bytes, &zero);
+	end = bpf_map_lookup_elem(&sock_pull_bytes, &one);
+	if (start && end)
+		bpf_msg_pull_data(msg, *start, *end, 0);
 	return bpf_msg_redirect_map(msg, &sock_map_redir, zero, 0);
 }
 
 SEC("sk_msg4")
 int bpf_prog7(struct sk_msg_md *msg)
 {
-	void *data_end = (void *)(long) msg->data_end;
-	void *data = (void *)(long) msg->data;
-	int *bytes, err1 = 0, err2 = 0, zero = 0;
+	int err1 = 0, err2 = 0, zero = 0, one = 1;
+	int *bytes, *start, *end, len1, len2;
 
 	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
 	if (bytes)
@@ -195,9 +225,24 @@ int bpf_prog7(struct sk_msg_md *msg)
 	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
 	if (bytes)
 		err2 = bpf_msg_cork_bytes(msg, *bytes);
-
+	len1 = (__u32)msg->data_end - (__u32)msg->data;
+	start = bpf_map_lookup_elem(&sock_pull_bytes, &zero);
+	end = bpf_map_lookup_elem(&sock_pull_bytes, &one);
+	if (start && end) {
+		int err;
+
+		bpf_printk("sk_msg3: pull(%i:%i)\n",
+			   start ? *start : 0, end ? *end : 0);
+		err = bpf_msg_pull_data(msg, *start, *end, 0);
+		if (err)
+			bpf_printk("sk_msg3: pull_data err %i\n",
+				   err);
+		len2 = (__u32)msg->data_end - (__u32)msg->data;
+		bpf_printk("sk_msg3: length update %i->%i\n",
+			   len1, len2);
+	}
 	bpf_printk("sk_msg3: redirect(%iB) err1=%i err2=%i\n",
-		   (__u32)data_end - (__u32)data, err1, err2);
+		   len1, err1, err2);
 	return bpf_msg_redirect_map(msg, &sock_map_redir, zero, 0);
 }
 
@@ -239,7 +284,8 @@ int bpf_prog9(struct sk_msg_md *msg)
 SEC("sk_msg7")
 int bpf_prog10(struct sk_msg_md *msg)
 {
-	int *bytes, zero = 0;
+	int *bytes, zero = 0, one = 1;
+	int *start, *end;
 
 	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
 	if (bytes)
@@ -247,6 +293,11 @@ int bpf_prog10(struct sk_msg_md *msg)
 	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
 	if (bytes)
 		bpf_msg_cork_bytes(msg, *bytes);
+	start = bpf_map_lookup_elem(&sock_pull_bytes, &zero);
+	end = bpf_map_lookup_elem(&sock_pull_bytes, &one);
+	if (start && end)
+		bpf_msg_pull_data(msg, *start, *end, 0);
+
 	return SK_DROP;
 }
 
diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index 52c4ed7..307d097 100644
--- a/samples/sockmap/sockmap_user.c
+++ b/samples/sockmap/sockmap_user.c
@@ -62,6 +62,8 @@
 int txmsg_drop;
 int txmsg_apply;
 int txmsg_cork;
+int txmsg_start;
+int txmsg_end;
 
 static const struct option long_options[] = {
 	{"help",	no_argument,		NULL, 'h' },
@@ -79,6 +81,8 @@
 	{"txmsg_drop",		no_argument,	&txmsg_drop, 1 },
 	{"txmsg_apply",	required_argument,	NULL, 'a'},
 	{"txmsg_cork",	required_argument,	NULL, 'k'},
+	{"txmsg_start", required_argument,	NULL, 's'},
+	{"txmsg_end",	required_argument,	NULL, 'e'},
 	{0, 0, NULL, 0 }
 };
 
@@ -572,6 +576,12 @@ int main(int argc, char **argv)
 	while ((opt = getopt_long(argc, argv, ":dhvc:r:i:l:t:",
 				  long_options, &longindex)) != -1) {
 		switch (opt) {
+		case 's':
+			txmsg_start = atoi(optarg);
+			break;
+		case 'e':
+			txmsg_end = atoi(optarg);
+			break;
 		case 'a':
 			txmsg_apply = atoi(optarg);
 			break;
@@ -761,6 +771,28 @@ int main(int argc, char **argv)
 			}
 		}
 
+		if (txmsg_start) {
+			err = bpf_map_update_elem(map_fd[5],
+						  &i, &txmsg_start, BPF_ANY);
+			if (err) {
+				fprintf(stderr,
+					"ERROR: bpf_map_update_elem (txmsg_start):  %d (%s)\n",
+					err, strerror(errno));
+				return err;
+			}
+		}
+
+		if (txmsg_end) {
+			i = 1;
+			err = bpf_map_update_elem(map_fd[5],
+						  &i, &txmsg_end, BPF_ANY);
+			if (err) {
+				fprintf(stderr,
+					"ERROR: bpf_map_update_elem (txmsg_end):  %d (%s)\n",
+					err, strerror(errno));
+				return err;
+			}
+		}
 	}
 
 	if (txmsg_drop)
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index b5b45ff..7cae376 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -92,6 +92,8 @@ static int (*bpf_msg_apply_bytes)(void *ctx, int len) =
 	(void *) BPF_FUNC_msg_apply_bytes;
 static int (*bpf_msg_cork_bytes)(void *ctx, int len) =
 	(void *) BPF_FUNC_msg_cork_bytes;
+static int (*bpf_msg_pull_data)(void *ctx, int start, int end, int flags) =
+	(void *) BPF_FUNC_msg_pull_data;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions


* [bpf-next PATCH v2 16/18] bpf: sockmap add SK_DROP tests
From: John Fastabend @ 2018-03-12 19:24 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev
In-Reply-To: <20180312192034.8039.70022.stgit@john-Precision-Tower-5810>

Add tests for SK_DROP.
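
The send loops in this patch invert their error check when a drop is
expected. A minimal sketch of that decision follows; check_send_result()
and account_bytes() are hypothetical helper names used only for
illustration, since the patch open-codes the checks in msg_loop() and
msg_loop_sendpage().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Returns 0 when the result of a send call matches the expectation. */
static int check_send_result(bool drop_expected, int sent)
{
	if (!drop_expected && sent < 0)
		return sent;	/* unexpected failure: propagate it */
	if (drop_expected && sent >= 0)
		return -EIO;	/* send succeeded although a drop was expected */
	return 0;
}

/* Only successful sends contribute to the byte counter. */
static long account_bytes(long total, int sent)
{
	return sent > 0 ? total + sent : total;
}
```

With drop_expected set, a failing send is the passing case, so the byte
counter must skip negative return values instead of adding them.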

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 samples/sockmap/sockmap_kern.c |   15 ++++++++++
 samples/sockmap/sockmap_user.c |   62 ++++++++++++++++++++++++++++++----------
 2 files changed, 61 insertions(+), 16 deletions(-)

diff --git a/samples/sockmap/sockmap_kern.c b/samples/sockmap/sockmap_kern.c
index 1c430926..5842f1e 100644
--- a/samples/sockmap/sockmap_kern.c
+++ b/samples/sockmap/sockmap_kern.c
@@ -236,4 +236,19 @@ int bpf_prog9(struct sk_msg_md *msg)
 	return SK_PASS;
 }
 
+SEC("sk_msg7")
+int bpf_prog10(struct sk_msg_md *msg)
+{
+	int *bytes, zero = 0;
+
+	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
+	if (bytes)
+		bpf_msg_apply_bytes(msg, *bytes);
+	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
+	if (bytes)
+		bpf_msg_cork_bytes(msg, *bytes);
+	return SK_DROP;
+}
+
+
 char _license[] SEC("license") = "GPL";
diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index 4e0a3d8..52c4ed7 100644
--- a/samples/sockmap/sockmap_user.c
+++ b/samples/sockmap/sockmap_user.c
@@ -59,6 +59,7 @@
 int txmsg_noisy;
 int txmsg_redir;
 int txmsg_redir_noisy;
+int txmsg_drop;
 int txmsg_apply;
 int txmsg_cork;
 
@@ -75,6 +76,7 @@
 	{"txmsg_noisy",		no_argument,	&txmsg_noisy, 1  },
 	{"txmsg_redir",		no_argument,	&txmsg_redir, 1  },
 	{"txmsg_redir_noisy",	no_argument,	&txmsg_redir_noisy, 1},
+	{"txmsg_drop",		no_argument,	&txmsg_drop, 1 },
 	{"txmsg_apply",	required_argument,	NULL, 'a'},
 	{"txmsg_cork",	required_argument,	NULL, 'k'},
 	{0, 0, NULL, 0 }
@@ -210,9 +212,19 @@ struct msg_stats {
 	struct timespec end;
 };
 
+struct sockmap_options {
+	int verbose;
+	bool base;
+	bool sendpage;
+	bool data_test;
+	bool drop_expected;
+};
+
 static int msg_loop_sendpage(int fd, int iov_length, int cnt,
-			     struct msg_stats *s)
+			     struct msg_stats *s,
+			     struct sockmap_options *opt)
 {
+	bool drop = opt->drop_expected;
 	unsigned char k = 0;
 	FILE *file;
 	int i, fp;
@@ -229,12 +241,18 @@ static int msg_loop_sendpage(int fd, int iov_length, int cnt,
 	for (i = 0; i < cnt; i++) {
 		int sent = sendfile(fd, fp, NULL, iov_length);
 
-		if (sent < 0) {
+		if (!drop && sent < 0) {
 			perror("send loop error:");
 			close(fp);
 			return sent;
+		} else if (drop && sent >= 0) {
+			printf("sendpage loop error expected: %i\n", sent);
+			close(fp);
+			return -EIO;
 		}
-		s->bytes_sent += sent;
+
+		if (sent > 0)
+			s->bytes_sent += sent;
 	}
 	clock_gettime(CLOCK_MONOTONIC, &s->end);
 	close(fp);
@@ -242,12 +260,15 @@ static int msg_loop_sendpage(int fd, int iov_length, int cnt,
 }
 
 static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
-		    struct msg_stats *s, bool tx, bool data_test)
+		    struct msg_stats *s, bool tx,
+		    struct sockmap_options *opt)
 {
 	struct msghdr msg = {0};
 	int err, i, flags = MSG_NOSIGNAL;
 	struct iovec *iov;
 	unsigned char k;
+	bool data_test = opt->data_test;
+	bool drop = opt->drop_expected;
 
 	iov = calloc(iov_count, sizeof(struct iovec));
 	if (!iov)
@@ -281,11 +302,16 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
 		for (i = 0; i < cnt; i++) {
 			int sent = sendmsg(fd, &msg, flags);
 
-			if (sent < 0) {
+			if (!drop && sent < 0) {
 				perror("send loop error:");
 				goto out_errno;
+			} else if (drop && sent >= 0) {
+				printf("send loop error expected: %i\n", sent);
+				errno = -EIO;
+				goto out_errno;
 			}
-			s->bytes_sent += sent;
+			if (sent > 0)
+				s->bytes_sent += sent;
 		}
 		clock_gettime(CLOCK_MONOTONIC, &s->end);
 	} else {
@@ -375,13 +401,6 @@ static inline float recvdBps(struct msg_stats s)
 	return s.bytes_recvd / (s.end.tv_sec - s.start.tv_sec);
 }
 
-struct sockmap_options {
-	int verbose;
-	bool base;
-	bool sendpage;
-	bool data_test;
-};
-
 static int sendmsg_test(int iov_count, int iov_buf, int cnt,
 			struct sockmap_options *opt)
 {
@@ -399,10 +418,13 @@ static int sendmsg_test(int iov_count, int iov_buf, int cnt,
 
 	rxpid = fork();
 	if (rxpid == 0) {
+		if (opt->drop_expected)
+			exit(1);
+
 		if (opt->sendpage)
 			iov_count = 1;
 		err = msg_loop(rx_fd, iov_count, iov_buf,
-			       cnt, &s, false, opt->data_test);
+			       cnt, &s, false, opt);
 		if (err)
 			fprintf(stderr,
 				"msg_loop_rx: iov_count %i iov_buf %i cnt %i err %i\n",
@@ -426,10 +448,10 @@ static int sendmsg_test(int iov_count, int iov_buf, int cnt,
 	txpid = fork();
 	if (txpid == 0) {
 		if (opt->sendpage)
-			err = msg_loop_sendpage(c1, iov_buf, cnt, &s);
+			err = msg_loop_sendpage(c1, iov_buf, cnt, &s, opt);
 		else
 			err = msg_loop(c1, iov_count, iov_buf,
-				       cnt, &s, true, opt->data_test);
+				       cnt, &s, true, opt);
 
 		if (err)
 			fprintf(stderr,
@@ -674,6 +696,9 @@ int main(int argc, char **argv)
 		tx_prog_fd = prog_fd[5];
 	else if (txmsg_redir_noisy)
 		tx_prog_fd = prog_fd[6];
+	else if (txmsg_drop)
+		tx_prog_fd = prog_fd[9];
+	/* apply and cork must be last */
 	else if (txmsg_apply)
 		tx_prog_fd = prog_fd[7];
 	else if (txmsg_cork)
@@ -700,6 +725,7 @@ int main(int argc, char **argv)
 				err, strerror(errno));
 			return err;
 		}
+
 		if (txmsg_redir || txmsg_redir_noisy)
 			redir_fd = c2;
 		else
@@ -736,6 +762,10 @@ int main(int argc, char **argv)
 		}
 
 	}
+
+	if (txmsg_drop)
+		options.drop_expected = true;
+
 	if (test == PING_PONG)
 		err = forever_ping_pong(rate, &options);
 	else if (test == SENDMSG) {


* [bpf-next PATCH v2 15/18] bpf: sockmap sample support for bpf_msg_cork_bytes()
From: John Fastabend @ 2018-03-12 19:24 UTC (permalink / raw)
  To: davem, ast, daniel, davejwatson; +Cc: netdev
In-Reply-To: <20180312192034.8039.70022.stgit@john-Precision-Tower-5810>

Add sample application support for the bpf_msg_cork_bytes helper. This
lets the user specify how many bytes each verdict should apply to.

Similar to the apply_bytes() tests, these can be run as a stand-alone
test when used without other options, or inline with other tests by
combining the txmsg_cork option with any of the basic tests: txmsg,
txmsg_redir, txmsg_drop.
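
The stand-alone program (bpf_prog9 in the diff below) only corks when
the message is shorter than the requested size. A user-space model of
that decision is sketched here; the enum, the function name and the
cork_ret parameter are hypothetical stand-ins, and nothing in it calls
into the kernel.

```c
#include <assert.h>

/* Model-only verdicts; these names are not the kernel's. */
enum model_verdict { MODEL_DROP, MODEL_PASS };

/*
 * msg_len:    bytes currently visible in the message
 * cork_bytes: value configured through txmsg_cork
 * cork_ret:   stand-in for the return value of the cork helper
 */
static enum model_verdict cork_verdict(unsigned int msg_len, int cork_bytes,
				       int cork_ret)
{
	/* Enough data is already present, so corking is skipped. */
	if (msg_len >= (unsigned int)cork_bytes)
		return MODEL_PASS;
	/* Corking was attempted and the helper reported an error. */
	if (cork_ret)
		return MODEL_DROP;
	return MODEL_PASS;
}
```

The cast mirrors the unsigned comparison in the BPF program, where the
length is computed from two __u32 values.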

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/uapi/linux/bpf_common.h           |    7 ++--
 samples/sockmap/sockmap_kern.c            |   53 +++++++++++++++++++++++++----
 samples/sockmap/sockmap_user.c            |   19 ++++++++++
 tools/include/uapi/linux/bpf.h            |    3 +-
 tools/testing/selftests/bpf/bpf_helpers.h |    2 +
 5 files changed, 71 insertions(+), 13 deletions(-)

diff --git a/include/uapi/linux/bpf_common.h b/include/uapi/linux/bpf_common.h
index ee97668..18be907 100644
--- a/include/uapi/linux/bpf_common.h
+++ b/include/uapi/linux/bpf_common.h
@@ -15,10 +15,9 @@
 
 /* ld/ldx fields */
 #define BPF_SIZE(code)  ((code) & 0x18)
-#define		BPF_W		0x00 /* 32-bit */
-#define		BPF_H		0x08 /* 16-bit */
-#define		BPF_B		0x10 /*  8-bit */
-/* eBPF		BPF_DW		0x18    64-bit */
+#define		BPF_W		0x00
+#define		BPF_H		0x08
+#define		BPF_B		0x10
 #define BPF_MODE(code)  ((code) & 0xe0)
 #define		BPF_IMM		0x00
 #define		BPF_ABS		0x20
diff --git a/samples/sockmap/sockmap_kern.c b/samples/sockmap/sockmap_kern.c
index 5a51f15..1c430926 100644
--- a/samples/sockmap/sockmap_kern.c
+++ b/samples/sockmap/sockmap_kern.c
@@ -64,6 +64,13 @@ struct bpf_map_def SEC("maps") sock_apply_bytes = {
 	.max_entries = 1
 };
 
+struct bpf_map_def SEC("maps") sock_cork_bytes = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(int),
+	.value_size = sizeof(int),
+	.max_entries = 1
+};
+
 SEC("sk_skb1")
 int bpf_prog1(struct __sk_buff *skb)
 {
@@ -135,6 +142,9 @@ int bpf_prog4(struct sk_msg_md *msg)
 	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
 	if (bytes)
 		bpf_msg_apply_bytes(msg, *bytes);
+	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
+	if (bytes)
+		bpf_msg_cork_bytes(msg, *bytes);
 	return SK_PASS;
 }
 
@@ -143,13 +153,16 @@ int bpf_prog5(struct sk_msg_md *msg)
 {
 	void *data_end = (void *)(long) msg->data_end;
 	void *data = (void *)(long) msg->data;
-	int *bytes, err = 0, zero = 0;
+	int *bytes, err1 = -1, err2 = -1, zero = 0;
 
 	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
 	if (bytes)
-		err = bpf_msg_apply_bytes(msg, *bytes);
-	bpf_printk("sk_msg2: data length %i err %i\n",
-		   (__u32)data_end - (__u32)data, err);
+		err1 = bpf_msg_apply_bytes(msg, *bytes);
+	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
+	if (bytes)
+		err2 = bpf_msg_cork_bytes(msg, *bytes);
+	bpf_printk("sk_msg2: data length %i err1 %i err2 %i\n",
+		   (__u32)data_end - (__u32)data, err1, err2);
 	return SK_PASS;
 }
 
@@ -163,6 +176,9 @@ int bpf_prog6(struct sk_msg_md *msg)
 	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
 	if (bytes)
 		bpf_msg_apply_bytes(msg, *bytes);
+	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
+	if (bytes)
+		bpf_msg_cork_bytes(msg, *bytes);
 	return bpf_msg_redirect_map(msg, &sock_map_redir, zero, 0);
 }
 
@@ -171,13 +187,17 @@ int bpf_prog7(struct sk_msg_md *msg)
 {
 	void *data_end = (void *)(long) msg->data_end;
 	void *data = (void *)(long) msg->data;
-	int *bytes, err = 0, zero = 0;
+	int *bytes, err1 = 0, err2 = 0, zero = 0;
 
 	bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
 	if (bytes)
-		err = bpf_msg_apply_bytes(msg, *bytes);
-	bpf_printk("sk_msg3: redirect(%iB) err=%i\n",
-		   (__u32)data_end - (__u32)data, err);
+		err1 = bpf_msg_apply_bytes(msg, *bytes);
+	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
+	if (bytes)
+		err2 = bpf_msg_cork_bytes(msg, *bytes);
+
+	bpf_printk("sk_msg3: redirect(%iB) err1=%i err2=%i\n",
+		   (__u32)data_end - (__u32)data, err1, err2);
 	return bpf_msg_redirect_map(msg, &sock_map_redir, zero, 0);
 }
 
@@ -198,5 +218,22 @@ int bpf_prog8(struct sk_msg_md *msg)
 	}
 	return SK_PASS;
 }
+SEC("sk_msg6")
+int bpf_prog9(struct sk_msg_md *msg)
+{
+	void *data_end = (void *)(long) msg->data_end;
+	void *data = (void *)(long) msg->data;
+	int ret = 0, *bytes, zero = 0;
+
+	bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
+	if (bytes) {
+		if (((__u32)data_end - (__u32)data) >= *bytes)
+			return SK_PASS;
+		ret = bpf_msg_cork_bytes(msg, *bytes);
+		if (ret)
+			return SK_DROP;
+	}
+	return SK_PASS;
+}
 
 char _license[] SEC("license") = "GPL";
diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index 41774ec..4e0a3d8 100644
--- a/samples/sockmap/sockmap_user.c
+++ b/samples/sockmap/sockmap_user.c
@@ -60,6 +60,7 @@
 int txmsg_redir;
 int txmsg_redir_noisy;
 int txmsg_apply;
+int txmsg_cork;
 
 static const struct option long_options[] = {
 	{"help",	no_argument,		NULL, 'h' },
@@ -75,6 +76,7 @@
 	{"txmsg_redir",		no_argument,	&txmsg_redir, 1  },
 	{"txmsg_redir_noisy",	no_argument,	&txmsg_redir_noisy, 1},
 	{"txmsg_apply",	required_argument,	NULL, 'a'},
+	{"txmsg_cork",	required_argument,	NULL, 'k'},
 	{0, 0, NULL, 0 }
 };
 
@@ -551,6 +553,9 @@ int main(int argc, char **argv)
 		case 'a':
 			txmsg_apply = atoi(optarg);
 			break;
+		case 'k':
+			txmsg_cork = atoi(optarg);
+			break;
 		case 'c':
 			cg_fd = open(optarg, O_DIRECTORY, O_RDONLY);
 			if (cg_fd < 0) {
@@ -671,6 +676,8 @@ int main(int argc, char **argv)
 		tx_prog_fd = prog_fd[6];
 	else if (txmsg_apply)
 		tx_prog_fd = prog_fd[7];
+	else if (txmsg_cork)
+		tx_prog_fd = prog_fd[8];
 	else
 		tx_prog_fd = 0;
 
@@ -716,6 +723,18 @@ int main(int argc, char **argv)
 				return err;
 			}
 		}
+
+		if (txmsg_cork) {
+			err = bpf_map_update_elem(map_fd[4],
+						  &i, &txmsg_cork, BPF_ANY);
+			if (err) {
+				fprintf(stderr,
+					"ERROR: bpf_map_update_elem (cork_bytes):  %d (%s)\n",
+					err, strerror(errno));
+				return err;
+			}
+		}
+
 	}
 	if (test == PING_PONG)
 		err = forever_ping_pong(rate, &options);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 609456f..ce07a13 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -770,7 +770,8 @@ enum bpf_attach_type {
 	FN(override_return),		\
 	FN(sock_ops_cb_flags_set),	\
 	FN(msg_redirect_map),		\
-	FN(msg_apply_bytes),
+	FN(msg_apply_bytes),		\
+	FN(msg_cork_bytes),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 4713de4..b5b45ff 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -90,6 +90,8 @@ static int (*bpf_msg_redirect_map)(void *ctx, void *map, int key, int flags) =
 	(void *) BPF_FUNC_msg_redirect_map;
 static int (*bpf_msg_apply_bytes)(void *ctx, int len) =
 	(void *) BPF_FUNC_msg_apply_bytes;
+static int (*bpf_msg_cork_bytes)(void *ctx, int len) =
+	(void *) BPF_FUNC_msg_cork_bytes;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
