* Re: [PATCH net-next v2 07/12] net: dsa: Add ability to program multicast filter for CPU port
From: Vivien Didelot @ 2019-01-30 22:28 UTC
To: Florian Fainelli
Cc: netdev, Florian Fainelli, andrew, davem, idosch, jiri,
ilias.apalodimas, ivan.khoronzhuk, roopa, nikolay
In-Reply-To: <20190130005548.2212-8-f.fainelli@gmail.com>
Hi Florian,
On Tue, 29 Jan 2019 16:55:43 -0800, Florian Fainelli <f.fainelli@gmail.com> wrote:
> +static int dsa_slave_sync_unsync_mdb_addr(struct net_device *dev,
> + const unsigned char *addr, bool add)
> +{
> + struct switchdev_obj_port_mdb mdb = {
> + .obj = {
> + .id = SWITCHDEV_OBJ_ID_HOST_MDB,
> + .flags = SWITCHDEV_F_DEFER,
> + },
> + .vid = 0,
> + };
> + int ret = -EOPNOTSUPP;
Assignment unneeded here.
> +
> + ether_addr_copy(mdb.addr, addr);
> + if (add)
> + ret = switchdev_port_obj_add(dev, &mdb.obj, NULL);
> + else
> + ret = switchdev_port_obj_del(dev, &mdb.obj);
> +
> + return ret;
> +}
> +
> +static int dsa_slave_sync_mdb_addr(struct net_device *dev,
> + const unsigned char *addr)
> +{
> + return dsa_slave_sync_unsync_mdb_addr(dev, addr, true);
> +}
> +
> +static int dsa_slave_unsync_mdb_addr(struct net_device *dev,
> + const unsigned char *addr)
> +{
> + return dsa_slave_sync_unsync_mdb_addr(dev, addr, false);
> +}
This wrapper isn't necessary IMO. I'd go with something like:
static int dsa_slave_sync(struct net_device *dev, const unsigned char *addr)
{
struct switchdev_obj_port_mdb mdb = {
.obj.id = SWITCHDEV_OBJ_ID_HOST_MDB,
.obj.flags = SWITCHDEV_F_DEFER,
};
ether_addr_copy(mdb.addr, addr);
return switchdev_port_obj_add(dev, &mdb.obj, NULL);
}
static int dsa_slave_unsync(struct net_device *dev, const unsigned char *addr)
{
struct switchdev_obj_port_mdb mdb = {
.obj.id = SWITCHDEV_OBJ_ID_HOST_MDB,
.obj.flags = SWITCHDEV_F_DEFER,
};
ether_addr_copy(mdb.addr, addr);
return switchdev_port_obj_del(dev, &mdb.obj);
}
We may eventually wrap this cryptic netdevery in:
static int dsa_slave_mc_sync(struct net_device *dev)
{
return __hw_addr_sync_dev(&dev->mc, dev, dsa_slave_sync, dsa_slave_unsync);
}
static void dsa_slave_mc_unsync(struct net_device *dev)
{
__hw_addr_unsync_dev(&dev->mc, dev, dsa_slave_unsync);
}
> +
> static int dsa_slave_open(struct net_device *dev)
> {
> struct net_device *master = dsa_slave_to_master(dev);
> @@ -126,6 +159,8 @@ static int dsa_slave_close(struct net_device *dev)
>
> dev_mc_unsync(master, dev);
> dev_uc_unsync(master, dev);
> + __hw_addr_unsync_dev(&dev->mc, dev, dsa_slave_unsync_mdb_addr);
> +
> if (dev->flags & IFF_ALLMULTI)
> dev_set_allmulti(master, -1);
> if (dev->flags & IFF_PROMISC)
> @@ -150,7 +185,17 @@ static void dsa_slave_change_rx_flags(struct net_device *dev, int change)
> static void dsa_slave_set_rx_mode(struct net_device *dev)
> {
> struct net_device *master = dsa_slave_to_master(dev);
> + struct dsa_port *dp = dsa_slave_to_port(dev);
>
> + /* If the port is bridged, the bridge takes care of sending
> + * SWITCHDEV_OBJ_ID_HOST_MDB to program the host's MC filter
> + */
> + if (netdev_mc_empty(dev) || dp->bridge_dev)
> + goto out;
> +
> + __hw_addr_sync_dev(&dev->mc, dev, dsa_slave_sync_mdb_addr,
> + dsa_slave_unsync_mdb_addr);
And check the returned error code.
> +out:
> dev_mc_sync(master, dev);
> dev_uc_sync(master, dev);
> }
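To make the "check the returned error code" remark concrete, here is a minimal userspace sketch. Everything in it is a hypothetical mock (mock_sync_addrs, mock_set_rx_mode, accept_all and reject_ipv4_mcast are illustration names, not kernel APIs); it only shows the shape of propagating a per-address callback failure to a caller that can at least log it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for a per-address sync callback such as
 * dsa_slave_sync(): returns 0 on success, a negative errno on failure.
 */
typedef int (*addr_cb)(const unsigned char *addr);

/* Mock address-list walker: stops at the first failing callback and
 * hands that error back to the caller instead of swallowing it.
 */
static int mock_sync_addrs(const unsigned char (*addrs)[ETH_ALEN], int count,
			   addr_cb sync)
{
	for (int i = 0; i < count; i++) {
		int err = sync(addrs[i]);

		if (err)
			return err;
	}
	return 0;
}

/* Callback that accepts every address. */
static int accept_all(const unsigned char *addr)
{
	(void)addr;
	return 0;
}

/* Callback that refuses IPv4 multicast MACs (01:00:5e:xx:xx:xx). */
static int reject_ipv4_mcast(const unsigned char *addr)
{
	static const unsigned char prefix[3] = { 0x01, 0x00, 0x5e };

	return memcmp(addr, prefix, sizeof(prefix)) == 0 ? -EOPNOTSUPP : 0;
}

/* Caller in the style of an rx-mode handler: it checks the result and
 * reports it rather than ignoring it. It returns the error here only so
 * the behaviour is easy to observe.
 */
static int mock_set_rx_mode(const unsigned char (*addrs)[ETH_ALEN], int count,
			    addr_cb sync)
{
	int err = mock_sync_addrs(addrs, count, sync);

	if (err)
		fprintf(stderr, "failed to sync multicast addresses: %d\n", err);
	return err;
}
```

In the real handler the function returns void, so the check would most likely end in a log message rather than a propagated return value; that part is an assumption.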
Thanks,
Vivien
* Re: [PATCH net] ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation
From: David Miller @ 2019-01-30 22:06 UTC
To: yohei.kanemaru; +Cc: netdev, dlebrun
In-Reply-To: <20190129065234.68121-1-yohei.kanemaru@gmail.com>
From: Yohei Kanemaru <yohei.kanemaru@gmail.com>
Date: Tue, 29 Jan 2019 15:52:34 +0900
> skb->cb may contain data from previous layers (in an observed case
> IPv4 with L3 Master Device). In the observed scenario, the data in
> IPCB(skb)->frags was misinterpreted as IP6CB(skb)->frag_max_size,
> eventually caused an unexpected IPv6 fragmentation in ip6_fragment()
> through ip6_finish_output().
>
> This patch clears IP6CB(skb), which potentially contains garbage data,
> on the SRH ip4ip6 encapsulation.
>
> Fixes: 32d99d0b6702 ("ipv6: sr: add support for ip4ip6 encapsulation")
> Signed-off-by: Yohei Kanemaru <yohei.kanemaru@gmail.com>
Applied, thanks.
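For readers unfamiliar with the failure mode described in the commit message, the following userspace sketch shows how a shared control buffer can be misread by a later layer, and how clearing it avoids that. The struct layouts and helper names (mock_skb, mock_v4_cb, mock_v6_cb, v4_layer_write, v6_layer_read_frag_max_size, encap_clear_cb) are hypothetical simplifications, not the kernel's real inet_skb_parm or inet6_skb_parm definitions.

```c
#include <stdint.h>
#include <string.h>

#define MOCK_CB_SIZE 48

/* Mock packet buffer with a scratch area shared by all protocol layers. */
struct mock_skb {
	unsigned char cb[MOCK_CB_SIZE];
};

/* Hypothetical per-layer views of the same scratch area. */
struct mock_v4_cb {
	uint16_t flags;
};

struct mock_v6_cb {
	uint16_t frag_max_size;
};

/* The IPv4-side layer stores its private state in cb. */
static void v4_layer_write(struct mock_skb *skb, uint16_t flags)
{
	struct mock_v4_cb cb = { .flags = flags };

	memcpy(skb->cb, &cb, sizeof(cb));
}

/* The IPv6-side layer later reads the same bytes through its own view. */
static uint16_t v6_layer_read_frag_max_size(const struct mock_skb *skb)
{
	struct mock_v6_cb cb;

	memcpy(&cb, skb->cb, sizeof(cb));
	return cb.frag_max_size;
}

/* The fix in spirit: zero the IPv6 view when the packet changes layers. */
static void encap_clear_cb(struct mock_skb *skb)
{
	memset(skb->cb, 0, sizeof(struct mock_v6_cb));
}
```

Without the clear, whatever the previous layer left behind is read back as a fragment size limit; after the clear, the field reads as zero.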
* [PATCH net-next v2 3/5] net: tls: Refactor control message handling on recv
From: Dave Watson @ 2019-01-30 21:58 UTC
To: netdev@vger.kernel.org, Dave Miller
Cc: Vakul Garg, Boris Pismenny, Aviad Yehezkel, John Fastabend,
Daniel Borkmann
For TLS 1.3, the control message is encrypted. Handle control
message checks after decryption.
Signed-off-by: Dave Watson <davejwatson@fb.com>
---
net/tls/tls_sw.c | 88 ++++++++++++++++++++++++------------------------
1 file changed, 44 insertions(+), 44 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 7b6386f4c685..34f3523f668e 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1421,16 +1421,15 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb,
return err;
}
+ rxm->offset += tls_ctx->rx.prepend_size;
+ rxm->full_len -= tls_ctx->rx.overhead_size;
+ tls_advance_record_sn(sk, &tls_ctx->rx);
+ ctx->decrypted = true;
+ ctx->saved_data_ready(sk);
} else {
*zc = false;
}
- rxm->offset += tls_ctx->rx.prepend_size;
- rxm->full_len -= tls_ctx->rx.overhead_size;
- tls_advance_record_sn(sk, &tls_ctx->rx);
- ctx->decrypted = true;
- ctx->saved_data_ready(sk);
-
return err;
}
@@ -1609,6 +1608,25 @@ int tls_sw_recvmsg(struct sock *sk,
rxm = strp_msg(skb);
+ to_decrypt = rxm->full_len - tls_ctx->rx.overhead_size;
+
+ if (to_decrypt <= len && !is_kvec && !is_peek &&
+ ctx->control == TLS_RECORD_TYPE_DATA)
+ zc = true;
+
+ err = decrypt_skb_update(sk, skb, &msg->msg_iter,
+ &chunk, &zc, ctx->async_capable);
+ if (err < 0 && err != -EINPROGRESS) {
+ tls_err_abort(sk, EBADMSG);
+ goto recv_end;
+ }
+
+ if (err == -EINPROGRESS) {
+ async = true;
+ num_async++;
+ goto pick_next_record;
+ }
+
if (!cmsg) {
int cerr;
@@ -1626,40 +1644,22 @@ int tls_sw_recvmsg(struct sock *sk,
goto recv_end;
}
- to_decrypt = rxm->full_len - tls_ctx->rx.overhead_size;
-
- if (to_decrypt <= len && !is_kvec && !is_peek)
- zc = true;
-
- err = decrypt_skb_update(sk, skb, &msg->msg_iter,
- &chunk, &zc, ctx->async_capable);
- if (err < 0 && err != -EINPROGRESS) {
- tls_err_abort(sk, EBADMSG);
- goto recv_end;
- }
-
- if (err == -EINPROGRESS) {
- async = true;
- num_async++;
- goto pick_next_record;
- } else {
- if (!zc) {
- if (rxm->full_len > len) {
- retain_skb = true;
- chunk = len;
- } else {
- chunk = rxm->full_len;
- }
+ if (!zc) {
+ if (rxm->full_len > len) {
+ retain_skb = true;
+ chunk = len;
+ } else {
+ chunk = rxm->full_len;
+ }
- err = skb_copy_datagram_msg(skb, rxm->offset,
- msg, chunk);
- if (err < 0)
- goto recv_end;
+ err = skb_copy_datagram_msg(skb, rxm->offset,
+ msg, chunk);
+ if (err < 0)
+ goto recv_end;
- if (!is_peek) {
- rxm->offset = rxm->offset + chunk;
- rxm->full_len = rxm->full_len - chunk;
- }
+ if (!is_peek) {
+ rxm->offset = rxm->offset + chunk;
+ rxm->full_len = rxm->full_len - chunk;
}
}
@@ -1759,15 +1759,15 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
if (!skb)
goto splice_read_end;
- /* splice does not support reading control messages */
- if (ctx->control != TLS_RECORD_TYPE_DATA) {
- err = -ENOTSUPP;
- goto splice_read_end;
- }
-
if (!ctx->decrypted) {
err = decrypt_skb_update(sk, skb, NULL, &chunk, &zc, false);
+ /* splice does not support reading control messages */
+ if (ctx->control != TLS_RECORD_TYPE_DATA) {
+ err = -ENOTSUPP;
+ goto splice_read_end;
+ }
+
if (err < 0) {
tls_err_abort(sk, EBADMSG);
goto splice_read_end;
--
2.17.1
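As background for why the control message checks have to move after decryption: in TLS 1.3 (RFC 8446, section 5.4) the real content type is the last non-zero byte of the decrypted record, followed by optional zero padding. The sketch below is a standalone illustration of that rule; tls13_inner_type is a hypothetical helper name and is not taken from this patch series.

```c
#include <stddef.h>
#include <stdint.h>

/* Given a decrypted TLS 1.3 inner plaintext (content || type || zeros),
 * return the content type and store the content length in *content_len.
 * Returns -1 if the record holds only zero bytes, which a receiver must
 * treat as an error.
 */
static int tls13_inner_type(const uint8_t *plain, size_t len,
			    size_t *content_len)
{
	/* Skip the zero padding at the end of the record. */
	while (len > 0 && plain[len - 1] == 0)
		len--;

	if (len == 0)
		return -1;

	/* The last non-zero byte is the real content type. */
	*content_len = len - 1;
	return plain[len - 1];
}
```

Since the type byte sits inside the ciphertext, a receiver cannot tell application data from an alert or handshake record until decryption has finished.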
* [PATCH net-next v2 1/5] net: tls: Support 256 bit keys
From: Dave Watson @ 2019-01-30 21:58 UTC
To: netdev@vger.kernel.org, Dave Miller
Cc: Vakul Garg, Boris Pismenny, Aviad Yehezkel, John Fastabend,
Daniel Borkmann
Wire up support for 256-bit keys from setsockopt to the crypto
framework.
Signed-off-by: Dave Watson <davejwatson@fb.com>
---
include/net/tls.h | 5 ++-
include/uapi/linux/tls.h | 15 ++++++++
net/tls/tls_main.c | 33 +++++++++++++++-
net/tls/tls_sw.c | 29 +++++++++++++--
tools/testing/selftests/net/tls.c | 62 +++++++++++++++++++++++++++++++
5 files changed, 137 insertions(+), 7 deletions(-)
diff --git a/include/net/tls.h b/include/net/tls.h
index 4592606e136a..da616db48413 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -206,7 +206,10 @@ struct cipher_context {
union tls_crypto_context {
struct tls_crypto_info info;
- struct tls12_crypto_info_aes_gcm_128 aes_gcm_128;
+ union {
+ struct tls12_crypto_info_aes_gcm_128 aes_gcm_128;
+ struct tls12_crypto_info_aes_gcm_256 aes_gcm_256;
+ };
};
struct tls_context {
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index ff02287495ac..9affceaa3db4 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -59,6 +59,13 @@
#define TLS_CIPHER_AES_GCM_128_TAG_SIZE 16
#define TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE 8
+#define TLS_CIPHER_AES_GCM_256 52
+#define TLS_CIPHER_AES_GCM_256_IV_SIZE 8
+#define TLS_CIPHER_AES_GCM_256_KEY_SIZE 32
+#define TLS_CIPHER_AES_GCM_256_SALT_SIZE 4
+#define TLS_CIPHER_AES_GCM_256_TAG_SIZE 16
+#define TLS_CIPHER_AES_GCM_256_REC_SEQ_SIZE 8
+
#define TLS_SET_RECORD_TYPE 1
#define TLS_GET_RECORD_TYPE 2
@@ -75,4 +82,12 @@ struct tls12_crypto_info_aes_gcm_128 {
unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
};
+struct tls12_crypto_info_aes_gcm_256 {
+ struct tls_crypto_info info;
+ unsigned char iv[TLS_CIPHER_AES_GCM_256_IV_SIZE];
+ unsigned char key[TLS_CIPHER_AES_GCM_256_KEY_SIZE];
+ unsigned char salt[TLS_CIPHER_AES_GCM_256_SALT_SIZE];
+ unsigned char rec_seq[TLS_CIPHER_AES_GCM_256_REC_SEQ_SIZE];
+};
+
#endif /* _UAPI_LINUX_TLS_H */
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index d36d095cbcf0..0f028cfdf835 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -372,6 +372,30 @@ static int do_tls_getsockopt_tx(struct sock *sk, char __user *optval,
rc = -EFAULT;
break;
}
+ case TLS_CIPHER_AES_GCM_256: {
+ struct tls12_crypto_info_aes_gcm_256 *
+ crypto_info_aes_gcm_256 =
+ container_of(crypto_info,
+ struct tls12_crypto_info_aes_gcm_256,
+ info);
+
+ if (len != sizeof(*crypto_info_aes_gcm_256)) {
+ rc = -EINVAL;
+ goto out;
+ }
+ lock_sock(sk);
+ memcpy(crypto_info_aes_gcm_256->iv,
+ ctx->tx.iv + TLS_CIPHER_AES_GCM_256_SALT_SIZE,
+ TLS_CIPHER_AES_GCM_256_IV_SIZE);
+ memcpy(crypto_info_aes_gcm_256->rec_seq, ctx->tx.rec_seq,
+ TLS_CIPHER_AES_GCM_256_REC_SEQ_SIZE);
+ release_sock(sk);
+ if (copy_to_user(optval,
+ crypto_info_aes_gcm_256,
+ sizeof(*crypto_info_aes_gcm_256)))
+ rc = -EFAULT;
+ break;
+ }
default:
rc = -EINVAL;
}
@@ -412,6 +436,7 @@ static int do_tls_setsockopt_conf(struct sock *sk, char __user *optval,
{
struct tls_crypto_info *crypto_info;
struct tls_context *ctx = tls_get_ctx(sk);
+ size_t optsize;
int rc = 0;
int conf;
@@ -444,8 +469,12 @@ static int do_tls_setsockopt_conf(struct sock *sk, char __user *optval,
}
switch (crypto_info->cipher_type) {
- case TLS_CIPHER_AES_GCM_128: {
- if (optlen != sizeof(struct tls12_crypto_info_aes_gcm_128)) {
+ case TLS_CIPHER_AES_GCM_128:
+ case TLS_CIPHER_AES_GCM_256: {
+ optsize = crypto_info->cipher_type == TLS_CIPHER_AES_GCM_128 ?
+ sizeof(struct tls12_crypto_info_aes_gcm_128) :
+ sizeof(struct tls12_crypto_info_aes_gcm_256);
+ if (optlen != optsize) {
rc = -EINVAL;
goto err_crypto_info;
}
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 3f2a6af27e62..9326c06c2ffe 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1999,6 +1999,7 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
{
struct tls_crypto_info *crypto_info;
struct tls12_crypto_info_aes_gcm_128 *gcm_128_info;
+ struct tls12_crypto_info_aes_gcm_256 *gcm_256_info;
struct tls_sw_context_tx *sw_ctx_tx = NULL;
struct tls_sw_context_rx *sw_ctx_rx = NULL;
struct cipher_context *cctx;
@@ -2006,7 +2007,8 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
struct strp_callbacks cb;
u16 nonce_size, tag_size, iv_size, rec_seq_size;
struct crypto_tfm *tfm;
- char *iv, *rec_seq;
+ char *iv, *rec_seq, *key, *salt;
+ size_t keysize;
int rc = 0;
if (!ctx) {
@@ -2067,6 +2069,24 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
((struct tls12_crypto_info_aes_gcm_128 *)crypto_info)->rec_seq;
gcm_128_info =
(struct tls12_crypto_info_aes_gcm_128 *)crypto_info;
+ keysize = TLS_CIPHER_AES_GCM_128_KEY_SIZE;
+ key = gcm_128_info->key;
+ salt = gcm_128_info->salt;
+ break;
+ }
+ case TLS_CIPHER_AES_GCM_256: {
+ nonce_size = TLS_CIPHER_AES_GCM_256_IV_SIZE;
+ tag_size = TLS_CIPHER_AES_GCM_256_TAG_SIZE;
+ iv_size = TLS_CIPHER_AES_GCM_256_IV_SIZE;
+ iv = ((struct tls12_crypto_info_aes_gcm_256 *)crypto_info)->iv;
+ rec_seq_size = TLS_CIPHER_AES_GCM_256_REC_SEQ_SIZE;
+ rec_seq =
+ ((struct tls12_crypto_info_aes_gcm_256 *)crypto_info)->rec_seq;
+ gcm_256_info =
+ (struct tls12_crypto_info_aes_gcm_256 *)crypto_info;
+ keysize = TLS_CIPHER_AES_GCM_256_KEY_SIZE;
+ key = gcm_256_info->key;
+ salt = gcm_256_info->salt;
break;
}
default:
@@ -2090,7 +2110,8 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
rc = -ENOMEM;
goto free_priv;
}
- memcpy(cctx->iv, gcm_128_info->salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
+ /* Note: 128 & 256 bit salt are the same size */
+ memcpy(cctx->iv, salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
memcpy(cctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, iv, iv_size);
cctx->rec_seq_size = rec_seq_size;
cctx->rec_seq = kmemdup(rec_seq, rec_seq_size, GFP_KERNEL);
@@ -2110,8 +2131,8 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
ctx->push_pending_record = tls_sw_push_pending_record;
- rc = crypto_aead_setkey(*aead, gcm_128_info->key,
- TLS_CIPHER_AES_GCM_128_KEY_SIZE);
+ rc = crypto_aead_setkey(*aead, key, keysize);
+
if (rc)
goto free_aead;
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index ff68ed19c0ef..c356f481de79 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -763,4 +763,66 @@ TEST_F(tls, control_msg)
EXPECT_EQ(memcmp(buf, test_str, send_len), 0);
}
+TEST(keysizes) {
+ struct tls12_crypto_info_aes_gcm_256 tls12;
+ struct sockaddr_in addr;
+ int sfd, ret, fd, cfd;
+ socklen_t len;
+ bool notls;
+
+ notls = false;
+ len = sizeof(addr);
+
+ memset(&tls12, 0, sizeof(tls12));
+ tls12.info.version = TLS_1_2_VERSION;
+ tls12.info.cipher_type = TLS_CIPHER_AES_GCM_256;
+
+ addr.sin_family = AF_INET;
+ addr.sin_addr.s_addr = htonl(INADDR_ANY);
+ addr.sin_port = 0;
+
+ fd = socket(AF_INET, SOCK_STREAM, 0);
+ sfd = socket(AF_INET, SOCK_STREAM, 0);
+
+ ret = bind(sfd, &addr, sizeof(addr));
+ ASSERT_EQ(ret, 0);
+ ret = listen(sfd, 10);
+ ASSERT_EQ(ret, 0);
+
+ ret = getsockname(sfd, &addr, &len);
+ ASSERT_EQ(ret, 0);
+
+ ret = connect(fd, &addr, sizeof(addr));
+ ASSERT_EQ(ret, 0);
+
+ ret = setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls"));
+ if (ret != 0) {
+ notls = true;
+ printf("Failure setting TCP_ULP, testing without tls\n");
+ }
+
+ if (!notls) {
+ ret = setsockopt(fd, SOL_TLS, TLS_TX, &tls12,
+ sizeof(tls12));
+ EXPECT_EQ(ret, 0);
+ }
+
+ cfd = accept(sfd, &addr, &len);
+ ASSERT_GE(cfd, 0);
+
+ if (!notls) {
+ ret = setsockopt(cfd, IPPROTO_TCP, TCP_ULP, "tls",
+ sizeof("tls"));
+ EXPECT_EQ(ret, 0);
+
+ ret = setsockopt(cfd, SOL_TLS, TLS_RX, &tls12,
+ sizeof(tls12));
+ EXPECT_EQ(ret, 0);
+ }
+
+ close(sfd);
+ close(fd);
+ close(cfd);
+}
+
TEST_HARNESS_MAIN
--
2.17.1
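The option-length check in do_tls_setsockopt_conf() can be tried outside the kernel with a small mock. The sketch below mirrors the field sizes from the uapi hunk above, but the struct and function names (mock_crypto_info, mock_gcm_128, mock_gcm_256, mock_check_optlen) and the MOCK_ constants are hypothetical stand-ins rather than the real definitions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cipher identifiers for this sketch only. */
#define MOCK_CIPHER_AES_GCM_128 51
#define MOCK_CIPHER_AES_GCM_256 52

/* Mock of the common header shared by both cipher structs. */
struct mock_crypto_info {
	uint16_t version;
	uint16_t cipher_type;
};

/* Same fields for both ciphers; only the key length differs. */
struct mock_gcm_128 {
	struct mock_crypto_info info;
	unsigned char iv[8];
	unsigned char key[16];
	unsigned char salt[4];
	unsigned char rec_seq[8];
};

struct mock_gcm_256 {
	struct mock_crypto_info info;
	unsigned char iv[8];
	unsigned char key[32];
	unsigned char salt[4];
	unsigned char rec_seq[8];
};

/* Accept the option only if its length matches the struct that belongs
 * to the requested cipher; unknown ciphers are rejected outright.
 */
static int mock_check_optlen(uint16_t cipher_type, size_t optlen)
{
	size_t optsize;

	switch (cipher_type) {
	case MOCK_CIPHER_AES_GCM_128:
		optsize = sizeof(struct mock_gcm_128);
		break;
	case MOCK_CIPHER_AES_GCM_256:
		optsize = sizeof(struct mock_gcm_256);
		break;
	default:
		return -EINVAL;
	}

	return optlen == optsize ? 0 : -EINVAL;
}
```

A caller passing a 128-bit struct while asking for the 256-bit cipher is rejected, which is the mismatch the length check is there to catch.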
* Re: [PATCH net-next v2 6/7] nfp: devlink: report the running and flashed versions
From: Jakub Kicinski @ 2019-01-30 22:21 UTC
To: Jiri Pirko
Cc: davem, netdev, oss-drivers, andrew, f.fainelli, mkubecek, eugenem,
jonathan.lemon
In-Reply-To: <20190130215752.GD349@nanopsycho.orion>
On Wed, 30 Jan 2019 22:57:52 +0100, Jiri Pirko wrote:
> >+/* Control processor FW version, FW is responsible for house keeping tasks,
> >+ * PHY control etc.
> >+ */
> >+#define DEVLINK_VERSION_GENERIC_FW_MGMT "fw.mgmt"
> >+/* Data path microcode controlling high-speed packet processing */
> >+#define DEVLINK_VERSION_GENERIC_FW_APP "fw.app"
> >+/* UNDI software version */
> >+#define DEVLINK_VERSION_GENERIC_FW_UNDI "fw.undi"
> >+/* NCSI support/handler version */
> >+#define DEVLINK_VERSION_GENERIC_FW_NCSI "fw.ncsi"
>
> Same here. Also, please put "INFO" in the names to respect the namespacing
Ack on all, and thanks for the reviews! Do you also think I should add
a doc with them? I was going back and forth on that..
* Re: [PATCH net] ipvlan, l3mdev: fix broken l3s mode wrt local routes
From: David Ahern @ 2019-01-30 22:24 UTC
To: Daniel Borkmann, davem
Cc: netdev, Mahesh Bandewar, Florian Westphal, Martynas Pumputis
In-Reply-To: <20190130114948.24227-1-daniel@iogearbox.net>
On 1/30/19 4:49 AM, Daniel Borkmann wrote:
> While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin,
> I ran into the issue that while l3 mode is working fine, l3s mode
> does not have any connectivity to kube-apiserver and hence all pods
> end up in Error state as well. The ipvlan master device sits on
> top of a bond device and hostns traffic to kube-apiserver (also running
> in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573
> where the latter is the address of the bond0. While in l3 mode, a
> curl to https://10.152.183.1:443 or to https://139.178.29.207:37573
> works fine from hostns, neither of them do in case of l3s. In the
> latter only a curl to https://127.0.0.1:37573 appeared to work where
> for local addresses of bond0 I saw kernel suddenly starting to emit
> ARP requests to query HW address of bond0 which remained unanswered
> and neighbor entries in INCOMPLETE state. These ARP requests only
> happen while in l3s.
>
> Debugging this further, I found the issue is that l3s mode is piggy-
> backing on l3 master device, and in this case local routes are using
> l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit
> f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev
> if relevant") and 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be
> a loopback"). I found that reverting them back into using the
> net->loopback_dev fixed ipvlan l3s connectivity and got everything
> working for the CNI.
>
> Now judging from 4fbae7d83c98 ("ipvlan: Introduce l3s mode") and the
> l3mdev paper in [0] the only sole reason why ipvlan l3s is relying
> on l3 master device is to get the l3mdev_ip_rcv() receive hook for
> setting the dst entry of the input route without adding its own
> ipvlan specific hacks into the receive path, however, any l3 domain
> semantics beyond just that are breaking l3s operation. Note that
> ipvlan also has the ability to dynamically switch its internal
> operation from l3 to l3s for all ports via ipvlan_set_port_mode()
> at runtime. In any case, l3 vs l3s solely distinguishes itself by
> 'de-confusing' netfilter through switching skb->dev to ipvlan slave
> device late in NF_INET_LOCAL_IN before handing the skb to L4.
>
> Minimal fix taken here is to add a IFF_L3MDEV_RX_HANDLER flag which,
> if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook
> without any additional l3mdev semantics on top. This should also have
> minimal impact since dev->priv_flags is already hot in cache. With
> this set, l3s mode is working fine and I also get things like
> masquerading pod traffic on the ipvlan master properly working.
>
> [0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf
>
> Fixes: f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant")
> Fixes: 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be a loopback")
> Fixes: 4fbae7d83c98 ("ipvlan: Introduce l3s mode")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Mahesh Bandewar <maheshb@google.com>
> Cc: David Ahern <dsa@cumulusnetworks.com>
> Cc: Florian Westphal <fw@strlen.de>
> Cc: Martynas Pumputis <m@lambda.lt>
> ---
> drivers/net/ipvlan/ipvlan_main.c | 6 +++---
> include/linux/netdevice.h | 8 ++++++++
> include/net/l3mdev.h | 3 ++-
> 3 files changed, 13 insertions(+), 4 deletions(-)
>
I am not surprised that ipvlan needs a finer grained selection of the
l3mdev hooks.
Acked-by: David Ahern <dsa@cumulusnetworks.com>
* BUG: KASAN: double-free or invalid-free in ip_defrag after upgrade from 4.19.13
From: Ivan Babrou @ 2019-01-30 22:26 UTC
To: Linux Kernel Network Developers
Cc: mkubecek, David S. Miller, Eric Dumazet, Ignat Korchagin,
Shawn Bohrer, Jakub Sitnicki
Hey,
Continuing from this thread earlier today:
* https://marc.info/?t=154886729100001&r=1&w=2
We fired up a KASAN-enabled kernel on one of those machines and this is
what we saw:
$ /tmp/decode_stacktrace.sh
/usr/lib/debug/lib/modules/4.19.18-cloudflare-2019.1.8-1-gcabf55c/vmlinux
linux-4.19.18 < kasan.txt
[ 2300.250278] ==================================================================
[ 2300.266575] BUG: KASAN: double-free or invalid-free in ip_defrag
(net/ipv4/ip_fragment.c:507 net/ipv4/ip_fragment.c:699)
[ 2300.282860]
[ 2300.293415] CPU: 28 PID: 0 Comm: swapper/28 Tainted: G B O
4.19.18-cloudflare-2019.1.8-1-gcabf55c #gcabf55c
[ 2300.313767] Hardware name: Quanta Computer Inc. QuantaPlex
T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
[ 2300.332707] Call Trace:
[ 2300.344701] <IRQ>
[ 2300.356188] dump_stack (lib/dump_stack.c:115)
[ 2300.368967] print_address_description (mm/kasan/report.c:257)
[ 2300.383192] ? ip_defrag (net/ipv4/ip_fragment.c:507
net/ipv4/ip_fragment.c:699)
[ 2300.396330] kasan_report_invalid_free (mm/kasan/report.c:337)
[ 2300.410448] ? ip_defrag (net/ipv4/ip_fragment.c:507
net/ipv4/ip_fragment.c:699)
[ 2300.423599] __kasan_slab_free (mm/kasan/kasan.c:502)
[ 2300.437165] ? ip_defrag (net/ipv4/ip_fragment.c:507
net/ipv4/ip_fragment.c:699)
[ 2300.450251] kmem_cache_free (mm/slub.c:1398 mm/slub.c:2953 mm/slub.c:2969)
[ 2300.463497] ip_defrag (net/ipv4/ip_fragment.c:507 net/ipv4/ip_fragment.c:699)
[ 2300.476352] ? ip4_obj_hashfn (net/ipv4/ip_fragment.c:684)
[ 2300.489711] ? ip_route_input_rcu (net/ipv4/route.c:2122)
[ 2300.503416] ip_local_deliver (net/ipv4/ip_input.c:252)
[ 2300.516739] ? ip_call_ra_chain (net/ipv4/ip_input.c:245)
[ 2300.530174] ? ip_rcv_finish_core.isra.19 (net/ipv4/ip_input.c:366)
[ 2300.544535] ? ip_local_deliver (net/ipv4/ip_input.c:518)
[ 2300.557862] ip_rcv (net/ipv4/ip_input.c:518)
[ 2300.569972] ? ip_local_deliver (net/ipv4/ip_input.c:518)
[ 2300.583216] ? ip_rcv_core.isra.20 (net/ipv4/ip_input.c:403)
[ 2300.596683] __netif_receive_skb_one_core (net/core/dev.c:4911)
[ 2300.610732] ? __netif_receive_skb_core (net/core/dev.c:4911)
[ 2300.624666] ? eth_gro_receive (net/ethernet/eth.c:157)
[ 2300.637374] ? recalibrate_cpu_khz (arch/x86/kernel/tsc.c:1066
arch/x86/kernel/tsc.c:1066)
[ 2300.650015] ? ktime_get_with_offset (kernel/time/timekeeping.c:267
kernel/time/timekeeping.c:371 kernel/time/timekeeping.c:799)
[ 2300.662708] ? __build_skb (include/linux/compiler.h:214
arch/x86/include/asm/atomic.h:43
include/asm-generic/atomic-instrumented.h:34 net/core/skbuff.c:300)
[ 2300.674529] netif_receive_skb_internal (net/core/dev.c:5097)
[ 2300.687430] ? dev_cpu_dead (net/core/dev.c:5097)
[ 2300.699351] ? efx_rx_mk_skb+0x5d0/0x1210 sfc]
[ 2300.711999] ? efx_time_sync_event+0x1b0/0x1b0 sfc]
[ 2300.725126] efx_rx_deliver+0x447/0x640 sfc]
[ 2300.737697] ? efx_free_rx_buffers+0x180/0x180 sfc]
[ 2300.750803] ? __efx_rx_packet+0x76e/0x23b0 sfc]
[ 2300.763572] ? efx_ssr+0x19c0/0x19c0 sfc]
[ 2300.775502] ? efx_ef10_ptp_set_ts_config+0x120/0x120 sfc]
[ 2300.788713] ? reweight_entity (kernel/sched/fair.c:2762
kernel/sched/fair.c:2830)
[ 2300.800224] ? efx_poll+0x991/0x12b0 sfc]
[ 2300.811467] ? net_rx_action (arch/x86/include/asm/jump_label.h:36
include/linux/jump_label.h:142 include/trace/events/napi.h:14
net/core/dev.c:6263 net/core/dev.c:6328)
[ 2300.822343] ? napi_complete_done (net/core/dev.c:6306)
[ 2300.833468] ? hrtimer_init (kernel/time/hrtimer.c:1430)
[ 2300.843830] ? recalibrate_cpu_khz (arch/x86/kernel/tsc.c:1066
arch/x86/kernel/tsc.c:1066)
[ 2300.854377] ? _raw_spin_lock (arch/x86/include/asm/atomic.h:194
include/asm-generic/atomic-instrumented.h:58
include/asm-generic/qspinlock.h:85 include/linux/spinlock.h:180
include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:144)
[ 2300.864214] ? handle_irq_event (kernel/irq/handle.c:209)
[ 2300.874106] ? __do_softirq (arch/x86/include/asm/jump_label.h:36
include/linux/jump_label.h:142 include/trace/events/irq.h:142
kernel/softirq.c:293)
[ 2300.883609] ? handle_irq (arch/x86/kernel/irq_64.c:79)
[ 2300.892849] ? irq_exit (kernel/softirq.c:372 kernel/softirq.c:412)
[ 2300.901709] ? do_IRQ (arch/x86/include/asm/irq_regs.h:19
arch/x86/include/asm/irq_regs.h:26 arch/x86/kernel/irq.c:260)
[ 2300.910059] ? common_interrupt (arch/x86/entry/entry_64.S:646)
[ 2300.918862] </IRQ>
[ 2300.925956] ? cpuidle_enter_state (drivers/cpuidle/cpuidle.c:251)
[ 2300.935470] ? do_idle (kernel/sched/idle.c:204 kernel/sched/idle.c:262)
[ 2300.943904] ? arch_cpu_idle_exit (??:?)
[ 2300.953108] ? cpu_startup_entry (kernel/sched/idle.c:368 (discriminator 1))
[ 2300.962229] ? cpu_in_idle (kernel/sched/idle.c:349)
[ 2300.970788] ? clockevents_config.part.12 (kernel/time/clockevents.c:503)
[ 2300.980788] ? start_secondary (arch/x86/kernel/smpboot.c:213)
[ 2300.989915] ? set_cpu_sibling_map (arch/x86/kernel/smpboot.c:213)
[ 2300.999569] ? secondary_startup_64 (arch/x86/kernel/head_64.S:243)
[ 2301.008969]
[ 2301.015480] Allocated by task 0:
[ 2301.023718] kasan_kmalloc (mm/kasan/kasan.c:460 mm/kasan/kasan.c:553)
[ 2301.032340] kmem_cache_alloc (arch/x86/include/asm/jump_label.h:36
include/linux/memcontrol.h:1292 mm/slab.h:447 mm/slub.c:2706
mm/slub.c:2714 mm/slub.c:2719)
[ 2301.041269] __build_skb (net/core/skbuff.c:282 (discriminator 4))
[ 2301.049724] __netdev_alloc_skb (net/core/skbuff.c:423)
[ 2301.058898] efx_rx_mk_skb+0x10e/0x1210 sfc]
[ 2301.068239]
[ 2301.074615] Freed by task 0:
[ 2301.082411] __kasan_slab_free (mm/kasan/kasan.c:522)
[ 2301.091429] kmem_cache_free (mm/slub.c:1398 mm/slub.c:2953 mm/slub.c:2969)
[ 2301.100160] ip_defrag (net/ipv4/ip_fragment.c:507 net/ipv4/ip_fragment.c:699)
[ 2301.108518] ipv4_conntrack_defrag+0x323/0x490 nf_defrag_ipv4]
[ 2301.119408] nf_hook_slow (net/netfilter/core.c:512)
[ 2301.127942] ip_rcv (include/linux/netfilter.h:288 net/ipv4/ip_input.c:524)
[ 2301.135977] __netif_receive_skb_one_core (net/core/dev.c:4911)
[ 2301.145905] netif_receive_skb_internal (net/core/dev.c:5097)
[ 2301.155687] efx_rx_deliver+0x447/0x640 sfc]
[ 2301.164986]
[ 2301.171326] The buggy address belongs to the object at ffff888bd8f543c0
[ 2301.171326] which belongs to the cache skbuff_head_cache of size 232
[ 2301.194483] The buggy address is located 0 bytes inside of
[ 2301.194483] 232-byte region [ffff888bd8f543c0, ffff888bd8f544a8)
[ 2301.216346] The buggy address belongs to the page:
[ 2301.226355] page:ffffea002f63d500 count:1 mapcount:0
mapping:ffff88a03c294540 index:0xffff888bd8f561c0 compound_mapcount: 0
[ 2301.243024] flags: 0x2ffff800008100(slab|head)
[ 2301.253041] raw: 002ffff800008100 ffffea002341d300 0000002d00000002
ffff88a03c294540
[ 2301.266600] raw: ffff888bd8f561c0 0000000080330030 00000001ffffffff
0000000000000000
[ 2301.280190] page dumped because: kasan: bad access detected
[ 2301.291627]
[ 2301.298900] Memory state around the buggy address:
[ 2301.309617] ffff888bd8f54280: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 2301.322930] ffff888bd8f54300: fb fb fb fb fb fb fb fb fb fb fb fb
fb fc fc fc
[ 2301.336183] >ffff888bd8f54380: fc fc fc fc fc fc fc fc fb fb fb fb
fb fb fb fb
[ 2301.349449] ^
[ 2301.360817] ffff888bd8f54400: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 2301.374248] ffff888bd8f54480: fb fb fb fb fb fc fc fc fc fc fc fc
fc fc fc fc
[ 2301.387663] ==================================================================
[ 2301.401334] ==================================================================
[ 2301.414780] BUG: KASAN: double-free or invalid-free in tcp_v4_rcv
(net/ipv4/tcp_ipv4.c:1693)
[ 2301.428222]
[ 2301.435965] CPU: 28 PID: 0 Comm: swapper/28 Tainted: G B O
4.19.18-cloudflare-2019.1.8-1-gcabf55c #gcabf55c
[ 2301.453552] Hardware name: Quanta Computer Inc. QuantaPlex
T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
[ 2301.469737] Call Trace:
[ 2301.478962] <IRQ>
[ 2301.487699] dump_stack (lib/dump_stack.c:115)
[ 2301.497768] print_address_description (mm/kasan/report.c:257)
[ 2301.509256] ? tcp_v4_rcv (net/ipv4/tcp_ipv4.c:1693)
[ 2301.519681] kasan_report_invalid_free (mm/kasan/report.c:337)
[ 2301.531138] ? tcp_v4_rcv (net/ipv4/tcp_ipv4.c:1693)
[ 2301.541628] __kasan_slab_free (mm/kasan/kasan.c:502)
[ 2301.552571] ? tcp_v4_rcv (net/ipv4/tcp_ipv4.c:1693)
[ 2301.563087] kmem_cache_free (mm/slub.c:1398 mm/slub.c:2953 mm/slub.c:2969)
[ 2301.573831] tcp_v4_rcv (net/ipv4/tcp_ipv4.c:1693)
[ 2301.584110] ? icmp_checkentry+0x70/0x70 ip_tables]
[ 2301.595966] ? tcp_v4_early_demux (net/ipv4/tcp_ipv4.c:1693)
[ 2301.607224] ip_local_deliver_finish (net/ipv4/ip_input.c:216)
[ 2301.618764] ip_local_deliver (net/ipv4/ip_input.c:245)
[ 2301.629636] ? ip_call_ra_chain (net/ipv4/ip_input.c:245)
[ 2301.640683] ? ip_sublist_rcv (net/ipv4/ip_input.c:192)
[ 2301.651493] ? ip_local_deliver (net/ipv4/ip_input.c:518)
[ 2301.662419] ip_rcv (net/ipv4/ip_input.c:518)
[ 2301.672198] ? ip_local_deliver (net/ipv4/ip_input.c:518)
[ 2301.683164] ? ip_rcv_core.isra.20 (net/ipv4/ip_input.c:403)
[ 2301.694340] __netif_receive_skb_one_core (net/core/dev.c:4911)
[ 2301.694344] ? __netif_receive_skb_core (net/core/dev.c:4911)
[ 2301.694361] ? eth_gro_receive (net/ethernet/eth.c:157)
[ 2301.694369] ? recalibrate_cpu_khz (arch/x86/kernel/tsc.c:1066
arch/x86/kernel/tsc.c:1066)
[ 2301.694375] ? ktime_get_with_offset (kernel/time/timekeeping.c:267
kernel/time/timekeeping.c:371 kernel/time/timekeeping.c:799)
[ 2301.694385] ? __build_skb (include/linux/compiler.h:214
arch/x86/include/asm/atomic.h:43
include/asm-generic/atomic-instrumented.h:34 net/core/skbuff.c:300)
[ 2301.760745] netif_receive_skb_internal (net/core/dev.c:5097)
[ 2301.760750] ? dev_cpu_dead (net/core/dev.c:5097)
[ 2301.760786] ? efx_rx_mk_skb+0x5d0/0x1210 sfc]
[ 2301.760808] ? efx_time_sync_event+0x1b0/0x1b0 sfc]
[ 2301.760831] efx_rx_deliver+0x447/0x640 sfc]
[ 2301.760851] ? efx_free_rx_buffers+0x180/0x180 sfc]
[ 2301.760872] ? __efx_rx_packet+0x76e/0x23b0 sfc]
[ 2301.835110] ? efx_ssr+0x19c0/0x19c0 sfc]
[ 2301.835142] ? efx_ef10_ptp_set_ts_config+0x120/0x120 sfc]
[ 2301.835152] ? reweight_entity (kernel/sched/fair.c:2762
kernel/sched/fair.c:2830)
[ 2301.835186] ? efx_poll+0x991/0x12b0 sfc]
[ 2301.876013] ? net_rx_action (arch/x86/include/asm/jump_label.h:36
include/linux/jump_label.h:142 include/trace/events/napi.h:14
net/core/dev.c:6263 net/core/dev.c:6328)
[ 2301.876019] ? napi_complete_done (net/core/dev.c:6306)
[ 2301.895619] ? hrtimer_init (kernel/time/hrtimer.c:1430)
[ 2301.895630] ? recalibrate_cpu_khz (arch/x86/kernel/tsc.c:1066
arch/x86/kernel/tsc.c:1066)
[ 2301.914880] ? _raw_spin_lock (arch/x86/include/asm/atomic.h:194
include/asm-generic/atomic-instrumented.h:58
include/asm-generic/qspinlock.h:85 include/linux/spinlock.h:180
include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:144)
[ 2301.914887] ? handle_irq_event (kernel/irq/handle.c:209)
[ 2301.914895] ? __do_softirq (arch/x86/include/asm/jump_label.h:36
include/linux/jump_label.h:142 include/trace/events/irq.h:142
kernel/softirq.c:293)
[ 2301.943072] ? handle_irq (arch/x86/kernel/irq_64.c:79)
[ 2301.943085] ? irq_exit (kernel/softirq.c:372 kernel/softirq.c:412)
[ 2301.960340] ? do_IRQ (arch/x86/include/asm/irq_regs.h:19
arch/x86/include/asm/irq_regs.h:26 arch/x86/kernel/irq.c:260)
[ 2301.960346] ? common_interrupt (arch/x86/entry/entry_64.S:646)
[ 2301.960348] </IRQ>
[ 2301.960359] ? cpuidle_enter_state (drivers/cpuidle/cpuidle.c:251)
[ 2301.960380] ? do_idle (kernel/sched/idle.c:204 kernel/sched/idle.c:262)
[ 2301.960383] ? arch_cpu_idle_exit (??:?)
[ 2301.960389] ? cpu_startup_entry (kernel/sched/idle.c:368 (discriminator 1))
[ 2301.960392] ? cpu_in_idle (kernel/sched/idle.c:349)
[ 2301.960413] ? clockevents_config.part.12 (kernel/time/clockevents.c:503)
[ 2301.960420] ? start_secondary (arch/x86/kernel/smpboot.c:213)
[ 2301.960423] ? set_cpu_sibling_map (arch/x86/kernel/smpboot.c:213)
[ 2301.960430] ? secondary_startup_64 (arch/x86/kernel/head_64.S:243)
[ 2301.960435]
[ 2302.070728] Allocated by task 0:
[ 2302.070739] kasan_kmalloc (mm/kasan/kasan.c:460 mm/kasan/kasan.c:553)
[ 2302.070764] kmem_cache_alloc (arch/x86/include/asm/jump_label.h:36
include/linux/memcontrol.h:1292 mm/slab.h:447 mm/slub.c:2706
mm/slub.c:2714 mm/slub.c:2719)
[ 2302.095562] __build_skb (net/core/skbuff.c:282 (discriminator 4))
[ 2302.095565] __netdev_alloc_skb (net/core/skbuff.c:423)
[ 2302.095604] efx_rx_mk_skb+0x10e/0x1210 sfc]
[ 2302.095611]
[ 2302.127968] Freed by task 0:
[ 2302.127983] __kasan_slab_free (mm/kasan/kasan.c:522)
[ 2302.127993] kmem_cache_free (mm/slub.c:1398 mm/slub.c:2953 mm/slub.c:2969)
[ 2302.152762] ip_defrag (net/ipv4/ip_fragment.c:507 net/ipv4/ip_fragment.c:699)
[ 2302.152768] ipv4_conntrack_defrag+0x323/0x490 nf_defrag_ipv4]
[ 2302.152771] nf_hook_slow (net/netfilter/core.c:512)
[ 2302.152775] ip_rcv (include/linux/netfilter.h:288 net/ipv4/ip_input.c:524)
[ 2302.152779] __netif_receive_skb_one_core (net/core/dev.c:4911)
[ 2302.152782] netif_receive_skb_internal (net/core/dev.c:5097)
[ 2302.152808] efx_rx_deliver+0x447/0x640 sfc]
[ 2302.152810]
[ 2302.152813] The buggy address belongs to the object at ffff888bd8f543c0
[ 2302.152813] which belongs to the cache skbuff_head_cache of size 232
[ 2302.152815] The buggy address is located 0 bytes inside of
[ 2302.152815] 232-byte region [ffff888bd8f543c0, ffff888bd8f544a8)
[ 2302.152816] The buggy address belongs to the page:
[ 2302.152819] page:ffffea002f63d500 count:1 mapcount:0
mapping:ffff88a03c294540 index:0xffff888bd8f561c0 compound_mapcount: 0
[ 2302.152822] flags: 0x2ffff800008100(slab|head)
[ 2302.152827] raw: 002ffff800008100 ffffea002341d300 0000002d00000002
ffff88a03c294540
[ 2302.152829] raw: ffff888bd8f561c0 0000000080330030 00000001ffffffff
0000000000000000
[ 2302.152830] page dumped because: kasan: bad access detected
[ 2302.152830]
[ 2302.152831] Memory state around the buggy address:
[ 2302.152833] ffff888bd8f54280: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 2302.152835] ffff888bd8f54300: fb fb fb fb fb fb fb fb fb fb fb fb
fb fc fc fc
[ 2302.152836] >ffff888bd8f54380: fc fc fc fc fc fc fc fc fb fb fb fb
fb fb fb fb
[ 2302.152837] ^
[ 2302.152839] ffff888bd8f54400: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 2302.152840] ffff888bd8f54480: fb fb fb fb fb fc fc fc fc fc fc fc
fc fc fc fc
[ 2302.152841] ==================================================================
[ 2302.187379] BUG: Bad page state in process nginx-origin pfn:28b7f8
[ 2302.462537] page:ffffea000a2dfe00 count:-1 mapcount:0
mapping:0000000000000000 index:0x0
[ 2302.462542] flags: 0x2ffff800000000()
[ 2302.462549] raw: 002ffff800000000 dead000000000100 dead000000000200
0000000000000000
[ 2302.462553] raw: 0000000000000000 0000000000000000 ffffffffffffffff
0000000000000000
[ 2302.462554] page dumped because: nonzero _count
[ 2302.462555] Modules linked in: tun xt_connlimit nf_conncount xt_bpf
xt_hashlimit iptable_security cls_flow cls_u32 sch_htb sch_fq md_mod
dm_crypt algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_TPROXY
nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark iptable_mangle xt_owner
xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6 iptable_raw
nfnetlink_log xt_NFLOG xt_tcpudp xt_comment xt_conntrack nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 xt_mark xt_multiport xt_set
iptable_filter bpfilter ip_set_hash_netport ip_set_hash_net
ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc sb_edac
x86_pkg_temp_thermal kvm_intel kvm ipmi_ssif irqbypass crc32_pclmul
crc32c_intel sfc(O) pcbc aesni_intel aes_x86_64
[ 2302.650012] crypto_simd igb cryptd i2c_algo_bit glue_helper mdio
dca ipmi_si ipmi_devintf ipmi_msghandler efivarfs ip_tables x_tables
[ 2302.650031] CPU: 1 PID: 74997 Comm: nginx-origin Tainted: G B
O 4.19.18-cloudflare-2019.1.8-1-gcabf55c #gcabf55c
[ 2302.650033] Hardware name: Quanta Computer Inc. QuantaPlex
T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
[ 2302.650035] Call Trace:
[ 2302.650049] dump_stack (lib/dump_stack.c:115)
[ 2302.650062] bad_page.cold.116 (mm/page_alloc.c:542)
[ 2302.755115] ? si_mem_available (mm/page_alloc.c:507)
[ 2302.755119] ? ksys_write (fs/read_write.c:599)
[ 2302.755126] ? entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:247)
[ 2302.755130] ? __switch_to_asm (arch/x86/entry/entry_64.S:373)
[ 2302.755135] get_page_from_freelist (mm/page_alloc.c:2997
mm/page_alloc.c:3342)
[ 2302.755140] ? __switch_to_asm (arch/x86/entry/entry_64.S:373)
[ 2302.755144] ? __switch_to_asm (arch/x86/entry/entry_64.S:373)
[ 2302.755153] ? kasan_unpoison_shadow (mm/kasan/kasan.c:71)
[ 2302.861765] ? __isolate_free_page (mm/page_alloc.c:3252)
[ 2302.861769] ? __kmalloc_node_track_caller (mm/slab.h:448
mm/slub.c:2706 mm/slub.c:4320)
[ 2302.861775] ? __alloc_skb (net/core/skbuff.c:206)
[ 2302.861783] __alloc_pages_nodemask (mm/page_alloc.c:4369)
[ 2302.915129] ? __alloc_pages_slowpath (mm/page_alloc.c:4345)
[ 2302.915135] skb_page_frag_refill (net/core/sock.c:2213)
[ 2302.915139] sk_page_frag_refill (net/core/sock.c:2234)
[ 2302.915144] tcp_sendmsg_locked (net/ipv4/tcp.c:1321)
[ 2302.915149] ? interrupt_entry (arch/x86/entry/entry_64.S:607)
[ 2302.915153] ? kasan_unpoison_shadow (mm/kasan/kasan.c:68)
[ 2302.915160] ? tcp_sendpage (net/ipv4/tcp.c:1175)
[ 2303.003254] ? selinux_secmark_relabel_packet (security/selinux/hooks.c:4532)
[ 2303.003260] ? release_pages (mm/swap.c:716)
[ 2303.028592] ? inet_sk_set_state (net/ipv4/af_inet.c:794)
[ 2303.028596] tcp_sendmsg (net/ipv4/tcp.c:1444)
[ 2303.028603] sock_sendmsg (net/socket.c:622 net/socket.c:631)
[ 2303.028609] sock_write_iter (net/socket.c:901)
[ 2303.075968] ? sock_sendmsg (net/socket.c:884)
[ 2303.075978] __vfs_write (fs/read_write.c:475 fs/read_write.c:487)
[ 2303.075986] ? __handle_mm_fault (mm/memory.c:3211 mm/memory.c:4030
mm/memory.c:4156)
[ 2303.111370] ? kernel_read (fs/read_write.c:483)
[ 2303.111375] ? file_has_perm (security/selinux/hooks.c:1919)
[ 2303.111379] ? bpf_fd_pass (security/selinux/hooks.c:1890)
[ 2303.111386] vfs_write (fs/read_write.c:550)
[ 2303.111389] ksys_write (fs/read_write.c:599)
[ 2303.111394] ? __ia32_sys_read (fs/read_write.c:592)
[ 2303.111401] do_syscall_64 (arch/x86/entry/common.c:290)
[ 2303.188508] ? page_fault (arch/x86/entry/entry_64.S:1161)
[ 2303.188513] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:247)
[ 2303.188517] RIP: 0033:0x7f53e469f190
[ 2303.188521] Code: 2e 0f 1f 84 00 00 00 00 00 90 48 8b 05 39 7e 20
00 c3 0f 1f 84 00 00 00 00 00 83 3d 39 c2 20 00 00 75 10 b8 01 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae fc ff ff 48 89
04 24
All code
========
0: 2e 0f 1f 84 00 00 00 nopl %cs:0x0(%rax,%rax,1)
7: 00 00
9: 90 nop
a: 48 8b 05 39 7e 20 00 mov 0x207e39(%rip),%rax # 0x207e4a
11: c3 retq
12: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
19: 00
1a: 83 3d 39 c2 20 00 00 cmpl $0x0,0x20c239(%rip) # 0x20c25a
21: 75 10 jne 0x33
23: b8 01 00 00 00 mov $0x1,%eax
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <--
trapping instruction
30: 73 31 jae 0x63
32: c3 retq
33: 48 83 ec 08 sub $0x8,%rsp
37: e8 ae fc ff ff callq 0xfffffffffffffcea
3c: 48 89 04 24 mov %rax,(%rsp)
Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 31 jae 0x39
8: c3 retq
9: 48 83 ec 08 sub $0x8,%rsp
d: e8 ae fc ff ff callq 0xfffffffffffffcc0
12: 48 89 04 24 mov %rax,(%rsp)
[ 2303.188523] RSP: 002b:00007ffcc6a0c118 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ 2303.188528] RAX: ffffffffffffffda RBX: 00005562df6160b3 RCX: 00007f53e469f190
[ 2303.188531] RDX: 000000000000401d RSI: 00005562df6160b3 RDI: 0000000000000d4f
[ 2303.188533] RBP: 00007ffcc6a0c150 R08: 0000000000000005 R09: 0000000060640d3e
[ 2303.188535] R10: 00005562d20f7b10 R11: 0000000000000246 R12: 000000000000401d
[ 2303.188541] R13: 000000000000401d R14: 00007ffcc6a0c3a8 R15: 00005562dc0e6ec8
[ 2303.407074] WARNING: CPU: 21 PID: 74997 at lib/iov_iter.c:825
copy_page_to_iter (lib/iov_iter.c:825 lib/iov_iter.c:832)
[ 2303.420983] Modules linked in: tun xt_connlimit nf_conncount xt_bpf
xt_hashlimit iptable_security cls_flow cls_u32 sch_htb sch_fq md_mod
dm_crypt algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_TPROXY
nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark iptable_mangle xt_owner
xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6 iptable_raw
nfnetlink_log xt_NFLOG xt_tcpudp xt_comment xt_conntrack nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 xt_mark xt_multiport xt_set
iptable_filter bpfilter ip_set_hash_netport ip_set_hash_net
ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc sb_edac
x86_pkg_temp_thermal kvm_intel kvm ipmi_ssif irqbypass crc32_pclmul
crc32c_intel sfc(O) pcbc aesni_intel aes_x86_64
[ 2303.538009] crypto_simd igb cryptd i2c_algo_bit glue_helper mdio
dca ipmi_si ipmi_devintf ipmi_msghandler efivarfs ip_tables x_tables
[ 2303.538034] CPU: 21 PID: 74997 Comm: nginx-origin Tainted: G B
O 4.19.18-cloudflare-2019.1.8-1-gcabf55c #gcabf55c
[ 2303.538037] Hardware name: Quanta Computer Inc. QuantaPlex
T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
[ 2303.538050] RIP: 0010:copy_page_to_iter (??:?)
[ 2303.538055] Code: 07 00 00 4d 85 f6 4c 89 54 24 10 4d 8b 6f 18 4c
89 44 24 08 74 0c 4c 89 ff e8 65 43 ff ff 84 c0 75 12 45 31 f6 e9 d9
fe ff ff <0f> 0b 45 31 f6 e9 cf fe ff ff 49 8d 6f 08 4c 8b 44 24 08 48
b8 00
All code
========
0: 07 (bad)
1: 00 00 add %al,(%rax)
3: 4d 85 f6 test %r14,%r14
6: 4c 89 54 24 10 mov %r10,0x10(%rsp)
b: 4d 8b 6f 18 mov 0x18(%r15),%r13
f: 4c 89 44 24 08 mov %r8,0x8(%rsp)
14: 74 0c je 0x22
16: 4c 89 ff mov %r15,%rdi
19: e8 65 43 ff ff callq 0xffffffffffff4383
1e: 84 c0 test %al,%al
20: 75 12 jne 0x34
22: 45 31 f6 xor %r14d,%r14d
25: e9 d9 fe ff ff jmpq 0xffffffffffffff03
2a:* 0f 0b ud2 <-- trapping instruction
2c: 45 31 f6 xor %r14d,%r14d
2f: e9 cf fe ff ff jmpq 0xffffffffffffff03
34: 49 8d 6f 08 lea 0x8(%r15),%rbp
38: 4c 8b 44 24 08 mov 0x8(%rsp),%r8
3d: 48 rex.W
3e: b8 .byte 0xb8
...
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 45 31 f6 xor %r14d,%r14d
5: e9 cf fe ff ff jmpq 0xfffffffffffffed9
a: 49 8d 6f 08 lea 0x8(%r15),%rbp
e: 4c 8b 44 24 08 mov 0x8(%rsp),%r8
13: 48 rex.W
14: b8 .byte 0xb8
...
[ 2303.538057] RSP: 0018:ffff88a005e0f7c0 EFLAGS: 00010293
[ 2303.538061] RAX: 0000000000001000 RBX: 000000000000168d RCX: 002ffff800000000
[ 2303.538064] RDX: ffffffffa66bdcb0 RSI: ffffffffa66bdca0 RDI: ffffea000a2dfe00
[ 2303.538066] RBP: 0000000000000005 R08: ffffea000a2dfe00 R09: dffffc0000000000
[ 2303.538069] R10: 0000000000001688 R11: 0000000000000004 R12: ffffea000a2dfe08
[ 2303.538071] R13: ffffea000a2dfe00 R14: ffffea0000000000 R15: ffff88a005e0fc40
[ 2303.538075] FS: 00007f53e4ac0740(0000) GS:ffff888c3f4c0000(0000)
knlGS:0000000000000000
[ 2303.538077] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2303.538079] CR2: 00005562d36cc000 CR3: 0000002015486001 CR4: 00000000003606e0
[ 2303.538081] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2303.538083] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2303.538085] Call Trace:
[ 2303.538099] skb_copy_datagram_iter (net/core/datagram.c:453)
[ 2303.538108] tcp_recvmsg (net/ipv4/tcp.c:2104)
[ 2303.538115] ? tcp_get_md5sig_pool (net/ipv4/tcp.c:1917)
[ 2303.538119] ? tcp_poll (include/net/sock.h:1204
include/net/sock.h:1210 net/ipv4/tcp.c:569)
[ 2303.538123] ? tcp_splice_read (net/ipv4/tcp.c:504)
[ 2303.538131] ? bad_area_access_error (arch/x86/mm/fault.c:1213)
[ 2303.538134] ? tcp_splice_read (net/ipv4/tcp.c:504)
[ 2303.538144] ? ep_item_poll.isra.20 (fs/eventpoll.c:892)
[ 2303.538151] ? selinux_secmark_relabel_packet (security/selinux/hooks.c:4532)
[ 2303.538159] inet_recvmsg (net/ipv4/af_inet.c:838)
[ 2303.538164] ? inet_sendpage (net/ipv4/af_inet.c:828)
[ 2303.538172] sock_read_iter (net/socket.c:879)
[ 2303.538177] ? sock_recvmsg (net/socket.c:862)
[ 2303.538187] __vfs_read (fs/read_write.c:407 fs/read_write.c:418)
[ 2303.538193] ? __switch_to_asm (arch/x86/entry/entry_64.S:373)
[ 2303.538197] ? __switch_to_asm (arch/x86/entry/entry_64.S:373)
[ 2303.538202] ? __x64_sys_copy_file_range (fs/read_write.c:414)
[ 2303.538208] ? file_has_perm (security/selinux/hooks.c:1919)
[ 2303.538216] vfs_read (fs/read_write.c:453)
[ 2303.538221] ksys_read (fs/read_write.c:579)
[ 2303.538225] ? kernel_write (fs/read_write.c:572)
[ 2303.538232] do_syscall_64 (arch/x86/entry/common.c:290)
[ 2303.538236] ? prepare_exit_to_usermode (arch/x86/entry/common.c:197)
[ 2303.538240] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:247)
[ 2303.538245] RIP: 0033:0x7f53e469f1f0
[ 2303.538249] Code: 73 01 c3 48 8b 0d b8 7d 20 00 f7 d8 64 89 01 48
83 c8 ff c3 66 0f 1f 44 00 00 83 3d d9 c1 20 00 00 75 10 b8 00 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 4e fc ff ff 48 89
04 24
All code
========
0: 73 01 jae 0x3
2: c3 retq
3: 48 8b 0d b8 7d 20 00 mov 0x207db8(%rip),%rcx # 0x207dc2
a: f7 d8 neg %eax
c: 64 89 01 mov %eax,%fs:(%rcx)
f: 48 83 c8 ff or $0xffffffffffffffff,%rax
13: c3 retq
14: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
1a: 83 3d d9 c1 20 00 00 cmpl $0x0,0x20c1d9(%rip) # 0x20c1fa
21: 75 10 jne 0x33
23: b8 00 00 00 00 mov $0x0,%eax
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <--
trapping instruction
30: 73 31 jae 0x63
32: c3 retq
33: 48 83 ec 08 sub $0x8,%rsp
37: e8 4e fc ff ff callq 0xfffffffffffffc8a
3c: 48 89 04 24 mov %rax,(%rsp)
Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 31 jae 0x39
8: c3 retq
9: 48 83 ec 08 sub $0x8,%rsp
d: e8 4e fc ff ff callq 0xfffffffffffffc60
12: 48 89 04 24 mov %rax,(%rsp)
[ 2303.538251] RSP: 002b:00007ffcc6a0c188 EFLAGS: 00000246 ORIG_RAX:
0000000000000000
[ 2303.538254] RAX: ffffffffffffffda RBX: 00005562d5f89883 RCX: 00007f53e469f1f0
[ 2303.538256] RDX: 0000000000000005 RSI: 00005562d5f89883 RDI: 0000000000000dfb
[ 2303.538258] RBP: 00007ffcc6a0c1c0 R08: 0000000000000032 R09: 0000000000000020
[ 2303.538260] R10: 00005562d20944de R11: 0000000000000246 R12: 0000000000000005
[ 2303.538262] R13: 00005562dbb17f60 R14: 00005562d2570e80 R15: 00007f53c5866d98
[ 2303.538268] ---[ end trace d791391e77eef582 ]---
[ 2330.200708] kasan: CONFIG_KASAN_INLINE enabled
[ 2330.211020] kasan: GPF could be caused by NULL-ptr deref or user
memory access
[ 2330.224169] general protection fault: 0000 [#1] SMP KASAN PTI
[ 2330.235791] CPU: 28 PID: 69371 Comm: nginx-fl Tainted: G B W
O 4.19.18-cloudflare-2019.1.8-1-gcabf55c #gcabf55c
[ 2330.253036] Hardware name: Quanta Computer Inc. QuantaPlex
T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
[ 2330.268679] RIP: 0010:rb_replace_node (??:?)
[ 2330.279645] Code: 55 48 89 f5 53 48 89 fb 48 83 ec 08 80 3c 01 00
0f 85 64 02 00 00 48 b9 00 00 00 00 00 fc ff df 48 89 e8 4c 8b 23 48
c1 e8 03 <0f> b6 34 08 48 8d 45 17 48 89 c7 83 e0 07 48 c1 ef 03 49 83
e4 fc
All code
========
0: 55 push %rbp
1: 48 89 f5 mov %rsi,%rbp
4: 53 push %rbx
5: 48 89 fb mov %rdi,%rbx
8: 48 83 ec 08 sub $0x8,%rsp
c: 80 3c 01 00 cmpb $0x0,(%rcx,%rax,1)
10: 0f 85 64 02 00 00 jne 0x27a
16: 48 b9 00 00 00 00 00 movabs $0xdffffc0000000000,%rcx
1d: fc ff df
20: 48 89 e8 mov %rbp,%rax
23: 4c 8b 23 mov (%rbx),%r12
26: 48 c1 e8 03 shr $0x3,%rax
2a:* 0f b6 34 08 movzbl (%rax,%rcx,1),%esi <-- trapping instruction
2e: 48 8d 45 17 lea 0x17(%rbp),%rax
32: 48 89 c7 mov %rax,%rdi
35: 83 e0 07 and $0x7,%eax
38: 48 c1 ef 03 shr $0x3,%rdi
3c: 49 83 e4 fc and $0xfffffffffffffffc,%r12
Code starting with the faulting instruction
===========================================
0: 0f b6 34 08 movzbl (%rax,%rcx,1),%esi
4: 48 8d 45 17 lea 0x17(%rbp),%rax
8: 48 89 c7 mov %rax,%rdi
b: 83 e0 07 and $0x7,%eax
e: 48 c1 ef 03 shr $0x3,%rdi
12: 49 83 e4 fc and $0xfffffffffffffffc,%r12
[ 2330.311757] RSP: 0018:ffff888c3f687d88 EFLAGS: 00010206
[ 2330.323631] RAX: 0000000000000003 RBX: ffff888c081fc000 RCX: dffffc0000000000
[ 2330.323634] RDX: ffff888c0a5c38e0 RSI: 000000000000001a RDI: ffff888c081fc000
[ 2330.323636] RBP: 000000000000001a R08: fffffbfff4d88d09 R09: fffffbfff4d88d08
[ 2330.323639] R10: fffffbfff4d88d08 R11: ffffffffa6c46847 R12: 0000000030747865
[ 2330.323641] R13: ffff888c0a5c3910 R14: ffff888c0a5c3870 R15: ffff888c0a5c38e0
[ 2330.323644] FS: 00007f3375a30780(0000) GS:ffff888c3f680000(0000)
knlGS:0000000000000000
[ 2330.323647] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2330.323649] CR2: 00007f19d3da5000 CR3: 0000000bee77a001 CR4: 00000000003606e0
[ 2330.323651] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2330.323653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2330.323655] Call Trace:
[ 2330.323658] <IRQ>
[ 2330.323673] ip_expire (net/ipv4/ip_fragment.c:223)
[ 2330.323680] ? ip_check_defrag (net/ipv4/ip_fragment.c:187)
[ 2330.323686] call_timer_fn (arch/x86/include/asm/jump_label.h:36
include/linux/jump_label.h:142 include/trace/events/timer.h:121
kernel/time/timer.c:1327)
[ 2330.323691] run_timer_softirq (kernel/time/timer.c:1364
kernel/time/timer.c:1682 kernel/time/timer.c:1695)
[ 2330.323695] ? add_timer (kernel/time/timer.c:1692)
[ 2330.323699] ? hrtimer_init (kernel/time/hrtimer.c:1430)
[ 2330.323705] ? recalibrate_cpu_khz (arch/x86/kernel/tsc.c:1066
arch/x86/kernel/tsc.c:1066)
[ 2330.323709] ? recalibrate_cpu_khz (arch/x86/kernel/tsc.c:1066
arch/x86/kernel/tsc.c:1066)
[ 2330.323713] ? ktime_get (kernel/time/timekeeping.c:267
kernel/time/timekeeping.c:371 kernel/time/timekeeping.c:756)
[ 2330.323720] ? lapic_timer_set_oneshot (arch/x86/kernel/apic/apic.c:467)
[ 2330.323727] ? clockevents_program_event (kernel/time/clockevents.c:346)
[ 2330.323733] __do_softirq (arch/x86/include/asm/jump_label.h:36
include/linux/jump_label.h:142 include/trace/events/irq.h:142
kernel/softirq.c:293)
[ 2330.323741] irq_exit (kernel/softirq.c:372 kernel/softirq.c:412)
[ 2330.323744] smp_apic_timer_interrupt
(arch/x86/include/asm/irq_regs.h:19 arch/x86/include/asm/irq_regs.h:26
arch/x86/kernel/apic/apic.c:1058)
[ 2330.323751] apic_timer_interrupt (arch/x86/entry/entry_64.S:864)
[ 2330.323753] </IRQ>
[ 2330.323760] RIP: 0010:check_memory_region (??:?)
[ 2330.323765] Code: ff 41 54 49 b9 00 00 00 00 00 fc ff df 4d 89 da
55 49 c1 ea 03 53 48 89 fb 4d 01 ca 48 c1 eb 03 49 8d 6a 01 49 01 d9
49 89 e8 <4c> 89 c8 4d 29 c8 49 83 f8 10 0f 8e 98 00 00 00 44 89 cb 83
e3 07
All code
========
0: ff 41 54 incl 0x54(%rcx)
3: 49 b9 00 00 00 00 00 movabs $0xdffffc0000000000,%r9
a: fc ff df
d: 4d 89 da mov %r11,%r10
10: 55 push %rbp
11: 49 c1 ea 03 shr $0x3,%r10
15: 53 push %rbx
16: 48 89 fb mov %rdi,%rbx
19: 4d 01 ca add %r9,%r10
1c: 48 c1 eb 03 shr $0x3,%rbx
20: 49 8d 6a 01 lea 0x1(%r10),%rbp
24: 49 01 d9 add %rbx,%r9
27: 49 89 e8 mov %rbp,%r8
2a:* 4c 89 c8 mov %r9,%rax <-- trapping instruction
2d: 4d 29 c8 sub %r9,%r8
30: 49 83 f8 10 cmp $0x10,%r8
34: 0f 8e 98 00 00 00 jle 0xd2
3a: 44 89 cb mov %r9d,%ebx
3d: 83 e3 07 and $0x7,%ebx
Code starting with the faulting instruction
===========================================
0: 4c 89 c8 mov %r9,%rax
3: 4d 29 c8 sub %r9,%r8
6: 49 83 f8 10 cmp $0x10,%r8
a: 0f 8e 98 00 00 00 jle 0xa8
10: 44 89 cb mov %r9d,%ebx
13: 83 e3 07 and $0x7,%ebx
[ 2330.323767] RSP: 0018:ffff888bcb66f830 EFLAGS: 00000286 ORIG_RAX:
ffffffffffffff13
[ 2330.323771] RAX: ffff7fffffffffff RBX: 1ffffd400601a58e RCX: ffffffffa5591192
[ 2330.323772] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffea00300d2c74
[ 2330.323775] RBP: fffff9400601a58f R08: fffff9400601a58f R09: fffff9400601a58e
[ 2330.323777] R10: fffff9400601a58e R11: ffffea00300d2c77 R12: dffffc0000000000
[ 2330.323779] R13: ffff888bf01d0500 R14: ffff88826902a7c0 R15: ffffea00300d2c40
[ 2330.323787] ? skb_release_data (arch/x86/include/asm/atomic.h:125
(discriminator 3) include/asm-generic/atomic-instrumented.h:260
(discriminator 3) include/linux/page_ref.h:139 (discriminator 3)
include/linux/mm.h:520 (discriminator 3) include/linux/mm.h:942
(discriminator 3) include/linux/skbuff.h:2795 (discriminator 3)
net/core/skbuff.c:564 (discriminator 3))
[ 2330.323793] skb_release_data (arch/x86/include/asm/atomic.h:125
(discriminator 3) include/asm-generic/atomic-instrumented.h:260
(discriminator 3) include/linux/page_ref.h:139 (discriminator 3)
include/linux/mm.h:520 (discriminator 3) include/linux/mm.h:942
(discriminator 3) include/linux/skbuff.h:2795 (discriminator 3)
net/core/skbuff.c:564 (discriminator 3))
[ 2330.323798] __kfree_skb (net/core/skbuff.c:642)
[ 2330.323804] tcp_recvmsg (include/net/sock.h:2405 net/ipv4/tcp.c:2134)
[ 2330.323808] ? sock_def_readable (arch/x86/include/asm/bitops.h:328
include/net/sock.h:828 include/net/sock.h:2181 net/core/sock.c:2698)
[ 2330.323814] ? tcp_get_md5sig_pool (net/ipv4/tcp.c:1917)
[ 2330.323817] ? tcp_poll (include/net/sock.h:1204
include/net/sock.h:1210 net/ipv4/tcp.c:569)
[ 2330.323825] ? unix_stream_sendpage (net/unix/af_unix.c:1829)
[ 2330.323831] ? sock_sendmsg (net/socket.c:622 net/socket.c:631)
[ 2330.323834] ? sock_write_iter (net/socket.c:901)
[ 2330.323838] ? sock_sendmsg (net/socket.c:884)
[ 2330.323846] inet_recvmsg (net/ipv4/af_inet.c:838)
[ 2330.323851] ? inet_sendpage (net/ipv4/af_inet.c:828)
[ 2330.323856] sock_read_iter (net/socket.c:879)
[ 2330.323860] ? sock_recvmsg (net/socket.c:862)
[ 2330.323870] __vfs_read (fs/read_write.c:407 fs/read_write.c:418)
[ 2330.323874] ? __switch_to_asm (arch/x86/entry/entry_64.S:373)
[ 2330.323878] ? __switch_to_asm (arch/x86/entry/entry_64.S:373)
[ 2330.323883] ? __x64_sys_copy_file_range (fs/read_write.c:414)
[ 2330.323890] ? file_has_perm (security/selinux/hooks.c:1919)
[ 2330.323898] vfs_read (fs/read_write.c:453)
[ 2330.323903] ksys_read (fs/read_write.c:579)
[ 2330.323908] ? kernel_write (fs/read_write.c:572)
[ 2330.323911] ? fput (arch/x86/include/asm/atomic64_64.h:118
include/asm-generic/atomic-instrumented.h:269
include/asm-generic/atomic-long.h:218 fs/file_table.c:331)
[ 2330.323918] do_syscall_64 (arch/x86/entry/common.c:290)
[ 2330.323921] ? prepare_exit_to_usermode (arch/x86/entry/common.c:197)
[ 2330.323926] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:247)
[ 2330.323930] RIP: 0033:0x7f337540b20d
[ 2330.323934] Code: c1 20 00 00 75 10 b8 00 00 00 00 0f 05 48 3d 01
f0 ff ff 73 31 c3 48 83 ec 08 e8 4e fc ff ff 48 89 04 24 b8 00 00 00
00 0f 05 <48> 8b 3c 24 48 89 c2 e8 97 fc ff ff 48 89 d0 48 83 c4 08 48
3d 01
All code
========
0: c1 20 00 shll $0x0,(%rax)
3: 00 75 10 add %dh,0x10(%rbp)
6: b8 00 00 00 00 mov $0x0,%eax
b: 0f 05 syscall
d: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
13: 73 31 jae 0x46
15: c3 retq
16: 48 83 ec 08 sub $0x8,%rsp
1a: e8 4e fc ff ff callq 0xfffffffffffffc6d
1f: 48 89 04 24 mov %rax,(%rsp)
23: b8 00 00 00 00 mov $0x0,%eax
28: 0f 05 syscall
2a:* 48 8b 3c 24 mov (%rsp),%rdi <-- trapping instruction
2e: 48 89 c2 mov %rax,%rdx
31: e8 97 fc ff ff callq 0xfffffffffffffccd
36: 48 89 d0 mov %rdx,%rax
39: 48 83 c4 08 add $0x8,%rsp
3d: 48 rex.W
3e: 3d .byte 0x3d
3f: 01 .byte 0x1
Code starting with the faulting instruction
===========================================
0: 48 8b 3c 24 mov (%rsp),%rdi
4: 48 89 c2 mov %rax,%rdx
7: e8 97 fc ff ff callq 0xfffffffffffffca3
c: 48 89 d0 mov %rdx,%rax
f: 48 83 c4 08 add $0x8,%rsp
13: 48 rex.W
14: 3d .byte 0x3d
15: 01 .byte 0x1
[ 2330.323936] RSP: 002b:00007ffe077a9510 EFLAGS: 00000293 ORIG_RAX:
0000000000000000
[ 2330.323940] RAX: ffffffffffffffda RBX: 00005640dee9dcb8 RCX: 00007f337540b20d
[ 2330.323942] RDX: 0000000000004018 RSI: 00005640dee9dcb8 RDI: 0000000000000185
[ 2330.323945] RBP: 00007ffe077a9550 R08: 00005640dd627720 R09: 0000000000004000
[ 2330.323947] R10: 0000000000000300 R11: 0000000000000293 R12: 0000000000004018
[ 2330.323949] R13: 00005640dddcb4c0 R14: 0000000000004000 R15: 00007f32435090e0
[ 2330.323954] Modules linked in: tun xt_connlimit nf_conncount xt_bpf
xt_hashlimit iptable_security cls_flow cls_u32 sch_htb sch_fq md_mod
dm_crypt algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_TPROXY
nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark iptable_mangle xt_owner
xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6 iptable_raw
nfnetlink_log xt_NFLOG xt_tcpudp xt_comment xt_conntrack nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 xt_mark xt_multiport xt_set
iptable_filter bpfilter ip_set_hash_netport ip_set_hash_net
ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc sb_edac
x86_pkg_temp_thermal kvm_intel kvm ipmi_ssif irqbypass crc32_pclmul
crc32c_intel sfc(O) pcbc aesni_intel aes_x86_64
[ 2330.324038] crypto_simd igb cryptd i2c_algo_bit glue_helper mdio
dca ipmi_si ipmi_devintf ipmi_msghandler efivarfs ip_tables x_tables
[ 2330.324111] ---[ end trace d791391e77eef583 ]---
[ 2330.324118] RIP: 0010:rb_replace_node (??:?)
[ 2330.324122] Code: 55 48 89 f5 53 48 89 fb 48 83 ec 08 80 3c 01 00
0f 85 64 02 00 00 48 b9 00 00 00 00 00 fc ff df 48 89 e8 4c 8b 23 48
c1 e8 03 <0f> b6 34 08 48 8d 45 17 48 89 c7 83 e0 07 48 c1 ef 03 49 83
e4 fc
All code
========
0: 55 push %rbp
1: 48 89 f5 mov %rsi,%rbp
4: 53 push %rbx
5: 48 89 fb mov %rdi,%rbx
8: 48 83 ec 08 sub $0x8,%rsp
c: 80 3c 01 00 cmpb $0x0,(%rcx,%rax,1)
10: 0f 85 64 02 00 00 jne 0x27a
16: 48 b9 00 00 00 00 00 movabs $0xdffffc0000000000,%rcx
1d: fc ff df
20: 48 89 e8 mov %rbp,%rax
23: 4c 8b 23 mov (%rbx),%r12
26: 48 c1 e8 03 shr $0x3,%rax
2a:* 0f b6 34 08 movzbl (%rax,%rcx,1),%esi <-- trapping instruction
2e: 48 8d 45 17 lea 0x17(%rbp),%rax
32: 48 89 c7 mov %rax,%rdi
35: 83 e0 07 and $0x7,%eax
38: 48 c1 ef 03 shr $0x3,%rdi
3c: 49 83 e4 fc and $0xfffffffffffffffc,%r12
Code starting with the faulting instruction
===========================================
0: 0f b6 34 08 movzbl (%rax,%rcx,1),%esi
4: 48 8d 45 17 lea 0x17(%rbp),%rax
8: 48 89 c7 mov %rax,%rdi
b: 83 e0 07 and $0x7,%eax
e: 48 c1 ef 03 shr $0x3,%rdi
12: 49 83 e4 fc and $0xfffffffffffffffc,%r12
[ 2330.324129] RSP: 0018:ffff888c3f687d88 EFLAGS: 00010206
[ 2330.324133] RAX: 0000000000000003 RBX: ffff888c081fc000 RCX: dffffc0000000000
[ 2330.324135] RDX: ffff888c0a5c38e0 RSI: 000000000000001a RDI: ffff888c081fc000
[ 2330.324137] RBP: 000000000000001a R08: fffffbfff4d88d09 R09: fffffbfff4d88d08
[ 2330.324140] R10: fffffbfff4d88d08 R11: ffffffffa6c46847 R12: 0000000030747865
[ 2330.324142] R13: ffff888c0a5c3910 R14: ffff888c0a5c3870 R15: ffff888c0a5c38e0
[ 2330.324151] FS: 00007f3375a30780(0000) GS:ffff888c3f680000(0000)
knlGS:0000000000000000
[ 2330.324154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2330.324156] CR2: 00007f19d3da5000 CR3: 0000000bee77a001 CR4: 00000000003606e0
[ 2330.324158] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2330.324161] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2330.324163] Kernel panic - not syncing: Fatal exception in interrupt
[ 2330.324214] Kernel Offset: 0x23000000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
This commit from 4.19.14 seems relevant:
* https://github.com/torvalds/linux/commit/d5f9565c8d5ad3cf94982223cfcef1169b0bb60f
As a reminder, we upgraded from 4.19.13 and started seeing crashes.
^ permalink raw reply
* [PATCH net-next v3 0/8] devlink: add device (driver) information API
From: Jakub Kicinski @ 2019-01-30 23:41 UTC (permalink / raw)
To: davem
Cc: netdev, oss-drivers, jiri, andrew, f.fainelli, mkubecek, eugenem,
jonathan.lemon, Jakub Kicinski
Hi!
fw_version field in ethtool -i does not suit modern needs with 31
characters being quite limiting on more complex systems. There is
also no distinction between the running and flashed versions of
the firmware.
Since the driver information pertains to the entire device, rather
than a particular netdev, it seems wise to move it to devlink, at
the same time fixing the aforementioned issues.
The new API allows exposing the device serial number and versions
of the components of the card - both hardware and firmware (running
and flashed). Driver authors can choose descriptive identifiers
for the version fields. A few version identifiers which seemed
relevant for most devices have been added to the global devlink
header.
Example:
$ devlink dev info pci/0000:05:00.0
pci/0000:05:00.0:
driver nfp
serial_number 16240145
versions:
fixed:
board.id AMDA0099-0001
board.rev 07
board.vendor SMA
board.model carbon
running:
fw.mgmt: 010156.010156.010156
fw.cpld: 0x44
fw.app: sriov-2.1.16
stored:
fw.mgmt: 010158.010158.010158
fw.cpld: 0x44
fw.app: sriov-2.1.20
The last patch also includes compat code for ethtool. If the driver
reports no fw_version via the traditional ethtool API, ethtool
can call into devlink and try to cram as many versions as possible
into the 31 characters.
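As a rough illustration of the "cram as many versions as possible" idea,
here is a minimal userspace sketch. It is not the code from the patch:
fw_version_append() and FW_VERSION_LEN are hypothetical names, and the
only thing taken from the text above is the 31-character limit. Versions
are appended whole, separated by spaces, and a version that does not fit
is dropped rather than truncated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 31 characters plus the terminating NUL (name is an assumption). */
#define FW_VERSION_LEN 32

/*
 * Hypothetical helper: append one version string to buf, separated from
 * any earlier entry by a single space. buf must already be NUL-terminated.
 * Returns 0 on success, -1 if the string would not fit; in that case buf
 * is left unchanged.
 */
static int fw_version_append(char *buf, size_t size, const char *ver)
{
	size_t used = strlen(buf);
	size_t sep = used ? 1 : 0;
	size_t len = strlen(ver);

	assert(size > 0 && used < size);

	if (used + sep + len >= size)
		return -1;

	if (sep)
		buf[used++] = ' ';
	memcpy(buf + used, ver, len + 1);	/* copies the NUL as well */
	return 0;
}
```

With the running versions from the example output above, the first two
values fit and the third is left out, since adding it would exceed
31 characters.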
v3 (Jiri):
- rename various functions and attributes;
- break out the version helpers per-type;
- make the compat code parse a dump instead of special casing
in each helper;
- move generic version defines to a separate patch.
v2:
- rebase.
this non-RFC, v3 some would say:
- add three more versions in the NFP patches;
- add last patch (ethtool compat) - Andrew & Michal.
RFCv2:
- use one driver op;
- allow longer serial number;
- wrap the skb into an opaque request struct;
- add some common identifier into the devlink header.
Jakub Kicinski (8):
devlink: add device information API
devlink: add version reporting to devlink info API
devlink: add generic info version names
nfp: devlink: report driver name and serial number
nfp: devlink: report fixed versions
nfp: nsp: add support for versions command
nfp: devlink: report the running and flashed versions
ethtool: add compat for devlink info
.../networking/devlink-info-versions.rst | 38 +++
Documentation/networking/index.rst | 1 +
.../net/ethernet/netronome/nfp/nfp_devlink.c | 145 +++++++++++
.../ethernet/netronome/nfp/nfpcore/nfp_nsp.c | 61 +++++
.../ethernet/netronome/nfp/nfpcore/nfp_nsp.h | 20 ++
include/net/devlink.h | 72 ++++++
include/uapi/linux/devlink.h | 10 +
net/core/devlink.c | 232 ++++++++++++++++++
net/core/ethtool.c | 7 +
9 files changed, 586 insertions(+)
create mode 100644 Documentation/networking/devlink-info-versions.rst
--
2.19.2
^ permalink raw reply
* [PATCH net-next v3 1/8] devlink: add device information API
From: Jakub Kicinski @ 2019-01-30 23:41 UTC (permalink / raw)
To: davem
Cc: netdev, oss-drivers, jiri, andrew, f.fainelli, mkubecek, eugenem,
jonathan.lemon, Jakub Kicinski
In-Reply-To: <20190130234133.4298-1-jakub.kicinski@netronome.com>
ethtool -i has served us well for a long time, but it's showing
its limitations more and more. The device information should
also be reported per device, not per netdev.
Lay the foundation for a simple devlink-based way of reading device
info. Add driver name and device serial number as initial pieces
of information exposed via this new API.
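To show how a driver might use the new callback, here is a userspace
sketch. The prototypes of info_get and the two put helpers follow the
declarations added to include/net/devlink.h by this patch; everything
else is an assumption made so the example compiles on its own. In
particular, struct devlink_info_req below is a stand-in (the real one is
opaque to drivers and wraps the netlink message), and example_info_get()
is a hypothetical driver callback, not the nfp implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Opaque in this sketch, as they are to most of a driver. */
struct devlink;
struct netlink_ext_ack;

/* Stand-in only: the kernel structure wraps an skb, not string buffers. */
struct devlink_info_req {
	char driver_name[32];
	char serial_number[64];
};

/* Mock helpers with the same prototypes as the ones in the patch. */
int devlink_info_driver_name_put(struct devlink_info_req *req,
				 const char *name)
{
	assert(req && name);
	snprintf(req->driver_name, sizeof(req->driver_name), "%s", name);
	return 0;
}

int devlink_info_serial_number_put(struct devlink_info_req *req,
				   const char *sn)
{
	assert(req && sn);
	snprintf(req->serial_number, sizeof(req->serial_number), "%s", sn);
	return 0;
}

/* Hypothetical driver callback matching the new devlink_ops::info_get. */
static int example_info_get(struct devlink *devlink,
			    struct devlink_info_req *req,
			    struct netlink_ext_ack *extack)
{
	int err;

	(void)devlink;
	(void)extack;

	err = devlink_info_driver_name_put(req, "nfp");
	if (err)
		return err;

	return devlink_info_serial_number_put(req, "16240145");
}
```

A real driver would assign such a function to the .info_get member of
its struct devlink_ops and would typically read the serial number from
the device rather than hard-coding it.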
v3:
- rename helpers (Jiri);
- rename driver name attr (Jiri);
- remove double spacing in commit message (Jiri).
RFC v2:
- wrap the skb into an opaque structure (Jiri);
- allow the serial number to be any length (Jiri & Andrew);
- add driver name (Jonathan).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
include/net/devlink.h | 18 ++++++
include/uapi/linux/devlink.h | 5 ++
net/core/devlink.c | 112 +++++++++++++++++++++++++++++++++++
3 files changed, 135 insertions(+)
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 85c9eabaf056..a6d0a530483d 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -429,6 +429,7 @@ enum devlink_param_wol_types {
}
struct devlink_region;
+struct devlink_info_req;
typedef void devlink_snapshot_data_dest_t(const void *data);
@@ -484,6 +485,8 @@ struct devlink_ops {
int (*eswitch_encap_mode_get)(struct devlink *devlink, u8 *p_encap_mode);
int (*eswitch_encap_mode_set)(struct devlink *devlink, u8 encap_mode,
struct netlink_ext_ack *extack);
+ int (*info_get)(struct devlink *devlink, struct devlink_info_req *req,
+ struct netlink_ext_ack *extack);
};
static inline void *devlink_priv(struct devlink *devlink)
@@ -607,6 +610,10 @@ u32 devlink_region_shapshot_id_get(struct devlink *devlink);
int devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
u8 *data, u32 snapshot_id,
devlink_snapshot_data_dest_t *data_destructor);
+int devlink_info_serial_number_put(struct devlink_info_req *req,
+ const char *sn);
+int devlink_info_driver_name_put(struct devlink_info_req *req,
+ const char *name);
#else
@@ -905,6 +912,17 @@ devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
return 0;
}
+static inline int
+devlink_info_driver_name_put(struct devlink_info_req *req, const char *name)
+{
+ return 0;
+}
+
+static inline int
+devlink_info_serial_number_put(struct devlink_info_req *req, const char *sn)
+{
+ return 0;
+}
#endif
#endif /* _NET_DEVLINK_H_ */
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 61b4447a6c5b..142710d45093 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -94,6 +94,8 @@ enum devlink_command {
DEVLINK_CMD_PORT_PARAM_NEW,
DEVLINK_CMD_PORT_PARAM_DEL,
+ DEVLINK_CMD_INFO_GET, /* can dump */
+
/* add new commands above here */
__DEVLINK_CMD_MAX,
DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -290,6 +292,9 @@ enum devlink_attr {
DEVLINK_ATTR_REGION_CHUNK_ADDR, /* u64 */
DEVLINK_ATTR_REGION_CHUNK_LEN, /* u64 */
+ DEVLINK_ATTR_INFO_DRIVER_NAME, /* string */
+ DEVLINK_ATTR_INFO_SERIAL_NUMBER, /* string */
+
/* add new attributes above here, update the policy in devlink.c */
__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index e6f170caf449..f456f6aa3d40 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3714,6 +3714,110 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb,
return 0;
}
+struct devlink_info_req {
+ struct sk_buff *msg;
+};
+
+int devlink_info_driver_name_put(struct devlink_info_req *req, const char *name)
+{
+ return nla_put_string(req->msg, DEVLINK_ATTR_INFO_DRIVER_NAME, name);
+}
+EXPORT_SYMBOL_GPL(devlink_info_driver_name_put);
+
+int devlink_info_serial_number_put(struct devlink_info_req *req, const char *sn)
+{
+ return nla_put_string(req->msg, DEVLINK_ATTR_INFO_SERIAL_NUMBER, sn);
+}
+EXPORT_SYMBOL_GPL(devlink_info_serial_number_put);
+
+static int
+devlink_nl_info_fill(struct sk_buff *msg, struct devlink *devlink,
+ enum devlink_command cmd, u32 portid,
+ u32 seq, int flags, struct netlink_ext_ack *extack)
+{
+ struct devlink_info_req req;
+ void *hdr;
+ int err;
+
+ hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+ if (!hdr)
+ return -EMSGSIZE;
+
+ err = -EMSGSIZE;
+ if (devlink_nl_put_handle(msg, devlink))
+ goto err_cancel_msg;
+
+ req.msg = msg;
+ err = devlink->ops->info_get(devlink, &req, extack);
+ if (err)
+ goto err_cancel_msg;
+
+ genlmsg_end(msg, hdr);
+ return 0;
+
+err_cancel_msg:
+ genlmsg_cancel(msg, hdr);
+ return err;
+}
+
+static int devlink_nl_cmd_info_get_doit(struct sk_buff *skb,
+ struct genl_info *info)
+{
+ struct devlink *devlink = info->user_ptr[0];
+ struct sk_buff *msg;
+ int err;
+
+ if (!devlink->ops || !devlink->ops->info_get)
+ return -EOPNOTSUPP;
+
+ msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!msg)
+ return -ENOMEM;
+
+ err = devlink_nl_info_fill(msg, devlink, DEVLINK_CMD_INFO_GET,
+ info->snd_portid, info->snd_seq, 0,
+ info->extack);
+ if (err) {
+ nlmsg_free(msg);
+ return err;
+ }
+
+ return genlmsg_reply(msg, info);
+}
+
+static int devlink_nl_cmd_info_get_dumpit(struct sk_buff *msg,
+ struct netlink_callback *cb)
+{
+ struct devlink *devlink;
+ int start = cb->args[0];
+ int idx = 0;
+ int err;
+
+ mutex_lock(&devlink_mutex);
+ list_for_each_entry(devlink, &devlink_list, list) {
+ if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+ continue;
+ if (idx < start) {
+ idx++;
+ continue;
+ }
+
+ mutex_lock(&devlink->lock);
+ err = devlink_nl_info_fill(msg, devlink, DEVLINK_CMD_INFO_GET,
+ NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq, NLM_F_MULTI,
+ cb->extack);
+ mutex_unlock(&devlink->lock);
+ if (err)
+ break;
+ idx++;
+ }
+ mutex_unlock(&devlink_mutex);
+
+ cb->args[0] = idx;
+ return msg->len;
+}
+
static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -3974,6 +4078,14 @@ static const struct genl_ops devlink_nl_ops[] = {
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
+ {
+ .cmd = DEVLINK_CMD_INFO_GET,
+ .doit = devlink_nl_cmd_info_get_doit,
+ .dumpit = devlink_nl_cmd_info_get_dumpit,
+ .policy = devlink_nl_policy,
+ .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+ /* can be retrieved by unprivileged users */
+ },
};
static struct genl_family devlink_nl_family __ro_after_init = {
--
2.19.2
* [PATCH net-next v3 2/8] devlink: add version reporting to devlink info API
From: Jakub Kicinski @ 2019-01-30 23:41 UTC (permalink / raw)
To: davem
Cc: netdev, oss-drivers, jiri, andrew, f.fainelli, mkubecek, eugenem,
jonathan.lemon, Jakub Kicinski
In-Reply-To: <20190130234133.4298-1-jakub.kicinski@netronome.com>
ethtool -i has a few fixed-size fields which can be used to report
firmware version and expansion ROM version. Unfortunately, modern
hardware has more firmware components. There is usually some
datapath microcode, management controller, PXE drivers, and a
CPLD load. Running ethtool -i on modern controllers reveals that
vendors cram multiple values into the firmware version field.
Here are some examples from systems I could lay my hands on quickly:
tg3: "FFV20.2.17 bc 5720-v1.39"
i40e: "6.01 0x800034a4 1.1747.0"
nfp: "0.0.3.5 0.25 sriov-2.1.16 nic"
Add a new devlink API to allow retrieving multiple versions, and
provide a user-readable name for each of those versions.
While at it, break down the versions into three categories:
- fixed - this is the board/fixed component version, usually vendors
report information like the board version in the PCI VPD,
but it will benefit from naming and common API as well;
- running - this is the running firmware version;
- stored - this is firmware in the flash, after firmware update
this value will reflect the flashed version, while the
running version may only be updated after reboot.
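To make the three categories concrete, here is a minimal userspace
sketch of the data model, not the kernel API. The type and function
names (ver_kind, ver_entry, ver_lookup, ver_update_pending) are
hypothetical and exist only for illustration. It shows how a consumer
could notice that a flashed firmware has not been activated yet by
comparing the stored and running values for the same name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; these names are not part of the patch. */
enum ver_kind { VER_FIXED, VER_RUNNING, VER_STORED };

struct ver_entry {
	enum ver_kind kind;
	const char *name;	/* e.g. a component name */
	const char *value;	/* free-form version string */
};

/* Return the value reported for (kind, name), or NULL if absent. */
static const char *ver_lookup(const struct ver_entry *tbl, size_t n,
			      enum ver_kind kind, const char *name)
{
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tbl[i].kind == kind && strcmp(tbl[i].name, name) == 0)
			return tbl[i].value;
	}
	return NULL;
}

/*
 * Return 1 when both a running and a stored version exist for `name`
 * and they differ, i.e. the flash was updated but the new firmware is
 * not running yet. Return 0 otherwise.
 */
static int ver_update_pending(const struct ver_entry *tbl, size_t n,
			      const char *name)
{
	const char *run = ver_lookup(tbl, n, VER_RUNNING, name);
	const char *sto = ver_lookup(tbl, n, VER_STORED, name);

	return run && sto && strcmp(run, sto) != 0;
}
```

Fixed versions never take part in this comparison, since they describe
the board rather than anything that can be flashed.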
v3:
- add per-type helpers instead of using the special argument (Jiri).
RFCv2:
- remove the nesting in attr DEVLINK_ATTR_INFO_VERSIONS (now
versions are mixed with other info attrs);
- have the driver report versions from the same callback as
other info.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
include/net/devlink.h | 33 +++++++++++++++++++++
include/uapi/linux/devlink.h | 5 ++++
net/core/devlink.c | 57 ++++++++++++++++++++++++++++++++++++
3 files changed, 95 insertions(+)
diff --git a/include/net/devlink.h b/include/net/devlink.h
index a6d0a530483d..6dc0ef964392 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -614,6 +614,15 @@ int devlink_info_serial_number_put(struct devlink_info_req *req,
const char *sn);
int devlink_info_driver_name_put(struct devlink_info_req *req,
const char *name);
+int devlink_info_version_fixed_put(struct devlink_info_req *req,
+ const char *version_name,
+ const char *version_value);
+int devlink_info_version_stored_put(struct devlink_info_req *req,
+ const char *version_name,
+ const char *version_value);
+int devlink_info_version_running_put(struct devlink_info_req *req,
+ const char *version_name,
+ const char *version_value);
#else
@@ -923,6 +932,30 @@ devlink_info_serial_number_put(struct devlink_info_req *req, const char *sn)
{
return 0;
}
+
+static inline int
+devlink_info_version_fixed_put(struct devlink_info_req *req,
+ const char *version_name,
+ const char *version_value)
+{
+ return 0;
+}
+
+static inline int
+devlink_info_version_stored_put(struct devlink_info_req *req,
+ const char *version_name,
+ const char *version_value)
+{
+ return 0;
+}
+
+static inline int
+devlink_info_version_running_put(struct devlink_info_req *req,
+ const char *version_name,
+ const char *version_value)
+{
+ return 0;
+}
#endif
#endif /* _NET_DEVLINK_H_ */
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 142710d45093..7fffd879c328 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -294,6 +294,11 @@ enum devlink_attr {
DEVLINK_ATTR_INFO_DRIVER_NAME, /* string */
DEVLINK_ATTR_INFO_SERIAL_NUMBER, /* string */
+ DEVLINK_ATTR_INFO_VERSION_FIXED, /* nested */
+ DEVLINK_ATTR_INFO_VERSION_RUNNING, /* nested */
+ DEVLINK_ATTR_INFO_VERSION_STORED, /* nested */
+ DEVLINK_ATTR_INFO_VERSION_NAME, /* string */
+ DEVLINK_ATTR_INFO_VERSION_VALUE, /* string */
/* add new attributes above here, update the policy in devlink.c */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index f456f6aa3d40..e31b6d617837 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3730,6 +3730,63 @@ int devlink_info_serial_number_put(struct devlink_info_req *req, const char *sn)
}
EXPORT_SYMBOL_GPL(devlink_info_serial_number_put);
+static int devlink_info_version_put(struct devlink_info_req *req, int attr,
+ const char *version_name,
+ const char *version_value)
+{
+ struct nlattr *nest;
+ int err;
+
+ nest = nla_nest_start(req->msg, attr);
+ if (!nest)
+ return -EMSGSIZE;
+
+ err = nla_put_string(req->msg, DEVLINK_ATTR_INFO_VERSION_NAME,
+ version_name);
+ if (err)
+ goto nla_put_failure;
+
+ err = nla_put_string(req->msg, DEVLINK_ATTR_INFO_VERSION_VALUE,
+ version_value);
+ if (err)
+ goto nla_put_failure;
+
+ nla_nest_end(req->msg, nest);
+
+ return 0;
+
+nla_put_failure:
+ nla_nest_cancel(req->msg, nest);
+ return err;
+}
+
+int devlink_info_version_fixed_put(struct devlink_info_req *req,
+ const char *version_name,
+ const char *version_value)
+{
+ return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_FIXED,
+ version_name, version_value);
+}
+EXPORT_SYMBOL_GPL(devlink_info_version_fixed_put);
+
+int devlink_info_version_stored_put(struct devlink_info_req *req,
+ const char *version_name,
+ const char *version_value)
+{
+ return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_STORED,
+ version_name, version_value);
+}
+EXPORT_SYMBOL_GPL(devlink_info_version_stored_put);
+
+int devlink_info_version_running_put(struct devlink_info_req *req,
+ const char *version_name,
+ const char *version_value)
+{
+ return devlink_info_version_put(req, DEVLINK_ATTR_INFO_VERSION_RUNNING,
+ version_name, version_value);
+}
+EXPORT_SYMBOL_GPL(devlink_info_version_running_put);
+
static int
devlink_nl_info_fill(struct sk_buff *msg, struct devlink *devlink,
enum devlink_command cmd, u32 portid,
--
2.19.2
* [PATCH net-next v3 3/8] devlink: add generic info version names
From: Jakub Kicinski @ 2019-01-30 23:41 UTC (permalink / raw)
To: davem
Cc: netdev, oss-drivers, jiri, andrew, f.fainelli, mkubecek, eugenem,
jonathan.lemon, Jakub Kicinski
In-Reply-To: <20190130234133.4298-1-jakub.kicinski@netronome.com>
Add defines and docs for generic info versions.
v3:
- add docs;
- separate patch (Jiri).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
.../networking/devlink-info-versions.rst | 38 +++++++++++++++++++
Documentation/networking/index.rst | 1 +
include/net/devlink.h | 14 +++++++
3 files changed, 53 insertions(+)
create mode 100644 Documentation/networking/devlink-info-versions.rst
diff --git a/Documentation/networking/devlink-info-versions.rst b/Documentation/networking/devlink-info-versions.rst
new file mode 100644
index 000000000000..7d4ecf6b6f34
--- /dev/null
+++ b/Documentation/networking/devlink-info-versions.rst
@@ -0,0 +1,38 @@
+.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+
+=====================
+Devlink info versions
+=====================
+
+board.id
+========
+
+Unique identifier of the board design.
+
+board.rev
+=========
+
+Board design revision.
+
+fw.mgmt
+=======
+
+Control unit firmware version. This firmware is responsible for house
+keeping tasks, PHY control etc. but not the packet-by-packet data path
+operation.
+
+fw.app
+======
+
+Data path microcode controlling high-speed packet processing.
+
+fw.undi
+=======
+
+UNDI software, may include the UEFI driver, firmware or both.
+
+fw.ncsi
+=======
+
+Version of the software responsible for supporting/handling the
+Network Controller Sideband Interface.
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index f1627ca2a0ea..9a32451cd201 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -24,6 +24,7 @@ Linux Networking Documentation
device_drivers/intel/i40e
device_drivers/intel/iavf
device_drivers/intel/ice
+ devlink-info-versions
kapi
z8530book
msg_zerocopy
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 6dc0ef964392..6b417f141fd6 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -428,6 +428,20 @@ enum devlink_param_wol_types {
.validate = _validate, \
}
+/* Part number, identifier of board design */
+#define DEVLINK_INFO_VERSION_GENERIC_BOARD_ID "board.id"
+/* Revision of board design */
+#define DEVLINK_INFO_VERSION_GENERIC_BOARD_REV "board.rev"
+
+/* Control processor FW version */
+#define DEVLINK_INFO_VERSION_GENERIC_FW_MGMT "fw.mgmt"
+/* Data path microcode controlling high-speed packet processing */
+#define DEVLINK_INFO_VERSION_GENERIC_FW_APP "fw.app"
+/* UNDI software version */
+#define DEVLINK_INFO_VERSION_GENERIC_FW_UNDI "fw.undi"
+/* NCSI support/handler version */
+#define DEVLINK_INFO_VERSION_GENERIC_FW_NCSI "fw.ncsi"
+
struct devlink_region;
struct devlink_info_req;
--
2.19.2
* [PATCH net-next v3 4/8] nfp: devlink: report driver name and serial number
From: Jakub Kicinski @ 2019-01-30 23:41 UTC (permalink / raw)
To: davem
Cc: netdev, oss-drivers, jiri, andrew, f.fainelli, mkubecek, eugenem,
jonathan.lemon, Jakub Kicinski
In-Reply-To: <20190130234133.4298-1-jakub.kicinski@netronome.com>
Report the basic info through new devlink info API.
RFCv2:
- add driver name;
- align serial to core changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
.../net/ethernet/netronome/nfp/nfp_devlink.c | 24 +++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index 808647ec3573..2ba3b3891d99 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -4,6 +4,7 @@
#include <linux/rtnetlink.h>
#include <net/devlink.h>
+#include "nfpcore/nfp.h"
#include "nfpcore/nfp_nsp.h"
#include "nfp_app.h"
#include "nfp_main.h"
@@ -171,6 +172,28 @@ static int nfp_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
return ret;
}
+static int
+nfp_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
+ struct netlink_ext_ack *extack)
+{
+ struct nfp_pf *pf = devlink_priv(devlink);
+ const char *sn;
+ int err;
+
+ err = devlink_info_driver_name_put(req, "nfp");
+ if (err)
+ return err;
+
+ sn = nfp_hwinfo_lookup(pf->hwinfo, "assembly.serial");
+ if (sn) {
+ err = devlink_info_serial_number_put(req, sn);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
const struct devlink_ops nfp_devlink_ops = {
.port_split = nfp_devlink_port_split,
.port_unsplit = nfp_devlink_port_unsplit,
@@ -178,6 +201,7 @@ const struct devlink_ops nfp_devlink_ops = {
.sb_pool_set = nfp_devlink_sb_pool_set,
.eswitch_mode_get = nfp_devlink_eswitch_mode_get,
.eswitch_mode_set = nfp_devlink_eswitch_mode_set,
+ .info_get = nfp_devlink_info_get,
};
int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
--
2.19.2
* [PATCH net-next v3 5/8] nfp: devlink: report fixed versions
From: Jakub Kicinski @ 2019-01-30 23:41 UTC (permalink / raw)
To: davem
Cc: netdev, oss-drivers, jiri, andrew, f.fainelli, mkubecek, eugenem,
jonathan.lemon, Jakub Kicinski
In-Reply-To: <20190130234133.4298-1-jakub.kicinski@netronome.com>
Report information about the hardware.
RFCv2:
- add defines for board IDs which are likely to be reusable for
other drivers (Jiri).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
.../net/ethernet/netronome/nfp/nfp_devlink.c | 36 ++++++++++++++++++-
1 file changed, 35 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index 2ba3b3891d99..75eda34fc1b4 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -172,6 +172,40 @@ static int nfp_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
return ret;
}
+static const struct nfp_devlink_versions_simple {
+ const char *key;
+ const char *hwinfo;
+} nfp_devlink_versions_hwinfo[] = {
+ { DEVLINK_INFO_VERSION_GENERIC_BOARD_ID, "assembly.partno", },
+ { DEVLINK_INFO_VERSION_GENERIC_BOARD_REV, "assembly.revision", },
+ { "board.vendor", /* fab */ "assembly.vendor", },
+ { "board.model", /* code name */ "assembly.model", },
+};
+
+static int
+nfp_devlink_versions_get_hwinfo(struct nfp_pf *pf, struct devlink_info_req *req)
+{
+ unsigned int i;
+ int err;
+
+ for (i = 0; i < ARRAY_SIZE(nfp_devlink_versions_hwinfo); i++) {
+ const struct nfp_devlink_versions_simple *info;
+ const char *val;
+
+ info = &nfp_devlink_versions_hwinfo[i];
+
+ val = nfp_hwinfo_lookup(pf->hwinfo, info->hwinfo);
+ if (!val)
+ continue;
+
+ err = devlink_info_version_fixed_put(req, info->key, val);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
static int
nfp_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
struct netlink_ext_ack *extack)
@@ -191,7 +225,7 @@ nfp_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
return err;
}
- return 0;
+ return nfp_devlink_versions_get_hwinfo(pf, req);
}
const struct devlink_ops nfp_devlink_ops = {
--
2.19.2
* [PATCH net-next v3 6/8] nfp: nsp: add support for versions command
From: Jakub Kicinski @ 2019-01-30 23:41 UTC (permalink / raw)
To: davem
Cc: netdev, oss-drivers, jiri, andrew, f.fainelli, mkubecek, eugenem,
jonathan.lemon, Jakub Kicinski
In-Reply-To: <20190130234133.4298-1-jakub.kicinski@netronome.com>
Retrieve the FW versions with the new command.
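For readers who want the buffer layout spelled out, the following is a
userspace sketch of the lookup, inferred from the offsets in the diff
in this mail: a little-endian 16-bit field count at offset 0, followed
by one 16-bit string offset per field, with the running and flashed
entries of each component stored back to back. The helper names
(rd_le16, versions_get) are hypothetical, and unlike the driver
function this sketch collapses every failure into NULL rather than
distinguishing -ENOENT from -EINVAL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 16-bit value from a possibly unaligned address. */
static uint16_t rd_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Hypothetical userspace helper (not the driver function): return the
 * version string for component `id`, running (flash == 0) or flashed
 * (flash != 0), or NULL if the entry is absent or malformed.
 */
static const char *versions_get(unsigned int id, int flash,
				const uint8_t *buf, size_t size)
{
	unsigned int field = id * 2 + (flash ? 1 : 0);
	size_t slot = 2 + (size_t)field * 2;	/* running slots: 2, 6, 10, ... */
	size_t off;

	assert(buf != NULL);
	if (size < 2 || slot + 2 > size)
		return NULL;
	if (rd_le16(buf) <= field)		/* buffer reports too few fields */
		return NULL;

	off = rd_le16(buf + slot);
	if (off == 0 || off >= size)		/* absent, or out of bounds */
		return NULL;
	if (memchr(buf + off, '\0', size - off) == NULL)
		return NULL;			/* string is not NUL-terminated */

	return (const char *)(buf + off);
}
```

The bounds and termination checks matter because the buffer comes from
firmware and should not be trusted to be well formed.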
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
.../ethernet/netronome/nfp/nfpcore/nfp_nsp.c | 61 +++++++++++++++++++
.../ethernet/netronome/nfp/nfpcore/nfp_nsp.h | 20 ++++++
2 files changed, 81 insertions(+)
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
index ce1577bbbd2a..a9d53df0070c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
@@ -7,6 +7,7 @@
* Jason McMullan <jason.mcmullan@netronome.com>
*/
+#include <asm/unaligned.h>
#include <linux/bitfield.h>
#include <linux/delay.h>
#include <linux/firmware.h>
@@ -62,6 +63,16 @@
#define NFP_HWINFO_LOOKUP_SIZE GENMASK(11, 0)
+#define NFP_VERSIONS_SIZE GENMASK(11, 0)
+#define NFP_VERSIONS_CNT_OFF 0
+#define NFP_VERSIONS_BSP_OFF 2
+#define NFP_VERSIONS_CPLD_OFF 6
+#define NFP_VERSIONS_APP_OFF 10
+#define NFP_VERSIONS_BUNDLE_OFF 14
+#define NFP_VERSIONS_UNDI_OFF 18
+#define NFP_VERSIONS_NCSI_OFF 22
+#define NFP_VERSIONS_CFGR_OFF 26
+
enum nfp_nsp_cmd {
SPCODE_NOOP = 0, /* No operation */
SPCODE_SOFT_RESET = 1, /* Soft reset the NFP */
@@ -77,6 +88,7 @@ enum nfp_nsp_cmd {
SPCODE_NSP_IDENTIFY = 13, /* Read NSP version */
SPCODE_FW_STORED = 16, /* If no FW loaded, load flash app FW */
SPCODE_HWINFO_LOOKUP = 17, /* Lookup HWinfo with overwrites etc. */
+ SPCODE_VERSIONS = 21, /* Report FW versions */
};
static const struct {
@@ -711,3 +723,52 @@ int nfp_nsp_hwinfo_lookup(struct nfp_nsp *state, void *buf, unsigned int size)
return 0;
}
+
+int nfp_nsp_versions(struct nfp_nsp *state, void *buf, unsigned int size)
+{
+ struct nfp_nsp_command_buf_arg versions = {
+ {
+ .code = SPCODE_VERSIONS,
+ .option = min_t(u32, size, NFP_VERSIONS_SIZE),
+ },
+ .out_buf = buf,
+ .out_size = min_t(u32, size, NFP_VERSIONS_SIZE),
+ };
+
+ return nfp_nsp_command_buf(state, &versions);
+}
+
+const char *nfp_nsp_versions_get(enum nfp_nsp_versions id, bool flash,
+ const u8 *buf, unsigned int size)
+{
+ static const u32 id2off[] = {
+ [NFP_VERSIONS_BSP] = NFP_VERSIONS_BSP_OFF,
+ [NFP_VERSIONS_CPLD] = NFP_VERSIONS_CPLD_OFF,
+ [NFP_VERSIONS_APP] = NFP_VERSIONS_APP_OFF,
+ [NFP_VERSIONS_BUNDLE] = NFP_VERSIONS_BUNDLE_OFF,
+ [NFP_VERSIONS_UNDI] = NFP_VERSIONS_UNDI_OFF,
+ [NFP_VERSIONS_NCSI] = NFP_VERSIONS_NCSI_OFF,
+ [NFP_VERSIONS_CFGR] = NFP_VERSIONS_CFGR_OFF,
+ };
+ unsigned int field, buf_field_cnt, buf_off;
+
+ if (id >= ARRAY_SIZE(id2off) || !id2off[id])
+ return ERR_PTR(-EINVAL);
+
+ field = id * 2 + flash;
+
+ buf_field_cnt = get_unaligned_le16(buf);
+ if (buf_field_cnt <= field)
+ return ERR_PTR(-ENOENT);
+
+ buf_off = get_unaligned_le16(buf + id2off[id] + flash * 2);
+ if (!buf_off)
+ return ERR_PTR(-ENOENT);
+
+ if (buf_off >= size)
+ return ERR_PTR(-EINVAL);
+ if (strnlen(&buf[buf_off], size - buf_off) == size - buf_off)
+ return ERR_PTR(-EINVAL);
+
+ return (const char *)&buf[buf_off];
+}
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
index ff33ac54097a..246e213f1514 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
@@ -38,6 +38,11 @@ static inline bool nfp_nsp_has_hwinfo_lookup(struct nfp_nsp *state)
return nfp_nsp_get_abi_ver_minor(state) > 24;
}
+static inline bool nfp_nsp_has_versions(struct nfp_nsp *state)
+{
+ return nfp_nsp_get_abi_ver_minor(state) > 27;
+}
+
enum nfp_eth_interface {
NFP_INTERFACE_NONE = 0,
NFP_INTERFACE_SFP = 1,
@@ -208,4 +213,19 @@ enum nfp_nsp_sensor_id {
int nfp_hwmon_read_sensor(struct nfp_cpp *cpp, enum nfp_nsp_sensor_id id,
long *val);
+#define NFP_NSP_VERSION_BUFSZ 1024 /* reasonable size, not in the ABI */
+
+enum nfp_nsp_versions {
+ NFP_VERSIONS_BSP,
+ NFP_VERSIONS_CPLD,
+ NFP_VERSIONS_APP,
+ NFP_VERSIONS_BUNDLE,
+ NFP_VERSIONS_UNDI,
+ NFP_VERSIONS_NCSI,
+ NFP_VERSIONS_CFGR,
+};
+
+int nfp_nsp_versions(struct nfp_nsp *state, void *buf, unsigned int size);
+const char *nfp_nsp_versions_get(enum nfp_nsp_versions id, bool flash,
+ const u8 *buf, unsigned int size);
#endif
--
2.19.2
* [PATCH net-next v3 7/8] nfp: devlink: report the running and flashed versions
From: Jakub Kicinski @ 2019-01-30 23:41 UTC (permalink / raw)
To: davem
Cc: netdev, oss-drivers, jiri, andrew, f.fainelli, mkubecek, eugenem,
jonathan.lemon, Jakub Kicinski
In-Reply-To: <20190130234133.4298-1-jakub.kicinski@netronome.com>
Report versions of firmware components using the new NSP command.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
.../net/ethernet/netronome/nfp/nfp_devlink.c | 87 +++++++++++++++++++
1 file changed, 87 insertions(+)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index 75eda34fc1b4..dddbb0575be9 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -206,11 +206,60 @@ nfp_devlink_versions_get_hwinfo(struct nfp_pf *pf, struct devlink_info_req *req)
return 0;
}
+static const struct nfp_devlink_versions {
+ enum nfp_nsp_versions id;
+ const char *key;
+} nfp_devlink_versions_nsp[] = {
+ { NFP_VERSIONS_BUNDLE, "fw.bundle_id", },
+ { NFP_VERSIONS_BSP, DEVLINK_INFO_VERSION_GENERIC_FW_MGMT, },
+ { NFP_VERSIONS_CPLD, "fw.cpld", },
+ { NFP_VERSIONS_APP, DEVLINK_INFO_VERSION_GENERIC_FW_APP, },
+ { NFP_VERSIONS_UNDI, DEVLINK_INFO_VERSION_GENERIC_FW_UNDI, },
+ { NFP_VERSIONS_NCSI, DEVLINK_INFO_VERSION_GENERIC_FW_NCSI, },
+ { NFP_VERSIONS_CFGR, "chip.init", },
+};
+
+static int
+nfp_devlink_versions_get_nsp(struct devlink_info_req *req, bool flash,
+ const u8 *buf, unsigned int size)
+{
+ unsigned int i;
+ int err;
+
+ for (i = 0; i < ARRAY_SIZE(nfp_devlink_versions_nsp); i++) {
+ const struct nfp_devlink_versions *info;
+ const char *version;
+
+ info = &nfp_devlink_versions_nsp[i];
+
+ version = nfp_nsp_versions_get(info->id, flash, buf, size);
+ if (IS_ERR(version)) {
+ if (PTR_ERR(version) == -ENOENT)
+ continue;
+ else
+ return PTR_ERR(version);
+ }
+
+ if (flash)
+ err = devlink_info_version_stored_put(req, info->key,
+ version);
+ else
+ err = devlink_info_version_running_put(req, info->key,
+ version);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
static int
nfp_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
struct netlink_ext_ack *extack)
{
struct nfp_pf *pf = devlink_priv(devlink);
+ struct nfp_nsp *nsp;
+ char *buf = NULL;
const char *sn;
int err;
@@ -225,7 +274,45 @@ nfp_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
return err;
}
+ nsp = nfp_nsp_open(pf->cpp);
+ if (IS_ERR(nsp)) {
+ NL_SET_ERR_MSG_MOD(extack, "can't access NSP");
+ return PTR_ERR(nsp);
+ }
+
+ if (nfp_nsp_has_versions(nsp)) {
+ buf = kzalloc(NFP_NSP_VERSION_BUFSZ, GFP_KERNEL);
+ if (!buf) {
+ err = -ENOMEM;
+ goto err_close_nsp;
+ }
+
+ err = nfp_nsp_versions(nsp, buf, NFP_NSP_VERSION_BUFSZ);
+ if (err)
+ goto err_free_buf;
+
+ err = nfp_devlink_versions_get_nsp(req, false,
+ buf, NFP_NSP_VERSION_BUFSZ);
+ if (err)
+ goto err_free_buf;
+
+ err = nfp_devlink_versions_get_nsp(req, true,
+ buf, NFP_NSP_VERSION_BUFSZ);
+ if (err)
+ goto err_free_buf;
+
+ kfree(buf);
+ }
+
+ nfp_nsp_close(nsp);
+
return nfp_devlink_versions_get_hwinfo(pf, req);
+
+err_free_buf:
+ kfree(buf);
+err_close_nsp:
+ nfp_nsp_close(nsp);
+ return err;
}
const struct devlink_ops nfp_devlink_ops = {
--
2.19.2
* [PATCH net-next v3 8/8] ethtool: add compat for devlink info
From: Jakub Kicinski @ 2019-01-30 23:41 UTC (permalink / raw)
To: davem
Cc: netdev, oss-drivers, jiri, andrew, f.fainelli, mkubecek, eugenem,
jonathan.lemon, Jakub Kicinski
In-Reply-To: <20190130234133.4298-1-jakub.kicinski@netronome.com>
If the driver did not fill the fw_version field, call into the
new devlink info_get op and collect the versions that way.
We assume ethtool has always reported running versions.
v3 (Jiri):
- do a dump and then parse it instead of special handling;
- concatenate all versions (well, all that fit :)).
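As an illustration of the "all that fit" behaviour, here is a small
userspace sketch of the concatenation. bounded_append is a
hypothetical stand-in for the kernel's strlcat(), and join_versions is
likewise made up for this example. The sample values are borrowed from
the nfp string quoted earlier in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Append src to the NUL-terminated string in dst without writing past
 * dst[len - 1]. The result is silently truncated if it does not fit.
 */
static void bounded_append(char *dst, const char *src, size_t len)
{
	const char *end;
	size_t used, room, n;

	assert(dst != NULL && src != NULL);
	end = memchr(dst, '\0', len);
	if (end == NULL)		/* no terminator within len: leave as is */
		return;

	used = (size_t)(end - dst);
	room = len - used - 1;		/* keep one byte for the terminator */
	n = strlen(src);
	if (n > room)
		n = room;
	memcpy(dst + used, src, n);
	dst[used + n] = '\0';
}

/*
 * Join the values into buf, each followed by a single space, the same
 * way the compat path appends running versions. buf must already hold
 * a valid (possibly empty) string.
 */
static void join_versions(char *buf, size_t len,
			  const char *const *vals, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		bounded_append(buf, vals[i], len);
		bounded_append(buf, " ", len);
	}
}
```

With a large enough buffer every value is kept (with a trailing
space). With a short one the string is simply cut at the buffer size,
which is the trade-off accepted for the legacy field.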
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
include/net/devlink.h | 7 +++++
net/core/devlink.c | 63 +++++++++++++++++++++++++++++++++++++++++++
net/core/ethtool.c | 7 +++++
3 files changed, 77 insertions(+)
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 6b417f141fd6..3d9ebd548bac 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -637,6 +637,8 @@ int devlink_info_version_stored_put(struct devlink_info_req *req,
int devlink_info_version_running_put(struct devlink_info_req *req,
const char *version_name,
const char *version_value);
+void devlink_compat_running_version(struct net_device *dev,
+ char *buf, size_t len);
#else
@@ -970,6 +972,11 @@ devlink_info_version_running_put(struct devlink_info_req *req,
{
return 0;
}
+
+static inline void
+devlink_compat_running_version(struct net_device *dev, char *buf, size_t len)
+{
+}
#endif
#endif /* _NET_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index e31b6d617837..eb839d74bcc0 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -5278,6 +5278,69 @@ int devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
}
EXPORT_SYMBOL_GPL(devlink_region_snapshot_create);
+static void __devlink_compat_running_version(struct devlink *devlink,
+ char *buf, size_t len)
+{
+ const struct nlattr *nlattr;
+ struct devlink_info_req req;
+ struct sk_buff *msg;
+ int rem, err;
+
+ if (!devlink->ops->info_get)
+ return;
+
+ msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!msg)
+ return;
+
+ req.msg = msg;
+ err = devlink->ops->info_get(devlink, &req, NULL);
+ if (err)
+ goto free_msg;
+
+ nla_for_each_attr(nlattr, (void *)msg->data, msg->len, rem) {
+ const struct nlattr *kv;
+ int rem_kv;
+
+ if (nla_type(nlattr) != DEVLINK_ATTR_INFO_VERSION_RUNNING)
+ continue;
+
+ nla_for_each_nested(kv, nlattr, rem_kv) {
+ if (nla_type(kv) != DEVLINK_ATTR_INFO_VERSION_VALUE)
+ continue;
+
+ strlcat(buf, nla_data(kv), len);
+ strlcat(buf, " ", len);
+ }
+ }
+free_msg:
+ nlmsg_free(msg);
+}
+
+void devlink_compat_running_version(struct net_device *dev,
+ char *buf, size_t len)
+{
+ struct devlink_port *devlink_port;
+ struct devlink *devlink;
+
+ mutex_lock(&devlink_mutex);
+ list_for_each_entry(devlink, &devlink_list, list) {
+ mutex_lock(&devlink->lock);
+ list_for_each_entry(devlink_port, &devlink->port_list, list) {
+ if (devlink_port->type == DEVLINK_PORT_TYPE_ETH &&
+ devlink_port->type_dev == dev) {
+ __devlink_compat_running_version(devlink,
+ buf, len);
+ mutex_unlock(&devlink->lock);
+ goto out;
+ }
+ }
+ mutex_unlock(&devlink->lock);
+ }
+out:
+ mutex_unlock(&devlink_mutex);
+}
+
static int __init devlink_module_init(void)
{
return genl_register_family(&devlink_nl_family);
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 158264f7cfaf..197a4dfb712d 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -27,6 +27,7 @@
#include <linux/rtnetlink.h>
#include <linux/sched/signal.h>
#include <linux/net.h>
+#include <net/devlink.h>
#include <net/xdp_sock.h>
/*
@@ -803,6 +804,12 @@ static noinline_for_stack int ethtool_get_drvinfo(struct net_device *dev,
if (ops->get_eeprom_len)
info.eedump_len = ops->get_eeprom_len(dev);
+ rtnl_unlock();
+ if (!info.fw_version[0])
+ devlink_compat_running_version(dev, info.fw_version,
+ sizeof(info.fw_version));
+ rtnl_lock();
+
if (copy_to_user(useraddr, &info, sizeof(info)))
return -EFAULT;
return 0;
--
2.19.2
* [PATCH net] enic: fix checksum validation for IPv6
From: Govindarajulu Varadarajan @ 2019-01-30 14:59 UTC (permalink / raw)
To: davem, netdev; +Cc: benve, Govindarajulu Varadarajan
For IPv6 packets, ipv4_csum_ok is 0, so the driver never sets
skb->ip_summed and the IPv6 rx checksum is not offloaded. Skip the
ipv4_csum_ok requirement when the packet is IPv6.
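The change in the condition can be sketched as a pair of standalone
predicates. This is a userspace illustration only: struct
rx_csum_flags and both function names are hypothetical, and the fields
merely mirror the flags tested in the hunk in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-packet flags the driver inspects. */
struct rx_csum_flags {
	bool rxcsum_enabled;	/* NETIF_F_RXCSUM set on the netdev */
	bool csum_not_calc;
	bool tcp_udp_csum_ok;
	bool outer_csum_ok;
	bool ipv4_csum_ok;	/* always false for IPv6 packets */
	bool ipv6;
};

/* Condition before the fix: IPv6 can never pass because of ipv4_csum_ok. */
static bool rx_csum_unnecessary_old(const struct rx_csum_flags *f)
{
	assert(f != NULL);
	return f->rxcsum_enabled && !f->csum_not_calc &&
	       f->tcp_udp_csum_ok && f->ipv4_csum_ok && f->outer_csum_ok;
}

/* Condition after the fix: the IPv4 header check is waived for IPv6. */
static bool rx_csum_unnecessary_new(const struct rx_csum_flags *f)
{
	assert(f != NULL);
	return f->rxcsum_enabled && !f->csum_not_calc &&
	       f->tcp_udp_csum_ok && f->outer_csum_ok &&
	       (f->ipv4_csum_ok || f->ipv6);
}
```

IPv4 packets behave the same under both predicates; only IPv6 packets
with a good TCP/UDP checksum change from "not offloaded" to
"offloaded".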
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
---
drivers/net/ethernet/cisco/enic/enic_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index 60641e202534..9a7f70db20c7 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -1434,7 +1434,8 @@ static void enic_rq_indicate_buf(struct vnic_rq *rq,
* csum is correct or is zero.
*/
if ((netdev->features & NETIF_F_RXCSUM) && !csum_not_calc &&
- tcp_udp_csum_ok && ipv4_csum_ok && outer_csum_ok) {
+ tcp_udp_csum_ok && outer_csum_ok &&
+ (ipv4_csum_ok || ipv6)) {
skb->ip_summed = CHECKSUM_UNNECESSARY;
skb->csum_level = encap;
}
--
2.20.1
* Re: [PATCH] : net : hso : do not call unregister_netdev if not registered
From: Yavuz, Tuba @ 2019-01-30 23:50 UTC (permalink / raw)
To: David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <20190130.135803.1455746285854639679.davem@davemloft.net>
Hi David,
I'll fix the spaces in the subject.
When I checked my patch it only had a p0 patch warning but I am not sure how to fix it.
WARNING: patch prefix 'drivers' exists, appears to be a -p0 patch
total: 0 errors, 1 warnings, 0 checks, 8 lines checked
drivers/net/usb/hso.patch has style problems, please review.
So, please advise.
Best,
Tuba Yavuz, Ph.D.
Assistant Professor
Electrical and Computer Engineering Department
University of Florida
Gainesville, FL 32611
Webpage: http://www.tuba.ece.ufl.edu/
Email: tuba@ece.ufl.edu
Phone: (352) 846 0202
________________________________________
From: David Miller <davem@davemloft.net>
Sent: Wednesday, January 30, 2019 4:58 PM
To: Yavuz, Tuba
Cc: netdev@vger.kernel.org
Subject: Re: [PATCH] : net : hso : do not call unregister_netdev if not registered
From: "Yavuz, Tuba" <tuba@ece.ufl.edu>
Date: Mon, 28 Jan 2019 16:28:38 +0000
> On an error path inside the hso_create_net_device function of the hso
> driver, hso_free_net_device gets called. This causes potentially a
> negative reference count in the net device if register_netdev has not
> been called yet as hso_free_net_device calls unregister_netdev
> regardless. I think the driver should distinguish these cases and call
> unregister_netdev only if register_netdev has been called.
>
> Reported-by: Tuba Yavuz <tuba@ece.ufl.edu>
> Signed-off-by: Tuba Yavuz <tuba@ece.ufl.edu>
This does not apply cleanly to the net tree.
Also, please stop putting those spaces after the subsystem prefixes
in your Subject line. Put the colon character immediately after
each subsystem prefix.
Thank you.
* [PATCH bpf-next v5 0/5] bpf: add BPF_LWT_ENCAP_IP option to bpf_lwt_push_encap
From: Peter Oskolkov @ 2019-01-30 23:51 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, netdev
Cc: Peter Oskolkov, David Ahern, Peter Oskolkov
This patchset implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN
and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
to packets (e.g. IP/GRE, GUE, IPIP).
This is useful when thousands of different short-lived flows need to be
encapped, each with a different, dynamically determined destination.
Although lwtunnels can be used in some of these scenarios, the ability
to dynamically generate encap headers adds more flexibility, e.g.
when routing depends on the state of the host (reflected in global bpf
maps).
V2 changes: Added flowi-based route lookup, IPv6 encapping, and
encapping on ingress.
V3 changes: incorporated David Ahern's suggestions:
- added l3mdev check/oif (patch 2)
- sync bpf.h from include/uapi into tools/include/uapi
- selftest tweaks
V4 changes: moved route lookup/dst change from bpf_push_ip_encap
to when BPF_LWT_REROUTE is handled, as suggested by David Ahern.
V5 changes: added a check in lwt_xmit that skb->protocol stays the
same if the skb is to be passed back to the stack (ret == BPF_OK).
Again, suggested by David Ahern.
Peter Oskolkov (5):
bpf: add plumbing for BPF_LWT_ENCAP_IP in bpf_lwt_push_encap
bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
bpf: add handling of BPF_LWT_REROUTE to lwt_bpf.c
bpf: sync <kdir>/<uapi>/bpf.h with tools/<uapi>/bpf.h
selftests: bpf: add test_lwt_ip_encap selftest
include/net/lwtunnel.h | 3 +
include/uapi/linux/bpf.h | 23 +-
net/core/filter.c | 47 ++-
net/core/lwt_bpf.c | 184 +++++++++++
tools/include/uapi/linux/bpf.h | 23 +-
tools/testing/selftests/bpf/Makefile | 5 +-
.../testing/selftests/bpf/test_lwt_ip_encap.c | 85 +++++
.../selftests/bpf/test_lwt_ip_encap.sh | 311 ++++++++++++++++++
8 files changed, 670 insertions(+), 11 deletions(-)
create mode 100644 tools/testing/selftests/bpf/test_lwt_ip_encap.c
create mode 100755 tools/testing/selftests/bpf/test_lwt_ip_encap.sh
--
2.20.1.495.gaa96b0ce6b-goog
* [PATCH bpf-next v5 1/5] bpf: add plumbing for BPF_LWT_ENCAP_IP in bpf_lwt_push_encap
From: Peter Oskolkov @ 2019-01-30 23:51 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, netdev
Cc: Peter Oskolkov, David Ahern, Peter Oskolkov
In-Reply-To: <20190130235136.136527-1-posk@google.com>
This patch adds all needed plumbing in preparation for allowing
bpf programs to do IP encapping via bpf_lwt_push_encap. The actual
implementation is added in the next patch in the patchset.
Of note:
- bpf_lwt_push_encap can now be called from BPF_PROG_TYPE_LWT_XMIT
prog types in addition to BPF_PROG_TYPE_LWT_IN;
- as route lookups are different for ingress vs egress, the single
external bpf_lwt_push_encap BPF helper is routed internally to
either bpf_lwt_in_push_encap or bpf_lwt_xmit_push_encap BPF_CALLs,
depending on prog type.
Signed-off-by: Peter Oskolkov <posk@google.com>
---
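Illustrative sketch, not part of the patch: the routing of one external
helper ID to two internal implementations can be modelled as a lookup
keyed on program type. All identifiers below are hypothetical
miniatures; the real code uses struct bpf_func_proto and guards the
cases with IS_ENABLED() checks, which this sketch leaves out. The
positive return values are arbitrary markers showing which path ran.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical miniatures of the prog types and encap modes. */
enum prog_type { PROG_LWT_IN, PROG_LWT_XMIT };
enum encap_mode { ENCAP_SEG6, ENCAP_SEG6_INLINE, ENCAP_IP };

typedef int (*push_encap_fn)(enum encap_mode mode);

/* Arbitrary markers so a caller can see which path was taken. */
enum { PATH_SEG6 = 1, PATH_IP_INGRESS = 2, PATH_IP_EGRESS = 3 };

/* Ingress variant: accepts the SEG6 modes and the IP mode. */
static int in_push_encap(enum encap_mode mode)
{
	switch (mode) {
	case ENCAP_SEG6:
	case ENCAP_SEG6_INLINE:
		return PATH_SEG6;
	case ENCAP_IP:
		return PATH_IP_INGRESS;	/* ingress = true in the patch */
	default:
		return -EINVAL;
	}
}

/* Egress variant: accepts only the IP mode. */
static int xmit_push_encap(enum encap_mode mode)
{
	switch (mode) {
	case ENCAP_IP:
		return PATH_IP_EGRESS;	/* ingress = false in the patch */
	default:
		return -EINVAL;
	}
}

/* One helper ID, resolved per program type at load time. */
static push_encap_fn resolve_push_encap(enum prog_type type)
{
	switch (type) {
	case PROG_LWT_IN:
		return in_push_encap;
	case PROG_LWT_XMIT:
		return xmit_push_encap;
	}
	return NULL;
}

/* SEG6 modes remain ingress-only; the IP mode works on both paths. */
static void check_dispatch(void)
{
	assert(resolve_push_encap(PROG_LWT_IN)(ENCAP_SEG6) == PATH_SEG6);
	assert(resolve_push_encap(PROG_LWT_XMIT)(ENCAP_SEG6) == -EINVAL);
}
```

Resolving the variant once, when the program is loaded, means the
helper itself needs no run-time check of the program type.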
include/uapi/linux/bpf.h | 23 ++++++++++++++++++--
net/core/filter.c | 46 +++++++++++++++++++++++++++++++++++-----
2 files changed, 62 insertions(+), 7 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 60b99b730a41..911c15585fab 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2015,6 +2015,16 @@ union bpf_attr {
* Only works if *skb* contains an IPv6 packet. Insert a
* Segment Routing Header (**struct ipv6_sr_hdr**) inside
* the IPv6 header.
+ * **BPF_LWT_ENCAP_IP**
+ * IP encapsulation (GRE/GUE/IPIP/etc). The outer header
+ * must be IPv4 or IPv6, followed by zero or more
+ * additional headers, up to LWT_BPF_MAX_HEADROOM total
+ * bytes in all prepended headers.
+ *
+ * BPF_LWT_ENCAP_SEG6*** types can be called by bpf programs of
+ * type BPF_PROG_TYPE_LWT_IN; BPF_LWT_ENCAP_IP type can be called
+ * by bpf programs of types BPF_PROG_TYPE_LWT_IN and
+ * BPF_PROG_TYPE_LWT_XMIT.
*
* A call to this helper is susceptible to change the underlaying
* packet buffer. Therefore, at load time, all checks on pointers
@@ -2495,7 +2505,8 @@ enum bpf_hdr_start_off {
/* Encapsulation type for BPF_FUNC_lwt_push_encap helper. */
enum bpf_lwt_encap_mode {
BPF_LWT_ENCAP_SEG6,
- BPF_LWT_ENCAP_SEG6_INLINE
+ BPF_LWT_ENCAP_SEG6_INLINE,
+ BPF_LWT_ENCAP_IP,
};
#define __bpf_md_ptr(type, name) \
@@ -2583,7 +2594,15 @@ enum bpf_ret_code {
BPF_DROP = 2,
/* 3-6 reserved */
BPF_REDIRECT = 7,
- /* >127 are reserved for prog type specific return codes */
+ /* >127 are reserved for prog type specific return codes.
+ *
+ * BPF_LWT_REROUTE: used by BPF_PROG_TYPE_LWT_IN and
+ * BPF_PROG_TYPE_LWT_XMIT to indicate that skb had been
+ * changed and should be routed based on its new L3 header.
+ * (This is an L3 redirect, as opposed to L2 redirect
+ * represented by BPF_REDIRECT above).
+ */
+ BPF_LWT_REROUTE = 128,
};
struct bpf_sock {
diff --git a/net/core/filter.c b/net/core/filter.c
index 41984ad4b9b4..27d3fbe4b77b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4801,7 +4801,13 @@ static int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len
}
#endif /* CONFIG_IPV6_SEG6_BPF */
-BPF_CALL_4(bpf_lwt_push_encap, struct sk_buff *, skb, u32, type, void *, hdr,
+static int bpf_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len,
+ bool ingress)
+{
+ return -EINVAL; /* Implemented in the next patch. */
+}
+
+BPF_CALL_4(bpf_lwt_in_push_encap, struct sk_buff *, skb, u32, type, void *, hdr,
u32, len)
{
switch (type) {
@@ -4809,14 +4815,41 @@ BPF_CALL_4(bpf_lwt_push_encap, struct sk_buff *, skb, u32, type, void *, hdr,
case BPF_LWT_ENCAP_SEG6:
case BPF_LWT_ENCAP_SEG6_INLINE:
return bpf_push_seg6_encap(skb, type, hdr, len);
+#endif
+#if IS_ENABLED(CONFIG_LWTUNNEL_BPF)
+ case BPF_LWT_ENCAP_IP:
+ return bpf_push_ip_encap(skb, hdr, len, true /* ingress */);
#endif
default:
return -EINVAL;
}
}
-static const struct bpf_func_proto bpf_lwt_push_encap_proto = {
- .func = bpf_lwt_push_encap,
+BPF_CALL_4(bpf_lwt_xmit_push_encap, struct sk_buff *, skb, u32, type,
+ void *, hdr, u32, len)
+{
+ switch (type) {
+#if IS_ENABLED(CONFIG_LWTUNNEL_BPF)
+ case BPF_LWT_ENCAP_IP:
+ return bpf_push_ip_encap(skb, hdr, len, false /* egress */);
+#endif
+ default:
+ return -EINVAL;
+ }
+}
+
+static const struct bpf_func_proto bpf_lwt_in_push_encap_proto = {
+ .func = bpf_lwt_in_push_encap,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_ANYTHING,
+ .arg3_type = ARG_PTR_TO_MEM,
+ .arg4_type = ARG_CONST_SIZE
+};
+
+static const struct bpf_func_proto bpf_lwt_xmit_push_encap_proto = {
+ .func = bpf_lwt_xmit_push_encap,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_CTX,
@@ -5282,7 +5315,8 @@ bool bpf_helper_changes_pkt_data(void *func)
func == bpf_lwt_seg6_adjust_srh ||
func == bpf_lwt_seg6_action ||
#endif
- func == bpf_lwt_push_encap)
+ func == bpf_lwt_in_push_encap ||
+ func == bpf_lwt_xmit_push_encap)
return true;
return false;
@@ -5660,7 +5694,7 @@ lwt_in_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
switch (func_id) {
case BPF_FUNC_lwt_push_encap:
- return &bpf_lwt_push_encap_proto;
+ return &bpf_lwt_in_push_encap_proto;
default:
return lwt_out_func_proto(func_id, prog);
}
@@ -5696,6 +5730,8 @@ lwt_xmit_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_l4_csum_replace_proto;
case BPF_FUNC_set_hash_invalid:
return &bpf_set_hash_invalid_proto;
+ case BPF_FUNC_lwt_push_encap:
+ return &bpf_lwt_xmit_push_encap_proto;
default:
return lwt_out_func_proto(func_id, prog);
}
--
2.20.1.495.gaa96b0ce6b-goog
* [PATCH bpf-next v5 2/5] bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
From: Peter Oskolkov @ 2019-01-30 23:51 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, netdev
Cc: Peter Oskolkov, David Ahern, Peter Oskolkov
In-Reply-To: <20190130235136.136527-1-posk@google.com>
This patch implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN
and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
to packets (e.g. IP/GRE, GUE, IPIP).
This is useful when thousands of different short-lived flows need to be
encapped, each with a different, dynamically determined destination.
Although lwtunnels can be used in some of these scenarios, the ability
to dynamically generate encap headers adds more flexibility, e.g.
when routing depends on the state of the host (reflected in global bpf
maps).
Signed-off-by: Peter Oskolkov <posk@google.com>
---
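Illustrative sketch, not part of the patch: the version and length
validation done at the top of bpf_lwt_push_ip_encap() can be tried out
in userspace on a raw byte buffer. The function name and the constants
are hypothetical; in particular MAX_ENCAP_LEN only stands in for
LWT_BPF_MAX_HEADROOM, whose value is assumed here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENCAP_LEN	256u	/* assumption: stand-in for LWT_BPF_MAX_HEADROOM */
#define IPV4_MIN_HDR	20u	/* size of an IPv4 header without options */
#define IPV6_HDR	40u	/* size of the fixed IPv6 header */

/*
 * Mirror of the checks in the diff below: returns 4 or 6 for an
 * acceptable outer header, -EINVAL otherwise.
 */
static int check_encap_hdr(const uint8_t *hdr, size_t len)
{
	unsigned int version;

	if (len < IPV4_MIN_HDR || len > MAX_ENCAP_LEN)
		return -EINVAL;

	/* Both IPv4 and IPv6 keep the version in the top nibble of byte 0. */
	version = hdr[0] >> 4;

	if (version == 4) {
		/* IHL is the low nibble, counted in 32-bit words. */
		size_t ihl_bytes = (size_t)(hdr[0] & 0x0f) * 4;

		return len < ihl_bytes ? -EINVAL : 4;
	}
	if (version == 6)
		return len < IPV6_HDR ? -EINVAL : 6;

	return -EINVAL;
}

/* An IPv4 header followed by a 4-byte GRE header passes the checks. */
static void check_ipv4_gre(void)
{
	uint8_t hdr[24] = { 0x45 };	/* version 4, IHL 5 */

	assert(check_encap_hdr(hdr, sizeof(hdr)) == 4);
}
```

The first length test uses the IPv4 minimum for both families, which is
why the IPv6 branch needs its own check against the 40-byte header.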
include/net/lwtunnel.h | 3 +++
net/core/filter.c | 3 ++-
net/core/lwt_bpf.c | 59 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 64 insertions(+), 1 deletion(-)
diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index 33fd9ba7e0e5..f0973eca8036 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -126,6 +126,8 @@ int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b);
int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb);
int lwtunnel_input(struct sk_buff *skb);
int lwtunnel_xmit(struct sk_buff *skb);
+int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len,
+ bool ingress);
static inline void lwtunnel_set_redirect(struct dst_entry *dst)
{
@@ -138,6 +140,7 @@ static inline void lwtunnel_set_redirect(struct dst_entry *dst)
dst->input = lwtunnel_input;
}
}
+
#else
static inline void lwtstate_free(struct lwtunnel_state *lws)
diff --git a/net/core/filter.c b/net/core/filter.c
index 27d3fbe4b77b..de6bd4b4e0a3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -73,6 +73,7 @@
#include <linux/seg6_local.h>
#include <net/seg6.h>
#include <net/seg6_local.h>
+#include <net/lwtunnel.h>
/**
* sk_filter_trim_cap - run a packet through a socket filter
@@ -4804,7 +4805,7 @@ static int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len
static int bpf_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len,
bool ingress)
{
- return -EINVAL; /* Implemented in the next patch. */
+ return bpf_lwt_push_ip_encap(skb, hdr, len, ingress);
}
BPF_CALL_4(bpf_lwt_in_push_encap, struct sk_buff *, skb, u32, type, void *, hdr,
diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index a648568c5e8f..6a6e9acab73d 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -390,6 +390,65 @@ static const struct lwtunnel_encap_ops bpf_encap_ops = {
.owner = THIS_MODULE,
};
+int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
+{
+ struct iphdr *iph;
+ bool ipv4;
+ int err;
+
+ if (unlikely(len < sizeof(struct iphdr) || len > LWT_BPF_MAX_HEADROOM))
+ return -EINVAL;
+
+ /* validate protocol and length */
+ iph = (struct iphdr *)hdr;
+ if (iph->version == 4) {
+ ipv4 = true;
+ if (unlikely(len < iph->ihl * 4))
+ return -EINVAL;
+ } else if (iph->version == 6) {
+ ipv4 = false;
+ if (unlikely(len < sizeof(struct ipv6hdr)))
+ return -EINVAL;
+ } else {
+ return -EINVAL;
+ }
+
+ if (ingress)
+ err = skb_cow_head(skb, len + skb->mac_len);
+ else
+ err = skb_cow_head(skb,
+ len + LL_RESERVED_SPACE(skb_dst(skb)->dev));
+ if (unlikely(err))
+ return err;
+
+ /* push the encap headers and fix pointers */
+ skb_reset_inner_headers(skb);
+ skb->encapsulation = 1;
+ skb_push(skb, len);
+ if (ingress)
+ skb_postpush_rcsum(skb, iph, len);
+ skb_reset_network_header(skb);
+ memcpy(skb_network_header(skb), hdr, len);
+ bpf_compute_data_pointers(skb);
+
+ if (ipv4) {
+ skb->protocol = htons(ETH_P_IP);
+ iph = ip_hdr(skb);
+ if (iph->ihl * 4 < len)
+ skb_set_transport_header(skb, iph->ihl * 4);
+
+ if (!iph->check)
+ iph->check = ip_fast_csum((unsigned char *)iph,
+ iph->ihl);
+ } else {
+ skb->protocol = htons(ETH_P_IPV6);
+ if (sizeof(struct ipv6hdr) < len)
+ skb_set_transport_header(skb, sizeof(struct ipv6hdr));
+ }
+
+ return 0;
+}
+
static int __init bpf_lwt_init(void)
{
return lwtunnel_encap_add_ops(&bpf_encap_ops, LWTUNNEL_ENCAP_BPF);
--
2.20.1.495.gaa96b0ce6b-goog
* [PATCH bpf-next v5 3/5] bpf: add handling of BPF_LWT_REROUTE to lwt_bpf.c
From: Peter Oskolkov @ 2019-01-30 23:51 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, netdev
Cc: Peter Oskolkov, David Ahern, Peter Oskolkov
In-Reply-To: <20190130235136.136527-1-posk@google.com>
This patch builds on top of the previous patch in the patchset,
which added BPF_LWT_ENCAP_IP mode to bpf_lwt_push_encap. As the
encapping can result in the skb needing to go via a different
interface/route/dst, bpf programs can indicate this by returning
BPF_LWT_REROUTE, which triggers a new route lookup for the skb.
Signed-off-by: Peter Oskolkov <posk@google.com>
---
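Illustrative sketch, not part of the patch: how the xmit path
classifies the program's return code, including the check that
skb->protocol is unchanged when the skb goes back to the stack. The
enum names are hypothetical; the numeric codes follow enum bpf_ret_code.
The real bpf_xmit() passes other return values through unchanged, which
this sketch collapses into a single error action.

```c
#include <assert.h>
#include <stdint.h>

/* Values as in enum bpf_ret_code; the names here are shortened. */
enum { RET_OK = 0, RET_DROP = 2, RET_REDIRECT = 7, RET_LWT_REROUTE = 128 };

/* Hypothetical summary of what the xmit path does next. */
enum xmit_action { XMIT_CONTINUE, XMIT_DONE, XMIT_REROUTE, XMIT_ERROR };

static enum xmit_action classify_xmit(int ret, uint16_t proto_before,
				      uint16_t proto_after)
{
	switch (ret) {
	case RET_OK:
		/* A changed L3 protocol must be signalled via reroute. */
		if (proto_after != proto_before)
			return XMIT_ERROR;
		return XMIT_CONTINUE;
	case RET_REDIRECT:
		return XMIT_DONE;
	case RET_LWT_REROUTE:
		return XMIT_REROUTE;	/* new route lookup on the new L3 header */
	default:
		return XMIT_ERROR;
	}
}

/* Encapping IPv6 in IPv4 and returning BPF_OK is rejected. */
static void check_proto_change(void)
{
	const uint16_t ip = 0x0800, ipv6 = 0x86dd;	/* ETH_P_IP, ETH_P_IPV6 */

	assert(classify_xmit(RET_OK, ipv6, ip) == XMIT_ERROR);
	assert(classify_xmit(RET_LWT_REROUTE, ipv6, ip) == XMIT_REROUTE);
}
```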
net/core/lwt_bpf.c | 125 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 125 insertions(+)
diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index 6a6e9acab73d..20581567f84a 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -16,6 +16,7 @@
#include <linux/types.h>
#include <linux/bpf.h>
#include <net/lwtunnel.h>
+#include <net/ip6_route.h>
struct bpf_lwt_prog {
struct bpf_prog *prog;
@@ -55,6 +56,7 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt,
switch (ret) {
case BPF_OK:
+ case BPF_LWT_REROUTE:
break;
case BPF_REDIRECT:
@@ -87,6 +89,32 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt,
return ret;
}
+static int bpf_lwt_input_reroute(struct sk_buff *skb)
+{
+ int err = -EINVAL;
+
+ if (skb->protocol == htons(ETH_P_IP)) {
+ struct iphdr *iph = ip_hdr(skb);
+
+ err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
+ iph->tos, skb_dst(skb)->dev);
+ } else if (skb->protocol == htons(ETH_P_IPV6)) {
+ ip6_route_input(skb);
+ err = skb_dst(skb)->error;
+ } else {
+ pr_warn_once("BPF_LWT_REROUTE input: unsupported proto %d\n",
+ skb->protocol);
+ }
+
+ if (err)
+ goto err;
+ return dst_input(skb);
+
+err:
+ kfree_skb(skb);
+ return err;
+}
+
static int bpf_input(struct sk_buff *skb)
{
struct dst_entry *dst = skb_dst(skb);
@@ -98,6 +126,8 @@ static int bpf_input(struct sk_buff *skb)
ret = run_lwt_bpf(skb, &bpf->in, dst, NO_REDIRECT);
if (ret < 0)
return ret;
+ if (ret == BPF_LWT_REROUTE)
+ return bpf_lwt_input_reroute(skb);
}
if (unlikely(!dst->lwtstate->orig_input)) {
@@ -147,6 +177,90 @@ static int xmit_check_hhlen(struct sk_buff *skb)
return 0;
}
+static int bpf_lwt_xmit_reroute(struct sk_buff *skb)
+{
+ struct net_device *l3mdev = l3mdev_master_dev_rcu(skb_dst(skb)->dev);
+ int oif = l3mdev ? l3mdev->ifindex : 0;
+ struct dst_entry *dst = NULL;
+ struct sock *sk;
+ struct net *net;
+ bool ipv4;
+ int err;
+
+ if (skb->protocol == htons(ETH_P_IP)) {
+ ipv4 = true;
+ } else if (skb->protocol == htons(ETH_P_IPV6)) {
+ ipv4 = false;
+ } else {
+ pr_warn_once("BPF_LWT_REROUTE xmit: unsupported proto %d\n",
+ skb->protocol);
+ return -EINVAL;
+ }
+
+ sk = sk_to_full_sk(skb->sk);
+ if (sk) {
+ if (sk->sk_bound_dev_if)
+ oif = sk->sk_bound_dev_if;
+ net = sock_net(sk);
+ } else {
+ net = dev_net(skb_dst(skb)->dev);
+ }
+
+ if (ipv4) {
+ struct iphdr *iph = ip_hdr(skb);
+ struct flowi4 fl4 = {0};
+ struct rtable *rt;
+
+ fl4.flowi4_oif = oif;
+ fl4.flowi4_mark = skb->mark;
+ fl4.flowi4_uid = sock_net_uid(net, sk);
+ fl4.flowi4_tos = RT_TOS(iph->tos);
+ fl4.flowi4_flags = FLOWI_FLAG_ANYSRC;
+ fl4.flowi4_proto = iph->protocol;
+ fl4.daddr = iph->daddr;
+ fl4.saddr = iph->saddr;
+
+ rt = ip_route_output_key(net, &fl4);
+ if (IS_ERR(rt) || rt->dst.error)
+ return -EINVAL;
+ dst = &rt->dst;
+ } else {
+ struct ipv6hdr *iph6 = ipv6_hdr(skb);
+ struct flowi6 fl6 = {0};
+
+ fl6.flowi6_oif = oif;
+ fl6.flowi6_mark = skb->mark;
+ fl6.flowi6_uid = sock_net_uid(net, sk);
+ fl6.flowlabel = ip6_flowinfo(iph6);
+ fl6.flowi6_proto = iph6->nexthdr;
+ fl6.daddr = iph6->daddr;
+ fl6.saddr = iph6->saddr;
+
+ dst = ip6_route_output(net, skb->sk, &fl6);
+ if (IS_ERR(dst) || dst->error)
+ return -EINVAL;
+ }
+
+ /* Although skb header was reserved in bpf_lwt_push_ip_encap(), it
+ * was done for the previous dst, so we are doing it here again, in
+ * case the new dst needs much more space. The call below is a noop
+ * if there is enough header space in skb.
+ */
+ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+ if (unlikely(err))
+ return err;
+
+ skb_dst_drop(skb);
+ skb_dst_set(skb, dst);
+
+ err = dst_output(dev_net(skb_dst(skb)->dev), skb->sk, skb);
+ if (unlikely(err))
+ return err;
+
+ /* ip[6]_finish_output2 understands LWTUNNEL_XMIT_DONE */
+ return LWTUNNEL_XMIT_DONE;
+}
+
static int bpf_xmit(struct sk_buff *skb)
{
struct dst_entry *dst = skb_dst(skb);
@@ -154,11 +268,20 @@ static int bpf_xmit(struct sk_buff *skb)
bpf = bpf_lwt_lwtunnel(dst->lwtstate);
if (bpf->xmit.prog) {
+ __be16 proto = skb->protocol;
int ret;
ret = run_lwt_bpf(skb, &bpf->xmit, dst, CAN_REDIRECT);
switch (ret) {
case BPF_OK:
+ /* If the header changed, e.g. via bpf_lwt_push_encap,
+ * BPF_LWT_REROUTE below should have been used if the
+ * protocol was also changed.
+ */
+ if (skb->protocol != proto) {
+ kfree_skb(skb);
+ return -EINVAL;
+ }
/* If the header was expanded, headroom might be too
* small for L2 header to come, expand as needed.
*/
@@ -169,6 +292,8 @@ static int bpf_xmit(struct sk_buff *skb)
return LWTUNNEL_XMIT_CONTINUE;
case BPF_REDIRECT:
return LWTUNNEL_XMIT_DONE;
+ case BPF_LWT_REROUTE:
+ return bpf_lwt_xmit_reroute(skb);
default:
return ret;
}
--
2.20.1.495.gaa96b0ce6b-goog
* [PATCH bpf-next v5 4/5] bpf: sync <kdir>/<uapi>/bpf.h with tools/<uapi>/bpf.h
From: Peter Oskolkov @ 2019-01-30 23:51 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, netdev
Cc: Peter Oskolkov, David Ahern, Peter Oskolkov
In-Reply-To: <20190130235136.136527-1-posk@google.com>
This patch copies changes in bpf.h done by a previous patch
in this patchset from the kernel uapi include dir into tools
uapi include dir.
Signed-off-by: Peter Oskolkov <posk@google.com>
---
tools/include/uapi/linux/bpf.h | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 60b99b730a41..911c15585fab 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2015,6 +2015,16 @@ union bpf_attr {
* Only works if *skb* contains an IPv6 packet. Insert a
* Segment Routing Header (**struct ipv6_sr_hdr**) inside
* the IPv6 header.
+ * **BPF_LWT_ENCAP_IP**
+ * IP encapsulation (GRE/GUE/IPIP/etc). The outer header
+ * must be IPv4 or IPv6, followed by zero or more
+ * additional headers, up to LWT_BPF_MAX_HEADROOM total
+ * bytes in all prepended headers.
+ *
+ * BPF_LWT_ENCAP_SEG6*** types can be called by bpf programs of
+ * type BPF_PROG_TYPE_LWT_IN; BPF_LWT_ENCAP_IP type can be called
+ * by bpf programs of types BPF_PROG_TYPE_LWT_IN and
+ * BPF_PROG_TYPE_LWT_XMIT.
*
* A call to this helper is susceptible to change the underlaying
* packet buffer. Therefore, at load time, all checks on pointers
@@ -2495,7 +2505,8 @@ enum bpf_hdr_start_off {
/* Encapsulation type for BPF_FUNC_lwt_push_encap helper. */
enum bpf_lwt_encap_mode {
BPF_LWT_ENCAP_SEG6,
- BPF_LWT_ENCAP_SEG6_INLINE
+ BPF_LWT_ENCAP_SEG6_INLINE,
+ BPF_LWT_ENCAP_IP,
};
#define __bpf_md_ptr(type, name) \
@@ -2583,7 +2594,15 @@ enum bpf_ret_code {
BPF_DROP = 2,
/* 3-6 reserved */
BPF_REDIRECT = 7,
- /* >127 are reserved for prog type specific return codes */
+ /* >127 are reserved for prog type specific return codes.
+ *
+ * BPF_LWT_REROUTE: used by BPF_PROG_TYPE_LWT_IN and
+ * BPF_PROG_TYPE_LWT_XMIT to indicate that skb had been
+ * changed and should be routed based on its new L3 header.
+ * (This is an L3 redirect, as opposed to L2 redirect
+ * represented by BPF_REDIRECT above).
+ */
+ BPF_LWT_REROUTE = 128,
};
struct bpf_sock {
--
2.20.1.495.gaa96b0ce6b-goog
* [PATCH bpf-next v5 5/5] selftests: bpf: add test_lwt_ip_encap selftest
From: Peter Oskolkov @ 2019-01-30 23:51 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, netdev
Cc: Peter Oskolkov, David Ahern, Peter Oskolkov
In-Reply-To: <20190130235136.136527-1-posk@google.com>
This patch adds a bpf self-test to cover BPF_LWT_ENCAP_IP mode
in bpf_lwt_push_encap.
Covered:
- encapping in LWT_IN and LWT_XMIT
- IPv4 and IPv6
Signed-off-by: Peter Oskolkov <posk@google.com>
Change-Id: I9d0d1003a40c28a41467116f3c32a84730ff39b2
---
tools/testing/selftests/bpf/Makefile | 5 +-
.../testing/selftests/bpf/test_lwt_ip_encap.c | 85 +++++
.../selftests/bpf/test_lwt_ip_encap.sh | 311 ++++++++++++++++++
3 files changed, 399 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/bpf/test_lwt_ip_encap.c
create mode 100755 tools/testing/selftests/bpf/test_lwt_ip_encap.sh
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 8993e9c8f410..28aa3b3e297e 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -35,7 +35,7 @@ BPF_OBJ_FILES = \
sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
test_skb_cgroup_id_kern.o bpf_flow.o netcnt_prog.o test_xdp_vlan.o \
- xdp_dummy.o test_map_in_map.o
+ xdp_dummy.o test_map_in_map.o test_lwt_ip_encap.o
# Objects are built with default compilation flags and with sub-register
# code-gen enabled.
@@ -73,7 +73,8 @@ TEST_PROGS := test_kmod.sh \
test_lirc_mode2.sh \
test_skb_cgroup_id.sh \
test_flow_dissector.sh \
- test_xdp_vlan.sh
+ test_xdp_vlan.sh \
+ test_lwt_ip_encap.sh
TEST_PROGS_EXTENDED := with_addr.sh \
with_tunnels.sh \
diff --git a/tools/testing/selftests/bpf/test_lwt_ip_encap.c b/tools/testing/selftests/bpf/test_lwt_ip_encap.c
new file mode 100644
index 000000000000..c957d6dfe6d7
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_lwt_ip_encap.c
@@ -0,0 +1,85 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stddef.h>
+#include <string.h>
+#include <linux/bpf.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+struct grehdr {
+ __be16 flags;
+ __be16 protocol;
+};
+
+SEC("encap_gre")
+int bpf_lwt_encap_gre(struct __sk_buff *skb)
+{
+ struct encap_hdr {
+ struct iphdr iph;
+ struct grehdr greh;
+ } hdr;
+ int err;
+
+ memset(&hdr, 0, sizeof(struct encap_hdr));
+
+ hdr.iph.ihl = 5;
+ hdr.iph.version = 4;
+ hdr.iph.ttl = 0x40;
+ hdr.iph.protocol = 47; /* IPPROTO_GRE */
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+ hdr.iph.saddr = 0x640110ac; /* 172.16.1.100 */
+ hdr.iph.daddr = 0x641010ac; /* 172.16.16.100 */
+#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+ hdr.iph.saddr = 0xac100164; /* 172.16.1.100 */
+ hdr.iph.daddr = 0xac101064; /* 172.16.16.100 */
+#else
+#error "Fix your compiler's __BYTE_ORDER__?!"
+#endif
+ hdr.iph.tot_len = bpf_htons(skb->len + sizeof(struct encap_hdr));
+
+ hdr.greh.protocol = skb->protocol;
+
+ err = bpf_lwt_push_encap(skb, BPF_LWT_ENCAP_IP, &hdr,
+ sizeof(struct encap_hdr));
+ if (err)
+ return BPF_DROP;
+
+ return BPF_LWT_REROUTE;
+}
+
+SEC("encap_gre6")
+int bpf_lwt_encap_gre6(struct __sk_buff *skb)
+{
+ struct encap_hdr {
+ struct ipv6hdr ip6hdr;
+ struct grehdr greh;
+ } hdr;
+ int err;
+
+ memset(&hdr, 0, sizeof(struct encap_hdr));
+
+ hdr.ip6hdr.version = 6;
+ hdr.ip6hdr.payload_len = bpf_htons(skb->len + sizeof(struct grehdr));
+ hdr.ip6hdr.nexthdr = 47; /* IPPROTO_GRE */
+ hdr.ip6hdr.hop_limit = 0x40;
+ /* fb01::1 */
+ hdr.ip6hdr.saddr.s6_addr[0] = 0xfb;
+ hdr.ip6hdr.saddr.s6_addr[1] = 1;
+ hdr.ip6hdr.saddr.s6_addr[15] = 1;
+ /* fb10::1 */
+ hdr.ip6hdr.daddr.s6_addr[0] = 0xfb;
+ hdr.ip6hdr.daddr.s6_addr[1] = 0x10;
+ hdr.ip6hdr.daddr.s6_addr[15] = 1;
+
+ hdr.greh.protocol = skb->protocol;
+
+ err = bpf_lwt_push_encap(skb, BPF_LWT_ENCAP_IP, &hdr,
+ sizeof(struct encap_hdr));
+ if (err)
+ return BPF_DROP;
+
+ return BPF_LWT_REROUTE;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_lwt_ip_encap.sh b/tools/testing/selftests/bpf/test_lwt_ip_encap.sh
new file mode 100755
index 000000000000..4ca714e23ab0
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_lwt_ip_encap.sh
@@ -0,0 +1,311 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Setup/topology:
+#
+# NS1 NS2 NS3
+# veth1 <---> veth2 veth3 <---> veth4 (the top route)
+# veth5 <---> veth6 veth7 <---> veth8 (the bottom route)
+#
+# each vethN gets IPv[4|6]_N address
+#
+# IPv*_SRC = IPv*_1
+# IPv*_DST = IPv*_4
+#
+# all tests test pings from IPv*_SRC to IPv*_DST
+#
+# by default, routes are configured to allow packets to go
+# IP*_1 <=> IP*_2 <=> IP*_3 <=> IP*_4 (the top route)
+#
+# a GRE device is installed in NS3 with IPv*_GRE, and
+# NS1/NS2 are configured to route packets to IPv*_GRE via IP*_8
+# (the bottom route)
+#
+# Tests:
+#
+# 1. routes NS2->IPv*_DST are brought down, so the only way a ping
+# from IP*_SRC to IP*_DST can work is via IPv*_GRE
+#
+# 2a. in an egress test, a bpf LWT_XMIT program is installed on veth1
+# that encaps the packets with an IP/GRE header to route to IPv*_GRE
+#
+# ping: SRC->[encap at veth1:egress]->GRE:decap->DST
+# ping replies go DST->SRC directly
+#
+# 2b. in an ingress test, a bpf LWT_IN program is installed on veth2
+# that encaps the packets with an IP/GRE header to route to IPv*_GRE
+#
+# ping: SRC->[encap at veth2:ingress]->GRE:decap->DST
+# ping replies go DST->SRC directly
+
+set -e # exit on error
+
+if [[ $EUID -ne 0 ]]; then
+ echo "This script must be run as root"
+ echo "FAIL"
+ exit 1
+fi
+
+readonly NS1="ns1-$(mktemp -u XXXXXX)"
+readonly NS2="ns2-$(mktemp -u XXXXXX)"
+readonly NS3="ns3-$(mktemp -u XXXXXX)"
+
+readonly IPv4_1="172.16.1.100"
+readonly IPv4_2="172.16.2.100"
+readonly IPv4_3="172.16.3.100"
+readonly IPv4_4="172.16.4.100"
+readonly IPv4_5="172.16.5.100"
+readonly IPv4_6="172.16.6.100"
+readonly IPv4_7="172.16.7.100"
+readonly IPv4_8="172.16.8.100"
+readonly IPv4_GRE="172.16.16.100"
+
+readonly IPv4_SRC=$IPv4_1
+readonly IPv4_DST=$IPv4_4
+
+readonly IPv6_1="fb01::1"
+readonly IPv6_2="fb02::1"
+readonly IPv6_3="fb03::1"
+readonly IPv6_4="fb04::1"
+readonly IPv6_5="fb05::1"
+readonly IPv6_6="fb06::1"
+readonly IPv6_7="fb07::1"
+readonly IPv6_8="fb08::1"
+readonly IPv6_GRE="fb10::1"
+
+readonly IPv6_SRC=$IPv6_1
+readonly IPv6_DST=$IPv6_4
+
+setup() {
+set -e # exit on error
+ # create devices and namespaces
+ ip netns add "${NS1}"
+ ip netns add "${NS2}"
+ ip netns add "${NS3}"
+
+ ip link add veth1 type veth peer name veth2
+ ip link add veth3 type veth peer name veth4
+ ip link add veth5 type veth peer name veth6
+ ip link add veth7 type veth peer name veth8
+
+ ip netns exec ${NS2} sysctl -wq net.ipv4.ip_forward=1
+ ip netns exec ${NS2} sysctl -wq net.ipv6.conf.all.forwarding=1
+
+ ip link set veth1 netns ${NS1}
+ ip link set veth2 netns ${NS2}
+ ip link set veth3 netns ${NS2}
+ ip link set veth4 netns ${NS3}
+ ip link set veth5 netns ${NS1}
+ ip link set veth6 netns ${NS2}
+ ip link set veth7 netns ${NS2}
+ ip link set veth8 netns ${NS3}
+
+ # configure addresses: the top route (1-2-3-4)
+ ip -netns ${NS1} addr add ${IPv4_1}/24 dev veth1
+ ip -netns ${NS2} addr add ${IPv4_2}/24 dev veth2
+ ip -netns ${NS2} addr add ${IPv4_3}/24 dev veth3
+ ip -netns ${NS3} addr add ${IPv4_4}/24 dev veth4
+ ip -netns ${NS1} -6 addr add ${IPv6_1}/128 nodad dev veth1
+ ip -netns ${NS2} -6 addr add ${IPv6_2}/128 nodad dev veth2
+ ip -netns ${NS2} -6 addr add ${IPv6_3}/128 nodad dev veth3
+ ip -netns ${NS3} -6 addr add ${IPv6_4}/128 nodad dev veth4
+
+ # configure addresses: the bottom route (5-6-7-8)
+ ip -netns ${NS1} addr add ${IPv4_5}/24 dev veth5
+ ip -netns ${NS2} addr add ${IPv4_6}/24 dev veth6
+ ip -netns ${NS2} addr add ${IPv4_7}/24 dev veth7
+ ip -netns ${NS3} addr add ${IPv4_8}/24 dev veth8
+ ip -netns ${NS1} -6 addr add ${IPv6_5}/128 nodad dev veth5
+ ip -netns ${NS2} -6 addr add ${IPv6_6}/128 nodad dev veth6
+ ip -netns ${NS2} -6 addr add ${IPv6_7}/128 nodad dev veth7
+ ip -netns ${NS3} -6 addr add ${IPv6_8}/128 nodad dev veth8
+
+
+ ip -netns ${NS1} link set dev veth1 up
+ ip -netns ${NS2} link set dev veth2 up
+ ip -netns ${NS2} link set dev veth3 up
+ ip -netns ${NS3} link set dev veth4 up
+ ip -netns ${NS1} link set dev veth5 up
+ ip -netns ${NS2} link set dev veth6 up
+ ip -netns ${NS2} link set dev veth7 up
+ ip -netns ${NS3} link set dev veth8 up
+
+ # configure routes: IP*_SRC -> veth1/IP*_2 (= top route) default;
+ # the bottom route to specific bottom addresses
+
+ # NS1
+ # top route
+ ip -netns ${NS1} route add ${IPv4_2}/32 dev veth1
+ ip -netns ${NS1} route add default dev veth1 via ${IPv4_2} # go top by default
+ ip -netns ${NS1} -6 route add ${IPv6_2}/128 dev veth1
+ ip -netns ${NS1} -6 route add default dev veth1 via ${IPv6_2} # go top by default
+ # bottom route
+ ip -netns ${NS1} route add ${IPv4_6}/32 dev veth5
+ ip -netns ${NS1} route add ${IPv4_7}/32 dev veth5 via ${IPv4_6}
+ ip -netns ${NS1} route add ${IPv4_8}/32 dev veth5 via ${IPv4_6}
+ ip -netns ${NS1} -6 route add ${IPv6_6}/128 dev veth5
+ ip -netns ${NS1} -6 route add ${IPv6_7}/128 dev veth5 via ${IPv6_6}
+ ip -netns ${NS1} -6 route add ${IPv6_8}/128 dev veth5 via ${IPv6_6}
+
+ # NS2
+ # top route
+ ip -netns ${NS2} route add ${IPv4_1}/32 dev veth2
+ ip -netns ${NS2} route add ${IPv4_4}/32 dev veth3
+ ip -netns ${NS2} -6 route add ${IPv6_1}/128 dev veth2
+ ip -netns ${NS2} -6 route add ${IPv6_4}/128 dev veth3
+ # bottom route
+ ip -netns ${NS2} route add ${IPv4_5}/32 dev veth6
+ ip -netns ${NS2} route add ${IPv4_8}/32 dev veth7
+ ip -netns ${NS2} -6 route add ${IPv6_5}/128 dev veth6
+ ip -netns ${NS2} -6 route add ${IPv6_8}/128 dev veth7
+
+ # NS3
+ # top route
+ ip -netns ${NS3} route add ${IPv4_3}/32 dev veth4
+ ip -netns ${NS3} route add ${IPv4_1}/32 dev veth4 via ${IPv4_3}
+ ip -netns ${NS3} route add ${IPv4_2}/32 dev veth4 via ${IPv4_3}
+ ip -netns ${NS3} -6 route add ${IPv6_3}/128 dev veth4
+ ip -netns ${NS3} -6 route add ${IPv6_1}/128 dev veth4 via ${IPv6_3}
+ ip -netns ${NS3} -6 route add ${IPv6_2}/128 dev veth4 via ${IPv6_3}
+ # bottom route
+ ip -netns ${NS3} route add ${IPv4_7}/32 dev veth8
+ ip -netns ${NS3} route add ${IPv4_5}/32 dev veth8 via ${IPv4_7}
+ ip -netns ${NS3} route add ${IPv4_6}/32 dev veth8 via ${IPv4_7}
+ ip -netns ${NS3} -6 route add ${IPv6_7}/128 dev veth8
+ ip -netns ${NS3} -6 route add ${IPv6_5}/128 dev veth8 via ${IPv6_7}
+ ip -netns ${NS3} -6 route add ${IPv6_6}/128 dev veth8 via ${IPv6_7}
+
+ # configure IPv4 GRE device in NS3, and a route to it via the "bottom" route
+ ip -netns ${NS3} tunnel add gre_dev mode gre remote ${IPv4_1} local ${IPv4_GRE} ttl 255
+ ip -netns ${NS3} link set gre_dev up
+ ip -netns ${NS3} addr add ${IPv4_GRE} dev gre_dev
+ ip -netns ${NS1} route add ${IPv4_GRE}/32 dev veth5 via ${IPv4_6}
+ ip -netns ${NS2} route add ${IPv4_GRE}/32 dev veth7 via ${IPv4_8}
+
+
+ # configure IPv6 GRE device in NS3, and a route to it via the "bottom" route
+ ip -netns ${NS3} -6 tunnel add name gre6_dev mode ip6gre remote ${IPv6_1} local ${IPv6_GRE} ttl 255
+ ip -netns ${NS3} link set gre6_dev up
+ ip -netns ${NS3} -6 addr add ${IPv6_GRE} nodad dev gre6_dev
+ ip -netns ${NS1} -6 route add ${IPv6_GRE}/128 dev veth5 via ${IPv6_6}
+ ip -netns ${NS2} -6 route add ${IPv6_GRE}/128 dev veth7 via ${IPv6_8}
+
+ # rp_filter gets confused by what these tests are doing, so disable it
+ ip netns exec ${NS1} sysctl -wq net.ipv4.conf.all.rp_filter=0
+ ip netns exec ${NS2} sysctl -wq net.ipv4.conf.all.rp_filter=0
+ ip netns exec ${NS3} sysctl -wq net.ipv4.conf.all.rp_filter=0
+}
+
+cleanup() {
+ ip netns del ${NS1} 2> /dev/null
+ ip netns del ${NS2} 2> /dev/null
+ ip netns del ${NS3} 2> /dev/null
+}
+
+trap cleanup EXIT
+
+test_ping() {
+ local -r PROTO=$1
+ local -r EXPECTED=$2
+ local RET=0
+
+ set +e
+ if [ "${PROTO}" == "IPv4" ] ; then
+ ip netns exec ${NS1} ping -c 1 -W 1 -I ${IPv4_SRC} ${IPv4_DST} > /dev/null 2>&1
+ RET=$?
+ elif [ "${PROTO}" == "IPv6" ] ; then
+ ip netns exec ${NS1} ping6 -c 1 -W 6 -I ${IPv6_SRC} ${IPv6_DST} > /dev/null 2>&1
+ RET=$?
+ else
+ echo "test_ping: unknown PROTO: ${PROTO}"
+ exit 1
+ fi
+ set -e
+
+ if [ "0" != "${RET}" ]; then
+ RET=1
+ fi
+
+ if [ "${EXPECTED}" != "${RET}" ] ; then
+ echo "FAIL: test_ping: ${RET}"
+ exit 1
+ fi
+}
+
+test_egress() {
+ local -r ENCAP=$1
+ echo "starting egress ${ENCAP} encap test"
+ setup
+
+ # need to wait a bit for IPv6 to autoconf, otherwise
+ # ping6 sometimes fails with "unable to bind to address"
+
+ # by default, pings work
+ test_ping IPv4 0
+ test_ping IPv6 0
+
+ # remove NS2->DST routes, pings fail
+ ip -netns ${NS2} route del ${IPv4_DST}/32 dev veth3
+ ip -netns ${NS2} -6 route del ${IPv6_DST}/128 dev veth3
+ test_ping IPv4 1
+ test_ping IPv6 1
+
+ # install replacement routes (LWT/eBPF), pings succeed
+ if [ "${ENCAP}" == "IPv4" ] ; then
+ ip -netns ${NS1} route add ${IPv4_DST} encap bpf xmit obj test_lwt_ip_encap.o sec encap_gre dev veth1
+ ip -netns ${NS1} -6 route add ${IPv6_DST} encap bpf xmit obj test_lwt_ip_encap.o sec encap_gre dev veth1
+ elif [ "${ENCAP}" == "IPv6" ] ; then
+ ip -netns ${NS1} route add ${IPv4_DST} encap bpf xmit obj test_lwt_ip_encap.o sec encap_gre6 dev veth1
+ ip -netns ${NS1} -6 route add ${IPv6_DST} encap bpf xmit obj test_lwt_ip_encap.o sec encap_gre6 dev veth1
+ else
+ echo "FAIL: unknown encap ${ENCAP}"
+ fi
+ test_ping IPv4 0
+ test_ping IPv6 0
+
+ cleanup
+ echo "PASS"
+}
+
+test_ingress() {
+ local -r ENCAP=$1
+ echo "starting ingress ${ENCAP} encap test"
+ setup
+
+ # need to wait a bit for IPv6 to autoconf, otherwise
+ # ping6 sometimes fails with "unable to bind to address"
+
+ # by default, pings work
+ test_ping IPv4 0
+ test_ping IPv6 0
+
+ # remove NS2->DST routes, pings fail
+ ip -netns ${NS2} route del ${IPv4_DST}/32 dev veth3
+ ip -netns ${NS2} -6 route del ${IPv6_DST}/128 dev veth3
+ test_ping IPv4 1
+ test_ping IPv6 1
+
+ # install replacement routes (LWT/eBPF), pings succeed
+ if [ "${ENCAP}" == "IPv4" ] ; then
+ ip -netns ${NS2} route add ${IPv4_DST} encap bpf in obj test_lwt_ip_encap.o sec encap_gre dev veth2
+ ip -netns ${NS2} -6 route add ${IPv6_DST} encap bpf in obj test_lwt_ip_encap.o sec encap_gre dev veth2
+ elif [ "${ENCAP}" == "IPv6" ] ; then
+ ip -netns ${NS2} route add ${IPv4_DST} encap bpf in obj test_lwt_ip_encap.o sec encap_gre6 dev veth2
+ ip -netns ${NS2} -6 route add ${IPv6_DST} encap bpf in obj test_lwt_ip_encap.o sec encap_gre6 dev veth2
+ else
+ echo "FAIL: unknown encap ${ENCAP}"
+ fi
+ test_ping IPv4 0
+ test_ping IPv6 0
+
+ cleanup
+ echo "PASS"
+}
+
+test_egress IPv4
+test_egress IPv6
+
+test_ingress IPv4
+test_ingress IPv6
+
+echo "all tests passed"
--
2.20.1.495.gaa96b0ce6b-goog
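
For readers who want to try the check used by test_ping without network
namespaces or root, here is a minimal standalone sketch of the same pattern:
run a command quietly, collapse its exit status to 0 or 1, and compare the
result with the expected value. The helper name check_status and the
placeholder commands (true, false, ls on a missing path) are assumptions made
for illustration; they are not part of the patch.

```shell
#!/bin/bash
# Sketch only: check_status is a hypothetical helper, not part of the patch.
# It mirrors test_ping: run a command quietly, collapse the exit status
# to 0 (success) or 1 (any failure), and compare it with the expectation.

check_status() {
    # "local -r" makes the variable both local and read-only.
    local -r expected=$1
    shift
    local ret=0

    # Redirect stdout first, then point stderr at the same place,
    # so that both streams are discarded.
    "$@" > /dev/null 2>&1 || ret=1

    if [ "${expected}" != "${ret}" ]; then
        echo "FAIL: $*: got ${ret}, expected ${expected}"
        return 1
    fi
    echo "ok: $*"
}

check_status 0 true                             # expected to succeed
check_status 1 false                            # expected to fail
check_status 1 ls /nonexistent-path-for-demo    # error text is suppressed
```

The order of the redirections matters: with "2>&1 > /dev/null", stderr is
duplicated onto the original stdout before stdout is redirected, so error
messages still reach the terminal.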
* Re: [PATCH net-next v8 0/8] devlink: Add configuration parameters support for devlink_port
From: Jakub Kicinski @ 2019-01-30 23:57 UTC (permalink / raw)
To: Vasundhara Volam; +Cc: davem, michael.chan, jiri, mkubecek, netdev
In-Reply-To: <1548678627-21938-1-git-send-email-vasundhara-v.volam@broadcom.com>
On Mon, 28 Jan 2019 18:00:19 +0530, Vasundhara Volam wrote:
> This patchset adds support for configuration parameters setting through
> devlink_port. Each device registers supported configuration parameters
> table.
>
> The user can retrieve data on these parameters by
> "devlink port param show" command and can set new value to a
> parameter by "devlink port param set" command.
> All configuration modes supported by devlink_dev are supported
> by devlink_port also.
Hm, I think we were kind of going somewhere with the ethtool/nl
attribute encapsulation idea. You seem to have ignored those comments
on v7 and reposted v8 a day after.
I think we should explore the nesting further. The only obstacle is
that ethtool netlink conversion is not yet finished, but that's just
a simple matter of programming. Do you disagree with that direction?
Please comment.