* RE: [Intel-wired-lan] [PATCH v2 04/10] i40e: modify driver for handling offsets
From: Bowers, AndrewX @ 2019-07-19 17:22 UTC (permalink / raw)
To: netdev@vger.kernel.org, ast@kernel.org
Cc: intel-wired-lan@lists.osuosl.org, bpf@vger.kernel.org
In-Reply-To: <20190716030637.5634-5-kevin.laatz@intel.com>
> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Kevin Laatz
> Sent: Monday, July 15, 2019 8:07 PM
> To: netdev@vger.kernel.org; ast@kernel.org; daniel@iogearbox.net; Topel,
> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; jakub.kicinski@netronome.com;
> jonathan.lemon@gmail.com
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
> <ciara.loftus@intel.com>; intel-wired-lan@lists.osuosl.org;
> bpf@vger.kernel.org; Laatz, Kevin <kevin.laatz@intel.com>
> Subject: [Intel-wired-lan] [PATCH v2 04/10] i40e: modify driver for handling
> offsets
>
> With the addition of the unaligned chunks option, we need to make sure we
> handle the offsets accordingly based on the mode we are currently running
> in. This patch modifies the driver to appropriately mask the address for each
> case.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> ---
> drivers/net/ethernet/intel/i40e/i40e_xsk.c | 26 +++++++++++++++++-----
> 1 file changed, 21 insertions(+), 5 deletions(-)
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
^ permalink raw reply
* RE: [Intel-wired-lan] [PATCH v2 05/10] ixgbe: modify driver for handling offsets
From: Bowers, AndrewX @ 2019-07-19 17:22 UTC (permalink / raw)
To: netdev@vger.kernel.org, ast@kernel.org
Cc: intel-wired-lan@lists.osuosl.org, bpf@vger.kernel.org
In-Reply-To: <20190716030637.5634-6-kevin.laatz@intel.com>
> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Kevin Laatz
> Sent: Monday, July 15, 2019 8:07 PM
> To: netdev@vger.kernel.org; ast@kernel.org; daniel@iogearbox.net; Topel,
> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; jakub.kicinski@netronome.com;
> jonathan.lemon@gmail.com
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
> <ciara.loftus@intel.com>; intel-wired-lan@lists.osuosl.org;
> bpf@vger.kernel.org; Laatz, Kevin <kevin.laatz@intel.com>
> Subject: [Intel-wired-lan] [PATCH v2 05/10] ixgbe: modify driver for handling
> offsets
>
> With the addition of the unaligned chunks option, we need to make sure we
> handle the offsets accordingly based on the mode we are currently running
> in. This patch modifies the driver to appropriately mask the address for each
> case.
>
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> ---
> drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 26 ++++++++++++++++----
> 1 file changed, 21 insertions(+), 5 deletions(-)
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
^ permalink raw reply
* Re: [PATCH 2/2] libbpf: Avoid designated initializers for unnamed union members
From: Andrii Nakryiko @ 2019-07-19 17:28 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Daniel Borkmann, Alexei Starovoitov, Clark Williams, bpf,
Networking, Arnaldo Carvalho de Melo, Andrii Nakryiko,
Adrian Hunter, Jiri Olsa, Namhyung Kim
In-Reply-To: <20190719143407.20847-3-acme@kernel.org>
On Fri, Jul 19, 2019 at 7:34 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> From: Arnaldo Carvalho de Melo <acme@redhat.com>
>
> As it fails to build in some systems with:
>
> libbpf.c: In function 'perf_buffer__new':
> libbpf.c:4515: error: unknown field 'sample_period' specified in initializer
> libbpf.c:4516: error: unknown field 'wakeup_events' specified in initializer
>
> Doing as:
>
> attr.sample_period = 1;
>
> I.e. not as a designated initializer makes it build everywhere.
>
> Cc: Andrii Nakryiko <andriin@fb.com>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Fixes: fb84b8224655 ("libbpf: add perf buffer API")
> Link: https://lkml.kernel.org/n/tip-hnlmch8qit1ieksfppmr32si@git.kernel.org
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> ---
Thanks, Arnaldo!
Acked-by: Andrii Nakryiko <andriin@fb.com>
> tools/lib/bpf/libbpf.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index b1dec5b1de54..aaca132def74 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -4508,13 +4508,13 @@ struct perf_buffer *perf_buffer__new(int map_fd, size_t page_cnt,
> const struct perf_buffer_opts *opts)
> {
> struct perf_buffer_params p = {};
> - struct perf_event_attr attr = {
> - .config = PERF_COUNT_SW_BPF_OUTPUT,
> - .type = PERF_TYPE_SOFTWARE,
> - .sample_type = PERF_SAMPLE_RAW,
> - .sample_period = 1,
> - .wakeup_events = 1,
> - };
> + struct perf_event_attr attr = { 0, };
> +
> + attr.config = PERF_COUNT_SW_BPF_OUTPUT,
> + attr.type = PERF_TYPE_SOFTWARE;
> + attr.sample_type = PERF_SAMPLE_RAW;
> + attr.sample_period = 1;
> + attr.wakeup_events = 1;
>
> p.attr = &attr;
> p.sample_cb = opts ? opts->sample_cb : NULL;
> --
> 2.21.0
>
^ permalink raw reply
* [PATCH bpf v4 00/14] sockmap/tls fixes
From: Jakub Kicinski @ 2019-07-19 17:29 UTC (permalink / raw)
To: john.fastabend, alexei.starovoitov, daniel
Cc: edumazet, netdev, bpf, Jakub Kicinski
John says:
Resolve a series of splats discovered by syzbot and an unhash
TLS issue noted by Eric Dumazet.
The main issues revolved around interaction between TLS and
sockmap tear down. TLS and sockmap could both reset sk->prot
ops creating a condition where a close or unhash op could be
called forever. A rare race condition resulting from a missing
rcu sync operation was causing a use after free. Then on the
TLS side dropping the sock lock and re-acquiring it during the
close op could hang. Finally, sockmap must be deployed before
tls for current stack assumptions to be met. This is enforced
now. A feature series can enable it.
To fix this first refactor TLS code so the lock is held for the
entire teardown operation. Then add an unhash callback to ensure
TLS can not transition from ESTABLISHED to LISTEN state. This
transition is a similar bug to the one found and fixed previously
in sockmap. Then apply three fixes to sockmap to fix up races
on tear down around map free and close. Finally, if sockmap
is destroyed before TLS we add a new ULP op update to inform
the TLS stack it should not call sockmap ops. This last one
appears to be the most commonly found issue from syzbot.
v4:
- fix some use after frees;
- disable disconnect work for offload (ctx lifetime is much
more complex);
- remove some of the dead code which made it hard to understand
(for me) that things work correctly (e.g. the checks TLS is
the top ULP);
- add selftets.
Jakub Kicinski (7):
net/tls: don't arm strparser immediately in tls_set_sw_offload()
net/tls: don't call tls_sk_proto_close for hw record offload
selftests/tls: add a test for ULP but no keys
selftests/tls: test error codes around TLS ULP installation
selftests/tls: add a bidirectional test
selftests/tls: close the socket with open record
selftests/tls: add shutdown tests
John Fastabend (7):
net/tls: remove close callback sock unlock/lock around TX work flush
net/tls: remove sock unlock/lock around strp_done()
net/tls: fix transition through disconnect with close
bpf: sockmap, sock_map_delete needs to use xchg
bpf: sockmap, synchronize_rcu before free'ing map
bpf: sockmap, only create entry if ulp is not already enabled
bpf: sockmap/tls, close can race with map free
Documentation/networking/tls-offload.rst | 6 +
include/linux/skmsg.h | 8 +-
include/net/tcp.h | 3 +
include/net/tls.h | 15 +-
net/core/skmsg.c | 4 +-
net/core/sock_map.c | 19 ++-
net/ipv4/tcp_ulp.c | 13 ++
net/tls/tls_main.c | 142 +++++++++++++----
net/tls/tls_sw.c | 83 +++++++---
tools/testing/selftests/net/tls.c | 194 +++++++++++++++++++++++
10 files changed, 419 insertions(+), 68 deletions(-)
--
2.21.0
^ permalink raw reply
* [PATCH bpf v4 01/14] net/tls: don't arm strparser immediately in tls_set_sw_offload()
From: Jakub Kicinski @ 2019-07-19 17:29 UTC (permalink / raw)
To: john.fastabend, alexei.starovoitov, daniel
Cc: edumazet, netdev, bpf, Jakub Kicinski, Dirk van der Merwe
In-Reply-To: <20190719172927.18181-1-jakub.kicinski@netronome.com>
In tls_set_device_offload_rx() we prepare the software context
for RX fallback and proceed to add the connection to the device.
Unfortunately, software context prep includes arming strparser
so in case of a later error we have to release the socket lock
to call strp_done().
In preparation for not releasing the socket lock half way through
callbacks move arming strparser into a separate function.
Following patches will make use of that.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
include/net/tls.h | 1 +
net/tls/tls_device.c | 1 +
net/tls/tls_main.c | 8 +++++---
net/tls/tls_sw.c | 19 ++++++++++++-------
4 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/include/net/tls.h b/include/net/tls.h
index 584609174fe0..43f551cd508b 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -355,6 +355,7 @@ int tls_sk_attach(struct sock *sk, int optname, char __user *optval,
unsigned int optlen);
int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx);
+void tls_sw_strparser_arm(struct sock *sk, struct tls_context *ctx);
int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
int tls_sw_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags);
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 7c0b2b778703..4d67d72f007c 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -1045,6 +1045,7 @@ int tls_set_device_offload_rx(struct sock *sk, struct tls_context *ctx)
rc = tls_set_sw_offload(sk, ctx, 0);
if (rc)
goto release_ctx;
+ tls_sw_strparser_arm(sk, ctx);
rc = netdev->tlsdev_ops->tls_dev_add(netdev, sk, TLS_OFFLOAD_CTX_DIR_RX,
&ctx->crypto_recv.info,
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 4674e57e66b0..85a9d7d57b32 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -526,6 +526,8 @@ static int do_tls_setsockopt_conf(struct sock *sk, char __user *optval,
{
#endif
rc = tls_set_sw_offload(sk, ctx, 1);
+ if (rc)
+ goto err_crypto_info;
conf = TLS_SW;
}
} else {
@@ -537,13 +539,13 @@ static int do_tls_setsockopt_conf(struct sock *sk, char __user *optval,
{
#endif
rc = tls_set_sw_offload(sk, ctx, 0);
+ if (rc)
+ goto err_crypto_info;
+ tls_sw_strparser_arm(sk, ctx);
conf = TLS_SW;
}
}
- if (rc)
- goto err_crypto_info;
-
if (tx)
ctx->tx_conf = conf;
else
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 53b4ad94e74a..f58a8ffc2a9c 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2160,6 +2160,18 @@ void tls_sw_write_space(struct sock *sk, struct tls_context *ctx)
}
}
+void tls_sw_strparser_arm(struct sock *sk, struct tls_context *tls_ctx)
+{
+ struct tls_sw_context_rx *rx_ctx = tls_sw_ctx_rx(tls_ctx);
+
+ write_lock_bh(&sk->sk_callback_lock);
+ rx_ctx->saved_data_ready = sk->sk_data_ready;
+ sk->sk_data_ready = tls_data_ready;
+ write_unlock_bh(&sk->sk_callback_lock);
+
+ strp_check_rcv(&rx_ctx->strp);
+}
+
int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
@@ -2357,13 +2369,6 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
cb.parse_msg = tls_read_size;
strp_init(&sw_ctx_rx->strp, sk, &cb);
-
- write_lock_bh(&sk->sk_callback_lock);
- sw_ctx_rx->saved_data_ready = sk->sk_data_ready;
- sk->sk_data_ready = tls_data_ready;
- write_unlock_bh(&sk->sk_callback_lock);
-
- strp_check_rcv(&sw_ctx_rx->strp);
}
goto out;
--
2.21.0
^ permalink raw reply related
* [PATCH bpf v4 03/14] net/tls: remove close callback sock unlock/lock around TX work flush
From: Jakub Kicinski @ 2019-07-19 17:29 UTC (permalink / raw)
To: john.fastabend, alexei.starovoitov, daniel
Cc: edumazet, netdev, bpf, Jakub Kicinski, Dirk van der Merwe
In-Reply-To: <20190719172927.18181-1-jakub.kicinski@netronome.com>
From: John Fastabend <john.fastabend@gmail.com>
The tls close() callback currently drops the sock lock, makes a
cancel_delayed_work_sync() call, and then relocks the sock.
By restructuring the code we can avoid droping lock and then
reclaiming it. To simplify this we do the following,
tls_sk_proto_close
set_bit(CLOSING)
set_bit(SCHEDULE)
cancel_delay_work_sync() <- cancel workqueue
lock_sock(sk)
...
release_sock(sk)
strp_done()
Setting the CLOSING bit prevents the SCHEDULE bit from being
cleared by any workqueue items e.g. if one happens to be
scheduled and run between when we set SCHEDULE bit and cancel
work. Then because SCHEDULE bit is set now no new work will
be scheduled.
Tested with net selftests and bpf selftests.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
include/net/tls.h | 2 ++
net/tls/tls_main.c | 3 +++
net/tls/tls_sw.c | 24 +++++++++++++++++-------
3 files changed, 22 insertions(+), 7 deletions(-)
diff --git a/include/net/tls.h b/include/net/tls.h
index 43f551cd508b..d4276cb6de53 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -162,6 +162,7 @@ struct tls_sw_context_tx {
int async_capable;
#define BIT_TX_SCHEDULED 0
+#define BIT_TX_CLOSING 1
unsigned long tx_bitmask;
};
@@ -360,6 +361,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
int tls_sw_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags);
void tls_sw_close(struct sock *sk, long timeout);
+void tls_sw_cancel_work_tx(struct tls_context *tls_ctx);
void tls_sw_free_resources_tx(struct sock *sk);
void tls_sw_free_resources_rx(struct sock *sk);
void tls_sw_release_resources_rx(struct sock *sk);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 7ab682ed99fa..5c29b410cf7d 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -268,6 +268,9 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
void (*sk_proto_close)(struct sock *sk, long timeout);
bool free_ctx = false;
+ if (ctx->tx_conf == TLS_SW)
+ tls_sw_cancel_work_tx(ctx);
+
lock_sock(sk);
sk_proto_close = ctx->sk_proto_close;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index f58a8ffc2a9c..38c0e53c727d 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2054,6 +2054,15 @@ static void tls_data_ready(struct sock *sk)
}
}
+void tls_sw_cancel_work_tx(struct tls_context *tls_ctx)
+{
+ struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
+
+ set_bit(BIT_TX_CLOSING, &ctx->tx_bitmask);
+ set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask);
+ cancel_delayed_work_sync(&ctx->tx_work.work);
+}
+
void tls_sw_free_resources_tx(struct sock *sk)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
@@ -2065,11 +2074,6 @@ void tls_sw_free_resources_tx(struct sock *sk)
if (atomic_read(&ctx->encrypt_pending))
crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
- release_sock(sk);
- cancel_delayed_work_sync(&ctx->tx_work.work);
- lock_sock(sk);
-
- /* Tx whatever records we can transmit and abandon the rest */
tls_tx_records(sk, -1);
/* Free up un-sent records in tx_list. First, free
@@ -2137,11 +2141,17 @@ static void tx_work_handler(struct work_struct *work)
struct tx_work, work);
struct sock *sk = tx_work->sk;
struct tls_context *tls_ctx = tls_get_ctx(sk);
- struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
+ struct tls_sw_context_tx *ctx;
- if (!test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
+ if (unlikely(!tls_ctx))
return;
+ ctx = tls_sw_ctx_tx(tls_ctx);
+ if (test_bit(BIT_TX_CLOSING, &ctx->tx_bitmask))
+ return;
+
+ if (!test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
+ return;
lock_sock(sk);
tls_tx_records(sk, -1);
release_sock(sk);
--
2.21.0
^ permalink raw reply related
* [PATCH bpf v4 04/14] net/tls: remove sock unlock/lock around strp_done()
From: Jakub Kicinski @ 2019-07-19 17:29 UTC (permalink / raw)
To: john.fastabend, alexei.starovoitov, daniel
Cc: edumazet, netdev, bpf, Jakub Kicinski, Dirk van der Merwe
In-Reply-To: <20190719172927.18181-1-jakub.kicinski@netronome.com>
From: John Fastabend <john.fastabend@gmail.com>
The tls close() callback currently drops the sock lock to call
strp_done(). Split up the RX cleanup into stopping the strparser
and releasing most resources, syncing strparser and finally
freeing the context.
To avoid the need for a strp_done() call on the cleanup path
of device offload make sure we don't arm the strparser until
we are sure init will be successful.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
include/net/tls.h | 7 ++---
net/tls/tls_device.c | 1 -
net/tls/tls_main.c | 61 ++++++++++++++++++++++----------------------
net/tls/tls_sw.c | 40 +++++++++++++++++++++--------
4 files changed, 64 insertions(+), 45 deletions(-)
diff --git a/include/net/tls.h b/include/net/tls.h
index d4276cb6de53..235508e35fd4 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -107,9 +107,7 @@ struct tls_device {
enum {
TLS_BASE,
TLS_SW,
-#ifdef CONFIG_TLS_DEVICE
TLS_HW,
-#endif
TLS_HW_RECORD,
TLS_NUM_CONFIG,
};
@@ -357,14 +355,17 @@ int tls_sk_attach(struct sock *sk, int optname, char __user *optval,
int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx);
void tls_sw_strparser_arm(struct sock *sk, struct tls_context *ctx);
+void tls_sw_strparser_done(struct tls_context *tls_ctx);
int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
int tls_sw_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags);
void tls_sw_close(struct sock *sk, long timeout);
void tls_sw_cancel_work_tx(struct tls_context *tls_ctx);
-void tls_sw_free_resources_tx(struct sock *sk);
+void tls_sw_release_resources_tx(struct sock *sk);
+void tls_sw_free_ctx_tx(struct tls_context *tls_ctx);
void tls_sw_free_resources_rx(struct sock *sk);
void tls_sw_release_resources_rx(struct sock *sk);
+void tls_sw_free_ctx_rx(struct tls_context *tls_ctx);
int tls_sw_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
int nonblock, int flags, int *addr_len);
bool tls_sw_stream_read(const struct sock *sk);
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 4d67d72f007c..7c0b2b778703 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -1045,7 +1045,6 @@ int tls_set_device_offload_rx(struct sock *sk, struct tls_context *ctx)
rc = tls_set_sw_offload(sk, ctx, 0);
if (rc)
goto release_ctx;
- tls_sw_strparser_arm(sk, ctx);
rc = netdev->tlsdev_ops->tls_dev_add(netdev, sk, TLS_OFFLOAD_CTX_DIR_RX,
&ctx->crypto_recv.info,
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 5c29b410cf7d..d152a00a7a27 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -261,24 +261,9 @@ void tls_ctx_free(struct tls_context *ctx)
kfree(ctx);
}
-static void tls_sk_proto_close(struct sock *sk, long timeout)
+static void tls_sk_proto_cleanup(struct sock *sk,
+ struct tls_context *ctx, long timeo)
{
- struct tls_context *ctx = tls_get_ctx(sk);
- long timeo = sock_sndtimeo(sk, 0);
- void (*sk_proto_close)(struct sock *sk, long timeout);
- bool free_ctx = false;
-
- if (ctx->tx_conf == TLS_SW)
- tls_sw_cancel_work_tx(ctx);
-
- lock_sock(sk);
- sk_proto_close = ctx->sk_proto_close;
-
- if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE) {
- free_ctx = true;
- goto skip_tx_cleanup;
- }
-
if (unlikely(sk->sk_write_pending) &&
!wait_on_pending_writer(sk, &timeo))
tls_handle_open_record(sk, 0);
@@ -287,7 +272,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
if (ctx->tx_conf == TLS_SW) {
kfree(ctx->tx.rec_seq);
kfree(ctx->tx.iv);
- tls_sw_free_resources_tx(sk);
+ tls_sw_release_resources_tx(sk);
#ifdef CONFIG_TLS_DEVICE
} else if (ctx->tx_conf == TLS_HW) {
tls_device_free_resources_tx(sk);
@@ -295,26 +280,40 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
}
if (ctx->rx_conf == TLS_SW)
- tls_sw_free_resources_rx(sk);
+ tls_sw_release_resources_rx(sk);
#ifdef CONFIG_TLS_DEVICE
if (ctx->rx_conf == TLS_HW)
tls_device_offload_cleanup_rx(sk);
-
- if (ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW) {
-#else
- {
#endif
- tls_ctx_free(ctx);
- ctx = NULL;
- }
+}
+
+static void tls_sk_proto_close(struct sock *sk, long timeout)
+{
+ void (*sk_proto_close)(struct sock *sk, long timeout);
+ struct tls_context *ctx = tls_get_ctx(sk);
+ long timeo = sock_sndtimeo(sk, 0);
+ bool free_ctx;
+
+ if (ctx->tx_conf == TLS_SW)
+ tls_sw_cancel_work_tx(ctx);
+
+ lock_sock(sk);
+ free_ctx = ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW;
+ sk_proto_close = ctx->sk_proto_close;
+
+ if (ctx->tx_conf != TLS_BASE || ctx->rx_conf != TLS_BASE)
+ tls_sk_proto_cleanup(sk, ctx, timeo);
-skip_tx_cleanup:
release_sock(sk);
+ if (ctx->tx_conf == TLS_SW)
+ tls_sw_free_ctx_tx(ctx);
+ if (ctx->rx_conf == TLS_SW || ctx->rx_conf == TLS_HW)
+ tls_sw_strparser_done(ctx);
+ if (ctx->rx_conf == TLS_SW)
+ tls_sw_free_ctx_rx(ctx);
sk_proto_close(sk, timeout);
- /* free ctx for TLS_HW_RECORD, used by tcp_set_state
- * for sk->sk_prot->unhash [tls_hw_unhash]
- */
+
if (free_ctx)
tls_ctx_free(ctx);
}
@@ -541,9 +540,9 @@ static int do_tls_setsockopt_conf(struct sock *sk, char __user *optval,
rc = tls_set_sw_offload(sk, ctx, 0);
if (rc)
goto err_crypto_info;
- tls_sw_strparser_arm(sk, ctx);
conf = TLS_SW;
}
+ tls_sw_strparser_arm(sk, ctx);
}
if (tx)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 38c0e53c727d..91d21b048a9b 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2063,7 +2063,7 @@ void tls_sw_cancel_work_tx(struct tls_context *tls_ctx)
cancel_delayed_work_sync(&ctx->tx_work.work);
}
-void tls_sw_free_resources_tx(struct sock *sk)
+void tls_sw_release_resources_tx(struct sock *sk)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
@@ -2096,6 +2096,11 @@ void tls_sw_free_resources_tx(struct sock *sk)
crypto_free_aead(ctx->aead_send);
tls_free_open_rec(sk);
+}
+
+void tls_sw_free_ctx_tx(struct tls_context *tls_ctx)
+{
+ struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
kfree(ctx);
}
@@ -2114,25 +2119,40 @@ void tls_sw_release_resources_rx(struct sock *sk)
skb_queue_purge(&ctx->rx_list);
crypto_free_aead(ctx->aead_recv);
strp_stop(&ctx->strp);
- write_lock_bh(&sk->sk_callback_lock);
- sk->sk_data_ready = ctx->saved_data_ready;
- write_unlock_bh(&sk->sk_callback_lock);
- release_sock(sk);
- strp_done(&ctx->strp);
- lock_sock(sk);
+ /* If tls_sw_strparser_arm() was not called (cleanup paths)
+ * we still want to strp_stop(), but sk->sk_data_ready was
+ * never swapped.
+ */
+ if (ctx->saved_data_ready) {
+ write_lock_bh(&sk->sk_callback_lock);
+ sk->sk_data_ready = ctx->saved_data_ready;
+ write_unlock_bh(&sk->sk_callback_lock);
+ }
}
}
-void tls_sw_free_resources_rx(struct sock *sk)
+void tls_sw_strparser_done(struct tls_context *tls_ctx)
{
- struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
- tls_sw_release_resources_rx(sk);
+ strp_done(&ctx->strp);
+}
+
+void tls_sw_free_ctx_rx(struct tls_context *tls_ctx)
+{
+ struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
kfree(ctx);
}
+void tls_sw_free_resources_rx(struct sock *sk)
+{
+ struct tls_context *tls_ctx = tls_get_ctx(sk);
+
+ tls_sw_release_resources_rx(sk);
+ tls_sw_free_ctx_rx(tls_ctx);
+}
+
/* The work handler to transmitt the encrypted records in tx_list */
static void tx_work_handler(struct work_struct *work)
{
--
2.21.0
^ permalink raw reply related
* [PATCH bpf v4 02/14] net/tls: don't call tls_sk_proto_close for hw record offload
From: Jakub Kicinski @ 2019-07-19 17:29 UTC (permalink / raw)
To: john.fastabend, alexei.starovoitov, daniel
Cc: edumazet, netdev, bpf, Jakub Kicinski, Dirk van der Merwe
In-Reply-To: <20190719172927.18181-1-jakub.kicinski@netronome.com>
The deprecated TOE offload doesn't actually do anything in
tls_sk_proto_close() - all TLS code is skipped and context
not freed. Remove the callback to make it easier to refactor
tls_sk_proto_close().
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
net/tls/tls_main.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 85a9d7d57b32..7ab682ed99fa 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -271,9 +271,6 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
lock_sock(sk);
sk_proto_close = ctx->sk_proto_close;
- if (ctx->tx_conf == TLS_HW_RECORD && ctx->rx_conf == TLS_HW_RECORD)
- goto skip_tx_cleanup;
-
if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE) {
free_ctx = true;
goto skip_tx_cleanup;
@@ -766,7 +763,6 @@ static void build_protos(struct proto prot[TLS_NUM_CONFIG][TLS_NUM_CONFIG],
prot[TLS_HW_RECORD][TLS_HW_RECORD] = *base;
prot[TLS_HW_RECORD][TLS_HW_RECORD].hash = tls_hw_hash;
prot[TLS_HW_RECORD][TLS_HW_RECORD].unhash = tls_hw_unhash;
- prot[TLS_HW_RECORD][TLS_HW_RECORD].close = tls_sk_proto_close;
}
static int tls_init(struct sock *sk)
--
2.21.0
^ permalink raw reply related
* [PATCH bpf v4 05/14] net/tls: fix transition through disconnect with close
From: Jakub Kicinski @ 2019-07-19 17:29 UTC (permalink / raw)
To: john.fastabend, alexei.starovoitov, daniel
Cc: edumazet, netdev, bpf, Jakub Kicinski
In-Reply-To: <20190719172927.18181-1-jakub.kicinski@netronome.com>
From: John Fastabend <john.fastabend@gmail.com>
It is possible (via shutdown()) for TCP socks to go through TCP_CLOSE
state via tcp_disconnect() without actually calling tcp_close which
would then call the tls close callback. Because of this a user could
disconnect a socket then put it in a LISTEN state which would break
our assumptions about sockets always being ESTABLISHED state.
More directly because close() can call unhash() and unhash is
implemented by sockmap if a sockmap socket has TLS enabled we can
incorrectly destroy the psock from unhash() and then call its close
handler again. But because the psock (sockmap socket representation)
is already destroyed we call close handler in sk->prot. However,
in some cases (TLS BASE/BASE case) this will still point at the
sockmap close handler resulting in a circular call and crash reported
by syzbot.
To fix both above issues implement the unhash() routine for TLS.
v4:
- add note about tls offload still needing the fix;
- move sk_proto to the cold cache line;
- split TX context free into "release" and "free",
otherwise the GC work itself is in already freed
memory;
- more TX before RX for consistency;
- reuse tls_ctx_free();
- schedule the GC work after we're done with context
to avoid UAF;
- don't set the unhash in all modes, all modes "inherit"
TLS_BASE's callbacks anyway;
- disable the unhash hook for TLS_HW.
Fixes: 3c4d7559159bf ("tls: kernel TLS support")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
Documentation/networking/tls-offload.rst | 6 +++
include/net/tls.h | 5 ++-
net/tls/tls_main.c | 55 ++++++++++++++++++++++++
3 files changed, 65 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/tls-offload.rst b/Documentation/networking/tls-offload.rst
index 048e5ca44824..8a1eeb393316 100644
--- a/Documentation/networking/tls-offload.rst
+++ b/Documentation/networking/tls-offload.rst
@@ -513,3 +513,9 @@ Redirects leak clear text
In the RX direction, if segment has already been decrypted by the device
and it gets redirected or mirrored - clear text will be transmitted out.
+
+shutdown() doesn't clear TLS state
+----------------------------------
+
+shutdown() system call allows for a TLS socket to be reused as a different
+connection. Offload doesn't currently handle that.
diff --git a/include/net/tls.h b/include/net/tls.h
index 235508e35fd4..9e425ac2de45 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -271,6 +271,8 @@ struct tls_context {
unsigned long flags;
/* cache cold stuff */
+ struct proto *sk_proto;
+
void (*sk_destruct)(struct sock *sk);
void (*sk_proto_close)(struct sock *sk, long timeout);
@@ -288,6 +290,8 @@ struct tls_context {
struct list_head list;
refcount_t refcount;
+
+ struct work_struct gc;
};
enum tls_offload_ctx_dir {
@@ -359,7 +363,6 @@ void tls_sw_strparser_done(struct tls_context *tls_ctx);
int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
int tls_sw_sendpage(struct sock *sk, struct page *page,
int offset, size_t size, int flags);
-void tls_sw_close(struct sock *sk, long timeout);
void tls_sw_cancel_work_tx(struct tls_context *tls_ctx);
void tls_sw_release_resources_tx(struct sock *sk);
void tls_sw_free_ctx_tx(struct tls_context *tls_ctx);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index d152a00a7a27..48f1c26459d0 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -261,6 +261,33 @@ void tls_ctx_free(struct tls_context *ctx)
kfree(ctx);
}
+static void tls_ctx_free_deferred(struct work_struct *gc)
+{
+ struct tls_context *ctx = container_of(gc, struct tls_context, gc);
+
+ /* Ensure any remaining work items are completed. The sk will
+ * already have lost its tls_ctx reference by the time we get
+ * here so no xmit operation will actually be performed.
+ */
+ if (ctx->tx_conf == TLS_SW) {
+ tls_sw_cancel_work_tx(ctx);
+ tls_sw_free_ctx_tx(ctx);
+ }
+
+ if (ctx->rx_conf == TLS_SW) {
+ tls_sw_strparser_done(ctx);
+ tls_sw_free_ctx_rx(ctx);
+ }
+
+ tls_ctx_free(ctx);
+}
+
+static void tls_ctx_free_wq(struct tls_context *ctx)
+{
+ INIT_WORK(&ctx->gc, tls_ctx_free_deferred);
+ schedule_work(&ctx->gc);
+}
+
static void tls_sk_proto_cleanup(struct sock *sk,
struct tls_context *ctx, long timeo)
{
@@ -288,6 +315,26 @@ static void tls_sk_proto_cleanup(struct sock *sk,
#endif
}
+static void tls_sk_proto_unhash(struct sock *sk)
+{
+ struct inet_connection_sock *icsk = inet_csk(sk);
+ long timeo = sock_sndtimeo(sk, 0);
+ struct tls_context *ctx;
+
+ if (unlikely(!icsk->icsk_ulp_data)) {
+ if (sk->sk_prot->unhash)
+ sk->sk_prot->unhash(sk);
+ }
+
+ ctx = tls_get_ctx(sk);
+ tls_sk_proto_cleanup(sk, ctx, timeo);
+ icsk->icsk_ulp_data = NULL;
+
+ if (ctx->sk_proto->unhash)
+ ctx->sk_proto->unhash(sk);
+ tls_ctx_free_wq(ctx);
+}
+
static void tls_sk_proto_close(struct sock *sk, long timeout)
{
void (*sk_proto_close)(struct sock *sk, long timeout);
@@ -305,6 +352,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
if (ctx->tx_conf != TLS_BASE || ctx->rx_conf != TLS_BASE)
tls_sk_proto_cleanup(sk, ctx, timeo);
+ sk->sk_prot = ctx->sk_proto;
release_sock(sk);
if (ctx->tx_conf == TLS_SW)
tls_sw_free_ctx_tx(ctx);
@@ -608,6 +656,7 @@ static struct tls_context *create_ctx(struct sock *sk)
ctx->setsockopt = sk->sk_prot->setsockopt;
ctx->getsockopt = sk->sk_prot->getsockopt;
ctx->sk_proto_close = sk->sk_prot->close;
+ ctx->unhash = sk->sk_prot->unhash;
return ctx;
}
@@ -731,6 +780,7 @@ static void build_protos(struct proto prot[TLS_NUM_CONFIG][TLS_NUM_CONFIG],
prot[TLS_BASE][TLS_BASE].setsockopt = tls_setsockopt;
prot[TLS_BASE][TLS_BASE].getsockopt = tls_getsockopt;
prot[TLS_BASE][TLS_BASE].close = tls_sk_proto_close;
+ prot[TLS_BASE][TLS_BASE].unhash = tls_sk_proto_unhash;
prot[TLS_SW][TLS_BASE] = prot[TLS_BASE][TLS_BASE];
prot[TLS_SW][TLS_BASE].sendmsg = tls_sw_sendmsg;
@@ -748,16 +798,20 @@ static void build_protos(struct proto prot[TLS_NUM_CONFIG][TLS_NUM_CONFIG],
#ifdef CONFIG_TLS_DEVICE
prot[TLS_HW][TLS_BASE] = prot[TLS_BASE][TLS_BASE];
+ prot[TLS_HW][TLS_BASE].unhash = base->unhash;
prot[TLS_HW][TLS_BASE].sendmsg = tls_device_sendmsg;
prot[TLS_HW][TLS_BASE].sendpage = tls_device_sendpage;
prot[TLS_HW][TLS_SW] = prot[TLS_BASE][TLS_SW];
+ prot[TLS_HW][TLS_SW].unhash = base->unhash;
prot[TLS_HW][TLS_SW].sendmsg = tls_device_sendmsg;
prot[TLS_HW][TLS_SW].sendpage = tls_device_sendpage;
prot[TLS_BASE][TLS_HW] = prot[TLS_BASE][TLS_SW];
+ prot[TLS_BASE][TLS_HW].unhash = base->unhash;
prot[TLS_SW][TLS_HW] = prot[TLS_SW][TLS_SW];
+ prot[TLS_SW][TLS_HW].unhash = base->unhash;
prot[TLS_HW][TLS_HW] = prot[TLS_HW][TLS_SW];
#endif
@@ -794,6 +848,7 @@ static int tls_init(struct sock *sk)
tls_build_proto(sk);
ctx->tx_conf = TLS_BASE;
ctx->rx_conf = TLS_BASE;
+ ctx->sk_proto = sk->sk_prot;
update_sk_prot(sk, ctx);
out:
return rc;
--
2.21.0
^ permalink raw reply related
* [PATCH bpf v4 07/14] bpf: sockmap, synchronize_rcu before free'ing map
From: Jakub Kicinski @ 2019-07-19 17:29 UTC (permalink / raw)
To: john.fastabend, alexei.starovoitov, daniel; +Cc: edumazet, netdev, bpf
In-Reply-To: <20190719172927.18181-1-jakub.kicinski@netronome.com>
From: John Fastabend <john.fastabend@gmail.com>
We need to have a synchronize_rcu before free'ing the sockmap because
any outstanding psock references will have a pointer to the map and
when they use this could trigger a use after free.
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
net/core/sock_map.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 28702f2e9a4a..56bcabe7c2f2 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -247,6 +247,8 @@ static void sock_map_free(struct bpf_map *map)
raw_spin_unlock_bh(&stab->lock);
rcu_read_unlock();
+ synchronize_rcu();
+
bpf_map_area_free(stab->sks);
kfree(stab);
}
--
2.21.0
^ permalink raw reply related
* [PATCH bpf v4 08/14] bpf: sockmap, only create entry if ulp is not already enabled
From: Jakub Kicinski @ 2019-07-19 17:29 UTC (permalink / raw)
To: john.fastabend, alexei.starovoitov, daniel; +Cc: edumazet, netdev, bpf
In-Reply-To: <20190719172927.18181-1-jakub.kicinski@netronome.com>
From: John Fastabend <john.fastabend@gmail.com>
Sockmap does not currently support adding sockets after TLS has been
enabled. There never was a real use case for this so it was never
added. But, we lost the test for ULP at some point so add it here
and fail the socket insert if TLS is enabled. Future work could
make sockmap support this use case but fixup the bug here.
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
net/core/sock_map.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 56bcabe7c2f2..1330a7442e5b 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -334,6 +334,7 @@ static int sock_map_update_common(struct bpf_map *map, u32 idx,
struct sock *sk, u64 flags)
{
struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
+ struct inet_connection_sock *icsk = inet_csk(sk);
struct sk_psock_link *link;
struct sk_psock *psock;
struct sock *osk;
@@ -344,6 +345,8 @@ static int sock_map_update_common(struct bpf_map *map, u32 idx,
return -EINVAL;
if (unlikely(idx >= map->max_entries))
return -E2BIG;
+ if (unlikely(icsk->icsk_ulp_data))
+ return -EINVAL;
link = sk_psock_init_link();
if (!link)
--
2.21.0
^ permalink raw reply related
* [PATCH bpf v4 09/14] bpf: sockmap/tls, close can race with map free
From: Jakub Kicinski @ 2019-07-19 17:29 UTC (permalink / raw)
To: john.fastabend, alexei.starovoitov, daniel
Cc: edumazet, netdev, bpf, syzbot+06537213db7ba2745c4a,
Jakub Kicinski, Dirk van der Merwe
In-Reply-To: <20190719172927.18181-1-jakub.kicinski@netronome.com>
From: John Fastabend <john.fastabend@gmail.com>
When a map free is called and in parallel a socket is closed we
have two paths that can potentially reset the socket prot ops, the
bpf close() path and the map free path. This creates a problem
with which prot ops should be used from the socket closed side.
If the map_free side completes first then we want to call the
original lowest level ops. However, if the tls path runs first
we want to call the sockmap ops. Additionally there was no locking
around prot updates in TLS code paths so the prot ops could
be changed multiple times once from TLS path and again from sockmap
side potentially leaving ops pointed at either TLS or sockmap
when psock and/or tls context have already been destroyed.
To fix this race first only update ops inside callback lock
so that TLS, sockmap and lowest level all agree on prot state.
Second and a ULP callback update() so that lower layers can
inform the upper layer when they are being removed allowing the
upper layer to reset prot ops.
This gets us close to allowing sockmap and tls to be stacked
in arbitrary order but will save that patch for *next trees.
v4:
- make sure we don't free things for device;
- remove the checks which swap the callbacks back
only if TLS is at the top.
Reported-by: syzbot+06537213db7ba2745c4a@syzkaller.appspotmail.com
Fixes: 02c558b2d5d6 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
Let's add the check that TLS's callbacks are on top when stacking in
any order is actually supported. Or perhaps let's just not support
it until we fix the remaining bugs and races?
---
include/linux/skmsg.h | 8 +++++++-
include/net/tcp.h | 3 +++
net/core/skmsg.c | 4 ++--
net/ipv4/tcp_ulp.c | 13 +++++++++++++
net/tls/tls_main.c | 33 ++++++++++++++++++++++++++++-----
5 files changed, 53 insertions(+), 8 deletions(-)
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 50ced8aba9db..e4b3fb4bb77c 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -354,7 +354,13 @@ static inline void sk_psock_restore_proto(struct sock *sk,
sk->sk_write_space = psock->saved_write_space;
if (psock->sk_proto) {
- sk->sk_prot = psock->sk_proto;
+ struct inet_connection_sock *icsk = inet_csk(sk);
+ bool has_ulp = !!icsk->icsk_ulp_data;
+
+ if (has_ulp)
+ tcp_update_ulp(sk, psock->sk_proto);
+ else
+ sk->sk_prot = psock->sk_proto;
psock->sk_proto = NULL;
}
}
diff --git a/include/net/tcp.h b/include/net/tcp.h
index f42d300f0cfa..c82a23470081 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2103,6 +2103,8 @@ struct tcp_ulp_ops {
/* initialize ulp */
int (*init)(struct sock *sk);
+ /* update ulp */
+ void (*update)(struct sock *sk, struct proto *p);
/* cleanup ulp */
void (*release)(struct sock *sk);
@@ -2114,6 +2116,7 @@ void tcp_unregister_ulp(struct tcp_ulp_ops *type);
int tcp_set_ulp(struct sock *sk, const char *name);
void tcp_get_available_ulp(char *buf, size_t len);
void tcp_cleanup_ulp(struct sock *sk);
+void tcp_update_ulp(struct sock *sk, struct proto *p);
#define MODULE_ALIAS_TCP_ULP(name) \
__MODULE_INFO(alias, alias_userspace, name); \
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 93bffaad2135..6832eeb4b785 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -585,12 +585,12 @@ EXPORT_SYMBOL_GPL(sk_psock_destroy);
void sk_psock_drop(struct sock *sk, struct sk_psock *psock)
{
- rcu_assign_sk_user_data(sk, NULL);
sk_psock_cork_free(psock);
sk_psock_zap_ingress(psock);
- sk_psock_restore_proto(sk, psock);
write_lock_bh(&sk->sk_callback_lock);
+ sk_psock_restore_proto(sk, psock);
+ rcu_assign_sk_user_data(sk, NULL);
if (psock->progs.skb_parser)
sk_psock_stop_strp(sk, psock);
write_unlock_bh(&sk->sk_callback_lock);
diff --git a/net/ipv4/tcp_ulp.c b/net/ipv4/tcp_ulp.c
index 3d8a1d835471..4849edb62d52 100644
--- a/net/ipv4/tcp_ulp.c
+++ b/net/ipv4/tcp_ulp.c
@@ -96,6 +96,19 @@ void tcp_get_available_ulp(char *buf, size_t maxlen)
rcu_read_unlock();
}
+void tcp_update_ulp(struct sock *sk, struct proto *proto)
+{
+ struct inet_connection_sock *icsk = inet_csk(sk);
+
+ if (!icsk->icsk_ulp_ops) {
+ sk->sk_prot = proto;
+ return;
+ }
+
+ if (icsk->icsk_ulp_ops->update)
+ icsk->icsk_ulp_ops->update(sk, proto);
+}
+
void tcp_cleanup_ulp(struct sock *sk)
{
struct inet_connection_sock *icsk = inet_csk(sk);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 48f1c26459d0..f208f8455ef2 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -328,7 +328,10 @@ static void tls_sk_proto_unhash(struct sock *sk)
ctx = tls_get_ctx(sk);
tls_sk_proto_cleanup(sk, ctx, timeo);
+ write_lock_bh(&sk->sk_callback_lock);
icsk->icsk_ulp_data = NULL;
+ sk->sk_prot = ctx->sk_proto;
+ write_unlock_bh(&sk->sk_callback_lock);
if (ctx->sk_proto->unhash)
ctx->sk_proto->unhash(sk);
@@ -337,7 +340,7 @@ static void tls_sk_proto_unhash(struct sock *sk)
static void tls_sk_proto_close(struct sock *sk, long timeout)
{
- void (*sk_proto_close)(struct sock *sk, long timeout);
+ struct inet_connection_sock *icsk = inet_csk(sk);
struct tls_context *ctx = tls_get_ctx(sk);
long timeo = sock_sndtimeo(sk, 0);
bool free_ctx;
@@ -347,12 +350,15 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
lock_sock(sk);
free_ctx = ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW;
- sk_proto_close = ctx->sk_proto_close;
if (ctx->tx_conf != TLS_BASE || ctx->rx_conf != TLS_BASE)
tls_sk_proto_cleanup(sk, ctx, timeo);
+ write_lock_bh(&sk->sk_callback_lock);
+ if (free_ctx)
+ icsk->icsk_ulp_data = NULL;
sk->sk_prot = ctx->sk_proto;
+ write_unlock_bh(&sk->sk_callback_lock);
release_sock(sk);
if (ctx->tx_conf == TLS_SW)
tls_sw_free_ctx_tx(ctx);
@@ -360,7 +366,7 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
tls_sw_strparser_done(ctx);
if (ctx->rx_conf == TLS_SW)
tls_sw_free_ctx_rx(ctx);
- sk_proto_close(sk, timeout);
+ ctx->sk_proto_close(sk, timeout);
if (free_ctx)
tls_ctx_free(ctx);
@@ -827,7 +833,7 @@ static int tls_init(struct sock *sk)
int rc = 0;
if (tls_hw_prot(sk))
- goto out;
+ return 0;
/* The TLS ulp is currently supported only for TCP sockets
* in ESTABLISHED state.
@@ -838,22 +844,38 @@ static int tls_init(struct sock *sk)
if (sk->sk_state != TCP_ESTABLISHED)
return -ENOTSUPP;
+ tls_build_proto(sk);
+
/* allocate tls context */
+ write_lock_bh(&sk->sk_callback_lock);
ctx = create_ctx(sk);
if (!ctx) {
rc = -ENOMEM;
goto out;
}
- tls_build_proto(sk);
ctx->tx_conf = TLS_BASE;
ctx->rx_conf = TLS_BASE;
ctx->sk_proto = sk->sk_prot;
update_sk_prot(sk, ctx);
out:
+ write_unlock_bh(&sk->sk_callback_lock);
return rc;
}
+static void tls_update(struct sock *sk, struct proto *p)
+{
+ struct tls_context *ctx;
+
+ ctx = tls_get_ctx(sk);
+ if (likely(ctx)) {
+ ctx->sk_proto_close = p->close;
+ ctx->sk_proto = p;
+ } else {
+ sk->sk_prot = p;
+ }
+}
+
void tls_register_device(struct tls_device *device)
{
spin_lock_bh(&device_spinlock);
@@ -874,6 +896,7 @@ static struct tcp_ulp_ops tcp_tls_ulp_ops __read_mostly = {
.name = "tls",
.owner = THIS_MODULE,
.init = tls_init,
+ .update = tls_update,
};
static int __init tls_register(void)
--
2.21.0
^ permalink raw reply related
* [PATCH bpf v4 10/14] selftests/tls: add a test for ULP but no keys
From: Jakub Kicinski @ 2019-07-19 17:29 UTC (permalink / raw)
To: john.fastabend, alexei.starovoitov, daniel
Cc: edumazet, netdev, bpf, Jakub Kicinski, Dirk van der Merwe
In-Reply-To: <20190719172927.18181-1-jakub.kicinski@netronome.com>
Make sure we test the TLS_BASE/TLS_BASE case both with data
and the tear down/clean up path.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
tools/testing/selftests/net/tls.c | 74 +++++++++++++++++++++++++++++++
1 file changed, 74 insertions(+)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 090fff9dbc48..194826fee4f7 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -25,6 +25,80 @@
#define TLS_PAYLOAD_MAX_LEN 16384
#define SOL_TLS 282
+#ifndef ENOTSUPP
+#define ENOTSUPP 524
+#endif
+
+FIXTURE(tls_basic)
+{
+ int fd, cfd;
+ bool notls;
+};
+
+FIXTURE_SETUP(tls_basic)
+{
+ struct sockaddr_in addr;
+ socklen_t len;
+ int sfd, ret;
+
+ self->notls = false;
+ len = sizeof(addr);
+
+ addr.sin_family = AF_INET;
+ addr.sin_addr.s_addr = htonl(INADDR_ANY);
+ addr.sin_port = 0;
+
+ self->fd = socket(AF_INET, SOCK_STREAM, 0);
+ sfd = socket(AF_INET, SOCK_STREAM, 0);
+
+ ret = bind(sfd, &addr, sizeof(addr));
+ ASSERT_EQ(ret, 0);
+ ret = listen(sfd, 10);
+ ASSERT_EQ(ret, 0);
+
+ ret = getsockname(sfd, &addr, &len);
+ ASSERT_EQ(ret, 0);
+
+ ret = connect(self->fd, &addr, sizeof(addr));
+ ASSERT_EQ(ret, 0);
+
+ self->cfd = accept(sfd, &addr, &len);
+ ASSERT_GE(self->cfd, 0);
+
+ close(sfd);
+
+ ret = setsockopt(self->fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls"));
+ if (ret != 0) {
+ ASSERT_EQ(errno, ENOTSUPP);
+ self->notls = true;
+ printf("Failure setting TCP_ULP, testing without tls\n");
+ return;
+ }
+
+ ret = setsockopt(self->cfd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls"));
+ ASSERT_EQ(ret, 0);
+}
+
+FIXTURE_TEARDOWN(tls_basic)
+{
+ close(self->fd);
+ close(self->cfd);
+}
+
+/* Send some data through with ULP but no keys */
+TEST_F(tls_basic, base_base)
+{
+ char const *test_str = "test_read";
+ int send_len = 10;
+ char buf[10];
+
+ ASSERT_EQ(strlen(test_str) + 1, send_len);
+
+ EXPECT_EQ(send(self->fd, test_str, send_len, 0), send_len);
+ EXPECT_NE(recv(self->cfd, buf, send_len, 0), -1);
+ EXPECT_EQ(memcmp(buf, test_str, send_len), 0);
+};
+
FIXTURE(tls)
{
int fd, cfd;
--
2.21.0
^ permalink raw reply related
* [PATCH bpf v4 11/14] selftests/tls: test error codes around TLS ULP installation
From: Jakub Kicinski @ 2019-07-19 17:29 UTC (permalink / raw)
To: john.fastabend, alexei.starovoitov, daniel
Cc: edumazet, netdev, bpf, Jakub Kicinski, Dirk van der Merwe
In-Reply-To: <20190719172927.18181-1-jakub.kicinski@netronome.com>
Test the error codes returned when TCP connection is not
in ESTABLISHED state.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
tools/testing/selftests/net/tls.c | 52 +++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 194826fee4f7..10df77326d34 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -911,6 +911,58 @@ TEST_F(tls, control_msg)
EXPECT_EQ(memcmp(buf, test_str, send_len), 0);
}
+TEST(non_established) {
+ struct tls12_crypto_info_aes_gcm_256 tls12;
+ struct sockaddr_in addr;
+ int sfd, ret, fd;
+ socklen_t len;
+
+ len = sizeof(addr);
+
+ memset(&tls12, 0, sizeof(tls12));
+ tls12.info.version = TLS_1_2_VERSION;
+ tls12.info.cipher_type = TLS_CIPHER_AES_GCM_256;
+
+ addr.sin_family = AF_INET;
+ addr.sin_addr.s_addr = htonl(INADDR_ANY);
+ addr.sin_port = 0;
+
+ fd = socket(AF_INET, SOCK_STREAM, 0);
+ sfd = socket(AF_INET, SOCK_STREAM, 0);
+
+ ret = bind(sfd, &addr, sizeof(addr));
+ ASSERT_EQ(ret, 0);
+ ret = listen(sfd, 10);
+ ASSERT_EQ(ret, 0);
+
+ ret = setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls"));
+ EXPECT_EQ(ret, -1);
+ /* TLS ULP not supported */
+ if (errno == ENOENT)
+ return;
+ EXPECT_EQ(errno, ENOTSUPP);
+
+ ret = setsockopt(sfd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls"));
+ EXPECT_EQ(ret, -1);
+ EXPECT_EQ(errno, ENOTSUPP);
+
+ ret = getsockname(sfd, &addr, &len);
+ ASSERT_EQ(ret, 0);
+
+ ret = connect(fd, &addr, sizeof(addr));
+ ASSERT_EQ(ret, 0);
+
+ ret = setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls"));
+ ASSERT_EQ(ret, 0);
+
+ ret = setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls"));
+ EXPECT_EQ(ret, -1);
+ EXPECT_EQ(errno, EEXIST);
+
+ close(fd);
+ close(sfd);
+}
+
TEST(keysizes) {
struct tls12_crypto_info_aes_gcm_256 tls12;
struct sockaddr_in addr;
--
2.21.0
^ permalink raw reply related
* [PATCH bpf v4 12/14] selftests/tls: add a bidirectional test
From: Jakub Kicinski @ 2019-07-19 17:29 UTC (permalink / raw)
To: john.fastabend, alexei.starovoitov, daniel
Cc: edumazet, netdev, bpf, Jakub Kicinski, Dirk van der Merwe
In-Reply-To: <20190719172927.18181-1-jakub.kicinski@netronome.com>
Add a simple test which installs the TLS state for both directions,
sends and receives data on both sockets.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
tools/testing/selftests/net/tls.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 10df77326d34..6d78bd050813 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -684,6 +684,37 @@ TEST_F(tls, recv_lowat)
EXPECT_EQ(memcmp(send_mem, recv_mem + 10, 5), 0);
}
+TEST_F(tls, bidir)
+{
+ struct tls12_crypto_info_aes_gcm_128 tls12;
+ char const *test_str = "test_read";
+ int send_len = 10;
+ char buf[10];
+ int ret;
+
+ memset(&tls12, 0, sizeof(tls12));
+ tls12.info.version = TLS_1_3_VERSION;
+ tls12.info.cipher_type = TLS_CIPHER_AES_GCM_128;
+
+ ret = setsockopt(self->fd, SOL_TLS, TLS_RX, &tls12, sizeof(tls12));
+ ASSERT_EQ(ret, 0);
+
+ ret = setsockopt(self->cfd, SOL_TLS, TLS_TX, &tls12, sizeof(tls12));
+ ASSERT_EQ(ret, 0);
+
+ ASSERT_EQ(strlen(test_str) + 1, send_len);
+
+ EXPECT_EQ(send(self->fd, test_str, send_len, 0), send_len);
+ EXPECT_NE(recv(self->cfd, buf, send_len, 0), -1);
+ EXPECT_EQ(memcmp(buf, test_str, send_len), 0);
+
+ memset(buf, 0, sizeof(buf));
+
+ EXPECT_EQ(send(self->cfd, test_str, send_len, 0), send_len);
+ EXPECT_NE(recv(self->fd, buf, send_len, 0), -1);
+ EXPECT_EQ(memcmp(buf, test_str, send_len), 0);
+};
+
TEST_F(tls, pollin)
{
char const *test_str = "test_poll";
--
2.21.0
^ permalink raw reply related
* [PATCH bpf v4 13/14] selftests/tls: close the socket with open record
From: Jakub Kicinski @ 2019-07-19 17:29 UTC (permalink / raw)
To: john.fastabend, alexei.starovoitov, daniel
Cc: edumazet, netdev, bpf, Jakub Kicinski, Dirk van der Merwe
In-Reply-To: <20190719172927.18181-1-jakub.kicinski@netronome.com>
Add test which sends some data with MSG_MORE and then
closes the socket (never calling send without MSG_MORE).
This should make sure we clean up open records correctly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
tools/testing/selftests/net/tls.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 6d78bd050813..94a86ca882de 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -239,6 +239,16 @@ TEST_F(tls, msg_more)
EXPECT_EQ(memcmp(buf, test_str, send_len), 0);
}
+TEST_F(tls, msg_more_unsent)
+{
+ char const *test_str = "test_read";
+ int send_len = 10;
+ char buf[10];
+
+ EXPECT_EQ(send(self->fd, test_str, send_len, MSG_MORE), send_len);
+ EXPECT_EQ(recv(self->cfd, buf, send_len, MSG_DONTWAIT), -1);
+}
+
TEST_F(tls, sendmsg_single)
{
struct msghdr msg;
--
2.21.0
^ permalink raw reply related
* [PATCH bpf v4 14/14] selftests/tls: add shutdown tests
From: Jakub Kicinski @ 2019-07-19 17:29 UTC (permalink / raw)
To: john.fastabend, alexei.starovoitov, daniel
Cc: edumazet, netdev, bpf, Jakub Kicinski, Dirk van der Merwe
In-Reply-To: <20190719172927.18181-1-jakub.kicinski@netronome.com>
Add test for killing the connection via shutdown.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
---
tools/testing/selftests/net/tls.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 94a86ca882de..630c5b884d43 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -952,6 +952,33 @@ TEST_F(tls, control_msg)
EXPECT_EQ(memcmp(buf, test_str, send_len), 0);
}
+TEST_F(tls, shutdown)
+{
+ char const *test_str = "test_read";
+ int send_len = 10;
+ char buf[10];
+
+ ASSERT_EQ(strlen(test_str) + 1, send_len);
+
+ EXPECT_EQ(send(self->fd, test_str, send_len, 0), send_len);
+ EXPECT_NE(recv(self->cfd, buf, send_len, 0), -1);
+ EXPECT_EQ(memcmp(buf, test_str, send_len), 0);
+
+ shutdown(self->fd, SHUT_RDWR);
+ shutdown(self->cfd, SHUT_RDWR);
+}
+
+TEST_F(tls, shutdown_unsent)
+{
+ char const *test_str = "test_read";
+ int send_len = 10;
+
+ EXPECT_EQ(send(self->fd, test_str, send_len, MSG_MORE), send_len);
+
+ shutdown(self->fd, SHUT_RDWR);
+ shutdown(self->cfd, SHUT_RDWR);
+}
+
TEST(non_established) {
struct tls12_crypto_info_aes_gcm_256 tls12;
struct sockaddr_in addr;
--
2.21.0
^ permalink raw reply related
* [PATCH bpf v4 06/14] bpf: sockmap, sock_map_delete needs to use xchg
From: Jakub Kicinski @ 2019-07-19 17:29 UTC (permalink / raw)
To: john.fastabend, alexei.starovoitov, daniel; +Cc: edumazet, netdev, bpf
In-Reply-To: <20190719172927.18181-1-jakub.kicinski@netronome.com>
From: John Fastabend <john.fastabend@gmail.com>
__sock_map_delete() may be called from a tcp event such as unhash or
close from the following trace,
tcp_bpf_close()
tcp_bpf_remove()
sk_psock_unlink()
sock_map_delete_from_link()
__sock_map_delete()
In this case the sock lock is held but this only protects against
duplicate removals on the TCP side. If the map is free'd then we have
this trace,
sock_map_free
xchg() <- replaces map entry
sock_map_unref()
sk_psock_put()
sock_map_del_link()
The __sock_map_delete() call however uses a read, test, null over the
map entry which can result in both paths trying to free the map
entry.
To fix use xchg in TCP paths as well so we avoid having two references
to the same map entry.
Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
net/core/sock_map.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 52d4faeee18b..28702f2e9a4a 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -276,16 +276,20 @@ static int __sock_map_delete(struct bpf_stab *stab, struct sock *sk_test,
struct sock **psk)
{
struct sock *sk;
+ int err = 0;
raw_spin_lock_bh(&stab->lock);
sk = *psk;
if (!sk_test || sk_test == sk)
- *psk = NULL;
+ sk = xchg(psk, NULL);
+
+ if (likely(sk))
+ sock_map_unref(sk, psk);
+ else
+ err = -EINVAL;
+
raw_spin_unlock_bh(&stab->lock);
- if (unlikely(!sk))
- return -EINVAL;
- sock_map_unref(sk, psk);
- return 0;
+ return err;
}
static void sock_map_delete_from_link(struct bpf_map *map, struct sock *sk,
--
2.21.0
^ permalink raw reply related
* [PATCH net] hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
From: Haiyang Zhang @ 2019-07-19 17:33 UTC (permalink / raw)
To: sashal@kernel.org, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org
Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf@aepfle.de,
vkuznets, davem@davemloft.net, linux-kernel@vger.kernel.org
There is an extra rcu_read_unlock left in netvsc_recv_callback(),
after a previous patch that removes RCU from this function.
This patch removes the extra RCU unlock.
Fixes: 345ac08990b8 ("hv_netvsc: pass netvsc_device to receive callback")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/net/hyperv/netvsc_drv.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index afdcc56..3544e19 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -836,7 +836,6 @@ int netvsc_recv_callback(struct net_device *net,
if (unlikely(!skb)) {
++net_device_ctx->eth_stats.rx_no_memory;
- rcu_read_unlock();
return NVSP_STAT_FAIL;
}
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH bpf v4 00/14] sockmap/tls fixes
From: Jakub Kicinski @ 2019-07-19 17:37 UTC (permalink / raw)
To: john.fastabend, alexei.starovoitov, daniel
Cc: edumazet, netdev, bpf, oss-drivers
In-Reply-To: <20190719172927.18181-1-jakub.kicinski@netronome.com>
On Fri, 19 Jul 2019 10:29:13 -0700, Jakub Kicinski wrote:
> John says:
>
> Resolve a series of splats discovered by syzbot and an unhash
> TLS issue noted by Eric Dumazet.
Sorry for the delay, this code is quite tricky. According to my testing
TLS SW and HW should now work, I hope I didn't regress things on the
sockmap side.
This is not solving all the issues (ugh), apart from HW needing the
unhash/shutdown treatment, as discussed we may have a sender stuck in
wmem wait while we free context underneath. That's a "minor" UAF for
another day..
^ permalink raw reply
* Re: [PATCH] gve: replace kfree with kvfree
From: Catherine Sullivan @ 2019-07-19 17:44 UTC (permalink / raw)
To: David Miller; +Cc: chuhongyuan, netdev, linux-kernel
In-Reply-To: <20190718.163207.289099133864098969.davem@davemloft.net>
On Thu, Jul 18, 2019 at 4:32 PM David Miller <davem@davemloft.net> wrote:
>
> From: Chuhong YUAN <chuhongyuan@outlook.com>
> Date: Wed, 17 Jul 2019 00:59:02 +0000
>
> > Variables allocated by kvzalloc should not be freed by kfree.
> > Because they may be allocated by vmalloc.
> > So we replace kfree with kvfree here.
> >
> > Signed-off-by: Chuhong Yuan <chuhongyuan@outlook.com>
>
> Applied, thanks Chuhong.
>
> GVE maintainers, you are upstream now and have to stay on top of review
> of changes like this. Otherwise I'll just review it myself and apply
> it unless I find problems, and that may not be what you want :)
Will do, thanks for the review. And thanks for the fix Chuhong.
^ permalink raw reply
* Re: [PATCH iproute2-rc v1 0/7] Statistics counter support
From: Stephen Hemminger @ 2019-07-19 17:53 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Leon Romanovsky, netdev, David Ahern, Mark Zhang,
RDMA mailing list
In-Reply-To: <20190717143157.27205-1-leon@kernel.org>
On Wed, 17 Jul 2019 17:31:49 +0300
Leon Romanovsky <leon@kernel.org> wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
>
> Changelog v0->v1:
> * Fixed typo in manual page (Gal)
> * Rebased on top of d035cc1b "ip tunnel: warn when changing IPv6 tunnel without tunnel name"
> * Dropped update header file because it was already merged.
>
> ---------------------------------------------------------------------------------------
>
> Hi,
>
> This is supplementary part of accepted to rdma-next kernel series,
> that kernel series provided an option to get various counters: global
> and per-objects.
>
> Currently, all counters are printed in format similar to other
> device/link properties, while "-p" option will print them in table like
> format.
>
> [leonro@server ~]$ rdma stat show
> link mlx5_0/1 rx_write_requests 0 rx_read_requests 0 rx_atomic_requests
> 0 out_of_buffer 0 duplicate_request 0 rnr_nak_retry_err 0 packet_seq_err
> 0 implied_nak_seq_err 0 local_ack_timeout_err 0 resp_local_length_error
> 0 resp_cqe_error 0 req_cqe_error 0 req_remote_invalid_request 0
> req_remote_access_errors 0 resp_remote_access_errors 0
> resp_cqe_flush_error 0 req_cqe_flush_error 0 rp_cnp_ignored 0
> rp_cnp_handled 0 np_ecn_marked_roce_packets 0 np_cnp_sent 0
>
> [leonro@server ~]$ rdma stat show -p
> link mlx5_0/1
> rx_write_requests 0
> rx_read_requests 0
> rx_atomic_requests 0
> out_of_buffer 0
> duplicate_request 0
> rnr_nak_retry_err 0
> packet_seq_err 0
> implied_nak_seq_err 0
> local_ack_timeout_err 0
> resp_local_length_error 0
> resp_cqe_error 0
> req_cqe_error 0
> req_remote_invalid_request 0
> req_remote_access_errors 0
> resp_remote_access_errors 0
> resp_cqe_flush_error 0
> req_cqe_flush_error 0
> rp_cnp_ignored 0
> rp_cnp_handled 0
> np_ecn_marked_roce_packets 0
> np_cnp_sent 0
>
> Thanks
>
> Mark Zhang (7):
> rdma: Add "stat qp show" support
> rdma: Add get per-port counter mode support
> rdma: Add rdma statistic counter per-port auto mode support
> rdma: Make get_port_from_argv() returns valid port in strict port mode
> rdma: Add stat manual mode support
> rdma: Add default counter show support
> rdma: Document counter statistic
>
> man/man8/rdma-dev.8 | 1 +
> man/man8/rdma-link.8 | 1 +
> man/man8/rdma-resource.8 | 1 +
> man/man8/rdma-statistic.8 | 167 +++++++++
> man/man8/rdma.8 | 7 +-
> rdma/Makefile | 2 +-
> rdma/rdma.c | 3 +-
> rdma/rdma.h | 1 +
> rdma/stat.c | 759 ++++++++++++++++++++++++++++++++++++++
> rdma/utils.c | 17 +-
> 10 files changed, 954 insertions(+), 5 deletions(-)
> create mode 100644 man/man8/rdma-statistic.8
> create mode 100644 rdma/stat.c
>
> --
> 2.20.1
>
Applied now, thanks
^ permalink raw reply
* Re: [PATCH iproute2] json: fix backslash escape typo in jsonw_puts
From: Stephen Hemminger @ 2019-07-19 17:54 UTC (permalink / raw)
To: Ivan Delalande; +Cc: David Ahern, netdev
In-Reply-To: <20190718011531.GA29129@visor>
On Wed, 17 Jul 2019 18:15:31 -0700
Ivan Delalande <colona@arista.com> wrote:
> Fixes: fcc16c22 ("provide common json output formatter")
> Signed-off-by: Ivan Delalande <colona@arista.com>
Applied.
^ permalink raw reply
* Re: [PATCH bpf] libbpf: fix missing __WORDSIZE definition
From: Andrii Nakryiko @ 2019-07-19 17:54 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Andrii Nakryiko, bpf, Networking, Daniel Borkmann,
Alexei Starovoitov, Kernel Team, Jiri Olsa, Namhyung Kim
In-Reply-To: <20190719011644.GN3624@kernel.org>
On Thu, Jul 18, 2019 at 6:16 PM Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
>
> Em Thu, Jul 18, 2019 at 02:16:29PM -0700, Andrii Nakryiko escreveu:
> > On Thu, Jul 18, 2019 at 12:14 PM Arnaldo Carvalho de Melo
> > <arnaldo.melo@gmail.com> wrote:
> > >
> > > Em Thu, Jul 18, 2019 at 03:56:19PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > I'll stop and replace my patch with yours to see if it survives all the
> > > > test builds...
> > >
> > > So, Alpine:3.4, the first image for this distro I did when I started
> > > these builds, survives the 6 builds with gcc and clang with your patch:
> > >
> > >
> > > [perfbuilder@quaco linux-perf-tools-build]$ export PERF_TARBALL=http://192.168.124.1/perf/perf-5.2.0.tar.xz
> > > [perfbuilder@quaco linux-perf-tools-build]$ dm
> > > 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
> > >
> > >
> > > [perfbuilder@quaco linux-perf-tools-build]$ grep "+ make" dm.log/alpine\:3.4
> > > + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= -C /git/linux/tools/perf O=/tmp/build/perf
> > > + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= NO_LIBELF=1 -C /git/linux/tools/perf O=/tmp/build/perf
> > > + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= -C /git/linux/tools/perf O=/tmp/build/perf CC=clang
> > > + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= NO_LIBELF=1 -C /git/linux/tools/perf O=/tmp/build/perf CC=clang
> > > + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= LIBCLANGLLVM=1 -C /git/linux/tools/perf O=/tmp/build/perf CC=clang
> > > + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= LIBCLANGLLVM=1 -C /git/linux/tools/perf O=/tmp/build/perf
> > > [perfbuilder@quaco linux-perf-tools-build]$
> > >
> > > Probably all the rest will go well, will let you know.
> > >
> > > Daniel, do you mind if I carry this one in my perf/core branch? Its
> > > small and shouldn't clash with other patches, I think. It should go
> > > upstream soon:
> > >
> > > Andrii, there are these others:
> >
> > I took a look at them, but I think it would be better, if you could
> > post them as proper patches to
> > bpf@vger.kernel.org/netdev@vger.kernel.org, so that others can check
> > and comment, if necessary.
> >
> > One nit for all three of them: we typically prefix subject with just
> > "libbpf: " instead of "tools lib libbpf".
>
> Sure, that was mechanic, I do it like that for the patches I upstream,
> and that was like that in the beginning:
>
> [acme@quaco perf]$ git log --oneline tools/lib/bpf | grep lib | tail
> 9d759a9b4ac2 tools lib bpf: Collect map definition in bpf_object
> d8ad6a15cc3a tools lib bpf: Don't do a feature check when cleaning
> 6371ca3b541c bpf tools: Improve libbpf error reporting
> 0c77c04aa9c2 tools lib bpf: Change FEATURE-DUMP to FEATURE-DUMP.libbpf
> 715f8db9102f tools lib bpf: Fix compiler warning on CentOS 6
> 7c422f557266 tools build: Build fixdep helper from perf and basic libs
> 65f041bee783 tools lib bpf: Use FEATURE_USER to allow building in the same dir as perf
> 20517cd9c593 tools lib bpf: Fix up FEATURE_{TESTS,DISPLAY} usage
> cc4228d57c4c bpf tools: Check endianness and make libbpf fail early
> 1b76c13e4b36 bpf tools: Introduce 'bpf' library and add bpf feature check
> [acme@quaco perf]$
>
> Anyway, I'll resubmit the patches that you acked to bpf@vger and will
> let my container tests fail for those cases, sticking a warning so that
> Ingo knows that this is being dealt with and those problems will get
> fixed soon when the bpf tree merges upstream.
Great, thanks!
>
> > >
> > > 8dfb6ed300bf tools lib bpf: Avoid designated initializers for unnamed union members
> >
> > + attr.sample_period = attr.wakeup_events = 1;
> >
> > let's instead
> >
> > + attr.sample_period = 1;
> > + attr.wakeup_events = 1;
> >
> > I don't like multi-assignments.
>
> Meh, what's wrong with it? :)
Nothing, objectively :) But I don't remember seeing multi-assignments
in libbpf code base, so nitpicking for consistency's sake....
>
> > Also, if we are doing explicit assignment, let's do it for all the
> > fields, not split initialization like that.
>
> If that is what takes to get it to build everywhere, no problem. In
> tools/perf I'm used to doing it, documents that this is an oddity to
> support more systems :)
>
> > > 80f7f8f21441 tools lib bpf: Avoid using 'link' as it shadows a global definition in some systems
> >
> > For this one I'm confused. What compiler/system you are getting it on?
>
> > I tried to reproduce it with this example (trying both global
> > variable, as well as function):
> >
> > #include <stdio.h>
> >
> > //int link = 1;
> > void link() {}
> >
> > int f(int link) {
> > return link;
> > }
> > int main() {
> > printf("%d\n", f(123));
> > return 0;
> > }
> >
> > I haven't gotten any errors nor warnings. I'm certainly liking
> > existing naming better, but my main concern is that we'll probably add
> > more code like this, and we'll forget about this problem and will
> > re-introduce.
>
> yeah, this happens from time to time with centos:6 and IIRC
> amazonlinux:1, oraclelinux:6.
>
> I still remember when I got reports from the twitter guys when something
> broke on rhel:5, that was the main reason to get these container tests
> in place, you never know where people are using this, and since before
> upstreaming I do the tests, fixing those became second nature 8-)
>
> > So I'd rather figure out why it's happening and some rare system and
> > see if we can mitigate that without all the renames.
Ok, did some more googling. This warning (turned error in your setup)
is emitted when -Wshadow option is enabled for GCC/clang. It appears
to be disabled by default, so it must be enabled somewhere for perf
build or something.
Would it be possible to disable it at least for libbpf when building
from perf either everywhere or for those systems where you see this
warning? I don't think this warning is useful, to be honest, just
random name conflict between any local and global variables will cause
this.
> >
> >
> > > d93fe741e291 tools lib bpf: Fix endianness macro usage for some compilers
> >
> > This one looks good to me, thanks!
> >
> > Acked-by: Andrii Nakryiko <andriin@fb.com>
>
> Ok, will collect this, change the prefix to: "libbpf: ..." and send to
> bpf@vger
>
> >
> > >
> > > If you could take a look at them at my tmp.perf/core branch at:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/log/?h=tmp.perf/core
> > >
> > > I'm force pushing it now to replace my __WORDSIZE patch with yours, the
> > > SHAs should be the 3 above and the one below.
> > >
> > > And to build cleanly I had to cherry pick this one:
> > >
> > > 3091dafc1884 (HEAD -> perf/core) libbpf: fix ptr to u64 conversion warning on 32-bit platforms
> > >
> > > Alpine:3.5 just finished:
> > >
> > > 2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final)
> > >
> > > - Arnaldo
>
> --
>
> - Arnaldo
^ permalink raw reply
* Re: [PATCH bpf] libbpf: fix missing __WORDSIZE definition
From: Arnaldo Carvalho de Melo @ 2019-07-19 18:14 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Arnaldo Carvalho de Melo, Andrii Nakryiko, bpf, Networking,
Daniel Borkmann, Alexei Starovoitov, Kernel Team, Jiri Olsa,
Namhyung Kim
In-Reply-To: <CAEf4BzaKDTnqe4QYebNSoCLfhcUJbhzgXC5sG+y+c4JLc9PFqg@mail.gmail.com>
Em Fri, Jul 19, 2019 at 10:54:44AM -0700, Andrii Nakryiko escreveu:
> On Thu, Jul 18, 2019 at 6:16 PM Arnaldo Carvalho de Melo
> <arnaldo.melo@gmail.com> wrote:
> >
> > Em Thu, Jul 18, 2019 at 02:16:29PM -0700, Andrii Nakryiko escreveu:
> > > On Thu, Jul 18, 2019 at 12:14 PM Arnaldo Carvalho de Melo
> > > <arnaldo.melo@gmail.com> wrote:
> > > >
> > > > Em Thu, Jul 18, 2019 at 03:56:19PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > > I'll stop and replace my patch with yours to see if it survives all the
> > > > > test builds...
> > > >
> > > > So, Alpine:3.4, the first image for this distro I did when I started
> > > > these builds, survives the 6 builds with gcc and clang with your patch:
> > > >
> > > >
> > > > [perfbuilder@quaco linux-perf-tools-build]$ export PERF_TARBALL=http://192.168.124.1/perf/perf-5.2.0.tar.xz
> > > > [perfbuilder@quaco linux-perf-tools-build]$ dm
> > > > 1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
> > > >
> > > >
> > > > [perfbuilder@quaco linux-perf-tools-build]$ grep "+ make" dm.log/alpine\:3.4
> > > > + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= -C /git/linux/tools/perf O=/tmp/build/perf
> > > > + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= NO_LIBELF=1 -C /git/linux/tools/perf O=/tmp/build/perf
> > > > + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= -C /git/linux/tools/perf O=/tmp/build/perf CC=clang
> > > > + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= NO_LIBELF=1 -C /git/linux/tools/perf O=/tmp/build/perf CC=clang
> > > > + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= LIBCLANGLLVM=1 -C /git/linux/tools/perf O=/tmp/build/perf CC=clang
> > > > + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= LIBCLANGLLVM=1 -C /git/linux/tools/perf O=/tmp/build/perf
> > > > [perfbuilder@quaco linux-perf-tools-build]$
> > > >
> > > > Probably all the rest will go well, will let you know.
> > > >
> > > > Daniel, do you mind if I carry this one in my perf/core branch? Its
> > > > small and shouldn't clash with other patches, I think. It should go
> > > > upstream soon:
> > > >
> > > > Andrii, there are these others:
> > >
> > > I took a look at them, but I think it would be better, if you could
> > > post them as proper patches to
> > > bpf@vger.kernel.org/netdev@vger.kernel.org, so that others can check
> > > and comment, if necessary.
> > >
> > > One nit for all three of them: we typically prefix subject with just
> > > "libbpf: " instead of "tools lib libbpf".
> >
> > Sure, that was mechanic, I do it like that for the patches I upstream,
> > and that was like that in the beginning:
> >
> > [acme@quaco perf]$ git log --oneline tools/lib/bpf | grep lib | tail
> > 9d759a9b4ac2 tools lib bpf: Collect map definition in bpf_object
> > d8ad6a15cc3a tools lib bpf: Don't do a feature check when cleaning
> > 6371ca3b541c bpf tools: Improve libbpf error reporting
> > 0c77c04aa9c2 tools lib bpf: Change FEATURE-DUMP to FEATURE-DUMP.libbpf
> > 715f8db9102f tools lib bpf: Fix compiler warning on CentOS 6
> > 7c422f557266 tools build: Build fixdep helper from perf and basic libs
> > 65f041bee783 tools lib bpf: Use FEATURE_USER to allow building in the same dir as perf
> > 20517cd9c593 tools lib bpf: Fix up FEATURE_{TESTS,DISPLAY} usage
> > cc4228d57c4c bpf tools: Check endianness and make libbpf fail early
> > 1b76c13e4b36 bpf tools: Introduce 'bpf' library and add bpf feature check
> > [acme@quaco perf]$
> >
> > Anyway, I'll resubmit the patches that you acked to bpf@vger and will
> > let my container tests fail for those cases, sticking a warning so that
> > Ingo knows that this is being dealt with and those problems will get
> > fixed soon when the bpf tree merges upstream.
>
> Great, thanks!
>
> >
> > > >
> > > > 8dfb6ed300bf tools lib bpf: Avoid designated initializers for unnamed union members
> > >
> > > + attr.sample_period = attr.wakeup_events = 1;
> > >
> > > let's instead
> > >
> > > + attr.sample_period = 1;
> > > + attr.wakeup_events = 1;
> > >
> > > I don't like multi-assignments.
> >
> > Meh, what's wrong with it? :)
>
> Nothing, objectively :) But I don't remember seeing multi-assignments
> in libbpf code base, so nitpicking for consistency's sake....
>
>
> >
> > > Also, if we are doing explicit assignment, let's do it for all the
> > > fields, not split initialization like that.
> >
> > If that is what takes to get it to build everywhere, no problem. In
> > tools/perf I'm used to doing it, documents that this is an oddity to
> > support more systems :)
> >
> > > > 80f7f8f21441 tools lib bpf: Avoid using 'link' as it shadows a global definition in some systems
> > >
> > > For this one I'm confused. What compiler/system you are getting it on?
> >
> > > I tried to reproduce it with this example (trying both global
> > > variable, as well as function):
> > >
> > > #include <stdio.h>
> > >
> > > //int link = 1;
> > > void link() {}
> > >
> > > int f(int link) {
> > > return link;
> > > }
> > > int main() {
> > > printf("%d\n", f(123));
> > > return 0;
> > > }
> > >
> > > I haven't gotten any errors nor warnings. I'm certainly liking
> > > existing naming better, but my main concern is that we'll probably add
> > > more code like this, and we'll forget about this problem and will
> > > re-introduce.
> >
> > yeah, this happens from time to time with centos:6 and IIRC
> > amazonlinux:1, oraclelinux:6.
> >
> > I still remember when I got reports from the twitter guys when something
> > broke on rhel:5, that was the main reason to get these container tests
> > in place, you never know where people are using this, and since before
> > upstreaming I do the tests, fixing those became second nature 8-)
> >
> > > So I'd rather figure out why it's happening and some rare system and
> > > see if we can mitigate that without all the renames.
>
> Ok, did some more googling. This warning (turned error in your setup)
> is emitted when -Wshadow option is enabled for GCC/clang. It appears
> to be disabled by default, so it must be enabled somewhere for perf
> build or something.
Right, I came to the exact same conclusion, doing tests here:
[perfbuilder@3a58896a648d tmp]$ gcc -Wshadow shadow_global_decl.c -o shadow_global_decl
shadow_global_decl.c: In function 'main':
shadow_global_decl.c:9: warning: declaration of 'link' shadows a global declaration
shadow_global_decl.c:4: warning: shadowed declaration is here
[perfbuilder@3a58896a648d tmp]$ gcc --version |& head -1
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
[perfbuilder@3a58896a648d tmp]$ gcc shadow_global_decl.c -o shadow_global_decl
[perfbuilder@3a58896a648d tmp]$
So I'm going to remove this warning from the places where it causes
problems.
> Would it be possible to disable it at least for libbpf when building
> from perf either everywhere or for those systems where you see this
> warning? I don't think this warning is useful, to be honest, just
> random name conflict between any local and global variables will cause
> this.
Yeah, I might end up having this applied.
[acme@quaco perf]$ git diff
diff --git a/tools/scripts/Makefile.include b/tools/scripts/Makefile.include
index 495066bafbe3..b6e902a2312f 100644
--- a/tools/scripts/Makefile.include
+++ b/tools/scripts/Makefile.include
@@ -32,7 +32,6 @@ EXTRA_WARNINGS += -Wno-system-headers
EXTRA_WARNINGS += -Wold-style-definition
EXTRA_WARNINGS += -Wpacked
EXTRA_WARNINGS += -Wredundant-decls
-EXTRA_WARNINGS += -Wshadow
EXTRA_WARNINGS += -Wstrict-prototypes
EXTRA_WARNINGS += -Wswitch-default
EXTRA_WARNINGS += -Wswitch-enum
[acme@quaco perf]$
Sorry for the noise...
- Arnaldo
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox