* Re: [PATCH net] tcp: restore autocorking
From: Soheil Hassas Yeganeh @ 2018-05-03 14:52 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David S . Miller, netdev, Michael Wenig, Eric Dumazet
In-Reply-To: <20180503032513.210324-1-edumazet@google.com>
On Wed, May 2, 2018 at 11:25 PM, Eric Dumazet <edumazet@google.com> wrote:
> Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Michael Wenig <mwenig@vmware.com>
> Tested-by: Michael Wenig <mwenig@vmware.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Thank you for catching and fixing this!
^ permalink raw reply
* Re: [PATCH bpf-next v3 00/15] Introducing AF_XDP support
From: David Miller @ 2018-05-03 15:07 UTC (permalink / raw)
To: bjorn.topel
Cc: magnus.karlsson, alexander.h.duyck, alexander.duyck,
john.fastabend, ast, brouer, willemdebruijn.kernel, daniel, mst,
netdev, bjorn.topel, michael.lundkvist, jesse.brandeburg,
anjali.singhai, qi.z.zhang
In-Reply-To: <20180502110136.3738-1-bjorn.topel@gmail.com>
From: Björn Töpel <bjorn.topel@gmail.com>
Date: Wed, 2 May 2018 13:01:21 +0200
> This patch set introduces a new address family called AF_XDP that is
> optimized for high performance packet processing and, in upcoming
> patch sets, zero-copy semantics. In this patch set, we have removed
> all zero-copy related code in order to make it smaller, simpler and
> hopefully more review friendly. This patch set only supports copy-mode
> for the generic XDP path (XDP_SKB) for both RX and TX and copy-mode
> for RX using the XDP_DRV path. Zero-copy support requires XDP and
> driver changes that Jesper Dangaard Brouer is working on. Some of his
> work has already been accepted. We will publish our zero-copy support
> for RX and TX on top of his patch sets at a later point in time.
...
Looks great.
Acked-by: David S. Miller <davem@davemloft.net>
^ permalink raw reply
* Do You Need A Hand?
From: Mavis Wanczyk @ 2018-05-03 9:04 UTC (permalink / raw)
I am Mavis Wanczyk i know you may not know me but am the latest
largest US Powerball lottery winner of $758.7m just of recent, am
currently helping out people in need of financial assistance, i know
it's hard to believe anything on the internet, so if you don't need
my help please don't reply to this message.
Regards
Mavis Wanczyk
^ permalink raw reply
* Re: [PATCH net-next 0/2] Update csum tc action for batch operation.
From: David Miller @ 2018-05-03 15:16 UTC (permalink / raw)
To: cdillaba; +Cc: netdev, jhs, xiyou.wangcong
In-Reply-To: <1525184264-9436-1-git-send-email-cdillaba@mojatatu.com>
From: Craig Dillabaugh <cdillaba@mojatatu.com>
Date: Tue, 1 May 2018 10:17:42 -0400
> This patchset includes two patches the first updating act_csum.c
> to include the get_fill_size routine required for batch operation, and
> the second including updated TDC tests for the feature.
Series applied.
^ permalink raw reply
* [PATCH] qed: fix spelling mistake: "offloded" -> "offloaded"
From: Colin King @ 2018-05-03 15:19 UTC (permalink / raw)
To: Ariel Elior, everest-linux-l2, David S . Miller, netdev
Cc: kernel-janitors, linux-kernel
From: Colin Ian King <colin.king@canonical.com>
Trivial fix to spelling mistake in DP_NOTICE message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
drivers/net/ethernet/qlogic/qed/qed_roce.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
index fb7c2d1562ae..6acfd43c1a4f 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
@@ -848,7 +848,7 @@ int qed_roce_query_qp(struct qed_hwfn *p_hwfn,
if (!(qp->resp_offloaded)) {
DP_NOTICE(p_hwfn,
- "The responder's qp should be offloded before requester's\n");
+ "The responder's qp should be offloaded before requester's\n");
return -EINVAL;
}
--
2.17.0
^ permalink raw reply related
* Re: [RFC][PATCH bpf] tools: bpftool: Fix tags for bpf-to-bpf calls
From: Naveen N. Rao @ 2018-05-03 15:20 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Sandipan Das
Cc: jakub.kicinski, linuxppc-dev, netdev, Michael Ellerman
In-Reply-To: <415b415e-f47f-082c-1bc9-87d3e9d3aed1__9575.16645216874$1520270545$gmane$org@ fb.com>
Alexei Starovoitov wrote:
> On 3/1/18 12:51 AM, Naveen N. Rao wrote:
>> Daniel Borkmann wrote:
>>>
>>> Worst case if there's nothing better, potentially what one could do in
>>> bpf_prog_get_info_by_fd() is to dump an array of full addresses and
>>> have the imm part as the index pointing to one of them, just unfortunate
>>> that it's likely only needed in ppc64.
>>
>> Ok. We seem to have discussed a few different aspects in this thread.
>> Let me summarize the different aspects we have discussed:
>> 1. Passing address of JIT'ed function to the JIT engines:
>> Two approaches discussed:
>> a. Existing approach, where the subprog address is encoded as an
>> offset from __bpf_call_base() in imm32 field of the BPF call
>> instruction. This requires the JIT'ed function to be within 2GB of
>> __bpf_call_base(), which won't be true on ppc64, at the least. So,
>> this won't on ppc64 (and any other architectures where vmalloc'ed
>> (module_alloc()) memory is from a different, far, address range).
>
> it looks like ppc64 doesn't guarantee today that all of module_alloc()
> will be within 32-bit, but I think it should be trivial to add such
> guarantee. If so, we can define another __bpf_call_base specifically
> for bpf-to-bpf calls when jit is on.
Ok, we prefer not to do that for powerpc (atleast, not for all of
module_alloc()) at this point.
And since option (c) below is not preferable, I think we will implement
what Daniel suggested above. This patchset already handles communicating
the BPF function addresses to the JIT engine, and enhancing
bpf_prog_get_info_by_fd() should address the concerns with bpftool.
- Naveen
^ permalink raw reply
* Re: [PATCH net-next v7 1/7] sched: Add Common Applications Kept Enhanced (cake) qdisc
From: David Miller @ 2018-05-03 15:24 UTC (permalink / raw)
To: toke; +Cc: netdev, cake
In-Reply-To: <152527386316.14936.5409621935637217368.stgit@alrua-kau>
From: Toke Høiland-Jørgensen <toke@toke.dk>
Date: Wed, 02 May 2018 17:11:03 +0200
> diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
> new file mode 100644
> index 000000000000..18bc147f12bc
> --- /dev/null
> +++ b/net/sched/sch_cake.c
> +static inline cobalt_time_t cobalt_get_time(void)
> +{
> + return ktime_get_ns();
> +}
Please do not use inline in foo.c files, let the compiler decide.
> +static inline u32
> +cake_hash(struct cake_tin_data *q, const struct sk_buff *skb, int flow_mode)
> +{
Especially for this monster! Yikes!
^ permalink raw reply
* Re: [PATCH net,stable] qmi_wwan: do not steal interfaces from class drivers
From: David Miller @ 2018-05-03 15:25 UTC (permalink / raw)
To: bjorn; +Cc: netdev, linux-usb
In-Reply-To: <20180502202254.3021-1-bjorn@mork.no>
From: Bjørn Mork <bjorn@mork.no>
Date: Wed, 2 May 2018 22:22:54 +0200
> The USB_DEVICE_INTERFACE_NUMBER matching macro assumes that
> the { vendorid, productid, interfacenumber } set uniquely
> identifies one specific function. This has proven to fail
> for some configurable devices. One example is the Quectel
> EM06/EP06 where the same interface number can be either
> QMI or MBIM, without the device ID changing either.
>
> Fix by requiring the vendor-specific class for interface number
> based matching. Functions of other classes can and should use
> class based matching instead.
>
> Fixes: 03304bcb5ec4 ("net: qmi_wwan: use fixed interface number matching")
> Signed-off-by: Bjørn Mork <bjorn@mork.no>
> ---
> It's quite possible that the fix should be integrated in the
> USB_DEVICE_INTERFACE_NUMBER macro instead. But that has grown a few
> other users since it was added, so changing it now seems risky.
> Another option is of course adding a new match macro with the
> USB_CLASS_VENDOR_SPEC match integrated. Maybe best?
>
> But I'm proposing this as-is for now, since this quickfix seems most
> suitable for stable backporting.
Yes, this simpler approache is better for net and -stable.
Applied.
If you want to do something more sophisticated, that can be done
in net-next.
Thanks.
^ permalink raw reply
* Re: [PATCH net] rds: do not leak kernel memory to user land
From: David Miller @ 2018-05-03 15:26 UTC (permalink / raw)
To: edumazet; +Cc: netdev, eric.dumazet, santosh.shilimkar, linux-rdma
In-Reply-To: <20180502215339.117702-1-edumazet@google.com>
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 2 May 2018 14:53:39 -0700
> syzbot/KMSAN reported an uninit-value in put_cmsg(), originating
> from rds_cmsg_recv().
>
> Simply clear the structure, since we have holes there, or since
> rx_traces might be smaller than RDS_MSG_RX_DGRAM_TRACE_MAX.
...
> Fixes: 3289025aedc0 ("RDS: add receive message trace used by application")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>
Applied and queued up for -stable.
^ permalink raw reply
* Re: [PATCH net-next] inet: add bound ports statistic
From: David Miller @ 2018-05-03 15:27 UTC (permalink / raw)
To: stephen; +Cc: netdev, sthemmin
In-Reply-To: <20180502221108.3261-1-sthemmin@microsoft.com>
From: Stephen Hemminger <stephen@networkplumber.org>
Date: Wed, 2 May 2018 15:11:08 -0700
> +int inet_bind_bucket_count(struct proto *prot);
...
> +/* Count how many any entries are in the bind hash table */
> +unsigned int inet_bind_bucket_count(struct proto *prot)
If it doesn't build, it definitely wasn't tested.
Right? :)
^ permalink raw reply
* Re: [PATCH net-next v7 1/7] sched: Add Common Applications Kept Enhanced (cake) qdisc
From: Toke Høiland-Jørgensen @ 2018-05-03 15:28 UTC (permalink / raw)
To: David Miller; +Cc: netdev, cake
In-Reply-To: <20180503.112402.774278226377528271.davem@davemloft.net>
David Miller <davem@davemloft.net> writes:
> From: Toke Høiland-Jørgensen <toke@toke.dk>
> Date: Wed, 02 May 2018 17:11:03 +0200
>
>> diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
>> new file mode 100644
>> index 000000000000..18bc147f12bc
>> --- /dev/null
>> +++ b/net/sched/sch_cake.c
>> +static inline cobalt_time_t cobalt_get_time(void)
>> +{
>> + return ktime_get_ns();
>> +}
>
> Please do not use inline in foo.c files, let the compiler decide.
Right, will fix :)
>> +static inline u32
>> +cake_hash(struct cake_tin_data *q, const struct sk_buff *skb, int flow_mode)
>> +{
>
> Especially for this monster! Yikes!
Fair point...
^ permalink raw reply
* Re: [PATCH net-next] ip6_gre: correct the function name in ip6gre_tnl_addr_conflict() comment
From: David Miller @ 2018-05-03 15:29 UTC (permalink / raw)
To: sunlw.fnst; +Cc: netdev
In-Reply-To: <20180503013429.4330-1-sunlw.fnst@cn.fujitsu.com>
From: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
Date: Thu, 3 May 2018 09:34:29 +0800
> The function name is wrong in ip6gre_tnl_addr_conflict() comment, which
> use ip6_tnl_addr_conflict instead of ip6gre_tnl_addr_conflict.
>
> Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
Applied.
^ permalink raw reply
* Re: [PATCH net] tcp: restore autocorking
From: David Miller @ 2018-05-03 15:29 UTC (permalink / raw)
To: edumazet; +Cc: netdev, mwenig, eric.dumazet
In-Reply-To: <20180503032513.210324-1-edumazet@google.com>
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 2 May 2018 20:25:13 -0700
> When adding rb-tree for TCP retransmit queue, we inadvertently broke
> TCP autocorking.
>
> tcp_should_autocork() should really check if the rtx queue is not empty.
>
> Tested:
...
> Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Michael Wenig <mwenig@vmware.com>
> Tested-by: Michael Wenig <mwenig@vmware.com>
Applied and queued up for -stable, thanks Eric.
^ permalink raw reply
* Re: [bpf-next v1 0/9] bpf: Add helper to do FIB lookups
From: David Miller @ 2018-05-03 15:40 UTC (permalink / raw)
To: dsahern; +Cc: netdev, borkmann, ast, shm, roopa, brouer, toke, john.fastabend
In-Reply-To: <20180503035319.18290-1-dsahern@gmail.com>
From: David Ahern <dsahern@gmail.com>
Date: Wed, 2 May 2018 20:53:10 -0700
> Provide a helper for doing a FIB and neighbor lookup in the kernel
> tables from an XDP program. The helper provides a fastpath for forwarding
> packets. If the packet is a local delivery or for any reason is not a
> simple lookup and forward, the packet is expected to continue up the stack
> for full processing.
...
Great work David.
Acked-by: David S. Miller <davem@davemloft.net>
^ permalink raw reply
* [PATCH net 1/3] net/smc: call consolidation
From: Ursula Braun @ 2018-05-03 15:57 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180503155739.55616-1-ubraun@linux.ibm.com>
From: Karsten Graul <kgraul@linux.ibm.com>
Consolidate the call to smc_wr_reg_send() in a new function.
No functional changes.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
net/smc/af_smc.c | 35 +++++++++++++++--------------------
1 file changed, 15 insertions(+), 20 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 8b4c059bd13b..fdb2976117bd 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -292,6 +292,15 @@ static void smc_copy_sock_settings_to_smc(struct smc_sock *smc)
smc_copy_sock_settings(&smc->sk, smc->clcsock->sk, SK_FLAGS_CLC_TO_SMC);
}
+/* register a new rmb */
+static int smc_reg_rmb(struct smc_link *link, struct smc_buf_desc *rmb_desc)
+{
+ /* register memory region for new rmb */
+ if (smc_wr_reg_send(link, rmb_desc->mr_rx[SMC_SINGLE_LINK]))
+ return -EFAULT;
+ return 0;
+}
+
static int smc_clnt_conf_first_link(struct smc_sock *smc)
{
struct smc_link_group *lgr = smc->conn.lgr;
@@ -321,9 +330,7 @@ static int smc_clnt_conf_first_link(struct smc_sock *smc)
smc_wr_remember_qp_attr(link);
- rc = smc_wr_reg_send(link,
- smc->conn.rmb_desc->mr_rx[SMC_SINGLE_LINK]);
- if (rc)
+ if (smc_reg_rmb(link, smc->conn.rmb_desc))
return SMC_CLC_DECL_INTERR;
/* send CONFIRM LINK response over RoCE fabric */
@@ -473,13 +480,8 @@ static int smc_connect_rdma(struct smc_sock *smc)
goto decline_rdma_unlock;
}
} else {
- struct smc_buf_desc *buf_desc = smc->conn.rmb_desc;
-
- if (!buf_desc->reused) {
- /* register memory region for new rmb */
- rc = smc_wr_reg_send(link,
- buf_desc->mr_rx[SMC_SINGLE_LINK]);
- if (rc) {
+ if (!smc->conn.rmb_desc->reused) {
+ if (smc_reg_rmb(link, smc->conn.rmb_desc)) {
reason_code = SMC_CLC_DECL_INTERR;
goto decline_rdma_unlock;
}
@@ -719,9 +721,7 @@ static int smc_serv_conf_first_link(struct smc_sock *smc)
link = &lgr->lnk[SMC_SINGLE_LINK];
- rc = smc_wr_reg_send(link,
- smc->conn.rmb_desc->mr_rx[SMC_SINGLE_LINK]);
- if (rc)
+ if (smc_reg_rmb(link, smc->conn.rmb_desc))
return SMC_CLC_DECL_INTERR;
/* send CONFIRM LINK request to client over the RoCE fabric */
@@ -854,13 +854,8 @@ static void smc_listen_work(struct work_struct *work)
smc_rx_init(new_smc);
if (local_contact != SMC_FIRST_CONTACT) {
- struct smc_buf_desc *buf_desc = new_smc->conn.rmb_desc;
-
- if (!buf_desc->reused) {
- /* register memory region for new rmb */
- rc = smc_wr_reg_send(link,
- buf_desc->mr_rx[SMC_SINGLE_LINK]);
- if (rc) {
+ if (!new_smc->conn.rmb_desc->reused) {
+ if (smc_reg_rmb(link, new_smc->conn.rmb_desc)) {
reason_code = SMC_CLC_DECL_INTERR;
goto decline_rdma_unlock;
}
--
2.13.5
^ permalink raw reply related
* [PATCH net 0/3] net/smc: fixes 2018/05/03
From: Ursula Braun @ 2018-05-03 15:57 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
From: Ursula Braun <ursula.braun@de.ibm.com>
Dave,
here are smc fixes for 2 problems:
* receive buffers in SMC must be registered. If registration fails
these buffers must not be kept within the link group for reuse.
Patch 1 is a preparational patch; patch 2 contains the fix.
* sendpage: do not hold the sock lock when calling kernel_sendpage()
or sock_no_sendpage()
Thanks, Ursula
Karsten Graul (2):
net/smc: call consolidation
net/smc: handle unregistered buffers
Stefan Raspl (1):
smc: fix sendpage() call
net/smc/af_smc.c | 43 +++++++++++++++++++++----------------------
net/smc/smc_core.c | 22 +++++++++++++++++++---
net/smc/smc_core.h | 3 ++-
3 files changed, 42 insertions(+), 26 deletions(-)
--
2.13.5
^ permalink raw reply
* [PATCH net 2/3] net/smc: handle unregistered buffers
From: Ursula Braun @ 2018-05-03 15:57 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180503155739.55616-1-ubraun@linux.ibm.com>
From: Karsten Graul <kgraul@linux.ibm.com>
When smc_wr_reg_send() fails then tag (regerr) the affected buffer and
free it in smc_buf_unuse().
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
net/smc/af_smc.c | 4 +++-
net/smc/smc_core.c | 22 +++++++++++++++++++---
net/smc/smc_core.h | 3 ++-
3 files changed, 24 insertions(+), 5 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index fdb2976117bd..d03b8d29ffc0 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -296,8 +296,10 @@ static void smc_copy_sock_settings_to_smc(struct smc_sock *smc)
static int smc_reg_rmb(struct smc_link *link, struct smc_buf_desc *rmb_desc)
{
/* register memory region for new rmb */
- if (smc_wr_reg_send(link, rmb_desc->mr_rx[SMC_SINGLE_LINK]))
+ if (smc_wr_reg_send(link, rmb_desc->mr_rx[SMC_SINGLE_LINK])) {
+ rmb_desc->regerr = 1;
return -EFAULT;
+ }
return 0;
}
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index f44f6803f7ff..d4bd01bb44e1 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -32,6 +32,9 @@
static u32 smc_lgr_num; /* unique link group number */
+static void smc_buf_free(struct smc_buf_desc *buf_desc, struct smc_link *lnk,
+ bool is_rmb);
+
static void smc_lgr_schedule_free_work(struct smc_link_group *lgr)
{
/* client link group creation always follows the server link group
@@ -234,9 +237,22 @@ static void smc_buf_unuse(struct smc_connection *conn)
conn->sndbuf_size = 0;
}
if (conn->rmb_desc) {
- conn->rmb_desc->reused = true;
- conn->rmb_desc->used = 0;
- conn->rmbe_size = 0;
+ if (!conn->rmb_desc->regerr) {
+ conn->rmb_desc->reused = 1;
+ conn->rmb_desc->used = 0;
+ conn->rmbe_size = 0;
+ } else {
+ /* buf registration failed, reuse not possible */
+ struct smc_link_group *lgr = conn->lgr;
+ struct smc_link *lnk;
+
+ write_lock_bh(&lgr->rmbs_lock);
+ list_del(&conn->rmb_desc->list);
+ write_unlock_bh(&lgr->rmbs_lock);
+
+ lnk = &lgr->lnk[SMC_SINGLE_LINK];
+ smc_buf_free(conn->rmb_desc, lnk, true);
+ }
}
}
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 07e2a393e6d9..5dfcb15d529f 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -123,7 +123,8 @@ struct smc_buf_desc {
*/
u32 order; /* allocation order */
u32 used; /* currently used / unused */
- bool reused; /* new created / reused */
+ u8 reused : 1; /* new created / reused */
+ u8 regerr : 1; /* err during registration */
};
struct smc_rtoken { /* address/key of remote RMB */
--
2.13.5
^ permalink raw reply related
* [PATCH net 3/3] smc: fix sendpage() call
From: Ursula Braun @ 2018-05-03 15:57 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180503155739.55616-1-ubraun@linux.ibm.com>
From: Stefan Raspl <stefan.raspl@linux.ibm.com>
The sendpage() call grabs the sock lock before calling the default
implementation - which tries to grab it once again.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com><
---
net/smc/af_smc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index d03b8d29ffc0..544bab42f925 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1315,8 +1315,11 @@ static ssize_t smc_sendpage(struct socket *sock, struct page *page,
smc = smc_sk(sk);
lock_sock(sk);
- if (sk->sk_state != SMC_ACTIVE)
+ if (sk->sk_state != SMC_ACTIVE) {
+ release_sock(sk);
goto out;
+ }
+ release_sock(sk);
if (smc->use_fallback)
rc = kernel_sendpage(smc->clcsock, page, offset,
size, flags);
@@ -1324,7 +1327,6 @@ static ssize_t smc_sendpage(struct socket *sock, struct page *page,
rc = sock_no_sendpage(sock, page, offset, size, flags);
out:
- release_sock(sk);
return rc;
}
--
2.13.5
^ permalink raw reply related
* Re: [bug report] net/mlx5e: TLS, Add Innova TLS TX offload data path
From: Boris Pismenny @ 2018-05-03 16:04 UTC (permalink / raw)
To: Leon Romanovsky, Dan Carpenter, Saeed Mahameed; +Cc: linux-rdma, netdev
In-Reply-To: <20180503120757.GA1641@mtr-leonro.local>
Hi Dan,
Thanks for taking a look at our code.
On 05/03/18 15:07, Leon Romanovsky wrote:
> + Saeed, Boris and netdev
>
> On Thu, May 03, 2018 at 02:23:00PM +0300, Dan Carpenter wrote:
>> Hello Ilya Lesokhin,
>>
>> The patch bf23974104fa: "net/mlx5e: TLS, Add Innova TLS TX offload
>> data path" from Apr 30, 2018, leads to the following static checker
>> warning:
>>
>> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c:63 mlx5e_tls_add_metadata()
>> warn: struct type mismatch 'ethhdr vs mlx5e_tls_metadata'
>>
>> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
>> 55 static int mlx5e_tls_add_metadata(struct sk_buff *skb, __be32 swid)
>> 56 {
>> 57 struct mlx5e_tls_metadata *pet;
>> 58 struct ethhdr *eth;
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> 59
>> 60 if (skb_cow_head(skb, sizeof(struct mlx5e_tls_metadata)))
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> 61 return -ENOMEM;
>> 62
>> 63 eth = (struct ethhdr *)skb_push(skb, sizeof(struct mlx5e_tls_metadata));
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> This feels like it should be sizeof(*eth) + sizeof(*pet)?
No, we add only "sizeof(struct mlx5e_tls_metadata)" bytes to the skb
headroom for the mlx5_tls_metadata. The Ethernet header is already there.
>>
>> 64 skb->mac_header -= sizeof(struct mlx5e_tls_metadata);
>> 65 pet = (struct mlx5e_tls_metadata *)(eth + 1);
>> 66
>> 67 memmove(skb->data, skb->data + sizeof(struct mlx5e_tls_metadata),
>> 68 2 * ETH_ALEN);
>> 69
>> 70 eth->h_proto = cpu_to_be16(MLX5E_METADATA_ETHER_TYPE);
>> 71 pet->syndrome_swid = htonl(SYNDROME_OFFLOAD_REQUIRED << 24) | swid;
>> 72
>> 73 return 0;
>> 74 }
>>
>> regards,
>> dan carpenter
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
Best,
Boris.
^ permalink raw reply
* [PATCH] bpf: fix possible spectre-v1 in find_and_alloc_map()
From: Mark Rutland @ 2018-05-03 16:04 UTC (permalink / raw)
To: linux-kernel
Cc: Mark Rutland, Alexei Starovoitov, Dan Carpenter, Daniel Borkmann,
Peter Zijlstra, netdev
It's possible for userspace to control attr->map_type. Sanitize it when
using it as an array index to prevent an out-of-bounds value being used
under speculation.
Found by smatch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: netdev@vger.kernel.org
---
kernel/bpf/syscall.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
I found this when running smatch over a v4.17-rc2 arm64 allyesconfig kernel.
IIUC this may allow for a speculative branch to an arbitrary gadget when we
subsequently call ops->map_alloc_check(attr).
Mark.
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4ca46df19c9a..8a7acd0dbeb6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -26,6 +26,7 @@
#include <linux/cred.h>
#include <linux/timekeeping.h>
#include <linux/ctype.h>
+#include <linux/nospec.h>
#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PROG_ARRAY || \
(map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
@@ -102,12 +103,14 @@ const struct bpf_map_ops bpf_map_offload_ops = {
static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
{
const struct bpf_map_ops *ops;
+ u32 type = attr->map_type;
struct bpf_map *map;
int err;
- if (attr->map_type >= ARRAY_SIZE(bpf_map_types))
+ if (type >= ARRAY_SIZE(bpf_map_types))
return ERR_PTR(-EINVAL);
- ops = bpf_map_types[attr->map_type];
+ type = array_index_nospec(type, ARRAY_SIZE(bpf_map_types));
+ ops = bpf_map_types[type];
if (!ops)
return ERR_PTR(-EINVAL);
@@ -122,7 +125,7 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
if (IS_ERR(map))
return map;
map->ops = ops;
- map->map_type = attr->map_type;
+ map->map_type = type;
return map;
}
--
2.11.0
^ permalink raw reply related
* [PATCH net-next 0/4] net/smc: splice implementation
From: Ursula Braun @ 2018-05-03 16:12 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
From: Ursula Braun <ursula.braun@de.ibm.com>
Dave,
Stefan comes up with an smc implementation for splice(). The first
three patches are preparational patches, the 4th patch implements
splice().
Thanks, Ursula
Stefan Raspl (4):
smc: simplify abort logic
smc: make smc_rx_wait_data() generic
smc: allocate RMBs as compound pages
smc: add support for splice()
net/smc/af_smc.c | 40 +++++++++--
net/smc/smc.h | 3 +
net/smc/smc_core.c | 18 ++---
net/smc/smc_core.h | 1 +
net/smc/smc_rx.c | 200 +++++++++++++++++++++++++++++++++++++++++++----------
net/smc/smc_rx.h | 12 +++-
6 files changed, 219 insertions(+), 55 deletions(-)
--
2.13.5
^ permalink raw reply
* [PATCH net-next 1/4] smc: simplify abort logic
From: Ursula Braun @ 2018-05-03 16:12 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180503161239.71747-1-ubraun@linux.ibm.com>
From: Stefan Raspl <stefan.raspl@linux.ibm.com>
Some of the conditions to exit recv() are common in two pathes - cleaning up
code by moving the check up so we have it only once.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com><
---
net/smc/smc_rx.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/net/smc/smc_rx.c b/net/smc/smc_rx.c
index af851d8df1f8..def33fb29ac9 100644
--- a/net/smc/smc_rx.c
+++ b/net/smc/smc_rx.c
@@ -112,26 +112,22 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
if (atomic_read(&conn->bytes_to_rcv))
goto copy;
+ if (sk->sk_shutdown & RCV_SHUTDOWN ||
+ smc_cdc_rxed_any_close_or_senddone(conn) ||
+ conn->local_tx_ctrl.conn_state_flags.peer_conn_abort)
+ break;
+
if (read_done) {
if (sk->sk_err ||
sk->sk_state == SMC_CLOSED ||
- sk->sk_shutdown & RCV_SHUTDOWN ||
!timeo ||
- signal_pending(current) ||
- smc_cdc_rxed_any_close_or_senddone(conn) ||
- conn->local_tx_ctrl.conn_state_flags.
- peer_conn_abort)
+ signal_pending(current))
break;
} else {
if (sk->sk_err) {
read_done = sock_error(sk);
break;
}
- if (sk->sk_shutdown & RCV_SHUTDOWN ||
- smc_cdc_rxed_any_close_or_senddone(conn) ||
- conn->local_tx_ctrl.conn_state_flags.
- peer_conn_abort)
- break;
if (sk->sk_state == SMC_CLOSED) {
if (!sock_flag(sk, SOCK_DONE)) {
/* This occurs when user tries to read
--
2.13.5
^ permalink raw reply related
* [PATCH net-next 2/4] smc: make smc_rx_wait_data() generic
From: Ursula Braun @ 2018-05-03 16:12 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180503161239.71747-1-ubraun@linux.ibm.com>
From: Stefan Raspl <stefan.raspl@linux.ibm.com>
Turn smc_rx_wait_data into a generic function that can be used at various
instances to wait on traffic to complete with varying criteria.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com><
---
net/smc/af_smc.c | 2 +-
net/smc/smc_rx.c | 21 +++++++++++----------
net/smc/smc_rx.h | 8 +++++++-
3 files changed, 19 insertions(+), 12 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 823ea3371575..747fdf1a2d6f 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1092,7 +1092,7 @@ static int smc_accept(struct socket *sock, struct socket *new_sock,
release_sock(clcsk);
} else if (!atomic_read(&smc_sk(nsk)->conn.bytes_to_rcv)) {
lock_sock(nsk);
- smc_rx_wait_data(smc_sk(nsk), &timeo);
+ smc_rx_wait(smc_sk(nsk), &timeo, smc_rx_data_available);
release_sock(nsk);
}
}
diff --git a/net/smc/smc_rx.c b/net/smc/smc_rx.c
index def33fb29ac9..7b64bee656e8 100644
--- a/net/smc/smc_rx.c
+++ b/net/smc/smc_rx.c
@@ -22,11 +22,10 @@
#include "smc_tx.h" /* smc_tx_consumer_update() */
#include "smc_rx.h"
-/* callback implementation for sk.sk_data_ready()
- * to wakeup rcvbuf consumers that blocked with smc_rx_wait_data().
+/* callback implementation to wakeup consumers blocked with smc_rx_wait().
* indirectly called by smc_cdc_msg_recv_action().
*/
-static void smc_rx_data_ready(struct sock *sk)
+static void smc_rx_wake_up(struct sock *sk)
{
struct socket_wq *wq;
@@ -47,25 +46,27 @@ static void smc_rx_data_ready(struct sock *sk)
/* blocks rcvbuf consumer until >=len bytes available or timeout or interrupted
* @smc smc socket
* @timeo pointer to max seconds to wait, pointer to value 0 for no timeout
+ * @fcrit add'l criterion to evaluate as function pointer
* Returns:
* 1 if at least 1 byte available in rcvbuf or if socket error/shutdown.
* 0 otherwise (nothing in rcvbuf nor timeout, e.g. interrupted).
*/
-int smc_rx_wait_data(struct smc_sock *smc, long *timeo)
+int smc_rx_wait(struct smc_sock *smc, long *timeo,
+ int (*fcrit)(struct smc_connection *conn))
{
DEFINE_WAIT_FUNC(wait, woken_wake_function);
struct smc_connection *conn = &smc->conn;
struct sock *sk = &smc->sk;
int rc;
- if (atomic_read(&conn->bytes_to_rcv))
+ if (fcrit(conn))
return 1;
sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
add_wait_queue(sk_sleep(sk), &wait);
rc = sk_wait_event(sk, timeo,
sk->sk_err ||
sk->sk_shutdown & RCV_SHUTDOWN ||
- atomic_read(&conn->bytes_to_rcv) ||
+ fcrit(conn) ||
smc_cdc_rxed_any_close_or_senddone(conn),
&wait);
remove_wait_queue(sk_sleep(sk), &wait);
@@ -146,14 +147,14 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
return -EAGAIN;
}
- if (!atomic_read(&conn->bytes_to_rcv)) {
- smc_rx_wait_data(smc, &timeo);
+ if (!smc_rx_data_available(conn)) {
+ smc_rx_wait(smc, &timeo, smc_rx_data_available);
continue;
}
copy:
/* initialize variables for 1st iteration of subsequent loop */
- /* could be just 1 byte, even after smc_rx_wait_data above */
+ /* could be just 1 byte, even after waiting on data above */
readable = atomic_read(&conn->bytes_to_rcv);
/* not more than what user space asked for */
copylen = min_t(size_t, read_remaining, readable);
@@ -213,5 +214,5 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
/* Initialize receive properties on connection establishment. NB: not __init! */
void smc_rx_init(struct smc_sock *smc)
{
- smc->sk.sk_data_ready = smc_rx_data_ready;
+ smc->sk.sk_data_ready = smc_rx_wake_up;
}
diff --git a/net/smc/smc_rx.h b/net/smc/smc_rx.h
index 0b75a6b470e6..8f9f00997641 100644
--- a/net/smc/smc_rx.h
+++ b/net/smc/smc_rx.h
@@ -20,6 +20,12 @@
void smc_rx_init(struct smc_sock *smc);
int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
int flags);
-int smc_rx_wait_data(struct smc_sock *smc, long *timeo);
+int smc_rx_wait(struct smc_sock *smc, long *timeo,
+ int (*fcrit)(struct smc_connection *conn));
+static inline int smc_rx_data_available(struct smc_connection *conn)
+{
+ return atomic_read(&conn->bytes_to_rcv);
+}
+
#endif /* SMC_RX_H */
--
2.13.5
^ permalink raw reply related
* [PATCH net-next 3/4] smc: allocate RMBs as compound pages
From: Ursula Braun @ 2018-05-03 16:12 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180503161239.71747-1-ubraun@linux.ibm.com>
From: Stefan Raspl <stefan.raspl@linux.ibm.com>
Preparatory work for splice() support.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com><
---
net/smc/smc_core.c | 18 +++++++++---------
net/smc/smc_core.h | 1 +
2 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 1f3ea62fac5c..4051fb504393 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -274,8 +274,8 @@ static void smc_buf_free(struct smc_buf_desc *buf_desc, struct smc_link *lnk,
DMA_TO_DEVICE);
}
sg_free_table(&buf_desc->sgt[SMC_SINGLE_LINK]);
- if (buf_desc->cpu_addr)
- free_pages((unsigned long)buf_desc->cpu_addr, buf_desc->order);
+ if (buf_desc->pages)
+ __free_pages(buf_desc->pages, buf_desc->order);
kfree(buf_desc);
}
@@ -550,16 +550,16 @@ static struct smc_buf_desc *smc_new_buf_create(struct smc_link_group *lgr,
if (!buf_desc)
return ERR_PTR(-ENOMEM);
- buf_desc->cpu_addr =
- (void *)__get_free_pages(GFP_KERNEL | __GFP_NOWARN |
- __GFP_NOMEMALLOC |
- __GFP_NORETRY | __GFP_ZERO,
- get_order(bufsize));
- if (!buf_desc->cpu_addr) {
+ buf_desc->order = get_order(bufsize);
+ buf_desc->pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN |
+ __GFP_NOMEMALLOC | __GFP_COMP |
+ __GFP_NORETRY | __GFP_ZERO,
+ buf_desc->order);
+ if (!buf_desc->pages) {
kfree(buf_desc);
return ERR_PTR(-EAGAIN);
}
- buf_desc->order = get_order(bufsize);
+ buf_desc->cpu_addr = (void *)page_address(buf_desc->pages);
/* build the sg table from the pages */
lnk = &lgr->lnk[SMC_SINGLE_LINK];
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 97339f03ba79..cdf800f1fae7 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -120,6 +120,7 @@ struct smc_link {
struct smc_buf_desc {
struct list_head list;
void *cpu_addr; /* virtual address of buffer */
+ struct page *pages;
struct sg_table sgt[SMC_LINKS_PER_LGR_MAX];/* virtual buffer */
struct ib_mr *mr_rx[SMC_LINKS_PER_LGR_MAX];
/* for rmb only: memory region
--
2.13.5
^ permalink raw reply related
* [PATCH net-next 4/4] smc: add support for splice()
From: Ursula Braun @ 2018-05-03 16:12 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180503161239.71747-1-ubraun@linux.ibm.com>
From: Stefan Raspl <stefan.raspl@linux.ibm.com>
Provide an implementation for splice() when we are using SMC. See
smc_splice_read() for further details.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com><
---
net/smc/af_smc.c | 38 +++++++++++--
net/smc/smc.h | 3 +
net/smc/smc_rx.c | 163 +++++++++++++++++++++++++++++++++++++++++++++++++------
net/smc/smc_rx.h | 6 +-
4 files changed, 185 insertions(+), 25 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 747fdf1a2d6f..553fe4eb4066 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1166,10 +1166,12 @@ static int smc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
goto out;
}
- if (smc->use_fallback)
+ if (smc->use_fallback) {
rc = smc->clcsock->ops->recvmsg(smc->clcsock, msg, len, flags);
- else
- rc = smc_rx_recvmsg(smc, msg, len, flags);
+ } else {
+ msg->msg_namelen = 0;
+ rc = smc_rx_recvmsg(smc, msg, NULL, len, flags);
+ }
out:
release_sock(sk);
@@ -1446,9 +1448,15 @@ static ssize_t smc_sendpage(struct socket *sock, struct page *page,
return rc;
}
+/* Map the affected portions of the rmbe into an spd, note the number of bytes
+ * to splice in conn->splice_pending, and press 'go'. Delays consumer cursor
+ * updates till whenever a respective page has been fully processed.
+ * Note that subsequent recv() calls have to wait till all splice() processing
+ * completed.
+ */
static ssize_t smc_splice_read(struct socket *sock, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
- unsigned int flags)
+ unsigned int flags)
{
struct sock *sk = sock->sk;
struct smc_sock *smc;
@@ -1456,16 +1464,34 @@ static ssize_t smc_splice_read(struct socket *sock, loff_t *ppos,
smc = smc_sk(sk);
lock_sock(sk);
- if ((sk->sk_state != SMC_ACTIVE) && (sk->sk_state != SMC_CLOSED))
+
+ if (sk->sk_state == SMC_INIT ||
+ sk->sk_state == SMC_LISTEN ||
+ sk->sk_state == SMC_CLOSED)
+ goto out;
+
+ if (sk->sk_state == SMC_PEERFINCLOSEWAIT) {
+ rc = 0;
goto out;
+ }
+
if (smc->use_fallback) {
rc = smc->clcsock->ops->splice_read(smc->clcsock, ppos,
pipe, len, flags);
} else {
- rc = -EOPNOTSUPP;
+ if (*ppos) {
+ rc = -ESPIPE;
+ goto out;
+ }
+ if (flags & SPLICE_F_NONBLOCK)
+ flags = MSG_DONTWAIT;
+ else
+ flags = 0;
+ rc = smc_rx_recvmsg(smc, NULL, pipe, len, flags);
}
out:
release_sock(sk);
+
return rc;
}
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 2405e889b93d..ec209cd48d42 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -164,6 +164,9 @@ struct smc_connection {
atomic_t bytes_to_rcv; /* arrived data,
* not yet received
*/
+ atomic_t splice_pending; /* number of spliced bytes
+ * pending processing
+ */
#ifndef KERNEL_HAS_ATOMIC64
spinlock_t acurs_lock; /* protect cursors */
#endif
diff --git a/net/smc/smc_rx.c b/net/smc/smc_rx.c
index 7b64bee656e8..ed45569289f5 100644
--- a/net/smc/smc_rx.c
+++ b/net/smc/smc_rx.c
@@ -43,6 +43,116 @@ static void smc_rx_wake_up(struct sock *sk)
rcu_read_unlock();
}
+/* Update consumer cursor
+ * @conn connection to update
+ * @cons consumer cursor
+ * @len number of Bytes consumed
+ */
+static void smc_rx_update_consumer(struct smc_connection *conn,
+ union smc_host_cursor cons, size_t len)
+{
+ smc_curs_add(conn->rmbe_size, &cons, len);
+ smc_curs_write(&conn->local_tx_ctrl.cons, smc_curs_read(&cons, conn),
+ conn);
+ /* send consumer cursor update if required */
+ /* similar to advertising new TCP rcv_wnd if required */
+ smc_tx_consumer_update(conn);
+}
+
+struct smc_spd_priv {
+ struct smc_sock *smc;
+ size_t len;
+};
+
+static void smc_rx_pipe_buf_release(struct pipe_inode_info *pipe,
+ struct pipe_buffer *buf)
+{
+ struct smc_spd_priv *priv = (struct smc_spd_priv *)buf->private;
+ struct smc_sock *smc = priv->smc;
+ struct smc_connection *conn;
+ union smc_host_cursor cons;
+ struct sock *sk = &smc->sk;
+
+ if (sk->sk_state == SMC_CLOSED ||
+ sk->sk_state == SMC_PEERFINCLOSEWAIT ||
+ sk->sk_state == SMC_APPFINCLOSEWAIT)
+ goto out;
+ conn = &smc->conn;
+ lock_sock(sk);
+ smc_curs_write(&cons, smc_curs_read(&conn->local_tx_ctrl.cons, conn),
+ conn);
+ smc_rx_update_consumer(conn, cons, priv->len);
+ release_sock(sk);
+ if (atomic_sub_and_test(priv->len, &conn->splice_pending))
+ smc_rx_wake_up(sk);
+out:
+ kfree(priv);
+ put_page(buf->page);
+ sock_put(sk);
+}
+
+static int smc_rx_pipe_buf_nosteal(struct pipe_inode_info *pipe,
+ struct pipe_buffer *buf)
+{
+ return 1;
+}
+
+static const struct pipe_buf_operations smc_pipe_ops = {
+ .can_merge = 0,
+ .confirm = generic_pipe_buf_confirm,
+ .release = smc_rx_pipe_buf_release,
+ .steal = smc_rx_pipe_buf_nosteal,
+ .get = generic_pipe_buf_get
+};
+
+static void smc_rx_spd_release(struct splice_pipe_desc *spd,
+ unsigned int i)
+{
+ put_page(spd->pages[i]);
+}
+
+static int smc_rx_splice(struct pipe_inode_info *pipe, char *src, size_t len,
+ struct smc_sock *smc)
+{
+ struct splice_pipe_desc spd;
+ struct partial_page partial;
+ struct smc_spd_priv *priv;
+ struct page *page;
+ int bytes;
+
+ page = virt_to_page(smc->conn.rmb_desc->cpu_addr);
+ priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+ priv->len = len;
+ priv->smc = smc;
+ partial.offset = src - (char *)smc->conn.rmb_desc->cpu_addr;
+ partial.len = len;
+ partial.private = (unsigned long)priv;
+
+ spd.nr_pages_max = 1;
+ spd.nr_pages = 1;
+ spd.pages = &page;
+ spd.partial = &partial;
+ spd.ops = &smc_pipe_ops;
+ spd.spd_release = smc_rx_spd_release;
+
+ bytes = splice_to_pipe(pipe, &spd);
+ if (bytes > 0) {
+ sock_hold(&smc->sk);
+ get_page(smc->conn.rmb_desc->pages);
+ atomic_add(bytes, &smc->conn.splice_pending);
+ }
+
+ return bytes;
+}
+
+static int smc_rx_data_available_and_no_splice_pend(struct smc_connection *conn)
+{
+ return atomic_read(&conn->bytes_to_rcv) &&
+ !atomic_read(&conn->splice_pending);
+}
+
/* blocks rcvbuf consumer until >=len bytes available or timeout or interrupted
* @smc smc socket
* @timeo pointer to max seconds to wait, pointer to value 0 for no timeout
@@ -74,19 +184,25 @@ int smc_rx_wait(struct smc_sock *smc, long *timeo,
return rc;
}
-/* rcvbuf consumer: main API called by socket layer.
- * called under sk lock.
+/* smc_rx_recvmsg - receive data from RMBE
+ * @msg: copy data to receive buffer
+ * @pipe: copy data to pipe if set - indicates splice() call
+ *
+ * rcvbuf consumer: main API called by socket layer.
+ * Called under sk lock.
*/
-int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
- int flags)
+int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg,
+ struct pipe_inode_info *pipe, size_t len, int flags)
{
size_t copylen, read_done = 0, read_remaining = len;
size_t chunk_len, chunk_off, chunk_len_sum;
struct smc_connection *conn = &smc->conn;
+ int (*func)(struct smc_connection *conn);
union smc_host_cursor cons;
int readable, chunk;
char *rcvbuf_base;
struct sock *sk;
+ int splbytes;
long timeo;
int target; /* Read at least these many bytes */
int rc;
@@ -102,12 +218,11 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
- msg->msg_namelen = 0;
/* we currently use 1 RMBE per RMB, so RMBE == RMB base addr */
rcvbuf_base = conn->rmb_desc->cpu_addr;
do { /* while (read_remaining) */
- if (read_done >= target)
+ if (read_done >= target || (pipe && read_done))
break;
if (atomic_read(&conn->bytes_to_rcv))
@@ -156,11 +271,24 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
/* initialize variables for 1st iteration of subsequent loop */
/* could be just 1 byte, even after waiting on data above */
readable = atomic_read(&conn->bytes_to_rcv);
+ splbytes = atomic_read(&conn->splice_pending);
+ if (!readable || (msg && splbytes)) {
+ if (splbytes)
+ func = smc_rx_data_available_and_no_splice_pend;
+ else
+ func = smc_rx_data_available;
+ smc_rx_wait(smc, &timeo, func);
+ continue;
+ }
+
/* not more than what user space asked for */
copylen = min_t(size_t, read_remaining, readable);
smc_curs_write(&cons,
smc_curs_read(&conn->local_tx_ctrl.cons, conn),
conn);
+ /* subsequent splice() calls pick up where previous left */
+ if (splbytes)
+ smc_curs_add(conn->rmbe_size, &cons, splbytes);
/* determine chunks where to read from rcvbuf */
/* either unwrapped case, or 1st chunk of wrapped case */
chunk_len = min_t(size_t,
@@ -170,9 +298,16 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
smc_rmb_sync_sg_for_cpu(conn);
for (chunk = 0; chunk < 2; chunk++) {
if (!(flags & MSG_TRUNC)) {
- rc = memcpy_to_msg(msg, rcvbuf_base + chunk_off,
- chunk_len);
- if (rc) {
+ if (msg) {
+ rc = memcpy_to_msg(msg, rcvbuf_base +
+ chunk_off,
+ chunk_len);
+ } else {
+ rc = smc_rx_splice(pipe, rcvbuf_base +
+ chunk_off, chunk_len,
+ smc);
+ }
+ if (rc < 0) {
if (!read_done)
read_done = -EFAULT;
smc_rmb_sync_sg_for_device(conn);
@@ -193,18 +328,13 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
/* update cursors */
if (!(flags & MSG_PEEK)) {
- smc_curs_add(conn->rmbe_size, &cons, copylen);
/* increased in recv tasklet smc_cdc_msg_rcv() */
smp_mb__before_atomic();
atomic_sub(copylen, &conn->bytes_to_rcv);
/* guarantee 0 <= bytes_to_rcv <= rmbe_size */
smp_mb__after_atomic();
- smc_curs_write(&conn->local_tx_ctrl.cons,
- smc_curs_read(&cons, conn),
- conn);
- /* send consumer cursor update if required */
- /* similar to advertising new TCP rcv_wnd if required */
- smc_tx_consumer_update(conn);
+ if (msg)
+ smc_rx_update_consumer(conn, cons, copylen);
}
} while (read_remaining);
out:
@@ -215,4 +345,5 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
void smc_rx_init(struct smc_sock *smc)
{
smc->sk.sk_data_ready = smc_rx_wake_up;
+ atomic_set(&smc->conn.splice_pending, 0);
}
diff --git a/net/smc/smc_rx.h b/net/smc/smc_rx.h
index 8f9f00997641..db823c97d824 100644
--- a/net/smc/smc_rx.h
+++ b/net/smc/smc_rx.h
@@ -18,8 +18,9 @@
#include "smc.h"
void smc_rx_init(struct smc_sock *smc);
-int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
- int flags);
+
+int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg,
+ struct pipe_inode_info *pipe, size_t len, int flags);
int smc_rx_wait(struct smc_sock *smc, long *timeo,
int (*fcrit)(struct smc_connection *conn));
static inline int smc_rx_data_available(struct smc_connection *conn)
@@ -27,5 +28,4 @@ static inline int smc_rx_data_available(struct smc_connection *conn)
return atomic_read(&conn->bytes_to_rcv);
}
-
#endif /* SMC_RX_H */
--
2.13.5
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox