Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net] tcp: restore autocorking
From: Soheil Hassas Yeganeh @ 2018-05-03 14:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S . Miller, netdev, Michael Wenig, Eric Dumazet
In-Reply-To: <20180503032513.210324-1-edumazet@google.com>

On Wed, May 2, 2018 at 11:25 PM, Eric Dumazet <edumazet@google.com> wrote:
> Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Michael Wenig <mwenig@vmware.com>
> Tested-by: Michael Wenig <mwenig@vmware.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Thank you for catching and fixing this!

^ permalink raw reply

* Re: [PATCH bpf-next v3 00/15] Introducing AF_XDP support
From: David Miller @ 2018-05-03 15:07 UTC (permalink / raw)
  To: bjorn.topel
  Cc: magnus.karlsson, alexander.h.duyck, alexander.duyck,
	john.fastabend, ast, brouer, willemdebruijn.kernel, daniel, mst,
	netdev, bjorn.topel, michael.lundkvist, jesse.brandeburg,
	anjali.singhai, qi.z.zhang
In-Reply-To: <20180502110136.3738-1-bjorn.topel@gmail.com>

From: Björn Töpel <bjorn.topel@gmail.com>
Date: Wed,  2 May 2018 13:01:21 +0200

> This patch set introduces a new address family called AF_XDP that is
> optimized for high performance packet processing and, in upcoming
> patch sets, zero-copy semantics. In this patch set, we have removed
> all zero-copy related code in order to make it smaller, simpler and
> hopefully more review friendly. This patch set only supports copy-mode
> for the generic XDP path (XDP_SKB) for both RX and TX and copy-mode
> for RX using the XDP_DRV path. Zero-copy support requires XDP and
> driver changes that Jesper Dangaard Brouer is working on. Some of his
> work has already been accepted. We will publish our zero-copy support
> for RX and TX on top of his patch sets at a later point in time.
 ...

Looks great.

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Do You Need A Hand?
From: Mavis Wanczyk @ 2018-05-03  9:04 UTC (permalink / raw)





I am Mavis Wanczyk i know you may not know me but am the latest  
largest US Powerball lottery winner of $758.7m just of recent, am  
currently helping out  people in need of financial assistance, i know  
it's hard to believe anything on the internet,  so if you don't need  
my help please don't reply to this message.

Regards
Mavis Wanczyk

^ permalink raw reply

* Re: [PATCH net-next 0/2] Update csum tc action for batch operation.
From: David Miller @ 2018-05-03 15:16 UTC (permalink / raw)
  To: cdillaba; +Cc: netdev, jhs, xiyou.wangcong
In-Reply-To: <1525184264-9436-1-git-send-email-cdillaba@mojatatu.com>

From: Craig Dillabaugh <cdillaba@mojatatu.com>
Date: Tue,  1 May 2018 10:17:42 -0400

> This patchset includes two patches the first updating act_csum.c 
> to include the get_fill_size routine required for batch operation, and
> the second including updated TDC tests for the feature.

Series applied.

^ permalink raw reply

* [PATCH] qed: fix spelling mistake: "offloded" -> "offloaded"
From: Colin King @ 2018-05-03 15:19 UTC (permalink / raw)
  To: Ariel Elior, everest-linux-l2, David S . Miller, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Trivial fix to spelling mistake in DP_NOTICE message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/qlogic/qed/qed_roce.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
index fb7c2d1562ae..6acfd43c1a4f 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
@@ -848,7 +848,7 @@ int qed_roce_query_qp(struct qed_hwfn *p_hwfn,
 
 	if (!(qp->resp_offloaded)) {
 		DP_NOTICE(p_hwfn,
-			  "The responder's qp should be offloded before requester's\n");
+			  "The responder's qp should be offloaded before requester's\n");
 		return -EINVAL;
 	}
 
-- 
2.17.0

^ permalink raw reply related

* Re: [RFC][PATCH bpf] tools: bpftool: Fix tags for bpf-to-bpf calls
From: Naveen N. Rao @ 2018-05-03 15:20 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Sandipan Das
  Cc: jakub.kicinski, linuxppc-dev, netdev, Michael Ellerman
In-Reply-To: <415b415e-f47f-082c-1bc9-87d3e9d3aed1__9575.16645216874$1520270545$gmane$org@ fb.com>

Alexei Starovoitov wrote:
> On 3/1/18 12:51 AM, Naveen N. Rao wrote:
>> Daniel Borkmann wrote:
>>>
>>> Worst case if there's nothing better, potentially what one could do in
>>> bpf_prog_get_info_by_fd() is to dump an array of full addresses and
>>> have the imm part as the index pointing to one of them, just unfortunate
>>> that it's likely only needed in ppc64.
>>
>> Ok. We seem to have discussed a few different aspects in this thread.
>> Let me summarize the different aspects we have discussed:
>> 1. Passing address of JIT'ed function to the JIT engines:
>>    Two approaches discussed:
>>    a. Existing approach, where the subprog address is encoded as an
>> offset from __bpf_call_base() in imm32 field of the BPF call
>> instruction. This requires the JIT'ed function to be within 2GB of
>> __bpf_call_base(), which won't be true on ppc64, at the least. So,
>> this won't on ppc64 (and any other architectures where vmalloc'ed
>> (module_alloc()) memory is from a different, far, address range).
> 
> it looks like ppc64 doesn't guarantee today that all of module_alloc()
> will be within 32-bit, but I think it should be trivial to add such
> guarantee. If so, we can define another __bpf_call_base specifically
> for bpf-to-bpf calls when jit is on.

Ok, we prefer not to do that for powerpc (atleast, not for all of 
module_alloc()) at this point.

And since option (c) below is not preferable, I think we will implement 
what Daniel suggested above. This patchset already handles communicating 
the BPF function addresses to the JIT engine, and enhancing 
bpf_prog_get_info_by_fd() should address the concerns with bpftool.

- Naveen

^ permalink raw reply

* Re: [PATCH net-next v7 1/7] sched: Add Common Applications Kept Enhanced (cake) qdisc
From: David Miller @ 2018-05-03 15:24 UTC (permalink / raw)
  To: toke; +Cc: netdev, cake
In-Reply-To: <152527386316.14936.5409621935637217368.stgit@alrua-kau>

From: Toke Høiland-Jørgensen <toke@toke.dk>
Date: Wed, 02 May 2018 17:11:03 +0200

> diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
> new file mode 100644
> index 000000000000..18bc147f12bc
> --- /dev/null
> +++ b/net/sched/sch_cake.c
> +static inline cobalt_time_t cobalt_get_time(void)
> +{
> +	return ktime_get_ns();
> +}

Please do not use inline in foo.c files, let the compiler decide.

> +static inline u32
> +cake_hash(struct cake_tin_data *q, const struct sk_buff *skb, int flow_mode)
> +{

Especially for this monster!  Yikes!

^ permalink raw reply

* Re: [PATCH net,stable] qmi_wwan: do not steal interfaces from class drivers
From: David Miller @ 2018-05-03 15:25 UTC (permalink / raw)
  To: bjorn; +Cc: netdev, linux-usb
In-Reply-To: <20180502202254.3021-1-bjorn@mork.no>

From: Bjørn Mork <bjorn@mork.no>
Date: Wed,  2 May 2018 22:22:54 +0200

> The USB_DEVICE_INTERFACE_NUMBER matching macro assumes that
> the { vendorid, productid, interfacenumber } set uniquely
> identifies one specific function.  This has proven to fail
> for some configurable devices. One example is the Quectel
> EM06/EP06 where the same interface number can be either
> QMI or MBIM, without the device ID changing either.
> 
> Fix by requiring the vendor-specific class for interface number
> based matching.  Functions of other classes can and should use
> class based matching instead.
> 
> Fixes: 03304bcb5ec4 ("net: qmi_wwan: use fixed interface number matching")
> Signed-off-by: Bjørn Mork <bjorn@mork.no>
> ---
> It's quite possible that the fix should be integrated in the
> USB_DEVICE_INTERFACE_NUMBER macro instead.  But that has grown a few
> other users since it was added, so changing it now seems risky. 
> Another option is of course adding a new match macro with the
> USB_CLASS_VENDOR_SPEC match integrated. Maybe best?
> 
> But I'm proposing this as-is for now, since this quickfix seems most
> suitable for stable backporting.

Yes, this simpler approache is better for net and -stable.

Applied.

If you want to do something more sophisticated, that can be done
in net-next.

Thanks.

^ permalink raw reply

* Re: [PATCH net] rds: do not leak kernel memory to user land
From: David Miller @ 2018-05-03 15:26 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, santosh.shilimkar, linux-rdma
In-Reply-To: <20180502215339.117702-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Wed,  2 May 2018 14:53:39 -0700

> syzbot/KMSAN reported an uninit-value in put_cmsg(), originating
> from rds_cmsg_recv().
> 
> Simply clear the structure, since we have holes there, or since
> rx_traces might be smaller than RDS_MSG_RX_DGRAM_TRACE_MAX.
 ...
> Fixes: 3289025aedc0 ("RDS: add receive message trace used by application")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net-next] inet: add bound ports statistic
From: David Miller @ 2018-05-03 15:27 UTC (permalink / raw)
  To: stephen; +Cc: netdev, sthemmin
In-Reply-To: <20180502221108.3261-1-sthemmin@microsoft.com>

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Wed,  2 May 2018 15:11:08 -0700

> +int inet_bind_bucket_count(struct proto *prot);
 ...
> +/* Count how many any entries are in the bind hash table */
> +unsigned int inet_bind_bucket_count(struct proto *prot)

If it doesn't build, it definitely wasn't tested.

Right? :)

^ permalink raw reply

* Re: [PATCH net-next v7 1/7] sched: Add Common Applications Kept Enhanced (cake) qdisc
From: Toke Høiland-Jørgensen @ 2018-05-03 15:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, cake
In-Reply-To: <20180503.112402.774278226377528271.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> From: Toke Høiland-Jørgensen <toke@toke.dk>
> Date: Wed, 02 May 2018 17:11:03 +0200
>
>> diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
>> new file mode 100644
>> index 000000000000..18bc147f12bc
>> --- /dev/null
>> +++ b/net/sched/sch_cake.c
>> +static inline cobalt_time_t cobalt_get_time(void)
>> +{
>> +	return ktime_get_ns();
>> +}
>
> Please do not use inline in foo.c files, let the compiler decide.

Right, will fix :)

>> +static inline u32
>> +cake_hash(struct cake_tin_data *q, const struct sk_buff *skb, int flow_mode)
>> +{
>
> Especially for this monster!  Yikes!

Fair point...

^ permalink raw reply

* Re: [PATCH net-next] ip6_gre: correct the function name in ip6gre_tnl_addr_conflict() comment
From: David Miller @ 2018-05-03 15:29 UTC (permalink / raw)
  To: sunlw.fnst; +Cc: netdev
In-Reply-To: <20180503013429.4330-1-sunlw.fnst@cn.fujitsu.com>

From: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
Date: Thu, 3 May 2018 09:34:29 +0800

> The function name is wrong in ip6gre_tnl_addr_conflict() comment, which
> use ip6_tnl_addr_conflict instead of ip6gre_tnl_addr_conflict.
> 
> Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] tcp: restore autocorking
From: David Miller @ 2018-05-03 15:29 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, mwenig, eric.dumazet
In-Reply-To: <20180503032513.210324-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Wed,  2 May 2018 20:25:13 -0700

> When adding rb-tree for TCP retransmit queue, we inadvertently broke
> TCP autocorking.
> 
> tcp_should_autocork() should really check if the rtx queue is not empty.
> 
> Tested:
 ...
> Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Michael Wenig <mwenig@vmware.com>
> Tested-by: Michael Wenig <mwenig@vmware.com>

Applied and queued up for -stable, thanks Eric.

^ permalink raw reply

* Re: [bpf-next v1 0/9] bpf: Add helper to do FIB lookups
From: David Miller @ 2018-05-03 15:40 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, borkmann, ast, shm, roopa, brouer, toke, john.fastabend
In-Reply-To: <20180503035319.18290-1-dsahern@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Wed,  2 May 2018 20:53:10 -0700

> Provide a helper for doing a FIB and neighbor lookup in the kernel
> tables from an XDP program. The helper provides a fastpath for forwarding
> packets. If the packet is a local delivery or for any reason is not a
> simple lookup and forward, the packet is expected to continue up the stack
> for full processing.
 ...

Great work David.

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* [PATCH net 1/3] net/smc: call consolidation
From: Ursula Braun @ 2018-05-03 15:57 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180503155739.55616-1-ubraun@linux.ibm.com>

From: Karsten Graul <kgraul@linux.ibm.com>

Consolidate the call to smc_wr_reg_send() in a new function.
No functional changes.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/af_smc.c | 35 +++++++++++++++--------------------
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 8b4c059bd13b..fdb2976117bd 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -292,6 +292,15 @@ static void smc_copy_sock_settings_to_smc(struct smc_sock *smc)
 	smc_copy_sock_settings(&smc->sk, smc->clcsock->sk, SK_FLAGS_CLC_TO_SMC);
 }
 
+/* register a new rmb */
+static int smc_reg_rmb(struct smc_link *link, struct smc_buf_desc *rmb_desc)
+{
+	/* register memory region for new rmb */
+	if (smc_wr_reg_send(link, rmb_desc->mr_rx[SMC_SINGLE_LINK]))
+		return -EFAULT;
+	return 0;
+}
+
 static int smc_clnt_conf_first_link(struct smc_sock *smc)
 {
 	struct smc_link_group *lgr = smc->conn.lgr;
@@ -321,9 +330,7 @@ static int smc_clnt_conf_first_link(struct smc_sock *smc)
 
 	smc_wr_remember_qp_attr(link);
 
-	rc = smc_wr_reg_send(link,
-			     smc->conn.rmb_desc->mr_rx[SMC_SINGLE_LINK]);
-	if (rc)
+	if (smc_reg_rmb(link, smc->conn.rmb_desc))
 		return SMC_CLC_DECL_INTERR;
 
 	/* send CONFIRM LINK response over RoCE fabric */
@@ -473,13 +480,8 @@ static int smc_connect_rdma(struct smc_sock *smc)
 			goto decline_rdma_unlock;
 		}
 	} else {
-		struct smc_buf_desc *buf_desc = smc->conn.rmb_desc;
-
-		if (!buf_desc->reused) {
-			/* register memory region for new rmb */
-			rc = smc_wr_reg_send(link,
-					     buf_desc->mr_rx[SMC_SINGLE_LINK]);
-			if (rc) {
+		if (!smc->conn.rmb_desc->reused) {
+			if (smc_reg_rmb(link, smc->conn.rmb_desc)) {
 				reason_code = SMC_CLC_DECL_INTERR;
 				goto decline_rdma_unlock;
 			}
@@ -719,9 +721,7 @@ static int smc_serv_conf_first_link(struct smc_sock *smc)
 
 	link = &lgr->lnk[SMC_SINGLE_LINK];
 
-	rc = smc_wr_reg_send(link,
-			     smc->conn.rmb_desc->mr_rx[SMC_SINGLE_LINK]);
-	if (rc)
+	if (smc_reg_rmb(link, smc->conn.rmb_desc))
 		return SMC_CLC_DECL_INTERR;
 
 	/* send CONFIRM LINK request to client over the RoCE fabric */
@@ -854,13 +854,8 @@ static void smc_listen_work(struct work_struct *work)
 	smc_rx_init(new_smc);
 
 	if (local_contact != SMC_FIRST_CONTACT) {
-		struct smc_buf_desc *buf_desc = new_smc->conn.rmb_desc;
-
-		if (!buf_desc->reused) {
-			/* register memory region for new rmb */
-			rc = smc_wr_reg_send(link,
-					     buf_desc->mr_rx[SMC_SINGLE_LINK]);
-			if (rc) {
+		if (!new_smc->conn.rmb_desc->reused) {
+			if (smc_reg_rmb(link, new_smc->conn.rmb_desc)) {
 				reason_code = SMC_CLC_DECL_INTERR;
 				goto decline_rdma_unlock;
 			}
-- 
2.13.5

^ permalink raw reply related

* [PATCH net 0/3] net/smc: fixes 2018/05/03
From: Ursula Braun @ 2018-05-03 15:57 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun

From: Ursula Braun <ursula.braun@de.ibm.com>

Dave,

here are smc fixes for 2 problems:
 * receive buffers in SMC must be registered. If registration fails
   these buffers must not be kept within the link group for reuse.
   Patch 1 is a preparational patch; patch 2 contains the fix.
 * sendpage: do not hold the sock lock when calling kernel_sendpage()
             or sock_no_sendpage()

Thanks, Ursula

Karsten Graul (2):
  net/smc: call consolidation
  net/smc: handle unregistered buffers

Stefan Raspl (1):
  smc: fix sendpage() call

 net/smc/af_smc.c   | 43 +++++++++++++++++++++----------------------
 net/smc/smc_core.c | 22 +++++++++++++++++++---
 net/smc/smc_core.h |  3 ++-
 3 files changed, 42 insertions(+), 26 deletions(-)

-- 
2.13.5

^ permalink raw reply

* [PATCH net 2/3] net/smc: handle unregistered buffers
From: Ursula Braun @ 2018-05-03 15:57 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180503155739.55616-1-ubraun@linux.ibm.com>

From: Karsten Graul <kgraul@linux.ibm.com>

When smc_wr_reg_send() fails then tag (regerr) the affected buffer and
free it in smc_buf_unuse().

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/af_smc.c   |  4 +++-
 net/smc/smc_core.c | 22 +++++++++++++++++++---
 net/smc/smc_core.h |  3 ++-
 3 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index fdb2976117bd..d03b8d29ffc0 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -296,8 +296,10 @@ static void smc_copy_sock_settings_to_smc(struct smc_sock *smc)
 static int smc_reg_rmb(struct smc_link *link, struct smc_buf_desc *rmb_desc)
 {
 	/* register memory region for new rmb */
-	if (smc_wr_reg_send(link, rmb_desc->mr_rx[SMC_SINGLE_LINK]))
+	if (smc_wr_reg_send(link, rmb_desc->mr_rx[SMC_SINGLE_LINK])) {
+		rmb_desc->regerr = 1;
 		return -EFAULT;
+	}
 	return 0;
 }
 
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index f44f6803f7ff..d4bd01bb44e1 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -32,6 +32,9 @@
 
 static u32 smc_lgr_num;			/* unique link group number */
 
+static void smc_buf_free(struct smc_buf_desc *buf_desc, struct smc_link *lnk,
+			 bool is_rmb);
+
 static void smc_lgr_schedule_free_work(struct smc_link_group *lgr)
 {
 	/* client link group creation always follows the server link group
@@ -234,9 +237,22 @@ static void smc_buf_unuse(struct smc_connection *conn)
 		conn->sndbuf_size = 0;
 	}
 	if (conn->rmb_desc) {
-		conn->rmb_desc->reused = true;
-		conn->rmb_desc->used = 0;
-		conn->rmbe_size = 0;
+		if (!conn->rmb_desc->regerr) {
+			conn->rmb_desc->reused = 1;
+			conn->rmb_desc->used = 0;
+			conn->rmbe_size = 0;
+		} else {
+			/* buf registration failed, reuse not possible */
+			struct smc_link_group *lgr = conn->lgr;
+			struct smc_link *lnk;
+
+			write_lock_bh(&lgr->rmbs_lock);
+			list_del(&conn->rmb_desc->list);
+			write_unlock_bh(&lgr->rmbs_lock);
+
+			lnk = &lgr->lnk[SMC_SINGLE_LINK];
+			smc_buf_free(conn->rmb_desc, lnk, true);
+		}
 	}
 }
 
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 07e2a393e6d9..5dfcb15d529f 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -123,7 +123,8 @@ struct smc_buf_desc {
 						 */
 	u32			order;		/* allocation order */
 	u32			used;		/* currently used / unused */
-	bool			reused;		/* new created / reused */
+	u8			reused	: 1;	/* new created / reused */
+	u8			regerr	: 1;	/* err during registration */
 };
 
 struct smc_rtoken {				/* address/key of remote RMB */
-- 
2.13.5

^ permalink raw reply related

* [PATCH net 3/3] smc: fix sendpage() call
From: Ursula Braun @ 2018-05-03 15:57 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180503155739.55616-1-ubraun@linux.ibm.com>

From: Stefan Raspl <stefan.raspl@linux.ibm.com>

The sendpage() call grabs the sock lock before calling the default
implementation - which tries to grab it once again.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com><
---
 net/smc/af_smc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index d03b8d29ffc0..544bab42f925 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1315,8 +1315,11 @@ static ssize_t smc_sendpage(struct socket *sock, struct page *page,
 
 	smc = smc_sk(sk);
 	lock_sock(sk);
-	if (sk->sk_state != SMC_ACTIVE)
+	if (sk->sk_state != SMC_ACTIVE) {
+		release_sock(sk);
 		goto out;
+	}
+	release_sock(sk);
 	if (smc->use_fallback)
 		rc = kernel_sendpage(smc->clcsock, page, offset,
 				     size, flags);
@@ -1324,7 +1327,6 @@ static ssize_t smc_sendpage(struct socket *sock, struct page *page,
 		rc = sock_no_sendpage(sock, page, offset, size, flags);
 
 out:
-	release_sock(sk);
 	return rc;
 }
 
-- 
2.13.5

^ permalink raw reply related

* Re: [bug report] net/mlx5e: TLS, Add Innova TLS TX offload data path
From: Boris Pismenny @ 2018-05-03 16:04 UTC (permalink / raw)
  To: Leon Romanovsky, Dan Carpenter, Saeed Mahameed; +Cc: linux-rdma, netdev
In-Reply-To: <20180503120757.GA1641@mtr-leonro.local>

Hi Dan,

Thanks for taking a look at our code.

On 05/03/18 15:07, Leon Romanovsky wrote:
> + Saeed, Boris and netdev
>
> On Thu, May 03, 2018 at 02:23:00PM +0300, Dan Carpenter wrote:
>> Hello Ilya Lesokhin,
>>
>> The patch bf23974104fa: "net/mlx5e: TLS, Add Innova TLS TX offload
>> data path" from Apr 30, 2018, leads to the following static checker
>> warning:
>>
>> 	drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c:63 mlx5e_tls_add_metadata()
>> 	warn: struct type mismatch 'ethhdr vs mlx5e_tls_metadata'
>>
>> drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c
>>      55  static int mlx5e_tls_add_metadata(struct sk_buff *skb, __be32 swid)
>>      56  {
>>      57          struct mlx5e_tls_metadata *pet;
>>      58          struct ethhdr *eth;
>>                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>      59
>>      60          if (skb_cow_head(skb, sizeof(struct mlx5e_tls_metadata)))
>>                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>      61                  return -ENOMEM;
>>      62
>>      63          eth = (struct ethhdr *)skb_push(skb, sizeof(struct mlx5e_tls_metadata));
>>                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> This feels like it should be sizeof(*eth) + sizeof(*pet)?
No, we add only "sizeof(struct mlx5e_tls_metadata)" bytes to the skb 
headroom for the mlx5_tls_metadata. The Ethernet header is already there.
>>
>>      64          skb->mac_header -= sizeof(struct mlx5e_tls_metadata);
>>      65          pet = (struct mlx5e_tls_metadata *)(eth + 1);
>>      66
>>      67          memmove(skb->data, skb->data + sizeof(struct mlx5e_tls_metadata),
>>      68                  2 * ETH_ALEN);
>>      69
>>      70          eth->h_proto = cpu_to_be16(MLX5E_METADATA_ETHER_TYPE);
>>      71          pet->syndrome_swid = htonl(SYNDROME_OFFLOAD_REQUIRED << 24) | swid;
>>      72
>>      73          return 0;
>>      74  }
>>
>> regards,
>> dan carpenter
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Best,
Boris.

^ permalink raw reply

* [PATCH] bpf: fix possible spectre-v1 in find_and_alloc_map()
From: Mark Rutland @ 2018-05-03 16:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mark Rutland, Alexei Starovoitov, Dan Carpenter, Daniel Borkmann,
	Peter Zijlstra, netdev

It's possible for userspace to control attr->map_type. Sanitize it when
using it as an array index to prevent an out-of-bounds value being used
under speculation.

Found by smatch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: netdev@vger.kernel.org
---
 kernel/bpf/syscall.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

I found this when running smatch over a v4.17-rc2 arm64 allyesconfig kernel.

IIUC this may allow for a speculative branch to an arbitrary gadget when we
subsequently call ops->map_alloc_check(attr).

Mark.

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4ca46df19c9a..8a7acd0dbeb6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -26,6 +26,7 @@
 #include <linux/cred.h>
 #include <linux/timekeeping.h>
 #include <linux/ctype.h>
+#include <linux/nospec.h>
 
 #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PROG_ARRAY || \
 			   (map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
@@ -102,12 +103,14 @@ const struct bpf_map_ops bpf_map_offload_ops = {
 static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
 {
 	const struct bpf_map_ops *ops;
+	u32 type = attr->map_type;
 	struct bpf_map *map;
 	int err;
 
-	if (attr->map_type >= ARRAY_SIZE(bpf_map_types))
+	if (type >= ARRAY_SIZE(bpf_map_types))
 		return ERR_PTR(-EINVAL);
-	ops = bpf_map_types[attr->map_type];
+	type = array_index_nospec(type, ARRAY_SIZE(bpf_map_types));
+	ops = bpf_map_types[type];
 	if (!ops)
 		return ERR_PTR(-EINVAL);
 
@@ -122,7 +125,7 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
 	if (IS_ERR(map))
 		return map;
 	map->ops = ops;
-	map->map_type = attr->map_type;
+	map->map_type = type;
 	return map;
 }
 
-- 
2.11.0

^ permalink raw reply related

* [PATCH net-next 0/4] net/smc: splice implementation
From: Ursula Braun @ 2018-05-03 16:12 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun

From: Ursula Braun <ursula.braun@de.ibm.com>

Dave,

Stefan comes up with an smc implementation for splice(). The first
three patches are preparational patches, the 4th patch implements
splice().

Thanks, Ursula

Stefan Raspl (4):
  smc: simplify abort logic
  smc: make smc_rx_wait_data() generic
  smc: allocate RMBs as compound pages
  smc: add support for splice()

 net/smc/af_smc.c   |  40 +++++++++--
 net/smc/smc.h      |   3 +
 net/smc/smc_core.c |  18 ++---
 net/smc/smc_core.h |   1 +
 net/smc/smc_rx.c   | 200 +++++++++++++++++++++++++++++++++++++++++++----------
 net/smc/smc_rx.h   |  12 +++-
 6 files changed, 219 insertions(+), 55 deletions(-)

-- 
2.13.5

^ permalink raw reply

* [PATCH net-next 1/4] smc: simplify abort logic
From: Ursula Braun @ 2018-05-03 16:12 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180503161239.71747-1-ubraun@linux.ibm.com>

From: Stefan Raspl <stefan.raspl@linux.ibm.com>

Some of the conditions to exit recv() are common in two pathes - cleaning up
code by moving the check up so we have it only once.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com><
---
 net/smc/smc_rx.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/net/smc/smc_rx.c b/net/smc/smc_rx.c
index af851d8df1f8..def33fb29ac9 100644
--- a/net/smc/smc_rx.c
+++ b/net/smc/smc_rx.c
@@ -112,26 +112,22 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
 		if (atomic_read(&conn->bytes_to_rcv))
 			goto copy;
 
+		if (sk->sk_shutdown & RCV_SHUTDOWN ||
+		    smc_cdc_rxed_any_close_or_senddone(conn) ||
+		    conn->local_tx_ctrl.conn_state_flags.peer_conn_abort)
+			break;
+
 		if (read_done) {
 			if (sk->sk_err ||
 			    sk->sk_state == SMC_CLOSED ||
-			    sk->sk_shutdown & RCV_SHUTDOWN ||
 			    !timeo ||
-			    signal_pending(current) ||
-			    smc_cdc_rxed_any_close_or_senddone(conn) ||
-			    conn->local_tx_ctrl.conn_state_flags.
-			    peer_conn_abort)
+			    signal_pending(current))
 				break;
 		} else {
 			if (sk->sk_err) {
 				read_done = sock_error(sk);
 				break;
 			}
-			if (sk->sk_shutdown & RCV_SHUTDOWN ||
-			    smc_cdc_rxed_any_close_or_senddone(conn) ||
-			    conn->local_tx_ctrl.conn_state_flags.
-			    peer_conn_abort)
-				break;
 			if (sk->sk_state == SMC_CLOSED) {
 				if (!sock_flag(sk, SOCK_DONE)) {
 					/* This occurs when user tries to read
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next 2/4] smc: make smc_rx_wait_data() generic
From: Ursula Braun @ 2018-05-03 16:12 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180503161239.71747-1-ubraun@linux.ibm.com>

From: Stefan Raspl <stefan.raspl@linux.ibm.com>

Turn smc_rx_wait_data into a generic function that can be used at various
instances to wait on traffic to complete with varying criteria.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com><
---
 net/smc/af_smc.c |  2 +-
 net/smc/smc_rx.c | 21 +++++++++++----------
 net/smc/smc_rx.h |  8 +++++++-
 3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 823ea3371575..747fdf1a2d6f 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1092,7 +1092,7 @@ static int smc_accept(struct socket *sock, struct socket *new_sock,
 			release_sock(clcsk);
 		} else if (!atomic_read(&smc_sk(nsk)->conn.bytes_to_rcv)) {
 			lock_sock(nsk);
-			smc_rx_wait_data(smc_sk(nsk), &timeo);
+			smc_rx_wait(smc_sk(nsk), &timeo, smc_rx_data_available);
 			release_sock(nsk);
 		}
 	}
diff --git a/net/smc/smc_rx.c b/net/smc/smc_rx.c
index def33fb29ac9..7b64bee656e8 100644
--- a/net/smc/smc_rx.c
+++ b/net/smc/smc_rx.c
@@ -22,11 +22,10 @@
 #include "smc_tx.h" /* smc_tx_consumer_update() */
 #include "smc_rx.h"
 
-/* callback implementation for sk.sk_data_ready()
- * to wakeup rcvbuf consumers that blocked with smc_rx_wait_data().
+/* callback implementation to wakeup consumers blocked with smc_rx_wait().
  * indirectly called by smc_cdc_msg_recv_action().
  */
-static void smc_rx_data_ready(struct sock *sk)
+static void smc_rx_wake_up(struct sock *sk)
 {
 	struct socket_wq *wq;
 
@@ -47,25 +46,27 @@ static void smc_rx_data_ready(struct sock *sk)
 /* blocks rcvbuf consumer until >=len bytes available or timeout or interrupted
  *   @smc    smc socket
  *   @timeo  pointer to max seconds to wait, pointer to value 0 for no timeout
+ *   @fcrit  add'l criterion to evaluate as function pointer
  * Returns:
  * 1 if at least 1 byte available in rcvbuf or if socket error/shutdown.
  * 0 otherwise (nothing in rcvbuf nor timeout, e.g. interrupted).
  */
-int smc_rx_wait_data(struct smc_sock *smc, long *timeo)
+int smc_rx_wait(struct smc_sock *smc, long *timeo,
+		int (*fcrit)(struct smc_connection *conn))
 {
 	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 	struct smc_connection *conn = &smc->conn;
 	struct sock *sk = &smc->sk;
 	int rc;
 
-	if (atomic_read(&conn->bytes_to_rcv))
+	if (fcrit(conn))
 		return 1;
 	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
 	add_wait_queue(sk_sleep(sk), &wait);
 	rc = sk_wait_event(sk, timeo,
 			   sk->sk_err ||
 			   sk->sk_shutdown & RCV_SHUTDOWN ||
-			   atomic_read(&conn->bytes_to_rcv) ||
+			   fcrit(conn) ||
 			   smc_cdc_rxed_any_close_or_senddone(conn),
 			   &wait);
 	remove_wait_queue(sk_sleep(sk), &wait);
@@ -146,14 +147,14 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
 				return -EAGAIN;
 		}
 
-		if (!atomic_read(&conn->bytes_to_rcv)) {
-			smc_rx_wait_data(smc, &timeo);
+		if (!smc_rx_data_available(conn)) {
+			smc_rx_wait(smc, &timeo, smc_rx_data_available);
 			continue;
 		}
 
 copy:
 		/* initialize variables for 1st iteration of subsequent loop */
-		/* could be just 1 byte, even after smc_rx_wait_data above */
+		/* could be just 1 byte, even after waiting on data above */
 		readable = atomic_read(&conn->bytes_to_rcv);
 		/* not more than what user space asked for */
 		copylen = min_t(size_t, read_remaining, readable);
@@ -213,5 +214,5 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
 /* Initialize receive properties on connection establishment. NB: not __init! */
 void smc_rx_init(struct smc_sock *smc)
 {
-	smc->sk.sk_data_ready = smc_rx_data_ready;
+	smc->sk.sk_data_ready = smc_rx_wake_up;
 }
diff --git a/net/smc/smc_rx.h b/net/smc/smc_rx.h
index 0b75a6b470e6..8f9f00997641 100644
--- a/net/smc/smc_rx.h
+++ b/net/smc/smc_rx.h
@@ -20,6 +20,12 @@
 void smc_rx_init(struct smc_sock *smc);
 int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
 		   int flags);
-int smc_rx_wait_data(struct smc_sock *smc, long *timeo);
+int smc_rx_wait(struct smc_sock *smc, long *timeo,
+		int (*fcrit)(struct smc_connection *conn));
+static inline int smc_rx_data_available(struct smc_connection *conn)
+{
+	return atomic_read(&conn->bytes_to_rcv);
+}
+
 
 #endif /* SMC_RX_H */
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next 3/4] smc: allocate RMBs as compound pages
From: Ursula Braun @ 2018-05-03 16:12 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180503161239.71747-1-ubraun@linux.ibm.com>

From: Stefan Raspl <stefan.raspl@linux.ibm.com>

Preparatory work for splice() support.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com><
---
 net/smc/smc_core.c | 18 +++++++++---------
 net/smc/smc_core.h |  1 +
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 1f3ea62fac5c..4051fb504393 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -274,8 +274,8 @@ static void smc_buf_free(struct smc_buf_desc *buf_desc, struct smc_link *lnk,
 				    DMA_TO_DEVICE);
 	}
 	sg_free_table(&buf_desc->sgt[SMC_SINGLE_LINK]);
-	if (buf_desc->cpu_addr)
-		free_pages((unsigned long)buf_desc->cpu_addr, buf_desc->order);
+	if (buf_desc->pages)
+		__free_pages(buf_desc->pages, buf_desc->order);
 	kfree(buf_desc);
 }
 
@@ -550,16 +550,16 @@ static struct smc_buf_desc *smc_new_buf_create(struct smc_link_group *lgr,
 	if (!buf_desc)
 		return ERR_PTR(-ENOMEM);
 
-	buf_desc->cpu_addr =
-		(void *)__get_free_pages(GFP_KERNEL | __GFP_NOWARN |
-					 __GFP_NOMEMALLOC |
-					 __GFP_NORETRY | __GFP_ZERO,
-					 get_order(bufsize));
-	if (!buf_desc->cpu_addr) {
+	buf_desc->order = get_order(bufsize);
+	buf_desc->pages = alloc_pages(GFP_KERNEL | __GFP_NOWARN |
+				      __GFP_NOMEMALLOC | __GFP_COMP |
+				      __GFP_NORETRY | __GFP_ZERO,
+				      buf_desc->order);
+	if (!buf_desc->pages) {
 		kfree(buf_desc);
 		return ERR_PTR(-EAGAIN);
 	}
-	buf_desc->order = get_order(bufsize);
+	buf_desc->cpu_addr = (void *)page_address(buf_desc->pages);
 
 	/* build the sg table from the pages */
 	lnk = &lgr->lnk[SMC_SINGLE_LINK];
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 97339f03ba79..cdf800f1fae7 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -120,6 +120,7 @@ struct smc_link {
 struct smc_buf_desc {
 	struct list_head	list;
 	void			*cpu_addr;	/* virtual address of buffer */
+	struct page		*pages;
 	struct sg_table		sgt[SMC_LINKS_PER_LGR_MAX];/* virtual buffer */
 	struct ib_mr		*mr_rx[SMC_LINKS_PER_LGR_MAX];
 						/* for rmb only: memory region
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next 4/4] smc: add support for splice()
From: Ursula Braun @ 2018-05-03 16:12 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180503161239.71747-1-ubraun@linux.ibm.com>

From: Stefan Raspl <stefan.raspl@linux.ibm.com>

Provide an implementation for splice() when we are using SMC. See
smc_splice_read() for further details.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com><
---
 net/smc/af_smc.c |  38 +++++++++++--
 net/smc/smc.h    |   3 +
 net/smc/smc_rx.c | 163 +++++++++++++++++++++++++++++++++++++++++++++++++------
 net/smc/smc_rx.h |   6 +-
 4 files changed, 185 insertions(+), 25 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 747fdf1a2d6f..553fe4eb4066 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1166,10 +1166,12 @@ static int smc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 		goto out;
 	}
 
-	if (smc->use_fallback)
+	if (smc->use_fallback) {
 		rc = smc->clcsock->ops->recvmsg(smc->clcsock, msg, len, flags);
-	else
-		rc = smc_rx_recvmsg(smc, msg, len, flags);
+	} else {
+		msg->msg_namelen = 0;
+		rc = smc_rx_recvmsg(smc, msg, NULL, len, flags);
+	}
 
 out:
 	release_sock(sk);
@@ -1446,9 +1448,15 @@ static ssize_t smc_sendpage(struct socket *sock, struct page *page,
 	return rc;
 }
 
+/* Map the affected portions of the rmbe into an spd, note the number of bytes
+ * to splice in conn->splice_pending, and press 'go'. Delays consumer cursor
+ * updates till whenever a respective page has been fully processed.
+ * Note that subsequent recv() calls have to wait till all splice() processing
+ * completed.
+ */
 static ssize_t smc_splice_read(struct socket *sock, loff_t *ppos,
 			       struct pipe_inode_info *pipe, size_t len,
-				    unsigned int flags)
+			       unsigned int flags)
 {
 	struct sock *sk = sock->sk;
 	struct smc_sock *smc;
@@ -1456,16 +1464,34 @@ static ssize_t smc_splice_read(struct socket *sock, loff_t *ppos,
 
 	smc = smc_sk(sk);
 	lock_sock(sk);
-	if ((sk->sk_state != SMC_ACTIVE) && (sk->sk_state != SMC_CLOSED))
+
+	if (sk->sk_state == SMC_INIT ||
+	    sk->sk_state == SMC_LISTEN ||
+	    sk->sk_state == SMC_CLOSED)
+		goto out;
+
+	if (sk->sk_state == SMC_PEERFINCLOSEWAIT) {
+		rc = 0;
 		goto out;
+	}
+
 	if (smc->use_fallback) {
 		rc = smc->clcsock->ops->splice_read(smc->clcsock, ppos,
 						    pipe, len, flags);
 	} else {
-		rc = -EOPNOTSUPP;
+		if (*ppos) {
+			rc = -ESPIPE;
+			goto out;
+		}
+		if (flags & SPLICE_F_NONBLOCK)
+			flags = MSG_DONTWAIT;
+		else
+			flags = 0;
+		rc = smc_rx_recvmsg(smc, NULL, pipe, len, flags);
 	}
 out:
 	release_sock(sk);
+
 	return rc;
 }
 
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 2405e889b93d..ec209cd48d42 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -164,6 +164,9 @@ struct smc_connection {
 	atomic_t		bytes_to_rcv;	/* arrived data,
 						 * not yet received
 						 */
+	atomic_t		splice_pending;	/* number of spliced bytes
+						 * pending processing
+						 */
 #ifndef KERNEL_HAS_ATOMIC64
 	spinlock_t		acurs_lock;	/* protect cursors */
 #endif
diff --git a/net/smc/smc_rx.c b/net/smc/smc_rx.c
index 7b64bee656e8..ed45569289f5 100644
--- a/net/smc/smc_rx.c
+++ b/net/smc/smc_rx.c
@@ -43,6 +43,116 @@ static void smc_rx_wake_up(struct sock *sk)
 	rcu_read_unlock();
 }
 
+/* Update consumer cursor
+ *   @conn   connection to update
+ *   @cons   consumer cursor
+ *   @len    number of Bytes consumed
+ */
+static void smc_rx_update_consumer(struct smc_connection *conn,
+				   union smc_host_cursor cons, size_t len)
+{
+	smc_curs_add(conn->rmbe_size, &cons, len);
+	smc_curs_write(&conn->local_tx_ctrl.cons, smc_curs_read(&cons, conn),
+		       conn);
+	/* send consumer cursor update if required */
+	/* similar to advertising new TCP rcv_wnd if required */
+	smc_tx_consumer_update(conn);
+}
+
+struct smc_spd_priv {
+	struct smc_sock *smc;
+	size_t		 len;
+};
+
+static void smc_rx_pipe_buf_release(struct pipe_inode_info *pipe,
+				    struct pipe_buffer *buf)
+{
+	struct smc_spd_priv *priv = (struct smc_spd_priv *)buf->private;
+	struct smc_sock *smc = priv->smc;
+	struct smc_connection *conn;
+	union smc_host_cursor cons;
+	struct sock *sk = &smc->sk;
+
+	if (sk->sk_state == SMC_CLOSED ||
+	    sk->sk_state == SMC_PEERFINCLOSEWAIT ||
+	    sk->sk_state == SMC_APPFINCLOSEWAIT)
+		goto out;
+	conn = &smc->conn;
+	lock_sock(sk);
+	smc_curs_write(&cons, smc_curs_read(&conn->local_tx_ctrl.cons, conn),
+		       conn);
+	smc_rx_update_consumer(conn, cons, priv->len);
+	release_sock(sk);
+	if (atomic_sub_and_test(priv->len, &conn->splice_pending))
+		smc_rx_wake_up(sk);
+out:
+	kfree(priv);
+	put_page(buf->page);
+	sock_put(sk);
+}
+
+static int smc_rx_pipe_buf_nosteal(struct pipe_inode_info *pipe,
+				   struct pipe_buffer *buf)
+{
+	return 1;
+}
+
+static const struct pipe_buf_operations smc_pipe_ops = {
+	.can_merge = 0,
+	.confirm = generic_pipe_buf_confirm,
+	.release = smc_rx_pipe_buf_release,
+	.steal = smc_rx_pipe_buf_nosteal,
+	.get = generic_pipe_buf_get
+};
+
+static void smc_rx_spd_release(struct splice_pipe_desc *spd,
+			       unsigned int i)
+{
+	put_page(spd->pages[i]);
+}
+
+static int smc_rx_splice(struct pipe_inode_info *pipe, char *src, size_t len,
+			 struct smc_sock *smc)
+{
+	struct splice_pipe_desc spd;
+	struct partial_page partial;
+	struct smc_spd_priv *priv;
+	struct page *page;
+	int bytes;
+
+	page = virt_to_page(smc->conn.rmb_desc->cpu_addr);
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+	priv->len = len;
+	priv->smc = smc;
+	partial.offset = src - (char *)smc->conn.rmb_desc->cpu_addr;
+	partial.len = len;
+	partial.private = (unsigned long)priv;
+
+	spd.nr_pages_max = 1;
+	spd.nr_pages = 1;
+	spd.pages = &page;
+	spd.partial = &partial;
+	spd.ops = &smc_pipe_ops;
+	spd.spd_release = smc_rx_spd_release;
+
+	bytes = splice_to_pipe(pipe, &spd);
+	if (bytes > 0) {
+		sock_hold(&smc->sk);
+		get_page(smc->conn.rmb_desc->pages);
+		atomic_add(bytes, &smc->conn.splice_pending);
+	}
+
+	return bytes;
+}
+
+static int smc_rx_data_available_and_no_splice_pend(struct smc_connection *conn)
+{
+	return atomic_read(&conn->bytes_to_rcv) &&
+	       !atomic_read(&conn->splice_pending);
+}
+
 /* blocks rcvbuf consumer until >=len bytes available or timeout or interrupted
  *   @smc    smc socket
  *   @timeo  pointer to max seconds to wait, pointer to value 0 for no timeout
@@ -74,19 +184,25 @@ int smc_rx_wait(struct smc_sock *smc, long *timeo,
 	return rc;
 }
 
-/* rcvbuf consumer: main API called by socket layer.
- * called under sk lock.
+/* smc_rx_recvmsg - receive data from RMBE
+ * @msg:	copy data to receive buffer
+ * @pipe:	copy data to pipe if set - indicates splice() call
+ *
+ * rcvbuf consumer: main API called by socket layer.
+ * Called under sk lock.
  */
-int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
-		   int flags)
+int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg,
+		   struct pipe_inode_info *pipe, size_t len, int flags)
 {
 	size_t copylen, read_done = 0, read_remaining = len;
 	size_t chunk_len, chunk_off, chunk_len_sum;
 	struct smc_connection *conn = &smc->conn;
+	int (*func)(struct smc_connection *conn);
 	union smc_host_cursor cons;
 	int readable, chunk;
 	char *rcvbuf_base;
 	struct sock *sk;
+	int splbytes;
 	long timeo;
 	int target;		/* Read at least these many bytes */
 	int rc;
@@ -102,12 +218,11 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
 	timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
 	target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
 
-	msg->msg_namelen = 0;
 	/* we currently use 1 RMBE per RMB, so RMBE == RMB base addr */
 	rcvbuf_base = conn->rmb_desc->cpu_addr;
 
 	do { /* while (read_remaining) */
-		if (read_done >= target)
+		if (read_done >= target || (pipe && read_done))
 			break;
 
 		if (atomic_read(&conn->bytes_to_rcv))
@@ -156,11 +271,24 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
 		/* initialize variables for 1st iteration of subsequent loop */
 		/* could be just 1 byte, even after waiting on data above */
 		readable = atomic_read(&conn->bytes_to_rcv);
+		splbytes = atomic_read(&conn->splice_pending);
+		if (!readable || (msg && splbytes)) {
+			if (splbytes)
+				func = smc_rx_data_available_and_no_splice_pend;
+			else
+				func = smc_rx_data_available;
+			smc_rx_wait(smc, &timeo, func);
+			continue;
+		}
+
 		/* not more than what user space asked for */
 		copylen = min_t(size_t, read_remaining, readable);
 		smc_curs_write(&cons,
 			       smc_curs_read(&conn->local_tx_ctrl.cons, conn),
 			       conn);
+		/* subsequent splice() calls pick up where previous left */
+		if (splbytes)
+			smc_curs_add(conn->rmbe_size, &cons, splbytes);
 		/* determine chunks where to read from rcvbuf */
 		/* either unwrapped case, or 1st chunk of wrapped case */
 		chunk_len = min_t(size_t,
@@ -170,9 +298,16 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
 		smc_rmb_sync_sg_for_cpu(conn);
 		for (chunk = 0; chunk < 2; chunk++) {
 			if (!(flags & MSG_TRUNC)) {
-				rc = memcpy_to_msg(msg, rcvbuf_base + chunk_off,
-						   chunk_len);
-				if (rc) {
+				if (msg) {
+					rc = memcpy_to_msg(msg, rcvbuf_base +
+							   chunk_off,
+							   chunk_len);
+				} else {
+					rc = smc_rx_splice(pipe, rcvbuf_base +
+							chunk_off, chunk_len,
+							smc);
+				}
+				if (rc < 0) {
 					if (!read_done)
 						read_done = -EFAULT;
 					smc_rmb_sync_sg_for_device(conn);
@@ -193,18 +328,13 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
 
 		/* update cursors */
 		if (!(flags & MSG_PEEK)) {
-			smc_curs_add(conn->rmbe_size, &cons, copylen);
 			/* increased in recv tasklet smc_cdc_msg_rcv() */
 			smp_mb__before_atomic();
 			atomic_sub(copylen, &conn->bytes_to_rcv);
 			/* guarantee 0 <= bytes_to_rcv <= rmbe_size */
 			smp_mb__after_atomic();
-			smc_curs_write(&conn->local_tx_ctrl.cons,
-				       smc_curs_read(&cons, conn),
-				       conn);
-			/* send consumer cursor update if required */
-			/* similar to advertising new TCP rcv_wnd if required */
-			smc_tx_consumer_update(conn);
+			if (msg)
+				smc_rx_update_consumer(conn, cons, copylen);
 		}
 	} while (read_remaining);
 out:
@@ -215,4 +345,5 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
 void smc_rx_init(struct smc_sock *smc)
 {
 	smc->sk.sk_data_ready = smc_rx_wake_up;
+	atomic_set(&smc->conn.splice_pending, 0);
 }
diff --git a/net/smc/smc_rx.h b/net/smc/smc_rx.h
index 8f9f00997641..db823c97d824 100644
--- a/net/smc/smc_rx.h
+++ b/net/smc/smc_rx.h
@@ -18,8 +18,9 @@
 #include "smc.h"
 
 void smc_rx_init(struct smc_sock *smc);
-int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg, size_t len,
-		   int flags);
+
+int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr *msg,
+		   struct pipe_inode_info *pipe, size_t len, int flags);
 int smc_rx_wait(struct smc_sock *smc, long *timeo,
 		int (*fcrit)(struct smc_connection *conn));
 static inline int smc_rx_data_available(struct smc_connection *conn)
@@ -27,5 +28,4 @@ static inline int smc_rx_data_available(struct smc_connection *conn)
 	return atomic_read(&conn->bytes_to_rcv);
 }
 
-
 #endif /* SMC_RX_H */
-- 
2.13.5

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox